1. Aggregation
(1) Computing with the built-in aggregation functions
1> Built-in aggregation functions
sum(), mean(), max(), min(), size(), describe(), and so on
2> Applying the aggregation functions
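import numpy as np
import pandas as pd

# Create the DataFrame
dict_data = {
    'key1': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'],
    'key2': ['one', 'two', 'three', 'one', 'two', 'three', 'one', 'two'],
    'data1': np.random.randint(1, 10, 8),
    'data2': np.random.randint(1, 10, 8)
}
df = pd.DataFrame(dict_data)
print(df)
# Sample output from one run; the values are random and the column order
# depends on the pandas version.
'''
   data1  data2 key1   key2
0      3      4    a    one
1      7      9    b    two
2      5      7    c  three
3      3      4    d    one
4      8      7    a    two
5      4      7    b  three
6      8      9    c    one
7      4      4    d    two
'''

# Group by key1 and compute sum().
# The numeric columns are selected explicitly because recent pandas versions
# no longer drop non-numeric columns such as key2 automatically.
# The result goes into a new variable so that df itself is not overwritten.
grouped = df.groupby('key1')[['data1', 'data2']]
k1_sum = grouped.sum()
print(k1_sum)
# Sample output (from a different run than the table above)
'''
      data1  data2
key1
a        12     10
b         8      5
c         8     11
d        16     13
'''

# Built-in aggregation functions
print(grouped.sum())
print('*' * 50)
print(grouped.max())
print('*' * 50)
print(grouped.min())
print('*' * 50)
print(grouped.mean())
print('*' * 50)
# Number of rows in each group
print(grouped.size())
print('*' * 50)
# Number of non-NaN values in each group
print(grouped.count())
print('*' * 50)
print(grouped.describe())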
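The difference between size() and count() is easier to see on fixed data that contains a missing value. The following is a minimal sketch; the DataFrame `demo` and its values are made up for illustration and are not part of the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data with one missing value in group 'a'
demo = pd.DataFrame({
    'key1': ['a', 'b', 'a', 'b'],
    'data1': [1.0, 2.0, np.nan, 4.0],
})

grouped_demo = demo.groupby('key1')['data1']

sizes = grouped_demo.size()    # rows per group, NaN included
counts = grouped_demo.count()  # non-NaN values per group
totals = grouped_demo.sum()    # NaN is skipped by default

print(sizes)
print(counts)
print(totals)
```

Group 'a' has two rows but only one non-NaN value, so size() and count() disagree for it.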
(2) Computing with a custom aggregation function
To use a custom aggregation function, pass it to agg().
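# Custom aggregation function.
# Note: despite its name, it returns max**2 - min**2 for each column of each
# group, not the plain range max - min.
def peak_range(col):
    return col.max()**2 - col.min()**2

# agg() assembles the aggregated results into a DataFrame and returns it.
# Only the numeric columns are selected, since the function cannot be applied to strings.
print(df.groupby('key1')[['data1', 'data2']].agg(peak_range))

# The plain range (max - min), written as a lambda
print(df.groupby('key1')[['data1', 'data2']].agg(lambda col: col.max() - col.min()))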
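To make clear what a custom function receives, here is a small sketch on fixed data. The names `demo` and `value_range` are hypothetical; unlike peak_range above, `value_range` returns the plain max - min.

```python
import pandas as pd

# Hypothetical fixed data
demo = pd.DataFrame({
    'key1': ['a', 'b', 'a', 'b'],
    'data1': [3, 1, 7, 9],
    'data2': [4, 3, 8, 5],
})

def value_range(col):
    # col is one column of one group (a pandas Series)
    return col.max() - col.min()

# The function is called once per group and per selected column
result = demo.groupby('key1')[['data1', 'data2']].agg(value_range)
print(result)
```

The result has one row per group and one column per selected data column.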
(3) Applying several aggregation functions; by default the column labels are the function names
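# Apply several aggregation functions; by default the column labels are the function names.
# A tuple ('label', function) gives the resulting column a custom label.
# Only the numeric columns are selected, since std and mean are not defined for strings.
print(df.groupby('key1')[['data1', 'data2']].agg(['sum', 'std', 'mean', ('range', peak_range)]))
# Sample output from one run (the 'range' column is max**2 - min**2, as defined in peak_range)
'''
     data1                       data2
       sum       std mean range   sum       std mean range
key1
a       10  2.828427  5.0    40    12  2.828427  6.0    48
b       10  5.656854  5.0    80     8  1.414214  4.0    16
c        6  1.414214  3.0    12     9  0.707107  4.5     9
d       15  0.707107  7.5    15     8  2.828427  4.0    32
'''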
(4) Specifying the aggregation function for each column
# Specify the aggregation function for each column with a dict: {column: function}
print(df.groupby('key1').agg({'data1': 'mean', 'data2': 'sum'}))
'''
      data1  data2
key1
a       5.0     12
b       5.0      8
c       3.0      9
d       7.5      8
'''
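Sections (3) and (4) can also be combined with the keyword form of agg() (named aggregation, available in pandas 0.25 and later), where each keyword is the output column label and the value is a (column, function) pair. This is an alternative sketch, not the method used above; `demo` and the output labels are hypothetical.

```python
import pandas as pd

# Hypothetical fixed data
demo = pd.DataFrame({
    'key1': ['a', 'b', 'a', 'b'],
    'data1': [3, 1, 7, 9],
    'data2': [4, 3, 8, 5],
})

# keyword = (source column, aggregation function)
summary = demo.groupby('key1').agg(
    data1_mean=('data1', 'mean'),
    data2_sum=('data2', 'sum'),
    data1_range=('data1', lambda col: col.max() - col.min()),
)
print(summary)
```

This form yields flat column labels instead of the two-level labels produced by passing a list of functions.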
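2. Group-wise operations
(1) Running a group-wise operation and adding a prefix to the column labels of the result
Use add_prefix('prefix') to add the prefix.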
# Create the DataFrame
dict_data = {
    'key1': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'],
    'key2': ['one', 'two', 'three', 'one', 'two', 'three', 'one', 'two'],
    'data1': np.random.randint(1, 10, 8),
    'data2': np.random.randint(1, 10, 8)
}
df = pd.DataFrame(dict_data)
print(df)
'''
   data1  data2 key1   key2
0      1      5    a    one
1      9      3    b    two
2      3      6    c  three
3      6      9    d    one
4      8      4    a    two
5      5      5    b  three
6      9      6    c    one
7      4      1    d    two
'''

# Group by key1 and compute sum() on the numeric columns,
# then add a prefix to the column labels of the result
k1_sum = df.groupby('key1')[['data1', 'data2']].sum().add_prefix('sum_')
print(k1_sum)
'''
      sum_data1  sum_data2
key1
a             9          9
b            14          8
c            12         12
d            10         10
'''
(2) Running a group-wise operation and merging the result with the original data
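# Create the DataFrame
dict_data = {
    'key1': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'],
    'key2': ['one', 'two', 'three', 'one', 'two', 'three', 'one', 'two'],
    'data1': np.random.randint(1, 10, 8),
    'data2': np.random.randint(1, 10, 8)
}
df = pd.DataFrame(dict_data)
print(df)
'''
   data1  data2 key1   key2
0      1      5    a    one
1      9      3    b    two
2      3      6    c  three
3      6      9    d    one
4      8      4    a    two
5      5      5    b  three
6      9      6    c    one
7      4      1    d    two
'''

# Group by key1 and compute sum() on the numeric columns,
# then add a prefix to the column labels of the result
k1_sum = df.groupby('key1')[['data1', 'data2']].sum().add_prefix('sum_')
print(k1_sum)
'''
      sum_data1  sum_data2
key1
a             9          9
b            14          8
c            12         12
d            10         10
'''

# Join the result to the original data.
# First argument: the original data
# Second argument: the aggregated result
# The key1 column of df is matched against the index of k1_sum.
merged = pd.merge(df, k1_sum, left_on='key1', right_index=True)
print(merged)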
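Since the example above uses random data, the effect of the merge is easier to check on fixed values. The following is a minimal sketch under that assumption; `demo`, `group_sums` and `merged_demo` are hypothetical names.

```python
import pandas as pd

# Hypothetical fixed data
demo = pd.DataFrame({
    'key1': ['a', 'b', 'a', 'b'],
    'data1': [1, 9, 8, 5],
    'data2': [5, 3, 4, 5],
})

# One row per group, with prefixed column labels
group_sums = demo.groupby('key1')[['data1', 'data2']].sum().add_prefix('sum_')

# Match demo's key1 column against the index of group_sums
merged_demo = pd.merge(demo, group_sums, left_on='key1', right_index=True)
print(merged_demo)
```

Every original row keeps its own data1 and data2 values and additionally carries the totals of its group.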
(3) Using transform() to assemble the results into a DataFrame that follows the row order of the original data
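# Create the DataFrame
dict_data = {
    'key1': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'],
    'key2': ['one', 'two', 'three', 'one', 'two', 'three', 'one', 'two'],
    'data1': np.random.randint(1, 10, 8),
    'data2': np.random.randint(1, 10, 8)
}
df = pd.DataFrame(dict_data)
print(df)
'''
   data1  data2 key1   key2
0      1      5    a    one
1      9      3    b    two
2      3      6    c  three
3      6      9    d    one
4      8      4    a    two
5      5      5    b  three
6      9      6    c    one
7      4      1    d    two
'''

# Group by key1 and compute sum() on the numeric columns,
# then add a prefix to the column labels of the result
k1_sum = df.groupby('key1')[['data1', 'data2']].sum().add_prefix('sum_')
print(k1_sum)
'''
      sum_data1  sum_data2
key1
a             9          9
b            14          8
c            12         12
d            10         10
'''

# transform() computes per group and assembles the results into a DataFrame
# that has the same rows, in the same order, as the original data.
# The string 'sum' is passed instead of np.sum, which recent pandas versions warn about.
# The strings in key2 are concatenated within each group.
k1_sum_tf = df.groupby('key1').transform('sum').add_prefix('sum_')
# print(k1_sum_tf.columns)

# Append the result columns to the original data
df[k1_sum_tf.columns] = k1_sum_tf
print(df)
# Sample output (from a different run than the tables above)
'''
   data1  data2 key1   key2  sum_data1  sum_data2   sum_key2
0      5      4    a    one          9         12     onetwo
1      3      3    b    two          5         12   twothree
2      9      2    c  three         14          9   threeone
3      6      5    d    one         11          9     onetwo
4      4      8    a    two          9         12     onetwo
5      2      9    b  three          5         12   twothree
6      5      7    c    one         14          9   threeone
7      5      4    d    two         11          9     onetwo
'''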
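Because transform() returns one value per original row, its result can be assigned straight to a new column without a merge. A minimal sketch on fixed data follows; `demo` and the `share` column are hypothetical, and the share of the group total is just one possible use.

```python
import pandas as pd

# Hypothetical fixed data
demo = pd.DataFrame({
    'key1': ['a', 'b', 'a', 'b'],
    'data1': [1, 9, 8, 5],
})

# Group total, repeated on every row of the group
demo['sum_data1'] = demo.groupby('key1')['data1'].transform('sum')

# Each row's share of its group total
demo['share'] = demo['data1'] / demo['sum_data1']
print(demo)
```

Within each group the shares add up to 1, which is a quick sanity check on the alignment.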