dataframe
Rename a single pandas DataFrame column without knowing column name
This should work (list(drugInfo)[1] is the label of the second column, since positions start at 0):
drugInfo.rename(columns={list(drugInfo)[1]: 'col_1_new_name'}, inplace=True)
Example:
In [18]: df = pd.DataFrame({'a': randn(5), 'b': randn(5), 'c': randn(5)})
df
Out[18]:
          a         b         c
0 -1.429509 -0.652116  0.515545
1  0.563148 -0.536554 -1.316155
2  1.310768 -3.041681 -0.704776
3 -1.403204  1.083727 -0.117787
4 -0.040952  0.108155 -0.092292
In [19]: df.rename(columns={list(df)[1]: 'col1_new_name'}, inplace=True)
df
Out[19]:
          a  col1_new_name         c
0 -1.429509      -0.652116  0.515545
…
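A minimal self-contained sketch of the same idea, using a small made-up frame instead of randn. Because rename maps labels rather than positions, any other column that shares the label found at position 1 is renamed too; the hypothetical helper rename_by_position below changes only the one slot.

```python
import pandas as pd


def rename_by_position(df, pos, new_name):
    """Hypothetical helper: rename only the column at integer position pos."""
    cols = list(df.columns)
    cols[pos] = new_name
    out = df.copy()
    out.columns = cols
    return out


df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Label-based, as in the answer: look up the label at position 1, then rename it
renamed = df.rename(columns={df.columns[1]: "col1_new_name"})
print(list(renamed.columns))  # ['a', 'col1_new_name', 'c']

# Position-based: touches a single column even when labels are duplicated
dup = pd.DataFrame([[1, 2, 3]], columns=["x", "x", "y"])
print(list(rename_by_position(dup, 1, "x2").columns))  # ['x', 'x2', 'y']
```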
How to re-order the columns based on another dataframe with the same columns but different order
Try this:
df2 = df2[df1.columns]
Demo:
In [1]: df1 = pd.DataFrame(np.random.randint(0, 10, (5, 4)), columns=list('abcd'))
In [2]: df2 = pd.DataFrame(np.random.randint(0, 10, (5, 4)), columns=list('badc'))
In [3]: df1
Out[3]:
   a  b  c  d
0  8  3  9  6
1  0  6  4  7
2  7  2  0  7
3  0  5  1  8
4  6  2  5  4
…
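A small runnable version with fixed values instead of random ones. As a variation not in the answer: df2[df1.columns] raises a KeyError if df2 lacks one of df1's columns, whereas reindex(columns=...) inserts an all-NaN column for each missing label.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4]})
df2 = pd.DataFrame({"b": [20], "a": [10], "d": [40], "c": [30]})

# Selecting with df1's column Index returns df2's columns in df1's order
df2 = df2[df1.columns]
print(list(df2.columns))  # ['a', 'b', 'c', 'd']

# reindex gives the same order but fills labels df2 lacks with NaN columns
partial = pd.DataFrame({"b": [2], "a": [1]})
df3 = partial.reindex(columns=df1.columns)
print(list(df3.columns))  # ['a', 'b', 'c', 'd']
```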
PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance
I’m aware that some will find this reply highly controversial, but I’m posting my opinion anyway.
Proposed answer: ignore the warning. If you observe that your code actually performs poorly, it is your responsibility as the user to fix it; it is not the module’s responsibility to propose refactoring steps.
Rationale for this harsh …
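If you do decide to ignore the warning, one way to silence only this category (rather than all warnings) is sketched below, assuming your pandas version raises the fragmentation message as pandas.errors.PerformanceWarning. For comparison, it also shows the usual refactor if the slowdown turns out to be real: build the new columns first and join them with a single pd.concat. The 200-column loop is a made-up workload.

```python
import warnings

import numpy as np
import pandas as pd

base = pd.DataFrame({"x": np.arange(3)})

# Option 1: keep the column-by-column loop, suppressing only PerformanceWarning
# and only inside this block
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=pd.errors.PerformanceWarning)
    slow = base.copy()
    for i in range(200):
        slow[f"col_{i}"] = slow["x"] * i  # each assignment inserts one column

# Option 2: build all new columns first, then join them in one operation
new_cols = {f"col_{i}": base["x"] * i for i in range(200)}
fast = pd.concat([base, pd.DataFrame(new_cols)], axis=1)
print(fast.shape)  # (3, 201)
```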
Create multiindex from existing dataframe
You could simply use groupby in this case; it creates the MultiIndex automatically when it sums the sales over the requested columns:
df.groupby(['user_id', 'account_num', 'dates']).sales.sum().to_frame()
You should also be able to simply do this:
df.set_index(['user_id', 'account_num', 'dates'])
Although you probably want to avoid any duplicates (e.g. two or more rows with identical user_id, account_num …
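A minimal sketch contrasting the two approaches on made-up data that contains one duplicated key combination: set_index keeps both rows under the same MultiIndex entry, while groupby(...).sum() collapses them into one.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "account_num": [10, 10, 20],
    "dates": ["2021-01-01", "2021-01-01", "2021-01-02"],
    "sales": [5, 7, 3],
})
keys = ["user_id", "account_num", "dates"]

# set_index builds the MultiIndex but keeps duplicate key combinations
indexed = df.set_index(keys)
print(indexed.index.is_unique)  # False: the first two rows share all three keys

# groupby + sum merges duplicates, so each index entry appears once
summed = df.groupby(keys).sales.sum().to_frame()
print(summed.index.is_unique)  # True
print(summed["sales"].tolist())  # [12, 3]
```

If duplicates should be treated as an error instead, set_index(keys, verify_integrity=True) raises a ValueError when the resulting index is not unique.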
How to surface plot/3d plot from dataframe
.plot_surface() takes 2D arrays as inputs, not 1D DataFrame columns. This has been explained quite well here, along with the code below, which illustrates how to arrive at the required format from DataFrame input; it is reproduced with minor modifications such as additional comments. Alternatively, .plot_trisurf() accepts 1D inputs. I’ve added an …
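A minimal sketch of both routes, assuming long-format data with hypothetical columns x, y and z sampled on a regular grid: pivot the DataFrame into a 2D grid for plot_surface, or pass the 1D columns straight to plot_trisurf. The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers "3d" on old versions)

# Long-format data: one row per (x, y) point, as a DataFrame usually holds it
xs, ys = np.meshgrid(np.arange(5), np.arange(4))
df = pd.DataFrame({"x": xs.ravel(), "y": ys.ravel()})
df["z"] = df["x"] ** 2 + df["y"]

# plot_surface needs 2D arrays: pivot to a y-by-x grid, then meshgrid the axes
grid = df.pivot(index="y", columns="x", values="z")
X, Y = np.meshgrid(grid.columns.to_numpy(), grid.index.to_numpy())
Z = grid.to_numpy()

fig = plt.figure()
ax1 = fig.add_subplot(1, 2, 1, projection="3d")
ax1.plot_surface(X, Y, Z)

# plot_trisurf triangulates the scattered 1D columns itself
ax2 = fig.add_subplot(1, 2, 2, projection="3d")
ax2.plot_trisurf(df["x"].to_numpy(), df["y"].to_numpy(), df["z"].to_numpy())
```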
Pandas dataframe – running sum with reset
You can use cumsum() twice:
#   reset  val  desired_col
#0      0    1            1
#1      0    5            6
#2      0    4           10
#3      1    2            2
#4      1   -1           -1
#5      0    6            5
#6      0    4            9
#7      1    2            2
df['cumsum'] = df['reset'].cumsum()
#cumulative sums of groups to column des
df['des'] = df.groupby(['cumsum'])['val'].cumsum()
print …
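A self-contained version using the sample data from the comment block. The first cumsum turns every reset=1 row into the start of a new group id; the second cumsum runs within each group. This sketch passes the group-id Series to groupby directly instead of storing it in a helper column first.

```python
import pandas as pd

df = pd.DataFrame({
    "reset": [0, 0, 0, 1, 1, 0, 0, 1],
    "val": [1, 5, 4, 2, -1, 6, 4, 2],
})

# Running count of resets: rows up to the next reset share the same id
group_id = df["reset"].cumsum()
# Running sum of val that restarts at every reset row
df["des"] = df.groupby(group_id)["val"].cumsum()
print(df["des"].tolist())  # [1, 6, 10, 2, -1, 5, 9, 2]
```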
How to read multiple json files into pandas dataframe?
Change the last line to:
temp = temp.append(data, ignore_index=True)
This is needed because append doesn’t happen in place: the append method does not modify the DataFrame, it just returns a new DataFrame containing the result of the append operation.
Edit: Since writing this answer I …
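Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0. On current pandas, a common pattern is to collect one DataFrame per file and call pd.concat once. The sketch below assumes each file holds a JSON array of records; the folder layout and the read_json_dir helper are hypothetical.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def read_json_dir(folder):
    """Hypothetical helper: read every *.json file in folder into one DataFrame."""
    frames = [pd.read_json(path, orient="records")
              for path in sorted(Path(folder).glob("*.json"))]
    # Concatenate once at the end; ignore_index renumbers rows 0..n-1
    return pd.concat(frames, ignore_index=True)


# Demo with two throwaway files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    for name, rows in [("a.json", [{"id": 1}, {"id": 2}]), ("b.json", [{"id": 3}])]:
        Path(tmp, name).write_text(json.dumps(rows))
    combined = read_json_dir(tmp)

print(combined["id"].tolist())  # [1, 2, 3]
```

pd.concat raises a ValueError when given an empty list, so a folder with no matching files needs its own check.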
Pandas groupby and aggregation output should include all the original columns (including the ones not aggregated on)
agg with a dict of functions
Create a dict of functions and pass it to agg. You’ll also need as_index=False to prevent the group columns from becoming the index in your output.
f = {'NET_AMT': 'sum', 'QTY_SOLD': 'sum', 'UPC_DSC': 'first'}
df.groupby(['month', 'UPC_ID'], as_index=False).agg(f)
     month  UPC_ID UPC_DSC  NET_AMT  QTY_SOLD
0  2017.02     111   desc1       10         2
1 …
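A runnable sketch with made-up rows using the question's column names. Columns that are neither grouping keys nor listed in the dict are dropped from the output, so every column you want to keep needs an entry, such as 'first' for descriptive text.

```python
import pandas as pd

df = pd.DataFrame({
    "month": [2017.02, 2017.02, 2017.03],
    "UPC_ID": [111, 111, 111],
    "UPC_DSC": ["desc1", "desc1", "desc1"],
    "NET_AMT": [4, 6, 5],
    "QTY_SOLD": [1, 1, 3],
})

# Sum the numeric measures; keep the first description seen in each group
f = {"NET_AMT": "sum", "QTY_SOLD": "sum", "UPC_DSC": "first"}
# as_index=False keeps month and UPC_ID as ordinary columns
out = df.groupby(["month", "UPC_ID"], as_index=False).agg(f)
print(out)
```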
Aggregation over Partition in pandas
You can use the pandas transform() method for within-group aggregations like “OVER(partition by …)” in SQL:
import pandas as pd
#create dataframe with sample data
df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B', 'B'], 'value': [1, 2, 3, 4, 5, 6]})
#calculate AVG(value) OVER (PARTITION BY group)
df['mean_value'] = df.groupby('group').value.transform('mean')
df:
group  value  mean_value
A      1      2
A      2      2
A      3      2
B …
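A slightly extended sketch mapping a few common SQL window functions onto the same sample data. The ROW_NUMBER mapping via cumcount is an addition not in the answer; cumcount numbers rows in their current order, which is why the frame is sorted first.

```python
import pandas as pd

df = pd.DataFrame({"group": ["A", "A", "A", "B", "B", "B"],
                   "value": [1, 2, 3, 4, 5, 6]})

g = df.groupby("group")["value"]
# AVG(value) OVER (PARTITION BY group): one result per row, aligned to df's index
df["mean_value"] = g.transform("mean")
# SUM(value) OVER (PARTITION BY group)
df["sum_value"] = g.transform("sum")

# ROW_NUMBER() OVER (PARTITION BY group ORDER BY value)
df = df.sort_values(["group", "value"])
df["row_number"] = df.groupby("group").cumcount() + 1
print(df)
```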