dataframe – Page 4 – Tarik Billa

Change stringsAsFactors settings for data.frame

January 7, 2024 by Tarik

Rename a single pandas DataFrame column without knowing column name

January 6, 2024 by Tarik

Should work:

drugInfo.rename(columns={list(drugInfo)[1]: 'col_1_new_name'}, inplace=True)

Example:

In [18]: df = pd.DataFrame({'a': randn(5), 'b': randn(5), 'c': randn(5)})
         df
Out[18]:
          a         b         c
0 -1.429509 -0.652116  0.515545
1  0.563148 -0.536554 -1.316155
2  1.310768 -3.041681 -0.704776
3 -1.403204  1.083727 -0.117787
4 -0.040952  0.108155 -0.092292

In [19]: df.rename(columns={list(df)[1]: 'col1_new_name'}, inplace=True)
         df
Out[19]:
          a  col1_new_name         c
0 -1.429509      -0.652116  0.515545
… Read more
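A minimal, self-contained sketch of the same idea, assuming the column to rename is identified by its zero-based position (position 1 is the second column). The variable names `position`, `old_name` and `renamed` are illustrative only, and the random data is generated with NumPy's `default_rng` rather than the `randn` used in the excerpt.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.standard_normal(5),
    "b": rng.standard_normal(5),
    "c": rng.standard_normal(5),
})

# Look up the current label by position, then rename by that label.
position = 1  # zero-based, so this is the second column
old_name = df.columns[position]

# rename() returns a new DataFrame unless inplace=True is passed.
renamed = df.rename(columns={old_name: "col1_new_name"})

print(list(renamed.columns))
```

Because `rename` maps labels rather than positions, a frame with duplicate column labels would have every column carrying that label renamed, which may or may not be what is wanted.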

How to re-order the columns based on another dataframe with the same columns but different order

January 6, 2024 by Tarik

Try this:

df2 = df2[df1.columns]

Demo:

In [1]: df1 = pd.DataFrame(np.random.randint(0, 10, (5,4)), columns=list('abcd'))

In [2]: df2 = pd.DataFrame(np.random.randint(0, 10, (5,4)), columns=list('badc'))

In [3]: df1
Out[3]:
   a  b  c  d
0  8  3  9  6
1  0  6  4  7
2  7  2  0  7
3  0  5  1  8
4  6  2  5  4
… Read more
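The reordering can be checked end to end with a short sketch like the following. It assumes both frames hold exactly the same set of column labels; the name `reordered` is illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.integers(0, 10, (5, 4)), columns=list("abcd"))
df2 = pd.DataFrame(rng.integers(0, 10, (5, 4)), columns=list("badc"))

# Selecting with df1's column index returns df2's columns in df1's order.
reordered = df2[df1.columns]

print(list(df2.columns))        # original order of df2
print(list(reordered.columns))  # order taken from df1
```

If `df2` might lack some of `df1`'s columns, the selection above raises a `KeyError`; `df2.reindex(columns=df1.columns)` is a possible alternative that fills the missing columns with NaN instead.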

PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance

January 6, 2024 by Tarik

Aware that this might be a reply that some will find highly controversial, I’m still posting my opinion here… Proposed answer: Ignore the warning. If the user thinks/observes that the code suffers from poor performance, it’s the user’s responsibility to fix it, not the module’s responsibility to propose code refactoring steps. Rationale for this harsh … Read more
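For readers who do follow the advice to ignore the warning, one possible way is to filter out only `pandas.errors.PerformanceWarning` with the standard `warnings` module, as sketched below. The column names and sizes are made up for illustration. The second half shows a common alternative that is not part of the excerpt: building the extra columns in one go and joining them with `pd.concat(axis=1)`.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"x": range(3)})

# Option 1: keep the column-by-column loop and silence only this warning
# category, scoped to the block so other warnings stay visible elsewhere.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=pd.errors.PerformanceWarning)
    for i in range(150):
        df[f"col_{i}"] = i

# Option 2 (an alternative, not the excerpt's proposal): create all new
# columns at once and concatenate them a single time.
new_cols = pd.DataFrame({f"col_{i}": [i] * 3 for i in range(150)})
wide = pd.concat([pd.DataFrame({"x": range(3)}), new_cols], axis=1)

print(df.shape, wide.shape)
```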

Create multiindex from existing dataframe

January 5, 2024 by Tarik

You could simply use groupby in this case, which will create the multi-index automatically when it sums the sales along the requested columns.

df.groupby(['user_id', 'account_num', 'dates']).sales.sum().to_frame()

You should also be able to simply do this:

df.set_index(['user_id', 'account_num', 'dates'])

Although you probably want to avoid any duplicates (e.g. two or more rows with identical user_id, account_num … Read more
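The difference between the two approaches shows up when the key columns contain duplicates. The sketch below uses made-up sample data with the column names from the excerpt; `keys`, `summed` and `indexed` are illustrative names.

```python
import pandas as pd

# Sample data: the first two rows share the same (user_id, account_num, dates).
df = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "account_num": [10, 10, 20, 21],
    "dates": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "sales": [5.0, 7.0, 3.0, 4.0],
})
keys = ["user_id", "account_num", "dates"]

# groupby + sum collapses duplicate keys, so the resulting MultiIndex is unique.
summed = df.groupby(keys)["sales"].sum().to_frame()

# set_index keeps every row, so duplicate keys remain in the index.
indexed = df.set_index(keys)

print(summed)
print(indexed.index.is_unique)
```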

How to surface plot/3d plot from dataframe

January 5, 2024 by Tarik

.plot_surface() takes 2D arrays as inputs, not 1D DataFrame columns. This has been explained quite well here, along with the below code that illustrates how one could arrive at the required format using DataFrame input. Reproduced below with minor modifications like additional comments. Alternatively, however, there is .plot_trisurf() which uses 1D inputs. I’ve added an … Read more
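As a rough sketch of the reshaping step only (not the code the excerpt refers to), long-form DataFrame columns on a regular grid can be pivoted into the 2D arrays that `.plot_surface()` expects. The column names `x`, `y`, `z` and the sample function are assumptions made for illustration, and the plotting call itself is left out so the snippet depends only on NumPy and pandas.

```python
import numpy as np
import pandas as pd

# Long-form data: one row per (x, y) point on a regular 3 x 4 grid.
xs, ys = np.meshgrid(np.arange(3), np.arange(4))
df = pd.DataFrame({"x": xs.ravel(), "y": ys.ravel()})
df["z"] = df["x"] ** 2 + df["y"]

# Pivot so rows are y values, columns are x values, cells are z.
grid = df.pivot(index="y", columns="x", values="z")

# Build matching 2D coordinate arrays for the surface.
X, Y = np.meshgrid(grid.columns.to_numpy(), grid.index.to_numpy())
Z = grid.to_numpy()

print(X.shape, Y.shape, Z.shape)
```

With Matplotlib available, `X`, `Y` and `Z` could then be passed to a 3D axes' `plot_surface(X, Y, Z)`, while the original 1D columns `df["x"]`, `df["y"]`, `df["z"]` could go straight to `plot_trisurf`.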

Pandas dataframe – running sum with reset

January 4, 2024 by Tarik

You can use cumsum() twice:

#    reset  val  desired_col
# 0      0    1            1
# 1      0    5            6
# 2      0    4           10
# 3      1    2            2
# 4      1   -1           -1
# 5      0    6            5
# 6      0    4            9
# 7      1    2            2

df['cumsum'] = df['reset'].cumsum()
# cumulative sums of groups to column des
df['des'] = df.groupby(['cumsum'])['val'].cumsum()
print … Read more
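A runnable version of the double `cumsum()` idea, using the `reset` and `val` values from the commented table above. Here the group key is kept in a local variable (`group_id`, an illustrative name) instead of a helper column, which is a small variation on the excerpt.

```python
import pandas as pd

df = pd.DataFrame({
    "reset": [0, 0, 0, 1, 1, 0, 0, 1],
    "val": [1, 5, 4, 2, -1, 6, 4, 2],
})

# First cumsum: every 1 in `reset` starts a new group id.
group_id = df["reset"].cumsum()

# Second cumsum: running total of `val` within each group.
df["des"] = df.groupby(group_id)["val"].cumsum()

print(df)
```

The `des` column reproduces the `desired_col` values listed in the table: the running total restarts on each row where `reset` is 1.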

How to read multiple json files into pandas dataframe?

January 4, 2024 by Tarik

Change the last line to:

temp = temp.append(data, ignore_index=True)

This is necessary because the append does not happen in place. The append method does not modify the data frame; it returns a new data frame with the result of the append operation, so that result has to be assigned back. Edit: Since writing this answer I … Read more
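Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the same result is usually obtained with `pd.concat`. The sketch below is one way to do that, not the answer's own code: it writes three small JSON files to a temporary directory (the file names and record fields are invented for the example), reads each one, and concatenates the frames once at the end.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)

    # Create a few hypothetical input files, each holding a list of records.
    for i in range(3):
        records = [{"id": i, "value": i * 10}]
        (folder / f"part_{i}.json").write_text(json.dumps(records))

    # Read every file into its own DataFrame, in a stable order.
    frames = [
        pd.read_json(path, orient="records")
        for path in sorted(folder.glob("*.json"))
    ]

    # Concatenate once; like append, concat returns a new DataFrame.
    combined = pd.concat(frames, ignore_index=True)

print(combined)
```

Collecting the frames in a list and calling `pd.concat` a single time also avoids copying the accumulated data on every loop iteration.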