How to sort a pandas DataFrame by two or more columns?

As of the 0.17.0 release, the sort method was deprecated in favor of sort_values, and sort was removed completely in the 0.20.0 release. The arguments (and results) remain the same:
    df.sort_values(['a', 'b'], ascending=[True, False])
In older versions (before 0.17.0), you can use the ascending argument of sort:
    df.sort(['a', 'b'], ascending=[True, False])
For example:
    In [11]: df1 = pd.DataFrame(np.random.randint(1, 5, (10, 2)), columns=['a', 'b'])
    …
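As a minimal, self-contained sketch of the sort_values call above: the sample values here are made up for illustration, and only the column names 'a' and 'b' come from the excerpt.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the excerpt.
df = pd.DataFrame({"a": [2, 1, 2, 1], "b": [1, 3, 4, 2]})

# Sort by 'a' ascending, then by 'b' descending within each value of 'a'.
# reset_index(drop=True) just renumbers the rows after sorting.
result = df.sort_values(["a", "b"], ascending=[True, False]).reset_index(drop=True)
print(result)
```

The ascending list is matched to the column list position by position, so both lists need the same length.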

How to stop pandas from converting large numbers in an Excel sheet to exponential notation?

The way scientific notation is applied is controlled via pandas' display options:
    pd.set_option('display.float_format', '{:.2f}'.format)
    df = pd.DataFrame({'Traded Value': [67867869890077.96, 78973434444543.44], 'Deals': [789797, 789878]})
    print(df)
           Traded Value   Deals
    0 67867869890077.96  789797
    1 78973434444543.44  789878
If this is simply for presentational purposes, you may convert your data to strings while formatting them on a column-by-column basis:
    df = pd.DataFrame({'Traded Value': [67867869890077.96, 78973434444543.44], …
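The column-by-column code is cut off in the excerpt, so the following is only a sketch of one way such a conversion might look, not necessarily the approach the original answer used. It reuses the data from the excerpt; the name formatted is an assumption introduced here.

```python
import pandas as pd

df = pd.DataFrame({
    "Traded Value": [67867869890077.96, 78973434444543.44],
    "Deals": [789797, 789878],
})

# Work on a copy so the numeric column stays available for calculations.
formatted = df.copy()

# Render each float as a fixed-point string with two decimals.
# After this step the column holds strings, not numbers.
formatted["Traded Value"] = formatted["Traded Value"].map("{:.2f}".format)
print(formatted)
```

Unlike pd.set_option, this changes only the one column and leaves the global display settings untouched.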

Python pandas: how to calculate a derivative/gradient

pd.Series.diff() only takes the differences. It doesn't divide by the delta of the index as well. This gets you the answer:
    recv.diff() / recv.index.to_series().diff().dt.total_seconds()
    2017-01-20 20:00:00            NaN
    2017-01-20 20:05:00    4521.493333
    2017-01-20 20:10:00    4533.760000
    2017-01-20 20:15:00    4557.493333
    2017-01-20 20:20:00    4536.053333
    2017-01-20 20:25:00    4567.813333
    2017-01-20 20:30:00    4406.160000
    2017-01-20 20:35:00    4366.720000
    2017-01-20 20:40:00    4407.520000
    2017-01-20 20:45:00    4421.173333
    Freq: …
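To make the division by the index delta concrete, here is a small sketch on made-up data. The series values and the name rate are assumptions for illustration; recv and the expression itself follow the excerpt.

```python
import pandas as pd

# Hypothetical cumulative counter sampled every 5 minutes (300 seconds).
idx = pd.date_range("2017-01-20 20:00:00", periods=4, freq="5min")
recv = pd.Series([0.0, 300.0, 900.0, 1800.0], index=idx)

# Elapsed seconds between consecutive timestamps (NaN for the first row).
dt_seconds = recv.index.to_series().diff().dt.total_seconds()

# Finite-difference rate of change per second.
rate = recv.diff() / dt_seconds
print(rate)
```

The first entry is NaN because there is no earlier sample to difference against. The remaining entries are 1.0, 2.0 and 3.0 units per second.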

Group by two columns and count the occurrences of each combination in Pandas

Maybe this is what you want?
    >>> data = pd.DataFrame({'user_id': ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3'], 'product_id': ['p1', 'p1', 'p2', 'p1', 'p1', 'p1', 'p2', 'p2', 'p3']})
    >>> count_series = data.groupby(['user_id', 'product_id']).size()
    >>> count_series
    user_id  product_id
    a1       p1            2
             p2            1
    a2       p1            3
    a3       p2            2
             p3            1
    dtype: int64
    >>> new_df = count_series.to_frame(name="size").reset_index()
    >>> new_df
      user_id product_id  size
    0      a1         p1     2
    …
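A slightly more compact variant of the same idea, sketched with the data from the excerpt: Series.reset_index accepts a name argument, so the to_frame step can be folded in. The column name count is a choice made here, not something from the original answer.

```python
import pandas as pd

data = pd.DataFrame({
    "user_id": ["a1", "a1", "a1", "a2", "a2", "a2", "a3", "a3", "a3"],
    "product_id": ["p1", "p1", "p2", "p1", "p1", "p1", "p2", "p2", "p3"],
})

# size() counts rows per (user_id, product_id) group; reset_index(name=...)
# turns the resulting Series into a flat DataFrame in one step.
counts = data.groupby(["user_id", "product_id"]).size().reset_index(name="count")
print(counts)
```

Calling the column count rather than size avoids a clash with the DataFrame.size attribute when using attribute-style access.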

How to get rid of the multilevel index after using a pandas pivot table?

You only need to remove the index name; use rename_axis (new in pandas 0.18.0):
    print (reshaped_df)
    sale_product_id  1    8    52   312  315
    sale_user_id
    1                  1    1    1    5    1
    print (reshaped_df.index.name)
    sale_user_id
    print (reshaped_df.rename_axis(None))
    sale_product_id  1    8    52   312  315
    1                  1    1    1    5    1
Another solution working in pandas below 0.18.0:
    reshaped_df.index.name = None
    print …
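The following sketch builds a small pivot table and strips the axis names with rename_axis. The input frame sales and its quantity column are hypothetical; only sale_user_id, sale_product_id and reshaped_df follow the excerpt. Removing the column-axis name as well goes one step beyond the excerpt and is optional.

```python
import pandas as pd

# Hypothetical sales records used to produce a pivot table.
sales = pd.DataFrame({
    "sale_user_id": [1, 1, 2],
    "sale_product_id": [8, 52, 8],
    "quantity": [1, 5, 2],
})

reshaped_df = sales.pivot_table(
    index="sale_user_id",
    columns="sale_product_id",
    values="quantity",
    aggfunc="sum",
    fill_value=0,
)

# Drop the index name, then the column-axis name; the data is unchanged.
flat = reshaped_df.rename_axis(None).rename_axis(None, axis="columns")
print(flat)
```

rename_axis returns a new object, so reshaped_df keeps its original axis names.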

How do I change a single index value in a pandas DataFrame?

@EdChum's solution looks good. Here's one using rename, which would replace all of these values in the index:
    energy.rename(index={'Republic of Korea': 'South Korea'}, inplace=True)
Here's an example:
    >>> example = pd.DataFrame({'key1': ['a', 'a', 'a', 'b', 'a', 'b'], 'data1': [1, 2, 2, 3, np.nan, 4], 'data2': list('abcdef')})
    >>> example.set_index('key1', inplace=True)
    >>> example
          data1 data2
    key1
    a       1.0     a
    a       2.0     b
    a       2.0     c
    b       3.0     d
    …
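To show the difference between replacing every matching label and replacing a single position, here is a sketch on a made-up energy frame (the supply column and its values are assumptions). The positional variant is one possible approach and is not necessarily the solution the excerpt attributes to @EdChum.

```python
import pandas as pd

# Hypothetical frame with a duplicated index label.
energy = pd.DataFrame(
    {"supply": [1.0, 2.0, 3.0]},
    index=["Republic of Korea", "Japan", "Republic of Korea"],
)

# rename maps labels, so every occurrence of the old label is replaced.
renamed = energy.rename(index={"Republic of Korea": "South Korea"})

# To change only one position, rebuild the index from an edited list.
labels = energy.index.tolist()
labels[0] = "South Korea"
single = energy.copy()
single.index = labels

print(renamed.index.tolist())
print(single.index.tolist())
```

Without inplace=True, rename returns a new frame and leaves energy unchanged.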
