data-analysis – Tarik Billa

How to sort a pandas DataFrame by two or more columns?

February 2, 2024 by Tarik

As of the 0.17.0 release, the sort method was deprecated in favor of sort_values, and sort was removed completely in the 0.20.0 release. The arguments (and results) remain the same:

    df.sort_values(['a', 'b'], ascending=[True, False])

In older versions that still have sort, you can use its ascending argument:

    df.sort(['a', 'b'], ascending=[True, False])

For example:

    In [11]: df1 = pd.DataFrame(np.random.randint(1, 5, (10,2)), columns=['a','b'])

… Read more
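To make the mixed sort order concrete, here is a minimal sketch with sort_values. The sample values are made up for illustration; only the column names 'a' and 'b' come from the excerpt.

```python
import pandas as pd

# Hypothetical sample data; the column names 'a' and 'b' follow the excerpt.
df = pd.DataFrame({"a": [2, 1, 2, 1, 3], "b": [5, 7, 9, 3, 1]})

# Sort by 'a' ascending, then by 'b' descending within each value of 'a'.
sorted_df = df.sort_values(["a", "b"], ascending=[True, False])
print(sorted_df)
```

Each entry in the ascending list pairs with the column at the same position, so the two lists should have the same length.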

How to get rid of pandas converting large numbers in Excel sheet to exponential?

December 27, 2023 by Tarik

The way scientific notation is applied is controlled via pandas' display options:

    pd.set_option('display.float_format', '{:.2f}'.format)
    df = pd.DataFrame({'Traded Value':[67867869890077.96,78973434444543.44],
                       'Deals':[789797, 789878]})
    print(df)

           Traded Value   Deals
    0 67867869890077.96  789797
    1 78973434444543.44  789878

If this is simply for presentational purposes, you may convert your data to strings while formatting them on a column-by-column basis:

    df = pd.DataFrame({'Traded Value':[67867869890077.96,78973434444543.44], … Read more
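Since the excerpt is cut off, the following is only a sketch of one way the column-by-column string conversion might look, not necessarily the original author's code. It reuses the data from the excerpt; the variable name formatted is an assumption.

```python
import pandas as pd

df = pd.DataFrame({"Traded Value": [67867869890077.96, 78973434444543.44],
                   "Deals": [789797, 789878]})

# Work on a copy so the numeric column stays available for calculations.
formatted = df.copy()

# Convert one column to fixed-point strings with two decimals.
formatted["Traded Value"] = formatted["Traded Value"].map("{:.2f}".format)
print(formatted)
```

The converted column holds strings, so it is suitable for display or export but no longer for arithmetic.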

Plot pandas dataframe containing NaNs

December 26, 2023 by Tarik

The reason you're not seeing anything is that the default plot style is only a line. The line gets interrupted at NaNs, so only runs of multiple consecutive values are plotted, and that doesn't happen in your case. You need to change the style of plotting, which depends on what you want to see. … Read more
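As one possible illustration (the original post is truncated, so this is an assumption about what "changing the style" could mean), plotting with markers makes isolated values visible. The series below is made up, and the Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no display is needed

import numpy as np
import pandas as pd

# Hypothetical data: every valid value is surrounded by NaNs,
# so a plain line plot would have no segment to draw.
s = pd.Series([1.0, np.nan, 3.0, np.nan, 2.0])

# Draw each valid value as a circle marker instead of a connecting line.
ax = s.plot(style="o")
```

If a connected line is what you want instead, dropping the NaNs first with s.dropna().plot() joins the remaining points.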

python pandas: how to calculate derivative/gradient

December 24, 2023 by Tarik

pd.Series.diff() only takes the differences. It doesn't divide by the delta of the index as well. This gets you the answer:

    recv.diff() / recv.index.to_series().diff().dt.total_seconds()

    2017-01-20 20:00:00            NaN
    2017-01-20 20:05:00    4521.493333
    2017-01-20 20:10:00    4533.760000
    2017-01-20 20:15:00    4557.493333
    2017-01-20 20:20:00    4536.053333
    2017-01-20 20:25:00    4567.813333
    2017-01-20 20:30:00    4406.160000
    2017-01-20 20:35:00    4366.720000
    2017-01-20 20:40:00    4407.520000
    2017-01-20 20:45:00    4421.173333
    Freq: … Read more
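A small self-contained sketch of the same expression, using made-up values on a 5-minute index so the arithmetic is easy to check by hand. The name recv follows the excerpt; the numbers are assumptions.

```python
import pandas as pd

# Hypothetical cumulative counter sampled every 5 minutes (300 seconds).
idx = pd.date_range("2017-01-20 20:00:00", periods=4, freq="5min")
recv = pd.Series([0.0, 300.0, 900.0, 1800.0], index=idx)

# Difference of the values divided by the difference of the timestamps in seconds.
rate = recv.diff() / recv.index.to_series().diff().dt.total_seconds()
print(rate)
```

The first entry is NaN because there is no earlier sample to difference against; the remaining entries are per-second rates.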

R and SPSS difference

September 4, 2023 by Tarik

Group by two columns and count the occurrences of each combination in Pandas

July 25, 2023 by Tarik

Maybe this is what you want?

    >>> data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'],
    ...                      'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})
    >>> count_series = data.groupby(['user_id', 'product_id']).size()
    >>> count_series
    user_id  product_id
    a1       p1            2
             p2            1
    a2       p1            3
    a3       p2            2
             p3            1
    dtype: int64
    >>> new_df = count_series.to_frame(name="size").reset_index()
    >>> new_df
      user_id product_id  size
    0      a1         p1     2
    … Read more
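As a possible shortcut (not part of the original answer, and assuming pandas 1.1 or newer), passing as_index=False to groupby returns the counts as a flat DataFrame directly. The data is the same as in the excerpt.

```python
import pandas as pd

data = pd.DataFrame({
    "user_id": ["a1", "a1", "a1", "a2", "a2", "a2", "a3", "a3", "a3"],
    "product_id": ["p1", "p1", "p2", "p1", "p1", "p1", "p2", "p2", "p3"],
})

# With as_index=False, size() yields a DataFrame with a 'size' column
# instead of a Series with a MultiIndex (assumes pandas >= 1.1).
counts = data.groupby(["user_id", "product_id"], as_index=False).size()
print(counts)
```

On older pandas versions, the to_frame(name="size").reset_index() route from the excerpt is the safer choice.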

How to get rid of multilevel index after using pivot table pandas?

June 8, 2023 by Tarik

You only need to remove the index name; use rename_axis (new in pandas 0.18.0):

    print (reshaped_df)
    sale_product_id  1    8    52   312  315
    sale_user_id
    1                1    1    1    5    1

    print (reshaped_df.index.name)
    sale_user_id

    print (reshaped_df.rename_axis(None))
    sale_product_id  1    8    52   312  315
    1                1    1    1    5    1

Another solution, which works in pandas below 0.18.0:

    reshaped_df.index.name = None
    print … Read more
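For a runnable version, here is a sketch that builds a small pivot table and then clears the axis names. The sales frame and its qty column are hypothetical; only sale_user_id and sale_product_id come from the excerpt.

```python
import pandas as pd

# Hypothetical input; the 'qty' column is an assumption for illustration.
sales = pd.DataFrame({
    "sale_user_id": [1, 1, 2],
    "sale_product_id": [8, 52, 8],
    "qty": [1, 5, 2],
})

reshaped_df = sales.pivot_table(index="sale_user_id", columns="sale_product_id",
                                values="qty", aggfunc="sum", fill_value=0)

# Remove only the index name; the columns keep their name.
no_index_name = reshaped_df.rename_axis(None)

# Optionally clear the columns name as well.
no_names = no_index_name.rename_axis(None, axis="columns")
print(no_names)
```

rename_axis returns a new object here, so reshaped_df itself keeps both names.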

How do I change a single index value in pandas dataframe?

January 24, 2023 by Tarik

@EdChum's solution looks good. Here's one using rename, which would replace all these values in the index.

    energy.rename(index={'Republic of Korea':'South Korea'},inplace=True)

Here's an example:

    >>> example = pd.DataFrame({'key1' : ['a','a','a','b','a','b'],
    ...                         'data1' : [1,2,2,3,nan,4],
    ...                         'data2' : list('abcdef')})
    >>> example.set_index('key1',inplace=True)
    >>> example
          data1 data2
    key1
    a       1.0     a
    a       2.0     b
    a       2.0     c
    b       3.0     d
    … Read more
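A short sketch of the rename call on a made-up frame, showing that every matching label is replaced, not just the first. The supply column and the index values other than the two Korea labels are assumptions.

```python
import pandas as pd

# Hypothetical frame with a repeated index label.
energy = pd.DataFrame(
    {"supply": [1.0, 2.0, 3.0]},
    index=["Republic of Korea", "Japan", "Republic of Korea"],
)

# Without inplace=True, rename returns a new DataFrame and leaves 'energy' unchanged.
renamed = energy.rename(index={"Republic of Korea": "South Korea"})
print(renamed)
```

Labels that are not keys in the mapping, such as "Japan" here, are left as they are.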

How do I sum values in a column that match a given condition using pandas?

January 22, 2023 by Tarik

The essential idea here is to select the data you want to sum, and then sum them. This selection of data can be done in several different ways, a few of which are shown below.

Boolean indexing

Arguably the most common way to select the values is to use Boolean indexing. With this method, you … Read more
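A minimal sketch of the select-then-sum idea with Boolean indexing. The frame and its column names 'a' and 'b' are made up for illustration.

```python
import pandas as pd

# Hypothetical data: sum column 'b' for the rows where column 'a' equals 1.
df = pd.DataFrame({"a": [1, 2, 1, 3], "b": [10, 20, 30, 40]})

# The Boolean mask picks the rows; the column label picks the values to add up.
mask = df["a"] == 1
total = df.loc[mask, "b"].sum()
print(total)
```

Conditions can be combined with & and |, with each condition wrapped in parentheses.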