pandas – Page 8 – Tarik Billa

Create a day-of-week column

February 14, 2024 by Tarik

Pandas 0.23+

Use pandas.Series.dt.day_name(), since pandas.Timestamp.weekday_name has been deprecated (it was later removed in pandas 1.0):

    import pandas as pd

    df = pd.DataFrame({'my_dates': ['2015-01-01', '2015-01-02', '2015-01-03'], 'myvals': [1, 2, 3]})
    df['my_dates'] = pd.to_datetime(df['my_dates'])
    df['day_of_week'] = df['my_dates'].dt.day_name()

Output:

        my_dates  myvals day_of_week
    0 2015-01-01       1    Thursday
    1 2015-01-02       2      Friday
    2 2015-01-03       3    Saturday

Pandas 0.18.1+

As user jezrael points out below, dt.weekday_name was added in version 0.18.1 (see the pandas docs):

    import …
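If you also want the weekday as a number, Series.dt.dayofweek returns Monday=0 through Sunday=6. A minimal sketch reusing the example dates above; the day_num column name is a hypothetical choice for illustration:

```python
import pandas as pd

df = pd.DataFrame({"my_dates": ["2015-01-01", "2015-01-02", "2015-01-03"], "myvals": [1, 2, 3]})
df["my_dates"] = pd.to_datetime(df["my_dates"])

# Weekday name as a string, e.g. "Thursday"
df["day_of_week"] = df["my_dates"].dt.day_name()
# Weekday as an integer, Monday=0 ... Sunday=6 ("day_num" is a hypothetical column name)
df["day_num"] = df["my_dates"].dt.dayofweek

print(df)
```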

Filtering multiple items in a multi-index Pandas dataframe

February 14, 2024 by Tarik

You can use get_level_values in conjunction with Boolean indexing:

    In [50]: print(df[np.isin(df.index.get_level_values(1), ['Lake', 'River', 'Upland'])])
                              Area
    NSRCODE PBL_AWI
    CM      Lake      57124.819333
            River      1603.906642
    LBH     Lake     258046.508310
            River     44262.807900

The same idea can be expressed in many different ways, such as:

    df[df.index.get_level_values('PBL_AWI').isin(['Lake', 'River', 'Upland'])]

Note that your data has 'upland' rather than 'Upland', and the match is case-sensitive, so those rows are not selected.
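A self-contained sketch of the level-based filter, on a small hypothetical frame shaped like the one in the question (the Area values are made up). It also shows one way to make the match case-insensitive, so that 'upland' is picked up as well:

```python
import pandas as pd

# Hypothetical two-level index mimicking NSRCODE / PBL_AWI; values are illustrative only
idx = pd.MultiIndex.from_tuples(
    [("CM", "Lake"), ("CM", "River"), ("CM", "upland"), ("LBH", "Lake"), ("LBH", "River")],
    names=["NSRCODE", "PBL_AWI"],
)
df = pd.DataFrame({"Area": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)

wanted = ["Lake", "River", "Upland"]
level = df.index.get_level_values("PBL_AWI")

# Exact (case-sensitive) match: the 'upland' row is excluded
subset = df[level.isin(wanted)]

# Case-insensitive match: lower-case both sides before comparing
subset_ci = df[level.str.lower().isin([w.lower() for w in wanted])]
```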

Why use loc in Pandas?

February 13, 2024 by Tarik

Explicit is better than implicit.

df[boolean_mask] selects rows where boolean_mask is True, but there is a corner case where you might not want it to: when df has boolean-valued column labels:

    In [229]: df = pd.DataFrame({True: [1, 2, 3], False: [3, 4, 5]}); df
    Out[229]:
       False  True
    0      3     1
    1      4     2
    2      5     3

You might want to use df[[True]] …
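With .loc, the row selector and the column selector are separate arguments, so there is no guessing which axis a key applies to. A minimal sketch on a hypothetical frame (ordinary string labels rather than the boolean labels above):

```python
import pandas as pd

# Hypothetical frame with ordinary string column labels
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
mask = df["a"] > 1

rows = df.loc[mask]             # rows where mask is True, all columns
b_values = df.loc[mask, "b"]    # rows chosen by mask, column chosen by label
only_b = df.loc[:, ["b"]]       # all rows, columns chosen by a list of labels
```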

How to remove timezone from a Timestamp column in pandas

February 13, 2024 by Tarik

The column must have a datetime dtype, for example after converting it with pd.to_datetime. You can then use tz_localize to change the time zone; a naive timestamp corresponds to time zone None:

    testdata['time'].dt.tz_localize(None)

Unless the column is an index (DatetimeIndex), the .dt accessor must be used to access pandas datetime functions.
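A small sketch with hypothetical timestamps carrying a +01:00 offset. tz_localize(None) drops the time zone but keeps the local wall-clock time, whereas tz_convert(None) first converts to UTC and then drops the time zone, so pick whichever matches what "naive" should mean for your data:

```python
import pandas as pd

# Hypothetical data: timestamps with a fixed +01:00 offset
testdata = pd.DataFrame({"time": ["2024-02-13 10:00+01:00", "2024-02-13 12:30+01:00"]})
testdata["time"] = pd.to_datetime(testdata["time"])  # tz-aware datetime dtype

# Remove the time zone, keeping local wall time (10:00 stays 10:00)
naive_local = testdata["time"].dt.tz_localize(None)

# Convert to UTC, then remove the time zone (10:00+01:00 becomes 09:00)
naive_utc = testdata["time"].dt.tz_convert(None)
```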

How to select rows in a DataFrame between two values

February 12, 2024 by Tarik

Consider Series.between:

    df = df[df['closing_price'].between(99, 101)]
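A runnable sketch with hypothetical prices. By default between() includes both endpoints, so it is equivalent to combining >= and <= with &:

```python
import pandas as pd

# Hypothetical prices for illustration
df = pd.DataFrame({"closing_price": [98.5, 99.0, 100.2, 101.0, 101.5]})

# between() keeps values with 99 <= price <= 101 (both ends inclusive by default)
in_range = df[df["closing_price"].between(99, 101)]

# The equivalent explicit comparison
explicit = df[(df["closing_price"] >= 99) & (df["closing_price"] <= 101)]
```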

How to sort a pandas dataFrame by two or more columns?

February 2, 2024 by Tarik

As of the 0.17.0 release, the sort method was deprecated in favor of sort_values, and sort was removed entirely in the 0.20.0 release. The arguments (and results) are the same:

    df.sort_values(['a', 'b'], ascending=[True, False])

In versions before 0.20.0, you could use the ascending argument of sort:

    df.sort(['a', 'b'], ascending=[True, False])

For example:

    In [11]: df1 = pd.DataFrame(np.random.randint(1, 5, (10, 2)), columns=['a', 'b'])

…
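A deterministic sketch of the same call (a small hand-written frame instead of random integers, so the result is predictable). Rows are ordered by 'a' ascending, and ties in 'a' are broken by 'b' descending:

```python
import pandas as pd

# Small hypothetical frame instead of random integers
df1 = pd.DataFrame({"a": [2, 1, 2, 1], "b": [1, 3, 4, 2]})

# Sort by 'a' ascending, then by 'b' descending within equal 'a' values
result = df1.sort_values(["a", "b"], ascending=[True, False])
print(result)
```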

Whether to use apply vs transform on a group object, to subtract two columns and get mean

January 18, 2024 by Tarik

Two major differences between apply and transform

There are two major differences between the transform and apply groupby methods:

Input: apply implicitly passes all the columns for each group as a DataFrame to the custom function, while transform passes each column for each group individually as a Series to the custom function.

Output: The custom …
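A minimal sketch of the question's task (subtract two columns, then take a per-group mean) on hypothetical data. Because apply sees the whole group DataFrame, the function can combine columns and return one value per group. transform works column by column and returns a result aligned with the original rows, so here the difference is computed first and transform only broadcasts the group mean back:

```python
import pandas as pd

# Hypothetical data: group key 'g' and two numeric columns
df = pd.DataFrame({"g": ["x", "x", "y", "y"], "a": [1, 3, 5, 7], "b": [0, 1, 1, 3]})

# apply: each group arrives as a DataFrame, so both columns are available.
# One scalar per group -> a Series indexed by the group key.
per_group = df.groupby("g")[["a", "b"]].apply(lambda grp: (grp["a"] - grp["b"]).mean())

# transform: operates on a single column (Series) and keeps the original length,
# so compute the difference column first, then broadcast its group mean.
df["diff"] = df["a"] - df["b"]
df["diff_mean"] = df.groupby("g")["diff"].transform("mean")
```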

Add a string prefix to each value in a pandas string column

January 18, 2024 by Tarik

    df['col'] = 'str' + df['col'].astype(str)

Example:

    >>> df = pd.DataFrame({'col': ['a', 0]})
    >>> df
      col
    0   a
    1   0
    >>> df['col'] = 'str' + df['col'].astype(str)
    >>> df
        col
    0  stra
    1  str0
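One thing to watch for is missing values: depending on the pandas version, astype(str) may turn NaN into the string 'nan', giving 'strnan'. A sketch that masks missing entries back out with where(), assuming you want them to stay missing:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": ["a", 0, np.nan]})

# Plain prefixing; in some pandas versions the NaN row becomes "strnan"
prefixed = "str" + df["col"].astype(str)

# Keep originally-missing entries missing
prefixed_keep_na = prefixed.where(df["col"].notna())
```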

How to sort pandas dataframe by one column

January 10, 2024 by Tarik

Use sort_values to sort the df by a specific column's values:

    In [18]: df.sort_values('2')
    Out[18]:
            0          1    2
    4    85.6    January  1.0
    3    95.5   February  2.0
    7   104.8      March  3.0
    0   354.7      April  4.0
    8   283.5        May  5.0
    6   238.7       June  6.0
    5   152.0       July  7.0
    1    55.4     August  8.0
    11  212.7  September  9.0
    10 …
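In the example above the column label is the string '2', which is why it is quoted; pass whatever label your column actually has. A small sketch on hypothetical data, including descending order via ascending=False:

```python
import pandas as pd

# Hypothetical data; column names are illustrative
df = pd.DataFrame({"month": ["March", "January", "February"], "n": [3.0, 1.0, 2.0]})

by_n = df.sort_values("n")                        # ascending (the default)
by_n_desc = df.sort_values("n", ascending=False)  # descending
```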