How to use rolling functions for GroupBy objects

For the Googlers who come upon this old question: regarding @kekert’s comment on @Garrett’s answer, which suggests using the new df.groupby('id')['x'].rolling(2).mean() rather than the now-deprecated df.groupby('id')['x'].apply(pd.rolling_mean, 2, min_periods=1): curiously, the new .rolling().mean() approach returns a multi-indexed series, indexed by the group-by column first and then by the original index, whereas the old approach would simply … Read more
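The multi-index behaviour described above can be reproduced with a small sketch. The data and the x_roll column name are made up for illustration, and dropping the group level with reset_index is one common way to get back to the original index, not necessarily what the answer goes on to recommend.

```python
import pandas as pd

# Hypothetical sample data: two groups with two rows each.
df = pd.DataFrame({"id": ["a", "a", "b", "b"], "x": [1.0, 3.0, 10.0, 20.0]})

# The rolling mean per group comes back indexed by (id, original index).
rolled = df.groupby("id")["x"].rolling(2, min_periods=1).mean()

# Drop the group level so the result aligns with df's own index again.
flat = rolled.reset_index(level=0, drop=True)
df["x_roll"] = flat
```

Because assignment aligns on the index, the flattened series can be attached as a new column even if the groups are not contiguous in the original frame.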

Create a day-of-week column

Pandas 0.23+

Use pandas.Series.dt.day_name(), since pandas.Timestamp.weekday_name has been deprecated:

    import pandas as pd
    df = pd.DataFrame({'my_dates': ['2015-01-01', '2015-01-02', '2015-01-03'], 'myvals': [1, 2, 3]})
    df['my_dates'] = pd.to_datetime(df['my_dates'])
    df['day_of_week'] = df['my_dates'].dt.day_name()

Output:

        my_dates  myvals day_of_week
    0 2015-01-01       1    Thursday
    1 2015-01-02       2      Friday
    2 2015-01-03       3    Saturday

Pandas 0.18.1+

As user jezrael points out below, dt.weekday_name was added in version 0.18.1 (Pandas Docs):

    import … Read more

Filtering multiple items in a multi-index Pandas dataframe

You can use get_level_values in conjunction with Boolean slicing.

    In [50]: print(df[np.in1d(df.index.get_level_values(1), ['Lake', 'River', 'Upland'])])
                              Area
    NSRCODE PBL_AWI
    CM      Lake      57124.819333
            River      1603.906642
    LBH     Lake     258046.508310
            River     44262.807900

The same idea can be expressed in many different ways, such as

    df[df.index.get_level_values('PBL_AWI').isin(['Lake', 'River', 'Upland'])]

Note that you have 'upland' in your data instead of 'Upland'.
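A self-contained sketch of the same filter follows. The frame is rebuilt from rounded values, and the 'upland' row with its area is invented to show the case mismatch mentioned above. The case-insensitive variant is an addition, not part of the original answer.

```python
import pandas as pd

# Hypothetical reconstruction of the data; the "upland" row is made up.
idx = pd.MultiIndex.from_tuples(
    [("CM", "Lake"), ("CM", "River"), ("CM", "upland"),
     ("LBH", "Lake"), ("LBH", "River")],
    names=["NSRCODE", "PBL_AWI"],
)
df = pd.DataFrame({"Area": [57124.8, 1603.9, 100.0, 258046.5, 44262.8]}, index=idx)

wanted = ["Lake", "River", "Upland"]

# Case-sensitive filter on the second index level: "upland" does not match.
mask = df.index.get_level_values("PBL_AWI").isin(wanted)
filtered = df[mask]

# Case-insensitive variant: lower-case both sides before comparing.
level = df.index.get_level_values("PBL_AWI").str.lower()
filtered_ci = df[level.isin([w.lower() for w in wanted])]
```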

Why use loc in Pandas?

Explicit is better than implicit. df[boolean_mask] selects rows where boolean_mask is True, but there is a corner case where you might not want it to: when df has boolean-valued column labels:

    In [229]: df = pd.DataFrame({True:[1,2,3],False:[3,4,5]}); df
    Out[229]:
       False  True
    0      3     1
    1      4     2
    2      5     3

You might want to use df[[True]] … Read more
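As a minimal sketch of what "explicit" means here, .loc names the row selection and the column selection separately, so the intent does not depend on how df[...] interprets its argument. The frame and its column names a and b are hypothetical and deliberately avoid boolean labels.

```python
import pandas as pd

# Hypothetical frame with ordinary string column labels.
df = pd.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5]})

mask = df["a"] > 1

rows = df.loc[mask]        # rows only: the mask is unambiguously a row selector
col = df.loc[:, "a"]       # columns only: all rows, one column label
both = df.loc[mask, "b"]   # rows by mask and a column by label in one call
```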

How to remove timezone from a Timestamp column in pandas

The column must be a datetime dtype, for example after using pd.to_datetime. Then you can use tz_localize to change the time zone; a naive timestamp corresponds to time zone None:

    testdata['time'].dt.tz_localize(None)

Unless the column is an index (DatetimeIndex), the .dt accessor must be used to access pandas datetime functions.
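A short sketch of both cases, column and index, assuming a hypothetical testdata frame whose time column is timezone-aware in UTC:

```python
import pandas as pd

# Hypothetical data: parse strings as timezone-aware (UTC) timestamps.
testdata = pd.DataFrame(
    {"time": pd.to_datetime(["2015-01-01 12:00", "2015-06-01 08:30"], utc=True)}
)

# Column (Series): go through the .dt accessor.
naive = testdata["time"].dt.tz_localize(None)

# Index (DatetimeIndex): call tz_localize directly, no .dt needed.
idx = pd.DatetimeIndex(testdata["time"])
naive_idx = idx.tz_localize(None)
```

tz_localize(None) keeps the wall-clock time as displayed and only drops the zone information. The result is not assigned back automatically, so write it to the column if the change should persist.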

How to sort a pandas DataFrame by two or more columns?

As of the 0.17.0 release, the sort method was deprecated in favor of sort_values. sort was completely removed in the 0.20.0 release. The arguments (and results) remain the same:

    df.sort_values(['a', 'b'], ascending=[True, False])

In versions before 0.20.0, you can use the ascending argument of sort:

    df.sort(['a', 'b'], ascending=[True, False])

For example:

    In [11]: df1 = pd.DataFrame(np.random.randint(1, 5, (10,2)), columns=['a','b']) … Read more
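A small deterministic sketch of sort_values with mixed sort directions. The data is made up, replacing the random frame in the excerpt so the result is reproducible.

```python
import pandas as pd

# Hypothetical data with ties in column "a".
df = pd.DataFrame({"a": [2, 1, 2, 1], "b": [1, 3, 4, 2]})

# Sort by "a" ascending, then break ties by "b" descending.
out = df.sort_values(["a", "b"], ascending=[True, False])
```

sort_values returns a new frame and keeps the original index labels, so call reset_index(drop=True) afterwards if a fresh 0..n-1 index is wanted.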

Whether to use apply vs transform on a group object, to subtract two columns and get mean

Two major differences between apply and transform

There are two major differences between the transform and apply groupby methods.

Input: apply implicitly passes all the columns for each group as a DataFrame to the custom function, while transform passes each column for each group individually as a Series to the custom function.

Output: The custom … Read more
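For the task in the title, the input difference can be sketched as follows. The frame and the column names g, a and b are hypothetical. Because apply sees the whole group, it can combine two columns directly. With transform, one workable route is to compute the difference first and then broadcast the group mean.

```python
import pandas as pd

# Hypothetical data: group key "g" and two numeric columns.
df = pd.DataFrame({
    "g": ["x", "x", "y"],
    "a": [1.0, 2.0, 3.0],
    "b": [4.0, 6.0, 8.0],
})

# apply: each group arrives as a DataFrame, so both columns are available.
per_group = df.groupby("g")[["a", "b"]].apply(lambda grp: (grp["a"] - grp["b"]).mean())

# transform: subtract first, then broadcast each group's mean back to its rows.
diff = df["a"] - df["b"]
broadcast = diff.groupby(df["g"]).transform("mean")
```

Here per_group has one value per group, while broadcast has one value per original row and can be assigned straight back as a column.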
