Summing the number of occurrences per day in pandas

If your timestamp index is a DatetimeIndex:

    import io
    import pandas as pd

    content = '''\
    timestamp  score
    2013-06-29 00:52:28+00:00  -0.420070
    2013-06-29 00:51:53+00:00  -0.445720
    2013-06-28 16:40:43+00:00  0.508161
    2013-06-28 15:10:30+00:00  0.921474
    2013-06-28 15:10:17+00:00  0.876710
    '''
    # io.StringIO is needed on Python 3, where content is a str rather than bytes
    df = pd.read_table(io.StringIO(content), sep=r'\s{2,}', engine='python', parse_dates=[0], index_col=[0])
    print(df)

so df looks like this:

                            score
    timestamp
    2013-06-29 00:52:28 -0.420070
    2013-06-29 00:51:53 -0.445720
    2013-06-28 16:40:43  0.508161
    … Read more
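A minimal sketch of counting rows per calendar day, using the same five timestamps as above. The frame is rebuilt directly rather than parsed, and both the groupby-on-date and the resample approach are illustrative choices, not necessarily the ones the full answer goes on to use.

```python
import pandas as pd

# Same timestamps and scores as in the excerpt, built directly for brevity
timestamps = pd.to_datetime(
    [
        "2013-06-29 00:52:28",
        "2013-06-29 00:51:53",
        "2013-06-28 16:40:43",
        "2013-06-28 15:10:30",
        "2013-06-28 15:10:17",
    ],
    utc=True,
)
df = pd.DataFrame(
    {"score": [-0.420070, -0.445720, 0.508161, 0.921474, 0.876710]},
    index=timestamps,
)

# Option 1: group on the calendar date of each timestamp and count rows
per_day = df.groupby(df.index.date).size()

# Option 2: resample into daily bins (days with no rows would appear as 0)
per_day_resampled = df.resample("D").size()

print(per_day)
print(per_day_resampled)
```

The groupby version only lists days that actually occur in the data, while resample fills in every day between the first and last timestamp.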

Conditionally format Python pandas cell

From the style docs:

You can apply conditional formatting, the visual styling of a DataFrame depending on the data within, by using the DataFrame.style property.

    import pandas as pd

    df = pd.DataFrame([[2, 3, 1], [3, 2, 2], [2, 4, 4]], columns=list("ABC"))
    df.style.apply(lambda x: ["background: red" if v > x.iloc[0] else "" for v in x], axis=1)

Edit: to format … Read more
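To see what the lambda above produces, here is a small sketch that pulls it out into a named function and calls it on each row directly. The function name highlight_above_first is hypothetical; the data is the same 3x3 frame as in the excerpt.

```python
import pandas as pd

df = pd.DataFrame([[2, 3, 1], [3, 2, 2], [2, 4, 4]], columns=list("ABC"))


def highlight_above_first(row):
    """Return one CSS string per cell: red where the value exceeds the row's first value."""
    first = row.iloc[0]
    return ["background: red" if v > first else "" for v in row]


# Inspect the CSS that would be applied to each row
for label, row in df.iterrows():
    print(label, highlight_above_first(row))
```

Passing the same function as df.style.apply(highlight_above_first, axis=1) should colour the matching cells when the Styler is rendered, which requires the optional jinja2 dependency.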

What are levels in a pandas DataFrame?

I stumbled across this question while analyzing the answer to my own question, but I didn’t find John’s answer satisfying enough. After a few experiments, though, I think I understood levels and decided to share.

Short answer: Levels are parts of the index or columns.

Long answer: I think this multi-column DataFrame.groupby example … Read more
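As a rough illustration of the short answer, the sketch below builds a two-level row index and groups by one of its levels. The level names "letter" and "number" and the data are made up for this example and do not come from the original answer.

```python
import pandas as pd

# A MultiIndex with two levels; each level is one "part" of the row label
index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)],
    names=["letter", "number"],
)
df = pd.DataFrame({"value": [10, 20, 30]}, index=index)

print(df.index.nlevels)      # how many levels the index has
print(list(df.index.names))  # the name of each level

# Levels can be referenced by name (or position) in operations such as groupby
per_letter = df.groupby(level="letter")["value"].sum()
print(per_letter)
```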

Boxplot with pandas groupby multiindex, for specified sublevels from multiindex

This code:

    data['2013-08-17'].boxplot(by='SPECIES')

will not work, because boxplot is a DataFrame method, not a Series method. In pandas > 0.18.1, the boxplot function has a column argument that defines which column the data is taken from. So

    data.boxplot(column='2013-08-17', by='SPECIES')

should return the desired result. An example with the Iris dataset: import pandas … Read more

Confusion about pandas copy of slice of dataframe warning

    izmir = pd.read_excel(filepath)
    izmir_lim = izmir[['Gender', 'Age', 'MC_OLD_M>=60', 'MC_OLD_F>=60',
                       'MC_OLD_M>18', 'MC_OLD_F>18', 'MC_OLD_18>M>5',
                       'MC_OLD_18>F>5', 'MC_OLD_M_Child<5', 'MC_OLD_F_Child<5',
                       'MC_OLD_M>0<=1', 'MC_OLD_F>0<=1', 'Date to Delivery',
                       'Date to insert', 'Date of Entery']]

izmir_lim is a view/copy of izmir. You subsequently attempt to assign to it. This is what triggers the warning. Use this instead:

    izmir_lim = izmir[['Gender', 'Age', 'MC_OLD_M>=60', 'MC_OLD_F>=60',
                       'MC_OLD_M>18', 'MC_OLD_F>18', 'MC_OLD_18>M>5',
                       'MC_OLD_18>F>5', 'MC_OLD_M_Child<5', 'MC_OLD_F_Child<5',
                       'MC_OLD_M>0<=1', 'MC_OLD_F>0<=1', 'Date to Delivery',
                       'Date to insert', 'Date of Entery']].copy()

Whenever you ‘create’ … Read more
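A small self-contained sketch of the same pattern, assuming a toy frame in place of the Excel file. The "Other" and "Age_in_months" columns are hypothetical and only serve to show that assigning to an explicit copy leaves the original untouched.

```python
import pandas as pd

# Toy stand-in for the frame read from Excel
df = pd.DataFrame({"Gender": ["F", "M"], "Age": [30, 41], "Other": [1, 2]})

# Take an explicit copy of the column subset before modifying it
subset = df[["Gender", "Age"]].copy()

# Assigning to the copy is unambiguous: it changes subset only
subset["Age_in_months"] = subset["Age"] * 12

print(subset)
print(df.columns.tolist())
```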

Pandas: getting rid of the multiindex

If you need to convert the MultiIndex to a flat Index:

    df.columns = df.columns.map(''.join)

Or, if you need to remove a level, use droplevel:

    df.columns = df.columns.droplevel(0)

If you need to access values, you can use xs:

    df = df.xs('CID', axis=1, level=1)

You can also check: What is the difference between size and count in pandas?

EDIT: To remove … Read more
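The three options can be compared side by side on a toy frame. This is only a sketch: the top-level labels "x" and "y" and the second-level label "val" are invented here, while 'CID' follows the excerpt.

```python
import pandas as pd

columns = pd.MultiIndex.from_product([["x", "y"], ["CID", "val"]])
df = pd.DataFrame([[1, 2, 3, 4]], columns=columns)

# 1. Join the levels of each column label into one flat string
flat = df.copy()
flat.columns = flat.columns.map("".join)

# 2. Drop the first level, keeping only the second
dropped = df.copy()
dropped.columns = dropped.columns.droplevel(0)

# 3. Select the columns whose second level is 'CID'
cid = df.xs("CID", axis=1, level=1)

print(flat.columns.tolist())
print(dropped.columns.tolist())
print(cid)
```

Note that dropping a level can leave duplicate column names, as it does here, so joining the levels is often the safer choice when every level carries information.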

Read all but last line of CSV file in pandas

Pass on_bad_lines="skip" and it will skip this line automatically:

    df = pd.read_csv(filename, on_bad_lines="skip")

The advantage of on_bad_lines="skip" is that it will skip, and not bork on, any erroneous lines. But if the last line is always duff, then skipfooter=1 is better.

Thanks to @DexterMorgan for pointing out that the skipfooter option forces the engine to use the … Read more
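Both options can be tried on an in-memory CSV whose last line is malformed. This is a sketch under a few assumptions: the sample data is made up, on_bad_lines needs pandas 1.3 or later, and engine="python" is passed explicitly because, to my knowledge, the default C parser does not support skipfooter.

```python
import io
import pandas as pd

# Two good rows followed by a trailing line with too many fields
raw = "a,b\n1,2\n3,4\ntotal,rows,2,extra"

# Always drop the last line, whatever it contains
df_footer = pd.read_csv(io.StringIO(raw), skipfooter=1, engine="python")

# Skip any line the parser considers malformed (here: too many fields)
df_skip = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")

print(df_footer)
print(df_skip)
```

skipfooter removes the last line unconditionally, so it would also discard a valid final row, whereas on_bad_lines="skip" only drops lines that fail to parse.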