Pandas: Creating aggregated column in DataFrame

In [20]: df = pd.DataFrame({'A':[1,1,2,2],'B':[1,2,1,2],'values':np.arange(10,30,5)})

In [21]: df
Out[21]:
   A  B  values
0  1  1      10
1  1  2      15
2  2  1      20
3  2  2      25

In [22]: df['sum_values_A'] = df.groupby('A')['values'].transform(np.sum)

In [23]: df
Out[23]:
   A  B  values  sum_values_A
0  1  1      10            25
1  1  2      15            25
2  2  1      20 …
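A self-contained sketch of the same idea, rebuilt from the transcript's data. It passes the string alias "sum" instead of np.sum (recent pandas versions warn when np.sum is handed to transform) and, for comparison, rebuilds the same column with groupby().sum() plus Series.map. The column name sum_values_A_via_map is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2], "B": [1, 2, 1, 2], "values": [10, 15, 20, 25]})

# transform() returns one value per original row, aligned on df's index,
# so the group total can be assigned directly as a new column.
df["sum_values_A"] = df.groupby("A")["values"].transform("sum")

# sum() collapses to one row per group (indexed by A); map() looks up each
# row's A in that result, broadcasting the totals back to every row.
group_sums = df.groupby("A")["values"].sum()
df["sum_values_A_via_map"] = df["A"].map(group_sums)  # hypothetical column name

print(df)
```

transform is the shorter route when only the broadcast column is needed; the sum + map form is handy when the per-group table is also useful on its own.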

Summing the number of occurrences per day in pandas

If your timestamp index is a DatetimeIndex:

import io
import pandas as pd

content = '''\
timestamp                  score
2013-06-29 00:52:28+00:00  -0.420070
2013-06-29 00:51:53+00:00  -0.445720
2013-06-28 16:40:43+00:00   0.508161
2013-06-28 15:10:30+00:00   0.921474
2013-06-28 15:10:17+00:00   0.876710
'''
# StringIO, not BytesIO: content is a str in Python 3.
# A regex separator (two or more spaces) needs the Python parser engine.
df = pd.read_table(io.StringIO(content), sep=r'\s{2,}', engine='python',
                   parse_dates=[0], index_col=[0])
print(df)

so df looks like this:

                        score
timestamp
2013-06-29 00:52:28 -0.420070
2013-06-29 00:51:53 -0.445720
2013-06-28 16:40:43  0.508161
…
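The excerpt stops before the counting step, so what follows is only one possible way to do it, not necessarily the linked answer's: with a DatetimeIndex, resample("D").size() counts rows per calendar day. This sketch builds the index from the same five timestamps with pd.to_datetime instead of parsing text; the name counts_per_day is illustrative.

```python
import pandas as pd

# The five timestamps and scores from the example above, as a UTC DatetimeIndex.
index = pd.to_datetime([
    "2013-06-29 00:52:28+00:00",
    "2013-06-29 00:51:53+00:00",
    "2013-06-28 16:40:43+00:00",
    "2013-06-28 15:10:30+00:00",
    "2013-06-28 15:10:17+00:00",
]).rename("timestamp")
df = pd.DataFrame(
    {"score": [-0.420070, -0.445720, 0.508161, 0.921474, 0.876710]},
    index=index,
)

# resample("D") bins rows by calendar day; size() counts the rows in each bin.
# Days inside the range that have no rows would show a count of 0.
counts_per_day = df.sort_index().resample("D").size()
print(counts_per_day)
```

An alternative is df.groupby(df.index.date).size(), which keys the result by datetime.date objects and only lists days that actually have rows.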

Expand pandas DataFrame column into multiple rows

You could use df.itertuples to iterate through each row, and use a list comprehension to reshape the data into the desired form:

import pandas as pd

df = pd.DataFrame(
    {"name": ["John", "Eric"], "days": [[1, 3, 5, 7], [2, 4]]})
result = pd.DataFrame([(d, tup.name) for tup in df.itertuples() for d in tup.days])
print(result)

yields

   0 …
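A runnable version of the itertuples approach with explicit column names (the names "day" and "name" are my choice), shown next to DataFrame.explode, a built-in alternative that is not part of the quoted answer. explode exists since pandas 0.25; its ignore_index argument assumes pandas 1.1 or later.

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Eric"], "days": [[1, 3, 5, 7], [2, 4]]})

# itertuples + list comprehension: one (day, name) pair per list element.
result = pd.DataFrame(
    [(d, tup.name) for tup in df.itertuples() for d in tup.days],
    columns=["day", "name"],
)

# Built-in alternative: explode() emits one row per list element and
# repeats the other columns; ignore_index=True renumbers rows 0..n-1.
exploded = df.explode("days", ignore_index=True)

print(result)
print(exploded)
```

explode keeps the original column order and leaves the exploded column with object dtype; exploded["days"].astype(int) converts it if integers are needed.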

Conditionally format Python pandas cell

From the style docs: "You can apply conditional formatting, the visual styling of a DataFrame depending on the data within, by using the DataFrame.style property."

import pandas as pd

df = pd.DataFrame([[2, 3, 1], [3, 2, 2], [2, 4, 4]], columns=list("ABC"))
df.style.apply(lambda x: ["background: red" if v > x.iloc[0] else "" for v in x], axis=1)

Edit: to format …

Converting OHLC stock data into a different timeframe with python and pandas

With more recent versions of pandas there is a resample method (DataFrame.resample, which requires a DatetimeIndex). It is fast and handles this task directly:

ohlc_dict = {
    'Open': 'first',
    'High': 'max',
    'Low': 'min',
    'Close': 'last',
    'Volume': 'sum',
}
df.resample('5min', closed='left', label='left').agg(ohlc_dict)

Older code often writes '5T' for five minutes; pandas 2.2 deprecated the 'T' alias in favor of 'min'.
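A quick check of the resample call on synthetic data: ten hypothetical one-minute bars, which should collapse into two five-minute bars. The prices and volumes are made up purely for illustration.

```python
import pandas as pd

# Hypothetical 1-minute bars starting at 09:30; resample() needs a DatetimeIndex.
idx = pd.date_range("2024-01-01 09:30", periods=10, freq="min")
prices = pd.Series(range(1, 11), index=idx, dtype="float64")
df = pd.DataFrame(
    {
        "Open": prices,
        "High": prices + 0.5,
        "Low": prices - 0.5,
        "Close": prices + 0.25,
        "Volume": 100,  # scalar broadcast to every row
    },
    index=idx,
)

ohlc_dict = {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}

# closed="left": each bin is [start, start + 5min); label="left": labelled by its start.
bars_5min = df.resample("5min", closed="left", label="left").agg(ohlc_dict)
print(bars_5min)
```

If the minute data has gaps, empty bins come back as NaN for first/max/min/last and 0 for sum; bars_5min.dropna() removes them.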

What are levels in a pandas DataFrame?

I stumbled across this question while analyzing the answer to my own question, but I didn't find John's answer satisfying enough. After a few experiments, though, I think I understood levels and decided to share:

Short answer: Levels are parts of the index or columns.

Long answer: I think this multi-column DataFrame.groupby example …
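A minimal sketch of what levels look like in practice, using hypothetical city/year sales data: grouping by two columns produces a MultiIndex whose two levels can be inspected, aggregated over, or moved into the columns.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome"],
    "year": [2020, 2021, 2020, 2021],
    "sales": [10, 12, 7, 9],
})

# Grouping by two columns gives a Series with a two-level MultiIndex:
# level 0 is "city", level 1 is "year".
totals = df.groupby(["city", "year"])["sales"].sum()
print(totals.index.nlevels)  # 2
print(totals.index.names)

# The values of one level, repeated once per row.
cities = totals.index.get_level_values("city")

# Aggregate over a single level.
per_city = totals.groupby(level="city").sum()

# unstack() moves a level out of the row index and into the columns.
wide = totals.unstack(level="year")
print(wide)
```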