dataframe – Page 19 – Tarik Billa

Changing multiple column names but not all of them – Pandas Python

September 14, 2023 by Tarik

Say you have a dictionary mapping each old column name to the new name that should replace it:

    df.rename(columns={'old_col': 'new_col', 'old_col_2': 'new_col_2'}, inplace=True)

But if you don't have that, and you only have the column indices, you can do this:

    column_indices = [1, 4, 5, 6]
    new_names = ['a', 'b', 'c', 'd']
    old_names = df.columns[column_indices]
    df.rename(columns=dict(zip(old_names, new_names)), inplace=True)
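As a quick check of the index-based approach, here is a minimal sketch on a made-up frame. The column names `c0` to `c6` and the zero-filled data are assumptions for illustration only; the indices and new names are the ones from the excerpt.

```python
import pandas as pd

# Hypothetical frame with seven columns named c0..c6 (not from the original post).
df = pd.DataFrame([[0] * 7], columns=[f"c{i}" for i in range(7)])

column_indices = [1, 4, 5, 6]
new_names = ["a", "b", "c", "d"]

# Look up the current names at those positions, then map old -> new.
old_names = df.columns[column_indices]
df = df.rename(columns=dict(zip(old_names, new_names)))

print(list(df.columns))
```

Columns whose positions are not listed keep their original names. Note that this relies on the column names being unique; with duplicate names, `rename` would change every column that shares a matched name.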

How can I “unpivot” specific columns from a pandas DataFrame?

September 12, 2023 by Tarik

This can be done with pd.melt():

    # value_name is 'value' by default, but setting it here to make it clear
    pd.melt(x, id_vars=['farm', 'fruit'], var_name="year", value_name="value")

Result:

      farm  fruit  year  value
    0    A  apple  2014     10
    1    B  apple  2014     12
    2    A   pear  2014      6
    3    B   pear  2014      8
    4    A  apple  2015     11
    … Read more
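For a self-contained version, the input frame `x` has to be reconstructed. The sketch below assumes a wide frame with one column per year; the 2014 values and the first 2015 value match the result shown in the excerpt, while the remaining 2015 values are placeholders invented for illustration.

```python
import pandas as pd

# Assumed wide-format input; 2015 values after the first are hypothetical.
x = pd.DataFrame({
    "farm": ["A", "B", "A", "B"],
    "fruit": ["apple", "apple", "pear", "pear"],
    "2014": [10, 12, 6, 8],
    "2015": [11, 13, 7, 9],
})

# Keep farm and fruit as identifiers; every other column becomes a (year, value) pair.
long = pd.melt(x, id_vars=["farm", "fruit"], var_name="year", value_name="value")

print(long)
```

Each non-identifier column contributes one row per original row, so a 4-row frame with two year columns melts to 8 rows.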

In Pandas, does .iloc method give a copy or view?

September 12, 2023 by Tarik

You are starting with a DataFrame that has two columns with two different dtypes:

    df.dtypes
    Out:
    age      int64
    name    object
    dtype: object

Since different dtypes are stored in different numpy arrays under the hood, you have two different blocks for them:

    df.blocks
    Out:
    {'int64':           age
     student1   21
     student2   24,
     'object':           name
     student1  Marry
     student2   John}
    … Read more
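One way to probe this on your own installation is to ask NumPy whether two arrays share memory. The sketch below rebuilds a frame like the one in the excerpt and makes no claim about the answer: whether `.iloc` hands back a view or a copy depends on the pandas version, the dtypes involved, and copy-on-write settings. It also avoids `df.blocks`, which may not be available in recent pandas releases.

```python
import numpy as np
import pandas as pd

# Frame modeled on the excerpt: one integer column, one string column.
df = pd.DataFrame(
    {"age": [21, 24], "name": ["Marry", "John"]},
    index=["student1", "student2"],
)

print(df.dtypes)

# Select the first column by position, then compare its buffer with the
# buffer behind the same column selected by label.
subset = df.iloc[:, 0]
shares = np.shares_memory(subset.to_numpy(), df["age"].to_numpy())

print("shares memory:", shares)
```

Treat the printed result as an observation about one environment, not a guarantee; code that needs an independent object should call `.copy()` explicitly.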

access data frame column using variable

September 11, 2023 by Tarik

Pandas rolling apply using multiple columns

September 10, 2023 by Tarik

How about this:

    def masscenter(ser):
        print(df.loc[ser.index])
        return 0

    rol = df.price.rolling(window=2)
    rol.apply(masscenter, raw=False)

It uses the rolling logic to get subsets from an arbitrary column. The raw=False option provides you with index values for those subsets (which are given to you as Series), then you use those index values to get multi-column slices from your … Read more
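To see the index trick return a real value instead of printing, here is a minimal sketch that computes a volume-weighted price per window. The `volume` column and all the numbers are assumptions for illustration; only the `price` column, the `masscenter` name, and the `raw=False` pattern come from the excerpt.

```python
import pandas as pd

# Hypothetical data: the "volume" column is not part of the original post.
df = pd.DataFrame({
    "price": [10.0, 20.0, 30.0],
    "volume": [1.0, 3.0, 1.0],
})

def masscenter(ser):
    # ser is the price window as a Series; its index selects the same
    # rows across all columns of the full frame.
    window = df.loc[ser.index]
    return (window["price"] * window["volume"]).sum() / window["volume"].sum()

# raw=False is required so the function receives a Series with an index.
result = df["price"].rolling(window=2).apply(masscenter, raw=False)

print(result)
```

The first entry is NaN because the window is not yet full. The function reaches back into the enclosing `df`, so this pattern is convenient but slow on large frames, since it runs Python code once per window.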

duplicates in multiple columns

September 10, 2023 by Tarik

Replace NaN with empty list in a pandas dataframe

September 9, 2023 by Tarik

This works using isnull and loc to mask the series:

    In [90]: d.loc[d.isnull()] = d.loc[d.isnull()].apply(lambda x: [])
             d
    Out[90]:
    0    [1, 2, 3]
    1       [1, 2]
    2           []
    3           []
    dtype: object

    In [91]: d.apply(len)
    Out[91]:
    0    3
    1    2
    2    0
    3    0
    dtype: int64

You have to do this using apply in order … Read more
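A related variant, which is not the masking approach from the excerpt, is to run `apply` over the whole Series and substitute a fresh list wherever the element is not already a list. The sketch below assumes the same shape of input as the excerpt (two lists followed by two NaN values).

```python
import numpy as np
import pandas as pd

# Input modeled on the excerpt's output: two lists, two missing values.
d = pd.Series([[1, 2, 3], [1, 2], np.nan, np.nan])

# Keep existing lists; replace anything else (here, NaN) with a new empty list.
# Building the list inside the lambda gives each row its own list object.
filled = d.apply(lambda x: x if isinstance(x, list) else [])

print(filled)
print(filled.apply(len))
```

Creating the list inside the function matters: reusing a single `[]` object would make every filled row point at the same list, so appending to one would appear to change them all.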

What is the fastest and most efficient way to append rows to a DataFrame?

September 8, 2023 by Tarik

As Mohit Motwani suggested, the fastest way is to collect the data into a dictionary and then load it all into a data frame. Below are some speed measurement examples:

    import pandas as pd
    import numpy as np
    import time
    import random

    end_value = 10000

Measurement for creating a list of dictionaries and loading them all into a data frame at the end: start_time … Read more
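The pattern being measured can be sketched as follows. The row contents (`id`, `value`) are hypothetical; only `end_value` and the idea of building a list of dictionaries first come from the excerpt. No timings are shown here because they depend on the machine and pandas version.

```python
import random

import pandas as pd

end_value = 10000

# Accumulate plain Python dicts; appending to a list is cheap.
rows = []
for i in range(end_value):
    rows.append({"id": i, "value": random.random()})

# Build the DataFrame once, at the end, from the collected rows.
df = pd.DataFrame(rows)

print(df.shape)
```

The design point is that the frame is constructed a single time, rather than growing an existing DataFrame row by row, which would reallocate and copy data on each step.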

use multiple columns as variables with sapply

September 8, 2023 by Tarik
