Pandas DataFrame to Spark DataFrame: “Can not merge type” error

Long story short, don’t depend on schema inference. It is expensive and tricky in general. In particular, some columns (for example event_dt_num) in your data have missing values, which pushes Pandas to represent them as mixed types (string for non-missing entries, NaN for missing values). If you’re in doubt it is better to read all … Read more
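A hedged sketch of the explicit-schema approach (the column names, types, and sample frame below are illustrative assumptions, not the asker’s actual data):

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType
import pandas as pd

spark = SparkSession.builder.getOrCreate()

# Hypothetical frame: event_dt_num mixes strings and NaN,
# so pandas stores it with object dtype.
pdf = pd.DataFrame({"id": ["a", "b"],
                    "event_dt_num": ["20200101", None]})

# Passing an explicit schema sidesteps inference over mixed-type columns;
# reading everything as strings and casting later is the conservative choice.
schema = StructType([
    StructField("id", StringType(), True),
    StructField("event_dt_num", StringType(), True),
])
sdf = spark.createDataFrame(pdf, schema=schema)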

Cleanest, most efficient syntax to perform DataFrame self-join in Spark

There are at least two different ways you can approach this: either by aliasing:

df.as("df1").join(df.as("df2"), $"df1.foo" === $"df2.foo")

or by using name-based equality joins:

// Note that it will result in ambiguous column names,
// so using aliases here could be a good idea as well.
// df.as("df1").join(df.as("df2"), Seq("foo"))
df.join(df, Seq("foo"))

In general column renaming, while … Read more
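For readers coming from PySpark, a rough Python equivalent of both approaches (the toy DataFrame here is an assumption for illustration):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "x"), (1, "y"), (2, "z")], ["foo", "bar"])

# Alias both sides so the join condition can tell the columns apart.
aliased = df.alias("df1").join(df.alias("df2"),
                               col("df1.foo") == col("df2.foo"))

# Name-based equality join; "foo" appears only once in the result.
by_name = df.join(df, ["foo"])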

How to get rid of a multilevel index after using pivot_table in pandas?

You need to remove only the index name; use rename_axis (new in pandas 0.18.0):

print(reshaped_df)
sale_product_id  1  8  52  312  315
sale_user_id
1                1  1   1    5    1

print(reshaped_df.index.name)
sale_user_id

print(reshaped_df.rename_axis(None))
sale_product_id  1  8  52  312  315
1                1  1   1    5    1

Another solution, working in pandas below 0.18.0:

reshaped_df.index.name = None
print … Read more
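A compact end-to-end sketch of the rename_axis fix (the input frame is a made-up example in the spirit of the question):

import pandas as pd

df = pd.DataFrame({"sale_user_id": [1, 1, 1, 1, 1],
                   "sale_product_id": [1, 8, 52, 312, 315],
                   "kwh": [1, 1, 1, 5, 1]})

reshaped_df = df.pivot_table(index="sale_user_id",
                             columns="sale_product_id",
                             values="kwh")

# Drop the leftover index name so the extra header row disappears.
print(reshaped_df.rename_axis(None))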

How can I subclass a Pandas DataFrame?

There is now an official guide on how to subclass Pandas data structures, which includes DataFrame as well as Series. The guide is available here: https://pandas.pydata.org/pandas-docs/stable/development/extending.html#extending-subclassing-pandas The guide mentions this subclassed DataFrame from the Geopandas project as a good example: https://github.com/geopandas/geopandas/blob/master/geopandas/geodataframe.py As in HYRY’s answer, it seems there are two things you’re trying to accomplish: … Read more
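A minimal sketch of the pattern the official guide describes, assuming you mainly need operations to preserve your subclass and a custom attribute:

import pandas as pd

class MyDataFrame(pd.DataFrame):
    # Custom attributes that pandas should try to propagate.
    _metadata = ["source"]

    @property
    def _constructor(self):
        # Makes slicing, arithmetic, etc. return MyDataFrame
        # instead of a plain DataFrame.
        return MyDataFrame

mdf = MyDataFrame({"a": [1, 2, 3]})
mdf.source = "sensor-42"
subset = mdf[mdf["a"] > 1]
print(type(subset).__name__, getattr(subset, "source", None))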

How to properly add hours to a pandas.tseries.index.DatetimeIndex?

You can use pd.DateOffset:

test[1].index + pd.DateOffset(hours=16)

pd.DateOffset accepts the same keyword arguments as dateutil.relativedelta. The problem you encountered was due to this bug, which has been fixed in Pandas version 0.14.1 (note the 16 is interpreted as nanoseconds rather than hours):

In [242]: pd.to_timedelta(16, unit="h")
Out[242]: numpy.timedelta64(16, 'ns')

If you upgrade, your original code should work.
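For a self-contained illustration (the index below is a stand-in for test[1].index):

import pandas as pd

idx = pd.date_range("2014-01-01", periods=3, freq="D")

# Shift every timestamp in the index forward by 16 hours.
print(idx + pd.DateOffset(hours=16))
# 2014-01-01 16:00:00, 2014-01-02 16:00:00, 2014-01-03 16:00:00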

What is the fastest way to load a big CSV file in a notebook to work with Python pandas?

Here are the results of my read and write comparison for the DF (shape: 4000000 x 6, size in memory: 183.1 MB, size of the uncompressed CSV: 492 MB), comparing the following storage formats (CSV, CSV.gzip, Pickle, HDF5 with various compression settings):

storage    read_s  write_s  size_ratio_to_CSV
CSV        17.900    69.00              1.000
CSV.gzip   18.900   186.00              0.047
Pickle      0.173     1.77  … Read more
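A sketch of how such a comparison might be timed (the file paths and the synthetic frame are placeholders; to_hdf additionally requires the PyTables package):

import time
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(1_000_000, 6), columns=list("abcdef"))

def timed(fn):
    # Return the wall-clock seconds taken by fn().
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

results = {
    "CSV write": timed(lambda: df.to_csv("df.csv", index=False)),
    "CSV read": timed(lambda: pd.read_csv("df.csv")),
    "Pickle write": timed(lambda: df.to_pickle("df.pkl")),
    "Pickle read": timed(lambda: pd.read_pickle("df.pkl")),
    "HDF5 write": timed(lambda: df.to_hdf("df.h5", key="df", mode="w",
                                          complevel=9, complib="blosc")),
    "HDF5 read": timed(lambda: pd.read_hdf("df.h5", key="df")),
}
for name, seconds in results.items():
    print(f"{name:>12}: {seconds:.3f}s")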