dataframe – Page 116

UnicodeDecodeError when reading CSV file in Pandas with Python

September 10, 2022 by Tarik

read_csv takes an encoding option to deal with files in different encodings. I mostly use read_csv('file', encoding="ISO-8859-1"), or alternatively encoding="utf-8" for reading, and generally utf-8 for to_csv. You can also use an alias such as 'latin' in place of 'ISO-8859-1', or try the closely related Windows encoding 'cp1252' (see the Python docs, also for numerous other …
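As a minimal sketch of the encoding option described above: the sample bytes below are made up for illustration, and an in-memory buffer stands in for a real file on disk. Reading them as UTF-8 fails, while passing the encoding the data was actually written in succeeds.

```python
import io

import pandas as pd

# Hypothetical sample data, encoded as ISO-8859-1 (not valid UTF-8).
raw = "name,city\nZoë,Orléans\n".encode("ISO-8859-1")

# Decoding as UTF-8 fails because of the accented characters.
try:
    pd.read_csv(io.BytesIO(raw), encoding="utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Passing the matching encoding reads the data correctly.
df = pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-1")
print(utf8_failed)
print(df)
```

If the source encoding is unknown, trying 'utf-8' first and falling back to 'ISO-8859-1' or 'cp1252' is a common heuristic, but it can silently produce wrong characters, so it is worth checking the result by eye.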

How can I get a value from a cell of a dataframe?

September 10, 2022 by Tarik

If you have a DataFrame with only one row, then access the first (only) row as a Series using iloc, and then the value using the column name:

    In [3]: sub_df
    Out[3]:
              A         B
    2 -0.133653 -0.030854

    In [4]: sub_df.iloc[0]
    Out[4]:
    A   -0.133653
    B   -0.030854
    Name: 2, dtype: float64

    In [5]: sub_df.iloc[0]['A']
    Out[5]: -0.13365288513107493
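The session above can be reproduced as a standalone script. This is a sketch that rebuilds sub_df from the rounded values shown in the excerpt; the loc, at and iat lines are additions showing single-step alternatives for scalar access, not part of the original answer.

```python
import pandas as pd

# One-row frame rebuilt from the rounded values printed in the excerpt.
sub_df = pd.DataFrame({"A": [-0.133653], "B": [-0.030854]}, index=[2])

# Two-step access: first row as a Series, then the value by column name.
row = sub_df.iloc[0]
value = row["A"]

# Single-step alternatives (label-based and position-based).
value_loc = sub_df.loc[2, "A"]
value_at = sub_df.at[2, "A"]
value_iat = sub_df.iat[0, 0]

print(value, value_loc, value_at, value_iat)
```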

Converting a Pandas GroupBy output from Series to DataFrame

September 10, 2022 by Tarik

g1 here is a DataFrame. It has a hierarchical index, though:

    In [19]: type(g1)
    Out[19]: pandas.core.frame.DataFrame

    In [20]: g1.index
    Out[20]:
    MultiIndex([('Alice', 'Seattle'), ('Bob', 'Seattle'), ('Mallory', 'Portland'), ('Mallory', 'Seattle')], dtype=object)

Perhaps you want something like this?

    In [21]: g1.add_suffix('_Count').reset_index()
    Out[21]:
          Name     City  City_Count  Name_Count
    0    Alice  Seattle           1           1
    1      Bob  Seattle           2           2
    2  Mallory …
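The idea of flattening a hierarchical GroupBy result with reset_index can be sketched as follows. The input rows are invented (the original question's data is not shown in the excerpt), and this version uses size() with reset_index(name=...) rather than the count()/add_suffix combination above, since recent pandas versions leave the grouping columns out of count().

```python
import pandas as pd

# Hypothetical input; the original question's data is not shown here.
df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Bob", "Mallory", "Mallory", "Mallory"],
        "City": ["Seattle", "Seattle", "Seattle", "Portland", "Portland", "Seattle"],
    }
)

# size() returns a Series indexed by a (Name, City) MultiIndex.
sizes = df.groupby(["Name", "City"]).size()

# reset_index turns the index levels back into ordinary columns,
# producing a flat DataFrame with a named count column.
counts = sizes.reset_index(name="Count")
print(counts)
```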

How to check if any value is NaN in a Pandas DataFrame

September 10, 2022 by Tarik

jwilner’s response is spot on. I was exploring to see if there’s a faster option, since in my experience, summing flat arrays is (strangely) faster than counting. This code seems faster:

    df.isnull().values.any()

    import numpy as np
    import pandas as pd
    import perfplot

    def setup(n):
        df = pd.DataFrame(np.random.randn(n))
        df[df > 0.9] = np.nan
        return df

    def …
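A small sketch of the check itself, without the benchmarking harness: the frame below is made up for illustration. It shows the df.isnull().values.any() form from the excerpt next to two pure-pandas variants; no timing claims are made here.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a single missing value.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

# Form from the excerpt: reduce over the underlying boolean NumPy array.
has_nan = df.isnull().values.any()

# Pure-pandas equivalent: reduce per column, then across columns.
has_nan_alt = df.isnull().any().any()

# Per-column counts, useful for locating the missing values.
nan_per_column = df.isnull().sum()

print(has_nan, has_nan_alt)
print(nan_per_column)
```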

How to apply a function to two columns of Pandas dataframe

September 10, 2022 by Tarik

There is a clean, one-line way of doing this in Pandas:

    df['col_3'] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)

This allows f to be a user-defined function with multiple input values, and uses (safe) column names rather than (unsafe) numeric indices to access the columns. Example with data (based on original question):

    import pandas as pd …
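Since the example data is cut off in the excerpt, here is a minimal stand-in. Both the function f and the column values are hypothetical, chosen only to show the row-wise apply pattern from the one-liner above.

```python
import pandas as pd


def f(a, b):
    """Hypothetical two-argument function; any user-defined logic could go here."""
    return a + b


# Made-up data for illustration.
df = pd.DataFrame({"col_1": [1, 2, 3], "col_2": [10, 20, 30]})

# axis=1 passes each row to the lambda as a Series, so columns are
# reachable by name as attributes.
df["col_3"] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)
print(df)
```

When f is simple arithmetic like this, a vectorized expression such as df["col_1"] + df["col_2"] gives the same result and is usually much faster; row-wise apply is most useful when f cannot easily be vectorized.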

Difference between map, applymap and apply methods in Pandas

September 9, 2022 by Tarik

Straight from Wes McKinney’s Python for Data Analysis book, pg. 132 (I highly recommend this book):

Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame’s apply method does exactly this:

    In [116]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

    In [117]: frame
    Out[117]:
    b d e …
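To make the three methods concrete, here is a sketch using the same frame layout as the book excerpt, but with deterministic values instead of np.random.randn so the results are reproducible. One assumption to flag: newer pandas releases offer DataFrame.map as the element-wise method in place of applymap, so the code picks whichever one is available.

```python
import numpy as np
import pandas as pd

# Same shape and labels as the excerpt, but with fixed values 0.0 .. 11.0.
frame = pd.DataFrame(
    np.arange(12.0).reshape(4, 3),
    columns=list("bde"),
    index=["Utah", "Ohio", "Texas", "Oregon"],
)

# DataFrame.apply: the function receives a whole column (a 1D Series).
col_range = frame.apply(lambda col: col.max() - col.min())

# Series.map: the function receives one element of a single column.
formatted_e = frame["e"].map(lambda x: f"{x:.2f}")

# Element-wise over the whole frame: DataFrame.map where it exists,
# otherwise the older DataFrame.applymap.
elementwise = frame.map if hasattr(frame, "map") else frame.applymap
formatted = elementwise(lambda x: f"{x:.2f}")

print(col_range)
print(formatted_e)
print(formatted)
```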

Constructing pandas DataFrame from values in variables gives “ValueError: If using all scalar values, you must pass an index”

September 9, 2022 by Tarik

The error message says that if you’re passing scalar values, you have to pass an index. So you can either not use scalar values for the columns, e.g. use a list:

    >>> df = pd.DataFrame({'A': [a], 'B': [b]})
    >>> df
       A  B
    0  2  3

or use scalar values and pass an index:

    >>> …
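Both fixes can be sketched side by side. The values a = 2 and b = 3 match the output shown above; index=[0] in the second variant is an illustrative choice, since the original snippet is cut off at that point.

```python
import pandas as pd

a, b = 2, 3

# All-scalar values with no index: pandas cannot infer the number of rows.
try:
    pd.DataFrame({"A": a, "B": b})
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Fix 1: wrap each scalar in a list, giving one row.
df_list = pd.DataFrame({"A": [a], "B": [b]})

# Fix 2: keep the scalars and pass an index (here a single label, 0).
df_index = pd.DataFrame({"A": a, "B": b}, index=[0])

print(df_list)
print(df_index)
```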

Convert pandas dataframe to NumPy array

September 9, 2022 by Tarik

Use df.to_numpy(). It’s better than df.values; here’s why.* It’s time to deprecate your usage of values and as_matrix(). pandas v0.24.0 introduced two new methods for obtaining NumPy arrays from pandas objects: to_numpy(), which is defined on Index, Series, and DataFrame objects, and array, which is defined on Index and Series objects only. If you visit …
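A short sketch of to_numpy() on a made-up frame, assuming pandas 0.24 or later as the excerpt states. The dtype argument is shown as one way to control the resulting array type.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one integer and one float column.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]})

# Whole frame: mixed int/float columns are upcast to a common dtype.
arr = df.to_numpy()

# Single column (a Series) keeps its own dtype.
col = df["A"].to_numpy()

# An explicit dtype can be requested during conversion.
arr_f32 = df.to_numpy(dtype="float32")

print(arr.dtype, arr.shape)
print(col)
print(arr_f32.dtype)
```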