pandas – Tarik Billa

How can I iterate over rows in a Pandas DataFrame?

April 12, 2024 by Tarik

DataFrame.iterrows is a generator that yields both the index and the row (as a Series):

    import pandas as pd

    df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})
    df = df.reset_index()  # make sure indexes pair with number of rows

    for index, row in df.iterrows():
        print(row['c1'], row['c2'])

Output:

    10 100
    11 110
    12 120

Obligatory disclaimer … Read more
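As a minimal sketch of the same idea, the snippet below collects the values instead of printing them. It also shows two alternatives that are not part of the excerpt above: itertuples, which is usually faster than iterrows, and a vectorized expression that avoids the loop. The variable names pairs, tuples, and totals are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"c1": [10, 11, 12], "c2": [100, 110, 120]})

# iterrows() yields (index, row) pairs; each row is a Series.
pairs = [(row["c1"], row["c2"]) for _, row in df.iterrows()]

# itertuples() yields namedtuples, one per row.
tuples = [(t.c1, t.c2) for t in df.itertuples(index=False)]

# For simple arithmetic, a vectorized expression needs no explicit loop.
totals = (df["c1"] + df["c2"]).tolist()
```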

Pandas – Writing an excel file containing unicode – IllegalCharacterError

April 12, 2024 by Tarik

The same problem happened to me. I solved it as follows:

First, install the Python package xlsxwriter:

    pip install xlsxwriter

Second, replace the default engine 'openpyxl' with 'xlsxwriter':

    df.to_excel("test.xlsx", engine="xlsxwriter")

Modify output from Python Pandas describe

April 12, 2024 by Tarik

The .describe() method generates a DataFrame in which count, std, max, and so on are values of the index, so according to the documentation you should use .loc to retrieve just the rows you want:

    df.describe().loc[['count', 'max']]
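A minimal sketch of that selection on a small, made-up DataFrame (the column names a and b are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10.0, 20.0, 30.0, 40.0]})

# describe() returns a DataFrame indexed by statistic name,
# so .loc with a list of labels keeps only those rows.
summary = df.describe().loc[["count", "max"]]
```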

Using conditional to generate new column in pandas dataframe

April 12, 2024 by Tarik

You can define a function which returns your different states ("Full", "Partial", "Empty", etc.) and then use df.apply to apply the function to each row. Note that you have to pass the keyword argument axis=1 to ensure that it applies the function to rows.

    import pandas as pd

    def alert(row):
        if row['used'] == 1.0:
            return … Read more
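Since the excerpt is cut off, here is a self-contained sketch of the pattern it describes. Only the column name used and the labels "Full", "Partial", and "Empty" come from the text; the conditions that map values to labels, and the sample data, are assumptions made for illustration.

```python
import pandas as pd


def alert(row):
    # The thresholds below are illustrative assumptions, not the original logic.
    if row["used"] == 1.0:
        return "Full"
    if row["used"] == 0.0:
        return "Empty"
    return "Partial"


df = pd.DataFrame({"used": [1.0, 0.0, 0.4]})

# axis=1 passes each row (as a Series) to the function.
df["alert"] = df.apply(alert, axis=1)
```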

How to decrease density of tick labels in subplots

April 11, 2024 by Tarik

A general approach is to tell matplotlib the desired number of ticks:

    plt.locator_params(nbins=10)

Edit, following comments from @Daniel Power: to change this for a single axis (e.g. 'x') of an Axes object, use:

    ax.locator_params(nbins=10, axis="x")
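For subplots, the per-axes call can be applied in a loop. The sketch below assumes a 1x2 grid and made-up data, and selects the non-interactive Agg backend only so that it runs without a display. Note that nbins is an upper bound on the number of intervals, so matplotlib may choose fewer ticks.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2)
for ax in axes:
    ax.plot(range(100), range(100))
    # Limit the x axis of this subplot to at most 4 tick intervals.
    ax.locator_params(nbins=4, axis="x")

# Number of tick locations the locator produced for each subplot.
xtick_counts = [len(ax.get_xticks()) for ax in axes]
```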

Making Int64 the default integer dtype instead of standard int64 in pandas

April 11, 2024 by Tarik

You could use a function like this:

    def nan_ints(df, convert_strings=False, subset=None):
        types = ["int64", "float64"]
        if subset is None:
            subset = list(df)
        if convert_strings:
            types.append("object")
        for col in subset:
            if df[col].dtype in types:
                df[col] = (
                    df[col].astype(float, errors="ignore").astype("Int64", errors="ignore")
                )
        return df

It iterates through each column and converts it to an Int64 if it … Read more
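The core conversion that the function relies on can be sketched on a single Series. The sample values are made up; the point is only that a float column holding whole numbers and a missing value can be cast to the nullable Int64 dtype.

```python
import pandas as pd

# A missing value forces the default dtype to float64.
s = pd.Series([1.0, 2.0, None])

# Int64 (capital I) is the nullable integer dtype; the gap becomes <NA>.
nullable = s.astype("Int64")
```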

Must have equal len keys and value when setting with an iterable

April 11, 2024 by Tarik

You can use apply to index into leader and exchange values with DatasetLabel, although it’s not very pretty. One issue is that Pandas won’t let us index with NaN. Converting to str provides a workaround. But that creates a second issue, namely, column 9 is of type float (because NaN is float), so 5 becomes … Read more

How do I make a progress bar for loading pandas DataFrame from a large xlsx file?

April 11, 2024 by Tarik

The following is a one-liner solution utilizing tqdm. Note that it reads a CSV file with pd.read_csv, whose chunksize parameter returns an iterator of chunks:

    import pandas as pd
    from tqdm import tqdm

    df = pd.concat([chunk for chunk in tqdm(pd.read_csv(file_name, chunksize=1000), desc="Loading data")])

If you know how many chunks will be loaded (the total number of rows divided by chunksize, rounded up), you can pass that number to the tqdm function with the total parameter, resulting in a percentage output.
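A sketch of the total variant, assuming the row count is known in advance. The in-memory CSV, n_rows, and n_chunks are hypothetical stand-ins for a real file and its known size.

```python
import io
import math

import pandas as pd
from tqdm import tqdm

# Hypothetical in-memory CSV standing in for a large file on disk.
n_rows = 2500
csv_text = "a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(n_rows))

chunksize = 1000
# tqdm advances once per chunk, so total is the chunk count, not the row count.
n_chunks = math.ceil(n_rows / chunksize)

reader = pd.read_csv(io.StringIO(csv_text), chunksize=chunksize)
chunks = [chunk for chunk in tqdm(reader, total=n_chunks, desc="Loading data")]
df = pd.concat(chunks)
```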

Pandas read sql integer became float

April 11, 2024 by Tarik

The problem is that your data contains NaN values, so int is automatically cast to float. See the documentation on NA type promotions: When introducing NAs into an existing Series or DataFrame via reindex or some other means, boolean and integer types will be promoted to a different dtype in order to store the NAs. These … Read more
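The promotion is easy to reproduce without a database. In this minimal sketch (the values are made up), reindexing an integer Series onto a label that does not exist introduces a NaN, and the dtype changes to float64.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])

# Label 3 does not exist, so reindex fills it with NaN,
# and the integer values are promoted to float to hold it.
promoted = ints.reindex([0, 1, 2, 3])
```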

Change Series inplace in DataFrame after applying function on it

April 11, 2024 by Tarik

Use loc:

    wanted_data.loc[:, 'age'] = wanted_data.age.apply(lambda x: x + 1)
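A runnable sketch of that assignment, using a made-up wanted_data frame (only the age column and the lambda come from the answer above):

```python
import pandas as pd

# Hypothetical data; only the "age" column matters for the example.
wanted_data = pd.DataFrame({"name": ["a", "b"], "age": [30, 41]})

# Assign the transformed column back through .loc.
wanted_data.loc[:, "age"] = wanted_data.age.apply(lambda x: x + 1)
```

For simple arithmetic like this, `wanted_data["age"] += 1` gives the same result without apply.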