How can I iterate over rows in a Pandas DataFrame?

DataFrame.iterrows is a generator which yields both the index and row (as a Series):

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})
df = df.reset_index()  # make sure indexes pair with number of rows

for index, row in df.iterrows():
    print(row['c1'], row['c2'])
```

Output:

```
10 100
11 110
12 120
```

Obligatory disclaimer …
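To make concrete what the loop receives, here is a small sketch with a non-default index (the labels 'a', 'b', 'c' are made up for illustration). It suggests that `index` is the row's label rather than its position, and that `reset_index()` swaps the labels for 0..n-1 while keeping the old labels as a column:

```python
import pandas as pd

# Hypothetical non-default index labels, to show what iterrows() yields.
df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]}, index=['a', 'b', 'c'])

labels = []
for index, row in df.iterrows():
    # `index` is the row label; `row` is a Series indexed by column name.
    assert isinstance(row, pd.Series)
    labels.append(index)

print(labels)  # ['a', 'b', 'c']

# After reset_index() the labels are 0..n-1; the old labels move to an 'index' column.
positions = [index for index, _ in df.reset_index().iterrows()]
print(positions)  # [0, 1, 2]
```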

Using conditional to generate new column in pandas dataframe

You can define a function which returns your different states “Full”, “Partial”, “Empty”, etc., and then use df.apply to apply the function to each row. Note that you have to pass the keyword argument axis=1 to ensure that it applies the function to rows.

```python
import pandas as pd

def alert(row):
    if row['used'] == 1.0:
        return …
```
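Since the answer's function is cut off, here is a minimal complete sketch of the same pattern. The mapping from `used` values to states (1.0 is "Full", anything above 0 is "Partial", otherwise "Empty") is an assumption for illustration, not the original answer's rules:

```python
import pandas as pd

# Hypothetical rules: the original answer's exact conditions are elided.
def alert(row):
    if row['used'] == 1.0:
        return 'Full'
    elif row['used'] > 0.0:
        return 'Partial'
    return 'Empty'

df = pd.DataFrame({'used': [1.0, 0.4, 0.0]})

# axis=1 makes apply pass each row (as a Series) to alert().
df['status'] = df.apply(alert, axis=1)
print(df['status'].tolist())  # ['Full', 'Partial', 'Empty']
```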

Making Int64 the default integer dtype instead of standard int64 in pandas

You could use a function like this:

```python
def nan_ints(df, convert_strings=False, subset=None):
    types = ["int64", "float64"]
    if subset is None:
        subset = list(df)
    if convert_strings:
        types.append("object")
    for col in subset:
        if df[col].dtype in types:
            # Each astype returns its input unchanged if that cast fails
            # (errors="ignore"), so unconvertible columns don't raise.
            df[col] = (
                df[col].astype(float, errors="ignore").astype("Int64", errors="ignore")
            )
    return df
```

It iterates through each column and converts it to an Int64 if it …
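As a point of comparison (a built-in method swapped in here, not part of the answer above), `DataFrame.convert_dtypes()` also moves columns to pandas' nullable dtypes. A minimal sketch with made-up sample data; note that, unlike the function above, it also turns true float columns into `Float64` and string-holding object columns into the `string` dtype:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: whole-number floats with a NaN, true floats, and ints.
df = pd.DataFrame({
    'a': [1.0, np.nan, 3.0],
    'b': [1.5, 2.0, 3.0],
    'c': [1, 2, 3],
})

out = df.convert_dtypes()
print([str(t) for t in out.dtypes])  # ['Int64', 'Float64', 'Int64']
```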

Must have equal len keys and value when setting with an iterable

You can use apply to index into leader and exchange values with DatasetLabel, although it’s not very pretty. One issue is that Pandas won’t let us index with NaN. Converting to str provides a workaround. But that creates a second issue, namely, column 9 is of type float (because NaN is float), so 5 becomes …
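The dtype issue described here can be seen in isolation. This sketch uses a made-up two-value Series rather than the question's data, and converts element-wise with Python's `str()`; it shows that a column holding an integer and a NaN is stored as float64, so the integer picks up float formatting once it is turned into a string key:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a column containing an integer and a NaN.
s = pd.Series([5, np.nan])
print(s.dtype)  # float64, because NaN is a float

# Converting each value with str() gives usable lookup keys,
# but the integer now carries a trailing ".0".
keys = s.map(str)
print(keys.tolist())  # ['5.0', 'nan']
```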

How do I make a progress bar for loading pandas DataFrame from a large xlsx file?

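The following is a one-liner solution using tqdm:

```python
import pandas as pd
from tqdm import tqdm

df = pd.concat([chunk for chunk in tqdm(pd.read_csv(file_name, chunksize=1000), desc="Loading data")])
```

If you know how many lines will be loaded, you can pass that information to tqdm through its total parameter to get a percentage display. Because tqdm counts iterations, and each iteration here is one 1000-row chunk, total should be the number of chunks (the line count divided by the chunk size, rounded up) rather than the raw line count. Note also that this snippet reads a CSV file: pd.read_excel does not accept a chunksize argument.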
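A self-contained sketch of the total-aware version, assuming an in-memory CSV with 2,500 made-up data rows in place of `file_name`. It computes `total` as the number of chunks, since that is what tqdm is counting:

```python
import io
import math

import pandas as pd
from tqdm import tqdm

# Hypothetical stand-in for a large CSV file: 2,500 data rows plus a header.
n_rows = 2500
csv_text = "a,b\n" + "".join(f"{i},{i * 2}\n" for i in range(n_rows))

chunksize = 1000
# tqdm counts iterations, and each iteration yields one chunk,
# so `total` is the number of chunks, not the number of lines.
n_chunks = math.ceil(n_rows / chunksize)

reader = pd.read_csv(io.StringIO(csv_text), chunksize=chunksize)
df = pd.concat(list(tqdm(reader, total=n_chunks, desc="Loading data")))

print(len(df), n_chunks)  # 2500 3
```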