pandas – Page 242 – Tarik Billa

Get a list from Pandas DataFrame column headers

August 30, 2022 by Tarik

You can get the values as a list by doing:

    list(my_dataframe.columns.values)

You can also simply use (as shown in Ed Chum’s answer):

    list(my_dataframe)
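A short sketch comparing both spellings on a small frame. The column names "a", "b", "c" are placeholders chosen for illustration. Index.tolist() is included as one more common way to get the same list.

```python
import pandas as pd

# Small example frame; the column names are arbitrary placeholders.
my_dataframe = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Both forms from the answer produce a plain Python list of header names.
headers_from_values = list(my_dataframe.columns.values)
headers_from_iter = list(my_dataframe)  # iterating a DataFrame yields its column labels

# Index.tolist() is another common spelling of the same thing.
headers_tolist = my_dataframe.columns.tolist()

print(headers_from_values, headers_from_iter, headers_tolist)
```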

Change column type in pandas

August 29, 2022 by Tarik

You have four main options for converting types in pandas:

- to_numeric() provides functionality to safely convert non-numeric types (e.g. strings) to a suitable numeric type. (See also to_datetime() and to_timedelta().)
- astype() converts (almost) any type to (almost) any other type (even if it’s not necessarily sensible to do so). It also allows you to …
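A minimal sketch contrasting the first two options. It assumes a hypothetical frame whose numbers are stored as strings, with one entry that cannot be parsed. to_numeric with errors="coerce" turns the bad entry into NaN. astype converts a whole column and would raise if any value could not be parsed.

```python
import pandas as pd

# Hypothetical data: numbers stored as strings, one entry that isn't numeric.
df = pd.DataFrame({"price": ["1.5", "2", "oops"], "qty": ["3", "4", "5"]})

# errors="coerce" replaces unparseable entries with NaN instead of raising.
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# astype converts the whole column; it would raise if any value could not be parsed.
df["qty"] = df["qty"].astype("int64")

print(df.dtypes)
```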

How to change the order of DataFrame columns?

August 29, 2022 by Tarik

One easy way is to reassign the dataframe with a list of the columns, rearranged as needed.

This is what you have now:

    In [6]: df
    Out[6]:
              0         1         2         3         4      mean
    0  0.445598  0.173835  0.343415  0.682252  0.582616  0.445543
    1  0.881592  0.696942  0.702232  0.696724  0.373551  0.670208
    2  0.662527  0.955193  0.131016  0.609548  0.804694  0.632596
    …
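A sketch of the reassignment approach, assuming the goal is to move the "mean" column to the front. The random data only stands in for the frame shown, so the values will differ. The method is the one the answer describes: build the new order as a list, then index the frame with it.

```python
import numpy as np
import pandas as pd

# Stand-in for the frame above: five random columns plus their row mean.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((3, 5)))
df["mean"] = df.mean(axis=1)

# Build the desired order as a list, then reassign the DataFrame with it.
cols = df.columns.tolist()
cols = cols[-1:] + cols[:-1]  # move the last column ("mean") to the front
df = df[cols]

print(df.columns.tolist())
```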

Selecting multiple columns in a Pandas dataframe

August 28, 2022 by Tarik

The column names (which are strings) cannot be sliced in the manner you tried. You have a couple of options here. If you know from context which columns you want, you can select just those by passing a list of names into the __getitem__ syntax (the []’s). Note that this returns a new DataFrame (a copy), not a view.

    df1 = …
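A small sketch of list-based selection alongside two slicing alternatives, on a hypothetical frame with columns "a" to "d". Label slicing through .loc includes both endpoints. Positional slicing through .iloc uses the usual half-open range.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})

# Pass a list of labels to [] to pick specific columns (the result is a new DataFrame).
df1 = df[["a", "b"]]

# For a contiguous range of columns, .loc accepts a label slice; both endpoints are included.
df2 = df.loc[:, "b":"d"]

# Positional selection with .iloc uses the usual half-open slice.
df3 = df.iloc[:, 0:2]
```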

How do I get the row count of a Pandas DataFrame?

August 28, 2022 by Tarik

For a dataframe df, one can use any of the following:

- len(df.index)
- df.shape[0]
- df[df.columns[0]].count() (the number of non-NaN values in the first column, which equals the row count only if that column has no missing values)

Code to reproduce the plot:

    import numpy as np
    import pandas as pd
    import perfplot

    perfplot.save(
        "out.png",
        setup=lambda n: pd.DataFrame(np.arange(n * 3).reshape(n, 3)),
        n_range=[2**k for k in range(25)],
        kernels=[
            lambda df: len(df.index),
            lambda …
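A quick check of how the three expressions differ when data is missing. It uses a hypothetical frame whose first column contains a NaN. The first two count every row. count() skips the missing value.

```python
import numpy as np
import pandas as pd

# Example frame with a missing value in the first column.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4, 5, 6]})

n_len = len(df.index)                 # 3: counts every row
n_shape = df.shape[0]                 # 3: first element of the (rows, columns) tuple
n_count = df[df.columns[0]].count()   # 2: skips the NaN in column "x"
```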

Delete a column from a Pandas DataFrame

August 27, 2022 by Tarik

The best way to do this in Pandas is to use drop:

    df = df.drop('column_name', axis=1)

where 1 is the axis number (0 for rows, 1 for columns). To delete the column without having to reassign df, you can do:

    df.drop('column_name', axis=1, inplace=True)

Finally, to drop by column number instead of by column label, …
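A sketch of the drop variants on a small hypothetical frame. The positional case looks the label up in df.columns first. That is one possible approach and not necessarily the one the truncated answer goes on to give. The columns= keyword is an equivalent alternative to axis=1.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Returns a new DataFrame without "b"; df itself is unchanged.
dropped = df.drop("b", axis=1)

# The columns= keyword is an equivalent spelling that avoids axis numbers.
dropped_kw = df.drop(columns=["b"])

# One way to drop by position: look the label up in df.columns first.
dropped_pos = df.drop(df.columns[1], axis=1)

# inplace=True mutates df and returns None.
result = df.drop("c", axis=1, inplace=True)
```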

Renaming column names in Pandas

August 25, 2022 by Tarik

RENAME SPECIFIC COLUMNS

Use the df.rename() function and pass it the columns to be renamed. Not all the columns have to be renamed:

    df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})
    # Or rename the existing DataFrame (rather than creating a copy)
    df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)

Minimal Code Example

    df = pd.DataFrame('x', index=range(3), columns=list('abcde'))
    df
       a  b …
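A sketch built on the minimal example frame. The new names "A", "C" and "v" to "z" are arbitrary. rename changes only the keys present in the mapping. Assigning a full list to .columns replaces every name at once, and the list must match the number of columns.

```python
import pandas as pd

df = pd.DataFrame("x", index=range(3), columns=list("abcde"))

# rename only touches the keys in the mapping; the other columns keep their names.
renamed = df.rename(columns={"a": "A", "c": "C"})

# To replace every name at once, assign a list of the same length to .columns.
df2 = df.copy()
df2.columns = ["v", "w", "x", "y", "z"]
```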

How do I select rows from a DataFrame based on column values?

August 23, 2022 by Tarik

To select rows whose column value equals a scalar, some_value, use ==:

    df.loc[df['column_name'] == some_value]

To select rows whose column value is in an iterable, some_values, use isin:

    df.loc[df['column_name'].isin(some_values)]

Combine multiple conditions with &:

    df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]

Note the parentheses. Due to Python’s operator precedence rules, & binds more tightly than …
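A sketch of the three patterns on a small hypothetical frame, with concrete numbers in place of some_value, some_values, A and B. It also shows ~ for negating a mask, a standard operator the excerpt does not mention.

```python
import pandas as pd

df = pd.DataFrame({"column_name": [1, 5, 10, 15], "label": ["w", "x", "y", "z"]})

eq = df.loc[df["column_name"] == 10]
in_set = df.loc[df["column_name"].isin([1, 15])]

# Each comparison needs its own parentheses: in Python, & binds more tightly than >= and <=.
between = df.loc[(df["column_name"] >= 5) & (df["column_name"] <= 10)]

# ~ negates a boolean mask, e.g. rows whose value is NOT in the set.
not_in_set = df.loc[~df["column_name"].isin([1, 15])]
```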

How to iterate over rows in a DataFrame in Pandas

August 21, 2022 by Tarik

DataFrame.iterrows() returns a generator that yields both the index and the row (as a Series):

    import pandas as pd

    df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})
    df = df.reset_index()  # make sure indexes pair with number of rows

    for index, row in df.iterrows():
        print(row['c1'], row['c2'])

Output:

    10 100
    11 110
    12 120
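Two alternatives worth knowing, sketched on the same data. itertuples yields namedtuples and is generally faster than iterrows, which builds a Series for every row and so does not preserve dtypes across columns. When the per-row work is simple arithmetic, a vectorized column expression avoids the loop entirely.

```python
import pandas as pd

df = pd.DataFrame({"c1": [10, 11, 12], "c2": [100, 110, 120]})

# itertuples yields one namedtuple per row; index=False omits the index field.
pairs = []
for row in df.itertuples(index=False):
    pairs.append((row.c1, row.c2))

# For simple arithmetic, operate on whole columns instead of looping.
totals = (df["c1"] + df["c2"]).tolist()
```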