dataframe – Page 119

How to drop rows of Pandas DataFrame whose value in a certain column is NaN

August 30, 2022 by Tarik

Don’t drop; just keep the rows where EPS is not NA:

    df = df[df['EPS'].notna()]
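Below is a small self-contained sketch of the boolean-mask approach. The sample data (the ticker column and its values) is made up for illustration. It also shows dropna(subset=...), which keeps the same rows:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: EPS is missing for two of the four rows.
df = pd.DataFrame({
    "ticker": ["A", "B", "C", "D"],
    "EPS": [1.5, np.nan, 2.0, np.nan],
})

# Keep only the rows whose EPS value is present (boolean mask on notna()).
kept = df[df["EPS"].notna()]

# Equivalent result: drop rows with NaN, but only look at the EPS column.
dropped = df.dropna(subset=["EPS"])

print(kept)
print(kept.equals(dropped))
```

Both versions keep the original index labels (0 and 2 here), so call reset_index(drop=True) afterwards if you need a contiguous index.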

How to add a new column to an existing DataFrame?

August 30, 2022 by Tarik

Edit 2017: As indicated in the comments and by @Alexander, currently the best method to add the values of a Series as a new column of a DataFrame may be assign:

    df1 = df1.assign(e=pd.Series(np.random.randn(sLength)).values)

Edit 2015: Some reported getting the SettingWithCopyWarning with this code. However, the code still runs perfectly with the current pandas …
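A minimal runnable sketch contrasting assign with plain item assignment. The frame df1 and its columns are invented for the example, and len(df1) stands in for the answer's sLength:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# assign returns a NEW DataFrame with the extra column; df1 itself is untouched.
# .values strips the Series index, so the values are placed by position.
values = pd.Series(np.random.randn(len(df1))).values
df2 = df1.assign(e=values)

# Plain item assignment, by contrast, adds the column to df1 in place.
df1["f"] = df1["a"] * 10

print(df2.columns.tolist())
print(df1.columns.tolist())
```

Because assign returns a new object, it chains well with other method calls and avoids mutating a frame that might be a slice of another one.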

Create a Pandas Dataframe by appending one row at a time

August 30, 2022 by Tarik

You can use df.loc[i], where the row with index i will be whatever you assign to it in the DataFrame.

    >>> import pandas as pd
    >>> from numpy.random import randint
    >>> df = pd.DataFrame(columns=['lib', 'qty1', 'qty2'])
    >>> for i in range(5):
    ...     df.loc[i] = ['name' + str(i)] + list(randint(10, size=2))
    ...
    >>> df
    lib qty1 …
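A deterministic variant of the same loop, as a sketch: the random quantities are replaced with fixed values (i and i * 2, chosen only for illustration) so the result is predictable:

```python
import pandas as pd

# Start from an empty frame that only defines the column names.
df = pd.DataFrame(columns=["lib", "qty1", "qty2"])

# Each df.loc[i] = [...] assignment enlarges the frame by one row labelled i.
for i in range(5):
    df.loc[i] = ["name" + str(i), i, i * 2]

print(df)
```

Each enlargement rebuilds the frame, so for a large number of rows it is usually much faster to collect the rows in a Python list and call pd.DataFrame once at the end; the loop is fine for a handful of rows.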

Get a list from Pandas DataFrame column headers

August 30, 2022 by Tarik

You can get the values as a list by doing:

    list(my_dataframe.columns.values)

You can also simply use (as shown in Ed Chum’s answer):

    list(my_dataframe)
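A quick check that these forms agree, using a made-up three-column frame. The third form, columns.tolist(), is an extra Index method not mentioned in the answer:

```python
import pandas as pd

my_dataframe = pd.DataFrame({"x": [1, 2], "y": [3, 4], "z": [5, 6]})

# Three ways to get the column headers as a plain Python list.
from_values = list(my_dataframe.columns.values)
from_iter = list(my_dataframe)  # iterating a DataFrame yields its column labels
from_tolist = my_dataframe.columns.tolist()

print(from_values, from_iter, from_tolist)
```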

Change column type in pandas

August 29, 2022 by Tarik

You have four main options for converting types in pandas:

to_numeric() – provides functionality to safely convert non-numeric types (e.g. strings) to a suitable numeric type. (See also to_datetime() and to_timedelta().)

astype() – converts (almost) any type to (almost) any other type (even if it’s not necessarily sensible to do so). Also allows you to …
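A small sketch of the first two options on made-up string data. errors="coerce" is one of to_numeric's documented settings; the sample values are hypothetical:

```python
import pandas as pd

s = pd.Series(["1", "2.5", "oops"])

# errors="coerce" turns strings that cannot be parsed into NaN instead of raising.
numeric = pd.to_numeric(s, errors="coerce")

# astype has no such safety net: it works here, but "oops" would raise ValueError.
ints = pd.Series(["1", "2", "3"]).astype(int)

print(numeric.tolist())
print(ints.tolist())
```

This is the practical difference between the two: to_numeric can tolerate bad values, while astype assumes every value converts cleanly.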

How to change the order of DataFrame columns?

August 29, 2022 by Tarik

One easy way would be to reassign the dataframe with a list of the columns, rearranged as needed. This is what you have now:

    In [6]: df
    Out[6]:
              0         1         2         3         4      mean
    0  0.445598  0.173835  0.343415  0.682252  0.582616  0.445543
    1  0.881592  0.696942  0.702232  0.696724  0.373551  0.670208
    2  0.662527  0.955193  0.131016  0.609548  0.804694  0.632596
    …
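A runnable sketch of the reassignment idea. It assumes the goal is to move the mean column to the front, and it rebuilds a similar frame from random numbers rather than the exact values shown:

```python
import numpy as np
import pandas as pd

# A frame shaped like the one above: five random columns plus their row mean.
df = pd.DataFrame(np.random.rand(3, 5))
df["mean"] = df.mean(axis=1)

# Take the current column order and move the last column ('mean') to the front.
cols = df.columns.tolist()
cols = cols[-1:] + cols[:-1]

# Reassign the frame, selecting the columns in the rearranged order.
df = df[cols]
print(df.columns.tolist())
```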

Sort (order) data frame rows by multiple columns

August 28, 2022 by Tarik

You can use the order() function directly without resorting to add-on tools; see this simpler answer, which uses a trick right from the top of the example(order) code:

    R> dd[with(dd, order(-z, b)), ]
        b x y z
    4 Low C 9 2
    2 Med D 3 1
    1  Hi A 8 1
    3  Hi …

Selecting multiple columns in a Pandas dataframe

August 28, 2022 by Tarik

The column names (which are strings) cannot be sliced in the manner you tried. You have a couple of options here. If you know from context which variables you want to slice out, you can select only those columns by passing a list into the __getitem__ syntax (the [] operator). Note that this returns a new DataFrame (a copy), not a view.

    df1 = …
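A short sketch of list-based selection on an invented four-column frame. It also shows label slicing through .loc, which is a separate technique added here for comparison, not necessarily the answer's second option:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})

# Pass a list of labels to [] to select exactly those columns (returns a new frame).
df1 = df[["a", "b"]]

# Column labels CAN be sliced through .loc; both endpoints are included.
df2 = df.loc[:, "b":"d"]

print(df1.columns.tolist())
print(df2.columns.tolist())
```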

How do I get the row count of a Pandas DataFrame?

August 28, 2022 by Tarik

For a dataframe df, one can use any of the following:

    len(df.index)
    df.shape[0]
    df[df.columns[0]].count()  # == number of non-NaN values in the first column

Code to reproduce the plot:

    import numpy as np
    import pandas as pd
    import perfplot

    perfplot.save(
        "out.png",
        setup=lambda n: pd.DataFrame(np.arange(n * 3).reshape(n, 3)),
        n_range=[2**k for k in range(25)],
        kernels=[
            lambda df: len(df.index),
            lambda …
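A small sketch showing why the third option is not a true row count when the first column contains missing values (the sample frame is made up):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: three rows, one NaN in the first column.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4, 5, 6]})

n_index = len(df.index)              # number of rows
n_shape = df.shape[0]                # number of rows
n_count = df[df.columns[0]].count()  # non-NaN values in column "x" only

print(n_index, n_shape, n_count)
```

The first two always agree; count() only matches them when the first column has no NaN values.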

Delete a column from a Pandas DataFrame

August 27, 2022 by Tarik

The best way to do this in pandas is to use drop:

    df = df.drop('column_name', axis=1)

where 1 is the axis number (0 for rows and 1 for columns). To delete the column without having to reassign df, you can do:

    df.drop('column_name', axis=1, inplace=True)

Finally, to drop by column number instead of by column label, …
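A runnable sketch of both forms from the answer, plus the columns= keyword, which is an equivalent spelling added here for comparison. The frame and its column names are invented:

```python
import pandas as pd

df = pd.DataFrame({"keep": [1, 2], "column_name": [3, 4]})

# drop returns a new DataFrame; axis=1 means the label is looked up among the columns.
out = df.drop("column_name", axis=1)

# The columns= keyword does the same thing without needing the axis number.
same = df.drop(columns="column_name")

# inplace=True mutates df itself and returns None.
result = df.drop("column_name", axis=1, inplace=True)

print(out.columns.tolist(), same.columns.tolist(), df.columns.tolist(), result)
```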