dataframe – Page 111

Split (explode) pandas dataframe string entry to separate rows

September 27, 2022 by Tarik

UPDATE 3: it makes more sense to use the Series.explode() / DataFrame.explode() methods (added in pandas 0.25.0 and extended in pandas 1.3.0 to support exploding multiple columns at once), as shown in the usage example. For a single column:

In [1]: df = pd.DataFrame({'A': [[0, 1, 2], 'foo', [], [3, 4]],
   ...:                    'B': 1,
   ...:                    'C': [['a', 'b', …
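A minimal sketch of both cases, using small made-up frames (the column names id, tags and scores are hypothetical and do not come from the example above). The first part covers the title's case of a delimited string: split it into a list with str.split(), then explode. The second part explodes two list columns together, which needs pandas 1.3.0 or later and requires the lists in each row to have the same length.

```python
import pandas as pd

# Hypothetical data: "tags" holds comma-separated strings.
df = pd.DataFrame({"id": [1, 2, 3], "tags": ["x,y", "z", ""]})

# Split each string into a list, then give each list element its own row.
# The original index label is repeated for every element of that row.
rows = df.assign(tags=df["tags"].str.split(",")).explode("tags")

# Hypothetical data with two list columns whose lengths match row by row.
lists = pd.DataFrame({
    "id": [1, 2, 3],
    "tags": [["x", "y"], ["z"], []],
    "scores": [[10, 20], [30], []],
})

# Multi-column explode (pandas >= 1.3.0). An empty list becomes a single NaN row;
# ignore_index=True renumbers the result 0..n-1.
multi = lists.explode(["tags", "scores"], ignore_index=True)

print(rows)
print(multi)
```

Note that an empty string becomes a row containing '' (because ''.split(',') is ['']), whereas an empty list becomes a NaN row.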

Add column to dataframe with constant value

September 26, 2022 by Tarik

df['Name'] = 'abc' will add the new column and set all rows to that value:

In [79]: df
Out[79]:
         Date, Open, High, Low, Close
0  01-01-2015, 565, 600, 400, 450

In [80]: df['Name'] = 'abc'
         df
Out[80]:
         Date, Open, High, Low, Close  Name
0  01-01-2015, 565, 600, 400, 450   abc
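A small sketch with a made-up frame (the columns Open, Close, Source and Ticker are illustrative only). Besides plain assignment, it shows DataFrame.assign(), which returns a new frame and leaves the original untouched, and DataFrame.insert(), which places the constant column at a chosen position instead of at the end.

```python
import pandas as pd

df = pd.DataFrame({"Open": [565, 570], "Close": [450, 460]})

# Plain assignment: the scalar is broadcast to every row (modifies df).
df["Name"] = "abc"

# assign() returns a new DataFrame; df itself is not changed.
df2 = df.assign(Source="manual")

# insert() adds the column in place at position 0 instead of at the end.
df.insert(0, "Ticker", "XYZ")

print(df)
```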

Detect and exclude outliers in a pandas DataFrame

September 26, 2022 by Tarik

If you have multiple columns in your DataFrame and would like to remove all rows that have an outlier in at least one column, the following expression does that in one shot:

import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame(np.random.randn(100, 3))
df[(np.abs(stats.zscore(df)) < 3).all(axis=1)]

Description: For each column, it first computes the …
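If SciPy isn't available, the same filter can be sketched with pandas alone. scipy.stats.zscore uses the population standard deviation (ddof=0) by default, so this version passes ddof=0 to match. The data is random with one planted outlier; the column names and the planted value are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 3)), columns=["a", "b", "c"])
df.loc[5, "b"] = 50.0  # plant an obvious outlier in row 5

# Column-wise z-scores: (x - mean) / std, with ddof=0 to match scipy.stats.zscore.
z = (df - df.mean()) / df.std(ddof=0)

# Keep only rows where every column's |z| is below 3.
clean = df[(z.abs() < 3).all(axis=1)]

print(len(df), "->", len(clean))
```

Keep in mind that the outlier itself inflates the mean and standard deviation, so with very few rows an extreme value may not reach |z| >= 3.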

Using Pandas to pd.read_excel() for multiple worksheets of the same workbook

September 26, 2022 by Tarik

Try pd.ExcelFile:

xls = pd.ExcelFile('path_to_file.xls')
df1 = pd.read_excel(xls, 'Sheet1')
df2 = pd.read_excel(xls, 'Sheet2')

As noted by @HaPsantran, the entire Excel file is read in during the ExcelFile() call (there doesn't appear to be a way around this). This merely saves you from having to read the same file in each time you want to access …
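A sketch that wraps the same idea in a helper; read_sheets is a hypothetical name, not a pandas function. It opens the workbook once with pd.ExcelFile (used as a context manager so the file handle is closed afterwards) and reads the requested sheets from it. For the common "every sheet" case, pandas can also do this in one call: pd.read_excel(path, sheet_name=None) returns a dict of DataFrames keyed by sheet name.

```python
import pandas as pd


def read_sheets(path, sheet_names=None):
    """Return {sheet name: DataFrame} for the given sheets of one workbook.

    sheet_names=None means every sheet in the file. read_sheets is a
    hypothetical helper for illustration, not part of pandas.
    """
    # Open the workbook once and reuse the ExcelFile object for each sheet.
    with pd.ExcelFile(path) as xls:
        names = xls.sheet_names if sheet_names is None else sheet_names
        return {name: pd.read_excel(xls, sheet_name=name) for name in names}
```

Usage would look like sheets = read_sheets('path_to_file.xls', ['Sheet1', 'Sheet2']), then sheets['Sheet1']. Reading still needs the matching engine installed (for example xlrd for .xls or openpyxl for .xlsx).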

How to access the last value in a vector?

September 26, 2022 by Tarik

I use the tail function:

tail(vector, n=1)

The nice thing about tail is that it works on data frames too, unlike the x[length(x)] idiom.

Pandas Replace NaN with blank/empty string

September 26, 2022 by Tarik

df = df.fillna('')

This fills NA values (e.g. NaNs) with ''. inplace is possible but should be avoided, as it will be deprecated:

df.fillna('', inplace=True)

To fill only a single column:

df.column1 = df.column1.fillna('')

You can use df['column1'] instead of df.column1.
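A quick sketch on a made-up two-column frame (the names name and note are illustrative). It also shows the dict form of fillna(), which fills different columns with different values in one call. Note that filling a numeric column with '' gives it object dtype, so this is mainly useful for display or export.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["a", np.nan, "c"], "note": [np.nan, "x", np.nan]})

# Every NaN in every column becomes an empty string; fillna returns a new frame.
filled = df.fillna("")

# Only one column: assign the filled Series back.
df["name"] = df["name"].fillna("")

# Dict form: per-column fill values (columns not listed are left alone).
partial = df.fillna({"note": "n/a"})

print(filled)
print(partial)
```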

Convert a Pandas DataFrame to a dictionary

September 26, 2022 by Tarik

The to_dict() method sets the column names as dictionary keys, so you'll need to reshape your DataFrame slightly. Setting the 'ID' column as the index and then transposing the DataFrame is one way to achieve this. to_dict() also accepts an 'orient' argument, which you'll need in order to output a list of values for each …
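A minimal sketch on a made-up frame whose 'ID' column holds the desired keys (the data and the columns A and B are hypothetical). It follows the set-index-then-transpose approach with orient='list', and also shows orient='index', which gives one nested dict per ID without transposing. Transposing a frame with mixed column dtypes turns every column into object dtype, which the 'index' orient sidesteps.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["p", "q", "r"], "A": [1, 2, 3], "B": [4, 5, 6]})

# IDs become the index; after transposing they are the columns, and
# orient="list" maps each column (ID) to the list of its values.
by_list = df.set_index("ID").T.to_dict(orient="list")

# Without transposing: one {column: value} dict per index label (ID).
by_row = df.set_index("ID").to_dict(orient="index")

print(by_list)
print(by_row)
```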

Select DataFrame rows between two dates

September 25, 2022 by Tarik

There are two possible solutions:

1. Use a boolean mask, then use df.loc[mask]
2. Set the date column as a DatetimeIndex, then use df[start_date : end_date]

Using a boolean mask:

Ensure df['date'] is a Series with dtype datetime64[ns]:

df['date'] = pd.to_datetime(df['date'])

Make a boolean mask. start_date and end_date can be datetime.datetimes, np.datetime64s, pd.Timestamps, or even datetime strings: …
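Both approaches as a sketch on a made-up four-row frame (the column names date and value, and the cut-off dates, are illustrative). The mask here includes both endpoints; switch >= to > (or <= to <) if you need an open interval. In the index approach, .loc slicing on a sorted DatetimeIndex includes both endpoints.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2022-01-01", "2022-01-15", "2022-02-01", "2022-03-01"],
    "value": [1, 2, 3, 4],
})
df["date"] = pd.to_datetime(df["date"])  # ensure datetime64 dtype

start_date = pd.Timestamp("2022-01-10")
end_date = pd.Timestamp("2022-02-15")

# 1) Boolean mask, both endpoints inclusive.
mask = (df["date"] >= start_date) & (df["date"] <= end_date)
by_mask = df.loc[mask]

# 2) DatetimeIndex: sort the index, then slice by label (inclusive on both ends).
by_index = df.set_index("date").sort_index().loc[start_date:end_date]

print(by_mask)
print(by_index)
```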

Opposite of %in%: exclude rows with values specified in a vector

September 25, 2022 by Tarik

You can use the ! operator to turn every TRUE into FALSE and every FALSE into TRUE, so:

D2 = subset(D1, !(V1 %in% c('B', 'N', 'T')))

EDIT: You can also make an operator yourself:

'%!in%' <- function(x, y) !('%in%'(x, y))

c(1, 3, 11) %!in% 1:10
[1] FALSE FALSE TRUE

Get column index from column name in python pandas

September 25, 2022 by Tarik

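Sure, you can use .get_loc():

In [45]: df = DataFrame({"pear": [1,2,3], "apple": [2,3,4], "orange": [3,4,5]})

In [46]: df.columns
Out[46]: Index([apple, orange, pear], dtype=object)

In [47]: df.columns.get_loc("pear")
Out[47]: 2

(This output comes from an older pandas that sorted dict keys into alphabetical column order; current versions keep the dict's insertion order, so the columns would be pear, apple, orange and get_loc("pear") would return 0.)

Although, to be honest, I don't often need this myself. Usually access by name does what I want it to (df["pear"], df[["apple", "orange"]], or maybe df.columns.isin(["orange", "pear"])), …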
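A short sketch on the same fruit data with current pandas, where the columns keep their dict order. For several labels at once, Index.get_indexer() returns an array of positions and uses -1 for labels that are not present, whereas get_loc() raises a KeyError for a missing label.

```python
import pandas as pd

df = pd.DataFrame({"pear": [1, 2, 3], "apple": [2, 3, 4], "orange": [3, 4, 5]})

# Position of a single column label (dict insertion order: pear, apple, orange).
pos = df.columns.get_loc("pear")

# Positions of several labels at once; -1 marks a label that is not found.
positions = df.columns.get_indexer(["apple", "orange", "banana"])

# The position can then be used with positional indexing.
pear_values = df.iloc[:, pos].tolist()

print(pos, positions, pear_values)
```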