Split (explode) pandas dataframe string entry to separate rows

UPDATE 3: it makes more sense to use the Series.explode() / DataFrame.explode() methods (implemented in pandas 0.25.0 and extended in pandas 1.3.0 to support multi-column explode), as shown in the usage example for a single column: In [1]: df = pd.DataFrame({'A': [[0, 1, 2], 'foo', [], [3, 4]], ...: 'B': 1, ...: 'C': [['a', 'b', …
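A minimal sketch of explode() on small made-up frames: a single-column explode, a multi-column explode (pandas 1.3.0 or later), and the usual "string entry" case, where a delimited string is first split into a list with str.split() and then exploded. The column names and data are illustrative only.

```python
import pandas as pd

# Single-column explode: each list element gets its own row, the original
# index label is repeated, scalars such as "foo" pass through unchanged,
# and an empty list becomes NaN.
df = pd.DataFrame({"A": [[0, 1, 2], "foo", [], [3, 4]], "B": 1})
single = df.explode("A")

# Multi-column explode (pandas >= 1.3.0): the lists in the named columns
# must have the same length within each row.
df2 = pd.DataFrame({"A": [[0, 1], [2]], "C": [["a", "b"], ["c"]]})
multi = df2.explode(["A", "C"])

# Delimited string -> list -> one row per item.
df3 = pd.DataFrame({"id": [1, 2], "tags": ["x,y", "z"]})
split_rows = df3.assign(tags=df3["tags"].str.split(",")).explode("tags")
print(split_rows)
```

Pass ignore_index=True to explode() if a fresh 0..n-1 index is preferred over repeated labels.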

Detect and exclude outliers in a pandas DataFrame

If you have multiple columns in your DataFrame and would like to remove all rows that have outliers in at least one column, the following expression does that in one shot: import numpy as np; import pandas as pd; from scipy import stats; df = pd.DataFrame(np.random.randn(100, 3)); df[(np.abs(stats.zscore(df)) < 3).all(axis=1)]. Description: for each column, it first computes the …
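The same z-score filter can be written with pandas alone, which makes each step visible. This is a sketch that swaps scipy.stats.zscore for an explicit (x - mean) / std computation using the population standard deviation (ddof=0, matching zscore's default). drop_zscore_outliers is a hypothetical helper name, and the injected value 50.0 is made-up test data.

```python
import numpy as np
import pandas as pd


def drop_zscore_outliers(df: pd.DataFrame, threshold: float = 3.0) -> pd.DataFrame:
    """Keep only rows whose |z-score| is below `threshold` in every column."""
    # Column-wise z-scores: subtract each column's mean, divide by its std.
    z = (df - df.mean()) / df.std(ddof=0)
    # A row survives only if all of its columns are within the threshold.
    return df[(z.abs() < threshold).all(axis=1)]


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
df.loc[5, "b"] = 50.0  # inject one obvious outlier
clean = drop_zscore_outliers(df)
print(len(df), "->", len(clean))
```

Note that an extreme value also inflates its column's mean and standard deviation, so with very small samples a z-score cutoff can fail to flag it.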

Using Pandas to pd.read_excel() for multiple worksheets of the same workbook

Try pd.ExcelFile: xls = pd.ExcelFile('path_to_file.xls'); df1 = pd.read_excel(xls, 'Sheet1'); df2 = pd.read_excel(xls, 'Sheet2'). As noted by @HaPsantran, the entire Excel file is read in during the ExcelFile() call (there doesn't appear to be a way around this). This merely saves you from having to read the same file in each time you want to access …
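A sketch of the pd.ExcelFile pattern wrapped in a small helper: the workbook is opened once (as a context manager, so the handle is closed afterwards) and every requested sheet is parsed from it into a dict keyed by sheet name. read_sheets is a hypothetical helper name, and reading .xlsx files assumes an Excel engine such as openpyxl is installed. For the common cases, pd.read_excel(path, sheet_name=None) (all sheets) or sheet_name=['Sheet1', 'Sheet2'] (a subset) returns the same kind of dict in a single call.

```python
import pandas as pd


def read_sheets(path, sheet_names=None):
    """Read several worksheets from one workbook, opening the file only once.

    sheet_names=None reads every sheet. Returns {sheet_name: DataFrame}.
    """
    with pd.ExcelFile(path) as xls:
        names = xls.sheet_names if sheet_names is None else sheet_names
        return {name: pd.read_excel(xls, sheet_name=name) for name in names}
```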

Select DataFrame rows between two dates

There are two possible solutions: (1) use a boolean mask, then df.loc[mask]; or (2) set the date column as a DatetimeIndex, then use df[start_date:end_date]. Using a boolean mask: ensure df['date'] is a Series with dtype datetime64[ns]: df['date'] = pd.to_datetime(df['date']). Make a boolean mask. start_date and end_date can be datetime.datetimes, np.datetime64s, pd.Timestamps, or even datetime strings: …
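Both approaches, sketched on a small made-up frame. The mask below treats the start as exclusive and the end as inclusive purely as an example; adjust the comparison operators to the interval you need. The index-slicing version uses .loc, the explicit form of df[start_date:end_date], and includes both endpoints. Series.between() is shown as a third option that is inclusive on both ends by default.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-15", "2020-02-01", "2020-03-01"],
    "value": [1, 2, 3, 4],
})
df["date"] = pd.to_datetime(df["date"])  # ensure dtype datetime64[ns]

start_date, end_date = "2020-01-10", "2020-02-15"  # datetime strings work

# 1) Boolean mask, then .loc (start exclusive, end inclusive here).
mask = (df["date"] > start_date) & (df["date"] <= end_date)
by_mask = df.loc[mask]

# 2) Sorted DatetimeIndex, then label slicing (both endpoints inclusive).
indexed = df.set_index("date").sort_index()
by_slice = indexed.loc[start_date:end_date]

# Alternative: Series.between, inclusive on both ends by default.
by_between = df[df["date"].between(start_date, end_date)]
print(by_mask)
```

Sorting the index first matters: slicing a DatetimeIndex by labels that are not present in it relies on the index being monotonic.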

Get column index from column name in python pandas

Sure, you can use .get_loc(): In [45]: df = pd.DataFrame({"pear": [1,2,3], "apple": [2,3,4], "orange": [3,4,5]}) In [46]: df.columns Out[46]: Index([apple, orange, pear], dtype=object) In [47]: df.columns.get_loc("pear") Out[47]: 2 (This output comes from an older pandas that sorted dict keys into column order; current pandas keeps insertion order, so the same call now returns 0.) Although, to be honest, I don't often need this myself. Usually access by name does what I want it to (df["pear"], df[["apple", "orange"]], or maybe df.columns.isin(["orange", "pear"])), …
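A short sketch with current pandas, where columns built from a dict keep insertion order. It shows get_loc() for one label, Index.get_indexer() for several labels at once (it returns -1 for a label that is not present; "kiwi" is a made-up missing label), and the isin() mask used with .loc to select columns by name.

```python
import pandas as pd

df = pd.DataFrame({"pear": [1, 2, 3], "apple": [2, 3, 4], "orange": [3, 4, 5]})

# Integer position of a single column label.
pear_pos = df.columns.get_loc("pear")

# Positions of several labels in one call; -1 marks a missing label.
positions = df.columns.get_indexer(["orange", "apple", "kiwi"])

# Boolean mask over the columns, used to select them with .loc.
mask = df.columns.isin(["orange", "pear"])
subset = df.loc[:, mask]
print(pear_pos, list(positions), list(subset.columns))
```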
