Pandas DataFrame sort ignoring the case
Pandas 1.1.0 introduced the key argument as a more intuitive way to achieve this:

df.sort_values(by='Single', inplace=True, key=lambda col: col.str.lower())
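A minimal runnable sketch of the case-insensitive sort above, using a made-up DataFrame with a 'Single' column:

```python
import pandas as pd

# Sample data: mixed-case strings that a plain sort would order incorrectly
df = pd.DataFrame({"Single": ["banana", "Apple", "cherry"]})

# key= receives the whole column; lowercasing it makes the sort case-insensitive
df = df.sort_values(by="Single", key=lambda col: col.str.lower())
```

Without the key argument, 'Apple' would sort before 'banana' only by accident of ASCII ordering; with mixed case like 'apple' vs 'Banana', a plain sort puts all uppercase first.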
You can generate a dictionary:

c = language.lang.astype('category')
d = dict(enumerate(c.cat.categories))
print(d)
{0: 'english', 1: 'spanish'}

Then, if necessary, it is possible to map back:

language['code'] = language.lang.astype('category').cat.codes
language['level_back'] = language['code'].map(d)
print(language)
      lang         level  code level_back
0  english  intermediate     0    english
1  spanish  intermediate     1    spanish
2  spanish         basic     1    spanish
3  english         basic     0    english

… Read more
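The round-trip above (category → integer code → category) can be sketched end to end; the sample data here mirrors the snippet's output:

```python
import pandas as pd

language = pd.DataFrame({
    "lang": ["english", "spanish", "spanish", "english"],
    "level": ["intermediate", "intermediate", "basic", "basic"],
})

# Build the code -> label dictionary from the ordered categories
c = language["lang"].astype("category")
d = dict(enumerate(c.cat.categories))

# Encode to integer codes, then map the codes back to labels
language["code"] = c.cat.codes
language["level_back"] = language["code"].map(d)
```

Note that cat.categories is sorted, so the codes are stable as long as the set of distinct values does not change.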
Use a slice to select the part you want: df[:-m]. If you want to remove some middle rows instead, you can use drop: df.drop(df.index[3:5])
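Both patterns in one runnable sketch, on an assumed 10-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"a": range(10)})
m = 2

# Slicing keeps everything except the last m rows
tail_removed = df[:-m]

# drop with a positional index slice removes rows at positions 3 and 4
middle_removed = df.drop(df.index[3:5])
```

Note df.index[3:5] is positional, so this works even when the index is not the default RangeIndex.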
Or, more simply:

import requests
import pandas as pd

r = requests.get('http://www.starcapital.de/test/Res_Stockmarketvaluation_FundamentalKZ_Tbl.php')
j = r.json()
df = pd.DataFrame.from_dict(j)
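Since that endpoint may no longer respond, the same JSON-to-DataFrame pattern can be shown against an in-memory payload (the field names below are illustrative, not the real API's):

```python
import json
import pandas as pd

# Stand-in for r.json(): a list of records, as many JSON APIs return
payload = '[{"Country": "Germany", "CAPE": 18.2}, {"Country": "USA", "CAPE": 29.1}]'
j = json.loads(payload)

# A list of dicts becomes one row per record
df = pd.DataFrame.from_dict(j)
```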
As commented, the axis argument to fillna is NotImplemented:

df.fillna(df.mean(axis=1), axis=1)

Note: this matters here, as you don't want to fill your nth column with the nth row average. For now you'll need to iterate through:

m = df.mean(axis=1)
for i, col in enumerate(df):
    # using i allows for duplicate columns
    # … Read more
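One way to complete that loop (a sketch with assumed sample data; the body shown here fills each column from the row means via index alignment):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan],
    "b": [3.0, 4.0],
    "c": [np.nan, 8.0],
})

# Row means, computed once up front (NaNs are skipped)
m = df.mean(axis=1)

for i, col in enumerate(df):  # using i allows for duplicate columns
    # fillna with a Series aligns on the row index, so each NaN
    # is replaced by its own row's mean
    df.iloc[:, i] = df.iloc[:, i].fillna(m)
```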
You need to cast the column low to class date and then you can use datediff() in combination with lit(). Using Spark 2.2:

from pyspark.sql.functions import datediff, to_date, lit

df.withColumn("test",
              datediff(to_date(lit("2017-05-02")),
                       to_date("low", "yyyy/MM/dd"))).show()
+----------+----+------+-----+
|       low|high|normal| test|
+----------+----+------+-----+
|1986/10/15|   z|  null|11157|
|1986/10/15|   z|  null|11157|
|1986/10/15|   c|  null|11157|
|1986/10/15|null|  null|11157|
|1986/10/16|null|   4.0|11156|
+----------+----+------+-----+

Using < Spark 2.2, … Read more
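For readers without a Spark session at hand, the same day-difference computation can be sketched in plain pandas (this is an equivalent, not the Spark API; the sample rows mirror the output above):

```python
import pandas as pd

df = pd.DataFrame({"low": ["1986/10/15", "1986/10/16"]})

# Parse the yyyy/MM/dd strings, subtract from the fixed reference date,
# and take the difference in whole days
ref = pd.Timestamp("2017-05-02")
df["test"] = (ref - pd.to_datetime(df["low"], format="%Y/%m/%d")).dt.days
```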
You can use the datetime accessor:

df['Date'] = pd.to_datetime(df['Date'])
include = df[df['Date'].dt.year == year]
exclude = df[df['Date'].dt.year != year]
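A self-contained version of that split, with made-up dates and year:

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2021-01-05", "2020-06-30", "2021-12-25"]})
df["Date"] = pd.to_datetime(df["Date"])

year = 2021
# .dt exposes datetime components on a datetime64 column
include = df[df["Date"].dt.year == year]
exclude = df[df["Date"].dt.year != year]
```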
First, if you have the strings 'TRUE' and 'FALSE', you can convert those to boolean True and False values like this: df['COL2'] == 'TRUE'. That gives you a bool column. You can use astype to convert to int (because bool is an integral type, where True means 1 and False means 0, which is exactly … Read more
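Both steps chained together, on an assumed sample column:

```python
import pandas as pd

df = pd.DataFrame({"COL2": ["TRUE", "FALSE", "TRUE"]})

# Comparison yields a bool Series; astype(int) maps True -> 1, False -> 0
df["COL2"] = (df["COL2"] == "TRUE").astype(int)
```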
Try this:

import requests
import pandas as pd
import io

urlData = requests.get(url).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')))
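The decode-then-read_csv step can be tried offline by substituting the raw bytes that requests.get(url).content would return (the CSV content below is invented):

```python
import io
import pandas as pd

# Stand-in for requests.get(url).content: raw CSV bytes
urlData = b"a,b\n1,2\n3,4\n"

# Decode to str, wrap in a file-like object, and parse
rawData = pd.read_csv(io.StringIO(urlData.decode("utf-8")))
```

For large downloads, io.BytesIO(urlData) passed directly to read_csv avoids the intermediate decoded string.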
Using pd.to_datetime & dt accessor. The accepted answer is not the "pandas" way to approach this problem. To select only rows with month 11, use the dt accessor:

# df['Date'] = pd.to_datetime(df['Date'])  # if the column is not datetime yet
df = df[df['Date'].dt.month == 11]

The same works for days or years, where you can substitute dt.month … Read more
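The month filter in a runnable form, with invented dates:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-11-01", "2021-10-15", "2020-11-30"]),
})

# Keep only November rows, regardless of year
nov = df[df["Date"].dt.month == 11]
```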