Pandas DataFrame sort ignoring the case
Pandas 1.1.0 introduced the key argument as a more intuitive way to achieve this:

df.sort_values(by='Single', inplace=True, key=lambda col: col.str.lower())
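A minimal runnable sketch of the case-insensitive sort above, using a made-up DataFrame with a 'Single' column:

```python
import pandas as pd

# Sample data: mixed-case strings that a plain sort would order incorrectly
df = pd.DataFrame({"Single": ["banana", "Apple", "cherry"]})

# key= receives the whole column; lowercasing it makes the sort case-insensitive
df = df.sort_values(by="Single", key=lambda col: col.str.lower())
```

Without the key argument, 'Apple' would sort before 'banana' only by accident of ASCII ordering; with mixed case like 'apple' vs 'Banana', a plain sort puts all uppercase first.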
You can generate a dictionary:

c = language.lang.astype('category')
d = dict(enumerate(c.cat.categories))
print(d)
{0: 'english', 1: 'spanish'}

Then, if necessary, it is possible to map back:

language['code'] = language.lang.astype('category').cat.codes
language['level_back'] = language['code'].map(d)
print(language)
      lang         level  code level_back
0  english  intermediate     0    english
1  spanish  intermediate     1    spanish
2  spanish         basic     1    spanish
3  english         basic     0    english

… Read more
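The round-trip above (category → integer code → category) can be sketched end to end; the sample data here mirrors the snippet's output:

```python
import pandas as pd

language = pd.DataFrame({
    "lang": ["english", "spanish", "spanish", "english"],
    "level": ["intermediate", "intermediate", "basic", "basic"],
})

# Build the code -> label dictionary from the ordered categories
c = language["lang"].astype("category")
d = dict(enumerate(c.cat.categories))

# Encode to integer codes, then map the codes back to labels
language["code"] = c.cat.codes
language["level_back"] = language["code"].map(d)
```

Note that cat.categories is sorted, so the codes are stable as long as the set of distinct values does not change.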
Use a slice to select the part you want: df[:-m]. If you want to remove some middle rows instead, you can use drop: df.drop(df.index[3:5])
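Both patterns in one runnable sketch, on an assumed 10-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"a": range(10)})
m = 2

# Slicing keeps everything except the last m rows
tail_removed = df[:-m]

# drop with a positional index slice removes rows at positions 3 and 4
middle_removed = df.drop(df.index[3:5])
```

Note df.index[3:5] is positional, so this works even when the index is not the default RangeIndex.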
Or, more simply:

import requests
import pandas as pd

r = requests.get('http://www.starcapital.de/test/Res_Stockmarketvaluation_FundamentalKZ_Tbl.php')
j = r.json()
df = pd.DataFrame.from_dict(j)
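Since that endpoint may no longer respond, the same JSON-to-DataFrame pattern can be shown against an in-memory payload (the field names below are illustrative, not the real API's):

```python
import json
import pandas as pd

# Stand-in for r.json(): a list of records, as many JSON APIs return
payload = '[{"Country": "Germany", "CAPE": 18.2}, {"Country": "USA", "CAPE": 29.1}]'
j = json.loads(payload)

# A list of dicts becomes one row per record
df = pd.DataFrame.from_dict(j)
```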
As commented, the axis argument to fillna is NotImplemented:

df.fillna(df.mean(axis=1), axis=1)

Note: this matters here, as you don't want to fill your nth column with the nth row average. For now you'll need to iterate through:

m = df.mean(axis=1)
for i, col in enumerate(df):
    # using i allows for duplicate columns
    # … Read more
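One way to complete that loop (a sketch with assumed sample data; the body shown here fills each column from the row means via index alignment):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan],
    "b": [3.0, 4.0],
    "c": [np.nan, 8.0],
})

# Row means, computed once up front (NaNs are skipped)
m = df.mean(axis=1)

for i, col in enumerate(df):  # using i allows for duplicate columns
    # fillna with a Series aligns on the row index, so each NaN
    # is replaced by its own row's mean
    df.iloc[:, i] = df.iloc[:, i].fillna(m)
```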
You need to cast the column low to class date and then you can use datediff() in combination with lit(). Using Spark 2.2:

from pyspark.sql.functions import datediff, to_date, lit

df.withColumn("test",
              datediff(to_date(lit("2017-05-02")),
                       to_date("low", "yyyy/MM/dd"))).show()
+----------+----+------+-----+
|       low|high|normal| test|
+----------+----+------+-----+
|1986/10/15|   z|  null|11157|
|1986/10/15|   z|  null|11157|
|1986/10/15|   c|  null|11157|
|1986/10/15|null|  null|11157|
|1986/10/16|null|   4.0|11156|
+----------+----+------+-----+

Using < Spark 2.2, … Read more
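For readers without a Spark session at hand, the same day-difference computation can be sketched in plain pandas (this is an equivalent, not the Spark API; the sample rows mirror the output above):

```python
import pandas as pd

df = pd.DataFrame({"low": ["1986/10/15", "1986/10/16"]})

# Parse the yyyy/MM/dd strings, subtract from the fixed reference date,
# and take the difference in whole days
ref = pd.Timestamp("2017-05-02")
df["test"] = (ref - pd.to_datetime(df["low"], format="%Y/%m/%d")).dt.days
```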
You can use the datetime accessor:

df['Date'] = pd.to_datetime(df['Date'])
include = df[df['Date'].dt.year == year]
exclude = df[df['Date'].dt.year != year]
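A self-contained version of that split, with made-up dates and year:

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2021-01-05", "2020-06-30", "2021-12-25"]})
df["Date"] = pd.to_datetime(df["Date"])

year = 2021
# .dt exposes datetime components on a datetime64 column
include = df[df["Date"].dt.year == year]
exclude = df[df["Date"].dt.year != year]
```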
First, if you have the strings 'TRUE' and 'FALSE', you can convert those to boolean True and False values like this: df['COL2'] == 'TRUE'. That gives you a bool column. You can use astype to convert to int (because bool is an integral type, where True means 1 and False means 0, which is exactly … Read more
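Both steps chained together, on an assumed sample column:

```python
import pandas as pd

df = pd.DataFrame({"COL2": ["TRUE", "FALSE", "TRUE"]})

# Comparison yields a bool Series; astype(int) maps True -> 1, False -> 0
df["COL2"] = (df["COL2"] == "TRUE").astype(int)
```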
Try this:

import requests
import pandas as pd
import io

urlData = requests.get(url).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')))
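The decode-then-read_csv step can be tried offline by substituting the raw bytes that requests.get(url).content would return (the CSV content below is invented):

```python
import io
import pandas as pd

# Stand-in for requests.get(url).content: raw CSV bytes
urlData = b"a,b\n1,2\n3,4\n"

# Decode to str, wrap in a file-like object, and parse
rawData = pd.read_csv(io.StringIO(urlData.decode("utf-8")))
```

For large downloads, io.BytesIO(urlData) passed directly to read_csv avoids the intermediate decoded string.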
Using pd.to_datetime & dt accessor. The accepted answer is not the "pandas" way to approach this problem. To select only rows with month 11, use the dt accessor:

# df['Date'] = pd.to_datetime(df['Date'])  # if the column is not datetime yet
df = df[df['Date'].dt.month == 11]

The same works for days or years, where you can substitute dt.month … Read more
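The month filter in a runnable form, with invented dates:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-11-01", "2021-10-15", "2020-11-30"]),
})

# Keep only November rows, regardless of year
nov = df[df["Date"].dt.month == 11]
```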