How to know the labels assigned by astype('category').cat.codes?

You can generate a dictionary that maps each code to its category:

    c = language.lang.astype('category')
    d = dict(enumerate(c.cat.categories))
    print(d)
    {0: 'english', 1: 'spanish'}

Then, if necessary, you can map the codes back to their labels:

    language['code'] = language.lang.astype('category').cat.codes
    language['level_back'] = language['code'].map(d)
    print(language)
          lang         level  code level_back
    0  english  intermediate     0    english
    1  spanish  intermediate     1    spanish
    2  spanish         basic     1    spanish
    3  english         basic     0    english

… Read more
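As a self-contained sketch of the mapping described above: the sample rows are taken from the printed output, while the names `cat`, `code_to_label`, and `label_to_code` are illustrative and not from the original answer.

```python
import pandas as pd

# Sample data reconstructed from the printed output in the excerpt.
language = pd.DataFrame({
    "lang": ["english", "spanish", "spanish", "english"],
    "level": ["intermediate", "intermediate", "basic", "basic"],
})

# Convert once and reuse, so codes and categories come from the same object.
cat = language["lang"].astype("category")

# The position of a label in .cat.categories is its integer code.
code_to_label = dict(enumerate(cat.cat.categories))
# Inverse lookup (illustrative helper): label -> code.
label_to_code = {label: code for code, label in code_to_label.items()}

language["code"] = cat.cat.codes
language["level_back"] = language["code"].map(code_to_label)

print(code_to_label)
print(language)
```

Missing values receive the code -1, which has no entry in the dictionary, so they map back to NaN.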

Pandas Dataframe: Replacing NaN with row average

As noted in the comments, the axis argument to fillna is not implemented, so this call fails:

    df.fillna(df.mean(axis=1), axis=1)

Note: the axis is critical here, because you don't want to fill the nth column with the nth row's average. For now you'll need to iterate through the columns:

    m = df.mean(axis=1)
    for i, col in enumerate(df):
        # using i allows for duplicate columns

… Read more
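The loop in the excerpt is cut off, so the following is not a reconstruction of it. It is a minimal sketch of one alternative, assuming an all-numeric frame with unique column names: transpose so that rows become columns, fill column by column with the row means, then transpose back. The sample data and the name `filled` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one NaN per row.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 5.0, 6.0],
    "c": [3.0, 7.0, np.nan],
})

# Row means, indexed by the row labels (NaN values are skipped by default).
row_means = df.mean(axis=1)

# After transposing, the original row labels are the column labels, so
# fillna can match each column against the Series index of row_means.
filled = df.T.fillna(row_means).T

print(filled)
```

Transposing copies the data, so for large or mixed-dtype frames a column loop like the one the excerpt starts may be preferable.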

How to calculate date difference in pyspark?

You need to cast the column low to a date and then you can use datediff() in combination with lit(). Using Spark 2.2:

    from pyspark.sql.functions import datediff, to_date, lit

    df.withColumn("test",
                  datediff(to_date(lit("2017-05-02")),
                           to_date("low", "yyyy/MM/dd"))).show()
    +----------+----+------+-----+
    |       low|high|normal| test|
    +----------+----+------+-----+
    |1986/10/15|   z|  null|11157|
    |1986/10/15|   z|  null|11157|
    |1986/10/15|   c|  null|11157|
    |1986/10/15|null|  null|11157|
    |1986/10/16|null|   4.0|11156|
    +----------+----+------+-----+

Using < Spark 2.2, … Read more
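The values in the test column can be cross-checked without Spark. This sketch uses only the Python standard library and is not part of the original answer; the strptime format "%Y/%m/%d" stands in for the Spark pattern yyyy/MM/dd, and the variable names are illustrative.

```python
from datetime import date, datetime

# Reference date used in lit("2017-05-02") above.
end = date(2017, 5, 2)

# The two distinct values of the "low" column in the printed table.
lows = ["1986/10/15", "1986/10/16"]

# Parse each string and count whole days from it up to the reference date,
# mirroring the argument order datediff(end, start).
diffs = [(end - datetime.strptime(s, "%Y/%m/%d").date()).days for s in lows]

print(diffs)
```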

How to convert true/false values in a dataframe to 1 for true and 0 for false

First, if you have the strings 'TRUE' and 'FALSE', you can convert those to boolean True and False values like this:

    df['COL2'] == 'TRUE'

That gives you a bool column. You can use astype to convert to int (because bool is an integral type, where True means 1 and False means 0, which is exactly … Read more
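The two steps described above, comparison and then astype, might look like this in full. The sample frame and the names `is_true` and `COL2_int` are assumptions made for illustration; only the column name COL2 comes from the excerpt.

```python
import pandas as pd

# Hypothetical sample data; COL2 holds the strings 'TRUE' and 'FALSE'.
df = pd.DataFrame({
    "COL1": ["x", "y", "z"],
    "COL2": ["TRUE", "FALSE", "TRUE"],
})

# Step 1: the comparison yields a boolean column.
is_true = df["COL2"] == "TRUE"

# Step 2: astype(int) turns True into 1 and False into 0.
df["COL2_int"] = is_true.astype(int)

print(df)
```

Any string other than 'TRUE', including lowercase 'true', compares as False here, so it may be worth normalizing case first if the data is inconsistent.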

How to filter a dataframe of dates by a particular month/day?

Using pd.to_datetime & dt accessor

The accepted answer is not the "pandas" way to approach this problem. To select only rows with month 11, use the dt accessor:

    # df['Date'] = pd.to_datetime(df['Date'])  # if the column is not datetime yet
    df = df[df['Date'].dt.month == 11]

The same works for days or years, where you can substitute dt.month … Read more
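A runnable sketch of the filter above, extended to a month and day combination as the question title asks. The sample dates and the names `november` and `nov_15` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with dates stored as strings.
df = pd.DataFrame({
    "Date": ["2019-11-03", "2019-12-25", "2020-11-15", "2020-01-01"],
    "value": [1, 2, 3, 4],
})

# Convert to datetime first so the .dt accessor is available.
df["Date"] = pd.to_datetime(df["Date"])

# Rows in month 11 of any year.
november = df[df["Date"].dt.month == 11]

# Rows on 15 November of any year: combine the two masks with &.
nov_15 = df[(df["Date"].dt.month == 11) & (df["Date"].dt.day == 15)]

print(november)
print(nov_15)
```

Each condition needs its own parentheses, because & binds more tightly than == in Python.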
