pandas – Page 6 – Tarik Billa

Pandas bar plot changes date format

January 31, 2023 by Tarik

The plotting code assumes that each bar in a bar plot deserves its own label. You could override this assumption by specifying your own formatter: ax.xaxis.set_major_formatter(formatter) The pandas.tseries.converter.TimeSeries_DateFormatter that Pandas uses to format the dates in the “good” plot works well with line plots when the x-values are dates. However, with a bar plot the … Read more

pandas dataframe convert column type to string or categorical

January 24, 2023 by Tarik

You need astype: df['zipcode'] = df.zipcode.astype(str) #df.zipcode = df.zipcode.astype(str) For converting to categorical: df['zipcode'] = df.zipcode.astype('category') #df.zipcode = df.zipcode.astype('category') Another solution is Categorical: df['zipcode'] = pd.Categorical(df.zipcode) Sample with data: import pandas as pd df = pd.DataFrame({'zipcode': {17384: 98125, 2680: 98107, 722: 98005, 18754: 98109, 14554: 98155}, 'bathrooms': {17384: 1.5, 2680: 0.75, 722: 3.25, 18754: 1.0, … Read more
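A minimal sketch of both conversions, assuming a small made-up frame (the zipcode values below are only for illustration and are not the full sample from the post):

```python
import pandas as pd

# Hypothetical data: a numeric zipcode column with one repeated value.
df = pd.DataFrame({"zipcode": [98125, 98107, 98005, 98107]})

# Convert to strings: every value becomes its text representation.
as_text = df["zipcode"].astype(str)

# Convert to categorical: repeated values share a single category.
as_category = df["zipcode"].astype("category")

print(as_text.tolist())
print(list(as_category.cat.categories))
```

The categorical version stores each distinct zipcode once, which can save memory when a column has many repeated values.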

Merge two data frames based on common column values in Pandas

January 13, 2023 by Tarik

You can use pd.merge: import pandas as pd pd.merge(df1, df2, on="movie_title") Only rows whose keys are found in both data frames are kept. In case you want to keep all rows from the left data frame and only add values from df2 where a matching key is available, you can use how="left": pd.merge(df1, … Read more
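To make the difference between the default (inner) merge and how="left" concrete, here is a small sketch. The two frames and their year and rating columns are invented for illustration; only the movie_title key comes from the post.

```python
import pandas as pd

# Hypothetical frames that share the "movie_title" key column.
df1 = pd.DataFrame({"movie_title": ["A", "B", "C"], "year": [2001, 2002, 2003]})
df2 = pd.DataFrame({"movie_title": ["A", "C", "D"], "rating": [7.5, 8.1, 6.0]})

# Default merge is an inner join: only titles present in both frames survive.
inner = pd.merge(df1, df2, on="movie_title")

# Left join: every row of df1 is kept; missing matches get NaN in "rating".
left = pd.merge(df1, df2, on="movie_title", how="left")

print(inner)
print(left)
```

Here title "B" has no match in df2, so it disappears from the inner result but stays in the left result with a missing rating.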

Pandas: add a column to a multiindex column dataframe

January 6, 2023 by Tarik

It’s actually pretty simple (FWIW, I originally thought to do it your way):

    df['bar', 'three'] = [0, 1, 2]
    df = df.sort_index(axis=1)
    print(df)

            bar                        baz
            one       two  three       one       two
    A -0.212901  0.503615      0 -1.660945  0.446778
    B -0.803926 -0.417570      1 -0.336827  0.989343
    C  3.400885 -0.214245      2  0.895745  1.011671
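For a self-contained version to try, the sketch below builds a frame with the same column layout. The integer values are placeholders, not the data from the post, and the order of the second-level labels after sorting may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two-level columns: (bar|baz) x (one|two).
columns = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]])
df = pd.DataFrame(np.arange(12).reshape(3, 4), index=list("ABC"), columns=columns)

# Assigning with a tuple key adds a new column under the existing "bar" group.
df["bar", "three"] = [0, 1, 2]

# The new column is appended at the end; sorting the column index
# moves it next to the other "bar" columns.
df = df.sort_index(axis=1)

print(df)
```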

Export from pandas to_excel without row names (index)?

January 3, 2023 by Tarik

You need to set index=False in to_excel so that it does not write the index column. Other pandas IO tools follow the same convention; see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html and http://pandas.pydata.org/pandas-docs/stable/io.html
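A quick way to see the effect without writing a file is to use to_csv, which accepts the same index argument and returns a string when no path is given. The frame and the out.xlsx file name below are made up for illustration.

```python
import pandas as pd

# Hypothetical two-column frame with the default integer index.
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# With the default index=True, the index is written as an extra first column.
with_index = df.to_csv()

# With index=False, only the data columns are written.
without_index = df.to_csv(index=False)

print(with_index)
print(without_index)

# The Excel equivalent (needs an Excel engine such as openpyxl installed):
# df.to_excel("out.xlsx", index=False)
```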

IPython Notebook cell multiple outputs

December 30, 2022 by Tarik

Have you tried the display function? from IPython.display import display display(salaries.head()) display(teams.head())

Replace None with NaN in pandas dataframe

November 22, 2022 by Tarik

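You can use DataFrame.fillna or Series.fillna, which replace the Python object None but not the string 'None'. import pandas as pd import numpy as np For a DataFrame: df = df.fillna(value=np.nan) For a column or Series, assign the result back rather than calling fillna with inplace=True on a selected column, which may not update the original frame: df['mycol'] = df['mycol'].fillna(value=np.nan)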
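A small sketch of the distinction between the None object and the string 'None', using a made-up object column (the mycol values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical object column holding a float, a real None, and the text "None".
df = pd.DataFrame({"mycol": [1.5, None, "None"]})

# fillna treats the None object as missing and swaps in NaN;
# the string "None" is ordinary data and is left alone.
filled = df.fillna(value=np.nan)

print(filled)
```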

Why does it take ages to install Pandas on Alpine Linux

November 17, 2022 by Tarik

Debian-based images only need pip to install packages, because prebuilt .whl files are available: Downloading pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl (26.2MB) Downloading numpy-1.14.1-cp36-cp36m-manylinux1_x86_64.whl (12.2MB) The WHL (wheel) format was developed as a quicker and more reliable method of installing Python software than re-building from source code every time. WHL files only have to be moved to the correct location on the target … Read more

Make Pandas DataFrame apply() use all cores?

October 26, 2022 by Tarik

You may use the swifter package: pip install swifter (Note that you may want to use this in a virtualenv to avoid version conflicts with installed dependencies.) Swifter works as a plugin for pandas, allowing you to reuse the apply function: import swifter def some_function(data): return data * 10 data['out'] = data['in'].swifter.apply(some_function) It will automatically … Read more

How to split data into 3 sets (train, validation and test)?

October 8, 2022 by Tarik

NumPy solution. We will shuffle the whole dataset first (df.sample(frac=1, random_state=42)) and then split it into the following parts: 60% – train set, 20% – validation set, 20% – test set. In [305]: train, validate, test = \ np.split(df.sample(frac=1, random_state=42), [int(.6*len(df)), int(.8*len(df))]) In [306]: train Out[306]: A B C D E 0 0.046919 … Read more
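The same 60/20/20 idea can be sketched with plain iloc slicing. Note that this is a variant, not the np.split call from the post: it uses the same shuffle and the same cut points but slices the shuffled frame directly, which avoids relying on how NumPy handles DataFrames. The 10-row frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 10 rows, 5 columns named A..E.
df = pd.DataFrame(np.arange(50).reshape(10, 5), columns=list("ABCDE"))

# Shuffle all rows reproducibly, as in the post.
shuffled = df.sample(frac=1, random_state=42)

# Cut points at 60% and 80% of the row count.
train_end = int(0.6 * len(df))
validate_end = int(0.8 * len(df))

train = shuffled.iloc[:train_end]
validate = shuffled.iloc[train_end:validate_end]
test = shuffled.iloc[validate_end:]

print(len(train), len(validate), len(test))
```

Because the three slices partition the shuffled frame, every row lands in exactly one of the sets.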