Pandas bar plot changes date format

The plotting code assumes that each bar in a bar plot deserves its own label. You could override this assumption by specifying your own formatter: ax.xaxis.set_major_formatter(formatter). The pandas.tseries.converter.TimeSeries_DateFormatter that pandas uses to format the dates in the “good” plot works well with line plots when the x-values are dates. However, with a bar plot the …
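As a rough illustration of the formatter override mentioned above, the sketch below draws a small bar plot and replaces the per-bar labels with date-only strings. The sample data, the choice of matplotlib.ticker.FixedFormatter, and the "%Y-%m-%d" format are assumptions made for the example, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.ticker as mticker
import pandas as pd

# Hypothetical sample data: four daily values indexed by date.
df = pd.DataFrame(
    {"value": [3, 5, 2, 7]},
    index=pd.date_range("2020-01-01", periods=4, freq="D"),
)

# A bar plot places one tick per bar at positions 0..n-1.
ax = df.plot(kind="bar", legend=False)

# Build one date-only label per bar and install it as the major formatter.
labels = [ts.strftime("%Y-%m-%d") for ts in df.index]
ax.xaxis.set_major_formatter(mticker.FixedFormatter(labels))

# Render once so the tick label text is populated.
ax.figure.canvas.draw()
print([tick.get_text() for tick in ax.get_xticklabels()])
```

A FixedFormatter is used here because a bar plot has one fixed tick per bar; a date-aware formatter would expect the x-values themselves to be dates.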

pandas dataframe convert column type to string or categorical

You need astype:
    df['zipcode'] = df.zipcode.astype(str)
    # df.zipcode = df.zipcode.astype(str)
For converting to categorical:
    df['zipcode'] = df.zipcode.astype('category')
    # df.zipcode = df.zipcode.astype('category')
Another solution is Categorical:
    df['zipcode'] = pd.Categorical(df.zipcode)
Sample with data:
    import pandas as pd
    df = pd.DataFrame({'zipcode': {17384: 98125, 2680: 98107, 722: 98005, 18754: 98109, 14554: 98155}, 'bathrooms': {17384: 1.5, 2680: 0.75, 722: 3.25, 18754: 1.0, …
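A minimal, self-contained sketch of the conversions described above. It reuses only the zipcode values visible in the excerpt; the variable names as_text, as_category and as_category_alt are made up for the illustration.

```python
import pandas as pd

# Only the zipcode column from the excerpt; the rest of the sample is truncated there.
df = pd.DataFrame(
    {"zipcode": {17384: 98125, 2680: 98107, 722: 98005, 18754: 98109, 14554: 98155}}
)

# Integer column converted to strings.
as_text = df["zipcode"].astype(str)

# Integer column converted to a categorical dtype, two equivalent ways.
as_category = df["zipcode"].astype("category")
as_category_alt = pd.Series(pd.Categorical(df["zipcode"]), index=df.index)

print(as_text.loc[17384])
print(as_category.dtype)
print(list(as_category.cat.categories))
```

Assigning any of these results back to df['zipcode'] changes the column in place, as in the snippets above.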

Why does it take ages to install Pandas on Alpine Linux

Debian-based images use only Python pip to install packages in .whl format:
    Downloading pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl (26.2MB)
    Downloading numpy-1.14.1-cp36-cp36m-manylinux1_x86_64.whl (12.2MB)
The WHL format was developed as a quicker and more reliable method of installing Python software than rebuilding from source code every time. WHL files only have to be moved to the correct location on the target …

Make Pandas DataFrame apply() use all cores?

You may use the swifter package:
    pip install swifter
(Note that you may want to use this in a virtualenv to avoid version conflicts with installed dependencies.) Swifter works as a plugin for pandas, allowing you to reuse the apply function:
    import swifter
    def some_function(data):
        return data * 10
    data['out'] = data['in'].swifter.apply(some_function)
It will automatically …

How to split data into 3 sets (train, validation and test)?

NumPy solution. We will shuffle the whole dataset first (df.sample(frac=1, random_state=42)) and then split it into the following parts: 60% train set, 20% validation set, 20% test set.
    In [305]: train, validate, test = \
                  np.split(df.sample(frac=1, random_state=42),
                           [int(.6*len(df)), int(.8*len(df))])
    In [306]: train
    Out[306]:
    A B C D E
    0 0.046919 …
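The same 60/20/20 split can be sketched end to end as follows. The ten-row frame is made-up sample data, and plain iloc slicing is swapped in for np.split here; it cuts the shuffled frame at the same two positions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 10 rows, columns A..E.
df = pd.DataFrame(np.arange(50).reshape(10, 5), columns=list("ABCDE"))

# Shuffle all rows reproducibly, as in the excerpt.
shuffled = df.sample(frac=1, random_state=42)

# Cut points at 60% and 80% of the row count.
n = len(shuffled)
train_end = int(0.6 * n)
validate_end = int(0.8 * n)

# Contiguous slices of the shuffled frame give the three sets.
train = shuffled.iloc[:train_end]
validate = shuffled.iloc[train_end:validate_end]
test = shuffled.iloc[validate_end:]

print(len(train), len(validate), len(test))
```

Because the slices are contiguous and non-overlapping, every row lands in exactly one of the three sets.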