pandas – Page 2 – Tarik Billa

Custom transformer for sklearn Pipeline that alters both X and y

April 10, 2024 by Tarik

Modifying the sample axis, e.g. removing samples, does not (yet?) comply with the scikit-learn transformer API. If you need to do this, do it outside any calls to scikit-learn, as preprocessing. As it stands, the transformer API is used to transform the features of a given sample into something new. …
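A minimal sketch of removing samples as a preprocessing step, before anything is handed to scikit-learn. The data and the column names (age, income, label) are made up for illustration; the only point is that X and y are filtered with the same mask so they stay aligned.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the third row has a missing feature value.
X = pd.DataFrame({"age": [25, 32, np.nan, 41], "income": [40.0, 52.0, 61.0, 75.0]})
y = pd.Series([0, 1, 1, 0], name="label")

# Build one boolean mask and apply it to both X and y so the rows stay aligned.
keep = X.notna().all(axis=1)
X_clean = X.loc[keep].reset_index(drop=True)
y_clean = y.loc[keep].reset_index(drop=True)

print(X_clean.shape, y_clean.shape)
```

The cleaned X_clean and y_clean can then be passed to a Pipeline whose steps only transform features.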

sort pandas dataframe based on list

April 10, 2024 by Tarik

The improved support for Categoricals in pandas version 0.15 allows you to do this easily:

    df['LSE_cat'] = pd.Categorical(
        df['LSE'],
        categories=['Oands', 'Wetnds', 'Develd', 'Cn', 'Soys', 'Otherg', 'Wht'],
        ordered=True
    )
    # df.sort('LSE_cat') in pandas 0.15; DataFrame.sort was later removed
    df.sort_values('LSE_cat')

    Out[5]:
       Region     LSE      North      South LSE_cat
    3       3   Oands -47.986764 -32.324991   Oands
    2       2  Wetnds -38.480206 -46.089908  Wetnds
    1       1  Develd -36.157025 -27.669988  Develd
    0       0      Cn  33.330367   9.178917      Cn
    5       5 …
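A self-contained sketch of the categorical sort described above, assuming a recent pandas where sort_values is the sorting method. Only the LSE column is reproduced, and the unsorted row order is partly assumed, since the excerpt does not show the full frame.

```python
import pandas as pd

# Unsorted labels; rows beyond the first four are assumed for illustration.
df = pd.DataFrame({"LSE": ["Cn", "Develd", "Wetnds", "Oands", "Otherg", "Soys", "Wht"]})

# The desired order, taken from the categories list in the post.
order = ["Oands", "Wetnds", "Develd", "Cn", "Soys", "Otherg", "Wht"]

# An ordered Categorical sorts by position in `order`, not alphabetically.
df["LSE_cat"] = pd.Categorical(df["LSE"], categories=order, ordered=True)
df_sorted = df.sort_values("LSE_cat")

print(df_sorted)
```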

pd.read_csv by default treats integers like floats

April 10, 2024 by Tarik

As root mentioned in the comments, this is a limitation of pandas (and NumPy). NaN is a float, and the empty values in your CSV are read as NaN, so an integer column that contains them is stored as float. This is listed in the gotchas of pandas as well. You can work around this in a few ways. For the examples below I used the …
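A small sketch of the behaviour, plus one possible workaround. The CSV text and column names are invented, and the nullable "Int64" dtype is an assumption about the installed pandas version (it is available in recent releases); it is not necessarily one of the workarounds the original answer goes on to list.

```python
import io
import pandas as pd

# Hypothetical CSV with one empty value in an otherwise integer column.
csv_text = "id,count\n1,10\n2,\n3,30\n"

# Default parsing: the empty field becomes NaN, so the column is float64.
default = pd.read_csv(io.StringIO(csv_text))
print(default["count"].dtype)  # float64

# Assumed workaround: request the nullable integer dtype for that column.
nullable = pd.read_csv(io.StringIO(csv_text), dtype={"count": "Int64"})
print(nullable["count"].dtype)  # Int64
```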

How to make two rows in a pandas dataframe into column headers

April 10, 2024 by Tarik

If using pandas.read_csv() or pandas.read_table(), you can provide a list of indices for the header argument, to specify the rows you want to use for column headers. pandas will generate the pandas.MultiIndex for you in df.columns:

    df = pandas.read_csv('DollarUnitSales.csv', header=[0, 1])

You can also use more than two rows, or non-consecutive rows, to specify the column …
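To try the header=[0, 1] call without the original file, here is a sketch that reads an in-memory CSV instead. The contents (Dollars/Units over East/West) are invented, since DollarUnitSales.csv itself is not shown.

```python
import io
import pandas as pd

# Hypothetical two-row header: measure on the first row, region on the second.
csv_text = (
    "Dollars,Dollars,Units,Units\n"
    "East,West,East,West\n"
    "10,20,1,2\n"
    "30,40,3,4\n"
)

# Rows 0 and 1 together become a two-level MultiIndex on the columns.
df = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

print(df.columns)
print(df[("Dollars", "East")].tolist())
```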

Group a multi-indexed pandas dataframe by one of its levels?

April 10, 2024 by Tarik

Yes, use the level parameter. Take a look here. Example:

    In [26]: s
    first  second  third
    bar    doo     one      0.404705
                   two      0.577046
    baz    bee     one     -1.715002
                   two     -1.039268
    foo    bop     one     -0.370647
                   two     -1.157892
    qux    bop     one     -1.344312
                   two      0.844885
    dtype: float64

    In [27]: s.groupby(level=['first', 'second']).sum()
    first  second
    bar    doo       0.981751
    baz    bee      -2.754270
    foo    bop …
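The session above can be reproduced with a sketch like this one, which rebuilds the first four rows of the series from the values shown and groups by the first two index levels.

```python
import pandas as pd

# Rebuild part of the three-level series from the values printed in the post.
index = pd.MultiIndex.from_tuples(
    [
        ("bar", "doo", "one"),
        ("bar", "doo", "two"),
        ("baz", "bee", "one"),
        ("baz", "bee", "two"),
    ],
    names=["first", "second", "third"],
)
s = pd.Series([0.404705, 0.577046, -1.715002, -1.039268], index=index)

# Group by index levels (by name) rather than by columns.
totals = s.groupby(level=["first", "second"]).sum()

print(totals)
```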

How to correctly read csv in Pandas while changing the names of the columns

April 10, 2024 by Tarik

You are right, something is odd with the name attributes. It seems to me that you cannot use both at the same time: either you set the name for every column of the CSV file or you don’t set the name at all. So it seems that you can’t set the name when you are …
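For reference, a sketch of the case where every column gets a new name: passing names together with header=0 tells read_csv to discard the file's own header row and use the supplied names. The CSV text and the new names are invented, and this is not necessarily the resolution the truncated answer reaches.

```python
import io
import pandas as pd

# Hypothetical file with its own header row.
csv_text = "a,b,c\n1,2,3\n4,5,6\n"

# header=0 marks the first row as a header to be replaced by `names`,
# so it is not read as data.
df = pd.read_csv(io.StringIO(csv_text), header=0, names=["x", "y", "z"])

print(df)
```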

Customizing the separator in pandas read_csv

April 10, 2024 by Tarik

Yes, you can use a simple regular expression like sep='\s+' to denote one or more whitespace characters (spaces or tabs).
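A short sketch of that separator on made-up input where the columns are divided by varying runs of spaces. A raw string is used so that Python does not treat \s as a string escape.

```python
import io
import pandas as pd

# Hypothetical input: columns separated by one or more spaces.
text = "a   b  c\n1  2     3\n4 5 6\n"

# r"\s+" treats any run of whitespace as a single delimiter.
df = pd.read_csv(io.StringIO(text), sep=r"\s+")

print(df)
```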

How to convert pandas dataframe to nested dictionary

April 10, 2024 by Tarik

I think you were very close. Use groupby and to_dict:

    df = (df.groupby('Name')[['Chain', 'Food', 'Healthy']]
            .apply(lambda x: x.set_index('Chain').to_dict(orient='index'))
            .to_dict())
    print(df)

    {'George': {'KFC': {'Healthy': False, 'Food': 'chicken'},
                'McDonalds': {'Healthy': False, 'Food': 'burger'}},
     'John': {'McDonalds': {'Healthy': True, 'Food': 'salad'},
              'Wendys': {'Healthy': False, 'Food': 'burger'}}}
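A runnable version of the same idea. The input frame is reconstructed from the printed result, so its exact original layout is an assumption, and the result is stored under a separate name (nested) so the DataFrame is not overwritten.

```python
import pandas as pd

# Input reconstructed from the output shown in the post (assumed layout).
df = pd.DataFrame(
    {
        "Name": ["George", "George", "John", "John"],
        "Chain": ["KFC", "McDonalds", "McDonalds", "Wendys"],
        "Food": ["chicken", "burger", "salad", "burger"],
        "Healthy": [False, False, True, False],
    }
)

# One dict per Name, keyed by Chain; the outer to_dict() keys those by Name.
nested = (
    df.groupby("Name")[["Chain", "Food", "Healthy"]]
    .apply(lambda g: g.set_index("Chain").to_dict(orient="index"))
    .to_dict()
)

print(nested)
```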

Pandas – transpose one column [duplicate]

April 10, 2024 by Tarik

Since you aren’t performing an aggregation, pd.DataFrame.pivot should be preferred to groupby / pivot_table:

    res = df.pivot(index='date', columns='name', values='quantity')
    print(res)

    name      A  B  C
    date
    1/1/2018  5  6  7
    1/2/2018  9  8  6

If you wish you can use reset_index to elevate date to a column.
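A self-contained sketch of the pivot, with the long-format input reconstructed from the printed result (so the original input is assumed), followed by the reset_index step mentioned above.

```python
import pandas as pd

# Long-format input, reconstructed from the pivoted output in the post.
df = pd.DataFrame(
    {
        "date": ["1/1/2018"] * 3 + ["1/2/2018"] * 3,
        "name": ["A", "B", "C"] * 2,
        "quantity": [5, 6, 7, 9, 8, 6],
    }
)

# One row per date, one column per name; no aggregation is involved.
res = df.pivot(index="date", columns="name", values="quantity")

# Turn the date index back into an ordinary column.
flat = res.reset_index()

print(res)
```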

figsize in matplotlib is not changing the figure size? [duplicate]

April 9, 2024 by Tarik

One option (as mentioned by @tda), and probably the best/most standard way, is to put the plt.figure before the plt.bar:

    import matplotlib.pyplot as plt
    plt.figure(figsize=(20, 10))
    plt.bar(x['user'], x['number'], color='blue')

Another option, if you want to set the figure size after creating the figure, is to use fig.set_size_inches (note I used plt.gcf here to get the current …
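Both options can be sketched side by side. The frame x is a made-up stand-in for the poster's data, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data with the column names used in the post.
x = pd.DataFrame({"user": ["a", "b", "c"], "number": [3, 5, 2]})

# Option 1: create the figure with the desired size before plotting.
fig1 = plt.figure(figsize=(20, 10))
plt.bar(x["user"], x["number"], color="blue")

# Option 2: plot first, then resize the current figure.
plt.figure()
plt.bar(x["user"], x["number"], color="blue")
fig2 = plt.gcf()
fig2.set_size_inches(20, 10)

print(fig1.get_size_inches(), fig2.get_size_inches())
```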