Custom transformer for sklearn Pipeline that alters both X and y

Modifying the sample axis, e.g. removing samples, does not (yet?) comply with the scikit-learn transformer API. If you need to do this, do it outside any calls to scikit-learn, as a preprocessing step. As it stands, the transformer API is used to transform the features of a given sample into something new. …
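As a minimal sketch of the "do it as preprocessing" advice, the snippet below drops incomplete samples from both X and y before the pipeline ever sees them. The helper name drop_incomplete_samples and the toy data are hypothetical, made up for illustration; the pipeline steps are just placeholders.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def drop_incomplete_samples(X, y):
    """Hypothetical helper: remove rows where X or y contains NaN."""
    keep = ~(np.isnan(X).any(axis=1) | np.isnan(y))
    return X[keep], y[keep]


# Toy data (assumed for illustration): rows 1 and 2 are incomplete.
X = np.array([[1.0, 2.0], [np.nan, 3.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, np.nan, 4.0])

# Change the sample axis outside of scikit-learn ...
X_clean, y_clean = drop_incomplete_samples(X, y)

# ... then let the pipeline transform features only.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_clean, y_clean)
```

Keeping the row filtering outside the pipeline means X and y always stay aligned, which is exactly what a transformer that only receives X cannot guarantee.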

Sort a pandas DataFrame based on a list

The improved support for Categoricals in pandas version 0.15 allows you to do this easily:

    df['LSE_cat'] = pd.Categorical(
        df['LSE'],
        categories=['Oands', 'Wetnds', 'Develd', 'Cn', 'Soys', 'Otherg', 'Wht'],
        ordered=True
    )
    df.sort_values('LSE_cat')  # spelled df.sort('LSE_cat') in pandas before 0.17

    Out[5]:
       Region     LSE      North      South LSE_cat
    3       3   Oands -47.986764 -32.324991   Oands
    2       2  Wetnds -38.480206 -46.089908  Wetnds
    1       1  Develd -36.157025 -27.669988  Develd
    0       0      Cn  33.330367   9.178917      Cn
    5       5 …
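For a self-contained version of the same idea, here is a small sketch with made-up data (the value column and the row order are assumptions; only the category list comes from the answer above). It uses sort_values, the current name of the sort method.

```python
import pandas as pd

# Toy frame (assumed for illustration), deliberately not in the desired order.
df = pd.DataFrame({
    "LSE": ["Cn", "Develd", "Wetnds", "Oands", "Soys"],
    "value": [1, 2, 3, 4, 5],
})

# The custom order to sort by; categories absent from the data are allowed.
order = ["Oands", "Wetnds", "Develd", "Cn", "Soys", "Otherg", "Wht"]

# An ordered Categorical sorts by position in `order`, not alphabetically.
df["LSE_cat"] = pd.Categorical(df["LSE"], categories=order, ordered=True)
sorted_df = df.sort_values("LSE_cat")
print(sorted_df)
```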

How to make two rows in a pandas dataframe into column headers

If you are using pandas.read_csv() or pandas.read_table(), you can pass a list of row indices as the header argument to specify which rows to use as column headers. pandas will then generate the pandas.MultiIndex for you in df.columns:

    df = pandas.read_csv('DollarUnitSales.csv', header=[0, 1])

You can also use more than two rows, or non-consecutive rows, to specify the column …
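To try this without a file on disk, the sketch below feeds an in-memory CSV to read_csv. The CSV contents and column names are invented for illustration and are not the contents of DollarUnitSales.csv.

```python
import io

import pandas as pd

# Hypothetical CSV: the first two rows together form the column headers.
csv_text = (
    "region,region,sales,sales\n"
    "name,code,units,dollars\n"
    "North,N,10,100\n"
    "South,S,20,250\n"
)

# header=[0, 1] tells pandas to build a two-level MultiIndex from rows 0 and 1.
df = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

print(df.columns.tolist())
# Select a single column with a (level 0, level 1) tuple.
print(df[("sales", "units")])
```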

Group a multi-indexed pandas dataframe by one of its levels?

Yes, use the level parameter of groupby. Example:

    In [26]: s
    first  second  third
    bar    doo     one      0.404705
                   two      0.577046
    baz    bee     one     -1.715002
                   two     -1.039268
    foo    bop     one     -0.370647
                   two     -1.157892
    qux    bop     one     -1.344312
                   two      0.844885
    dtype: float64

    In [27]: s.groupby(level=['first','second']).sum()
    first  second
    bar    doo       0.981751
    baz    bee      -2.754270
    foo    bop …
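The series above can be rebuilt roughly as follows, which makes the level parameter easy to experiment with. The construction via MultiIndex.from_tuples is an assumption about how s was created; the values are the ones printed in the example.

```python
import pandas as pd

# Rebuild the three-level index shown in the example.
index = pd.MultiIndex.from_tuples(
    [
        ("bar", "doo", "one"), ("bar", "doo", "two"),
        ("baz", "bee", "one"), ("baz", "bee", "two"),
        ("foo", "bop", "one"), ("foo", "bop", "two"),
        ("qux", "bop", "one"), ("qux", "bop", "two"),
    ],
    names=["first", "second", "third"],
)
s = pd.Series(
    [0.404705, 0.577046, -1.715002, -1.039268,
     -0.370647, -1.157892, -1.344312, 0.844885],
    index=index,
)

# Group by two index levels by name ...
result = s.groupby(level=["first", "second"]).sum()

# ... or by a single level (a name or an integer position both work).
by_first = s.groupby(level="first").sum()

print(result)
print(by_first)
```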

How to convert pandas dataframe to nested dictionary

I think you were very close. Use groupby and to_dict:

    df = (df.groupby('Name')[['Chain', 'Food', 'Healthy']]
            .apply(lambda x: x.set_index('Chain').to_dict(orient='index'))
            .to_dict())
    print(df)

    {'George': {'KFC': {'Healthy': False, 'Food': 'chicken'},
                'McDonalds': {'Healthy': False, 'Food': 'burger'}},
     'John': {'McDonalds': {'Healthy': True, 'Food': 'salad'},
              'Wendys': {'Healthy': False, 'Food': 'burger'}}}
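An equivalent way to get the same nesting, sketched here with a dict comprehension over the groups instead of apply. The input frame is reconstructed from the printed result and is an assumption about what the original data looked like.

```python
import pandas as pd

# Input reconstructed from the printed output (assumed, not the asker's data).
df = pd.DataFrame({
    "Name": ["George", "George", "John", "John"],
    "Chain": ["KFC", "McDonalds", "McDonalds", "Wendys"],
    "Food": ["chicken", "burger", "salad", "burger"],
    "Healthy": [False, False, True, False],
})

# Outer key: Name. Inner key: Chain. Innermost dict: the remaining columns.
nested = {
    name: group.set_index("Chain")[["Food", "Healthy"]].to_dict(orient="index")
    for name, group in df.groupby("Name")
}
print(nested)
```

The comprehension keeps the original df intact, whereas the one-liner above rebinds df to the resulting dictionary.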

figsize in matplotlib is not changing the figure size?

One option (as mentioned by @tda), and probably the best and most standard way, is to call plt.figure before plt.bar:

    import matplotlib.pyplot as plt

    plt.figure(figsize=(20, 10))
    plt.bar(x['user'], x['number'], color="blue")

Another option, if you want to set the figure size after creating the figure, is to use fig.set_size_inches (note I used plt.gcf here to get the current …
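Both options can be checked side by side in a short sketch. The bar data here is made up, and the Agg backend is selected only so the snippet runs without a display; neither is part of the original answer.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, assumed so this runs without a GUI
import matplotlib.pyplot as plt

# Hypothetical data standing in for x['user'] and x['number'].
users = ["a", "b", "c"]
counts = [3, 1, 2]

# Option 1: create the figure with the desired size first, then plot into it.
plt.figure(figsize=(20, 10))
plt.bar(users, counts, color="blue")

# Option 2: resize an existing figure; plt.gcf() returns the current figure.
fig = plt.gcf()
size_before = tuple(fig.get_size_inches())
fig.set_size_inches(10, 5)
size_after = tuple(fig.get_size_inches())

print(size_before, size_after)
plt.close(fig)
```

Calling plt.figure after plt.bar would instead open a second, empty figure, which is why the order matters in the first option.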
