dataframe

How to add pandas data to an existing csv file?

September 21, 2022 by Tarik

You can specify a Python write mode in the pandas to_csv function. For append it is 'a'. In your case:

    df.to_csv('my_csv.csv', mode='a', header=False)

The default mode is 'w'. If the file might not exist yet, you can make sure the header is written on the first write using this variation (it needs the os module):

    import os

    output_path = 'my_csv.csv'
    df.to_csv(output_path, mode='a', header=not os.path.exists(output_path))
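The append pattern above can be tried end to end with a short sketch. The helper name append_to_csv, the sample frames, and the temporary directory are illustrative assumptions, and index=False is an extra choice not in the original answer (it keeps the row index out of the file so the result reads back cleanly).

```python
import os
import tempfile

import pandas as pd


def append_to_csv(df, output_path):
    """Append df to output_path, writing the header only when the file is new.

    Hypothetical helper for illustration; index=False is an assumption that
    differs from the original one-liner, which also writes the index.
    """
    write_header = not os.path.exists(output_path)
    df.to_csv(output_path, mode="a", header=write_header, index=False)


# Work in a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    output_path = os.path.join(tmp_dir, "my_csv.csv")

    # First call creates the file and writes the header row.
    append_to_csv(pd.DataFrame({"a": [1, 2], "b": [3, 4]}), output_path)
    # Second call finds the file and appends data rows only.
    append_to_csv(pd.DataFrame({"a": [5], "b": [6]}), output_path)

    combined = pd.read_csv(output_path)

print(combined)
```

Because the header check happens at call time, the same function works for both the first write and every later one.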

pandas: filter rows of DataFrame with operator chaining

September 20, 2022 by Tarik

I'm not entirely sure what you want, and your last line of code does not help either, but anyway: "chained" filtering is done by "chaining" the criteria in the boolean index.

    In [96]: df
    Out[96]:
       A  B  C  D
    a  1  4  9  1
    b  4  5  0  2
    c  5  5  1  0
    d  …
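As a minimal sketch of chaining criteria in a boolean index: the frame example_df and its values below are made up for illustration and are not the data from the excerpt, which is truncated.

```python
import pandas as pd

# Hypothetical data, not the frame from the excerpt above.
example_df = pd.DataFrame(
    {"A": [1, 4, 5, 1], "B": [4, 5, 5, 3], "D": [1, 2, 0, 6]},
    index=list("wxyz"),
)

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than ==.
both = example_df[(example_df["A"] == 1) & (example_df["D"] == 6)]

# | keeps rows where at least one criterion holds.
either = example_df[(example_df["A"] == 1) | (example_df["D"] == 6)]

print(both)
print(either)
```

Here both keeps only row z, while either keeps rows w and z.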

Sample random rows in dataframe

September 20, 2022 by Tarik

First make some data:

    > df = data.frame(matrix(rnorm(20), nrow=10))
    > df
               X1         X2
    1   0.7091409 -1.4061361
    2  -1.1334614 -0.1973846
    3   2.3343391 -0.4385071
    4  -0.9040278 -0.6593677
    5   0.4180331 -1.2592415
    6   0.7572246 -0.5463655
    7  -0.8996483  0.4231117
    8  -1.0356774 -0.1640883
    9  -0.3983045  0.7157506
    10 -0.9060305  2.3234110

Then select some rows at random:

    > df[sample(nrow(df), 3), ]
               X1         X2
    …

Normalize columns of a dataframe

September 20, 2022 by Tarik

One easy way using pandas (here I use mean normalization, i.e. each column is centered on its mean and scaled by its standard deviation, which is often called standardization):

    normalized_df = (df - df.mean()) / df.std()

To use min-max normalization:

    normalized_df = (df - df.min()) / (df.max() - df.min())

Edit: to address some concerns, note that pandas automatically applies these functions column-wise in the code above.
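A small sketch can confirm that both formulas operate column by column. The frame and its column names x and y are illustrative assumptions. Note that df.std() uses the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

# Hypothetical numeric data with two columns on different scales.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [10.0, 20.0, 30.0, 50.0]})

# df.mean() and df.std() return one value per column; the subtraction and
# division are aligned on column labels, so each column is scaled on its own.
standardized = (df - df.mean()) / df.std()

# Min-max scaling maps every column onto the interval [0, 1].
min_max = (df - df.min()) / (df.max() - df.min())

print(standardized.mean())  # close to 0 for every column
print(standardized.std())   # close to 1 for every column
print(min_max)
```

Both expressions assume all columns are numeric and that no column is constant, since a zero standard deviation or zero range would divide by zero.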

How to reversibly store and load a Pandas dataframe to/from disk

September 19, 2022 by Tarik

The easiest way is to pickle it using to_pickle:

    df.to_pickle(file_name)  # where to save it, usually as a .pkl

Then you can load it back using:

    df = pd.read_pickle(file_name)

Note: before 0.11.1, save and load were the only way to do this (they are now deprecated in favor of to_pickle and read_pickle respectively). Another popular …
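The round trip can be checked with a short sketch. The sample frame, the file name frame.pkl, and the temporary directory are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical frame with a datetime column and a custom index, two things
# that a plain-text format would not restore without extra arguments.
df = pd.DataFrame(
    {"when": pd.to_datetime(["2022-09-19", "2022-09-20"]), "value": [1.5, 2.5]},
    index=["a", "b"],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    file_name = os.path.join(tmp_dir, "frame.pkl")
    df.to_pickle(file_name)             # serialize the whole object to disk
    restored = pd.read_pickle(file_name)  # load it back unchanged

# Values, dtypes, and index all survive the round trip.
print(restored.equals(df))
print(restored.dtypes)
```

Pickle files can execute arbitrary code when loaded, so read_pickle should only be used on files from a trusted source.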

Convert DataFrame column type from string to datetime

September 18, 2022 by Tarik

The easiest way is to use to_datetime:

    df['col'] = pd.to_datetime(df['col'])

It also offers a dayfirst argument for European-style (day-first) dates (but beware this isn't strict). Here it is in action:

    In [11]: pd.to_datetime(pd.Series(['05/23/2005']))
    Out[11]:
    0   2005-05-23 00:00:00
    dtype: datetime64[ns]

You can pass a specific format:

    In [12]: pd.to_datetime(pd.Series(['05/23/2005']), format="%m/%d/%Y")
    Out[12]:
    0   2005-05-23
    dtype: datetime64[ns]
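The conversion can be sketched on a whole column as follows. The frame and its date strings are illustrative assumptions, and the exact datetime resolution shown in the dtype may differ between pandas versions.

```python
import pandas as pd

# Hypothetical column of month/day/year strings.
df = pd.DataFrame({"col": ["05/23/2005", "06/01/2005"]})

# An explicit format avoids any guessing about which field is the month.
df["col"] = pd.to_datetime(df["col"], format="%m/%d/%Y")

# dayfirst=True tells the parser to read ambiguous dates as day/month/year.
day_first = pd.to_datetime(pd.Series(["23/05/2005"]), dayfirst=True)

print(df["col"].dt.month.tolist())
print(day_first.dt.day.iloc[0], day_first.dt.month.iloc[0])
print(pd.api.types.is_datetime64_any_dtype(df["col"]))
```

Once the column has a datetime dtype, the .dt accessor exposes components such as year, month, and day.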

Extracting specific columns from a data frame

September 18, 2022 by Tarik

You can subset using a vector of column names. I strongly prefer this approach over those that treat column names as if they are object names (e.g. subset()), especially when programming in functions, packages, or applications.

    # data for reproducible example
    # (and to avoid confusion from trying to subset `stats::df`)
    df <- setNames(data.frame(as.list(1:5)), LETTERS[1:5])
    …

Pandas conditional creation of a series/dataframe column

September 17, 2022 by Tarik

If you only have two choices to select from:

    df['color'] = np.where(df['Set']=='Z', 'green', 'red')

For example,

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
    df['color'] = np.where(df['Set']=='Z', 'green', 'red')
    print(df)

yields

      Set Type  color
    0   Z    A  green
    1   Z    B  green
    2   X    B    red
    3   Y    C    red

(This output is from an older pandas that sorted dict keys alphabetically; current versions print the columns in the order Type, Set, color.)

If …