How to add pandas data to an existing csv file?

You can pass a Python file write mode to the pandas to_csv function. For append it is 'a'. In your case:

    df.to_csv('my_csv.csv', mode='a', header=False)

The default mode is 'w'. If the file might not exist yet, you can make sure the header is written only on the first write with this variation (it needs import os):

    output_path = 'my_csv.csv'
    df.to_csv(output_path, mode='a', header=not os.path.exists(output_path))
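The header-on-first-write idea can be checked end to end with a small sketch. The helper name append_csv, the temporary directory, and the sample columns are assumptions made for illustration, and index=False is an added choice (not in the answer above) so that the row index is not written as an extra column.

```python
import os
import tempfile

import pandas as pd


def append_csv(df, output_path):
    """Append df to output_path, writing the header only if the file is new."""
    # index=False is an assumption for this sketch; drop it to keep the index column.
    df.to_csv(
        output_path,
        mode="a",
        header=not os.path.exists(output_path),
        index=False,
    )


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_csv.csv")
    # First call creates the file and writes the header row.
    append_csv(pd.DataFrame({"a": [1, 2], "b": [3, 4]}), path)
    # Second call appends data rows only.
    append_csv(pd.DataFrame({"a": [5], "b": [6]}), path)
    result = pd.read_csv(path)
```

Reading the file back gives a single header and three data rows. Note that the existence check and the write are separate steps, so this pattern is not safe if several processes append to the same file at once.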

Sample random rows in dataframe

First make some data:

    > df = data.frame(matrix(rnorm(20), nrow=10))
    > df
               X1         X2
    1   0.7091409 -1.4061361
    2  -1.1334614 -0.1973846
    3   2.3343391 -0.4385071
    4  -0.9040278 -0.6593677
    5   0.4180331 -1.2592415
    6   0.7572246 -0.5463655
    7  -0.8996483  0.4231117
    8  -1.0356774 -0.1640883
    9  -0.3983045  0.7157506
    10 -0.9060305  2.3234110

Then select some rows at random:

    > df[sample(nrow(df), 3), ]
    X1 X2 …

Normalize columns of a dataframe

One easy way is to use pandas. Here I use mean normalization, that is, subtracting the column mean and dividing by the column standard deviation:

    normalized_df = (df - df.mean()) / df.std()

To use min-max normalization:

    normalized_df = (df - df.min()) / (df.max() - df.min())

Edit: to address some concerns, note that pandas applies these functions column-wise in the code above.
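As a quick sanity check of both formulas, here is a minimal sketch on a made-up two-column frame (the column names and values are assumptions for illustration). It relies on df.std() using the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

# Hypothetical sample data; any all-numeric DataFrame works the same way.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 40.0]})

# Mean normalization: each column ends up with mean 0 and sample std 1.
standardized = (df - df.mean()) / df.std()

# Min-max normalization: each column is rescaled to the range [0, 1].
min_max = (df - df.min()) / (df.max() - df.min())
```

Both expressions assume every column is numeric and non-constant. A constant column has a standard deviation and a range of zero, so the division produces NaN for that column.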

How to reversibly store and load a Pandas dataframe to/from disk

The easiest way is to pickle it using to_pickle:

    df.to_pickle(file_name)  # where to save it, usually as a .pkl

Then you can load it back using:

    df = pd.read_pickle(file_name)

Note: before 0.11.1 save and load were the only way to do this (they are now deprecated in favor of to_pickle and read_pickle respectively). Another popular …
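A short round-trip sketch shows what "reversibly" means for the pickle approach: values and dtypes come back unchanged. The frame contents and the file name frame.pkl are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical frame mixing a datetime column and a float column.
df = pd.DataFrame(
    {
        "when": pd.to_datetime(["2005-05-23", "2005-05-24"]),
        "value": [1.5, 2.5],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    file_name = os.path.join(tmp, "frame.pkl")
    df.to_pickle(file_name)               # save to disk
    restored = pd.read_pickle(file_name)  # load it back
```

Here restored.equals(df) is True and the datetime column keeps its dtype. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.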

Convert DataFrame column type from string to datetime

The easiest way is to use to_datetime:

    df['col'] = pd.to_datetime(df['col'])

It also offers a dayfirst argument for European times (but beware this isn't strict). Here it is in action:

    In [11]: pd.to_datetime(pd.Series(['05/23/2005']))
    Out[11]:
    0   2005-05-23 00:00:00
    dtype: datetime64[ns]

You can pass a specific format:

    In [12]: pd.to_datetime(pd.Series(['05/23/2005']), format="%m/%d/%Y")
    Out[12]:
    0   2005-05-23
    dtype: datetime64[ns]
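To make the dayfirst and format options concrete, here is a minimal sketch with made-up day-first strings. The sample values are assumptions, and the errors="coerce" line is an addition not mentioned in the answer above: it turns unparseable entries into NaT instead of raising.

```python
import pandas as pd

# Hypothetical European-style (day/month/year) date strings.
s = pd.Series(["23/05/2005", "01/02/2006"])

# dayfirst=True reads the ambiguous "01/02/2006" as 1 February.
day_first = pd.to_datetime(s, dayfirst=True)

# An explicit format is stricter and states the intended layout directly.
explicit = pd.to_datetime(s, format="%d/%m/%Y")

# errors="coerce" replaces values that do not match with NaT.
coerced = pd.to_datetime(
    pd.Series(["05/23/2005", "not a date"]),
    format="%m/%d/%Y",
    errors="coerce",
)
```

When the layout of the column is known, passing format is usually the safer choice, because it does not depend on how ambiguous strings are guessed.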

Extracting specific columns from a data frame

You can subset using a vector of column names. I strongly prefer this approach over those that treat column names as if they are object names (e.g. subset()), especially when programming in functions, packages, or applications.

    # data for reproducible example
    # (and to avoid confusion from trying to subset `stats::df`)
    df <- setNames(data.frame(as.list(1:5)), LETTERS[1:5])
    …

Pandas conditional creation of a series/dataframe column

If you only have two choices to select from:

    df['color'] = np.where(df['Set']=='Z', 'green', 'red')

For example,

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
    df['color'] = np.where(df['Set']=='Z', 'green', 'red')
    print(df)

yields (the column order can differ between pandas versions)

      Set Type  color
    0   Z    A  green
    1   Z    B  green
    2   X    B    red
    3   Y    C    red

If …
