dataframe – Page 114

Sorting columns in pandas dataframe based on column name [duplicate]

September 16, 2022 by Tarik

    df = df.reindex(sorted(df.columns), axis=1)

This assumes that sorting the column names will give the order you want. If your column names won’t sort lexicographically (e.g., if you want column Q10.3 to appear after Q9.1), you’ll need to sort differently, but that has nothing to do with pandas.
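For the Q10.3-after-Q9.1 case, one possible approach is to pass a custom key to sorted(). The sketch below is only an illustration: natural_key is a hypothetical helper (not part of pandas), and the column names are made up.

```python
import re

import pandas as pd


def natural_key(name):
    """Split a name into text and integer chunks so 'Q10' sorts after 'Q9'.

    Hypothetical helper for illustration; not a pandas function.
    """
    return [int(tok) if tok.isdigit() else tok
            for tok in re.split(r"(\d+)", str(name))]


# Made-up column names that do not sort well lexicographically.
df = pd.DataFrame([[1, 2, 3]], columns=["Q10.3", "Q9.1", "Q1.2"])

lexicographic = df.reindex(sorted(df.columns), axis=1)
natural = df.reindex(sorted(df.columns, key=natural_key), axis=1)

print(list(lexicographic.columns))  # ['Q1.2', 'Q10.3', 'Q9.1']
print(list(natural.columns))        # ['Q1.2', 'Q9.1', 'Q10.3']
```

The reindex call is unchanged; only the order of the labels handed to it differs.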

Changing column names of a data frame

September 15, 2022 by Tarik

Use the colnames() function:

    R> X <- data.frame(bad=1:3, worse=rnorm(3))
    R> X
      bad     worse
    1   1 -2.440467
    2   2  1.320113
    3   3 -0.306639
    R> colnames(X) <- c("good", "better")
    R> X
      good    better
    1    1 -2.440467
    2    2  1.320113
    3    3 -0.306639

You can also subset:

    R> colnames(X)[2] <- "superduper"

Selecting/excluding sets of columns in pandas [duplicate]

September 15, 2022 by Tarik

You can either drop the columns you do not need or select the ones you need:

    # Using DataFrame.drop (drop by position)
    df.drop(df.columns[[1, 2]], axis=1, inplace=True)

    # Drop by name
    df1 = df1.drop(['B', 'C'], axis=1)

    # Select the ones you want
    df1 = df[['a', 'd']]
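As a quick check, the three variants can be run side by side on a small frame. This is a minimal sketch; the frame and its column names (a, b, c, d) are made up for illustration.

```python
import pandas as pd

# Made-up frame with four columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})

# Drop by position: columns at positions 1 and 2 ('b' and 'c').
by_position = df.drop(df.columns[[1, 2]], axis=1)

# Drop by name.
by_name = df.drop(["b", "c"], axis=1)

# Select the columns to keep instead.
selected = df[["a", "d"]]

print(list(by_position.columns))  # ['a', 'd']
print(by_name.equals(selected))   # True
```

None of these calls uses inplace=True, so the original df keeps all four columns.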

How to sum a variable by group

September 15, 2022 by Tarik

Using aggregate:

    aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
      Category  x
    1    First 30
    2   Second  5
    3    Third 34

In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be incorporated via cbind:

    aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) …

(embedding @thelatemail comment), aggregate has a formula interface too:

    aggregate(Frequency …
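For readers working in pandas rather than R, a rough counterpart of a grouped sum is groupby followed by sum. This is a sketch for comparison only: the individual rows are invented so that the totals match the output shown above, and the original data is not given in the post.

```python
import pandas as pd

# Invented rows; only the per-category totals (30, 5, 34) come from the post.
x = pd.DataFrame({
    "Category": ["First", "First", "First", "Second", "Third", "Third"],
    "Frequency": [10, 15, 5, 5, 14, 20],
})

# Sum Frequency within each Category; as_index=False keeps Category as a column.
totals = x.groupby("Category", as_index=False)["Frequency"].sum()
print(totals)
```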

How can I use the apply() function for a single column?

September 14, 2022 by Tarik

Given a sample dataframe df as:

       a  b
    0  1  2
    1  2  3
    2  3  4
    3  4  5

what you want is:

    df['a'] = df['a'].apply(lambda x: x + 1)

after which df is:

       a  b
    0  2  2
    1  3  3
    2  4  4
    3  5  5
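The same example as a self-contained script, using the sample values shown above. The final comparison against plain column arithmetic is an addition for illustration, not part of the original answer.

```python
import pandas as pd

# The sample frame from the post.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 3, 4, 5]})

# Plain column arithmetic, computed before 'a' is modified, for comparison.
expected = df["a"] + 1

# Apply the function to column 'a' only; column 'b' is untouched.
df["a"] = df["a"].apply(lambda x: x + 1)

print(df["a"].tolist())        # [2, 3, 4, 5]
print(df["b"].tolist())        # [2, 3, 4, 5]
print(df["a"].equals(expected))  # True
```

For simple arithmetic like this, the vectorized form gives the same result without calling a Python function per element; apply is more useful when the function has no vectorized equivalent.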

How do I create test and train samples from one dataframe with pandas?

September 14, 2022 by Tarik

scikit-learn’s train_test_split is a good one. It will split both numpy arrays and dataframes.

    from sklearn.model_selection import train_test_split

    train, test = train_test_split(df, test_size=0.2)
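If scikit-learn is not available, a similar 80/20 split can be sketched with pandas alone. Note that this swaps in a different technique (DataFrame.sample plus drop) rather than train_test_split; the frame and the seed value are made up for illustration.

```python
import pandas as pd

# Made-up frame with ten rows and a unique index.
df = pd.DataFrame({"x": range(10), "y": [i % 2 for i in range(10)]})

# Randomly pick 80% of the rows for training; the seed makes the split repeatable.
train = df.sample(frac=0.8, random_state=0)

# Everything not picked for training becomes the test set.
test = df.drop(train.index)

print(len(train), len(test))  # 8 2
```

This relies on the index labels being unique; with duplicate labels, drop would remove more rows than intended.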

Pandas read_csv: low_memory and dtype options

September 14, 2022 by Tarik

The deprecated low_memory option

The low_memory option is not properly deprecated, but it should be, since it does not actually do anything differently [source].

The reason you get this low_memory warning is that guessing dtypes for each column is very memory demanding. Pandas tries to determine what dtype to set by analyzing the data in each …
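Since the cost comes from guessing dtypes, one way to avoid the guessing is to state them up front through the dtype argument of read_csv. The following is a minimal sketch under assumptions: the CSV text and the column names user_id and score are made up, and the truncated post may go on to recommend something different.

```python
import io

import pandas as pd

# Made-up CSV where 'user_id' mixes numeric-looking and text values.
csv_text = "user_id,score\n1,0.5\n2,0.75\nfoobar,1.0\n"

# Declaring dtypes tells pandas what each column is instead of inferring it.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"user_id": str, "score": float},
)

print(df["user_id"].tolist())  # ['1', '2', 'foobar']
print(df["score"].dtype)       # float64
```

Reading user_id as a string keeps values such as 'foobar' from forcing a mixed-type column.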

How to reset index in a pandas dataframe? [duplicate]

September 14, 2022 by Tarik

DataFrame.reset_index is what you’re looking for. If you don’t want it saved as a column, then do:

    df = df.reset_index(drop=True)

If you don’t want to reassign:

    df.reset_index(drop=True, inplace=True)
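The effect of drop=True is easiest to see on a frame whose index has gaps. A small sketch with made-up values:

```python
import pandas as pd

# Made-up frame with a non-contiguous index, e.g. left over after filtering.
df = pd.DataFrame({"value": [10, 20, 30]}, index=[5, 7, 9])

# drop=True discards the old index and renumbers from 0.
dropped = df.reset_index(drop=True)

# Without drop, the old index is kept as a new column named 'index'.
kept = df.reset_index()

print(list(dropped.index))    # [0, 1, 2]
print(list(dropped.columns))  # ['value']
print(list(kept.columns))     # ['index', 'value']
```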

How to flatten a hierarchical index in columns

September 13, 2022 by Tarik

I think the easiest way to do this would be to set the columns to the top level:

    df.columns = df.columns.get_level_values(0)

Note: if the top level has a name you can also access it by this, rather than 0.

If you want to combine/join your MultiIndex into one Index (assuming you have just string …
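Both ideas can be tried on a small two-level column index. This is a sketch with made-up labels; the underscore join shown here is one possible way to combine the levels when every entry is a string, and is not necessarily the one the truncated text goes on to describe.

```python
import pandas as pd

# Made-up two-level column labels.
cols = pd.MultiIndex.from_tuples([("A", "max"), ("A", "min"), ("B", "max")])
df = pd.DataFrame([[1, 2, 3]], columns=cols)

# Option 1: keep only the top level (labels may then repeat).
top = df.copy()
top.columns = top.columns.get_level_values(0)

# Option 2: join the levels of each column into a single string label.
joined = df.copy()
joined.columns = ["_".join(col) for col in joined.columns]

print(list(top.columns))     # ['A', 'A', 'B']
print(list(joined.columns))  # ['A_max', 'A_min', 'B_max']
```

Keeping only the top level can leave duplicate column names, as with 'A' here, which the joined form avoids.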

Convert Python dict into a dataframe

September 13, 2022 by Tarik

The error here arises from calling the DataFrame constructor with scalar values (where it expects the values to be a list/dict/…, i.e. have multiple columns):

    pd.DataFrame(d)
    ValueError: If using all scalar values, you must pass an index

You could take the items from the dictionary (i.e. the key-value pairs):

    In [11]: pd.DataFrame(d.items())  # or list(d.items()) …
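A runnable sketch of both the failure and the items-based fix. The dictionary contents and the column names Date and DateValue are hypothetical, chosen only to make the example concrete.

```python
import pandas as pd

# Hypothetical dict mapping a key to a single scalar value.
d = {"2012-07-01": 391, "2012-07-02": 392, "2012-07-03": 392}

# Passing a dict of scalars straight to the constructor fails.
try:
    pd.DataFrame(d)
    raised = False
except ValueError:
    raised = True

# Turning the key-value pairs into rows works; the column names are hypothetical.
df = pd.DataFrame(list(d.items()), columns=["Date", "DateValue"])

print(raised)    # True
print(df.shape)  # (3, 2)
```

Wrapping d.items() in list() gives the constructor a plain list of tuples, one row per key-value pair.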