dataframe – Page 112

How to drop columns by name in a data frame

September 25, 2022 by Tarik

You should use either indexing or the subset function. For example:

    R> df <- data.frame(x=1:5, y=2:6, z=3:7, u=4:8)
    R> df
      x y z u
    1 1 2 3 4
    2 2 3 4 5
    3 3 4 5 6
    4 4 5 6 7
    5 5 6 7 8

Then you can use the …

pandas get rows which are NOT in other dataframe

September 24, 2022 by Tarik

The currently selected solution produces incorrect results. To solve this problem correctly, we can perform a left-join from df1 to df2, making sure to first get just the unique rows of df2. First, we need to modify the original DataFrame to add the row with data [3, 10].

    df1 = pd.DataFrame(data = {'col1' : [1, …
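The left-join approach described above can be sketched roughly as follows. The sample frames here are hypothetical stand-ins (the post's own data is cut off in this excerpt), and using `indicator=True` to flag unmatched rows is one common way to do it, not necessarily the exact code from the full answer.

```python
import pandas as pd

# Hypothetical sample data; the original post's frames are truncated in this excerpt.
df1 = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 3], "col2": [10, 11, 12, 13, 14, 10]})
df2 = pd.DataFrame({"col1": [1, 2, 3], "col2": [10, 11, 12]})

# Left-join df1 to the unique rows of df2.
# indicator=True adds a "_merge" column saying where each row was found.
merged = df1.merge(
    df2.drop_duplicates(), on=["col1", "col2"], how="left", indicator=True
)

# Keep only the rows that exist in df1 but not in df2.
only_in_df1 = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(only_in_df1)
```

Dropping duplicates from df2 before the join keeps the merge from multiplying rows of df1 that match more than once.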

Combining two Series into a DataFrame in pandas

September 23, 2022 by Tarik

I think concat is a nice way to do this. If they are present, it uses the name attributes of the Series as the columns (otherwise it simply numbers them):

    In [1]: s1 = pd.Series([1, 2], index=['A', 'B'], name="s1")
    In [2]: s2 = pd.Series([3, 4], index=['A', 'B'], name="s2")
    In [3]: pd.concat([s1, s2], axis=1)
    Out[3]: s1 …
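As a self-contained sketch of both cases mentioned above (named Series and unnamed Series), something like the following should work. The unnamed Series are an illustrative addition and do not come from the original answer.

```python
import pandas as pd

# Named Series: the names become the column labels.
s1 = pd.Series([1, 2], index=["A", "B"], name="s1")
s2 = pd.Series([3, 4], index=["A", "B"], name="s2")
named = pd.concat([s1, s2], axis=1)

# Unnamed Series (illustrative): the columns are simply numbered 0, 1, ...
unnamed = pd.concat([pd.Series([1, 2]), pd.Series([3, 4])], axis=1)

print(named.columns.tolist())
print(unnamed.columns.tolist())
```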

Pandas index column title or name

September 23, 2022 by Tarik

You can just get/set the index name via its name property:

    In [7]: df.index.name
    Out[7]: 'Index Title'

    In [8]: df.index.name = "foo"

    In [9]: df.index.name
    Out[9]: 'foo'

    In [10]: df
    Out[10]:
             Column 1
    foo
    Apples          1
    Oranges         2
    Puppies         3
    Ducks           4
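For a runnable version, the frame can be rebuilt along these lines. The construction of `df` is an assumption based on the printed output, and `rename_axis` is shown only as an alternative that returns a new frame instead of modifying the existing one.

```python
import pandas as pd

# Rebuild a frame like the one printed above (construction is assumed).
df = pd.DataFrame(
    {"Column 1": [1, 2, 3, 4]},
    index=pd.Index(["Apples", "Oranges", "Puppies", "Ducks"], name="Index Title"),
)

# Set the index name in place through the name property.
df.index.name = "foo"

# Alternative: rename_axis returns a new frame and leaves df untouched.
renamed = df.rename_axis("bar")

print(df.index.name, renamed.index.name)
```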

How does one reorder columns in a data frame?

September 23, 2022 by Tarik

Your data frame has four columns, like so: df[,c(1,2,3,4)]. Note that the first comma means keep all the rows, and the 1,2,3,4 refers to the columns. To change the order as in the above question, do df2[,c(1,3,2,4)]. If you want to output this file as a CSV, do write.csv(df2, file="somedf.csv").

What does axis in pandas mean?

September 23, 2022 by Tarik

It specifies the axis along which the means are computed. By default, axis=0. This is consistent with numpy.mean when axis is specified explicitly (in numpy.mean, axis=None by default, which computes the mean over the flattened array): axis=0 runs along the rows (namely, the index in pandas), and axis=1 runs along the columns. …
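A small sketch may make the direction concrete. The 2x2 frame below is hypothetical: with axis=0 the mean runs down the rows and yields one value per column, while axis=1 runs across the columns and yields one value per row.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2 frame used only for illustration.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

# axis=0 (the default): collapse the rows, giving one mean per column.
col_means = df.mean(axis=0)

# axis=1: collapse the columns, giving one mean per row.
row_means = df.mean(axis=1)

# numpy.mean with no axis argument averages the flattened array.
flat_mean = np.mean(df.to_numpy())

print(col_means.tolist(), row_means.tolist(), flat_mean)
```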

What is the most efficient way to loop through dataframes with pandas?

September 22, 2022 by Tarik

The newest versions of pandas now include a built-in function for iterating over rows:

    for index, row in df.iterrows():
        # do some logic here

Or, if you want it faster, use itertuples().

However, unutbu's suggestion to use NumPy functions to avoid iterating over rows will produce the fastest code.
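The three options mentioned above can be compared side by side in a minimal sketch. The frame and the row-wise product are made up for illustration, and no timings are claimed here.

```python
import pandas as pd

# Hypothetical data: sum of x * y over all rows, computed three ways.
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# 1. iterrows(): yields (index, Series) pairs; convenient but slow.
total_iterrows = 0
for index, row in df.iterrows():
    total_iterrows += row["x"] * row["y"]

# 2. itertuples(): yields namedtuples; usually faster than iterrows().
total_itertuples = 0
for row in df.itertuples(index=False):
    total_itertuples += row.x * row.y

# 3. Vectorized: no Python-level loop over rows at all.
total_vectorized = (df["x"] * df["y"]).sum()

print(total_iterrows, total_itertuples, total_vectorized)
```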

Convert data.frame columns from factors to characters

September 22, 2022 by Tarik

Just following on Matt and Dirk. If you want to convert your existing data frame without changing the global option, you can recreate it with an lapply call:

    bob <- data.frame(lapply(bob, as.character), stringsAsFactors=FALSE)

This will convert all variables to class "character". If you want to convert only factors, see Marek's solution below. As @hadley points …

Remove pandas rows with duplicate indices

September 21, 2022 by Tarik

I would suggest using the duplicated method on the pandas Index itself:

    df3 = df3[~df3.index.duplicated(keep='first')]

While all the other methods work, .drop_duplicates is by far the least performant for the provided example. Furthermore, while the groupby method is only slightly less performant, I find the duplicated method to be more readable. Using the sample data …
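A minimal runnable sketch of the duplicated-index approach, using a small hypothetical frame since the post's sample data is not included in this excerpt:

```python
import pandas as pd

# Hypothetical frame with a repeated index label "a".
df3 = pd.DataFrame({"value": [1, 2, 3, 4]}, index=["a", "b", "a", "c"])

# index.duplicated(keep="first") marks every repeat after the first occurrence;
# negating the mask keeps only the first row for each index label.
deduped = df3[~df3.index.duplicated(keep="first")]

print(deduped)
```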