dataframe – Page 110

How to take column-slices of dataframe in pandas

September 29, 2022 by Tarik

2017 answer, pandas 0.20: .ix is deprecated; use .loc instead (see the deprecation note in the docs). .loc uses label-based indexing to select both rows and columns, where the labels are the values of the index or the columns. Unlike positional slicing, slicing with .loc includes the last element. Let’s assume we have a DataFrame with the following columns: …
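As a minimal sketch of the inclusive label slice described in this excerpt: the frame and its column names (`a` to `d`) are made up for illustration and are not the columns from the original answer.

```python
import pandas as pd

# Hypothetical frame; the column names are illustrative only.
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    columns=["a", "b", "c", "d"],
)

# Label-based column slice: both endpoints ("b" and "d") are included.
sliced = df.loc[:, "b":"d"]
print(list(sliced.columns))
```

The `:` in the row position keeps every row, so only the columns are narrowed.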

Find row where values for column is maximal in a pandas DataFrame

September 29, 2022 by Tarik

Use the pandas idxmax function. It’s straightforward:

    >>> import pandas
    >>> import numpy as np
    >>> df = pandas.DataFrame(np.random.randn(5,3),columns=['A','B','C'])
    >>> df
              A         B         C
    0  1.232853 -1.979459 -0.573626
    1  0.140767  0.394940  1.068890
    2  0.742023  1.343977 -0.579745
    3  2.125299 -0.649328 -0.211692
    4 -0.187253  1.908618 -1.862934
    >>> df['A'].idxmax()
    3
    >>> df['B'].idxmax()
    4
    >>> df['C'].idxmax()
    1

Alternatively you …
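To go from the index label returned by idxmax to the full row, one option is to pass the label to .loc. The sketch below uses small fixed values (not the random data from the excerpt) so the result is reproducible.

```python
import pandas as pd

# Small deterministic frame; the values are illustrative only.
df = pd.DataFrame({"A": [1.0, 0.1, 2.5], "B": [-2.0, 1.9, 0.3]})

# Index label of the row where column A is largest.
row_label = df["A"].idxmax()

# Fetch the whole row for that label.
best_row = df.loc[row_label]

# Calling idxmax on the frame gives one label per column.
per_column = df.idxmax()
print(row_label, per_column.to_dict())
```

Note that idxmax returns an index label, not a position, which is why .loc (rather than .iloc) is used to fetch the row.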

How to convert a data frame column to numeric type?

September 29, 2022 by Tarik

Since nobody has received the check-mark yet, I assume you have a practical issue in mind, mostly because you haven’t specified what type of vector you want to convert to numeric. I suggest applying the transform function to complete your task. First, I’ll demonstrate a certain “conversion anomaly”: # create dummy …

Split data frame string column into multiple columns

September 28, 2022 by Tarik

Use stringr::str_split_fixed:

    library(stringr)
    str_split_fixed(before$type, "_and_", 2)

Convert Pandas column containing NaNs to dtype `int`

September 28, 2022 by Tarik

Since version 0.24, pandas can hold integer dtypes with missing values (the Nullable Integer Data Type). Pandas can represent integer data with possibly missing values using arrays.IntegerArray. This is an extension type implemented within pandas. It is not the default dtype for integers and will not be inferred; you must explicitly pass …
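A minimal sketch of requesting the nullable integer dtype explicitly, assuming pandas 0.24 or later. The sample values are made up; note the capital "I" in "Int64", which distinguishes the nullable dtype from NumPy's "int64".

```python
import numpy as np
import pandas as pd

# A float column that holds whole numbers plus one missing value.
s = pd.Series([1.0, 2.0, np.nan])

# Explicitly request the nullable integer dtype; NaN becomes <NA>.
as_int = s.astype("Int64")

# The same dtype can be requested at construction time.
direct = pd.array([1, 2, None], dtype="Int64")

print(as_int.dtype, direct.dtype)
```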

How to split a dataframe string column into two columns?

September 27, 2022 by Tarik

TL;DR version: For the simple case of “I have a text column with a delimiter and I want two columns”, the simplest solution is:

    df[['A', 'B']] = df['AB'].str.split(' ', n=1, expand=True)

Pass n as a keyword argument; recent pandas versions no longer accept it positionally. You must use expand=True if your strings have a non-uniform number of splits and you want None to replace the missing values. Notice how, …
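Here is a small runnable sketch of that split, including one string with no delimiter to show how the missing piece is filled. The sample strings are invented for illustration.

```python
import pandas as pd

# Illustrative data: the last string has no space to split on.
df = pd.DataFrame({"AB": ["A1 B1", "A2 B2", "A3"]})

# Split at most once on a space and spread the pieces across two columns.
df[["A", "B"]] = df["AB"].str.split(" ", n=1, expand=True)

print(df)
```

With n=1, any further spaces stay in the second column rather than producing a third piece.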

Difference between DataFrame, Dataset, and RDD in Spark

September 27, 2022 by Tarik

First, DataFrame evolved from SchemaRDD. And yes, conversion between DataFrame and RDD is entirely possible. Below are some sample code snippets. df.rdd is an RDD[Row]. Below are some of the options for creating a DataFrame:
1) yourrddOffrow.toDF converts to a DataFrame.
2) Using createDataFrame of the SQL context: val df = spark.createDataFrame(rddOfRow, schema), where schema can be from …

Update a dataframe in pandas while iterating row by row

September 27, 2022 by Tarik

You can use df.at:

    for i, row in df.iterrows():
        ifor_val = something
        if <condition>:
            ifor_val = something_else
        df.at[i, 'ifor'] = ifor_val

For versions before 0.21.0, use df.set_value:

    for i, row in df.iterrows():
        ifor_val = something
        if <condition>:
            ifor_val = something_else
        df.set_value(i, 'ifor', ifor_val)

If you don’t need the row values you could simply iterate over the indices of …
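The df.at pattern can be sketched with concrete stand-ins for the placeholders. The column `x`, the doubling rule, and the threshold of 4 are all hypothetical choices made only so the loop runs; they are not from the original answer.

```python
import pandas as pd

# Illustrative frame; "x" and the starting "ifor" values are made up.
df = pd.DataFrame({"x": [1, 5, 10], "ifor": [0, 0, 0]})

for i, row in df.iterrows():
    ifor_val = row["x"] * 2      # stand-in for "something"
    if row["x"] > 4:             # stand-in for "<condition>"
        ifor_val = -1            # stand-in for "something_else"
    # Write back through the frame: the row from iterrows is a copy.
    df.at[i, "ifor"] = ifor_val

print(df["ifor"].tolist())
```

Assigning through df.at matters because changes made to `row` itself would not be reflected in the frame.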