Find the row where a column's value is maximal in a pandas DataFrame

Use the pandas idxmax function. It’s straightforward:

    >>> import pandas
    >>> import numpy as np
    >>> df = pandas.DataFrame(np.random.randn(5,3),columns=['A','B','C'])
    >>> df
              A         B         C
    0  1.232853 -1.979459 -0.573626
    1  0.140767  0.394940  1.068890
    2  0.742023  1.343977 -0.579745
    3  2.125299 -0.649328 -0.211692
    4 -0.187253  1.908618 -1.862934
    >>> df['A'].idxmax()
    3
    >>> df['B'].idxmax()
    4
    >>> df['C'].idxmax()
    1

Alternatively you … Read more
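As a minimal sketch of the same idea with fixed data (the column values and the index labels "x", "y", "z" are made up for illustration), idxmax returns the index label of the maximum, and that label can be passed to df.loc to retrieve the whole row:

```python
import pandas as pd

# Hypothetical fixed data so the result is reproducible
df = pd.DataFrame(
    {"A": [1.0, 0.5, 2.5], "B": [3.0, -1.0, 0.0]},
    index=["x", "y", "z"],
)

# idxmax returns the index label (not the position) of the largest value
max_label = df["A"].idxmax()

# Use the label to pull out the full row
max_row = df.loc[max_label]

print(max_label)
print(max_row.to_dict())
```

Because the result is a label, it works the same way with a non-integer index, as here.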

How to convert a data frame column to numeric type?

Since nobody has (still) received the check mark, I assume that you have some practical issue in mind, mostly because you haven’t specified what type of vector you want to convert to numeric. I suggest applying the transform function to complete your task. Now I’m about to demonstrate a certain “conversion anomaly”: # create dummy … Read more

Simultaneously merge multiple data.frames in a list

Another question asked specifically how to perform multiple left joins using dplyr in R. The question was marked as a duplicate of this one, so I answer here, using the 3 sample data frames below:

    x <- data.frame(i = c("a","b","c"), j = 1:3, stringsAsFactors=FALSE)
    y <- data.frame(i = c("b","c","d"), k = 4:6, stringsAsFactors=FALSE)
    z … Read more

Pandas DataFrame to List of Dictionaries

Use df.to_dict('records'), which gives the output without having to transpose externally.

    In [2]: df.to_dict('records')
    Out[2]:
    [{'customer': 1L, 'item1': 'apple', 'item2': 'milk', 'item3': 'tomato'},
     {'customer': 2L, 'item1': 'water', 'item2': 'orange', 'item3': 'potato'},
     {'customer': 3L, 'item1': 'juice', 'item2': 'mango', 'item3': 'chips'}]
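For a self-contained version, here is a small sketch that builds a comparable DataFrame first (the data is a made-up subset of the output above) and then converts it:

```python
import pandas as pd

# Hypothetical data modelled on the output shown above
df = pd.DataFrame(
    {
        "customer": [1, 2],
        "item1": ["apple", "water"],
        "item2": ["milk", "orange"],
    }
)

# One dictionary per row, keyed by column name
records = df.to_dict("records")

for record in records:
    print(record)
```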

Convert Pandas column containing NaNs to dtype `int`

Since version 0.24, pandas has been able to hold integer dtypes with missing values. Nullable Integer Data Type. Pandas can represent integer data with possibly missing values using arrays.IntegerArray. This is an extension type implemented within pandas. It is not the default dtype for integers, and will not be inferred; you must explicitly pass … Read more
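A minimal sketch of the nullable integer dtype, assuming pandas 0.24 or later and a made-up float column whose non-missing values are all whole numbers:

```python
import numpy as np
import pandas as pd

# A float column is what pandas infers when integers are mixed with NaN
floats = pd.Series([1.0, np.nan, 3.0])

# Request the nullable integer dtype explicitly; note the capital "I"
ints = floats.astype("Int64")

print(ints.dtype)
print(ints.isna().tolist())
```

The capitalised "Int64" is the nullable extension dtype; lowercase "int64" is the ordinary NumPy dtype, which cannot hold missing values.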

How to split a dataframe string column into two columns?

TL;DR version: For the simple case of: I have a text column with a delimiter and I want two columns. The simplest solution is:

    df[['A', 'B']] = df['AB'].str.split(' ', n=1, expand=True)

You must use expand=True if your strings have a non-uniform number of splits and you want None to replace the missing values. Notice how, … Read more
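The split can be sketched end to end as follows. The sample strings are made up, and n is passed by keyword because recent pandas versions no longer accept it positionally:

```python
import pandas as pd

# Hypothetical column with zero, one, or two delimiters per string
df = pd.DataFrame({"AB": ["a b", "c d e", "f"]})

# n=1 limits the split to the first delimiter; expand=True returns
# a DataFrame with one column per piece instead of a Series of lists
df[["A", "B"]] = df["AB"].str.split(" ", n=1, expand=True)

print(df)
```

With n=1, "c d e" splits into "c" and "d e", and "f" has a missing value in the second column.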

Difference between DataFrame, Dataset, and RDD in Spark

First, DataFrame evolved from SchemaRDD. Yes, conversion between DataFrame and RDD is absolutely possible. Below are some sample code snippets. df.rdd is an RDD[Row]. Below are some of the options for creating a DataFrame: 1) yourrddOffrow.toDF converts to a DataFrame. 2) Using createDataFrame of the SQL context: val df = spark.createDataFrame(rddOfRow, schema), where schema can be from … Read more

Update a dataframe in pandas while iterating row by row

You can use df.at:

    for i, row in df.iterrows():
        ifor_val = something
        if <condition>:
            ifor_val = something_else
        df.at[i,'ifor'] = ifor_val

For versions before 0.21.0, use df.set_value:

    for i, row in df.iterrows():
        ifor_val = something
        if <condition>:
            ifor_val = something_else
        df.set_value(i,'ifor',ifor_val)

If you don’t need the row values you could simply iterate over the indices of … Read more
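Here is a runnable sketch of the df.at pattern. The "value" column, the doubling rule, and the threshold of 4 are placeholders invented for illustration, standing in for something, something_else, and <condition>:

```python
import pandas as pd

# Hypothetical data; "ifor" is the column being updated
df = pd.DataFrame({"value": [1, 5, 10], "ifor": [0, 0, 0]})

for i, row in df.iterrows():
    ifor_val = row["value"] * 2   # placeholder for "something"
    if row["value"] > 4:          # placeholder for "<condition>"
        ifor_val = -1             # placeholder for "something_else"
    # Write to the DataFrame itself; assigning to `row` would not persist
    df.at[i, "ifor"] = ifor_val

print(df)
```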
