dataframe – Page 3 – Tarik Billa

Add a string prefix to each value in a pandas string column

January 18, 2024 by Tarik

df['col'] = 'str' + df['col'].astype(str)

Example:

    >>> df = pd.DataFrame({'col': ['a', 0]})
    >>> df
      col
    0   a
    1   0
    >>> df['col'] = 'str' + df['col'].astype(str)
    >>> df
        col
    0  stra
    1  str0

How to sort pandas dataframe by one column

January 10, 2024 by Tarik

Use sort_values to sort the df by a specific column's values:

    In [18]: df.sort_values('2')
    Out[18]:
            0          1    2
    4    85.6    January  1.0
    3    95.5   February  2.0
    7   104.8      March  3.0
    0   354.7      April  4.0
    8   283.5        May  5.0
    6   238.7       June  6.0
    5   152.0       July  7.0
    1    55.4     August  8.0
    11  212.7  September  9.0
    10 …
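As a self-contained sketch of the same call: the frame below is made-up sample data (not the data from the post), with columns named '0', '1' and '2' as in the excerpt. It illustrates that sort_values returns a new, sorted frame and leaves the original order untouched, and that ascending=False reverses the order.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame(
    {
        "0": [354.7, 55.4, 95.5, 85.6],
        "1": ["April", "August", "February", "January"],
        "2": [4.0, 8.0, 2.0, 1.0],
    }
)

# Sort ascending by column "2"; the original index labels travel with the rows.
by_month = df.sort_values("2")

# Sort descending by column "0".
by_value_desc = df.sort_values("0", ascending=False)

print(by_month)
print(by_value_desc)
```

If the sorted frame should get a fresh 0..n-1 index, chaining .reset_index(drop=True) after sort_values is one way to do it.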

How do I check if a pandas DataFrame is empty?

January 10, 2024 by Tarik

You can use the attribute df.empty to check whether it's empty or not:

    if df.empty:
        print('DataFrame is empty!')

Source: Pandas Documentation
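One detail worth seeing in action: df.empty only reports whether the frame has zero elements, so a frame whose only values are NaN is not considered empty. A minimal sketch, using two made-up frames (the names no_rows and only_nan are illustrative):

```python
import numpy as np
import pandas as pd

# A frame with a column but no rows.
no_rows = pd.DataFrame({"a": []})

# A frame with one row whose only value is NaN.
only_nan = pd.DataFrame({"a": [np.nan]})

print(no_rows.empty)    # True: there are no rows
print(only_nan.empty)   # False: a NaN still counts as an element

# If "empty" should also cover all-NaN frames, dropping NaN rows first is one option.
print(only_nan.dropna().empty)  # True
```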

How to iterate over rows in a Pandas DataFrame?

January 9, 2024 by Tarik

DataFrame.iterrows is a generator which yields both the index and row (as a Series):

    import pandas as pd

    df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})
    df = df.reset_index()  # make sure indexes pair with number of rows

    for index, row in df.iterrows():
        print(row['c1'], row['c2'])

    10 100
    11 110
    12 120

Obligatory disclaimer …
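To make the "index and row (as a Series)" part concrete, here is a small sketch that collects what iterrows yields. It reuses the same sample frame; the pairs list and the row_types set are illustrative names only.

```python
import pandas as pd

df = pd.DataFrame({"c1": [10, 11, 12], "c2": [100, 110, 120]})

pairs = []
row_types = set()
for index, row in df.iterrows():
    # Each row arrives as a pandas Series keyed by column name.
    row_types.add(type(row).__name__)
    # Convert to plain ints so the collected values are ordinary Python numbers.
    pairs.append((index, int(row["c1"]), int(row["c2"])))

print(row_types)
print(pairs)
```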

python pandas – dividing column by another column

January 8, 2024 by Tarik

You can use numpy.where:

    print(df)
        hours  $
    0       0  8
    1       0  9
    2       0  9
    3       3  6
    4       6  4
    5       3  7
    6       5  5
    7      10  1
    8       9  3
    9       3  6
    10      5  4
    11      5  7

    df['$/hour'] = np.where(df['hours'] < 1, df['hours'], df['$'] / df['hours'])
    print(df)
        hours  $ …
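A runnable sketch of the same numpy.where call, using four of the rows shown above (0, 3, 4 and 7) as a small frame. Where hours is below 1 the result takes the hours value itself (0 here); otherwise it is $ divided by hours.

```python
import numpy as np
import pandas as pd

# A subset of the rows from the excerpt above.
df = pd.DataFrame({"hours": [0, 3, 6, 10], "$": [8, 6, 4, 1]})

# np.where(condition, value_if_true, value_if_false), evaluated element-wise.
# Both branches are computed for every row; the condition only selects which
# result is kept, so the 8 / 0 division is computed but then discarded.
df["$/hour"] = np.where(df["hours"] < 1, df["hours"], df["$"] / df["hours"])

print(df)
```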

Defining a UDF that accepts an Array of objects in a Spark DataFrame?

January 8, 2024 by Tarik

What you're looking for is Seq[o.a.s.sql.Row]:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.functions.udf

    val my_size = udf { subjects: Seq[Row] => subjects.size }

Explanation: The current representation of ArrayType is, as you already know, WrappedArray, so Array won't work and it is better to stay on the safe side. According to the official specification, the local (external) type for StructType is …

How To Solve KeyError: u"None of [Index([..], dtype='object')] are in the [columns]"

January 7, 2024 by Tarik

The problem is that there are spaces in your column names; here is what I get when I save your data and load the dataframe as you have done:

    df.columns
    # result:
    Index(['LABEL', ' F1', ' F2', ' F3', ' F4', ' F5', ' X', ' Y', ' Z', ' C1', ' C2'], dtype='object')

so, …
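One possible way to deal with such names, sketched under assumptions: the CSV text below is made up to reproduce the leading spaces, and stripping the names with df.columns.str.strip() is just one option, not necessarily the fix the full post goes on to describe.

```python
import io

import pandas as pd

# Hypothetical CSV with a space after each comma, as in the problem described above.
raw = "LABEL, F1, F2\nx, 1, 2\ny, 3, 4\n"
df = pd.read_csv(io.StringIO(raw))

before = list(df.columns)
print(before)  # the names keep their leading space, so df[['F1', 'F2']] would raise KeyError

# Remove leading and trailing whitespace from every column name.
df.columns = df.columns.str.strip()

subset = df[["F1", "F2"]]
print(list(df.columns))
print(subset)
```

Passing skipinitialspace=True to read_csv is another way to avoid the stray spaces at load time.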

How to reorder factor levels in a tidy way?

January 7, 2024 by Tarik