Add a string prefix to each value in a pandas string column
    df['col'] = 'str' + df['col'].astype(str)

Example:

    >>> df = pd.DataFrame({'col': ['a', 0]})
    >>> df
      col
    0   a
    1   0
    >>> df['col'] = 'str' + df['col'].astype(str)
    >>> df
        col
    0  stra
    1  str0
Use sort_values to sort the df by a specific column's values:

    In [18]: df.sort_values('2')
    Out[18]:
            0          1    2
    4    85.6    January  1.0
    3    95.5   February  2.0
    7   104.8      March  3.0
    0   354.7      April  4.0
    8   283.5        May  5.0
    6   238.7       June  6.0
    5   152.0       July  7.0
    1    55.4     August  8.0
    11  212.7  September  9.0
    10 …
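Since the output above is cut off, here is a minimal self-contained sketch of the same call. The frame and its column names (`value`, `month`, `month_no`) are made up for illustration and are not the original poster's data.

```python
import pandas as pd

# Illustrative data only; column names are assumptions, not from the original post
df = pd.DataFrame({
    "value": [354.7, 55.4, 85.6, 95.5],
    "month": ["April", "August", "January", "February"],
    "month_no": [4.0, 8.0, 1.0, 2.0],
})

# sort_values returns a new, sorted DataFrame and leaves df untouched
sorted_df = df.sort_values("month_no")
print(sorted_df)

# Pass ascending=False to reverse the order
desc_df = df.sort_values("month_no", ascending=False)
print(desc_df)
```

Note that the original row labels travel with the rows, which is why the index in the sorted output is no longer in order.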
You can use the attribute df.empty to check whether the DataFrame is empty:

    if df.empty:
        print('DataFrame is empty!')

Source: Pandas Documentation
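One detail worth knowing is that `empty` only looks at the shape of the frame, so a frame that holds nothing but NaN values does not count as empty. A small sketch, with made-up frames named `no_rows` and `only_nan`:

```python
import numpy as np
import pandas as pd

no_rows = pd.DataFrame({"a": []})          # zero rows
only_nan = pd.DataFrame({"a": [np.nan]})   # one row, holding only NaN

print(no_rows.empty)            # True
print(only_nan.empty)           # False: the frame still has one row
print(only_nan.dropna().empty)  # True once the NaN row is dropped
```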
g1 here is a DataFrame. It has a hierarchical index, though:

    In [19]: type(g1)
    Out[19]: pandas.core.frame.DataFrame

    In [20]: g1.index
    Out[20]:
    MultiIndex([('Alice', 'Seattle'), ('Bob', 'Seattle'), ('Mallory', 'Portland'),
           ('Mallory', 'Seattle')], dtype=object)

Perhaps you want something like this?

    In [21]: g1.add_suffix('_Count').reset_index()
    Out[21]:
          Name     City  City_Count  Name_Count
    0    Alice  Seattle           1           1
    1      Bob  Seattle           2           2
    2  Mallory …
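A related way to get one flat row per group is to count with `size()` and name the result while resetting the index. This is a different technique from the `add_suffix` call quoted above, shown only as a sketch; the input rows are invented so that the Alice and Bob counts match the visible output, and the Mallory counts are assumptions.

```python
import pandas as pd

# Invented rows; only the Alice and Bob counts mirror the truncated output above
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Bob", "Mallory", "Mallory", "Mallory"],
    "City": ["Seattle", "Seattle", "Seattle", "Portland", "Portland", "Seattle"],
})

# size() counts rows per (Name, City) group and returns a Series with a MultiIndex;
# reset_index(name=...) turns the index levels back into ordinary columns
counts = df.groupby(["Name", "City"]).size().reset_index(name="Count")
print(counts)
```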
DataFrame.iterrows is a generator which yields both the index and row (as a Series):

    import pandas as pd

    df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})
    df = df.reset_index()  # make sure indexes pair with number of rows

    for index, row in df.iterrows():
        print(row['c1'], row['c2'])

Output:

    10 100
    11 110
    12 120

Obligatory disclaimer …
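Row-by-row iteration in pandas is often slower than the alternatives, so it can be worth comparing against `itertuples` or a plain column operation. The sketch below reuses the frame from the example above; the names `pairs` and `totals` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"c1": [10, 11, 12], "c2": [100, 110, 120]})

# itertuples yields one namedtuple per row; fields are read as attributes
pairs = [(row.c1, row.c2) for row in df.itertuples(index=False)]
print(pairs)

# When the per-row work is simple arithmetic, operate on whole columns instead
totals = df["c1"] + df["c2"]
print(totals.tolist())
```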
You can use numpy.where:

    print(df)
        hours  $
    0       0  8
    1       0  9
    2       0  9
    3       3  6
    4       6  4
    5       3  7
    6       5  5
    7      10  1
    8       9  3
    9       3  6
    10      5  4
    11      5  7

    df['$/hour'] = np.where(df['hours'] < 1, df['hours'], df['$']/df['hours'])
    print(df)
        hours  $ …
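Because the second printout is truncated, here is a small runnable sketch of the same expression on a four-row frame (the rows are a subset-style illustration, not the full data above). `np.where` evaluates both branches for every row and then picks one per row, so the division still runs where `hours` is 0; the result for those rows is simply discarded.

```python
import numpy as np
import pandas as pd

# Small illustrative frame with the same column names as the excerpt
df = pd.DataFrame({"hours": [0, 3, 6, 10], "$": [8, 6, 4, 1]})

# Where hours < 1 keep the hours value (0); otherwise compute dollars per hour
df["$/hour"] = np.where(df["hours"] < 1, df["hours"], df["$"] / df["hours"])
print(df)
```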
What you're looking for is Seq[o.a.s.sql.Row]:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.functions.udf

    val my_size = udf { subjects: Seq[Row] => subjects.size }

Explanation:

The current representation of ArrayType is, as you already know, WrappedArray, so Array won't work and it is better to stay on the safe side. According to the official specification, the local (external) type for StructType is …
The problem is that there are spaces in your column names; here is what I get when I save your data and load the dataframe as you have done:

    df.columns
    # result:
    Index(['LABEL', ' F1', ' F2', ' F3', ' F4', ' F5', ' X', ' Y', ' Z', ' C1', ' C2'], dtype="object")

so, …
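The rest of this answer is cut off, so the following is only one possible way to deal with leading spaces in column names, not necessarily the fix the author went on to give. The CSV text in `raw` is a made-up three-column stand-in for the original file.

```python
import io

import pandas as pd

# Made-up CSV text with a space after each comma, as in the excerpt
raw = "LABEL, F1, F2\nx, 1, 2\n"

df = pd.read_csv(io.StringIO(raw))
print(list(df.columns))  # column names keep their leading spaces

# Option 1: strip whitespace from the column names after loading
df.columns = df.columns.str.strip()
print(list(df.columns))

# Option 2: ask read_csv to skip spaces that follow the delimiter
df2 = pd.read_csv(io.StringIO(raw), skipinitialspace=True)
print(list(df2.columns))
```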
The problem is, as the error message says, that the index of the calculated column you want to insert is incompatible with the index of df. The index of df is a simple index:

    In [8]: df.index
    Out[8]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8], dtype="int64")

while the index of the calculated column …
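The excerpt ends before describing the calculated column's index, so the sketch below only illustrates the general situation under an assumption: that the calculated Series carries an extra index level on top of the frame's row labels. The frame, the `key` and `val` columns, and the `calc` Series are all hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})

# Hypothetical calculated column indexed by (key, original row label)
calc = pd.Series(
    [10, 20, 30],
    index=pd.MultiIndex.from_arrays([df["key"], df.index]),
)

# Drop the extra level so the remaining labels line up with df.index,
# then assign; pandas aligns the values on those labels
aligned = calc.reset_index(level=0, drop=True)
df["calc"] = aligned
print(df)
```

If the two objects are known to be in the same row order, assigning `calc.to_numpy()` instead would bypass index alignment altogether.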