Converting a Pandas GroupBy multiindex output from Series back to DataFrame

g1 here is a DataFrame. It has a hierarchical index, though:

In [19]: type(g1)
Out[19]: pandas.core.frame.DataFrame

In [20]: g1.index
Out[20]:
MultiIndex([('Alice', 'Seattle'), ('Bob', 'Seattle'), ('Mallory', 'Portland'),
       ('Mallory', 'Seattle')], dtype=object)

Perhaps you want something like this?

In [21]: g1.add_suffix('_Count').reset_index()
Out[21]:
      Name      City  City_Count  Name_Count
0    Alice   Seattle           1           1
1      Bob   Seattle           2           2
2  Mallory …
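For readers who only need one count column, a minimal sketch of the same flattening idea follows. It is not the answer's own code: the sample data is hypothetical, and it swaps in `size()` with `reset_index(name=...)` because, on recent pandas versions, `count()` no longer reports the grouping columns themselves.

```python
import pandas as pd

# Hypothetical sample data, chosen to resemble the names shown in the excerpt.
df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Bob", "Mallory", "Mallory", "Mallory"],
    "City": ["Seattle", "Seattle", "Seattle", "Portland", "Seattle", "Seattle"],
})

# Grouping by two keys gives a Series indexed by a (Name, City) MultiIndex.
counts = df1.groupby(["Name", "City"]).size()

# reset_index moves the index levels back into ordinary columns;
# name= labels the column that holds the group sizes.
flat = counts.reset_index(name="Count")
print(flat)
```

The result is a plain DataFrame with `Name`, `City`, and `Count` columns and a default integer index.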

How to iterate over rows in a Pandas DataFrame?

DataFrame.iterrows is a generator which yields both the index and row (as a Series):

import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})
df = df.reset_index()  # make sure indexes pair with number of rows

for index, row in df.iterrows():
    print(row['c1'], row['c2'])

10 100
11 110
12 120

Obligatory disclaimer …
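As a small companion sketch (not part of the original answer), the same loop can be written with `DataFrame.itertuples`, which yields namedtuples instead of Series. The variable names `pairs_rows` and `pairs_tuples` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"c1": [10, 11, 12], "c2": [100, 110, 120]})

# iterrows: each row arrives as a Series, so values are looked up by column label.
pairs_rows = [(int(row["c1"]), int(row["c2"])) for _, row in df.iterrows()]

# itertuples: each row arrives as a namedtuple, so columns are attributes.
# index=False leaves the index out of the tuple.
pairs_tuples = [(t.c1, t.c2) for t in df.itertuples(index=False)]

print(pairs_rows)
print(pairs_tuples)
```

Both loops visit the rows in the same order and collect the same value pairs.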

Defining a UDF that accepts an Array of objects in a Spark DataFrame?

What you're looking for is Seq[o.a.s.sql.Row]:

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf  // needed for the udf { ... } helper

val my_size = udf { subjects: Seq[Row] => subjects.size }

Explanation:

The current representation of ArrayType is, as you already know, WrappedArray, so Array won't work and it is better to stay on the safe side. According to the official specification, the local (external) type for StructType is …

How To Solve KeyError: u"None of [Index([..], dtype='object')] are in the [columns]"

The problem is that there are spaces in your column names; here is what I get when I save your data and load the dataframe as you have done:

df.columns
# result:
Index(['LABEL', ' F1', ' F2', ' F3', ' F4', ' F5', ' X', ' Y', ' Z', ' C1', ' C2'], dtype="object")

so, …
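The excerpt is cut off before its fix, so the following is only a sketch of one common way to deal with leading spaces in column names, not necessarily what the answer goes on to recommend. The CSV text is hypothetical and shortened to three columns.

```python
import io
import pandas as pd

# Hypothetical CSV with a space after each comma, as in the excerpt's header row.
raw = "LABEL, F1, F2\n1, 2, 3\n"
df = pd.read_csv(io.StringIO(raw))
print(list(df.columns))  # the names keep their leading spaces

# Selecting by the "clean" names fails while the spaces are still there.
try:
    df[["F1", "F2"]]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Strip surrounding whitespace from every column name, then select again.
df.columns = df.columns.str.strip()
subset = df[["F1", "F2"]]
```

Passing `skipinitialspace=True` to `pd.read_csv` is another option that avoids the stray spaces at load time.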

Attaching a calculated column to an existing dataframe raises TypeError: incompatible index of inserted column with frame index

The problem is, as the error message says, that the index of the calculated column you want to insert is incompatible with the index of df.

The index of df is a simple index:

In [8]: df.index
Out[8]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8], dtype="int64")

while the index of the calculated column …
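The excerpt stops before describing the calculated column's index, so the sketch below rests on an assumption: that the column carries a two-level index (group key plus original row label), as grouped calculations often do. The data and the names `calc`, `aligned`, and `x_cumsum` are hypothetical.

```python
import pandas as pd

# Hypothetical frame with a simple integer index.
df = pd.DataFrame({"g": ["a", "a", "b"], "x": [1, 2, 3]})

# Assumption: the calculated column is indexed by (group key, original row label).
calc = pd.Series(
    [1, 3, 3],  # per-group running totals of x, written out by hand
    index=pd.MultiIndex.from_arrays([df["g"], df.index], names=["g", None]),
)

# Drop the group level so that only the original row labels remain,
# which makes the Series index compatible with df.index.
aligned = calc.reset_index(level=0, drop=True)

df["x_cumsum"] = aligned  # assignment now aligns on the shared labels
print(df)
```

When the calculation can be expressed through `groupby(...).transform(...)`, the result already shares the frame's index and no realignment is needed.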
