Building a Row from a dict in PySpark

Question

You can use keyword argument unpacking as follows:

from pyspark.sql import Row

Row(**row_dict)

## Row(C0=-1.1990072635132698, C3=0.12605772684660232, C4=0.5760856026559944, 
##     C5=0.1951877800894315, C6=24.72378589441825, summary='kurtosis')

Note that Row internally sorts the fields by name. This works around older Python versions (before 3.6), which do not preserve the order of keyword arguments.

This behavior is likely to be removed in upcoming releases (see SPARK-29748, Remove sorting of fields in PySpark SQL Row creation). Once it is removed, you'll have to ensure that the order of values in the dict is consistent across records.
