How to print only a certain column of a DataFrame in PySpark?

select and show:

df.select("col").show()

or, to get the values back as a local Python list, select, flatMap, and collect:

df.select("col").rdd.flatMap(list).collect()

Bracket notation with a Column argument (df[df.col]) is used only for logical slicing, that is, filtering rows by a boolean condition. A column by itself (df.col) is not a distributed data structure but a SQL expression, so it cannot be collected.
