How to create a UDF in PySpark that returns an array of strings?

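You need to pass an instance of StringType, i.e. StringType() with the parentheses, rather than the StringType class itself, as the element type of ArrayType: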

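from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType

# my_udf is the asker's plain Python function; it should return a list of str
label_udf = udf(my_udf, ArrayType(StringType()))
#                                           ^^
df.withColumn('subset', label_udf(df.col1)).show()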
+------------+------+
|        col1|subset|
+------------+------+
|     oculunt|[s, n]|
|predistposed|[s, n]|
| incredulous|[s, n]|
+------------+------+
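
For completeness, here is a minimal end-to-end sketch of the same pattern. The function first_and_last is a hypothetical stand-in for my_udf (the original function is not shown), so its output will differ from the table above; the local SparkSession settings and the demo() wrapper are likewise assumptions made only for illustration.

```python
def first_and_last(word):
    """Hypothetical stand-in for my_udf: return the first and last character."""
    if word is None:
        return None  # Spark passes NULL values to a Python UDF as None
    if word == "":
        return []
    return [word[0], word[-1]]


def demo():
    # Imports are local so the helper above can be exercised without Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, StringType

    # Assumed local session; adjust master/appName for a real cluster.
    spark = (
        SparkSession.builder.master("local[1]")
        .appName("array-udf-sketch")
        .getOrCreate()
    )
    df = spark.createDataFrame(
        [("oculunt",), ("predistposed",), ("incredulous",)], ["col1"]
    )

    # ArrayType takes an instance of the element type: StringType(), not StringType.
    label_udf = udf(first_and_last, ArrayType(StringType()))
    df.withColumn("subset", label_udf(df["col1"])).show()

    spark.stop()
```

Calling demo() on a machine with PySpark installed should print a DataFrame with an array<string> column named subset. Declaring the return type matters because udf defaults to StringType when no return type is given, which is not what you want for a list result.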
