Defining a UDF that accepts an Array of objects in a Spark DataFrame?

What you’re looking for is Seq[o.a.s.sql.Row]:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

val my_size = udf { subjects: Seq[Row] => subjects.size }
```

Explanation: the current runtime representation of ArrayType is, as you already know, WrappedArray, so Array won’t work and it is better to stay on the safe side. According to the official specification, the local (external) type for StructType is … Read more
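For context, here is a minimal self-contained sketch of the same pattern; the subjects column, its two-field struct, and the sample data are invented for illustration and are not from the original answer:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Tuples become structs, so `subjects` has type array<struct<_1:string,_2:int>>.
val df = Seq(
  (1, Seq(("math", 5), ("physics", 4))),
  (2, Seq(("history", 3)))
).toDF("id", "subjects")

// Each struct element arrives in the UDF as a Row, the whole array as Seq[Row].
val my_size = udf { subjects: Seq[Row] => subjects.size }

df.withColumn("n_subjects", my_size($"subjects")).show()
// id 1 gets n_subjects = 2, id 2 gets n_subjects = 1
```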

OOo/LibreOffice UNO / Java: How to get calling spreadsheet cell of a Calc function?

It looks like you want to register a listener on a spreadsheet component. To achieve that, you could add the listener to the spreadsheet object itself, or to another nested object that implements an interface supporting an add.+EventListener() method. Below is a (broadcaster/listener) pair that I think you could use in … Read more
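The excerpt is cut off before the pair itself, so here is a rough sketch of what one such registration can look like, using the XModifyBroadcaster/XModifyListener interfaces from the UNO Java binding. It is written in Scala against the same Java API, and the watchForChanges helper and sheetComponent parameter are invented for illustration:

```scala
import com.sun.star.lang.EventObject
import com.sun.star.uno.UnoRuntime
import com.sun.star.util.{XModifyBroadcaster, XModifyListener}

// `sheetComponent` stands for whatever spreadsheet-level UNO object you
// already hold a reference to (document model, sheet, or cell range).
def watchForChanges(sheetComponent: Object): Unit = {
  // Query the UNO object for its broadcaster interface...
  val broadcaster = UnoRuntime
    .queryInterface(classOf[XModifyBroadcaster], sheetComponent)
    .asInstanceOf[XModifyBroadcaster]

  // ...and register a listener; `modified` fires whenever the object changes.
  broadcaster.addModifyListener(new XModifyListener {
    override def modified(event: EventObject): Unit =
      println(s"modified: ${event.Source}")

    // Called when the broadcaster is disposed; release held references here.
    override def disposing(event: EventObject): Unit = ()
  })
}
```

Other add.+EventListener() interfaces follow the same query-then-register shape: query the nested object for the broadcaster interface, then pass in an object implementing the matching listener interface.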

How to create a udf in PySpark which returns an array of strings?

You need to initialize a StringType instance:

```python
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType

# my_udf is the plain Python function being wrapped (defined elsewhere)
label_udf = udf(my_udf, ArrayType(StringType()))
#                                           ^^  note the parentheses
df.withColumn('subset', label_udf(df.col1)).show()
```

```
+------------+------+
|        col1|subset|
+------------+------+
|     oculunt|[s, n]|
|predistposed|[s, n]|
| incredulous|[s, n]|
+------------+------+
```

SparkSQL: How to deal with null values in user defined function?

This is where Option comes in handy:

```scala
val extractDateAsOptionInt = udf((d: String) => d match {
  case null => None
  case s => Some(s.substring(0, 10).filterNot("-".toSet).toInt)
})
```

or, to make it slightly more robust in the general case:

```scala
import scala.util.Try

val extractDateAsOptionInt = udf((d: String) => Try(
  d.substring(0, 10).filterNot("-".toSet).toInt
).toOption)
```

All credit goes to Dmitriy Selivanov, who pointed … Read more
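As a quick usage check (not part of the quoted answer; the local SparkSession and the date_str column are assumed for illustration), the Try-based version maps both NULL and malformed input to NULL:

```scala
import scala.util.Try
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val extractDateAsOptionInt = udf((d: String) =>
  Try(d.substring(0, 10).filterNot("-".toSet).toInt).toOption
)

// None becomes a SQL NULL in the column; Try turns the resulting NPE
// (and any malformed string) into None, i.e. NULL in the output.
val df = Seq(Some("2016-02-01T00:00:00"), Some("garbage"), None).toDF("date_str")
df.withColumn("date_int", extractDateAsOptionInt($"date_str")).show()
// yields 20160201, null, null
```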