What you’re looking for is Seq[o.a.s.sql.Row]:
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf
val my_size = udf { subjects: Seq[Row] => subjects.size }
Explanation:
- The current representation of ArrayType is, as you already know, WrappedArray, so Array won’t work and it is better to stay on the safe side.
- According to the official specification, the local (external) type for StructType is Row. Unfortunately, this means that access to the individual fields is not type safe.
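To illustrate the second point, here is a minimal sketch of reading struct fields inside such a UDF. The data, the column names (id, subjects) and the UDF name totalScore are hypothetical; the structs are built from tuples, so their fields are named _1 and _2. Fields are looked up by name at runtime, so a wrong name or type only fails when the UDF runs.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.udf

// Local session for the sketch only.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("row-udf-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical data: each subject is a struct (_1: String, _2: Int).
val df = Seq(
  (1L, Seq(("math", 90), ("art", 70))),
  (2L, Seq.empty[(String, Int)])
).toDF("id", "subjects")

// Each array element arrives as a Row; getAs is checked only at runtime.
val totalScore = udf { subjects: Seq[Row] =>
  subjects.map(_.getAs[Int]("_2")).sum
}

val totals = df
  .select($"id", totalScore($"subjects").as("total"))
  .as[(Long, Int)]
  .collect()
  .sortBy(_._1)
```

If type safety matters more than flexibility, mapping the column to a typed Dataset of case classes instead of using a UDF is a possible alternative.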
Notes:
- To create a struct in Spark < 2.3, the function passed to udf has to return a Product type (Tuple* or case class), not Row. That’s because the corresponding udf variants depend on Scala reflection:

  "Defines a Scala closure of n arguments as user-defined function (UDF). The data types are automatically inferred based on the Scala closure’s signature."

- In Spark >= 2.3 it is possible to return Row directly, as long as the schema is provided:

  def udf(f: AnyRef, dataType: DataType): UserDefinedFunction

  "Defines a deterministic user-defined function (UDF) using a Scala closure. For this variant, the caller must specify the output data type, and there is no automatic input type coercion."

  See for example How to create a Spark UDF in Java / Kotlin which returns a complex type?.
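The two notes above can be sketched side by side. This is only an illustration under assumptions: the data, the column names, the UDF names asTuple and asRow, and the output schema are all made up. As far as I know, Spark 3.0+ rejects the udf(f, dataType) variant by default unless spark.sql.legacy.allowUntypedScalaUDF is enabled, so the sketch sets that flag; on 2.3 and 2.4 it should simply be ignored.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types._

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("struct-udf-sketch")
  // Assumption: only needed on Spark 3.x, where the untyped variant is off by default.
  .config("spark.sql.legacy.allowUntypedScalaUDF", "true")
  .getOrCreate()
import spark.implicits._

// Hypothetical input data.
val df = Seq(("math", 90), ("art", 70)).toDF("name", "score")

// Variant 1: return a Product (here a tuple); the struct schema is inferred by reflection.
val asTuple = udf { (name: String, score: Int) => (name.toUpperCase, score >= 80) }

// Variant 2 (Spark >= 2.3): return Row and supply the schema explicitly.
val schema = StructType(Seq(
  StructField("label", StringType, nullable = false),
  StructField("passed", BooleanType, nullable = false)
))
val asRow = udf((name: String, score: Int) => Row(name.toUpperCase, score >= 80), schema)

// Unpack the resulting structs to compare both variants.
val tupleOut = df.select(asTuple($"name", $"score").as("s"))
  .select($"s._1", $"s._2").as[(String, Boolean)].collect().toSet
val rowOut = df.select(asRow($"name", $"score").as("s"))
  .select($"s.label", $"s.passed").as[(String, Boolean)].collect().toSet
```

The tuple variant yields fields named _1 and _2, while the Row variant uses the names given in the schema; returning a case class instead of a tuple is one way to get meaningful field names without the untyped API.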