How to query JSON data column using Spark DataFrames?
Spark >= 2.4

If needed, the schema can be inferred with the schema_of_json function (note that this assumes an arbitrary row is a valid representative of the schema).

    import org.apache.spark.sql.functions.{lit, schema_of_json, from_json}
    import collection.JavaConverters._

    // Infer the schema from the first row's JSON string
    val schema = schema_of_json(lit(df.select($"jsonData").as[String].first))
    // Parse the JSON string column into a struct using the inferred schema
    df.withColumn("jsonData", from_json($"jsonData", schema, Map[String, String]().asJava))

Spark >= 2.1

You can use the from_json function:

    import org.apache.spark.sql.functions.from_json
    import org.apache.spark.sql.types._
    …
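For illustration, here is a minimal sketch of how from_json with an explicitly declared schema might look end to end. The field names `k` and `v`, the sample rows, the `FromJsonSketch` object, and its `parse` helper are all hypothetical and not taken from the original answer; the sketch only assumes a string column named `jsonData` and a local SparkSession.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

object FromJsonSketch {
  // Hypothetical schema: assumes each JSON document has a string "k" and a numeric "v"
  val schema: StructType = StructType(Seq(
    StructField("k", StringType, nullable = true),
    StructField("v", DoubleType, nullable = true)
  ))

  // Replace the JSON string column with a struct column parsed against the schema
  def parse(df: DataFrame): DataFrame =
    df.withColumn("jsonData", from_json(col("jsonData"), schema))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("from-json-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the real DataFrame
    val df = Seq(
      (1, """{"k": "foo", "v": 1.0}"""),
      (2, """{"k": "bar", "v": 3.0}""")
    ).toDF("id", "jsonData")

    val parsed = parse(df)

    // Once parsed, nested fields can be queried with dot notation
    parsed.select($"id", $"jsonData.k", $"jsonData.v").show()

    spark.stop()
  }
}
```

Declaring the schema up front avoids the extra pass over the data that inference needs, at the cost of having to know the JSON layout in advance.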