How to pass whole Row to UDF – Spark DataFrame filter

You have to use the struct() function to construct the row when calling the function. Follow these steps.
Import Row:
    import org.apache.spark.sql._
Define the UDF:
    def myFilterFunction(r: Row) = { r.get(0) == r.get(1) }
Register the UDF:
    sqlContext.udf.register("myFilterFunction", myFilterFunction _)
Create the DataFrame:
    val records = sqlContext.createDataFrame(Seq(("sachin", "sachin"), ("aggarwal", "aggarwal1"))).toDF("text", "text2")
Use the UDF:
    records.filter(callUdf("myFilterFunction", struct($"text", $"text2"))).show
When you want …
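The same idea can be sketched without registering the function by name, using the DataFrame-side udf helper. This is a minimal sketch that assumes Spark 2.x or later with a SparkSession; the local master setting, the application name, and the names sameValues and matching are illustrative and not part of the original answer.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{struct, udf}

// Assumption: a local session for experimentation; in spark-shell, getOrCreate reuses the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("row-udf-sketch") // hypothetical application name
  .getOrCreate()
import spark.implicits._

// The UDF receives the whole struct as a Row and compares its first two fields.
val sameValues = udf((r: Row) => r.get(0) == r.get(1))

val records = Seq(("sachin", "sachin"), ("aggarwal", "aggarwal1")).toDF("text", "text2")

// struct(...) packs the selected columns into a single Row-valued argument.
val matching = records.filter(sameValues(struct($"text", $"text2")))
matching.show()
```

Only the row whose two columns hold the same value should pass the filter. Passing struct($"text", $"text2") keeps the UDF signature fixed at one Row argument regardless of how many columns are packed into it.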

Why does Spark report “java.net.URISyntaxException: Relative path in absolute URI” when working with DataFrames?

This is the SPARK-15565 issue in Spark 2.0 on Windows, and it has a simple solution (the fix appears to be part of Spark’s codebase already and may soon be released as 2.0.2 or 2.1.0). The solution in Spark 2.0.0 is to set spark.sql.warehouse.dir to a properly referenced directory, say file:///c:/Spark/spark-2.0.0-bin-hadoop2.7/spark-warehouse, which uses /// (triple slashes). Start spark-shell with the --conf argument …
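For an application that builds its own session rather than starting spark-shell, the same property can be set programmatically. The following is a sketch under assumptions: it uses the example directory quoted above, and the local master setting and application name are illustrative. The property has to be supplied before the first session is created, since it cannot be changed on a session that already exists.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: the warehouse path is the example directory from the text; adjust it to your installation.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("warehouse-dir-sketch") // hypothetical application name
  .config("spark.sql.warehouse.dir", "file:///c:/Spark/spark-2.0.0-bin-hadoop2.7/spark-warehouse")
  .getOrCreate()

// Print the effective value to confirm which warehouse location the session uses.
println(spark.conf.get("spark.sql.warehouse.dir"))
```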

How to handle null values when writing to parquet from Spark

You misinterpreted SPARK-10943. Spark does support writing null values to numeric columns. The problem is that null alone carries no type information at all:
    scala> spark.sql("SELECT null as comments").printSchema
    root
     |-- comments: null (nullable = true)
As per the comment by Michael Armbrust, all you have to do is cast:
    scala> spark.sql("""SELECT CAST(null as DOUBLE) AS …
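The effect of the cast can be checked on the schema directly. This is a minimal sketch assuming Spark 2.x or later; the column alias comments is reused from the first query as an assumption (the alias in the original cast example is cut off), and the names typedViaSql and typedViaApi are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.DoubleType

// Assumption: a local session for experimentation; in spark-shell, getOrCreate reuses the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("typed-null-sketch") // hypothetical application name
  .getOrCreate()

// SQL form: the cast gives the null literal a concrete column type.
val typedViaSql = spark.sql("SELECT CAST(null AS DOUBLE) AS comments")

// DataFrame API form of the same cast.
val typedViaApi = spark.range(1).select(lit(null).cast(DoubleType).as("comments"))

typedViaSql.printSchema()
typedViaApi.printSchema()
```

Both schemas should report comments as a nullable double column, which gives the Parquet writer the type information that a bare null lacks.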
