Spark: How to translate count(distinct(value)) in Dataframe API’s

What you need is the DataFrame aggregation function countDistinct:

import sqlContext.implicits._
import org.apache.spark.sql.functions._

case class Log(page: String, visitor: String)

val logs = data.map(p => Log(p._1,p._2))
            .toDF()

val result = logs.select("page","visitor")
            .groupBy('page)
            .agg('page, countDistinct('visitor))

result.foreach(println)

Leave a Comment

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)