dataframe: how to groupBy/count then filter on count in Scala

When you pass a string to the filter function, Spark parses it as a SQL expression. Because count is also a SQL keyword (the COUNT aggregate), using count as a column name inside that string confuses the parser. This is a minor bug (you can file a JIRA ticket if you want to).
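If you would rather keep the SQL-string form, one option is to quote the column name with backticks. Backticks are Spark SQL's identifier quoting, so the parser should read `count` as a column rather than as the keyword. This is a minimal sketch that assumes spark-sql is on the classpath and a local master. The object name BacktickFilter and the helper atLeast are hypothetical and exist only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object BacktickFilter {
  // Hypothetical helper: keep rows whose "count" column is >= n.
  // The condition is still a SQL string, but the backticks make the parser
  // treat `count` as a quoted column identifier instead of the COUNT keyword.
  def atLeast(counts: DataFrame, n: Int): DataFrame =
    counts.filter(s"`count` >= $n")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("backtick-filter")
      .getOrCreate()
    import spark.implicits._

    // groupBy(...).count() produces a Long column literally named "count"
    val counts = Seq("a", "b", "a").toDF("x").groupBy("x").count()
    atLeast(counts, 2).show()

    spark.stop()
  }
}
```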

You can easily avoid this by using a column expression instead of a String:

import spark.implicits._  // enables the $"colName" syntax; assumes a SparkSession named spark

df.groupBy("x").count()
  .filter($"count" >= 2)
  .show()
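For a self-contained version you can run locally, here is a sketch assuming spark-sql is on the classpath. It shows the same filter written two ways: with the $"..." interpolator from spark.implicits._, and with functions.col, which does not need the implicits import. The object name GroupCountFilter, the helper frequentKeys and the sample data are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object GroupCountFilter {
  // Hypothetical helper: keep the values of column `key` that occur at least
  // `minCount` times. col("count") builds a Column expression directly, so no
  // SQL string is parsed and the keyword clash never arises.
  def frequentKeys(df: DataFrame, key: String, minCount: Long): DataFrame =
    df.groupBy(key)
      .count()                          // adds a Long column named "count"
      .filter(col("count") >= minCount)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("group-count-filter")
      .getOrCreate()
    import spark.implicits._            // provides toDF and the $"..." syntax

    val df = Seq("a", "b", "a", "c", "b", "a").toDF("x")

    // Variant 1: $"count" is also a Column expression, not a SQL string
    df.groupBy("x").count()
      .filter($"count" >= 2)
      .show()

    // Variant 2: the helper above, using functions.col
    frequentKeys(df, "x", 2L).show()

    spark.stop()
  }
}
```

Both variants compile to the same Column comparison; col is handy in library code where a SparkSession (and therefore spark.implicits._) is not in scope.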
