How to select the first row of each group?

Window functions:

Something like this should do the trick:

    import org.apache.spark.sql.functions.{row_number, max, broadcast}
    import org.apache.spark.sql.expressions.Window

    val df = sc.parallelize(Seq(
      (0,"cat26",30.9), (0,"cat13",22.1), (0,"cat95",19.6), (0,"cat105",1.3),
      (1,"cat67",28.5), (1,"cat4",26.8), (1,"cat13",12.6), (1,"cat23",5.3),
      (2,"cat56",39.6), (2,"cat40",29.7), (2,"cat187",27.9), (2,"cat68",9.8),
      (3,"cat8",35.6))).toDF("Hour", "Category", "TotalValue")

    val w = Window.partitionBy($"hour").orderBy($"TotalValue".desc)

    val dfTop = df.withColumn("rn", row_number.over(w)).where($"rn" === 1).drop("rn")

    dfTop.show
    // +----+--------+----------+
    // |Hour|Category|TotalValue|
    // +----+--------+----------+
    // |   0| …
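The ranking idea in this answer (number the rows within each group, then keep row 1) can also be tried without a Spark cluster. The following is only a sketch in Python using the standard-library `sqlite3` module, not the answer's own code. It assumes the bundled SQLite is version 3.25 or newer (window functions), and the table name `t` and the helper `top_row_per_hour` are made up for illustration.

```python
import sqlite3

# Same sample rows as the Spark example: (Hour, Category, TotalValue).
rows = [
    (0, "cat26", 30.9), (0, "cat13", 22.1), (0, "cat95", 19.6), (0, "cat105", 1.3),
    (1, "cat67", 28.5), (1, "cat4", 26.8), (1, "cat13", 12.6), (1, "cat23", 5.3),
    (2, "cat56", 39.6), (2, "cat40", 29.7), (2, "cat187", 27.9), (2, "cat68", 9.8),
    (3, "cat8", 35.6),
]


def top_row_per_hour(rows):
    """Return the row with the highest TotalValue for each Hour (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (Hour INTEGER, Category TEXT, TotalValue REAL)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        SELECT Hour, Category, TotalValue
        FROM (
            SELECT Hour, Category, TotalValue,
                   -- number rows inside each Hour, largest TotalValue first
                   ROW_NUMBER() OVER (PARTITION BY Hour ORDER BY TotalValue DESC) AS rn
            FROM t
        ) ranked
        WHERE rn = 1
        ORDER BY Hour
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


top = top_row_per_hour(rows)
for row in top:
    print(row)
```

If two rows in a group tie on `TotalValue`, `ROW_NUMBER()` still keeps only one of them, and which one is not determined unless the `ORDER BY` is extended with a tie-breaker.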

SQL – using alias in Group By

SQL is implemented as if a query was executed in the following order:

1. FROM clause
2. WHERE clause
3. GROUP BY clause
4. HAVING clause
5. SELECT clause
6. ORDER BY clause

For most relational database systems, this order explains which names (columns or aliases) are valid because they must have been introduced in a previous step. So in Oracle …
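Given that ordering, two ways of writing a grouped query stay valid regardless of whether a database accepts SELECT aliases in GROUP BY: repeat the expression in GROUP BY, or introduce the alias in a derived table so that it already exists at the FROM step. The sketch below runs both forms through Python's `sqlite3` module. The `employees` table and its data are hypothetical, and SQLite itself is lenient and would also accept the alias directly, so this shows only the portable forms and not the error a stricter system would raise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, for illustration only.
conn.execute("CREATE TABLE employees (last_name TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?)",
    [("Smith",), ("Stone",), ("Jones",), ("Adams",)],
)

# Form 1: repeat the expression in GROUP BY. The alias "initial" is created
# in the SELECT step, so only ORDER BY (which runs later) refers to it.
repeated = conn.execute("""
    SELECT substr(last_name, 1, 1) AS initial, COUNT(*) AS n
    FROM employees
    GROUP BY substr(last_name, 1, 1)
    ORDER BY initial
""").fetchall()

# Form 2: define the alias in a derived table. The outer GROUP BY then sees
# "initial" as an ordinary column produced by the FROM step.
derived = conn.execute("""
    SELECT initial, COUNT(*) AS n
    FROM (SELECT substr(last_name, 1, 1) AS initial FROM employees) e
    GROUP BY initial
    ORDER BY initial
""").fetchall()

conn.close()
print(repeated)
print(derived)
```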

Failed to enable constraints. One or more rows contain values violating non-null, unique, or foreign-key constraints

This problem is usually caused by one of the following:

- null values being returned for columns not set to AllowDBNull
- duplicate rows being returned with the same primary key
- a mismatch in column definition (e.g. size of char fields) between the database and the dataset

Try running your query natively and look at the results, …

Oracle SELECT TOP 10 records [duplicate]

You'll need to put your current query in a subquery, as below:

    SELECT * FROM (
      SELECT DISTINCT
        APP_ID, NAME, STORAGE_GB, HISTORY_CREATED,
        TO_CHAR(HISTORY_DATE, 'DD.MM.YYYY') AS HISTORY_DATE
      FROM HISTORY
      WHERE STORAGE_GB IS NOT NULL
        AND APP_ID NOT IN (
          SELECT APP_ID FROM HISTORY
          WHERE TO_CHAR(HISTORY_DATE, 'DD.MM.YYYY') = '06.02.2009')
      ORDER BY STORAGE_GB DESC
    )
    WHERE ROWNUM <= 10

Oracle …
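The reason the ordering has to live in the inner query is that Oracle numbers rows with ROWNUM as they are selected, before the ORDER BY of the same query block is applied. The plain-Python sketch below only mimics that sequencing on a small list; it does not talk to Oracle, and the `history` rows and both helper functions are invented for illustration.

```python
# Hypothetical (app_id, storage_gb) rows, in the order the database happens to return them.
history = [("a", 5), ("b", 42), ("c", 17), ("d", 99), ("e", 1), ("f", 63)]


def limit_then_sort(rows, n):
    """Mimics ROWNUM <= n and ORDER BY in one query block: cut off first, sort afterwards."""
    return sorted(rows[:n], key=lambda r: r[1], reverse=True)


def sort_then_limit(rows, n):
    """Mimics the subquery form: the inner query sorts, the outer query keeps the first n rows."""
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]


naive = limit_then_sort(history, 3)
top = sort_then_limit(history, 3)

print(naive)  # three arbitrary rows, merely sorted among themselves
print(top)    # the three rows with the largest storage_gb
```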

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)