Count distinct values with OVER(PARTITION BY id)

No. As the error message states, DISTINCT is not implemented for window functions. Applying the information from this link to your case, you could use something like:

    WITH uniques AS (
        SELECT congestion.id_element,
               COUNT(DISTINCT congestion.week_nb) AS unique_references
        FROM congestion
        WHERE congestion.date >= '2014.01.01'
          AND congestion.date <= '2014.12.31'
        GROUP BY congestion.id_element
    )
    SELECT congestion.date, congestion.week_nb, congestion.id_congestion,
           congestion.id_element, …
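The excerpt is cut off before the outer query finishes, so the following is only a minimal sketch of the general pattern: count the distinct values per group in a CTE, then join that count back onto every row. It runs the SQL through Python's sqlite3 module on an in-memory table. The sample rows, the omission of the date filter, and the JOIN on id_element are assumptions made for illustration, not the original answer's text.

```python
import sqlite3

# In-memory database with a tiny, made-up version of the congestion table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE congestion (id_congestion INTEGER, id_element INTEGER, week_nb INTEGER);
    INSERT INTO congestion VALUES
        (1, 10, 1), (2, 10, 1), (3, 10, 2),
        (4, 20, 5);
""")

# COUNT(DISTINCT ...) is computed per id_element in the CTE, then joined
# back so that every original row carries its group's distinct count.
# The join condition is an assumption; the source excerpt is truncated.
query = """
    WITH uniques AS (
        SELECT id_element, COUNT(DISTINCT week_nb) AS unique_references
        FROM congestion
        GROUP BY id_element
    )
    SELECT c.id_congestion, c.id_element, c.week_nb, u.unique_references
    FROM congestion AS c
    JOIN uniques AS u ON u.id_element = c.id_element
    ORDER BY c.id_congestion
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Element 10 appears in weeks 1, 1 and 2, so each of its three rows should carry a distinct count of 2, while the single row for element 20 carries 1.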

Filtering by window function result in Postgresql

I don’t know if this qualifies as “more elegant”, but it is written in a different manner than Cybernate’s solution (although it is essentially the same):

    WITH window_table AS (
        SELECT s.*,
               sum(volume) OVER previous_rows AS total
        FROM stuff s
        WINDOW previous_rows AS (
            ORDER BY priority DESC
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        )
    )
    …
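Because a window function cannot appear directly in a WHERE clause, the running total has to be computed in a CTE (or subquery) first and filtered in the outer query. The sketch below runs that idea through sqlite3, which accepts the same WINDOW clause. The sample rows, the name column, and the threshold of 100 are hypothetical, since the excerpt ends before the outer SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stuff (name TEXT, priority INTEGER, volume INTEGER);
    INSERT INTO stuff VALUES ('a', 3, 40), ('b', 2, 50), ('c', 1, 30);
""")

# Step 1 (CTE): compute the running total over rows ordered by priority.
# Step 2 (outer query): filter on that total, which a plain WHERE on the
# window function itself would not allow.
query = """
    WITH window_table AS (
        SELECT s.*,
               sum(volume) OVER previous_rows AS total
        FROM stuff s
        WINDOW previous_rows AS (
            ORDER BY priority DESC
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        )
    )
    SELECT name, priority, volume, total
    FROM window_table
    WHERE total <= 100   -- hypothetical threshold, not from the source
    ORDER BY priority DESC
"""
kept = conn.execute(query).fetchall()
for row in kept:
    print(row)
```

The running totals here are 40, 90 and 120, so only the first two rows pass the filter.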

Avoid performance impact of a single partition mode in Spark window functions

In practice, the performance impact will be almost the same as if you had omitted the partitionBy clause altogether. All records will be shuffled to a single partition, sorted locally, and iterated sequentially one by one. The only difference is the total number of partitions created. Let’s illustrate that with an example using a simple dataset …

Applying a Window function to calculate differences in pySpark

You can bring in the previous day's value with the lag function, then add another column that computes the actual day-to-day return from the two columns. You may have to tell Spark how to partition and/or order your data for lag to work, something like this:

    from pyspark.sql.window import Window
    import pyspark.sql.functions as func
    from pyspark.sql.functions …
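To make the lag-then-subtract idea concrete without a Spark cluster, here is a minimal sketch of the same window logic written as SQL and run through sqlite3. It is not the PySpark API: LAG(...) OVER (PARTITION BY ... ORDER BY ...) stands in for what the answer builds with Window and lag. The table, the ticker, day and price columns, and the sample values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prices (ticker TEXT, day TEXT, price REAL);
    INSERT INTO prices VALUES
        ('AAA', '2020-01-01', 100.0),
        ('AAA', '2020-01-02', 110.0),
        ('AAA', '2020-01-03',  99.0),
        ('BBB', '2020-01-01',  50.0),
        ('BBB', '2020-01-02',  55.0);
""")

# LAG pulls the previous row's price within each ticker, ordered by day.
# The first row of every partition has no predecessor, so it gets NULL.
query = """
    SELECT ticker, day, price,
           LAG(price) OVER (PARTITION BY ticker ORDER BY day) AS prev_price
    FROM prices
    ORDER BY ticker, day
"""

# Day-to-day return from the two columns; None where there is no previous day.
returns = [
    (ticker, day, None if prev is None else (price - prev) / prev)
    for ticker, day, price, prev in conn.execute(query)
]
for row in returns:
    print(row)
```

The partition and ordering columns matter here in the same way the answer warns about for Spark: without them, "previous row" is not well defined.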

How to perform grouped ranking in MySQL

    SELECT id_student, id_class, grade,
           @student := CASE WHEN @class <> id_class THEN 0 ELSE @student + 1 END AS rn,
           @class := id_class AS clset
    FROM (SELECT @student := -1) s,
         (SELECT @class := -1) c,
         (SELECT * FROM mytable ORDER BY id_class, id_student) t

This works in a very plain way: the initial query is ordered by id_class first and id_student second. @student …
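For comparison, the same zero-based numbering within each class can be written with a window function instead of user variables. This is a different technique from the one in the answer: ROW_NUMBER() is available in MySQL 8.0 and later, and the sketch below runs the query through sqlite3 only so that it is self-contained. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id_student INTEGER, id_class INTEGER, grade INTEGER);
    INSERT INTO mytable VALUES
        (1, 1, 90), (2, 1, 80), (3, 1, 70),
        (4, 2, 60), (5, 2, 85);
""")

# ROW_NUMBER() restarts at 1 for every id_class; subtracting 1 matches the
# zero-based rn that the user-variable query produces.
query = """
    SELECT id_student, id_class, grade,
           ROW_NUMBER() OVER (PARTITION BY id_class ORDER BY id_student) - 1 AS rn
    FROM mytable
    ORDER BY id_class, id_student
"""
ranked = conn.execute(query).fetchall()
for row in ranked:
    print(row)
```

The counter resets to 0 at the first student of each class, which is what the CASE expression on @class achieves in the variable-based version.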

Dynamic alternative to pivot with CASE and GROUP BY

If you have not installed the additional module tablefunc, run this command once per database:

    CREATE EXTENSION tablefunc;

Answer to question

A very basic crosstab solution for your case:

    SELECT *
    FROM crosstab(
        'SELECT bar, 1 AS cat, feh FROM tbl_org ORDER BY bar, feh'
    ) AS ct (bar text, val1 int, val2 int, val3 int);

…
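The heading refers to pivoting with CASE and GROUP BY, so here is a minimal sketch of that static approach for the same shape of data: number the feh values within each bar, then spread the first three into val1 to val3. It runs through sqlite3 rather than PostgreSQL and does not use tablefunc. The sample rows are hypothetical, and treating the result as roughly what the crosstab call returns is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_org (bar TEXT, feh INTEGER);
    INSERT INTO tbl_org VALUES
        ('A', 10), ('A', 20),
        ('B', 3),
        ('C', 5), ('C', 7), ('C', 9);
""")

# Number the values within each bar, then pivot the first three positions
# into columns. Positions with no value stay NULL (None in Python).
query = """
    WITH numbered AS (
        SELECT bar, feh,
               ROW_NUMBER() OVER (PARTITION BY bar ORDER BY feh) AS rn
        FROM tbl_org
    )
    SELECT bar,
           MAX(CASE WHEN rn = 1 THEN feh END) AS val1,
           MAX(CASE WHEN rn = 2 THEN feh END) AS val2,
           MAX(CASE WHEN rn = 3 THEN feh END) AS val3
    FROM numbered
    GROUP BY bar
    ORDER BY bar
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

The number of output columns is fixed in the query text, which is the limitation a dynamic approach is meant to work around.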