Unpivot in Spark SQL / PySpark

You can use the built-in stack function, for example in Scala:

    scala> val df = Seq(("G",Some(4),2,None),("H",None,4,Some(5))).toDF("A","X","Y","Z")
    df: org.apache.spark.sql.DataFrame = [A: string, X: int ... 2 more fields]

    scala> df.show
    +---+----+---+----+
    |  A|   X|  Y|   Z|
    +---+----+---+----+
    |  G|   4|  2|null|
    |  H|null|  4|   5|
    +---+----+---+----+

    scala> df.select($"A", expr("stack(3, 'X', X, 'Y', Y, 'Z', …
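To make the row-expanding behaviour of stack easier to picture, here is a minimal plain-Python sketch of the same reshaping on the two sample rows. It does not use Spark at all; the helper name stack_rows and the dict-based rows are hypothetical stand-ins chosen for illustration, and the sketch keeps null values rather than guessing at any filtering the truncated answer may apply.

```python
# Plain-Python illustration of what stack(3, 'X', X, 'Y', Y, 'Z', Z) does
# to each input row: one (label, value) output row per listed column.
# stack_rows is a hypothetical helper, not part of any Spark API.

def stack_rows(row, id_col, value_cols):
    """Return one (id, column_name, value) tuple per value column."""
    return [(row[id_col], name, row[name]) for name in value_cols]

# The same data as the Scala DataFrame, with None standing in for null.
rows = [
    {"A": "G", "X": 4, "Y": 2, "Z": None},
    {"A": "H", "X": None, "Y": 4, "Z": 5},
]

unpivoted = [t for r in rows for t in stack_rows(r, "A", ["X", "Y", "Z"])]

for record in unpivoted:
    print(record)
```

Each wide row becomes three long rows, which is the essence of an unpivot; in Spark the same expansion happens inside the engine rather than in driver-side Python.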

How to pivot on multiple columns in Spark SQL?

Here's a non-UDF way involving a single pivot (hence, just a single column scan to identify all the unique dates).

    dff = mydf.groupBy('id').pivot('day').agg(F.first('price').alias('price'), F.first('units').alias('unit'))

Here's the result (apologies for the non-matching ordering and naming):

    +---+-------+------+-------+------+-------+------+-------+------+
    | id|1_price|1_unit|2_price|2_unit|3_price|3_unit|4_price|4_unit|
    +---+-------+------+-------+------+-------+------+-------+------+
    |100|     23|    10|     45|    11|     67|    12|     78|    13|
    |101|     23|    10|     45|    13|     67|    14|     78|    15|
    …

SQL – How to transpose?

MySQL doesn't support ANSI PIVOT/UNPIVOT syntax, so that leaves you to use:

    SELECT t.userid,
           MAX(CASE WHEN t.fieldname = 'Username' THEN t.fieldvalue ELSE NULL END) AS Username,
           MAX(CASE WHEN t.fieldname = 'Password' THEN t.fieldvalue ELSE NULL END) AS Password,
           MAX(CASE WHEN t.fieldname = 'Email Address' THEN t.fieldvalue ELSE NULL END) AS Email
      FROM TABLE t
    GROUP BY t.userid

As you can see, …
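The conditional-aggregation pattern is portable, so it can be tried without a MySQL server. The following is a minimal sketch using Python's built-in sqlite3 module; the table name user_fields and the sample rows are assumptions made up for the demonstration, while the column names userid, fieldname and fieldvalue follow the query in the answer.

```python
import sqlite3

# In-memory database; user_fields and its rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_fields (userid INTEGER, fieldname TEXT, fieldvalue TEXT)"
)
conn.executemany(
    "INSERT INTO user_fields VALUES (?, ?, ?)",
    [
        (1, "Username", "alice"),
        (1, "Password", "secret1"),
        (1, "Email Address", "alice@example.com"),
        (2, "Username", "bob"),
        (2, "Email Address", "bob@example.com"),  # user 2 has no Password row
    ],
)

# One MAX(CASE ...) per output column; GROUP BY collapses each user's
# name/value rows into a single row. A CASE without ELSE yields NULL.
pivot_sql = """
SELECT t.userid,
       MAX(CASE WHEN t.fieldname = 'Username' THEN t.fieldvalue END) AS Username,
       MAX(CASE WHEN t.fieldname = 'Password' THEN t.fieldvalue END) AS Password,
       MAX(CASE WHEN t.fieldname = 'Email Address' THEN t.fieldvalue END) AS Email
FROM user_fields t
GROUP BY t.userid
ORDER BY t.userid
"""

pivoted = conn.execute(pivot_sql).fetchall()
for row in pivoted:
    print(row)
```

MAX ignores NULLs, so each column picks up the single matching value per user, and a missing field (the Password for user 2 here) simply comes back as NULL.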

Dynamically create columns sql

You will want to use a PIVOT function for this. If you have a known number of columns, then you can hard-code the values:

    select name, [Bronze], [Silver], [Gold], [Platinum], [AnotherOne]
    from
    (
      select c.name, cr.description, r.typeid
      from customers c
      left join rewards r
        on c.id = r.customerid
      left join customerrewards cr
        on r.typeid = …

SQL Server pivot vs. multiple join

The answer will of course be "it depends", but based on testing this end...

Assuming:

- 1 million products
- product has a clustered index on product_id
- Most (if not all) products have corresponding information in the product_code table
- Ideal indexes present on product_code for both queries.

The PIVOT version ideally needs an index product_code(product_id, type) INCLUDE …

Dynamic alternative to pivot with CASE and GROUP BY

If you have not installed the additional module tablefunc, run this command once per database:

    CREATE EXTENSION tablefunc;

Answer to question

A very basic crosstab solution for your case:

    SELECT * FROM crosstab(
      'SELECT bar, 1 AS cat, feh
       FROM   tbl_org
       ORDER  BY bar, feh')
    AS ct (bar text, val1 int, val2 int, val3 int);

…

One-to-Many SQL SELECT into single row

This is one way to get the result. This approach uses correlated subqueries. Each subquery uses an ORDER BY clause to sort the related rows from table2, and uses the LIMIT clause to retrieve the 1st, 2nd and 3rd rows.

    SELECT a.PKID
         , a.DATA
         , (SELECT b1.U_DATA FROM table2 b1
             WHERE b1.PKID_FROM_TABLE_1 = a.PKID
             ORDER …
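The correlated-subquery approach described here can be sketched end to end with Python's built-in sqlite3 module. The table and column names follow the answer, but the sample data, the output aliases U_DATA_1 to U_DATA_3, and the choice to sort by U_DATA are assumptions, since the original ORDER BY column is cut off. The sketch writes the row selection as LIMIT 1 OFFSET n.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (PKID INTEGER PRIMARY KEY, DATA TEXT);
    CREATE TABLE table2 (PKID_FROM_TABLE_1 INTEGER, U_DATA TEXT);
    INSERT INTO table1 VALUES (1, 'first'), (2, 'second');
    INSERT INTO table2 VALUES (1, 'a'), (1, 'b'), (1, 'c'), (2, 'x');
    """
)

# One correlated subquery per output column. Each sorts the child rows of
# the current parent and picks the 1st, 2nd or 3rd via LIMIT 1 OFFSET n.
# Ordering by U_DATA is an assumption made for this sketch.
query = """
SELECT a.PKID,
       a.DATA,
       (SELECT b1.U_DATA FROM table2 b1
         WHERE b1.PKID_FROM_TABLE_1 = a.PKID
         ORDER BY b1.U_DATA LIMIT 1 OFFSET 0) AS U_DATA_1,
       (SELECT b2.U_DATA FROM table2 b2
         WHERE b2.PKID_FROM_TABLE_1 = a.PKID
         ORDER BY b2.U_DATA LIMIT 1 OFFSET 1) AS U_DATA_2,
       (SELECT b3.U_DATA FROM table2 b3
         WHERE b3.PKID_FROM_TABLE_1 = a.PKID
         ORDER BY b3.U_DATA LIMIT 1 OFFSET 2) AS U_DATA_3
FROM table1 a
ORDER BY a.PKID
"""

flattened = conn.execute(query).fetchall()
for row in flattened:
    print(row)
```

A parent with fewer than three child rows gets NULL in the unused columns, because a scalar subquery that returns no row evaluates to NULL.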