crosstab – Tarik Billa

Dynamic alternative to pivot with CASE and GROUP BY

September 15, 2023 by Tarik

If you have not installed the additional module tablefunc, run this command once per database:

    CREATE EXTENSION tablefunc;

Answer to question

A very basic crosstab solution for your case:

    SELECT * FROM crosstab(
      'SELECT bar, 1 AS cat, feh FROM tbl_org ORDER BY bar, feh'
    ) AS ct (bar text, val1 int, val2 int, val3 int);

… Read more

How is a Pandas crosstab different from a Pandas pivot_table?

August 28, 2023 by Tarik

The main difference between the two is that pivot_table expects your input data to already be a DataFrame: you pass a DataFrame to pivot_table and specify the index/columns/values by passing the column names as strings. With crosstab, you don’t necessarily need a DataFrame going in, as you just pass array-like objects for index/columns/values. … Read more
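A minimal sketch of that difference, using made-up data (the names region, product and units are assumptions for illustration, not taken from the original post). pivot_table is called on a DataFrame with column names, while crosstab is handed plain NumPy arrays:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "product": ["a", "b", "a", "a", "b"],
    "units": [1, 2, 3, 4, 5],
})

# pivot_table: start from the DataFrame and refer to columns by name.
counts_pt = df.pivot_table(index="region", columns="product",
                           values="units", aggfunc="count", fill_value=0)

# crosstab: pass array-likes directly; no DataFrame is needed.
regions = np.array(["north", "north", "south", "south", "south"])
products = np.array(["a", "b", "a", "a", "b"])
counts_ct = pd.crosstab(regions, products,
                        rownames=["region"], colnames=["product"])

print(counts_pt)
print(counts_ct)
```

Both calls count the rows per (region, product) pair here. Note that a plain Python list passed to crosstab is interpreted as a list of several arrays, so a single column of labels is better wrapped in a NumPy array or a Series.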

Create a pivot table with PostgreSQL

August 14, 2023 by Tarik

First compute the average with the aggregate function avg():

    SELECT neighborhood, bedrooms, avg(price)
    FROM   listings
    GROUP  BY 1,2
    ORDER  BY 1,2;

Then feed the result to the crosstab() function as instructed in great detail in this related answer: PostgreSQL Crosstab Query
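The same two-step idea (aggregate first, then pivot) can be sketched in pandas for comparison. This is an analogue, not the PostgreSQL crosstab() call itself, and the listings rows below are invented for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the listings table.
listings = pd.DataFrame({
    "neighborhood": ["downtown", "downtown", "downtown", "riverside", "riverside"],
    "bedrooms": [1, 1, 2, 1, 2],
    "price": [100.0, 200.0, 300.0, 80.0, 120.0],
})

# Step 1: average price per (neighborhood, bedrooms), like GROUP BY 1,2.
avg_price = listings.groupby(["neighborhood", "bedrooms"])["price"].mean()

# Step 2: move the bedrooms level into columns, one column per bedroom count.
pivot = avg_price.unstack("bedrooms")

print(pivot)
```

Combinations that do not occur in the data would come out as NaN after unstack.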

Transpose latest rows per user to columns

March 1, 2023 by Tarik

Use crosstab() from the tablefunc module.

    SELECT * FROM crosstab(
       $$SELECT user_id, user_name, rn, email_address
         FROM  (
            SELECT u.user_id, u.user_name, e.email_address
                 , row_number() OVER (PARTITION BY u.user_id
                                      ORDER BY e.creation_date DESC NULLS LAST) AS rn
            FROM   usr u
            LEFT   JOIN email_tbl e USING (user_id)
            ) sub
         WHERE  rn < 4
         ORDER  BY user_id
       $$
      , … Read more

How to make a pandas crosstab with percentages?

February 6, 2023 by Tarik

From Pandas 0.18.1 onwards, there’s a normalize option:

    In [1]: pd.crosstab(df.A, df.B, normalize="index")
    Out[1]:
    B             A         B         C
    A
    one    0.333333  0.333333  0.333333
    three  0.333333  0.333333  0.333333
    two    0.333333  0.333333  0.333333

You can normalise across either all, index (rows), or columns. More details are available in the documentation.
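A short sketch of the three normalize modes on a small, made-up frame (the values in columns A and B are assumptions chosen for illustration). Multiplying the result by 100 turns the fractions into percentages:

```python
import pandas as pd

# Hypothetical data; only the column names A and B follow the excerpt above.
df = pd.DataFrame({
    "A": ["one", "one", "two", "two", "two", "three"],
    "B": ["x", "y", "x", "x", "y", "y"],
})

by_row = pd.crosstab(df["A"], df["B"], normalize="index")    # each row sums to 1
by_col = pd.crosstab(df["A"], df["B"], normalize="columns")  # each column sums to 1
overall = pd.crosstab(df["A"], df["B"], normalize="all")     # whole table sums to 1

# Express the row-wise shares as percentages, rounded to one decimal.
pct = (by_row * 100).round(1)
print(pct)
```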

Groupby value counts on the dataframe pandas

January 8, 2023 by Tarik

I use groupby and size

    df.groupby(['id', 'group', 'term']).size().unstack(fill_value=0)

Timing

1,000,000 rows

    df = pd.DataFrame(dict(id=np.random.choice(100, 1000000),
                           group=np.random.choice(20, 1000000),
                           term=np.random.choice(10, 1000000)))
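A runnable sketch of the groupby/size/unstack pattern on a smaller frame, with pd.crosstab alongside as a cross-check. The seeded generator and the reduced row count of 1,000 are assumptions made here to keep the example quick and repeatable; they are not part of the original timing setup:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded for repeatability (assumption)
n = 1_000                       # far fewer rows than the 1,000,000 timed above

df = pd.DataFrame({
    "id": rng.choice(100, n),
    "group": rng.choice(20, n),
    "term": rng.choice(10, n),
})

# Count rows per (id, group, term), then spread term across the columns.
counts = df.groupby(["id", "group", "term"]).size().unstack(fill_value=0)

# The same table built with crosstab: two row keys, one column key.
alt = pd.crosstab([df["id"], df["group"]], df["term"])

print(counts.head())
```

Both tables have one row per observed (id, group) pair and one column per observed term, so their cell counts should agree.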

PostgreSQL Crosstab Query

October 5, 2022 by Tarik

Install the additional module tablefunc once per database, which provides the function crosstab(). Since Postgres 9.1 you can use CREATE EXTENSION for that:

    CREATE EXTENSION IF NOT EXISTS tablefunc;

Improved test case

    CREATE TABLE tbl (
       section   text
     , status    text
     , ct        integer  -- "count" is a reserved word in standard SQL
    );

    INSERT … Read more

MySQL – Rows to Columns

October 5, 2022 by Tarik

I’m going to add a somewhat longer and more detailed explanation of the steps to take to solve this problem. I apologize if it’s too long. I’ll start out with the base you’ve given and use it to define a couple of terms that I’ll use for the rest of this post. This will be … Read more