aggregate-functions – Page 3

Spark SQL replacement for MySQL’s GROUP_CONCAT aggregate function

June 5, 2023 by Tarik

Before you proceed: This operation is yet another groupByKey. While it has multiple legitimate applications, it is relatively expensive, so be sure to use it only when required. It is not exactly a concise or efficient solution, but you can use UserDefinedAggregateFunction, introduced in Spark 1.5.0: object GroupConcat extends UserDefinedAggregateFunction { def inputSchema = new StructType().add("x", StringType) … Read more
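To make the intended result concrete, here is a minimal plain-Python sketch of what a GROUP_CONCAT-style aggregation produces. It is not Spark code and not the post's UserDefinedAggregateFunction; the sample rows, the comma separator, and the variable names are assumptions chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical (key, value) rows standing in for a two-column table.
rows = [("a", "x"), ("b", "y"), ("a", "z")]

# Collect every value under its key, preserving input order.
grouped = defaultdict(list)
for key, value in rows:
    grouped[key].append(value)

# Join each group's values into one comma-separated string.
concatenated = {key: ",".join(values) for key, values in grouped.items()}
```

Like groupByKey, this has to hold every value of a group in memory before joining, which is one reason the operation is relatively expensive.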

Return multiple columns of the same row as JSON array of objects

May 26, 2023 by Tarik

Use json_build_object() in Postgres 9.4 or newer, or jsonb_build_object() to return jsonb. SELECT value_two, json_agg(json_build_object('value_three', value_three, 'value_four', value_four)) AS value_four FROM mytable GROUP BY value_two; The manual: Builds a JSON object out of a variadic argument list. By convention, the argument list consists of alternating keys and values. For any version (incl. Postgres 9.3) … Read more

PostgreSQL: running count of rows for a query ‘by minute’

May 20, 2023 by Tarik

Return only minutes with activity. Shortest: SELECT DISTINCT date_trunc('minute', "when") AS minute, count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct FROM mytable ORDER BY 1; Use date_trunc(); it returns exactly what you need. Don't include id in the query, since you want to GROUP BY minute slices. count() is typically used as plain aggregate … Read more
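The running count per minute can also be sketched outside the database. The following is a minimal Python illustration of the same idea (truncate each timestamp to the minute, count per minute, then accumulate). The sample events are hypothetical, and this is not a replacement for the window-function query.

```python
from collections import Counter
from datetime import datetime
from itertools import accumulate

# Hypothetical event timestamps (the "when" column in the query).
events = [
    datetime(2023, 5, 20, 12, 0, 5),
    datetime(2023, 5, 20, 12, 0, 40),
    datetime(2023, 5, 20, 12, 2, 10),
]

# Truncate to the minute, mirroring date_trunc('minute', "when").
per_minute = Counter(ts.replace(second=0, microsecond=0) for ts in events)

# Accumulate the per-minute counts in chronological order.
minutes = sorted(per_minute)
running = list(zip(minutes, accumulate(per_minute[m] for m in minutes)))
```

As in the query, only minutes with at least one event appear in the result; the minute 12:01 is absent here.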

LINQ aggregate and group by periods of time

May 17, 2023 by Tarik

You could round the timestamp down to the previous boundary (i.e. the closest 5-minute boundary in the past) and use that as your grouping: var groups = series.GroupBy(x => { var stamp = x.timestamp; stamp = stamp.AddMinutes(-(stamp.Minute % 5)); stamp = stamp.AddMilliseconds(-stamp.Millisecond - 1000 * stamp.Second); return stamp; }) .Select(g => new … Read more
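The same bucketing idea, sketched in Python rather than C#/LINQ: floor each timestamp to its 5-minute boundary and group on the result. The helper name floor_to_bucket, the sample series, and the choice of an average as the aggregate are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def floor_to_bucket(stamp: datetime, minutes: int = 5) -> datetime:
    """Round a timestamp down to the start of its N-minute bucket."""
    return stamp - timedelta(
        minutes=stamp.minute % minutes,
        seconds=stamp.second,
        microseconds=stamp.microsecond,
    )


# Hypothetical (timestamp, value) pairs.
series = [
    (datetime(2023, 5, 17, 10, 1, 30), 2.0),
    (datetime(2023, 5, 17, 10, 4, 59), 4.0),
    (datetime(2023, 5, 17, 10, 5, 0), 6.0),
]

# Group values by their bucket start.
groups = defaultdict(list)
for stamp, value in series:
    groups[floor_to_bucket(stamp)].append(value)

# Aggregate each bucket; here, a simple average.
averages = {bucket: sum(vals) / len(vals) for bucket, vals in groups.items()}
```

A timestamp that falls exactly on a boundary, such as 10:05:00, starts a new bucket rather than closing the previous one.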

How to SUM and SUBTRACT using SQL?

May 17, 2023 by Tarik

I think this is what you're looking for. NEW_BAL is the sum of QTYs subtracted from the balance: SELECT master_table.ORDERNO, master_table.ITEM, SUM(master_table.QTY), stock_bal.BAL_QTY, (stock_bal.BAL_QTY - SUM(master_table.QTY)) AS NEW_BAL FROM master_table INNER JOIN stock_bal ON master_table.ITEM = stock_bal.ITEM GROUP BY master_table.ORDERNO, master_table.ITEM If you want to update the item balance with the new balance, use the … Read more
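A minimal way to try the query is an in-memory SQLite database driven from Python. The table and column names come from the query above; the column types, the sample rows, the TOTAL_QTY alias, and the extra BAL_QTY in GROUP BY (which stricter SQL dialects require) are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master_table (ORDERNO INTEGER, ITEM TEXT, QTY INTEGER);
    CREATE TABLE stock_bal (ITEM TEXT, BAL_QTY INTEGER);
    -- Hypothetical sample data.
    INSERT INTO master_table VALUES (1, 'A', 3), (1, 'A', 2), (2, 'B', 4);
    INSERT INTO stock_bal VALUES ('A', 10), ('B', 7);
""")

# Sum the ordered quantity per order and item, then subtract it from the balance.
rows = conn.execute("""
    SELECT m.ORDERNO, m.ITEM, SUM(m.QTY) AS TOTAL_QTY, s.BAL_QTY,
           s.BAL_QTY - SUM(m.QTY) AS NEW_BAL
    FROM master_table AS m
    INNER JOIN stock_bal AS s ON m.ITEM = s.ITEM
    GROUP BY m.ORDERNO, m.ITEM, s.BAL_QTY
    ORDER BY m.ORDERNO
""").fetchall()
```

Each result row holds the order number, item, summed quantity, current balance, and the balance after subtraction.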

DISTINCT ON in an aggregate function in postgres

May 11, 2023 by Tarik

The simplest thing I discovered is to use DISTINCT over jsonb (not json!). (jsonb_build_object creates jsonb objects.) SELECT JSON_AGG( DISTINCT jsonb_build_object('tag_id', photo_tag.tag_id, 'name', tag.name)) AS tags FROM photo LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id GROUP BY … Read more

No non-missing arguments warning when using min or max in reshape2

May 8, 2023 by Tarik

How can I use SUM for bit columns?

May 5, 2023 by Tarik

SELECT SUM(CAST(bitColumn AS INT)) FROM dbo.MyTable You need to cast the bit column to a number before summing it. Another solution is to count the rows where the bit is set: SELECT COUNT(*) FROM dbo.MyTable WHERE bitColumn = 1
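Both approaches can be compared with a small sketch using an in-memory SQLite database from Python. SQLite has no bit type, so an INTEGER column holding 0 or 1 stands in for it here; the table contents are hypothetical, and the dbo schema prefix is dropped because it is specific to SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (bitColumn INTEGER)")
# Hypothetical sample rows: three set bits and one cleared bit.
conn.executemany("INSERT INTO MyTable VALUES (?)", [(1,), (0,), (1,), (1,)])

# Approach 1: cast the flag to an integer and sum it.
sum_cast = conn.execute(
    "SELECT SUM(CAST(bitColumn AS INTEGER)) FROM MyTable"
).fetchone()[0]

# Approach 2: count only the rows where the flag is set.
count_ones = conn.execute(
    "SELECT COUNT(*) FROM MyTable WHERE bitColumn = 1"
).fetchone()[0]
```

The two differ on an empty table: SUM returns NULL while COUNT(*) returns 0, which may matter when choosing between them.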

in postgres select, return a column subquery as an array?

May 2, 2023 by Tarik

Use the aggregate function: select usr_id, name, array_agg(tag_id) as tag_arr from users join tags using(usr_id) group by usr_id, name or an array constructor from the results of a subquery: select u.usr_id, name, array( select tag_id from tags t where t.usr_id = u.usr_id ) as tag_arr from users u The second option is a simple one-source … Read more