Pass percentiles to pandas agg function

Perhaps not super efficient, but one way would be to create a function yourself:

```python
def percentile(n):
    def percentile_(x):
        return np.percentile(x, n)
    percentile_.__name__ = 'percentile_%s' % n
    return percentile_
```

Then include this in your agg:

```python
In [11]: column.agg([np.sum, np.mean, np.std, np.median, np.var,
                     np.min, np.max, percentile(50), percentile(95)])
Out[11]:
sum  mean  std  median  var  amin  amax  percentile_50  percentile_95
```

… Read more
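The helper above can be exercised end to end. This is a minimal sketch on a toy DataFrame; the `df`, `group`, and `value` names are illustrative, not from the original answer:

```python
import numpy as np
import pandas as pd

def percentile(n):
    """Return an aggregator computing the n-th percentile, with a readable name."""
    def percentile_(x):
        return np.percentile(x, n)
    percentile_.__name__ = 'percentile_%s' % n
    return percentile_

# hypothetical sample data
df = pd.DataFrame({'group': ['a', 'a', 'a', 'b', 'b'],
                   'value': [1, 2, 3, 10, 20]})

# mix a built-in aggregator name with the custom percentile function
result = df.groupby('group')['value'].agg(['mean', percentile(50)])
print(result)
```

The `__name__` assignment is what makes the output column read `percentile_50` instead of the generic `percentile_`.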

Collapse / concatenate / aggregate a column to a single comma separated string within each group

Here are some options using toString, a function that concatenates a vector of strings using comma and space to separate components. If you don't want commas, you can use paste() with the collapse argument instead.

data.table

```r
# alternative using data.table
library(data.table)
as.data.table(data)[, toString(C), by = list(A, B)]
```

aggregate

This uses no packages:

```r
# alternative using
```

… Read more
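For readers working in Python rather than R, the same collapse can be sketched with pandas. This analogue is not part of the original answer; the column names `A`, `B`, `C` mirror the R example and the data is hypothetical:

```python
import pandas as pd

# hypothetical data with grouping columns A, B and a value column C
data = pd.DataFrame({'A': [1, 1, 2, 2],
                     'B': ['x', 'x', 'y', 'y'],
                     'C': ['p', 'q', 'r', 's']})

# collapse C to one comma-separated string per (A, B) group,
# analogous to toString(C) in the data.table version
out = data.groupby(['A', 'B'])['C'].agg(', '.join).reset_index()
print(out)
```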

ListAGG in SQLSERVER

MySQL

```sql
SELECT FieldA
     , GROUP_CONCAT(FieldB ORDER BY FieldB SEPARATOR ',') AS FieldBs
FROM TableName
GROUP BY FieldA
ORDER BY FieldA;
```

Oracle & DB2

```sql
SELECT FieldA
     , LISTAGG(FieldB, ',') WITHIN GROUP (ORDER BY FieldB) AS FieldBs
FROM TableName
GROUP BY FieldA
ORDER BY FieldA;
```

PostgreSQL

```sql
SELECT FieldA
     , STRING_AGG(FieldB, ',' ORDER BY FieldB) AS FieldBs
FROM
```

… Read more
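The same grouped-concatenation pattern can be tried locally without a database server. This sketch uses SQLite's GROUP_CONCAT (an analogue not covered in the excerpt above) through Python's built-in sqlite3 module; the table and column names reuse those from the answer:

```python
import sqlite3

# in-memory SQLite database with the answer's TableName schema
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE TableName (FieldA TEXT, FieldB TEXT)")
conn.executemany("INSERT INTO TableName VALUES (?, ?)",
                 [('g1', 'b'), ('g1', 'a'), ('g2', 'c')])

# SQLite's GROUP_CONCAT plays the role of LISTAGG / STRING_AGG
rows = conn.execute(
    "SELECT FieldA, GROUP_CONCAT(FieldB, ',') AS FieldBs "
    "FROM TableName GROUP BY FieldA ORDER BY FieldA").fetchall()
print(rows)
```

Note that unlike LISTAGG and STRING_AGG, SQLite's GROUP_CONCAT does not guarantee the order of the concatenated elements.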

Extract row corresponding to minimum value of a variable by group

Slightly more elegant:

```r
library(data.table)
DT[ , .SD[which.min(Employees)], by = State]

   State Company Employees
1:    AK       D        24
2:    RI       E        19
```

Slightly less elegant than using .SD, but a bit faster (for data with many groups):

```r
DT[DT[ , .I[which.min(Employees)], by = State]$V1]
```

Also, just replace the expression which.min(Employees) with Employees == min(Employees), if your data … Read more
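For comparison, the same "row with the group minimum" extraction can be sketched in pandas using idxmin. This analogue is my addition, not part of the original R answer; the data is hypothetical but mirrors the State/Company/Employees columns shown above:

```python
import pandas as pd

# hypothetical data mirroring the R example's columns
DT = pd.DataFrame({'State': ['AK', 'AK', 'RI', 'RI'],
                   'Company': ['C', 'D', 'E', 'F'],
                   'Employees': [30, 24, 19, 25]})

# idxmin gives the row label of the minimum Employees within each State,
# analogous to DT[, .SD[which.min(Employees)], by = State]
out = DT.loc[DT.groupby('State')['Employees'].idxmin()]
print(out)
```

Like which.min, idxmin returns a single row per group even when the minimum is tied.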

Pandas sum by groupby, but exclude certain columns

You can select the columns of a groupby:

```python
In [11]: df.groupby(['Country', 'Item_Code'])[['Y1961', 'Y1962', 'Y1963']].sum()
Out[11]:
                       Y1961  Y1962  Y1963
Country     Item_Code
Afghanistan 15            10     20     30
            25            10     20     30
Angola      15            30     40     50
            25            30     40     50
```

Note that the list passed must be a subset of the columns, otherwise you'll see a KeyError.
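A self-contained sketch of the same technique; the frame below is hypothetical, with an extra `Flag` column standing in for whatever you want excluded from the sum:

```python
import pandas as pd

# hypothetical frame: two year columns to sum, one column to exclude
df = pd.DataFrame({'Country': ['Afghanistan', 'Afghanistan', 'Angola'],
                   'Item_Code': [15, 15, 15],
                   'Y1961': [10, 5, 30],
                   'Y1962': [20, 10, 40],
                   'Flag': [1, 2, 3]})

# selecting columns on the groupby keeps 'Flag' out of the result entirely
out = df.groupby(['Country', 'Item_Code'])[['Y1961', 'Y1962']].sum()
print(out)
```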

Pandas aggregate count distinct

How about either of:

```python
>>> df
         date  duration user_id
0  2013-04-01        30    0001
1  2013-04-01        15    0001
2  2013-04-01        20    0002
3  2013-04-02        15    0002
4  2013-04-02        30    0002

>>> df.groupby("date").agg({"duration": np.sum, "user_id": pd.Series.nunique})
            duration  user_id
date
2013-04-01        65        2
2013-04-02        45        1

>>> df.groupby("date").agg({"duration": np.sum, "user_id": lambda x: x.nunique()})
            duration  user_id
date
2013-04-01        65
```

… Read more
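A runnable version of the same idea; note this sketch uses the string aggregator names `'sum'` and `'nunique'` (equivalent to the `np.sum` / `pd.Series.nunique` callables above, and friendlier to newer pandas versions):

```python
import pandas as pd

# data matching the excerpt above
df = pd.DataFrame({'date': ['2013-04-01', '2013-04-01', '2013-04-01',
                            '2013-04-02', '2013-04-02'],
                   'duration': [30, 15, 20, 15, 30],
                   'user_id': ['0001', '0001', '0002', '0002', '0002']})

# total duration per date, plus the count of distinct users
out = df.groupby('date').agg({'duration': 'sum', 'user_id': 'nunique'})
print(out)
```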

Error!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)