Count unique values per groups with Pandas [duplicate]

You need nunique:

    df = df.groupby('domain')['ID'].nunique()
    print(df)

    domain
    'facebook.com'    1
    'google.com'      1
    'twitter.com'     2
    'vk.com'          3
    Name: ID, dtype: int64

If you need to strip the ' characters:

    df = df.ID.groupby([df.domain.str.strip("'")]).nunique()
    print(df)

    domain
    facebook.com    1
    google.com      1
    twitter.com     2
    vk.com          3
    Name: ID, dtype: int64

Or, as Jon Clements commented:

    df.groupby(df.domain.str.strip("'"))['ID'].nunique()

You can retain …
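A minimal, self-contained sketch of the nunique approach. The sample rows are invented for illustration, since the question's data is not shown here; only the column names domain and ID come from the answer.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
df = pd.DataFrame({
    "ID": [123, 123, 456, 456, 789, 132],
    "domain": ["vk.com", "vk.com", "vk.com",
               "twitter.com", "twitter.com", "google.com"],
})

# Count distinct IDs within each domain; duplicates inside a group count once.
counts = df.groupby("domain")["ID"].nunique()
print(counts)
```

The result is a Series indexed by domain, so counts["vk.com"] gives the number of distinct IDs seen for that domain.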

GroupBy pandas DataFrame and select most common value

Pandas >= 0.16: pd.Series.mode is available!

Use groupby, GroupBy.agg, and apply the pd.Series.mode function to each group:

    source.groupby(['Country','City'])['Short name'].agg(pd.Series.mode)

    Country  City
    Russia   Sankt-Petersburg    Spb
    USA      New-York             NY
    Name: Short name, dtype: object

If this is needed as a DataFrame, use

    source.groupby(['Country','City'])['Short name'].agg(pd.Series.mode).to_frame()

                             Short name
    Country City
    Russia  Sankt-Petersburg        Spb
    USA     New-York                 NY

The useful thing …
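A runnable sketch of the same call, assuming a small made-up source frame. The column names match the answer, but the rows are hypothetical and were chosen so that every group has exactly one most common value.

```python
import pandas as pd

# Hypothetical rows; each (Country, City) group has a single clear mode.
source = pd.DataFrame({
    "Country": ["USA", "USA", "USA", "Russia", "Russia"],
    "City": ["New-York", "New-York", "New-York",
             "Sankt-Petersburg", "Sankt-Petersburg"],
    "Short name": ["NY", "New", "NY", "Spb", "Spb"],
})

# Apply pd.Series.mode to the 'Short name' values of each group.
most_common = source.groupby(["Country", "City"])["Short name"].agg(pd.Series.mode)
print(most_common)
```

Series.mode returns every value that ties for most frequent, so groups with ties need extra handling; this sketch deliberately avoids them.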

Python group by

Do it in 2 steps. First, create a dictionary.

    >>> input = [('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'),
    ...          ('5349618', 'ETH'), ('11788544', 'NOT'), ('962142', 'ETH'),
    ...          ('7795297', 'ETH'), ('7341464', 'ETH'), ('9843236', 'KAT'),
    ...          ('5594916', 'ETH'), ('1550003', 'ETH')]
    >>> from collections import defaultdict
    >>> res = defaultdict(list)
    >>> for v, k in input: res[k].append(v)
    ...

Then, convert that dictionary …
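The first step can be run as a plain script. This sketch covers only the dictionary-building step, using the data from the answer; the variable name pairs is my own choice, used to avoid shadowing the built-in input.

```python
from collections import defaultdict

# (value, key) pairs taken from the answer; 'pairs' is a hypothetical name.
pairs = [('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'),
         ('5349618', 'ETH'), ('11788544', 'NOT'), ('962142', 'ETH'),
         ('7795297', 'ETH'), ('7341464', 'ETH'), ('9843236', 'KAT'),
         ('5594916', 'ETH'), ('1550003', 'ETH')]

# defaultdict(list) creates an empty list the first time a key is seen.
res = defaultdict(list)
for value, key in pairs:
    res[key].append(value)

print(dict(res))
```

Each key maps to its values in the order they appeared in the input list.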

SQL query to group by day

If you're using SQL Server,

    dateadd(DAY, 0, datediff(day, 0, created))

returns the day on which the row was created, with the time part set to midnight. For example, if the sale was created on '2009-11-02 06:12:55.000', the expression returns '2009-11-02 00:00:00.000'.

    select sum(amount) as total, dateadd(DAY, 0, datediff(day, 0, created)) as created
    from sales
    group by dateadd(DAY, 0, datediff(day, 0, created))
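To try the same idea without a SQL Server instance, here is a sketch that uses Python's built-in sqlite3 module. This is SQLite, not SQL Server, so it swaps the dateadd/datediff expression for SQLite's date() function. The sales table layout and its rows are assumptions made up for the example.

```python
import sqlite3

# In-memory SQLite database; table layout and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount INTEGER, created TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        (10, "2009-11-02 06:12:55.000"),
        (20, "2009-11-02 18:30:00.000"),
        (5, "2009-11-03 09:00:00.000"),
    ],
)

# date(created) drops the time part, so rows from the same day share a group.
rows = conn.execute(
    "SELECT date(created) AS day, SUM(amount) AS total "
    "FROM sales GROUP BY date(created) ORDER BY day"
).fetchall()
print(rows)
conn.close()
```

The grouping expression is repeated in GROUP BY rather than referenced by its alias, mirroring the structure of the SQL Server query.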

Naming returned columns in Pandas aggregate function? [duplicate]

For pandas >= 0.25

The functionality to name returned aggregate columns has been reintroduced in the master branch and is targeted for pandas 0.25. The new syntax is .agg(new_col_name=('col_name', 'agg_func')). Detailed example from the PR linked above:

    In [2]: df = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
       ...:                    'height': [9.1, 6.0, 9.5, 34.0],
       ...:                    'weight': [7.9, …
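A short sketch of the named-aggregation syntax, assuming pandas 0.25 or later. It uses only the kind and height columns quoted in the excerpt; the output column names min_height and max_height are my own choices.

```python
import pandas as pd

# 'kind' and 'height' values as quoted in the excerpt.
df = pd.DataFrame({
    "kind": ["cat", "dog", "cat", "dog"],
    "height": [9.1, 6.0, 9.5, 34.0],
})

# Each keyword names an output column: new_name=(source_column, function).
summary = df.groupby("kind").agg(
    min_height=("height", "min"),
    max_height=("height", "max"),
)
print(summary)
```

The keyword names become the column labels of the result, so no renaming step is needed afterwards.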

SQL – using alias in Group By

SQL is implemented as if a query was executed in the following order:

1. FROM clause
2. WHERE clause
3. GROUP BY clause
4. HAVING clause
5. SELECT clause
6. ORDER BY clause

For most relational database systems, this order explains which names (columns or aliases) are valid because they must have been introduced in a previous step. So in Oracle …

How to access pandas groupby dataframe by key

You can use the get_group method:

    In [21]: gb.get_group('foo')
    Out[21]:
         A         B   C
    0  foo  1.624345   5
    2  foo -0.528172  11
    4  foo  0.865408  14

Note: This doesn't require creating an intermediary dictionary / copy of every subdataframe for every group, so will be much more memory-efficient than creating the naive dictionary with dict(iter(gb)). This …
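A self-contained sketch of get_group. The frame is hypothetical: the excerpt does not show how gb was built, so the column names A, B, C follow the printed output and the values are made up.

```python
import pandas as pd

# Hypothetical frame; column names follow the output shown in the answer.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "B": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "C": [5, 7, 11, 9, 14, 3],
})

gb = df.groupby("A")

# Fetch only the rows whose key is 'foo', without materializing other groups.
foo = gb.get_group("foo")
print(foo)
```

The returned sub-frame keeps the original row labels (0, 2, 4 here), which makes it easy to trace rows back to the source frame.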
