Pandas, for each unique value in one column, get unique values in another column

Question

Here are two strategies to do it. No doubt, there are other ways.

Assuming your dataframe looks something like this (obviously with more columns):

df = pd.DataFrame({'author':['a', 'a', 'b'], 'subreddit':['sr1', 'sr2', 'sr2']})

>>> df
  author subreddit
0      a       sr1
1      a       sr2
2      b       sr2
...

SOLUTION 1: groupby

More straightforward than solution 2, and similar to your first attempt:

group = df.groupby('author')

df2 = group.apply(lambda x: x['subreddit'].unique())

# Alternatively, same thing as a one liner:
# df2 = df.groupby('author').apply(lambda x: x['subreddit'].unique())

Result:

>>> df2
author
a    [sr1, sr2]
b         [sr2]

The author is the index, and the single column is the list of all subreddits they are active in (this is how I interpreted how you wanted your output, according to your description).

If you wanted the subreddits each in a separate column, which might be more useable, depending on what you want to do with it, you could just do this after:

df2 = df2.apply(pd.Series)

Result:

>>> df2
          0    1
author          
a       sr1  sr2
b       sr2  NaN

Solution 2: Iterate through dataframe

you can make a new dataframe with all unique authors:

df2 = pd.DataFrame({'author':df.author.unique()})

And then just get the list of all unique subreddits they are active in, assigning it to a new column:

df2['subreddits'] = [list(set(df['subreddit'].loc[df['author'] == x['author']])) 
    for _, x in df2.iterrows()]

This gives you this:

>>> df2
  author  subreddits
0      a  [sr2, sr1]
1      b       [sr2]

Leave a Comment Cancel reply