### Two major differences between `apply` and `transform`

There are two major differences between the `transform` and `apply` groupby methods.

**Input**:

- `apply` implicitly passes all the columns for each group as a **DataFrame** to the custom function.
- `transform` passes each column for each group individually as a **Series** to the custom function.

**Output**:

- The custom function passed to `apply` can return a scalar, or a Series or DataFrame (or numpy array or even list).
- The custom function passed to `transform` must return a sequence (a one-dimensional Series, array or list) **the same length as the group**.

So, `transform` works on just one Series at a time and `apply` works on the entire DataFrame at once.
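To make the output difference concrete, here is a minimal sketch using the same `State`/`a`/`b` data as the examples in this answer. The two lambdas are illustrative choices, not part of the original discussion: the `apply` function combines both columns into one number per group, while the `transform` function centers each column on its group mean and therefore returns one value per row.

```python
import pandas as pd

df = pd.DataFrame({'State': ['Texas', 'Texas', 'Florida', 'Florida'],
                   'a': [4, 5, 1, 3], 'b': [6, 10, 3, 11]})
grouped = df.groupby('State')[['a', 'b']]

# apply: each call receives a whole group as a DataFrame, so it can use
# both columns; returning a scalar gives one value per group
per_group = grouped.apply(lambda g: (g['a'] * g['b']).sum())
print(per_group)

# transform: the function must work on one column (Series) at a time and
# return something the same length as the group
per_row = grouped.transform(lambda s: s - s.mean())
print(per_row)
```

`per_group` has one entry per state, while `per_row` keeps the same four rows as `df`.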

### Inspecting the custom function

It can help quite a bit to inspect the input to your custom function passed to `apply` or `transform`.
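Besides the exception-raising approach used in the examples, a non-raising variant can simply record what it receives. This is a sketch; the helper names `record_apply` and `record_transform` are hypothetical. One caveat, taken from the pandas documentation for `DataFrameGroupBy.transform`: the function must work column by column, but if it also works on the whole group DataFrame, pandas may use a "fast path" that passes DataFrames starting from the second group. So the exact list recorded for `transform` can vary, although in current pandas releases the first call receives a Series.

```python
import pandas as pd

df = pd.DataFrame({'State': ['Texas', 'Texas', 'Florida', 'Florida'],
                   'a': [4, 5, 1, 3], 'b': [6, 10, 3, 11]})

seen_by_apply = []
seen_by_transform = []

def record_apply(x):
    # hypothetical helper: remember the type, then return a valid scalar
    seen_by_apply.append(type(x))
    return len(x)

def record_transform(x):
    # hypothetical helper: remember the type, then return the input unchanged,
    # which is trivially the same length as the group
    seen_by_transform.append(type(x))
    return x

df.groupby('State')[['a', 'b']].apply(record_apply)
df.groupby('State')[['a', 'b']].transform(record_transform)

print([t.__name__ for t in seen_by_apply])
print([t.__name__ for t in seen_by_transform])
```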

### Examples

Let’s create some sample data and inspect the groups so that you can see what I am talking about:

```
import pandas as pd
import numpy as np

df = pd.DataFrame({'State': ['Texas', 'Texas', 'Florida', 'Florida'],
                   'a': [4, 5, 1, 3], 'b': [6, 10, 3, 11]})
print(df)
#      State  a   b
# 0    Texas  4   6
# 1    Texas  5  10
# 2  Florida  1   3
# 3  Florida  3  11
```
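Iterating over the groupby object is another quick way to inspect the groups, since it yields each group key together with that group's rows as a DataFrame. A short sketch (the `groups` dictionary is only there so the result can be checked):

```python
import pandas as pd

df = pd.DataFrame({'State': ['Texas', 'Texas', 'Florida', 'Florida'],
                   'a': [4, 5, 1, 3], 'b': [6, 10, 3, 11]})

groups = {}
# Each iteration yields (group key, sub-DataFrame); keys are sorted by default
for name, group in df.groupby('State'):
    print(name)
    print(group, end='\n\n')
    groups[name] = group
```

Note that each sub-DataFrame keeps the original row labels (2 and 3 for Florida), which is why those labels reappear in the `apply` output further down.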

Let’s create a simple custom function that prints out the type of the implicitly passed object and then raises an exception so that execution can be stopped.

```
def inspect(x):
    print(type(x))
    # A bare `raise` with no active exception raises RuntimeError,
    # which halts execution
    raise
```

Now let’s pass this function to both the groupby `apply` and `transform` methods to see what object is passed to it:

```
df.groupby('State').apply(inspect)
# <class 'pandas.core.frame.DataFrame'>
# <class 'pandas.core.frame.DataFrame'>
# RuntimeError
```

As you can see, a DataFrame is passed into the `inspect` function. You might be wondering why the type, DataFrame, got printed out twice. In the pandas version used here, pandas runs the first group twice to determine whether there is a fast way to complete the computation. This is a minor detail that you shouldn’t worry about, and other pandas versions may print the type only once.

Now, let’s do the same thing with `transform`:

```
df.groupby('State').transform(inspect)
# <class 'pandas.core.series.Series'>
# <class 'pandas.core.series.Series'>
# RuntimeError
```

It is passed a Series, a totally different pandas object.

So, `transform` is only allowed to work with a single Series at a time; the custom function cannot act on two columns at once. If we try to subtract column `b` from column `a` inside our custom function, we get an error with `transform`. See below:

```
def subtract_two(x):
    return x['a'] - x['b']

df.groupby('State').transform(subtract_two)
# KeyError: ('a', 'occurred at index a')
```

We get a KeyError because `x['a']` looks up the label `a` in the index of the Series that was passed in, and no such label exists. You can complete this operation with `apply`, as it has access to the entire DataFrame:

```
df.groupby('State').apply(subtract_two)
# State
# Florida  2   -2
#          3   -8
# Texas    0   -2
#          1   -5
# dtype: int64
```

The output is a Series, and it is a little confusing because the original row index is kept alongside the group labels, but we do have access to all the columns.

### Displaying the passed pandas object

It can help even more to display the entire pandas object within the custom function, so you can see exactly what you are operating on. You can use `print` statements, but I like to use the `display` function from the `IPython.display` module so that the DataFrames are rendered nicely as HTML in a Jupyter notebook:

```
from IPython.display import display

def subtract_two(x):
    display(x)  # render each group (as HTML in Jupyter)
    return x['a'] - x['b']

df.groupby('State').apply(subtract_two)
```


### `transform` must return a one-dimensional sequence the same size as the group

The other difference is that `transform` must return a one-dimensional sequence the same size as the group. In this particular instance, each group has two rows, so `transform` must return a sequence of length two. If it does not, an error is raised:

```
def return_three(x):
    return np.array([1, 2, 3])

df.groupby('State').transform(return_three)
# ValueError: transform must return a scalar value for each group
```

The error message is not really descriptive of the problem. You must return a sequence the same length as the group. So, a function like this would work:

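```
def rand_group_len(x):
    # one random number per row in the group
    return np.random.rand(len(x))

df.groupby('State').transform(rand_group_len)
# values are random, so your output will differ:
#           a         b
# 0  0.962070  0.151440
# 1  0.440956  0.782176
# 2  0.642218  0.483257
# 3  0.056047  0.238208
```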
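Because `rand_group_len` produces random numbers, here is a deterministic sketch of the same rule: a running total within each group. `cumsum` returns a Series the same length as its input, so it satisfies `transform`. (pandas also offers this directly as `df.groupby('State').cumsum()`; the lambda is used here only to illustrate the length requirement.)

```python
import pandas as pd

df = pd.DataFrame({'State': ['Texas', 'Texas', 'Florida', 'Florida'],
                   'a': [4, 5, 1, 3], 'b': [6, 10, 3, 11]})

# cumsum keeps one value per row, so each result has the same length as its group
running = df.groupby('State').transform(lambda s: s.cumsum())
print(running)
```

The grouping column `State` is left out of the result, and the rows come back in the original order of `df`.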

### Returning a single scalar object also works for `transform`

If you return just a single scalar from your custom function, then `transform` will use it for each of the rows in the group:

```
def group_sum(x):
    return x.sum()

df.groupby('State').transform(group_sum)
#    a   b
# 0  9  16
# 1  9  16
# 2  4  14
# 3  4  14
```
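Broadcasting a group scalar back to every row is what makes `transform` handy for row-level comparisons against a group statistic. As a sketch (the column name `a_share` is made up for this example), the code below divides each value of `a` by its group's total. For contrast, passing the same `group_sum` function to `apply` collapses each group to a single row instead.

```python
import pandas as pd

df = pd.DataFrame({'State': ['Texas', 'Texas', 'Florida', 'Florida'],
                   'a': [4, 5, 1, 3], 'b': [6, 10, 3, 11]})

def group_sum(x):
    return x.sum()

# apply: one row of column totals per group
collapsed = df.groupby('State')[['a', 'b']].apply(group_sum)
print(collapsed)

# transform: each group's total repeated on every row of that group,
# so it aligns with df and can be used in row-wise arithmetic
group_total = df.groupby('State')['a'].transform(group_sum)
df['a_share'] = df['a'] / group_total
print(df)
```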