In pandas, is inplace = True considered harmful, or not?
Yes, it is. Not just harmful. Quite harmful. This GitHub issue proposes that the inplace argument be deprecated API-wide sometime in the near future. In a nutshell, here’s everything wrong with the inplace argument:
- inplace, contrary to what the name implies, often does not prevent copies from being created, and (almost) never offers any performance benefits
- inplace does not work with method chaining
- inplace can lead to the dreaded SettingWithCopyWarning when called on a DataFrame column, and may sometimes fail to update the column in-place
The pain points above are all common pitfalls for beginners, so removing this option will simplify the API greatly.
Let's take a look at the points above in more depth.
Performance
It is a common misconception that using inplace=True will lead to more efficient or optimized code. In general, there are no performance benefits to using inplace=True (but there are rare exceptions which are mostly a result of implementation detail in the library and should not be used as a crutch to advocate for this argument’s usage). Most in-place and out-of-place versions of a method create a copy of the data anyway, with the in-place version automatically assigning the copy back. The copy cannot be avoided.
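To make the "assigning the copy back" point concrete, here is a minimal sketch comparing the two styles. The DataFrame and the choice of sort_values are illustrative assumptions, not taken from the GitHub issue, and the snippet only shows that both styles end with the same data. It does not measure memory or speed.

```python
import pandas as pd

# Illustrative data (an assumption for this sketch).
df = pd.DataFrame({"a": [3, 1, 2], "b": ["x", "y", "z"]})

# Out-of-place: returns a new, sorted DataFrame and leaves df untouched.
sorted_copy = df.sort_values("a")

# In-place: the call returns None; the sorted data ends up on the same object.
inplace_df = df.copy()
return_value = inplace_df.sort_values("a", inplace=True)

# Both routes produce identical data, so inplace=True buys nothing in terms of the result.
same_result = sorted_copy.equals(inplace_df)
print(return_value, same_result)
```

Since the two results are interchangeable, the plain assignment form (df = df.sort_values("a")) is usually the clearer choice.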
Method Chaining
inplace=True also hinders method chaining. Compare
result = df.some_function1().reset_index().some_function2()
with
temp = df.some_function1()
temp.reset_index(inplace=True)
result = temp.some_function2()
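The some_function1 and some_function2 calls above are placeholders. A runnable version of the same contrast might look like the sketch below, where sort_values and rename are stand-ins chosen for illustration. It also shows why chaining breaks: a call made with inplace=True returns None, so there is nothing to call the next method on.

```python
import pandas as pd

# Illustrative data (an assumption for this sketch).
df = pd.DataFrame({"a": [3, 1, 2], "b": ["x", "y", "z"]})

# Chained: each call returns a new DataFrame that feeds the next call.
chained = (
    df.sort_values("a")
      .reset_index(drop=True)
      .rename(columns={"b": "label"})
)

# With inplace=True the call returns None, so nothing can be chained onto it.
temp = df.sort_values("a")
returned = temp.reset_index(drop=True, inplace=True)
print(returned)  # None
```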
Unintended Pitfalls
One final caveat to keep in mind is that calling a method with inplace=True can trigger the SettingWithCopyWarning:
import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})
df2 = df[df['a'] > 1]
df2['b'].replace({'x': 'abc'}, inplace=True)
# SettingWithCopyWarning:
# A value is trying to be set on a copy of a slice from a DataFrame
This can cause unexpected behavior: as noted earlier, the column may not actually be updated in place.
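One way to sidestep the warning is to skip inplace and assign the result back to the column. The sketch below reuses the same data. Taking an explicit .copy() of the filtered frame is an added assumption here, used to make it unambiguous that df2 is independent of df.

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

# Explicit copy: df2 owns its data, so later writes cannot be mistaken for writes to df.
df2 = df[df['a'] > 1].copy()

# Assign the result back instead of passing inplace=True.
df2['b'] = df2['b'].replace({'x': 'abc'})

print(df2['b'].tolist())  # ['abc', 'y']
```

The original df is left untouched, and the update to df2 is applied reliably.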