And when is the `inplace` argument misleading?
Mar 27, 2023
Several methods for the Pandas `DataFrame` support an `inplace` argument. You
can find quite contradictory advice about it online. Some describe it as "good
practice"; the Pandas docs say "its use is discouraged".
> The `inplace=True` keyword has been a controversial topic for many years. It is generally seen (at least by several pandas maintainers and educators) as bad practice and often unnecessary, but at the same time it is also widely used, partly because of confusion around the impact of the keyword.
>
> PDEP-8 / Motivation and Scope
Operations that re-arrange the rows of a `DataFrame` can't be executed without
copying. Avoid the `inplace` argument for these methods.
Pandas provides many different methods for a `DataFrame`. Let's group them based
on the type of change. Then consider whether they can be executed without
creating a copy of the `DataFrame` object.
operation | example methods | possible without a copy |
---|---|---|
add rows | `merge`, `append` | ❌ |
re-arrange rows | `sort_values`, `sort_index` | ❌ |
delete rows from the end | `pop` | ✅ |
delete rows from arbitrary location | `dropna`, `query`, `truncate` | ❌ |
add columns | `insert` | ✅ |
delete columns | `drop` (columns) | ✅ |
mutate the elements | `update`, `where`, `fillna`, `replace` | ✅ |
Note the difference between:
- Removing rows from the end of a `DataFrame` can happen without creating a copy: `pop` operates in place by default.
- Removing rows from an arbitrary position re-arranges some of the remaining rows. Here, a copy is needed: methods like `dropna` can't operate in place.
The problem is that there are several methods that can't operate in place, even
with `inplace=True`, but still provide an `inplace` option. Examples include
`dropna`, `sort_values`, and `query`.
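As a minimal sketch of what this looks like in practice (the column names and values are made up for illustration), the snippet below compares `dropna(inplace=True)` with plain reassignment. Both end up with the same data; the `inplace` variant just returns `None`.

```python
import pandas as pd

# Hypothetical example data with one missing value.
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": ["x", "y", "z"]})

# With inplace=True, the method returns None and changes `in_place` itself.
in_place = df.copy()
result = in_place.dropna(inplace=True)

# Without inplace, the method returns a new DataFrame and leaves `df` untouched.
reassigned = df.dropna()

print(result is None)               # the inplace call has no return value
print(in_place.equals(reassigned))  # both approaches produce the same rows
print(len(df))                      # the original still has all of its rows
```

The reassignment form makes it explicit that a new object is created, which is what happens for these methods either way.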
The Pandas Enhancement Proposal, PDEP-8, suggests the removal of the `inplace`
argument for such `DataFrame` methods.
In the PDEP, the Pandas methods that provide some inplace functionality are
grouped into 4 groups. Group 4 contains the methods that provide an `inplace` or
`copy` option, although they always need to create a copy. The PDEP suggests the
removal of these options.
Even if these `inplace` options aren't deprecated currently, it's better to
avoid them for these "Group 4" methods.
- Modifying an existing `DataFrame` object makes the code more confusing and prone to bugs 🐛.
- Returning `None` is confusing and hinders method chaining.

The "Group 4" methods where the usage of `inplace` is possible but discouraged:
- `drop`
- `dropna`
- `drop_duplicates`
- `sort_values`
- `sort_index`
- `eval`
- `query`
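To illustrate the method-chaining point, here is a small sketch that uses several of the methods listed above without `inplace`. The data and column names are invented for the example. Each call returns a new `DataFrame`, so the steps can be chained into a single expression.

```python
import pandas as pd

# Hypothetical example data: one missing score and one duplicated row.
df = pd.DataFrame(
    {"name": ["c", "a", "b", "a"], "score": [3, 1, None, 1]}
)

# Each method returns a new DataFrame, so the calls chain naturally.
cleaned = (
    df.dropna()              # drop the row with the missing score
    .drop_duplicates()       # drop the repeated ("a", 1) row
    .query("score >= 1")     # keep rows matching the condition
    .sort_values("score")    # order the remaining rows by score
)

print(cleaned)
print(len(df))  # the original DataFrame is unchanged
```

With `inplace=True`, the first call would return `None`, and the chain would fail at the second step.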
For methods that can operate with or without copying the `DataFrame` object, the
`inplace` argument is OK and can bring performance gains. Examples are `fillna`
and `where`.
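A short sketch of both styles for `fillna` follows. The data is made up, and the example makes no claim about how large any performance difference is; it only shows that the two forms lead to the same values.

```python
import pandas as pd

# Hypothetical example data with one missing value.
df = pd.DataFrame({"a": [1.0, None, 3.0]})

# Default behaviour: a new DataFrame is returned, `df` keeps its missing value.
filled_copy = df.fillna(0.0)

# inplace=True: the missing value is replaced in the existing object.
in_place = df.copy()
in_place.fillna(0.0, inplace=True)

print(in_place.equals(filled_copy))  # same values either way
print(int(df["a"].isna().sum()))     # the original still has its missing value
```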
Sourcery has introduced the `pandas-avoid-inplace` rule for the "Group 4"
methods in the versions: