When is inplace in Pandas faster?

And when is the `inplace` argument misleading?

Written by Reka Horvath on


Photo by Kerry Hu on Unsplash

Several Pandas DataFrame methods support an inplace argument. The advice about it online is quite contradictory: some describe it as “good practice”, while the Pandas docs say “its use is discouraged”.

The inplace=True keyword has been a controversial topic for many years. It is generally seen (at least by several pandas maintainers and educators) as bad practice and often unnecessary, but at the same time it is also widely used, partly because of confusion around the impact of the keyword.

PDEP-8 / Motivation and Scope

TLDR

Operations that re-arrange the rows of a DataFrame can’t be executed without copying. Avoid the inplace argument for these methods.
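As a minimal sketch of what this means in practice, the example below sorts a small DataFrame in both styles. The column names and values are made up for illustration. It only shows that the two styles give the same data and that the inplace call returns None; it does not measure speed.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["c", "a", "b"], "score": [3, 1, 2]})

# Without inplace: the sorted result is a new object bound to a new name.
sorted_df = df.sort_values("score")

# With inplace=True: the call returns None and df_inplace itself is changed.
df_inplace = df.copy()
result = df_inplace.sort_values("score", inplace=True)

print(result is None)                # the inplace call has no return value
print(sorted_df.equals(df_inplace))  # both styles hold the same sorted data
```

The original df keeps its row order in the first style, which makes it easier to reason about which object holds which state.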

Ways of Changing Data

Pandas provides many different methods for a DataFrame. Let’s group them by the type of change they make, and then consider whether each type can be executed without creating a copy of the DataFrame object.

operation                           | example methods                | possible without a copy
add rows                            | merge, append                  |
re-arrange rows                     | sort_values, sort_index        | no
delete rows from the end            | pop                            | yes
delete rows from arbitrary location | dropna, query, truncate        | no
add columns                         | insert                         |
delete columns                      | drop (columns)                 |
mutate the elements                 | update, where, fillna, replace | yes

Note the difference between:

Removing rows from the end of a DataFrame can happen without creating a copy: pop operates in place by default.

Removing rows from an arbitrary position re-arranges some of the remaining rows, so a copy is needed here. Methods like dropna can’t operate in place.

Figure: removing a row from a DataFrame

The problem is that there are several methods that always need to create a copy but still offer an inplace argument.

Examples include dropna, sort_values, and query.
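One practical alternative for these methods is method chaining, sketched below with made-up data. Each call returns a new DataFrame, so the steps can be chained. With inplace=True each call would return None and the chain would break.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "city": ["Oslo", None, "Lima", "Rome", "Cairo"],
        "temp": [4.0, 21.0, None, 18.0, 30.0],
    }
)

# Each step returns a new DataFrame, so no inplace argument is needed.
cleaned = (
    df.dropna()                               # drop rows with missing values
    .query("temp > 10")                       # keep only the warmer rows
    .sort_values("temp", ascending=False)     # warmest first
)

print(cleaned)
```

The original df still has all five rows afterwards, and the cleaned result is available under its own name.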

Coming Soon

The Pandas Enhancement Proposal, PDEP-8, suggests the removal of the inplace argument for such DataFrame methods.

In the PDEP, the Pandas methods that provide some inplace functionality are divided into 4 groups. Group 4 contains the methods that provide an inplace or copy option, although they always need to create a copy. The PDEP suggests the removal of these options.

For Now

Even if these inplace options aren’t deprecated currently, it’s better to avoid them for these “Group 4” methods.

The “Group 4” methods where the usage of inplace is possible but discouraged:

For methods whose operation can be executed with or without copying the DataFrame object, the inplace argument is OK and can bring performance gains. Examples are fillna and where.
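A minimal sketch of such a call, with made-up data, might look like the following. It shows only the behaviour: the call returns None and the existing object is modified. Whether it is actually faster depends on the pandas version and the data, so it is worth measuring on your own workload.

```python
import pandas as pd

nan = float("nan")

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"a": [1.0, nan, 3.0], "b": [nan, 5.0, 6.0]})

# fillna only mutates elements; the rows keep their positions.
result = df.fillna(0.0, inplace=True)

print(result is None)  # the inplace call has no return value
print(df)              # the missing values in df are now 0.0
```

Note that the call is made on the DataFrame itself, not on a column selected from it such as df["a"].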

Sourcery has introduced the pandas-avoid-inplace rule for the “Group 4” methods in the versions: