When is inplace in Pandas faster?

And when is the `inplace` argument misleading?

Date

Mar 27, 2023

Several Pandas DataFrame methods support an inplace argument, and you can find quite contradictory advice about it online. Some describe it as "good practice", while the Pandas docs say "its use is discouraged".

The inplace=True keyword has been a controversial topic for many years. It is generally seen (at least by several pandas maintainers and educators) as bad practice and often unnecessary, but at the same time it is also widely used, partly because of confusion around the impact of the keyword.

PDEP-8 / Motivation and Scope

TLDR

Operations that re-arrange the rows of a DataFrame can't be executed without copying. Avoid the inplace argument for these methods.

Ways of Changing Data

Pandas provides many different methods for a DataFrame. Let's group them by the type of change they make and consider whether each change can be executed without creating a copy of the DataFrame object.

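operation                            | example methods                 | possible without a copy
------------------------------------ | ------------------------------- | -----------------------
add rows                             | merge, append                   | no
re-arrange rows                      | sort_values, sort_index         | no
delete rows from the end             | (no dedicated method)           | yes
delete rows from arbitrary location  | dropna, query, truncate         | no
add columns                          | insert                          | yes
delete columns                       | pop, drop (columns)             | yes
mutate the elements                  | update, where, fillna, replace  | yes

Note the difference between:

  • removing rows from the end of a DataFrame
  • removing rows from an arbitrary position

Removing rows from the end of a DataFrame can, in principle, happen without creating a copy, because the remaining rows keep their positions. (DataFrame.pop, despite its name, removes a column by label rather than the last row; it always operates in place and has no inplace argument.)

Removing rows from an arbitrary position shifts some of the remaining rows, so a copy is needed. This is why methods like dropna can't operate in place.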

(Figure: removing a row from a DataFrame)
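The copy argument can be illustrated with a NumPy analogy, since many Pandas columns are backed by NumPy arrays. This is a minimal sketch of the general idea, not of Pandas internals: slicing rows off the end of an array yields a view onto the same memory, while deleting a row from the middle forces NumPy to allocate a new array.

```python
import numpy as np

rows = np.arange(6)  # stand-in for the values of one column

# Dropping rows from the end: the slice is a view onto the same buffer.
without_tail = rows[:-2]
print(np.shares_memory(rows, without_tail))    # True

# Dropping a row from the middle: later rows must move, so np.delete
# always returns a newly allocated array.
without_middle = np.delete(rows, 2)
print(np.shares_memory(rows, without_middle))  # False
print(without_middle)                          # [0 1 3 4 5]
```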

The problem is that there are several methods that

  • always make a copy even when inplace=True
  • still provide an inplace option

Examples include dropna, sort_values, and query.
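One way to check this for sort_values is to keep a reference to a column's underlying array and compare it with the column after sorting with inplace=True. This is a sketch on a tiny, made-up DataFrame; details of memory handling vary between Pandas versions, but we'd expect the reordered values to land in newly allocated memory either way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2]})

# Reference to the column's values before sorting.
before = df["a"].to_numpy()

returned = df.sort_values("a", inplace=True)

print(returned)          # None: the method rebinds df's data instead of returning it
print(df["a"].tolist())  # [1, 2, 3]
print(before.tolist())   # [3, 1, 2]: the values captured earlier are unchanged
# df now points at different memory, so a copy was made despite inplace=True.
print(np.shares_memory(before, df["a"].to_numpy()))  # False
```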

Coming Soon

The Pandas Enhancement Proposal PDEP-8 suggests removing the inplace argument from such DataFrame methods.

PDEP-8 sorts the Pandas methods that offer some inplace functionality into four groups. Group 4 contains the methods that offer an inplace or copy option even though they always need to create a copy. The PDEP proposes removing these options.

For Now

Although these inplace options aren't currently deprecated, it's better to avoid them for the "Group 4" methods:

  • They don't bring any performance gains.
  • Mutating the DataFrame object in place makes the code more confusing and prone to bugs 🐛.
  • The return value None is confusing and hinders method chaining.
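The last two points are easy to reproduce. The sketch below, using a made-up DataFrame, shows the classic bug of reassigning the None returned by an inplace call, followed by the chained alternative that keeps each returned DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"name": ["b", None, "a"], "score": [2.0, 3.0, 1.0]})

# A common bug: inplace=True returns None, so reassigning loses the DataFrame.
buggy = df.copy()
buggy = buggy.dropna(inplace=True)
print(buggy)  # None

# Without inplace, every step returns a DataFrame, so the steps chain naturally.
cleaned = (
    df.dropna()
    .sort_values("score")
    .reset_index(drop=True)
)
print(cleaned["name"].tolist())  # ['a', 'b']
```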

The "Group 4" methods, where inplace is possible but discouraged, are:

  • drop (when dropping rows)
  • dropna
  • drop_duplicates
  • sort_values
  • sort_index
  • eval
  • query
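For each of these methods, the inplace call can be replaced by reassigning the returned DataFrame, with the same result. The sketch below checks this on a small, made-up DataFrame; the arguments passed to each method are arbitrary choices for illustration, and make_df is a hypothetical helper.

```python
import pandas as pd


def make_df():
    """Hypothetical helper: a fresh DataFrame with a NaN and a duplicate row."""
    return pd.DataFrame({"a": [3.0, 1.0, 1.0, None], "b": [1, 2, 2, 4]})


# (method name, positional args, keyword args) for each "Group 4" method
calls = [
    ("drop", (), {"index": [0]}),
    ("dropna", (), {}),
    ("drop_duplicates", (), {}),
    ("sort_values", ("a",), {}),
    ("sort_index", (), {"ascending": False}),
    ("eval", ("c = a + b",), {}),
    ("query", ("b > 1",), {}),
]

checked = []
for name, args, kwargs in calls:
    # Discouraged style: mutate the object and get None back.
    mutated = make_df()
    getattr(mutated, name)(*args, inplace=True, **kwargs)

    # Preferred style: keep the DataFrame the method returns.
    reassigned = getattr(make_df(), name)(*args, **kwargs)

    # Raises an AssertionError if the two results differ.
    pd.testing.assert_frame_equal(mutated, reassigned)
    checked.append(name)
    print(f"{name}: same result with and without inplace")
```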

For methods that can operate with or without copying the DataFrame object, the inplace argument is fine and can bring performance gains. Examples are fillna and where.
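For comparison, here is the calling pattern for fillna and where with inplace=True. This sketch only shows that the DataFrame is mutated and None is returned; whether memory is actually saved depends on the Pandas version and the column dtypes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan, 3.0]})

# fillna only changes existing values, so inplace=True is a reasonable choice here.
returned = df.fillna(0.0, inplace=True)
print(returned)          # None
print(df["x"].tolist())  # [1.0, 0.0, 3.0]

# where keeps values where the condition is True and replaces the others.
df.where(df > 0, other=-1.0, inplace=True)
print(df["x"].tolist())  # [1.0, -1.0, 3.0]
```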

Sourcery introduced the pandas-avoid-inplace rule for the "Group 4" methods in these versions:

  • 1.0.10b10 (released on 2023-03-24)
  • 1.1.0 (released on 2023-03-28)

Related Sources