Truthy and falsy values, None, and comparison in Python and Pandas.
May 19, 2023
Missing data are a frequent source of headache (and bugs 🐛). Often, it's far from obvious whether a value can be empty. And if it can, it usually means introducing several conditionals in the code and edge cases in the tests.
To make these conditionals checking for missing data more concise, many programming languages, incl. Python, have a concept of "truthy" and "falsy" values. Thanks to this, various non-boolean data types can be interpreted in boolean contexts. It's important to distinguish between:
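- True or False values: These are booleans.
- "truthy" or "falsy" values: These can be of any type and are interpreted as True or False when used in a boolean context.
By default, the following are considered "falsy" in Python:
- None and False
- 0, 0.0, 0j, Decimal(0), Fraction(0, 1)
- '', (), [], {}, set(), range(0)
Any other value is considered "truthy".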
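A quick way to check the list above is to call bool() on each value. The following is a minimal sketch; the truthy examples are arbitrary illustrations, not an exhaustive list. They show that a container holding a falsy element is still truthy, and that NaN, being a non-zero float, is truthy too.

```python
from decimal import Decimal
from fractions import Fraction

# The falsy values listed above.
falsy_values = [None, False, 0, 0.0, 0j, Decimal(0), Fraction(0, 1),
                "", (), [], {}, set(), range(0)]

# A few arbitrary truthy values. Non-empty containers are truthy even if
# their only element is falsy, and NaN is a non-zero float.
truthy_values = [1, -1, 0.1, "0", " ", [0], (None,), {"": ""}, {0}, float("nan")]

for value in falsy_values:
    print(repr(value), "->", bool(value))  # always False

for value in truthy_values:
    print(repr(value), "->", bool(value))  # always True
```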
The concept of truthy and falsy values has a big benefit: It allows you to use non-boolean expressions in conditions and other boolean operations. This makes the code more concise.
data = []
if data:
    print("Data is truthy!")  # This won't print.
else:
    print("Data is falsy!")  # This will print.
data is an empty list, which is considered falsy, so the else branch gets executed.
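For comparison, the truthiness check is a shorter way of writing an explicit emptiness test. A minimal sketch with two hypothetical helpers, has_items_explicit and has_items_truthy, which agree for lists:

```python
def has_items_explicit(data):
    # Explicit check: compare the length to zero.
    return len(data) > 0


def has_items_truthy(data):
    # Truthiness check: an empty list is falsy, a non-empty one is truthy.
    return bool(data)


for data in ([], [0], [1, 2]):
    print(data, has_items_explicit(data), has_items_truthy(data))
```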
The concise check above relies on one big assumption: that all falsy values should produce the same behavior. This no longer works if a falsy value has a special meaning, for example, if the code needs to handle None differently from an empty string.
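To illustrate the assumption, here is a minimal sketch with a hypothetical describe function: a plain truthiness check sends both None and the empty string down the same branch, so it can't tell them apart.

```python
def describe(value):
    """Hypothetical helper: label a value using only a truthiness check."""
    if value:
        return f"data: {value!r}"
    # Every falsy value ends up here: None, "", 0, [], ...
    return "no data"


print(describe("Alice"))  # data: 'Alice'
print(describe(""))       # no data
print(describe(None))     # no data
```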
Giving different falsy values different meanings is usually bad practice, because it can easily lead to confusion and bugs 🐛. For example, Django's documentation discourages using multiple values for "no data":
Avoid using null on string-based fields such as CharField and TextField. If a string-based field has null=True, that means it has two possible values for “no data”: NULL, and the empty string. In most cases, it’s redundant to have two possible values for “no data;” the Django convention is to use the empty string, not NULL.
Django 4.2 Documentation / Model field reference / Field options
What if (after considering the trade-offs above) you've decided to give a special meaning to a falsy value? How do you check whether a value is actually None?
There are 2 ways to achieve this:
- the equality (==) operator ❌
- the identity operator is ✅
PEP 8 has a clear recommendation:
Comparisons to singletons like None should always be done with is or is not, never the equality operators.
The problem with using == is that a class can override the __eq__ method (which determines the behavior of ==), so a comparison with == can lead to unexpected results.
class AlwaysEqual:
    def __eq__(self, other):
        return True

obj = AlwaysEqual()  # named obj to avoid shadowing the built-in object
print(obj == None)  # prints: True
print(obj is None)  # prints: False
The AlwaysEqual class overrides the __eq__ method to always return True. Therefore, even though the object isn't None, comparing it to None using == returns True.
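Coming back to the case where None has a special meaning: with is None, the check no longer lumps None together with the other falsy values. A minimal sketch, where format_nickname and its "not provided" vs. "left blank" convention are hypothetical:

```python
def format_nickname(nickname):
    """Hypothetical helper where None and "" mean different things."""
    if nickname is None:
        # None: the user was never asked for a nickname.
        return "<not provided>"
    if nickname == "":
        # Empty string: the user explicitly left the field blank.
        return "<left blank>"
    return nickname


print(format_nickname(None))   # <not provided>
print(format_nickname(""))     # <left blank>
print(format_nickname("Ace"))  # Ace
```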
Similarly to the comparison to None in Python, there are 2 ways to detect missing values in Pandas, and one is clearly preferred:
- comparing with numpy.nan using the equality operator == ❌
- the isna or isnull function ✅
An equality check with the == operator, like df['column'] == np.nan, doesn't behave the way you might expect. This stems from a peculiar property of numpy.nan: it isn't considered equal to any value, not even itself. (This differs from Python's None: it's a singleton, so None == None returns True.)
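This self-inequality of NaN is easy to check directly. A minimal sketch using NumPy and the standard library; math.isnan and np.isnan are shown as scalar counterparts of an explicit NaN test:

```python
import math

import numpy as np

nan = np.nan

print(nan == nan)        # False: NaN is not equal to itself
print(nan != nan)        # True
print(math.isnan(nan))   # True: an explicit NaN test works
print(np.isnan(nan))     # True
print(None == None)      # True: None compares equal to itself
```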
Let's consider this DataFrame with 2 columns, A and B, as an example:
import numpy as np
import pandas as pd
data = {"A": [1, 2, np.nan, 4], "B": [9, 10, 11, 12]}
df = pd.DataFrame(data)
Using == np.nan:
print(df["A"] == np.nan)
The output:
0    False
1    False
2    False
3    False
Name: A, dtype: bool
The returned value is always False, even for the row that contains np.nan.
Using isna():
print(df["A"].isna())
The output:
0    False
1    False
2     True
3    False
Name: A, dtype: bool
The returned value is True for np.nan and False for every other value. For more info, check out Pandas Docs / Missing Data.
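Dealing with missing and empty values is tricky. In this post, we've discussed 3 guidelines that make it less error-prone:
- Avoid using multiple values for "no data" (e.g. both None and an empty string).
- When comparing to None in Python, use is or is not.
- To detect missing values in Pandas, use the isna or isnull functions.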