-
Explicit is better than implicit. `df[boolean_mask]` selects the rows where `boolean_mask` is True, but there is a corner case where you might not want it to: when `df` has boolean-valued column labels:

    In [229]: df = pd.DataFrame({True:[1,2,3],False:[3,4,5]}); df
    Out[229]:
       False  True
    0      3     1
    1      4     2
    2      5     3

You might want to use `df[[True]]` to select the `True` column. Instead it raises a `ValueError`:

    In [230]: df[[True]]
    ValueError: Item wrong length 1 instead of 3.

Compare this with using `loc`:

    In [231]: df.loc[[True]]
    Out[231]:
       False  True
    0      3     1

In contrast, the following does not raise `ValueError`, even though the structure of `df2` is almost the same as that of `df` above:

    In [258]: df2 = pd.DataFrame({'A':[1,2,3],'B':[3,4,5]}); df2
    Out[258]:
       A  B
    0  1  3
    1  2  4
    2  3  5

    In [259]: df2[['B']]
    Out[259]:
       B
    0  3
    1  4
    2  5

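The contrast between the two frames can be reproduced in a short script. This is only a sketch, assuming a reasonably recent pandas; the variable names `raised`, `cols`, `rows_plain`, and `rows_loc` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({True: [1, 2, 3], False: [3, 4, 5]})
df2 = pd.DataFrame({"A": [1, 2, 3], "B": [3, 4, 5]})

# Plain brackets treat a list of booleans as a row mask, so a
# length-1 list is rejected against a 3-row frame.
try:
    df[[True]]
    raised = False
except ValueError:
    raised = True

# A list of strings, on the other hand, is read as column labels.
cols = df2[["B"]]

# With a full-length mask, both spellings select the same rows.
mask = [True, False, True]
rows_plain = df2[mask]
rows_loc = df2.loc[mask]

print(raised)
print(cols)
print(rows_loc)
```

The same bracket syntax therefore means "columns" for `df2[["B"]]` and "rows" for `df2[mask]`, which is the ambiguity being discussed.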
Thus, `df[boolean_mask]` does not always behave the same as `df.loc[boolean_mask]`. Even though this is arguably an unlikely use case, I would recommend always using `df.loc[boolean_mask]` instead of `df[boolean_mask]` because the meaning of `df.loc`'s syntax is explicit. With `df.loc[indexer]` you know automatically that `df.loc` is selecting rows. In contrast, it is not clear if `df[indexer]` will select rows or columns (or raise `ValueError`) without knowing details about `indexer` and `df`.
-
`df.loc[row_indexer, column_indexer]` can select rows and columns at once. `df[indexer]` can select only rows or only columns, depending on the type of values in `indexer` and the type of column labels `df` has (again, are they boolean?).

    In [237]: df2.loc[[True,False,True], 'B']
    Out[237]:
    0    3
    2    5
    Name: B, dtype: int64

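To make the comparison concrete, here is a minimal sketch of the same selection done once with `loc` and once with two chained bracket lookups. The names `picked` and `chained` are hypothetical, and the frame is the `df2` from the session above.

```python
import pandas as pd

df2 = pd.DataFrame({"A": [1, 2, 3], "B": [3, 4, 5]})
mask = [True, False, True]

# One call: rows chosen by the mask, column chosen by label.
picked = df2.loc[mask, "B"]

# Plain brackets need two steps: first the column, then the rows.
chained = df2["B"][mask]

print(picked)
print(picked.equals(chained))
```

Both spellings return the same Series here; the `loc` form simply states the row and column selection in a single place.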
-
When a slice is passed to `df.loc`, the end-points are included in the range. When a slice is passed to `df[...]`, the slice is interpreted as a half-open interval:

    In [239]: df2.loc[1:2]
    Out[239]:
       A  B
    1  2  4
    2  3  5

    In [271]: df2[1:2]
    Out[271]:
       A  B
    1  2  4
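
The slice behaviour above can be checked with a few lines. This sketch assumes the default integer index of `df2`; the names `by_label`, `by_position`, and `by_iloc` are illustrative only.

```python
import pandas as pd

df2 = pd.DataFrame({"A": [1, 2, 3], "B": [3, 4, 5]})

# Label-based slice: both end-points (labels 1 and 2) are included.
by_label = df2.loc[1:2]

# Plain-bracket integer slice: positional and half-open, like a list slice.
by_position = df2[1:2]

# iloc uses the same half-open positional convention.
by_iloc = df2.iloc[1:2]

print(len(by_label), len(by_position))
print(by_position.equals(by_iloc))
```

On this frame the integer slice in plain brackets matches `iloc`, not `loc`, so the same `1:2` yields two rows in one case and one row in the other.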