- pandas has three
.replace
methods:
pandas.Series.replace
for a column
pandas.Series.str.replace
for a column
pandas.DataFrame.replace
for multiple columns, and removes the need to use .apply
regex=False
is the default setting, so set regex=True
df[df.columns[1:]]
selects the last three columns.
- Tested in
python 3.11.4
, pandas 2.1.0
# replace values only in selected columns
df[df.columns[1:]] = df[df.columns[1:]].replace('[\$,]', '', regex=True).astype(float)
# replace values in all columns
df = df.replace('[\$,]', '', regex=True).astype(float)
- Additional patterns for cleaning monetary columns:
'[^.0-9]'
: remove all non-digits except decimals points
'[^.0-9\-]'
: remove all non-digits except decimals points, and negative signs
'\D'
: remove all non-digits, including the decimal point, and negative sign, so it’s good for positive int only columns