- pandas has three
.replace methods:
pandas.Series.replace for a column
pandas.Series.str.replace for a column
pandas.DataFrame.replace for multiple columns, and removes the need to use .apply
regex=False is the default setting, so set regex=True
df[df.columns[1:]] selects the last three columns.
- Tested in
python 3.11.4, pandas 2.1.0
# replace values only in selected columns
df[df.columns[1:]] = df[df.columns[1:]].replace('[\$,]', '', regex=True).astype(float)
# replace values in all columns
df = df.replace('[\$,]', '', regex=True).astype(float)
- Additional patterns for cleaning monetary columns:
'[^.0-9]': remove all non-digits except decimals points
'[^.0-9\-]': remove all non-digits except decimals points, and negative signs
'\D': remove all non-digits, including the decimal point, and negative sign, so it’s good for positive int only columns