Pandas: change data type of Series to String

A new answer to reflect current practice: as of v1.2.4, neither astype('str') nor astype(str) produces the dedicated string dtype; both leave the Series with object dtype. As per the documentation, a Series can be converted to the string datatype in the following ways:
df['id'] = df['id'].astype("string")
df['id'] = pandas.Series(df['id'], dtype="string")
df['id'] = pandas.Series(df['id'], dtype=pandas.StringDtype())
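A minimal sketch of the three conversions above, run against a small made-up frame (the `id` values and the variable names are assumptions for illustration, not from the original answer). The dtype that `astype(str)` yields depends on the pandas version, so the sketch only inspects it rather than asserting it.

```python
import pandas as pd

# Hypothetical sample data; the original answer does not show the frame.
df = pd.DataFrame({"id": [101, 102, 103]})

# Plain Python str conversion; the resulting dtype varies by pandas version.
via_builtin = df["id"].astype(str)

# Three spellings that request the dedicated string extension dtype.
via_astype = df["id"].astype("string")
via_ctor = pd.Series(df["id"], dtype="string")
via_dtype_obj = pd.Series(df["id"], dtype=pd.StringDtype())

# Inspect the resulting dtypes side by side.
print(via_builtin.dtype, via_astype.dtype, via_ctor.dtype, via_dtype_obj.dtype)
```

Checking `isinstance(series.dtype, pd.StringDtype)` is one way to confirm that a column really carries the extension dtype rather than generic Python objects.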

How to get the first column of a pandas DataFrame as a Series?

>>> import pandas as pd
>>> df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [4, 5, 6, 7]})
>>> df
   x  y
0  1  4
1  2  5
2  3  6
3  4  7
>>> s = df.ix[:,0]
>>> type(s)
<class 'pandas.core.series.Series'>

===========================================================================
UPDATE
If you're reading this after June 2017, ix …
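The `.ix` indexer was removed in pandas 1.0, so the session above only runs on older releases. A minimal sketch of the same selection on current versions, assuming positional access via `.iloc` is what is wanted (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [4, 5, 6, 7]})

# Select the first column by position; the result is a Series.
first_by_position = df.iloc[:, 0]

# Equivalent selection by label, looking the label up from the column index.
first_by_label = df[df.columns[0]]

print(type(first_by_position))
```

Both forms return a Series named after the column, so downstream code that relied on `df.ix[:,0]` should behave the same way.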

Conditional Replace Pandas

The .ix indexer works for pandas versions prior to 0.20.0, but it has been deprecated since pandas 0.20.0, so you should avoid using it. Instead, use the .loc or .iloc indexers. You can solve this problem with:
mask = df.my_channel > 20000
column_name = "my_channel"
df.loc[mask, column_name] = 0
Or, in one line: df.loc[df.my_channel > …
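As a runnable sketch of the masked assignment described above, using made-up values for `my_channel` (the sample data is an assumption; only the column name and the 20000 threshold come from the answer):

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"my_channel": [15000, 25000, 30000, 500]})

# Boolean mask marking the rows whose value exceeds the threshold.
mask = df["my_channel"] > 20000

# Assign through .loc so the change is written to df itself, not to a copy.
df.loc[mask, "my_channel"] = 0

print(df)
```

Going through `.loc` with a row mask and a column label in a single call avoids chained indexing, which can silently modify a temporary copy.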

Keep only date part when using pandas.to_datetime

Since version 0.15.0 this can be done easily using .dt to access just the date component: df['just_date'] = df['dates'].dt.date The above returns Python datetime.date objects, so the resulting column has object dtype. If you want to keep a datetime64 column, you can instead normalize the time component to midnight, which sets all the times to 00:00:00: df['normalised_date'] = df['dates'].dt.normalize() This …
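The difference between the two approaches can be seen in a small sketch; the timestamps below are made up for illustration, while the column names follow the answer:

```python
import datetime

import pandas as pd

# Hypothetical sample timestamps.
df = pd.DataFrame(
    {"dates": pd.to_datetime(["2021-03-01 14:30:00", "2021-03-02 09:15:00"])}
)

# .dt.date yields plain datetime.date objects.
df["just_date"] = df["dates"].dt.date

# .dt.normalize() keeps the datetime64 dtype and resets the time to midnight.
df["normalised_date"] = df["dates"].dt.normalize()

is_plain_date = isinstance(df["just_date"].iloc[0], datetime.date)
print(df.dtypes)
```

Keeping the normalized datetime64 column is usually the more convenient choice when later steps still need the `.dt` accessor or date arithmetic.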

Combining two Series into a DataFrame in pandas

I think concat is a nice way to do this. If they are present, it uses the name attributes of the Series as the columns (otherwise it simply numbers them):
In [1]: s1 = pd.Series([1, 2], index=['A', 'B'], name="s1")
In [2]: s2 = pd.Series([3, 4], index=['A', 'B'], name="s2")
In [3]: pd.concat([s1, s2], axis=1)
Out[3]: s1 …
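A self-contained sketch of the same call, with an extra unnamed pair added to illustrate the numbering fallback mentioned above (the `combined` and `unnamed` variables are illustrative additions):

```python
import pandas as pd

s1 = pd.Series([1, 2], index=["A", "B"], name="s1")
s2 = pd.Series([3, 4], index=["A", "B"], name="s2")

# axis=1 places each Series side by side as a column, aligned on the index.
combined = pd.concat([s1, s2], axis=1)

# Without name attributes, the columns are simply numbered.
unnamed = pd.concat([pd.Series([1, 2]), pd.Series([3, 4])], axis=1)

print(combined)
print(unnamed.columns.tolist())
```

Because `concat` aligns on the index, Series with different index labels would produce missing values in the non-matching rows rather than an error.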