label-encoder encoding missing values

Question

Don’t use LabelEncoder with missing values. I don’t know which version of scikit-learn you’re using, but in 0.17.1 your code raises TypeError: unorderable types: str() > float().

As you can see in the source, it calls numpy.unique on the data to encode, which raises TypeError when string values are mixed with missing values (NaN is a float). If you want to encode missing values, first replace them with a string:

a[pd.isnull(a)] = 'NaN'
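For a fuller picture, here is a minimal sketch of that replacement followed by the encoding step. The column contents and the names `a`, `encoder` and `codes` are made up for illustration, and I'm assuming `a` is an object-dtype pandas Series; adapt it to your own data.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical column: strings mixed with one missing value (NaN is a float).
a = pd.Series(["x", "y", np.nan, "x"], dtype=object)

# Replace missing entries with a placeholder string so every value is a str.
a[pd.isnull(a)] = "NaN"

# All values are now strings, so the encoder can sort and label them.
encoder = LabelEncoder()
codes = encoder.fit_transform(a)

print(list(encoder.classes_))  # the placeholder shows up as its own class
print(list(codes))
```

Note that the placeholder becomes an ordinary category with its own integer label, so pick a string that cannot collide with a real value in the column.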
