How to get maximum length of each column in the data frame using pandas python

Question

One solution is to use numpy.vectorize. This may be more efficient than pandas-based solutions.

You can use pd.DataFrame.select_dtypes to select object columns.

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['abc', 'de', 'abcd'],
                   'B': ['a', 'abcde', 'abc'],
                   'C': [1, 2.5, 1.5]})

measurer = np.vectorize(len)

Max length for all columns

res1 = measurer(df.values.astype(str)).max(axis=0)

array([4, 5, 3])

Max length for object columns

res2 = measurer(df.select_dtypes(include=[object]).values.astype(str)).max(axis=0)

array([4, 5])

Or if you need output as a dictionary:

res1 = dict(zip(df, measurer(df.values.astype(str)).max(axis=0)))

{'A': 4, 'B': 5, 'C': 3}

df_object = df.select_dtypes(include=[object])
res2 = dict(zip(df_object, measurer(df_object.values.astype(str)).max(axis=0)))

{'A': 4, 'B': 5}

Leave a Comment Cancel reply