Pyspark: Convert column to lowercase

Import lower alongside col:

from pyspark.sql.functions import lower, col

Combine them as lower(col("bla")). In a complete query:

spark.table('bla').select(lower(col('bla')).alias('bla'))

which is equivalent to the SQL query

SELECT lower(bla) AS bla FROM bla

To lowercase one column while keeping all the other columns, use withColumn:

spark.table('foo').withColumn('bar', lower(col('bar')))

This approach is preferable to a UDF: built-in functions like lower run natively inside the JVM, whereas a Python UDF has to serialize each row out to a Python process and back, which adds significant overhead. It is also more concise than dropping down to raw SQL.
