How to use an AND or OR condition in when() in Spark

pyspark.sql.functions.when takes a Boolean Column as its condition. When working with PySpark, it often helps to read "Column" as "column expression".

Logical operations on PySpark columns use the bitwise operators:

  • & for and
  • | for or
  • ~ for not

When combining these with comparison operators such as <, parentheses are required around each comparison, because the bitwise operators bind more tightly than comparisons in Python.
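
For example, because & and | have higher precedence than < and == in Python, the unparenthesized form below is parsed as a chained comparison and PySpark raises a ValueError about converting a Column to a bool. A minimal sketch using the same df.value and df.value2 columns:

import pyspark.sql.functions as F

# Raises ValueError ("Cannot convert column into bool ..."):
# parsed as df.value < (1 | df.value2) == 'false'
bad = F.when(df.value < 1 | df.value2 == 'false', 0)

# Parenthesizing each comparison fixes the precedence
ok = F.when((df.value < 1) | (df.value2 == 'false'), 0)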

In your case, the correct statement is:

import pyspark.sql.functions as F

# 0 when value < 1 or value2 == 'false', otherwise keep the original value
df = df.withColumn('trueVal',
    F.when((df.value < 1) | (df.value2 == 'false'), 0).otherwise(df.value))
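
The same pattern applies to an AND condition; just swap | for &. A sketch using the same columns:

import pyspark.sql.functions as F

# 0 only when both conditions hold, otherwise keep the original value
df = df.withColumn('trueVal',
    F.when((df.value < 1) & (df.value2 == 'false'), 0).otherwise(df.value))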

See also: SPARK-8568
