pyspark.sql.functions.when takes a Boolean Column as its condition. When using PySpark, it's often useful to think "Column Expression" when you read "Column".
Logical operations on PySpark columns use Python's bitwise operators:
- & for and
- | for or
- ~ for not
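Why not plain and/or/not? Python cannot overload those keywords: they call bool() on their operands. A Column refuses that conversion, so the expression-building has to go through &, | and ~. The sketch below uses a hypothetical ToyColumn class, not PySpark itself, to show the mechanism. It builds SQL-like strings the way a Column builds expressions, and it raises in __bool__ in the same spirit as PySpark (the error message here is made up).

```python
class ToyColumn:
    """Hypothetical stand-in for a PySpark Column: it only builds an expression string."""

    def __init__(self, expr):
        self.expr = expr

    @staticmethod
    def _wrap(other):
        # Columns contribute their expression; literals are rendered with repr()
        return other.expr if isinstance(other, ToyColumn) else repr(other)

    def __lt__(self, other):
        return ToyColumn(f"({self.expr} < {self._wrap(other)})")

    def __eq__(self, other):
        return ToyColumn(f"({self.expr} = {self._wrap(other)})")

    def __and__(self, other):   # & plays the role of logical AND
        return ToyColumn(f"({self.expr} AND {self._wrap(other)})")

    def __or__(self, other):    # | plays the role of logical OR
        return ToyColumn(f"({self.expr} OR {self._wrap(other)})")

    def __invert__(self):       # ~ plays the role of logical NOT
        return ToyColumn(f"(NOT {self.expr})")

    def __bool__(self):
        # and / or / not would call this, so they cannot build an expression
        raise ValueError("Cannot convert column into bool; use &, | and ~")


value = ToyColumn("value")
value2 = ToyColumn("value2")

cond = (value < 1) | ~(value2 == "false")
print(cond.expr)  # ((value < 1) OR (NOT (value2 = 'false')))

try:
    (value < 1) and (value2 == "false")  # `and` calls bool() on the left operand
except ValueError as err:
    print(err)
```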
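When combining these with comparison operators such as <, you usually need parentheses around each comparison. Python's & and | bind more tightly than comparisons, so without them df.value < 1 | df.value2 == 'false' is parsed as df.value < (1 | df.value2) == 'false'.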
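You can check the parse without Spark by using the standard-library ast module, which only parses the text and never evaluates it. The unparenthesized version becomes a chained comparison (Lt, then Eq) whose middle operand is the | expression. Evaluating a chained comparison calls bool() on the first comparison's result, and that is where a Column fails. The parenthesized version has | at the top, as intended.

```python
import ast


def top_node(expr):
    """Parse a Python expression (without evaluating it) and return its top-level AST node."""
    return ast.parse(expr, mode="eval").body


unparenthesized = top_node("df.value < 1 | df.value2 == 'false'")
parenthesized = top_node("(df.value < 1) | (df.value2 == 'false')")

# Without parentheses: df.value < (1 | df.value2) == 'false', a chained Compare
print(type(unparenthesized).__name__)                             # Compare
print([type(op).__name__ for op in unparenthesized.ops])         # ['Lt', 'Eq']
print(type(unparenthesized.comparators[0]).__name__)              # BinOp

# With parentheses: the top-level operation is the intended |
print(type(parenthesized).__name__, type(parenthesized.op).__name__)  # BinOp BitOr
```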
In your case, the correct statement is:
import pyspark.sql.functions as F

# trueVal is 0 where value < 1 or value2 == 'false'; otherwise it copies value
df = df.withColumn('trueVal',
    F.when((df.value < 1) | (df.value2 == 'false'), 0).otherwise(df.value))
See also: SPARK-8568