Spark add new column to dataframe with value from previous row
You can use the `lag` window function:

```python
from pyspark.sql.functions import lag, col
from pyspark.sql.window import Window

df = sc.parallelize([(4, 9.0), (3, 7.0), (2, 3.0), (1, 5.0)]).toDF(["id", "num"])

w = Window().partitionBy().orderBy(col("id"))
df.select("*", lag("num").over(w).alias("new_col")).na.drop().show()

## +---+---+-------+
## | id|num|new_col|
## +---+---+-------+
## |  2|3.0|    5.0|
## |  3|7.0|    3.0|
## |  4|9.0|    7.0|
## +---+---+-------+
```

Note that `lag` yields `null` for the first row of each window (dropped here by `na.drop()`), and a `Window` with an empty `partitionBy()` moves all rows to a single partition, which does not scale to large data.