Spark: access the first n rows – take vs limit

Although this question has already been answered, I want to share what I learned.

myDataFrame.take(10)

-> returns an Array of Rows.
This is an action: it runs a job and brings the rows back to the driver (like collect does).

myDataFrame.limit(10)

-> returns a new DataFrame.
This is a transformation: it is lazy and does not collect any data until an action is called on the result.
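
To make the difference concrete, here is a minimal Scala sketch. It assumes a local SparkSession and uses spark.range(1000) as a hypothetical stand-in for myDataFrame; the object name, app name and column name are made up for illustration and are not from the original question.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object TakeVsLimit {
  def main(args: Array[String]): Unit = {
    // Local session, for illustration only (assumption: Spark SQL is on the classpath).
    val spark = SparkSession.builder()
      .appName("take-vs-limit")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical stand-in for myDataFrame: a single column "id" with 1000 rows.
    val myDataFrame: DataFrame = spark.range(1000).toDF("id")

    // take is an action: a job runs here and the rows come back to the driver.
    val firstRows: Array[Row] = myDataFrame.take(10)
    println(s"take returned ${firstRows.length} rows as an Array[Row]")

    // limit is a transformation: this only builds a new DataFrame, no job runs yet.
    val limited: DataFrame = myDataFrame.limit(10)

    // The work for limit happens only once an action is called on the result.
    val collected: Array[Row] = limited.collect()
    println(s"limit + collect returned ${collected.length} rows")

    spark.stop()
  }
}
```

Note the types: take hands back a plain Array[Row] on the driver, while limit hands back a DataFrame that can be transformed further before any action is called.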

I do not have an explanation for why limit then takes longer, but that may have been answered above. This is just a basic answer on the difference between take and limit.
