In Scala, you can shuffle the rows and then take the first n:
import org.apache.spark.sql.functions.rand
dataset.orderBy(rand()).limit(n)