Apache Spark: Splitting Pair RDD into multiple RDDs by key to save values
I think this problem is similar to "Write to multiple outputs by key Spark – one Spark job"; please refer to the answer there. The idea is to subclass Hadoop's `MultipleTextOutputFormat` so that each key is routed to its own output file:

```scala
import org.apache.hadoop.io.NullWritable
import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

class RDDMultipleTextOutputFormat extends MultipleTextOutputFormat[Any, Any] {
  // Drop the key from the file contents; only the value is written.
  override def generateActualKey(key: Any, value: Any): Any =
    NullWritable.get()

  // Use the key itself as the file name, so each key gets its own file.
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    key.asInstanceOf[String]
}
```
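Building on that, a minimal driver sketch showing how the format class is plugged into `saveAsHadoopFile` (the output path, sample data, and `local[*]` master are assumptions for illustration):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SplitByKey {
  def main(args: Array[String]): Unit = {
    // Assumed local-mode setup for demonstration purposes.
    val sc = new SparkContext(
      new SparkConf().setAppName("split-by-key").setMaster("local[*]"))

    // A small String-keyed pair RDD; real data would come from your source.
    val pairs = sc.parallelize(Seq(("a", "1"), ("a", "2"), ("b", "3")))

    // saveAsHadoopFile takes the key class, value class, and OutputFormat class.
    // RDDMultipleTextOutputFormat writes each key's values to a file named
    // after the key (here, files "a" and "b") under the hypothetical path below.
    pairs.saveAsHadoopFile("/tmp/split-by-key",
      classOf[String], classOf[String], classOf[RDDMultipleTextOutputFormat])

    sc.stop()
  }
}
```

This saves one pass over the data compared with filtering the RDD once per key, since a single job partitions the output by key as it writes.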