Spark RDD – Mapping with extra arguments
You can use an anonymous function either directly in a flatMap:

    json_data_rdd.flatMap(lambda j: processDataLine(j, arg1, arg2))

or to curry processDataLine:

    f = lambda j: processDataLine(j, arg1, arg2)
    json_data_rdd.flatMap(f)

You can generate processDataLine like this:

    def processDataLine(arg1, arg2):
        def _processDataLine(dataline):
            return …  # Do something with dataline, arg1, arg2
        return _processDataLine

    json_data_rdd.flatMap(processDataLine(arg1, arg2))

toolz library provides …
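The approaches above can be compared side by side without a Spark cluster. The following is a minimal sketch, not the original author's code: `process_data_line`, `make_processor`, and `flat_map` are hypothetical names, the function body is invented for illustration, and `flat_map` is only a local stand-in for `RDD.flatMap`. It also shows `functools.partial` from the standard library as one more way to bind the extra arguments.

```python
from functools import partial
from itertools import chain


def process_data_line(dataline, arg1, arg2):
    # Hypothetical body: emit one record per token, tagged with the extra arguments.
    return [(arg1, arg2, token) for token in dataline.split()]


def make_processor(arg1, arg2):
    # Closure form: the returned function takes only the data line.
    def _process(dataline):
        return process_data_line(dataline, arg1, arg2)
    return _process


def flat_map(func, items):
    # Local stand-in for RDD.flatMap: apply func to each item, then flatten one level.
    return list(chain.from_iterable(func(item) for item in items))


lines = ["a b", "c"]

# Three ways to bind arg1 and arg2 before mapping.
via_lambda = flat_map(lambda j: process_data_line(j, "x", 1), lines)
via_closure = flat_map(make_processor("x", 1), lines)
via_partial = flat_map(partial(process_data_line, arg1="x", arg2=1), lines)
```

With a real RDD, the same three callables could be passed to `json_data_rdd.flatMap(...)` in place of the local `flat_map` helper. The closure and `partial` forms produce a named, reusable function, which can be easier to test than an inline lambda.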