Spark : Read file only if the path exists
You can filter out the irrelevant files as in @Psidom’s answer. In spark, the best way to do so is to use the internal spark hadoop configuration. Given that spark session variable is called “spark” you can do: import org.apache.hadoop.fs.FileSystem import org.apache.hadoop.fs.Path val hadoopfs: FileSystem = FileSystem.get(spark.sparkContext.hadoopConfiguration) def testDirExist(path: String): Boolean = { val p … Read more