I use the following two ways to read a Parquet file:
Initialize Spark Session:
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .master('local') \
    .appName('myAppName') \
    .config('spark.executor.memory', '5gb') \
    .config("spark.cores.max", "6") \
    .getOrCreate()
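As a quick check (not required, just assuming the session above was created successfully), the effective setting can be read back from the SparkContext configuration:
# inspect the effective value of an explicitly set config key
print(spark.sparkContext.getConf().get('spark.executor.memory'))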
Method 1:
# read parquet file directly through the SparkSession reader
df = spark.read.parquet('path-to-file/commentClusters.parquet')
Method 2:
sc = spark.sparkContext
# using SQLContext to read parquet file
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
# read parquet file
df = sqlContext.read.parquet('path-to-file/commentClusters.parquet')
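Either way the result is a regular pyspark.sql.DataFrame, so the df returned by either method can be sanity-checked the same way, for example:
# verify the schema and preview a few rows of the loaded DataFrame
df.printSchema()
df.show(5)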