I use the following two ways to read a Parquet file:
Initialize Spark Session:
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .master('local') \
    .appName('myAppName') \
    .config('spark.executor.memory', '5gb') \
    .config("spark.cores.max", "6") \
    .getOrCreate()
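As a quick check (not required, just assuming the session above was created successfully), the effective setting can be read back from the SparkContext configuration:
# inspect the effective value of an explicitly set config key
print(spark.sparkContext.getConf().get('spark.executor.memory'))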
Method 1:
# read parquet file directly through the SparkSession reader
df = spark.read.parquet('path-to-file/commentClusters.parquet')
Method 2:
sc = spark.sparkContext
# using SQLContext to read parquet file
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
# read parquet file
df = sqlContext.read.parquet('path-to-file/commentClusters.parquet')
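Either way the result is a regular pyspark.sql.DataFrame, so the df returned by either method can be sanity-checked the same way, for example:
# verify the schema and preview a few rows of the loaded DataFrame
df.printSchema()
df.show(5)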