You can import the CSV file into a DataFrame with a predefined schema. You define a schema with `StructType` and `StructField` objects. Assuming all of your columns are `IntegerType`:
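```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType

spark = SparkSession.builder.getOrCreate()  # reuses the shell's session if one exists

# Each StructField is (column name, data type, nullable).
schema = StructType([
    StructField("member_srl", IntegerType(), True),
    StructField("click_day", IntegerType(), True),
    StructField("productid", IntegerType(), True),
])

# header=False: the file has no header row, so column names come from the schema.
df = spark.read.csv("user_click_seq.csv", header=False, schema=schema)
```

This should work.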