What are workers, executors, cores in Spark Standalone cluster?

Spark uses a master/slave architecture. As you can see in the figure, it has one central coordinator (the driver) that communicates with many distributed workers (executors). The driver and each of the executors run in their own Java processes.

DRIVER

The driver is the process where the main method runs. First it converts the user program …

What is the difference between map and flatMap and a good use case for each?

Here is an example of the difference, as a spark-shell session.

First, some data (two lines of text):

    val rdd = sc.parallelize(Seq("Roses are red", "Violets are blue"))  // lines

    rdd.collect
    res0: Array[String] = Array("Roses are red", "Violets are blue")

Now, map transforms an RDD of length N into another RDD of length N. For …
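The length-preserving behaviour of map described above, and the flattening that flatMap adds on top of it, can be imitated on plain Python lists. This is only an analogy under stated assumptions, not the Spark API: the names `lines`, `mapped`, and `flat_mapped` are hypothetical, and ordinary lists stand in for RDDs.

```python
from itertools import chain

# Same two lines of text as in the spark-shell session (plain list, not an RDD).
lines = ["Roses are red", "Violets are blue"]

# map analogue: exactly one output element per input element,
# so 2 lines in gives 2 elements out (each element is a list of words).
mapped = [line.split(" ") for line in lines]

# flatMap analogue: apply the same function, then flatten one level,
# so the result is a single list of words rather than a list of lists.
flat_mapped = list(chain.from_iterable(line.split(" ") for line in lines))

print(len(mapped))       # number of elements after the map analogue
print(len(flat_mapped))  # number of elements after the flatMap analogue
```

Splitting lines into words is a typical case where the flattened form is the one you want, because later steps such as counting words operate on individual words rather than on per-line lists.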

How to change dataframe column names in PySpark?

There are many ways to do that:

Option 1. Using selectExpr.

    data = sqlContext.createDataFrame([("Alberto", 2), ("Dakota", 2)],
                                      ["Name", "askdaosdka"])
    data.show()
    data.printSchema()

    # Output
    #+-------+----------+
    #|   Name|askdaosdka|
    #+-------+----------+
    #|Alberto|         2|
    #| Dakota|         2|
    #+-------+----------+
    #root
    # |-- Name: string (nullable = true)
    # |-- askdaosdka: long (nullable = true)

    df = data.selectExpr("Name as name", "askdaosdka as …

Difference between DataFrame, Dataset, and RDD in Spark

First, DataFrame evolved from SchemaRDD.

Yes, conversion between DataFrame and RDD is absolutely possible. Below are some sample code snippets.

df.rdd is an RDD[Row].

Below are some of the options for creating a DataFrame:

1) yourrddOffrow.toDF converts to a DataFrame.

2) Using createDataFrame of the SQL context:

    val df = spark.createDataFrame(rddOfRow, schema)

where schema can be from …
