1. The Cluster Manager is a long-running service, on which node it is running?
Cluster Manager is Master process in Spark standalone mode. It can be started anywhere by doing ./sbin/start-master.sh
, in YARN it would be Resource Manager.
2. Is it possible that the Master and the Driver nodes will be the same machine? I presume that there should be a rule somewhere stating that these two nodes should be different?
Master
is per cluster, and Driver
is per application. For standalone/yarn clusters, Spark currently supports two deploy modes.
- In client mode, the driver is launched in the same process as the client that submits the application.
- In cluster mode, however, for standalone, the driver is launched from one of the Worker & for yarn, it is launched inside application master node and the client process exits as soon as it fulfils its responsibility of submitting the application without waiting for the app to finish.
If an application submitted with --deploy-mode client
in Master node, both Master and Driver will be on the same node. check deployment of Spark application over YARN
3. In the case where the Driver node fails, who is responsible for re-launching the application? And what will happen exactly? i.e. how the Master node, Cluster Manager and Workers nodes will get involved (if they do), and in which order?
If the driver fails, all executors tasks will be killed for that submitted/triggered spark application.
4. In the case where the Master node fails, what will happen exactly and who is responsible for recovering from the failure?
Master node failures are handled in two ways.
-
Standby Masters with ZooKeeper:
Utilizing ZooKeeper to provide leader election and some state storage,
you can launch multiple Masters in your cluster connected to the same
ZooKeeper instance. One will be elected “leader” and the others will
remain in standby mode. If the current leader dies, another Master
will be elected, recover the old Master’s state, and then resume
scheduling. The entire recovery process (from the time the first
leader goes down) should take between 1 and 2 minutes. Note that this
delay only affects scheduling new applications – applications that
were already running during Master failover are unaffected. check here
for configurations -
Single-Node Recovery with Local File System:
ZooKeeper is the best way to go for production-level high
availability, but if you want to be able to restart the Master if
it goes down, FILESYSTEM mode can take care of it. When applications
and Workers register, they have enough state written to the provided
directory so that they can be recovered upon a restart of the Master
process. check here for conf and more details