How to add a new node to my Elasticsearch cluster

TIPS TO ADD ANOTHER NODE: 1) VERSIONS: It is good advice to check the status of all of your nodes: http://elastic-node1:9200/ Keep in mind that in most cases: VERSIONS NEED TO BE THE SAME, EVEN THE MINOR VERSION { "name" : "node2", "cluster_name" : "xxxxxxxxxxx", "cluster_uuid" : "n-xxxxxxxxxxxxxxx", "version" : { "number" : "5.2.2", "build_hash" : … Read more
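
A minimal sketch of that version check, assuming the node URLs above are placeholders for your own cluster: query each node's root endpoint and compare the reported version strings before adding a new node.

    import json
    from urllib.request import urlopen

    # Hypothetical node URLs; substitute your own cluster members.
    nodes = ["http://elastic-node1:9200/", "http://elastic-node2:9200/"]

    versions = {}
    for url in nodes:
        with urlopen(url) as resp:
            info = json.load(resp)          # root endpoint returns cluster/version JSON
        versions[info["name"]] = info["version"]["number"]

    print(versions)
    if len(set(versions.values())) > 1:
        raise SystemExit(f"Version mismatch across nodes: {versions}")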

How to submit a job to any [subset] of nodes from nodelist in SLURM?

You can work the other way around: rather than specifying which nodes to use (with the effect that each job is allocated all 7 nodes), specify which nodes not to use: sbatch --exclude=myCluster[01-09] myScript.sh and Slurm will never allocate more than 7 nodes to your jobs. Make sure though that the cluster configuration allows … Read more
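
A hedged sketch of submitting with that exclusion from Python, mirroring the sbatch command above; the node range and script name are placeholders from the excerpt, not real values.

    import subprocess

    excluded = "myCluster[01-09]"   # nodes Slurm must NOT allocate
    result = subprocess.run(
        ["sbatch", f"--exclude={excluded}", "myScript.sh"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout.strip())    # e.g. "Submitted batch job <id>"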

How to fix symbol lookup error: undefined symbol errors in a cluster environment

After two dozen comments to understand the situation, it was found that libhdf5.so.7 was actually a symlink (with several levels of indirection) to a file that was not shared between the queued processes and the interactive processes. This means that even though the symlink itself lies on a shared filesystem, the contents of the … Read more
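
A small diagnostic sketch for that situation: resolve a library's symlink chain and fingerprint the real file, so the result can be compared between an interactive shell and a queued job. The library path is a hypothetical example, not taken from the answer.

    import hashlib
    import os

    lib = "/usr/lib/libhdf5.so.7"   # placeholder path to the suspect symlink

    real = os.path.realpath(lib)    # follows every level of indirection
    with open(real, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    print(f"{lib} -> {real} (md5 {digest})")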

How to set the number of Spark executors?

In Spark 2.0+, use the spark session variable to set the number of executors dynamically (from within the program): spark.conf.set("spark.executor.instances", 4) spark.conf.set("spark.executor.cores", 4) In the above case, a maximum of 16 tasks will be executed at any given time. The other option is dynamic allocation of executors, as below: spark.conf.set("spark.dynamicAllocation.enabled", "true") spark.conf.set("spark.executor.cores", 4) spark.conf.set("spark.dynamicAllocation.minExecutors", "1") spark.conf.set("spark.dynamicAllocation.maxExecutors", "5") This way you can let … Read more
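
A minimal PySpark sketch of the static variant, with the same settings applied when the session is built; static executor counts are generally read at startup, so setting them at session-creation time is the safer pattern. The app name and values are illustrative.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("executor-sizing-demo")      # placeholder app name
        .config("spark.executor.instances", "4")
        .config("spark.executor.cores", "4")
        .getOrCreate()
    )

    # 4 executors x 4 cores => up to 16 concurrent tasks
    print(spark.conf.get("spark.executor.instances"))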

Easy way to use parallel options of scikit-learn functions on HPC

Scikit-learn manages its parallelism with Joblib. Joblib can swap out the multiprocessing backend for other distributed systems like dask.distributed or IPython Parallel. See this issue on the scikit-learn GitHub page for details. Example using Joblib with Dask.distributed (code taken from the issue page linked above): from sklearn.externals.joblib import parallel_backend search = RandomizedSearchCV(model, param_space, cv=10, n_iter=1000, … Read more
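
A hedged sketch of the same pattern with current package layouts (sklearn.externals.joblib was removed from newer scikit-learn releases; plain joblib is used instead). The model, parameter space, and data are placeholders standing in for the truncated code above.

    from joblib import parallel_backend
    from dask.distributed import Client
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import RandomizedSearchCV

    client = Client()               # connect to (or start) a Dask cluster

    X, y = make_classification(n_samples=500, n_features=20)
    param_space = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}
    search = RandomizedSearchCV(RandomForestClassifier(), param_space,
                                cv=3, n_iter=5)

    with parallel_backend("dask"):  # route joblib work to the Dask workers
        search.fit(X, y)

    print(search.best_params_)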

What are the differences between a node, a cluster and a datacenter in a cassandra nosql database?

The hierarchy of elements in Cassandra is: Cluster → Data center(s) → Rack(s) → Server(s) → Node (more accurately, a vnode). A Cluster is a collection of Data Centers. A Data Center is a collection of Racks. A Rack is a collection of Servers. A Server contains 256 virtual nodes (or vnodes) by default. A vnode is the data … Read more
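
An illustrative sketch of that hierarchy in practice using the DataStax Python driver (an assumption; the driver is not mentioned in the answer above): list each node alongside the datacenter and rack it belongs to. The contact point is a placeholder.

    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])    # placeholder contact point
    session = cluster.connect()         # connecting populates cluster metadata

    for host in cluster.metadata.all_hosts():
        print(f"dc={host.datacenter} rack={host.rack} node={host.address}")

    cluster.shutdown()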