Running Hadoop and Spark in Swarm cluster

Set up dnsmasq for local deployment

dnsmasq is required for the local traefik setup. When deploying on a real Swarm cluster, this step is unnecessary; simply modify the traefik setup to use your registered domain name.

Install dnsmasq:

sudo apt-get install dnsmasq

Inject the local.host domain into dnsmasq and restart the service:

echo "address=/local.host/127.0.0.1" | sudo tee /etc/dnsmasq.d/workbench.conf
sudo systemctl restart dnsmasq

Check that it worked (both hostnames should resolve to 127.0.0.1):

ping local.host
ping namenode.local.host
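
If the pings do not resolve, it helps to query dnsmasq directly to tell a dnsmasq problem apart from a system-resolver problem. A minimal check, assuming dnsmasq is listening on 127.0.0.1 (the default) and dig is installed:

# Ask dnsmasq directly; should print 127.0.0.1
dig +short @127.0.0.1 namenode.local.host

If this prints 127.0.0.1 but ping still fails, the system resolver is not pointed at dnsmasq.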

Initial setup

Create an overlay network:

make network
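
The exact command lives in the Makefile; as a rough sketch, an overlay network for a setup like this is typically created along these lines (the network name workbench is a placeholder, not necessarily the one the Makefile uses):

# Create an attachable overlay network so standalone containers can join it too
docker network create --driver overlay --attachable workbench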

Deploy traefik:

make traefik
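
Under the hood the make targets presumably wrap docker stack deploy against the corresponding compose files. A hedged sketch for this step (the compose file name docker-compose-traefik.yml and the stack name traefik are assumptions); the later hadoop, spark, and services targets likely follow the same pattern with their respective compose files:

docker stack deploy -c docker-compose-traefik.yml traefik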

Now navigate to localhost:8080 (or yourserver:8080) and check that traefik is running.
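
On a headless server you can do the same check from the shell; a simple probe, assuming the dashboard is exposed on port 8080 as above:

curl -I http://localhost:8080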

Deploying HDFS (without YARN)

There is no need to pull the images explicitly, but doing so lets you watch the download progress. In a multi-server deployment, pulling only on the Swarm manager is not enough; the images still need to be pulled on the other nodes. Pull the images:

docker-compose -f docker-compose-hadoop.yml pull
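
One hedged way to pre-pull on every node is to loop over them via SSH. This sketch assumes you have SSH access to each node under its Swarm hostname and that the compose file is present there (otherwise pull the individual images by name):

for node in $(docker node ls --format '{{.Hostname}}'); do
  ssh "$node" docker-compose -f docker-compose-hadoop.yml pull
done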

To deploy HDFS run:

make hadoop

Go to traefik again and check that Hadoop is running, then copy the generated domain name into your browser and verify that the namenode and datanode are working as well.
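
You can also verify HDFS from the command line instead of the web UI; a sketch that assumes the namenode service name contains "namenode" and that the image ships the hdfs client:

# Should report the live datanodes and their capacity
docker exec $(docker ps -q -f name=namenode) hdfs dfsadmin -report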

Deploying Spark

Pull the images:

docker-compose -f docker-compose-spark.yml pull

Deploy Spark:

make spark
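
To confirm that the Spark master and workers came up with the expected number of replicas, you can list the services; the name filter is an assumption about how the services are named in the compose file:

docker service ls --filter name=spark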

Deploying Apache Zeppelin and HDFS Filebrowser

Pull the images:

docker-compose -f docker-compose-services.yml pull

Deploy the services:

make services
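
Before heading to traefik you can check that the Zeppelin and Filebrowser tasks are actually running; this assumes the stack was deployed under the name services:

docker stack ps services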

Navigate to traefik and open HDFS Filebrowser and Apache Zeppelin from there.