
Updater script, asset files and also spark.binproto file added. #448

Open
wants to merge 8 commits into master
Conversation

VickyTheViking

Hi, dear Tsunami team.
Apache Spark exposes different web UIs depending on how it is run. After hours of searching, I found a way to run it so that I can access all of the web UIs. So in this pull request we have:

1- Master web UI
2- Worker web UI
3- Web interface (Runs only when a SparkContext is running)

All of these are extracted in a single run of each version. I used apache/spark as the base Docker image because it covers more versions than the official _/spark Docker repo (the two images naturally do not differ from each other). Versions without a Docker image were ignored.
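The per-version extraction loop described above could plausibly look like the following sketch (the actual update.sh is not shown in this thread; the version list, variable name, and compose setup are assumptions):

```shell
#!/bin/sh
# Hypothetical sketch of the per-version fingerprinting loop (names assumed).
set -eu
for version in 3.1.3 3.2.4 3.3.2; do   # versions with an apache/spark image
    SPARK_VERSION="$version" docker compose up -d
    # ... capture fingerprints from ports 8080, 8081 and 4040 here ...
    SPARK_VERSION="$version" docker compose down
done
```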

@lokiuox
Collaborator

lokiuox commented May 24, 2024

Hey @VickyTheViking, thanks for your contribution!

I'm reviewing your plugin but I've found that it's not working properly. Specifically, it looks like the fingerprinting for the main 8080 port is working, but not for the other two ports. I tried to do some quick troubleshooting, but I'm not familiar with Spark.

Here are the issues to fix:

  • update.sh should be in the spark/ folder, not spark/app
  • update.sh should have the executable bit set
  • The spark-worker container fails to start. Docker logs show the issue ERROR Utils: Failed to create directory /opt/spark/work which appears to be a permission issue. I was able to bypass it by adding user: root in the docker compose file, but I don't know if this is the correct way to fix such an issue. This seems to fix the issue with port 8081.
  • Nonetheless, the docker exec command fails, so the Python script is never executed and port 4040 remains unreachable.
  • Also the selected Spark docker images do not seem to have Python at all, ensure they are the correct images.
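For reference, the workaround mentioned in the third item would look roughly like this in the compose file (a sketch only; the service and image names are assumptions, and whether running as root is the right fix is still open):

```yaml
# Hypothetical excerpt of the docker compose file (names assumed).
services:
  spark-worker:
    image: apache/spark:3.1.3
    user: root   # workaround for "Failed to create directory /opt/spark/work"
```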

Feel free to reach out.

~ Savio (Doyensec)

@VickyTheViking
Author

Hi @lokiuox, thank you for the review.

I fixed some of the items you mentioned. For the permission error, I searched for the best way to fix it, and the best approach turns out to be what you already did: setting user: root. Since we only run Spark to collect fingerprints, this does not introduce any security issue. The example I provided runs the Spark core and, because of an infinite loop, waits until it is closed; during this time the master and worker dashboards are reachable on ports 8080 and 8081, and the application dashboard UI on port 4040.

The Spark image does not have Python itself, but it includes Java components that can launch Python scripts. For example, I can run the Fibonacci example with this command:

docker exec -d spark-master /opt/spark/bin/spark-submit --master spark://spark-master:7077 /opt/spark/examples/src/main/python/fib.py

Here, spark-submit runs the fib.py example on the cluster.
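The contents of fib.py are not shown in this thread; as a hedged illustration, a script of that kind might look like the sketch below. The Fibonacci logic is plain Python; the pyspark wiring (app name and all) is an assumption and only runs when pyspark is installed, e.g. when launched via spark-submit:

```python
def fib(n):
    """Return the n-th Fibonacci number iteratively."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

if __name__ == "__main__":
    # pyspark is imported lazily so the module still loads without Spark;
    # under spark-submit this distributes the computation across workers.
    try:
        from pyspark.sql import SparkSession
    except ImportError:
        print([fib(i) for i in range(10)])  # local fallback, no Spark
    else:
        spark = SparkSession.builder.appName("FibExample").getOrCreate()
        nums = spark.sparkContext.parallelize(range(10)).map(fib).collect()
        print(nums)
        spark.stop()
```

Submitting any such script starts a SparkContext, which is what makes the port 4040 UI appear while the job is alive.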

@lokiuox
Collaborator

lokiuox commented Jul 8, 2024

Hey @VickyTheViking, thanks for the update, you still have to address the following issues:

  • Set user: root in the docker compose file to fix the spark-worker container
  • Fix the issue with the docker exec command failing

This is what I get when I try to manually reproduce the workflow and I launch the docker exec command:

$ docker exec -it spark-master /opt/spark/bin/spark-submit --master spark://spark-master:7077 /opt/spark/examples/src/main/python/fib.py
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.3.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
24/07/08 17:51:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.io.IOException: Cannot run program "python3": error=2, No such file or directory
	at java.base/java.lang.ProcessBuilder.start(Unknown Source)
	at java.base/java.lang.ProcessBuilder.start(Unknown Source)
	at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:97)
	at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.base/java.lang.reflect.Method.invoke(Unknown Source)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: error=2, No such file or directory
	at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
	at java.base/java.lang.ProcessImpl.<init>(Unknown Source)
	at java.base/java.lang.ProcessImpl.start(Unknown Source)
	... 16 more
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

@tooryx linked an issue Aug 6, 2024 that may be closed by this pull request
@tooryx added the "Contributor main" (the main issue a contributor is working on; top of the contribution queue) and "fingerprints" labels Aug 6, 2024
@lokiuox
Collaborator

lokiuox commented Nov 7, 2024

Hi @VickyTheViking, are you still interested in contributing to Tsunami with this plugin?

@VickyTheViking
Author

Hi @lokiuox, sorry for the late reply. I think I can finish this plugin soon, so yes, I am still interested in contributing to Tsunami. Please give me some time to do this. Thanks.

Commit messages:
…f docker network
use python3 included containers, otherwise install python3, python3-pip and pyspark python package
@VickyTheViking
Author

VickyTheViking commented Dec 8, 2024

Hi @lokiuox,

Set user: root in the docker compose file to fix the spark-worker container

done

Fix the issue with the docker exec command failing

done
I tried to optimize this: I used the Spark containers that ship with python3, which covers about half of the versions, and installed the required packages for the older ones.

I also found a bug that made me change the hostname of spark-master to 127.0.0.1. With this update, I haven't seen any error messages so far.
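The fallback install for older images could be sketched like this (package names and the Debian-based image are assumptions; newer apache/spark images already ship python3):

```shell
# Hypothetical fallback for older images without python3 (names assumed).
if ! command -v python3 >/dev/null 2>&1; then
    apt-get update && apt-get install -y python3 python3-pip
    pip3 install pyspark
fi
```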

Successfully merging this pull request may close these issues.

AI PRP: Request New Web Fingerprint for Spark
3 participants