
Updater script, asset files and also spark.binproto file added. #448

Open
wants to merge 8 commits into master
Conversation

VickyTheViking

Hi, dear Tsunami team.
Apache Spark exposes different web UIs depending on how it is run. After hours of searching, I found a way to run it so that I can access all of the web UIs. So in this pull request we have:

1- Master web UI
2- Worker web UI
3- Web interface (Runs only when a SparkContext is running)

All of these are extracted in a single run of each version. I used apache/spark as the base Docker image because it covers more versions than the official _/spark Docker repo (the two images naturally do not differ from each other). Versions without a Docker image were ignored.
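The per-version extraction loop described above could plausibly look like the following sketch (the actual update.sh is not shown in this thread; the version list, variable name, and compose setup are assumptions):

```shell
#!/bin/sh
# Hypothetical sketch of the per-version fingerprinting loop (names assumed).
set -eu
for version in 3.1.3 3.2.4 3.3.2; do   # versions with an apache/spark image
    SPARK_VERSION="$version" docker compose up -d
    # ... capture fingerprints from ports 8080, 8081 and 4040 here ...
    SPARK_VERSION="$version" docker compose down
done
```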

@lokiuox
Collaborator

lokiuox commented May 24, 2024

Hey @VickyTheViking, thanks for your contribution!

I'm reviewing your plugin but I've found that it's not working properly. Specifically, it looks like the fingerprinting for the main 8080 port is working, but not for the other two ports. I tried to do some quick troubleshooting, but I'm not familiar with Spark.

Here are the issues to fix:

  • update.sh should be in the spark/ folder, not spark/app
  • update.sh should have the executable bit set
  • The spark-worker container fails to start. Docker logs show the issue ERROR Utils: Failed to create directory /opt/spark/work which appears to be a permission issue. I was able to bypass it by adding user: root in the docker compose file, but I don't know if this is the correct way to fix such an issue. This seems to fix the issue with port 8081.
  • Nonetheless, the docker exec command fails, so the Python script is never executed and port 4040 remains unreachable.
  • Also the selected Spark docker images do not seem to have Python at all, ensure they are the correct images.
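For reference, the workaround mentioned in the third item would look roughly like this in the compose file (a sketch only; the service and image names are assumptions, and whether running as root is the right fix is still open):

```yaml
# Hypothetical excerpt of the docker compose file (names assumed).
services:
  spark-worker:
    image: apache/spark:3.1.3
    user: root   # workaround for "Failed to create directory /opt/spark/work"
```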

Feel free to reach out.

~ Savio (Doyensec)

@VickyTheViking
Author

Hi @lokiuox, thank you for the review.

I fixed some of the items you mentioned. For the permission error, I searched for the best way to fix it, and the best approach turns out to be what you already did: setting user: root. Since we only run Spark to collect fingerprints, this does not introduce any security issue. The example I provided runs the Spark core and, because of an infinite loop, waits until it is closed; during this time the master and worker dashboards are reachable on ports 8080 and 8081, and the application dashboard UI on port 4040.

The Spark image does not have Python itself, but it includes Java components that can launch Python scripts. For example, I can run the Fibonacci example with this command:

docker exec -d spark-master /opt/spark/bin/spark-submit --master spark://spark-master:7077 /opt/spark/examples/src/main/python/fib.py

Here, spark-submit runs the fib.py example on the cluster.
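The contents of fib.py are not shown in this thread; as a hedged illustration, a script of that kind might look like the sketch below. The Fibonacci logic is plain Python; the pyspark wiring (app name and all) is an assumption and only runs when pyspark is installed, e.g. when launched via spark-submit:

```python
def fib(n):
    """Return the n-th Fibonacci number iteratively."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

if __name__ == "__main__":
    # pyspark is imported lazily so the module still loads without Spark;
    # under spark-submit this distributes the computation across workers.
    try:
        from pyspark.sql import SparkSession
    except ImportError:
        print([fib(i) for i in range(10)])  # local fallback, no Spark
    else:
        spark = SparkSession.builder.appName("FibExample").getOrCreate()
        nums = spark.sparkContext.parallelize(range(10)).map(fib).collect()
        print(nums)
        spark.stop()
```

Submitting any such script starts a SparkContext, which is what makes the port 4040 UI appear while the job is alive.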

@lokiuox
Collaborator

lokiuox commented Jul 8, 2024

Hey @VickyTheViking, thanks for the update, you still have to address the following issues:

  • Set user: root in the docker compose file to fix the spark-worker container
  • Fix the issue with the docker exec command failing

This is what I get when I try to manually reproduce the workflow and I launch the docker exec command:

$ docker exec -it spark-master /opt/spark/bin/spark-submit --master spark://spark-master:7077 /opt/spark/examples/src/main/python/fib.py
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.3.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
24/07/08 17:51:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.io.IOException: Cannot run program "python3": error=2, No such file or directory
	at java.base/java.lang.ProcessBuilder.start(Unknown Source)
	at java.base/java.lang.ProcessBuilder.start(Unknown Source)
	at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:97)
	at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.base/java.lang.reflect.Method.invoke(Unknown Source)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: error=2, No such file or directory
	at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
	at java.base/java.lang.ProcessImpl.<init>(Unknown Source)
	at java.base/java.lang.ProcessImpl.start(Unknown Source)
	... 16 more
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

@tooryx linked an issue Aug 6, 2024 that may be closed by this pull request
@tooryx added the "Contributor main" (the main issue a contributor is working on; top of the contribution queue) and "fingerprints" labels Aug 6, 2024
@lokiuox
Collaborator

lokiuox commented Nov 7, 2024

Hi @VickyTheViking, are you still interested in contributing to Tsunami with this plugin?

@VickyTheViking
Author

Hi @lokiuox, sorry for the late reply. I think I can finish this plugin soon, so yes, I am still interested in contributing to Tsunami. Please give me some time to do this. Thanks.

Commit messages:
…f docker network
use python3 included containers, otherwise install python3, python3-pip and pyspark python package
@VickyTheViking
Author

VickyTheViking commented Dec 8, 2024

Hi @lokiuox,

Set user: root in the docker compose file to fix the spark-worker container

done

Fix the issue with the docker exec command failing

done
I tried to optimize this: I used the Spark containers that ship with python3, which covers about half of the versions, and installed the required packages for the older ones.

I also found a bug that made me change the hostname of spark-master to 127.0.0.1. With this update, I haven't seen any error messages so far.
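The fallback install for older images could be sketched like this (package names and the Debian-based image are assumptions; newer apache/spark images already ship python3):

```shell
# Hypothetical fallback for older images without python3 (names assumed).
if ! command -v python3 >/dev/null 2>&1; then
    apt-get update && apt-get install -y python3 python3-pip
    pip3 install pyspark
fi
```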

Successfully merging this pull request may close these issues.

AI PRP: Request New Web Fingerprint for Spark
3 participants