
java.lang.NoClassDefFoundError: Could not initialize class org.apache.jena.riot.system.RiotLib #6

Open
JNKHunter opened this issue May 28, 2020 · 9 comments

@JNKHunter

Hello,

When running the example on a Spark cluster using 'spark-submit', the following error is encountered. Any ideas what might be causing this?

Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.jena.riot.system.RiotLib
	at net.sansa_stack.rdf.spark.io.NTripleReader$$anonfun$1.apply(NTripleReader.scala:135)
	at net.sansa_stack.rdf.spark.io.NTripleReader$$anonfun$1.apply(NTripleReader.scala:118)
	at net.sansa_stack.rdf.spark.io.NonSerializableObjectWrapper.instance$lzycompute(NTripleReader.scala:207)
	at net.sansa_stack.rdf.spark.io.NonSerializableObjectWrapper.instance(NTripleReader.scala:207)
	at net.sansa_stack.rdf.spark.io.NonSerializableObjectWrapper.get(NTripleReader.scala:209)
	at net.sansa_stack.rdf.spark.io.NTripleReader$$anonfun$load$1.apply(NTripleReader.scala:148)
	at net.sansa_stack.rdf.spark.io.NTripleReader$$anonfun$load$1.apply(NTripleReader.scala:140)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
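
The code being run is essentially the template's N-Triples loader. A minimal sketch of it (assuming the NTripleReader.load overload taking a SparkSession and a path string, as suggested by the stack trace; the input path is a placeholder):

```scala
import net.sansa_stack.rdf.spark.io.NTripleReader
import org.apache.spark.sql.SparkSession

object TripleReaderApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SANSA N-Triples reader")
      .getOrCreate()

    // The executors throw the NoClassDefFoundError here while
    // initializing org.apache.jena.riot.system.RiotLib.
    val triples = NTripleReader.load(spark, "hdfs:///data/example.nt")
    println(s"Loaded ${triples.count()} triples")

    spark.stop()
  }
}
```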
@LorenzBuehmann
Member

Hi,
did you use the latest 0.7.1 template? Or could you just paste your POM file here? The idea of this Maven template was simply to show how one can add the SANSA artifacts - it's basically a small guide for inexperienced Maven users. But maybe you or we forgot something.

Also, can you describe how you built the Maven artifact? I guess mvn package, which triggers the Maven Shade plugin?

@JNKHunter
Author

Hi Lorenz, thanks. I figured this was just a test dir for beginners.

I'm using the exact POM file from the develop branch, which uses version 0.7.2:
https://github.com/SANSA-Stack/SANSA-Template-Maven-Spark/blob/48adae0cb02407fc727d704b928417ed0003c940/pom.xml

And you're correct, I'm using mvn package to create the jar.

Do you recommend switching to the 0.7.1 version?

@LorenzBuehmann
Member

Well, the latest version should work, so no need to go back, I think.

Let me check what's going wrong here. I've seen this issue before, but I thought it had been resolved already - at least it shouldn't happen with the ResourceTransformer enabled in the Maven Shade plugin, which is the case here.
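
For reference, Jena initializes itself via java.util.ServiceLoader, so a shaded jar only works if the META-INF/services files are merged - which is what the ServicesResourceTransformer does. A sketch of the relevant Shade plugin configuration (plugin version illustrative):

```xml
<!-- Sketch of the Shade plugin setup; without merging META-INF/services,
     Jena's subsystem initialization fails on the executors and RiotLib
     cannot be initialized. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.4</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```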

By the way, I'll also reply to your mailing list question once I've found a good answer.

@kohpai

kohpai commented May 22, 2021

I'm also having the same issue on Spark 2.2.1, Scala 2.11.8, JDK 1.8

@LorenzBuehmann
Member

Hi.

Do you really want to use such an old Spark version?
Also, SANSA-Stack has in the meantime been migrated into a single repository: https://github.com/SANSA-Stack/SANSA-Stack
There should be documentation there on how to add it to your POM file, i.e. which Maven artifacts to use as well as which repositories to declare.

@kohpai

kohpai commented May 26, 2021

I have just switched to Spark 2.4.8 and also tried the example in https://github.com/SANSA-Stack/SANSA-Stack, but the problem still persists. I have now downgraded SANSA to sansa-rdf-spark-core v0.3.0, and that works - but then I can only read N-Triples files.

@LorenzBuehmann
Member

Wait a second - what exactly do you want to do (load which files), and how exactly are you using SANSA? The Maven template is nothing more than a stub of the dependencies; you won't even need all of them if, for example, you just want to load RDF data. And which file format do you want to load? N-Triples is by far the most efficient, as the format is splittable and can thus be read in parallel.
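
For reference, loading an N-Triples file via the sansa-rdf-spark implicits looks roughly like this (a minimal sketch following the SANSA docs; the input path is a placeholder):

```scala
import net.sansa_stack.rdf.spark.io._ // adds the rdf(...) reader to SparkSession
import org.apache.jena.riot.Lang
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("RDF loader").getOrCreate()

// N-Triples is line-based, so Spark can split the file into partitions
// and parse them in parallel; Turtle is not splittable this way.
val triples = spark.rdf(Lang.NTRIPLES)("hdfs:///data/example.nt")
println(s"count: ${triples.count()}")
```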

@kohpai

kohpai commented May 29, 2021

We want to use SANSA for loading RDF into Spark, as you guessed. I am aware that we only need sansa-rdf-spark for that task. Ah, so N-Triples is more suitable? We wanted to use Turtle solely because the file size is smaller.

@kohpai

kohpai commented Jun 9, 2021

Just to update: I have tried many things and couldn't fix it, but I found an obvious workaround that I didn't think of before - the --jars option of spark-submit. Basically, download the necessary Jena jars from http://archive.apache.org/dist/jena/binaries/ and pass them all when submitting the application. With that, I can now use SANSA 0.7.2 with Scala 2.12.10 and Spark 3.1.2.
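
A sketch of the kind of invocation meant here (the main class, jar names, and versions are illustrative placeholders - match them to your application and Jena release):

```bash
# Pass the downloaded Jena jars explicitly so the executors get them on
# their classpath, independent of what the shaded application jar contains.
spark-submit \
  --class com.example.MyApp \
  --master spark://master:7077 \
  --jars jena-core-3.17.0.jar,jena-arq-3.17.0.jar,jena-base-3.17.0.jar,jena-iri-3.17.0.jar,jena-shaded-guava-3.17.0.jar \
  my-app.jar hdfs:///data/example.nt
```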
