# Apache-kafka-spark-streaming-poc

POC with Spark processing a stream from Kafka. Uses the Log4j2 Kafka appender.
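For reference, a minimal `log4j2.xml` using the Kafka appender of the kind this POC relies on. The topic name, broker address, and layout pattern are illustrative placeholders, not values taken from this repo:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
  <Appenders>
    <!-- Ships each log event to the given Kafka topic. -->
    <Kafka name="Kafka" topic="log-topic">
      <PatternLayout pattern="%date %message"/>
      <Property name="bootstrap.servers">localhost:9092</Property>
    </Kafka>
  </Appenders>
  <Loggers>
    <Root level="INFO">
      <AppenderRef ref="Kafka"/>
    </Root>
    <!-- Raise the Kafka client's own level so the appender does not
         log about itself recursively. -->
    <Logger name="org.apache.kafka" level="WARN"/>
  </Loggers>
</Configuration>
```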
References:

https://stanford.edu/~rezab/sparkclass/slides/td_streaming.pdf
http://blog.cloudera.com/blog/2015/03/exactly-once-spark-streaming-from-apache-kafka/
http://www.slideshare.net/prakash573/spark-streaming-best-practices
https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/README.html
https://github.com/beeva/beeva-best-practices/blob/master/big_data/spark/README.md
https://www.infoq.com/articles/apache-spark-introduction
http://hortonworks.com/hadoop-tutorial/introduction-spark-streaming/
https://databricks.com/blog/2015/07/30/diving-into-apache-spark-streamings-execution-model.html
http://davidssysadminnotes.blogspot.com/2016/09/running-spark-streaming-dcos.html
FAQ:
JIRA: https://issues.apache.org/jira/browse/SPARK-18124
http://spark.apache.org/docs/latest/configuration.html
The streaming part works fine, but when the job is first started it has to deal with a huge Kafka backlog of millions of messages. That first batch runs for over 40 hours; after roughly 12 hours it slows to a crawl, still crunching messages but at a very low rate. The relevant rate-limiting settings (applied in the sketch below):
spark.streaming.backpressure.enabled
spark.streaming.receiver.maxRate
spark.streaming.kafka.maxRatePerPartition
https://vanwilgenburg.wordpress.com/2015/10/06/spark-streaming-backpressure/
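A minimal sketch of how these settings can be combined with the Kafka direct stream, assuming Spark 2.x with the spark-streaming-kafka-0-10 integration. The batch interval, rate cap, topic, group id, and broker address are illustrative placeholders, not values from this POC:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object ThrottledStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-spark-streaming-poc")
      // Let Spark adapt the ingestion rate to the observed processing speed.
      .set("spark.streaming.backpressure.enabled", "true")
      // Cap the very first batch: without a limit it tries to swallow the
      // whole backlog at once. 10000 records/sec/partition is illustrative.
      .set("spark.streaming.kafka.maxRatePerPartition", "10000")

    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",   // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "spark-streaming-poc",       // placeholder consumer group
      "auto.offset.reset" -> "earliest"          // start from the backlog
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Seq("log-topic"), kafkaParams))

    // Trivial action so the stream is materialized each batch.
    stream.map(_.value).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

With a 10-second batch interval and the cap above, the first batch is bounded at roughly 100,000 records per partition instead of the entire backlog; once it completes, backpressure tunes the rate for subsequent batches. Note that spark.streaming.receiver.maxRate only applies to receiver-based streams, so with the direct approach the per-partition cap is the one that matters.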