This article records the problems I met while developing Spark Streaming applications.
Log4j reads the first log4j.properties it finds on the class path. So if you want to use $SPARK_PATH/conf/log4j.properties, make sure it is the first one found on your class path.
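For reference, a minimal log4j.properties along the lines of Spark's own conf/log4j.properties.template (the logger names are standard Log4j 1.x ones):

```properties
# Send everything INFO and above to the console.
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Quiet down Spark's own chatter while keeping your application's logs.
log4j.logger.org.apache.spark=WARN
```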
This is probably because you are looking in the wrong place. The application you submit to Spark is just a driver program that creates the DStreams. The DStreams themselves are processed by Spark workers, so all the logs produced inside DStream transformation functions go to the workers' standard output and are written to the workers' log files (by default under $SPARK_HOME/work on each worker in standalone mode).
You can also view them through the Spark web UI, which runs at $MASTER_IP:8080 by default. Click the application and you will see links to the logs on each worker.
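Concretely, on a standalone cluster the per-application output files are laid out one directory per application, one subdirectory per executor. A sketch, assuming the default $SPARK_HOME/work location and an illustrative application ID:

```shell
# On a worker machine: each submitted app gets its own directory.
ls $SPARK_HOME/work/
# e.g. app-20140101000000-0000/

# Inside each executor's subdirectory you will find stdout and stderr;
# the output of your DStream transformation functions lands here.
cat $SPARK_HOME/work/app-20140101000000-0000/0/stdout
```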
When launching several apps you may hit an error like this:
org.apache.spark.scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
This means the cluster does not have enough free cores or memory. To solve it, first check the cluster status on the Spark web UI to see which resource you are short of, then adjust as follows.
1. Add more slaves to the cluster.
2. Adjust the number of worker instances per slave (SPARK_WORKER_INSTANCES in conf/spark-env.sh).
3. Set the memory usage of your app when you submit it (for example with the --executor-memory option of spark-submit).
4. Specify the number of cores your app may take (for example via spark.cores.max), or when you submit it (--total-executor-cores in standalone mode).
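Points 3 and 4 above can be sketched with a spark-submit invocation; the master URL spark://master:7077 and the jar name app.jar are hypothetical placeholders:

```shell
# Submit the app with explicit resource limits (standalone mode).
# --executor-memory caps the memory per executor;
# --total-executor-cores caps the total cores this app may take,
# leaving room for other applications on the same cluster.
$SPARK_PATH/bin/spark-submit \
  --master spark://master:7077 \
  --executor-memory 2g \
  --total-executor-cores 4 \
  app.jar
```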