Hook up your spark-shell to the YourKit profiler by adding the following lines to spark/conf/spark-env.sh:
(Source: https://cwiki.apache.org/confluence/display/SPARK/Profiling+Spark+Applications+Using+YourKit)
SPARK_DAEMON_JAVA_OPTS+=" -agentpath:/root/yjp-12.0.5/bin/linux-x86-64/libyjpagent.so=sampling"
export SPARK_DAEMON_JAVA_OPTS
SPARK_JAVA_OPTS+=" -agentpath:/root/yjp-12.0.5/bin/linux-x86-64/libyjpagent.so=sampling"
export SPARK_JAVA_OPTS
This attaches the YourKit profiler in ‘sampling’ mode to every spark-submit, including those launched from spark-shell. Sometimes I prefer ‘tracing’, and at other times I skip profiling entirely. The following snippet lets you choose the profiling mode when launching spark-shell.
prof="none"
for curr in "$@"
do
  if [[ "$curr" =~ -Dprofiling=(.*) ]] ; then
    option=${BASH_REMATCH[1]}
    if [ "$option" == "sampling" ]; then
      prof="sampling"
    fi
    if [ "$option" == "tracing" ]; then
      prof="tracing"
    fi
  fi
done
if [ "$prof" != "none" ]; then
  SPARK_DAEMON_JAVA_OPTS+=" -agentpath:/opt/yourkit/yjp-2014-build-14116/bin/linux-x86-64/libyjpagent.so=$prof"
  export SPARK_DAEMON_JAVA_OPTS
  SPARK_JAVA_OPTS+=" -agentpath:/opt/yourkit/yjp-2014-build-14116/bin/linux-x86-64/libyjpagent.so=$prof"
  export SPARK_JAVA_OPTS
fi
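To see how the argument parsing behaves on its own, here is a minimal sketch of the same logic pulled into a function (the function name `parse_profiling_mode` is mine, not part of the snippet above, and the agent path is omitted since only the mode detection is exercised):

```shell
#!/usr/bin/env bash

# Scan the given arguments for -Dprofiling=<mode> and echo the
# detected mode; anything other than sampling/tracing yields "none".
parse_profiling_mode() {
  local prof="none" curr
  for curr in "$@"; do
    if [[ "$curr" =~ -Dprofiling=(.*) ]]; then
      case "${BASH_REMATCH[1]}" in
        sampling|tracing) prof="${BASH_REMATCH[1]}" ;;
      esac
    fi
  done
  echo "$prof"
}

# Example invocations, mimicking spark-shell command lines:
parse_profiling_mode -Dprofiling=tracing --master local[4]   # prints: tracing
parse_profiling_mode --master local[4]                       # prints: none
```

With this in place, `spark-shell -Dprofiling=tracing` would enable tracing, while a plain `spark-shell` leaves the agent detached.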