Quick look at Spark UI

Dhruv Saksena
3 min read · Nov 27, 2021

Spark ships with a web UI for monitoring the status of your application, its resource consumption, and its configuration. It's part of the standard Spark package.

By default it runs on port 4040: http://localhost:4040/jobs/

Spark UI
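
If 4040 is already taken, Spark falls back to 4041, 4042, and so on. You can also pin the port yourself through the spark.ui.port property when launching the shell; a minimal sketch (the port number here is arbitrary):

# start the shell with the UI pinned to a specific port
spark-shell --conf spark.ui.port=4041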

This UI will only show data once you execute an action. Transformations are lazy, so if we are just creating and transforming RDDs, nothing will appear here.
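
A tiny sketch from the shell makes this concrete: the map below is a transformation and leaves the UI untouched, while the count action shows up as a job.

scala> val doubled = sc.parallelize(1 to 100).map(_ * 2)   // transformation only: no job in the UI yet
scala> doubled.count()                                     // action: a job now appears under the Jobs tab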

Any work we give to Spark is termed a “job”. Each job is broken down into stages at shuffle boundaries, and each stage is further broken down into tasks, one per partition of the data.
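
As a sketch of the stage boundary, the reduceByKey below forces a shuffle, so the resulting job shows up in the UI as two stages:

scala> val pairs = sc.parallelize(1 to 100).map(n => (n % 10, n))
scala> pairs.reduceByKey(_ + _).collect()   // the shuffle before reduceByKey splits this job into two stages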

To begin with, let's execute a very simple job from the Spark shell:

scala> val sampleRDD = sc.parallelize(1 to 10000)
sampleRDD: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:23

scala> sampleRDD.collect()
res1: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...

Here, we are just collecting data. parallelize itself is lazy, so as soon as you call sampleRDD.collect(), the Spark cluster comes into action and does the data processing:

Here, 8 is the number of cores on the machine, so the job is completed by 8 tasks running in parallel.

To check the stages of the job, just click on the job's description above:

Stages of a job and parallel execution of tasks
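
The number of tasks in a stage equals the number of partitions of the RDD, and in local mode the default partition count is the number of cores. You can confirm both from the shell (on the 8-core machine above, each of these returns 8):

scala> sc.defaultParallelism      // local mode defaults to the number of cores
scala> sampleRDD.getNumPartitions // one task per partition, so the stage ran 8 tasks in parallel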

Now, let's take one more example, where we load a file into Spark memory:

scala> val fileRDD = sc.textFile("/Users/dhruv/Documents/Personal/Learning/Learning/Spark/abc.rtf")
fileRDD: org.apache.spark.rdd.RDD[String] = /Users/dhruv/Documents/Personal/Learning/Learning/Spark/abc.rtf MapPartitionsRDD[2] at textFile at <console>:23

scala> fileRDD.cache
res2: fileRDD.type = /Users/dhruv/Documents/Personal/Learning/Learning/Spark/abc.rtf MapPartitionsRDD[2] at textFile at <console>:23

scala> fileRDD.collect
res3: Array[String] = Array({\rtf1\ansi\ansicpg1252\cocoartf2578, \cocoatextscaling0\cocoaplatform0{\fonttbl\f0\fmodern\fcharset0 Courier;}, {\colortbl;\red255\green255\blue255;\red0\green0\blue0;}, ...

Now, if we go to the Spark UI Storage tab, we can see how this file is stored in Spark memory, and its partitions as well:

Storage Tab in Spark UI
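
The Storage tab only lists RDDs that have actually been cached and materialized; cache is shorthand for persist at the default MEMORY_ONLY storage level. A couple of shell calls to inspect and release that cache:

scala> fileRDD.getStorageLevel   // cache() is persist(StorageLevel.MEMORY_ONLY)
scala> fileRDD.unpersist()       // evicts the cached blocks; the entry disappears from the Storage tab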

To see the environment details for the Spark application, just click on the Environment tab:

Environment Tab in Spark UI
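
The Spark properties shown there can also be read programmatically through the SparkConf, which is handy for checking what a job actually ran with; for instance:

scala> sc.getConf.getAll.foreach(println)   // prints the (key, value) Spark properties listed on the Environment tab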

The Executors tab provides information about the memory, cores, and other resources being used by each executor. For debugging purposes, you can also view a thread dump of each executor.

Executors tab in Spark UI
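
Similar executor information is exposed programmatically through the status tracker, so you can query it from code as well; a small sketch:

scala> sc.statusTracker.getExecutorInfos.foreach { e =>
     |   println(s"${e.host}:${e.port} cached=${e.cacheSize} running=${e.numRunningTasks}")   // one line per executor (plus the driver)
     | }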
