You ran .show() on your first Spark DataFrame and saw the data. But what actually happened under the hood?
When we write something like:
df = spark.read.csv("path/to/data.csv", header=True)
df.show()
…many of us assume Spark “reads” the file and displays the data, just like Pandas.
But here’s the truth:
Nothing actually happens until .show() is executed.
Spark is a lazy execution engine. It builds up a plan behind the scenes and only acts when it truly has to.
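To see this laziness in action, here’s a minimal sketch (the file path, the app name, and the “country” column are placeholders I’ve assumed for illustration). The transformations return instantly because they only extend the plan; the final .show() is the action that actually triggers a Spark job:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("lazy-demo").getOrCreate()

# Assumed path and schema: a CSV with a header row and a "country" column.
df = spark.read.csv("path/to/data.csv", header=True)

# Transformations: each line returns immediately, no data is processed yet.
filtered = df.filter(F.col("country") == "US")
counts = filtered.groupBy("country").count()

# Action: only now does Spark read the file and run the computation.
counts.show()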
Let’s walk through the behind-the-scenes steps Spark performs:
When you write df = spark.read.csv(...), Spark creates a logical plan (you can even print it yourself, as shown below):
- It knows the source (CSV)
- It records the schema
- It tracks any transformations you apply later (filter, groupBy, and so on)
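You can inspect that plan before anything runs. Continuing from the sketch above, calling .explain(extended=True) prints the parsed, analyzed, and optimized logical plans along with the physical plan, all without reading a single row of data:

# Print the logical and physical plans Spark has built so far.
# No job is triggered; this only walks the plan tree.
counts.explain(extended=True)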