Use Spark Interims to Troubleshoot and Polish Low-Code Spark Pipelines: Part 1

Author:
Anya Bida

Let’s take advantage of Spark’s interim metadata to understand our Spark job behavior with low-code tooling. The Spark UI shows me some nice metrics: job completion time, number of rows read, number of rows written, and some related details...
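Those same numbers are also exposed programmatically through Spark's monitoring REST API, so I don't have to click through the UI to collect them. Here's a minimal sketch, assuming the driver UI is reachable at localhost:4040 (the default for a local application; your cluster's endpoint will differ):

```python
# Minimal sketch: read the same stage-level metrics the Spark UI shows,
# via Spark's monitoring REST API. Assumes the driver UI is reachable
# at localhost:4040; adjust host/port for your cluster.
import requests

BASE = "http://localhost:4040/api/v1"

# First (and, for a local run, only) application currently registered.
app_id = requests.get(f"{BASE}/applications").json()[0]["id"]

for stage in requests.get(f"{BASE}/applications/{app_id}/stages").json():
    print(
        f"stage {stage['stageId']:>4}  "
        f"status={stage['status']:<9}  "
        f"rows_read={stage['inputRecords']:>10}  "
        f"rows_written={stage['outputRecords']:>10}"
    )
```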

...but a single snapshot isn't enough: I want to know how my pipeline behaves over time.
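Prophecy keeps that history for me, but the underlying idea is simple enough to sketch by hand: append one summary row per run to a table, then chart it. Everything below (the table name, the columns, the Delta format) is illustrative rather than Prophecy's actual schema, and it assumes Delta Lake is available, as it is on Databricks:

```python
# Sketch: persist one row of run-level metadata per pipeline run so
# behavior can be tracked over time. All names here are illustrative.
from datetime import datetime, timezone
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def record_run(pipeline, rows_read, rows_written, duration_s):
    row = [(pipeline, datetime.now(timezone.utc), spark.version,
            rows_read, rows_written, duration_s)]
    cols = ["pipeline", "run_ts", "spark_version",
            "rows_read", "rows_written", "duration_s"]
    (spark.createDataFrame(row, cols)
          .write.mode("append").format("delta")   # or "parquet" off Databricks
          .saveAsTable("pipeline_run_history"))
```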

Ok, but manually checking each run for pipeline success doesn't scale. I need testing and alerting!
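With run history in a table, a post-run check can stand in for the manual eyeball test. Continuing with the hypothetical pipeline_run_history table above, this sketch fails the run (which a scheduler such as Databricks Jobs then surfaces as an alert) when output volume drops well below the trailing average; the pipeline name and the 50% threshold are made up for illustration:

```python
# Sketch: a post-run sanity check that raises (and thereby alerts)
# when the latest run writes far fewer rows than usual.
from pyspark.sql import functions as F

hist = (spark.table("pipeline_run_history")
             .where(F.col("pipeline") == "daily_sales"))

latest = hist.orderBy(F.desc("run_ts")).first()
avg_written = hist.agg(F.avg("rows_written")).first()[0]

if avg_written and latest["rows_written"] < 0.5 * avg_written:
    raise AssertionError(
        f"daily_sales wrote {latest['rows_written']} rows; "
        f"trailing average is {avg_written:.0f}"
    )
```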

Historical metadata comes in super handy when I want to compare my pipeline runs across multiple Spark versions, as in the sketch below. And check out Part 2 of this blog, where we troubleshoot individual dataframes.
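Sticking with the same illustrative history table, a version-to-version comparison is a short aggregation:

```python
# Sketch: compare the same pipeline's runs across Spark versions,
# using the illustrative history table from the earlier sketches.
from pyspark.sql import functions as F

(spark.table("pipeline_run_history")
      .groupBy("pipeline", "spark_version")
      .agg(F.avg("duration_s").alias("avg_duration_s"),
           F.avg("rows_written").alias("avg_rows_written"))
      .orderBy("pipeline", "spark_version")
      .show())
```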

How can I try Prophecy?

Prophecy is available as a SaaS product: add your Databricks credentials and start using it with Databricks right away. Or kick the tires for a couple of weeks with an Enterprise Trial that runs against Prophecy's own Databricks account and comes with examples. We also support installing Prophecy in your own network (VPC or on-prem) on Kubernetes. Sign up for your 14-day free trial account here.