Prophecy: Low-Code Data Engineering on Databricks Lakehouse via Partner Connect

Authors: Shagun Bains

Prophecy makes data users more productive on the Databricks Lakehouse with its Low-Code Data Studio, which provides visual development, deployment, and management of data pipelines. Databricks is a powerful cloud data lakehouse platform that unifies all your data, analytics, and AI. Databricks Partner Connect provides a one-click connection to sign up for and sign in to Prophecy.

Prophecy enables you to easily build visual drag-and-drop data pipelines. Whether you're a data engineer, visual ETL developer, data analyst, or data scientist, you can be productive quickly.

As you build the visual drag-and-drop pipelines, Prophecy simultaneously generates high-quality Apache Spark code that is stored in Git. Prophecy designs this code to be clearer and more performant than typical hand-written Spark code.

What’s new about Prophecy’s integration with Databricks Partner Connect?

Databricks Partner Connect makes getting started with Prophecy simple. You can initiate sign-up (or sign-in once you are signed up) using a single click. Partner Connect automatically creates a Prophecy account for you and connects your Databricks workspace to it. You can then start building your data pipelines visually in a few clicks.

Get Started with Partner Connect - 2 mins

Go to Partner Connect and click on Prophecy

Automatic Handshake - Once you choose to connect to Prophecy, your Databricks email address and the necessary connection details are automatically passed to Prophecy. You only have to provide a password for logging into Prophecy (this password is not related to your Databricks account).

You'll reach the Prophecy home page, and that's it - you're all set to start developing & exploring!

Now, let's run your first Spark example pipeline and see how amazingly simple Databricks can be.

Run first Spark pipeline - 5 mins

To run your first Spark pipeline, we'll hit 3 milestones in 5 minutes, demonstrating how to extract, transform, and load data with interactive execution on Spark.

1. Open a Visual Spark Workflow - see the Business Logic

Spark workflows in Prophecy are version-controlled in a Git repository.

Once you are on the Prophecy home page, click to open the 'join-agg-sort' workflow. Then click on the [ OPEN WORKFLOW ] button...

...and it opens in a Visual Editor!

In Prophecy, you develop workflows by connecting Gems on the canvas.

Fire up a small interactive cluster, and inspect the business logic while it spins up. The cluster is created in your Databricks account, which takes about 3 to 4 minutes. Note that you can also connect to existing long-running clusters if you use them.

Fire up a Small cluster from the top right; the cluster will connect to this workflow!

You can click on the Databricks icon to see, in your Databricks account, the cluster that Prophecy is using.

The cluster is spun up in your Databricks account, with your privileges.

Now, double-click on various Gems to get a feel for how business logic is developed in Prophecy.
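As a rough mental model of the business logic behind a workflow named 'join-agg-sort', the sketch below chains a join, an aggregation, and a sort in plain Python. It is not the code Prophecy generates (that is Spark code). The orders dataset, the column names, and the helper functions are all hypothetical, chosen only so the example runs without a cluster.

```python
from collections import defaultdict

# Hypothetical sample data; the real workflow's datasets and columns may differ.
customers = [
    {"customer_id": 1, "first_name": "Ada", "last_name": "Lovelace"},
    {"customer_id": 2, "first_name": "Alan", "last_name": "Turing"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 30.0},
    {"order_id": 11, "customer_id": 2, "amount": 45.0},
    {"order_id": 12, "customer_id": 1, "amount": 25.0},
]


def join(left, right, key):
    """Inner join two lists of row dicts on a shared key column."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def aggregate(rows, key, value):
    """Group rows by `key` and sum the `value` column within each group."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return [{key: k, "total_" + value: v} for k, v in totals.items()]


def sort_desc(rows, column):
    """Sort rows by `column`, largest value first."""
    return sorted(rows, key=lambda row: row[column], reverse=True)


# Each step corresponds loosely to one Gem on the canvas.
joined = join(customers, orders, "customer_id")
totals = aggregate(joined, "customer_id", "amount")
result = sort_desc(totals, "total_amount")
print(result)
```

Each intermediate variable (joined, totals, result) plays the role of the data flowing along one edge between Gems.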

2. Run Interactively on Databricks!

Once the cluster is ready, you can switch back to your workflow, hit the play button, and see the data after each step!

Click Play at the bottom right to run the workflow, then click the blue data icon after Customer to see the data.

Seeing your data (and schema) after every transformation is hard to get when you work purely in code.

Customer Dataset

Now, you can run the existing workflow. But how easy is it to build one? Let's see by editing a workflow.

3. Edit a Workflow & Re-run

Let's add a transform to edit the data. From the Transform menu, select the Reformat Gem, which then shows up on the canvas. You can delete an existing edge and insert the Gem in the middle by connecting its input and output ports appropriately. Note that Prophecy supports the standard Spark DataFrame transforms and Delta Lake, including merges.

Select a Reformat Gem from the menu and connect it into the pipeline.
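Conceptually, inserting a Gem in the middle of an edge is a small graph edit: one edge is removed and two new ones take its place. The sketch below models this with a list of (source, target) pairs. It illustrates the idea only and is not Prophecy's internal representation; the function name and the Gem names other than Customer and Reformat are hypothetical.

```python
def insert_between(edges, source, target, new_node):
    """Replace the edge source -> target with source -> new_node -> target."""
    if (source, target) not in edges:
        raise ValueError(f"no edge from {source} to {target}")
    # Drop the old edge, then wire the new node's input and output ports.
    updated = [edge for edge in edges if edge != (source, target)]
    updated += [(source, new_node), (new_node, target)]
    return updated


# Hypothetical wiring of a join-agg-sort style pipeline.
edges = [
    ("Customer", "Join"),
    ("Orders", "Join"),
    ("Join", "Aggregate"),
    ("Aggregate", "Sort"),
]

edited = insert_between(edges, "Customer", "Join", "Reformat")
print(edited)
```

The original list is left unchanged, so the edit can be discarded simply by not using the returned value.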

Now, inside the Reformat Gem, we'll write some transformation logic. Let's add a new column called full_name, whose value is computed by applying the CONCAT function to the first_name and last_name columns. As you start to write the expression, you'll see the Expression Builder pop up to help you finish it. Use the Down Arrow to highlight a suggestion and press Tab to accept it.

Edit the Reformat Gem. You can see the input schema on the left.
As you create or edit columns, the Expression Builder helps you write SQL expressions.
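To make the full_name logic concrete, here is a minimal sketch in plain Python of what such a Reformat step computes. It is a stand-in for the Spark SQL expression, not the code Prophecy generates. The space separator and the sample rows are assumptions, and the helper names are hypothetical. The concat helper mirrors the Spark SQL behavior of returning NULL when any input is NULL.

```python
def concat(*parts):
    """Mimic Spark SQL's concat: the result is None (NULL) if any input is None."""
    if any(part is None for part in parts):
        return None
    return "".join(str(part) for part in parts)


def reformat(rows):
    """Return new rows with an added full_name column; input rows are not modified."""
    return [
        # Assumed expression: concat(first_name, ' ', last_name)
        {**row, "full_name": concat(row["first_name"], " ", row["last_name"])}
        for row in rows
    ]


# Hypothetical sample rows standing in for the Customer dataset.
sample_customers = [
    {"first_name": "Ada", "last_name": "Lovelace"},
    {"first_name": "Alan", "last_name": None},
]

for row in reformat(sample_customers):
    print(row)
```

If missing names should not blank out the whole value, Spark SQL's concat_ws, which skips NULL inputs, may be a better fit than concat.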

Save and re-run the workflow, and you'll see that the data output from Reformat has the column full_name in it.

Save and re-run the workflow to see the edits

Now, you can explore from here!

Prophecy provides many more capabilities: you can develop tests and deploy or schedule the Spark workflows you built visually. You can also search your datasets and workflows and, for any column in a dataset, see the lineage to understand how it was computed. To dive deeper, here are some next steps:

  1. See a blog with videos on how to use Prophecy to do more
  2. Ask us questions on the Slack channel
  3. You can always schedule a demo here
  4. Read the Databricks blog to learn more about Partner Connect