Prophecy: Low-Code Data Engineering on Databricks Lakehouse via Partner Connect
Prophecy makes data users more productive on Databricks Lakehouse with its Low-Code Data Studio, which provides visual development, deployment and management of data pipelines. Databricks is the most powerful data lakehouse platform in the cloud, with a platform that unifies all your data, analytics and AI. Databricks Partner Connect provides a one-click connection to sign up for and sign in to Prophecy.
Prophecy enables you to easily build visual drag-and-drop data pipelines - whether you're a data engineer, visual ETL developer, data analyst or data scientist, you can be productive quickly.
As you build the visual drag-and-drop pipelines, Prophecy simultaneously generates high-quality Apache Spark code that is stored on Git. The generated code is designed to be clearer and faster than typical hand-written Spark code.
What’s new about Prophecy’s integration with Databricks Partner Connect?
Databricks Partner Connect makes getting started with Prophecy simple. You can initiate sign-up (or sign-in once you are signed up) with a single click. Partner Connect automatically creates a Prophecy account for you and connects your Databricks workspace to it. You can then start building your data pipelines visually in a few clicks. Prophecy is free forever for individual users of Databricks.
Get Started with Partner Connect - 2 mins
Go to Partner Connect & click on Prophecy
Automatic Handshake - Once you choose to connect to Prophecy, your Databricks email address and the necessary connection details are automatically passed to Prophecy. You only have to provide a password that you will use for logging into Prophecy (this is not related to your Databricks account).
You'll reach the Prophecy home page, and that's it - you're all set to start developing & exploring!
Now, let's run your first Spark example pipeline and see how amazingly simple Databricks can be.
Run your first Spark pipeline - 5 mins
To run your first Spark pipeline, we'll hit 3 milestones in 5 minutes, demonstrating how to extract, transform and load data with interactive execution on Spark.
1. Open a Visual Spark Workflow - see the Business Logic
Spark workflows in Prophecy are version-controlled in a Git repository.
Once you are on the Prophecy home page, click to open the 'join-agg-sort' workflow. Then click on the [ OPEN WORKFLOW ] button...
...and it opens in a Visual Editor!
In Prophecy, you develop workflows by connecting GEMS on the CANVAS.
Fire up a small interactive cluster, and you can inspect the business logic while the cluster spins up. The cluster is created in your Databricks account, which takes about 3 to 4 minutes. Note that you can also connect to existing long-running clusters if you use them.
You can click on the Databricks icon to see the cluster being used by Prophecy - in your Databricks account.
Now, double-click on various Gems to get a feel for how business logic is developed in Prophecy.
2. Run Interactively on Databricks!
Once the cluster is ready, you can switch back to your workflow, hit the play button, and see the data after each step!
This is cumbersome to do in code, where you would have to inspect each intermediate result by hand - here you see your data (and schema) after every transformation.
Now, you can run the existing workflow. But how easy is it to build one? Let's see by editing a workflow.
3. Edit a Workflow & Re-run
Let's add a transform to edit the data. From the Transform menu, select the Reformat Gem. It shows up on the canvas. You can delete an existing edge and insert the Gem in the middle by connecting its input and output ports appropriately. Note that Prophecy supports the standard Spark DataFrame transforms and Delta Lake operations, including merges.
Now, inside the Reformat Gem, we'll write some transformation logic. Let's add a new column called full_name, whose value is computed by applying the CONCAT function to the first_name and last_name columns. As you start to write the expression, you'll see the Expression Builder pop up to help you finish your expressions. Use the Down Arrow to highlight a suggestion and press Tab to accept it.
Save and re-run the workflow, and you'll see that the data output from Reformat has the column full_name in it.
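For readers who want to picture the Spark code behind this step, here is a minimal PySpark sketch of what a Reformat step adding full_name could look like. It is an illustration only, not the code Prophecy actually generates: the function names full_name_expr and add_full_name are hypothetical, and the single-space separator between the two names is an assumption (the walkthrough only says that CONCAT is applied to first_name and last_name).

```python
def full_name_expr(first_col="first_name", last_col="last_name", sep=" "):
    """Build the Spark SQL expression string for the new column.

    The separator is an assumption; drop it to concatenate the two
    columns directly.
    """
    return f"concat({first_col}, '{sep}', {last_col})"


def add_full_name(df):
    """Return a copy of the DataFrame with an extra full_name column.

    Hypothetical helper: pyspark is imported lazily so that the expression
    builder above can be used (and checked) without a Spark installation.
    """
    from pyspark.sql import functions as F

    return df.withColumn("full_name", F.expr(full_name_expr()))


# Example usage on a cluster or local Spark session (not executed here):
#   customers = spark.createDataFrame(
#       [("Ada", "Lovelace"), ("Alan", "Turing")],
#       ["first_name", "last_name"],
#   )
#   add_full_name(customers).show()
```

The expression string is the same kind of text you would type into the Reformat Gem, so it can be tried in the Gem first and reused in code later.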
Now, you can explore from here!
Prophecy provides a lot more capabilities - you can develop tests and deploy or schedule the Spark workflows you built visually. You can also search your datasets and workflows, and for any column in a dataset see the lineage to understand how it was computed. To dive deeper, here are some next steps: