Be more Productive on Spark with

Author: Prophecy Team!

We've spent the last two years working to perfect the development and deployment experience on Spark, and we're excited to share what we've learned about productive and powerful development on Spark.

In this blog, we'll share the four main pillars of new Low-Code products that make Spark development much easier and faster. After that, we'll walk through the various pieces of the Data Engineering process and show how they come together to support these pillars.

Here is the link for TL;DR, show me the product :)

Four Pillars for Productivity & Power

Here are the four pillars. Let's understand them a bit better!

Low-Code for Complete Data Engineering

Prophecy is a low-code data engineering product for Spark, Delta Lake and Airflow. You get the best experience with Databricks in the cloud. Let's look at how to achieve various things:

Develop a new Spark Workflow in 5 mins!

Developing a simple end-to-end workflow on Spark should take no more than 5 minutes with Low-Code. Here are the key things that make development fast:

Let's see this in action - we've accelerated the videos to respect your time!

Extend with your Own Gems!

If you're on the platform team, you want to standardize development. We'll show how you can build and roll out your own framework.

Let's see how you can create your own Gem:

Develop a Spark Test in 3 mins!

Everyone struggles to get good test coverage. Good coverage lets you stay agile: you can have higher confidence in new code when these tests run in CI/CD before it is pushed to production. Here is how tests can be written quickly:
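As a rough sketch of the idea, a test for a single workflow step feeds it a handful of input rows and compares the output with what you expect. The example below is illustrative only: it uses plain Python dictionaries in place of Spark DataFrames so it runs without a cluster, and the step `add_full_name` and its column names are hypothetical, not something Prophecy generates.

```python
def add_full_name(rows):
    """Hypothetical workflow step: derive full_name from first and last name."""
    return [
        {**row, "full_name": f"{row['first_name']} {row['last_name']}".strip()}
        for row in rows
    ]


def test_concatenates_names():
    # One representative input row and the output we expect for it.
    rows = [{"first_name": "Ada", "last_name": "Lovelace"}]
    assert add_full_name(rows)[0]["full_name"] == "Ada Lovelace"


def test_handles_missing_last_name():
    # Edge case: an empty last name should not leave a trailing space.
    rows = [{"first_name": "Plato", "last_name": ""}]
    assert add_full_name(rows)[0]["full_name"] == "Plato"


def test_keeps_original_columns():
    # The step adds a column; it should not drop the existing ones.
    rows = [{"first_name": "Ada", "last_name": "Lovelace"}]
    assert set(add_full_name(rows)[0]) == {"first_name", "last_name", "full_name"}
```

Functions named `test_*` with plain `assert` statements can be collected by a runner such as pytest, which is what makes them easy to run in CI/CD.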

Let's see this in action:

Develop a Schedule in 5 mins!

Now that you have developed a workflow, you'll want to run it regularly - perhaps deploy it to run every day at 9am. You'll also want to do other steps - perhaps run a sensor to wait for a file to show up in S3 storage, send an e-mail on failure or move some files around after they are processed. Prophecy's low-code Airflow makes all this super easy and fast.

Airflow is the popular open-source scheduler. It seems to handle the one must-have use case that no other product does, and that use case is different for every team. It is based on Python, and a good number of active developers are enhancing it. Usability is quite a different story: getting it working correctly in production is hard, building a schedule DAG has a steep learning curve, and testing it afterwards is quite involved. We're fixing this experience based on the following tenets.
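To make those moving parts concrete, here is a minimal plain-Python sketch of what such a schedule does. It is not Airflow code and not what Prophecy generates: a local file stands in for the S3 object, and the names `next_daily_run`, `run_schedule`, `workflow` and `notify` are hypothetical. In Airflow, the same pieces would typically be a cron schedule such as `0 9 * * *`, a sensor task, and an e-mail-on-failure setting.

```python
from datetime import datetime, time, timedelta
from pathlib import Path


def next_daily_run(now, at=time(9, 0)):
    """Return the next moment a 'every day at 9am' schedule fires after `now`."""
    candidate = datetime.combine(now.date(), at)
    return candidate if candidate > now else candidate + timedelta(days=1)


def run_schedule(input_file, workflow, notify, archive_dir):
    """One scheduled run: sensor check -> workflow -> move the processed file."""
    input_file = Path(input_file)
    archive_dir = Path(archive_dir)

    # Sensor: a real sensor would keep polling; this sketch checks once.
    if not input_file.exists():
        return "waiting"

    try:
        workflow(input_file)
    except Exception as exc:
        # Failure handling: stand-in for sending an e-mail.
        notify(f"Workflow failed: {exc}")
        return "failed"

    # Post-processing: move the file out of the way once it is processed.
    archive_dir.mkdir(parents=True, exist_ok=True)
    input_file.rename(archive_dir / input_file.name)
    return "success"
```

Keeping the sensor, the workflow and the failure handler as separate steps mirrors how a scheduler models them as separate tasks, so each can be retried or tested on its own.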

Let's see the simplest way to schedule:

Search & Lineage for free!

Column-level lineage can track any value at the column level through thousands of workflows and datasets, helping solve some challenging problems.

Let's see column-level lineage in action. Search is coming soon!
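One way to picture column-level lineage is as a graph whose nodes are (dataset, column) pairs and whose edges point from each column to the columns it was computed from. The sketch below is a toy illustration under that assumption, not a description of how Prophecy stores or computes lineage. The dataset and column names in `LINEAGE` are made up.

```python
from collections import deque

# Hypothetical lineage edges: each (dataset, column) maps to the
# (dataset, column) pairs it was computed from.
LINEAGE = {
    ("report", "revenue_usd"): [("orders_clean", "amount"), ("fx_rates", "usd_rate")],
    ("orders_clean", "amount"): [("orders_raw", "amt")],
    ("report", "customer"): [("orders_clean", "customer_id")],
    ("orders_clean", "customer_id"): [("orders_raw", "cust_id")],
}


def upstream_columns(column, lineage=LINEAGE):
    """Return every column that feeds `column`, directly or indirectly."""
    seen = set()
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for parent in lineage.get(current, []):
            if parent not in seen:  # guard against revisiting shared parents
                seen.add(parent)
                queue.append(parent)
    return seen


def source_columns(column, lineage=LINEAGE):
    """Return only the upstream columns that have no recorded parents."""
    return {c for c in upstream_columns(column, lineage) if c not in lineage}
```

Asking for `source_columns(("report", "revenue_usd"))` walks back through the intermediate dataset to the raw columns, which is the kind of question lineage answers when a value in a report looks wrong.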

Ok, this is cool, can I try it?

Prophecy is available as a SaaS product where you can add your Databricks credentials and start using it with Databricks. You can use an Enterprise Trial with Prophecy's Databricks account for a couple of weeks to kick the tires with examples. Or you can do a POC where we will install Prophecy in your network (VPC or on-prem) on Kubernetes. Sign up for your account now:

Sign up for your free Account!

We're super excited to share our progress with you. Get in touch with us - we're looking to learn!