Prophecy raises Series A to Industrialize Data Refining

Author: Raj Bains

Prophecy is delighted to announce that we raised a $25M Series A led by Insight Partners, with participation from SignalFire, Dig Ventures & Berkeley SkyDeck. We're grateful to the team, the customers and the investors who have trusted us & helped us get started!

_____

Prophecy came about when I saw at Hortonworks that most businesses trying to get value from data were struggling. With my compilers & developer tools expertise (honed at Microsoft Visual Studio & at NVIDIA building CUDA), I could see that very sophisticated tools can be built by bringing deep systems expertise & coupling it with good design. These two communities - systems programming and visual design - rarely interact, & magic can be created at the intersection. The key technical insight was that if we merge visual & code programming, then visual data users can be brought into the world of DevOps. We made git the source of truth - for visual workflows and metadata - and then built the entire product stack on top.
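To make the git-as-source-of-truth idea concrete, here is a minimal sketch (a hypothetical schema, not Prophecy's actual format) of how a visual pipeline might be serialized deterministically so that git diffs reflect real changes to the graph:

```python
import json

def pipeline_to_source(pipeline: dict) -> str:
    """Serialize a visual pipeline to deterministic text.

    Deterministic ordering (sort_keys) matters: the same visual
    graph must always produce the same bytes, so git history
    shows real edits rather than serialization noise.
    """
    return json.dumps(pipeline, sort_keys=True, indent=2) + "\n"

# A toy pipeline: two nodes and one edge (hypothetical schema).
pipeline = {
    "name": "customer_cleanup",
    "nodes": [
        {"id": "src", "gem": "CsvSource", "path": "/data/customers.csv"},
        {"id": "flt", "gem": "Filter", "condition": "age >= 18"},
    ],
    "edges": [["src", "flt"]],
}

source = pipeline_to_source(pipeline)
# Round-trip: what is committed to git fully reconstructs the graph.
assert json.loads(source) == pipeline
```

Because the serialized form round-trips, the visual editor and the repository never disagree about what the pipeline is.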

The rest of this post covers the challenges in the space & how we're addressing them.

Data Refining is practiced as an Artisanal Process

The fundamental problem is that data tooling hasn’t changed much over the past decade. Data is the new oil, but it needs to be refined into the fuel that powers machine learning and business analytics. This refining (or data engineering) is currently an artisanal process driven by data engineers and custom scripts. The data users continue to wrestle with data infrastructure instead of focusing on delivering business value from analytics and machine learning.
Today, this refining of data is the realm of artisans trained in the craft of data engineering, who master many systems and program specialized recipes to produce data. There is little visibility or predictability to the process.

Data Engineering is an Artisanal Process - it does not Scale

If 90% of analytics & machine learning work remains in building data pipelines, why aren't we fixing this? We can send reusable rockets into space, but most businesses cannot track how a sensitive value flows through their various data pipelines and datasets.

Data teams are under pressure to build data products that deliver analytics & insights faster than ever before, without having the tools to succeed. What follows is our approach to making data engineering run at scale:

Industrializing Data Refining

Prophecy is on a mission to industrialize data refining - bringing the Standard Oil moment to data. The process needs robust engineering, standardization & high-volume production. A productive data engineering organization is built on three pillars, which we describe in the sections below.

Industrialized Data Engineering with Prophecy

Let's see how Prophecy achieves these goals.

Low-Code for Visual Development

Prophecy's low-code designer provides a visual drag-and-drop canvas for developing data pipelines, where business logic can be written as simple SQL expressions. We believe it is superior in every way to developing custom scripts.

Code & DevOps

Running at scale requires bringing the best software engineering practices to the refinement of data. Rapid development & deployment of data pipelines is achieved by keeping code on git & maintaining high test coverage, coupled with continuous integration & continuous deployment. Prophecy builds these practices into the product.


Note: the ETL tools industry was built before git, and those tools do not work well with modern ecosystems. As organizations move to the cloud, they are discarding these products in favor of code-centric approaches.
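As an illustration of what high test coverage looks like for a pipeline step, here is a sketch (plain Python rather than Spark, with hypothetical names) of a unit test that CI could run on every commit:

```python
def dedupe_by_key(rows, key):
    """Keep the first row seen for each key value - a typical
    small transform a pipeline stage might perform."""
    seen, out = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out

# A unit test that CI would run on every push to git.
def test_dedupe_by_key():
    rows = [
        {"id": 1, "city": "Oslo"},
        {"id": 1, "city": "Bergen"},
        {"id": 2, "city": "Oslo"},
    ]
    result = dedupe_by_key(rows, "id")
    assert [r["id"] for r in result] == [1, 2]
    assert result[0]["city"] == "Oslo"  # first occurrence wins

test_dedupe_by_key()
```

With every transform covered this way, a failing pipeline change is caught at commit time rather than in production.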

Extensibility

Standardization is essential to scale, but the scope of data engineering has grown well beyond what traditional ETL or data integration products provide. It is no longer acceptable to offer only a limited palette of visual components, where users get stuck if something does not fit the paradigm.

Prophecy provides extensibility via templates - not as an afterthought, but as the concept at the heart of our architecture. Prophecy ships with a set of inbuilt visual operators, such as the Spark standard library and the Delta Lake library. New visual operators are defined by our customers - usually the data platform teams develop standards for their organizations. These include custom connectors and transforms such as an encryption library. Customers also ask us to develop new operators as they need them, and we're happy to add new libraries as requested.

Gems

Prophecy enables you to construct data pipelines from standard visual blocks (like Lego pieces) that we call gems. Building a new gem requires writing Spark code, and our customers often rely on Prophecy to help out. A gem includes Spark code, properties that are blanks to be filled in by the user from the UI, and a function that describes the gem's visual layout.

In the gem builder UI, the left half is where you write the template code for the gem. The top right shows a functional UI generated in real time from that template code. You can fill business-logic values into this generated UI and immediately see the generated code at the bottom right. You can run the generated code against input data and inspect the results to ensure everything works.

Gem Builder
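A rough sketch of the idea behind a gem (hypothetical names and template format; Prophecy's real gem API differs): a template with blank properties that, once filled in from the UI, emits Spark code:

```python
def render_filter_gem(props: dict) -> str:
    """Render a minimal 'Filter' gem: the UI fills in the blanks
    (input/output names and a SQL condition), and the gem emits
    the corresponding Spark code."""
    template = "{out} = {inp}.filter(\"{condition}\")"
    return template.format(
        out=props["output"],
        inp=props["input"],
        condition=props["condition"],
    )

# Values a user might fill into the generated UI.
code = render_filter_gem({
    "input": "customers",
    "output": "adults",
    "condition": "age >= 18",
})
print(code)  # adults = customers.filter("age >= 18")
```

Because the output is ordinary code, it goes through the same git, testing, and CI/CD flow as any hand-written pipeline.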

Complete Product

In the cloud, data engineering has only point products, forcing customer data platform teams to stitch together custom solutions. This means that development, deployment and metadata are spread across multiple systems - which is not sustainable over the medium term.

Prophecy instead chooses to provide a complete product:

- Build data pipelines on Spark
- Deploy & Schedule data pipelines on Airflow
- Get unified metadata with search that includes business logic, datasets, execution information
- Column level lineage to see how values flow end-to-end

Complete Product - Development, Deployment & Metadata with Lineage

Summary

Prophecy provides the essential elements required to build a data engineering practice that will fundamentally accelerate data delivery for analytics and machine learning. A complete product to handle your data engineering means you'll spend your time on business logic rather than wrestling with data infrastructure.

We're excited to be working on a hard challenge that can make data much more central to the economy. Reach out if you want to join us or try the product!