Prophecy raises Series A to Industrialize Data Refining
Prophecy is delighted to announce that we have raised a $25M Series A led by Insight Partners, with participation from SignalFire, Dig Ventures, and Berkeley SkyDeck. We're grateful to the team, the customers, and the investors who have trusted us and helped us get started!
Prophecy came about when I saw at Hortonworks that most businesses trying to get value from data were struggling. With my compilers and developer-tools expertise (honed at Microsoft on Visual Studio and at NVIDIA building CUDA), I could see that very sophisticated tools can be built by coupling deep systems expertise with good design. These two communities - systems programming and visual design - rarely interact, and magic can be created at their intersection. The key technical insight was that if we merge visual and code programming, visual data users can be brought into the world of DevOps. We made git the source of truth - for visual workflows and metadata - and built the entire product stack on top.
The rest of this post covers the challenges in the space and how we're addressing them.
Data Refining is practiced as an Artisanal Process
The fundamental problem is that data tooling hasn’t changed much over the past decade. Data is the new oil, but it needs to be refined into the fuel that powers machine learning and business analytics. This refining (or data engineering) is currently an artisanal process driven by data engineers and custom scripts. The data users continue to wrestle with data infrastructure instead of focusing on delivering business value from analytics and machine learning.
Today, this refining of data is the realm of artisans trained in the craft of data engineering, who master many systems and program specialized recipes to produce data. There is little visibility or predictability to the process.
If 90% of analytics and machine learning work remains building data pipelines, why aren't we fixing this? We can send reusable rockets into space, yet most businesses cannot track how a sensitive value flows through their various data pipelines and datasets.
Data teams are under growing pressure to build data products that deliver analytics and insights faster than ever, without the tools to succeed. Here is our approach to making data engineering run at scale:
Industrializing Data Refining
Prophecy is on a mission to industrialize data refining - bringing the Standard Oil moment to data. The process needs robust engineering, standardization, and high-volume production. A productive data engineering organization is built on three pillars:
- Software engineering best practices applied to data pipelines
- Standardization of data pipelines and the deployment process
- Scale with a low-code designer and standardization
Let's see how Prophecy achieves these goals.
Low-Code for Visual Development
Prophecy's low-code designer provides a visual drag-and-drop canvas to develop data pipelines, where business logic can be written as simple SQL expressions. We believe it is superior in every way to developing custom scripts:
- Many data users: Our users don't need to be experts in Spark or Airflow, which enables all data users - data engineers, visual ETL developers, data scientists, and data analysts - to succeed.
- Productive development: Pipelines are quick to develop - you drag and drop visual blocks into a pipeline and run them interactively to see the data after every step along the way. Even data engineers who prefer to code are more productive with our product.
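The interactive, step-by-step style described above can be sketched in miniature. This is purely illustrative - the step names and row data are made up, and Prophecy's real designer runs Spark, not plain Python - but it shows the idea of a pipeline as a chain of simple, SQL-like steps whose intermediate data you can inspect:

```python
# Illustrative miniature of a visual pipeline: a list of named steps,
# run interactively so you can see the data after every step.
rows = [
    {"customer": "acme", "price": 10.0, "qty": 3},
    {"customer": "zeta", "price": 4.0, "qty": 0},
]

def with_revenue(rows):
    # SQL-like expression: revenue = price * qty
    return [{**r, "revenue": r["price"] * r["qty"]} for r in rows]

def only_paying(rows):
    # SQL-like expression: WHERE revenue > 0
    return [r for r in rows if r["revenue"] > 0]

pipeline = [("with_revenue", with_revenue), ("only_paying", only_paying)]

data = rows
for name, step in pipeline:
    data = step(data)
    print(name, "->", data)  # inspect intermediate data, as on the visual canvas
```

Each step is small and composable; the canvas simply makes this chaining visual.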
Code & DevOps
Running at scale requires bringing the best software engineering practices to the refining of data. Rapid development and deployment of data pipelines is achieved by keeping code on git with high test coverage, coupled with continuous integration and continuous deployment. Prophecy does the following to make this process work:
- Visual data pipelines as code: Prophecy's low-code editor stores visual data pipelines as high-quality code on git.
- High test coverage: Prophecy makes test generation and editing easy, which results in high test coverage for our users after the switch.
- Metadata as code: Much of a project's metadata - workflows, schedules, and datasets, plus computed metadata such as column-level lineage - is also stored on git with Prophecy.
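The first practice above - storing visual pipelines as code - can be sketched as rendering a pipeline graph into deterministic source text that lives on git and gets reviewed like any other file. The graph shape, gem names, and generated format here are hypothetical, not Prophecy's actual serialization:

```python
# Sketch: render a toy visual-pipeline graph to deterministic code text
# suitable for committing to git. Gem names and code format are made up.
graph = {
    "source":    {"gem": "read_csv", "args": {"path": "customers.csv"}},
    "transform": {"gem": "filter",   "args": {"condition": "revenue > 0"}},
}
order = ["source", "transform"]  # topological order of the canvas

def render(graph, order):
    lines = ["# generated -- do not edit by hand"]
    prev = None
    for node in order:
        gem = graph[node]
        # sorted args keep the output stable, so git diffs stay small
        args = ", ".join(f'{k}="{v}"' for k, v in sorted(gem["args"].items()))
        src = prev if prev else "spark"
        lines.append(f'{node} = {gem["gem"]}({src}, {args})')
        prev = node
    return "\n".join(lines) + "\n"

code = render(graph, order)
print(code)
```

Determinism matters: if the same canvas always renders to the same text, code review and diffs work exactly as they do for hand-written code.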
These provide the following benefits:
- DevOps practices: For data projects, the pipeline code, schedules, and tests are stored on git, with every developer working on their own branch. Every change gets reviewed, and tests run on every commit. The code is then deployed to run on schedule, and bad changes can be rolled back reliably. This process lets data teams move new and edited changes to production quickly and with high confidence.
- Zero lock-in: Prophecy-generated code is 100% open source, with data pipelines as Apache Spark code and schedules as Apache Airflow DAGs. This ensures freedom from lock-in and keeps costs manageable.
- Git versioning for time travel: Because data projects, metadata included, are stored together on git, users can traverse across time - for example, comparing how a value is computed today with how it was computed a month ago to understand why a breakage occurred.
Note: the ETL tools industry was built before git, and those products do not work well with the modern ecosystem. As organizations move to the cloud, they are discarding them in favor of code-centric approaches.
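The time-travel benefit above is, in practice, ordinary git tooling (`git show`, `git diff`) applied to pipeline code. As a self-contained stand-in, Python's `difflib` can illustrate what comparing two committed versions of a computed column looks like; the pipeline snippet and file names are hypothetical:

```python
# Stand-in for git time travel: diff last month's definition of a computed
# column against today's to see why values changed. In real use this is
# `git show <old-commit>:pipeline.py` plus `git diff`.
import difflib

old = "revenue = price * qty\n"
new = "revenue = price * qty * (1 - discount)\n"

diff = list(difflib.unified_diff(
    old.splitlines(), new.splitlines(),
    fromfile="pipeline.py@one-month-ago", tofile="pipeline.py@HEAD",
    lineterm=""))
print("\n".join(diff))
```

Because the pipeline and its metadata sit in the same repository, the same diff covers schedules and datasets too, not just transformation logic.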
Standardization & Extensibility
Standardization is essential to scale, but the scope of data engineering has grown well beyond what traditional ETL or data integration products provide. It is no longer acceptable to offer only a limited palette of visual components, where users get stuck whenever something doesn't fit the paradigm.
Prophecy provides extensibility via templates - not as an afterthought, but as a concept at the heart of our architecture. Prophecy ships a set of built-in visual operators, such as the Spark standard library and a Delta Lake library. New visual operators are defined by our customers - usually the data platform teams, who develop standards for their organizations. These include custom connectors and transforms such as an encryption library. Customers also ask us to develop new operators, and we're happy to add libraries as requested.
Prophecy lets you construct data pipelines from standard visual blocks (like Lego pieces) that we call gems. Building a new gem requires writing Spark code, and our customers often rely on Prophecy to help out. A gem includes Spark code, properties - blanks to be filled in by the user from the UI - and a function describing the gem's visual layout.
In the gem builder UI, the left half is where you write the gem's template code. The top right shows a functional UI generated in real time from that template. As you fill business-logic values into the generated UI, the generated code appears immediately at the bottom right. You can run this generated code against input data and inspect the results to ensure everything works.
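The core mechanic of a gem - template code with blanks that the UI asks the user to fill, producing runnable code - can be sketched with standard-library string templating. Everything here (the `Gem` class, the `Filter` gem, its properties) is invented for illustration; real gems emit Spark code and carry layout information as well:

```python
# Sketch of a gem: template code with $blanks (properties) filled from a UI.
from string import Template

class Gem:
    def __init__(self, name, template, properties):
        self.name = name
        self.template = Template(template)  # code with $blanks
        self.properties = properties        # blanks the UI asks the user for

    def generate(self, **values):
        missing = set(self.properties) - set(values)
        if missing:
            raise ValueError(f"unfilled properties: {missing}")
        return self.template.substitute(values)

filter_gem = Gem(
    name="Filter",
    template="${out} = ${src}.filter('${condition}')",
    properties=["out", "src", "condition"],
)

code = filter_gem.generate(out="paying", src="customers",
                           condition="revenue > 0")
print(code)  # paying = customers.filter('revenue > 0')
```

The gem builder's real-time preview is essentially this `generate` call re-run on every keystroke, with the property list driving the form fields on the top right.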
A Complete Product
In the cloud, data engineering is served only by point products, forcing customers' data platform teams to stitch together custom solutions. Development, deployment, and metadata then end up spread across multiple systems - an approach that is not sustainable over the medium term.
Prophecy instead chooses to provide a complete product:
- Build data pipelines on Spark
- Deploy & Schedule data pipelines on Airflow
- Get unified metadata with search spanning business logic, datasets, and execution information
- Column level lineage to see how values flow end-to-end
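The last capability above, column-level lineage, amounts to recording which input columns feed each output column at every step and walking those records backwards. This toy version uses made-up column and step names and a deliberately simple data model:

```python
# Toy column-level lineage: each step maps an output column to the input
# columns that feed it; walking the maps backwards gives end-to-end provenance.
steps = [
    {"revenue": ["price", "qty"]},    # step 1: revenue = price * qty
    {"margin": ["revenue", "cost"]},  # step 2: margin = revenue - cost
]

def lineage(column, steps):
    """Return the set of raw source columns behind `column`."""
    for i in range(len(steps) - 1, -1, -1):  # latest producing step wins
        if column in steps[i]:
            sources = set()
            for parent in steps[i][column]:
                # recurse only into earlier steps
                sources |= lineage(parent, steps[:i])
            return sources
    return {column}  # produced by no step: a raw source column

print(lineage("margin", steps))
```

Resolving `margin` walks through `revenue` down to the raw columns `price`, `qty`, and `cost` - the "how does this value flow end-to-end" question from the list above.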
Prophecy provides the essential elements required to build a data engineering practice that fundamentally accelerates data delivery for analytics and machine learning. A complete product for your data engineering means you spend your time on business logic rather than wrestling with data infrastructure. Again, we enable:
- Software engineering best practices applied to data pipelines
- Standardization of data pipelines and the deployment process - including extensibility
- Scale with a low-code designer and standardization
We're excited to be working on a hard challenge that can make data much more central to the economy. Reach out if you want to join us or try the product!