In this blog, we’ll learn about the concept of financial reporting, its importance and the various challenges faced while generating an efficient and reliable financial report, and how Prophecy makes the complete process plain sailing.
Prior to joining Prophecy, I’ve been leading the data engineering effort at the consumer banking division of a prominent investment bank for a few years. There, I’ve been closely involved in the end-to-end process of financial report generation. I’ve had first-hand experience of tackling various on-ground challenges in the process. Through this blog, I’m going to demonstrate some of those and how Prophecy equips you well to tackle them.
Understanding Financial Reporting
Financial reporting refers to the standard practice of giving stakeholders an accurate picture of a company’s finances to measure various KPIs. These KPIs are driven by the business model of the organization. In essence, a financial report is what drives an organization’s decisions.
For our example of a consumer bank, let’s first look at its business model and see which KPIs matter most. The business model of a consumer bank typically involves borrowing money from retail customers at the cost of an interest rate and investing that money for a higher return. The difference between this return on investment and the interest paid makes up the bank's profit.
Given the above business model, the “Working Capital” KPI is the most important. The bank needs to have a clear and accurate foresight of its working capital to balance the demand and supply. The demand is, relatively, the more stable component making the forecasting of supply more critical. This is achieved by observing the existing trends of money movements and understanding the behavior of the retail consumers across different product offerings (Fixed Term Account, Savings Account, IRA Account, etc).
To measure the above KPI, the Management Information team of analysts essentially needs a daily report of all account positions. We’ll discuss the reporting requirements of our example in further detail in the below sections.
Importance of a Robust ETL for Financial Reporting
Gartner, organizations face an average loss of $15 million every year due to poor data quality. Therefore, having a modern and foolproof solution for managing financial data, such as the ETL process, is essential for timely data extraction and delivery.
Building a foolproof ETL manually is not easy! And more so for financial data.
To name a few major challenges I’ve seen in my experience:
- Need for high expertise in specific technologies like Spark and Airflow increasing the payroll expenses for the organization
- Manually written large chunks of code that are tedious to understand and debug. This creates room for errors in the data processing.
- No visibility of the lineage of the data without looking into the code. This makes addressing compliance requirements particularly difficult and time-consuming.
Prophecy, addresses these issues very effectively, thanks to its low-code interface, which generates high-quality and performant Spark code and automatically computed column-level lineage.
Enabling working capital forecasting
We have to enable our Management Information team of analysts to forecast the working capital. To achieve that, we need to provide them with a report containing all account positions for every historic business date along with daily cash movements and demographic information. With the available forecasting ability, the business can then make data-driven decisions around pricing (offered interest rate) and product offering opportunities (fine-tune existing products or launch new ones).
This is one of the most common examples of a financial report which gives the business a holistic view of the money movements within and outside the firm along with the demographic understanding.
To build the desired report, we have to ingest the external data into our data lake, clean it up and aggregate it into the required format. To do so, we’re going to structure our data storage into the Bronze, Silver, and Gold architecture.
Let’s get our hands dirty now by writing a few of our pipelines. By the end of this exercise, this is what our top-level pipelines & datasets view (lineage) should look like:
Bronze - Data Ingestion
Bronze pipelines copy external data into our data lake. To forecast the working capital, we will need to ingest the following datasets:
- Positions/Account status - EOD (end of the day) snapshot of the account details, generated by the bank’s custodian
- Transactions - EOD snapshot of the all of the transactions from the current business day (inflow/outflow/interest credit)
- People - list of new or updated customer profiles, contains information like names, emails, and addresses
- Products - EOD snapshot of all the product offerings, contains information like interest rates, lock-in periods, etc
Silver - Data cleanup
For each of our bronze tables, we build a silver pipeline, which involves the following steps:
- Load Bronze - Read bronze data for the given day using Source gem.
- Deduplication - For loading silver data, we consider only the latest records. E.g. if a customer updated their address twice in a day, we would see two separate people dataset records, however, we want to have only a single record per person containing all intra-day updates. The Deduplicate gem helps us achieve that.
- Schema Flattening - Some of our datasets come with columns with nested information. To turn complex data structures into simple columns we leverage the FlattenSchema gem.
- Addition of Milestone Keys - These keys are used to uniquely identify the loaded rows. This helps particularly with handling slowly changing dimensions. We use a SchemaTransform gem here, which is used for adding, removing, or renaming columns in a DataFrame.
- Validation and Reconciliation - This is most critical to any ETL/ELT pipeline to ensure data integrity and veracity to be able to make critical business decisions. In our case, all the transactions happening on a business date for an account should sum to the account balance change from the previous business date.
Transactions Sum (Today) = Balance (Today) - Balance (Yesterday)
Since this is a complex operation, we leverage a reusable Subgraph to make it modular and easier to use for other developers. We also make this subgraph run in Phase = -1. This ensures that we only run the silver ingest (phase = 0) once the reconciliation has finished successfully. We reconcile data as follows:
- Compute the balance change for each account between successive business dates:
- By aggregating the latest bronze transactions (using Aggregate gem)
- By differencing balances in existing silver account status (using lag() in Window gem)
- Join the above two balance changes, compare and fail if the data doesn’t match
- Silver Append - Once we’re happy with our data, we append the output to the existing Silver data. Note, that we partition our data on “business_date” and “import_ts” columns to optimize the performance while reading silver data.
Gold - Aggregations
Now that we’ve finished the Silver loads, it’s time to cross the final mile with our Gold data aggregation.
Given our problem statement, the Gold data generation just involves joining the above loaded silver datasets with appropriate preprocessing and reformatting. The generated report now contains all the necessary data points needed for the forecasting of the working capital.
While we just saw how Prophecy makes the ETL writing process so quick and easy, it doesn’t stop there. Under the hood, Prophecy also did the heavy lifting of converting all our pipelines to high-quality and standardized python code (all checked into git). To further top that, it also computed lineage right down to the column level in all our pipelines.
The complete video walkthrough will be coming out soon on our website. Follow us on LinkedIn to not miss any updates!
How can I try Prophecy?
Prophecy is available as a SaaS product where you can add your Databricks credentials and start using it with Databricks. You can use an Enterprise Trial with Prophecy's Databricks account for a couple of weeks to kick the tires with examples. We also support installing Prophecy in your network (VPC or on-prem) on Kubernetes. Sign up for your account now:Sign up for your free Account!
We're super excited to share our progress with you, get in touch with us - we're looking to learn!