Technical Approach to Designing an Extensible and Scalable Data Processing Framework

Modern distributed data processing applications provide curated, succinct output datasets to downstream analytics tools. These tools use the datasets to produce dashboards and reports that help multiple groups of stakeholders make informed decisions. The output of the data processing pipeline must be pertinent to the business objective and must present summarized, to-the-point information. The middleware data processing layer therefore becomes the backbone of these analytics applications. It consumes voluminous datasets from multiple upstream sources, applies complex analytics logic, and generates summarized outcomes, which analytics engines then consume to produce different kinds of reports and dashboards. The most broadly defined objectives of these analytics systems are listed below:

Forecasting applications that predict future outcomes from historical trends to help the organization decide its future strategy.
Reporting for senior leadership on the organization's performance and profitability.
Reporting to external stakeholders on the company's performance and future guidance.
Regulatory reporting to external and internal regulators.
Various kinds of compliance and risk reporting.
Processed and summarized output for data scientists, data stewards, and data engineers to support their data analysis needs.

An organization may have many other needs that call for summarized information from the analytics processing output, which analytics applications then consume to generate reports, charts, and dashboards.
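The flow described in this section (several upstream feeds consumed by a middleware layer, processed, and reduced to a summary for downstream reporting) can be illustrated with a minimal Python sketch. It assumes small in-memory record feeds rather than real upstream systems. The names `Pipeline`, `add_source`, `add_transform`, `sales_feed`, and `returns_feed` are hypothetical illustrations, not part of any specific framework or library.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]

# Hypothetical upstream feeds; real ones would read files, tables, or streams.
def sales_feed() -> Iterable[Record]:
    yield {"region": "EU", "amount": 120.0}
    yield {"region": "US", "amount": 200.0}

def returns_feed() -> Iterable[Record]:
    yield {"region": "EU", "amount": -20.0}

class Pipeline:
    """Hypothetical middleware sketch: consumes several upstream sources,
    applies pluggable transforms, and aggregates a summarized dataset."""

    def __init__(self) -> None:
        self.sources: List[Callable[[], Iterable[Record]]] = []
        self.transforms: List[Callable[[Record], Record]] = []

    def add_source(self, source: Callable[[], Iterable[Record]]) -> "Pipeline":
        # New upstreams are registered without touching the core loop.
        self.sources.append(source)
        return self

    def add_transform(self, fn: Callable[[Record], Record]) -> "Pipeline":
        # Business logic is plugged in as record-level functions.
        self.transforms.append(fn)
        return self

    def run(self, key: str, value: str) -> Dict[Any, float]:
        # Sum `value` per distinct `key` across all sources after transforms.
        summary: Dict[Any, float] = defaultdict(float)
        for source in self.sources:
            for record in source():
                for fn in self.transforms:
                    record = fn(record)
                summary[record[key]] += float(record[value])
        return dict(summary)

# Usage: combine two feeds, normalize the region code, summarize by region.
pipeline = (
    Pipeline()
    .add_source(sales_feed)
    .add_source(returns_feed)
    .add_transform(lambda r: {**r, "region": r["region"].lower()})
)
summary = pipeline.run("region", "amount")
print(summary)  # {'eu': 100.0, 'us': 200.0}
```

In this sketch, extensibility comes from registering sources and transforms rather than editing the aggregation loop. Scalability is out of scope here: a production framework would typically delegate the same consume, transform, and aggregate steps to a distributed processing engine.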