For many years, our data warehouse architectures have been built on the traditional ETL dataflow, which consists of the following stages:
- Extract
- Transform
- Load
In this process, data is first extracted in its raw format from diverse sources, then transformed (typically within a SQL database), and finally loaded into a data mart or a semantic data model.
A few years ago, however, we introduced a new layer into this architecture to better accommodate semi-structured and unstructured data. This change coincided with the emergence of data lakes and prompted a shift to ELT, in which raw data is loaded into the lake before it is transformed. I'll argue that we still need the final "Load" step into a serving layer, which makes ELTL the more fitting term.
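To make the ELTL ordering concrete, here is a minimal, purely illustrative sketch in plain Python. It assumes a hypothetical JSON source, uses an in-memory list as a stand-in for the data lake and a dictionary as a stand-in for the data mart; all names (`RAW_SOURCE`, `load_raw`, `load_mart`, and so on) are invented for this example and do not correspond to any specific product. The point is only the order of the steps: the raw records are landed untouched first, transformed afterwards, and then loaded a second time into a serving structure.

```python
import json

# Hypothetical raw source: semi-structured JSON lines (note the optional "coupon" field).
RAW_SOURCE = [
    '{"order_id": 1, "customer": "alice", "amount": "19.99"}',
    '{"order_id": 2, "customer": "bob", "amount": "5.00", "coupon": "SPRING"}',
    '{"order_id": 3, "customer": "alice", "amount": "7.51"}',
]


def extract(source):
    """E: pull the records from the source in their raw format."""
    return list(source)


def load_raw(records, lake):
    """L: land the untouched records in the (stand-in) data lake."""
    lake.extend(records)


def transform(lake):
    """T: parse and type the raw lake contents into clean rows."""
    rows = []
    for line in lake:
        doc = json.loads(line)
        rows.append({"customer": doc["customer"], "amount": float(doc["amount"])})
    return rows


def load_mart(rows, mart):
    """L: load an aggregate (revenue per customer) into the (stand-in) data mart."""
    for row in rows:
        total = mart.get(row["customer"], 0.0) + row["amount"]
        mart[row["customer"]] = round(total, 2)


lake, mart = [], {}
load_raw(extract(RAW_SOURCE), lake)
load_mart(transform(lake), mart)
print(mart)
```

Because the lake keeps the original records, including fields the mart ignores, the transformation can later be changed and re-run without going back to the source; the second "Load" is what turns the prepared data into something consumers can query.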
At the same time, Microsoft overhauled many of its Data Platform architectures and introduced entirely new labels for the process:
- Ingest
- Store
- Prepare
- Serve
This raises a question: should we now call it an ISPS dataflow, or should we keep the traditional terminology we have relied on for years?