The key to building a flexible centralised insights platform.
Centralised data management has become a cornerstone for businesses seeking efficiency, consistency, and improved decision-making. By consolidating data into a single, integrated platform, organisations can streamline operations, enhance collaboration, and gain deeper insight into how the business performs. This centralisation is driven by the need to harness the full potential of data assets scattered across various systems and departments. However, the journey towards centralised data management is not without its challenges. Businesses often grapple with complexities such as disparate data sources, inconsistent data formats, and siloed information.
More importantly, as businesses evolve and grow, the scalability and adaptability of centralised data systems become critical considerations. Without a flexible and agile infrastructure, organisations risk being constrained by rigid data architectures that cannot keep pace with changing business requirements. Not every data feed is understood by the people involved at the onset of the journey; understanding the full dataset is an iterative process and takes time. This is especially true for businesses that do not have an embedded data team yet still want to benefit from a good-enough centralised data platform. So, where do you start? How do you ensure the initial implementation is flexible to change? In this blog, we explore the various building blocks for creating a centralised insights platform that is adaptable and agile.
The Foundation: Data sources
The principal question to ask yourself is “Where is my data coming from?”
Modern business systems typically churn out data through four kinds of sources:
- Relational databases
- Web APIs
- Structured flat files
- Data streams
Regardless of the type of source (yes, there are many more!), this data will be stored in some model that facilitates the extraction of business value, making it easier for end users to access and utilise it effectively.
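As a concrete illustration, the sketch below simply catalogues these four kinds of sources before any modelling work begins. The source names, connection strings, and endpoints are hypothetical placeholders, not a prescribed setup.

```python
# A minimal sketch of cataloguing the four kinds of sources up front.
# All names, connection strings and URLs below are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class DataSource:
    name: str
    kind: str        # "relational", "web_api", "flat_file" or "stream"
    location: str    # connection string, endpoint or path

SOURCES = [
    DataSource("hr_system", "relational", "postgresql://hr-db.internal/hr"),
    DataSource("crm", "web_api", "https://example-crm.invalid/api/v2/contacts"),
    DataSource("finance_export", "flat_file", "/exports/finance/*.csv"),
    DataSource("clickstream", "stream", "kafka://events.internal/page-views"),
]

# A quick inventory like this is often the first artefact of the journey.
for source in SOURCES:
    print(f"{source.name}: {source.kind} -> {source.location}")
```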
Are you finding it hard to understand where your data is and where it's coming from? We are experts in finding, interpreting, and communicating your data – what it means and how it can be understood in a business context. Contact us to learn how we can help.
The Modelling Strategy: Transaction Snapshot
In agile data projects, the solution must have a loading strategy that's modular, bite-sized, and replaceable. Different approaches like ETL, ELT, Inmon EDW, Kimball Star Schema, Data Vault/2.0, and Transaction Snapshot can be considered for data loading and modelling.
Each method has its benefits and considerations, so the choice depends on the project's needs. For the purposes of this blog we will delve into the transaction snapshot method. This technique passes the data through the three basic layers of the Medallion Architecture (bronze-silver-gold). It's flexible and easy for business stakeholders to understand and support, and requires less time to be spent on the enterprise model. However, data model purists may avoid it as it doesn't represent the enterprise relationships between data elements. In the initial phase of an organisation's data maturity journey, each system feed is modelled in isolation and linked together in later layers as the relationships become clearer.
It follows a simple journey:
- Data gets imported as it is into the bronze layer.
- Relevant data from bronze gets extracted to the silver layer, simultaneously fixing basic format issues.
- Data from the silver layer is linked to basic enterprise-wide properties (common dimensions) in the gold layer.
- Data from the gold layer can be integrated with other gold silos to create aggregated and related data points of interest.
Example:
Let us consider three data sets: (a) current employees, (b) organisation structure, and (c) past employees. Let us assume that each comes from its own separate business system, but that all three share a common data element identifying the employee's department.
Bronze: This will contain the raw version of the data set from all three systems.
Silver: This will contain the latest (or tracked changes) of each employee, organisation, and assignment record, with basic format fixes applied, such as trimming spaces from strings.
Gold: This will contain the common department attribute linked to both the organisation and employee data.
Stitch: This will contain a combined data set of current and past employees.
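To make the journey through the layers concrete, here is a minimal sketch of the employee example above in Python using pandas. The column names, sample values, and transformation rules are illustrative assumptions rather than a prescribed schema.

```python
import pandas as pd

# Bronze: raw copies of each source feed, loaded as-is (note the stray spaces).
bronze_current = pd.DataFrame({
    "emp_id": [1, 2], "name": [" Asha ", "Ben"], "dept_code": ["HR ", "FIN"]
})
bronze_past = pd.DataFrame({
    "emp_id": [3], "name": ["Carl "], "dept_code": ["FIN"]
})
bronze_org = pd.DataFrame({
    "dept_code": ["HR", "FIN"], "dept_name": ["Human Resources", "Finance"]
})

def to_silver(df: pd.DataFrame) -> pd.DataFrame:
    """Silver: fix basic format issues, e.g. trim stray spaces from strings."""
    out = df.copy()
    for col in out.select_dtypes(include="object"):
        out[col] = out[col].str.strip()
    return out

silver_current, silver_past, silver_org = (
    to_silver(bronze_current), to_silver(bronze_past), to_silver(bronze_org)
)

# Gold: link each employee feed to the common department dimension.
gold_current = silver_current.merge(silver_org, on="dept_code", how="left")
gold_past = silver_past.merge(silver_org, on="dept_code", how="left")

# Stitch: combine current and past employees into one data set.
stitch_employees = pd.concat(
    [gold_current.assign(status="current"), gold_past.assign(status="past")],
    ignore_index=True,
)
print(stitch_employees)
```

Because each feed is modelled in isolation until the gold layer, a new feed (say, contractors) can be added later without reworking what already exists.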
Modelling strategies are not one-size-fits-all. We can help you understand your unique requirements in the context of the resources and skills available to you. Book a discovery call with us to learn more.
Creating Secure Data Pipelines:
When designing a data centralisation solution, it's crucial to modularise the solution, allowing flexibility as the platform and business processes evolve.
When executing a data project swiftly, all data must undergo the following stages (sketched in code after the list):
- Data Ingestion
- Data Storage
- Data Inference
- Data Intelligence
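The sketch below shows one way to keep these four stages modular and replaceable: each stage is a small, swappable function that shares nothing with the others beyond a simple payload contract. The stage implementations here are deliberately trivial placeholders, not a real pipeline.

```python
# A minimal sketch of modular, replaceable pipeline stages.
# The payload dictionary and stage bodies are illustrative assumptions.
from typing import Callable, Dict, List

Stage = Callable[[dict], dict]

def ingest(payload: dict) -> dict:
    payload["raw"] = ["row-1", "row-2"]            # pull data from a (pretend) source
    return payload

def store(payload: dict) -> dict:
    payload["stored"] = list(payload["raw"])       # persist the raw data untouched
    return payload

def infer(payload: dict) -> dict:
    payload["row_count"] = len(payload["stored"])  # derive simple facts for reconciliation
    return payload

def intelligence(payload: dict) -> dict:
    payload["report"] = f"{payload['row_count']} rows loaded"  # feed reporting / BI
    return payload

# Because the stages share only the payload contract, any one of them can be
# swapped out or re-implemented without rewriting the others.
pipeline: List[Stage] = [ingest, store, infer, intelligence]

result: Dict = {}
for stage in pipeline:
    result = stage(result)
print(result["report"])
```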
These data pipelines need to operate within a secure environment, adhering to the enterprise IT security principles and safeguarding data during transit and at rest.
Data security, particularly access control and encryption, must be in place from the outset of implementing a technical platform. It's crucial to ensure that all stored data is encrypted both in transit and at rest, and accessible only to authorised users. While businesses lacking in-house expertise in this area may find this phase daunting, they typically recognise its importance and eventual benefits.
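As one possible illustration, assuming the platform lands its data in AWS S3, the sketch below writes every object with server-side encryption; encryption in transit comes from the TLS endpoints the client uses. The bucket name, region, and KMS key alias are placeholders, and requires boto3 and valid AWS credentials.

```python
import boto3

def write_encrypted(bucket: str, key: str, body: bytes, kms_key_id: str) -> None:
    """Store an object encrypted at rest with a customer-managed KMS key."""
    # The client talks to S3 over TLS endpoints, covering data in transit.
    s3 = boto3.client("s3", region_name="eu-west-2")  # placeholder region
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId=kms_key_id,
    )

# Placeholder call; access to the bucket and key should be limited to
# authorised users via IAM policies rather than granted broadly.
# write_encrypted("insights-bronze", "hr/employees.json", b"{}", "alias/insights-data")
```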
A solution should consider existing IT security policies to safeguard the data asset being constructed. It's essential to avoid commencing an IT transformation project alongside the data transformation initiative. This could hinder platform usage within the planned implementation timeframe and create unnecessary dependencies between the two programs. In cases where implementation plans clash, it's preferable to establish a small user base for securely accessing the platform. This enables the materialisation of business value earlier, albeit by a smaller, yet focused, audience.
Data Ingestion
Data ingestion should steer clear of any service dependency on the centralised data store and should ideally avoid incorporating business logic. Modern cloud platforms offer their own native data ingestion toolsets, which are also equipped for data transformation logic. Typically, during the initial phases of the solution, the data specification isn't known, making it best to avoid transformation during the ingestion phase. The ELT (Extract Load Transform) strategy for data pipelines is favoured on modern cloud platforms, as it's more dependable to extract data from the source, load it into storage, and then consider any assumed transformations in later phases.
Ingestion should be designed to temporarily store the raw or close-to-raw data as seen by the ingestion module for a short period, facilitating the tracing of data load issues.
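The sketch below illustrates this ELT-style ingestion: extract from the source, load it untouched, keep a short-lived landing copy for tracing load issues, and leave all transformation to later phases. The source URL, folder names, and one-week retention window are illustrative assumptions, not a prescribed layout.

```python
import shutil
import time
from pathlib import Path
from urllib.request import urlopen

LANDING = Path("landing")            # short-lived raw copies, kept for tracing load issues
BRONZE = Path("bronze")              # durable raw store
RETENTION_SECONDS = 7 * 24 * 3600    # keep landing copies for roughly a week

def ingest(source_url: str, feed_name: str) -> Path:
    """Extract and load only; any transformation happens in later phases."""
    LANDING.mkdir(exist_ok=True)
    BRONZE.mkdir(exist_ok=True)
    raw = urlopen(source_url).read()                          # Extract
    stamp = time.strftime("%Y%m%dT%H%M%S")
    landing_file = LANDING / f"{feed_name}_{stamp}.json"
    landing_file.write_bytes(raw)                             # Load, as seen at the source
    shutil.copy(landing_file, BRONZE / landing_file.name)     # promote to bronze
    return landing_file

def purge_landing() -> None:
    """Drop landing copies older than the retention window."""
    cutoff = time.time() - RETENTION_SECONDS
    for file in LANDING.glob("*.json"):
        if file.stat().st_mtime < cutoff:
            file.unlink()
```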
Data Storage
Data ingested should be stored in a compressed format and retained for an extended duration. This enables the discovery of business rules at a later stage and facilitates changes to transformation assumptions as they become clearer. All layers of data storage must follow the principle that each layer can be reconstructed from the preceding one. This permits retro-fitting of business rule discoveries and regeneration of data.
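A minimal sketch of this principle, assuming Parquet files on local storage (pandas with pyarrow installed): bronze is kept compressed and long-lived, and silver can always be regenerated from it when transformation assumptions change. The paths and cleansing rules are illustrative.

```python
from pathlib import Path

import pandas as pd

BRONZE = Path("bronze")
SILVER = Path("silver")

def store_bronze(df: pd.DataFrame, feed: str) -> None:
    """Keep the raw feed compressed and retained long-term for later rule discovery."""
    BRONZE.mkdir(exist_ok=True)
    df.to_parquet(BRONZE / f"{feed}.parquet", compression="snappy")

def rebuild_silver(feed: str) -> pd.DataFrame:
    """Regenerate silver entirely from bronze, so new business rules can be retro-fitted."""
    SILVER.mkdir(exist_ok=True)
    df = pd.read_parquet(BRONZE / f"{feed}.parquet")
    for col in df.select_dtypes(include="object"):
        df[col] = df[col].str.strip()   # the current transformation assumptions
    df.to_parquet(SILVER / f"{feed}.parquet", compression="snappy")
    return df
```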
Data Inference
In the early stages of gaining insights, data reconciliation with source systems is primarily handled by a designated stakeholder. While inferring data from relational sources may appear straightforward due to their easily discoverable models, dealing with a Web API from a SaaS service presents challenges as the data is not readily understood or documented. Typically, data inference occurs from the silver layer and is reconciled to the source using both silver and gold data.
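A minimal sketch of the kind of reconciliation a stakeholder might confirm against the source system: simple control checks between a source extract and the silver layer. The key column and sample values are illustrative assumptions.

```python
import pandas as pd

def reconcile(source: pd.DataFrame, silver: pd.DataFrame, key: str) -> dict:
    """Return simple checks a stakeholder can confirm against the source system."""
    return {
        "row_count_match": len(source) == len(silver),
        "missing_keys": sorted(set(source[key]) - set(silver[key])),
        "unexpected_keys": sorted(set(silver[key]) - set(source[key])),
    }

source = pd.DataFrame({"emp_id": [1, 2, 3]})
silver = pd.DataFrame({"emp_id": [1, 2]})
print(reconcile(source, silver, "emp_id"))   # flags emp_id 3 as missing from silver
```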
Data Intelligence
In the initial stages of a project, it's preferable to conduct data intelligence outside of the persistent data storage solution. Modern BI tools provide advanced capabilities for this purpose. Keeping the data intelligence process separate from the data storage and transformation module allows for experimenting with various permutations and combinations without the need to remodel the gold layer. The data intelligence data model can be constructed within the BI tool itself using the gold data, enabling the creation of the most efficient model for the intended purpose. For example, a flattened model (resembling a spreadsheet) is suitable for analysing key influencers, while a star schema offers faster query performance.
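As an illustration of these two shapes, the sketch below builds a flattened, spreadsheet-like table from hypothetical gold employee and department sets; the same two sets could equally be handed to the BI tool as a small star schema (fact plus dimension) where query speed matters. Column names and values are illustrative assumptions.

```python
import pandas as pd

gold_employees = pd.DataFrame({
    "emp_id": [1, 2, 3], "dept_code": ["HR", "FIN", "FIN"], "salary": [40, 55, 60]
})
gold_departments = pd.DataFrame({
    "dept_code": ["HR", "FIN"], "dept_name": ["Human Resources", "Finance"]
})

# Flattened model: every attribute in one wide table, convenient for
# key-influencer style analysis inside the BI tool.
flattened = gold_employees.merge(gold_departments, on="dept_code", how="left")

# Star schema alternative: keep gold_employees as the fact table and
# gold_departments as a dimension, and let the BI tool relate them by dept_code.
print(flattened)
```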
Ei Square can assist you with a complete digital transformation journey or provide support at standalone points such as data ingestion, integration, intelligence, security governance, and more. Get in touch to discover what works best for you.
Bottom line:
While there's no one-size-fits-all solution for centralised data and insights, it's crucial to adopt an implementation strategy that prioritises flexibility without necessitating a complete overhaul. This approach ensures that as organisations embark on the journey of centralising data and insights, they can accommodate evolving requirements and incorporate changes seamlessly.
This iterative approach not only fosters agility but also promotes a culture of continuous improvement, enabling organisations to stay responsive to changing market dynamics and emerging opportunities. Ultimately, the key lies in striking a balance between structure and adaptability, allowing for innovation and growth while leveraging the full potential of centralised data and insights.