(6 minute read)
Extract, Load and Transform to get a Single Version of the Truth
Today, most companies find themselves swamped by data and they often struggle to find a comprehensive, solid and reliable set of data that they can trust, and they can use for data-driven decision making.
This solid, reliable and comprehensive set of data is required to make timely decisions that will help to obtain buy-in throughout the organisation: time is too precious to be wasted looking backwards, analysing whether the figures are the same or not across departments, the data needs to be used to look forward and help develop and adopt the best strategies for the company as a whole.
This robust piece of quality data to base our decision upon is, to some extent, the so-called “Single version of the truth”. There are several ways to manage your data to make sure you can trust it.
For example, in our previous blog we discussed 5 facts about the importance of connecting our data and planning processes, either within the same platform or across tools linked through Integration Platforms.
Integration platforms allow us to perform transformations in data sets before plugging it into other sources, this is what is known as Extract, Transform and Load, or ETL. But there is another approach, less known, slightly different but equally useful: Extract, Load and Transform or ELT.
Extract, Load and Transform is where the data transformation part is moved to become the last step of the process, being transformed after being loaded into the Target System.
The benefit of following an ELT approach versus an ETL is that data loads occur much faster, since there is no previous transformation work. It goes straight from the source to the target.
The flip side is that you may have to rely on the computational power and data management capabilities of the Target system to perform any required data transformation. And this is where Anaplan kicks in: in Anaplan’s Data Hub (which is the specific “Data Mart” or Data repository that guarantees that all our Anaplan models share the same version of the truth) it is easy is to create and maintain data validation rules and data quality reports, leveraging Anaplan’s calculation speed and modelling flexibility.
Introducing the Case
Employees and SKUs are two typical examples of dimensions that change quite often over time and live in different data sources. When the data sources need to be consolidated, problems can arise.
Let’s look at a practical example of how to use ELT within Anaplan, to create a single version of the truth data set of a SKU portfolio, within a transparent, quick and repeatable way.
- There are two different data sources that we need to be merged to get one single list of SKUs.
- These data sources are providing a list of SKUs that will have unique names and codes.
- Missing Fields
- Duplicated Records
- Same SKU’s have different codes.
- Different SKUs with same codes.
- Incorrect aggregation of data due to duplicated or inaccurate data.
- Growing difficulty to maintain a single version of the truth.
- Deploy a data validation system that makes us aware of the data quality right after loading it in a friendly and intuitive report
- Avoid labour-intensive data health checks, automated and repeatable data cleaning.
- Speed up our processes by being able to amend the data on the spot and let source systems continue operating as usual. This allows faster implementation with minor disruption for the company’s systems and teams. (Implementing a platform like Anaplan for planning, doesn’t mean that you need to modify your naming and coding conventions within data sources to achieve a unique version of the truth).
- However, if changes are to be made in source, Anaplan, is able to generate a report of the faulty data and problematic fields, so data owners are able to solve it in the source very quickly and easily.
7-quick and easy steps to follow
Chances are that the data does not perfectly match, so we will review how to deploy a set of “validation rules” to easily spot and fix data records.
1. Load all the row data, just as it is, from each source into the Data Hub.The Data Hub is a data repository where we place all the relevant data that we will need. You can read more about single Data Hubs here: https://www.anaplan.com/blog/4-reasons-invest-single-data-hub/.
2. Once the data is loaded, we can deploy some logic to scan all the records and flag the following:
- missing Name or Code fields
- unique Name and unique Code fields
- unique combinations of Name & Code
2a. Blanks: specific validation rules can be created within Anaplan and we can validate which data records have blank fields:
This logic will go through all of the records and flag out those fields with missing values, within tenths of a second. In point 4, we will use Anaplan’s filter capabilities to count and display only the problematic records.
2b. Unique Names and Codes. A Similar logic as explained before, to spot blank cells in Name and Code fields, applies to finding the Unique values within fields. Anaplan can mark the first time that a given value appears within a data set, scanning from the top to the bottom of the data set:
By using Booleans, we make the model easy to interpret and maintain for business users, with minor or no IT support at all.
2c. Identify unique combinations of Name & Code. At this step, unique combinations of name and codes are identified.
By combining this unique combination of name and code with the result of the previous rules: “Unique Name” and “Unique Code”, we can identify the data records that contain unique SKU items.
The records which do not comply with all the three criteria, can be easily spotted as flawed records, as is the case with “Diatex Paint Red 2L RP1060” above, which is a SKU with the same name and different codes.
3. Based on these validation rules, it is possible to create an overview of the total records loaded and how many of them present issues based on the previous logic.
4. Based on these validation rules, it is possible to create an overview of the total records loaded and how many of them present issues based on the previous logic.
5. Alternatively, there is a screen available, where all the problematic records are listed and ready to be manually edited. In the example below, we can see records with the same name and different codes. Here we can assign a single code for all the occurrences with the same name.
6. When amending records manually in Anaplan, there is a set of formulas that analyse the new codes or names suggested, to ensure that they are not creating new conflicts with the existing data. When hitting the button “Apply Changes”, we can be sure that not only are all the existing records amended but no new issues are generated.
7. Once again, after ensuring the data is clean and unified, we can download the list with the end results to be shared with other stakeholders in the project. At this step, we have a sound foundation against which to carry out our planning and decision-making processes within the Anaplan models.
In seconds, we have created a process that relies on Anaplan’s capacity to analyse huge amounts of data. to report all those unexpected values which do not comply with our validation rules immediately after loading every new set.
It will provide easy-to-take actions, such as amend the data on the spot or create precise reports to be shared with data owners.
This all translates into faster processes, more transparent data and better decision making with no IT help.
Would you like to learn More about Olivehorse and Anaplan?
You might be wasting a lot of time carrying out manual data consolidation on spreadsheets or amending views and tables from your databases. Anaplan is a highly customisable tool, that allows you to create lots more solutions, other than the example we have used within this blog. As well as being compatible with all the major data integration platforms.
If you would like to know how to achieve a clean and reliable baseline data, ready for running your planning processes or what-if scenarios, please feel free to contact us.
At Olivehorse, we offer a free taster session where we will spend half a day, on site with your team, and we will demonstrate how to improve your business processes and improve your return on investment. Click here to book your Free Anaplan Taster Session.