We start this topic with a Project created, a Data Source and Data Entities have been defined.
We have created and run a Profile against all entities in the AcmeData database.
We have created a Profile stage and run this for the AcmeData Entities.
In this short article we will be:
Looking at transforming data using information provided by the profiling stage.
In previous videos we have covered creating a project, and connecting a data source, defining data entities, creating and executed a profiling stage, and we have performed some data validation.
In this article we are going to be looking at cleansing data using basic replacement on conditions, and then we shall update records using rules, based on the Profiling stage output.
In today’s article we are going to create a new stage, update the transforms within this stage. If you have not yet viewed the previous topics, we recommend that you do so before proceeding.
From the welcome screen, click on Login,
We enter our iData credentials and press Login.
From the Projects page, select the project.
Select the Data Stages for the project.
We’ll create a New Stage, select the Validation and Clean stage, and Click on Create Stage.
Give the stage a name, and selecting Auto Map. We can now save the stage.
Selecting our new stage, we then want to open the transforms.
We shall start off by making a simple modification to the Clients transform age column.
Click on the Client transform to edit it.
Let’s assume that we want to rectify underage client records by setting the age to 18, if it is less than 18. Obviously, we are unlikely to want to do this with production data, but it will demonstrate the principle of how simple it is to create a rule and repair some data.
Take our simple age rule on line 24, uncomment it and modify it to change the data. All we need to do is add Make and setInt, to a value if its less than 18 like this.
Save this and select the main iData browser tab.
To run the stage,
Navigate to the stage, click on Run Stage and Run.
Once again select the View Report when it is enabled, and select the Client report on the Reports tab when it has completed.
The report summary shows us that we have transformed a number of Ages
Selecting the Details tab we can see the changed records and the changes made.
This also provides traceability of what the old and new values were.
We can also add annotations to track any changes to the data.
Selecting the Client transformation browser tab
If we add a line before the Age transformation rule on line 24. We can a TrackRowUpdate statement and mark them as an age change, any following rules that makes changes to the data will mark those records with this annotation.
Like this:
Select the main iData browser tab. Navigate to the stage in the project breadcrumb ribbon, and click on Run Stage and Run.
Once again select the View Report when it is enabled, and select the Client report on the Reports tab when it has completed. Click on the Details tab to view the results.
If we scroll over to the right, and hover our mouse over a record, we can see the comment has been added to the rows changed.
We can also use information directly from the profile to help us clean the data. If we go into the profiler and find our addresses.
Navigating back to Stages on the Projects Ribbon, Selecting our Profile Stage, Selecting Stage Reports, and Clicking on the latest Clean report.
If we open the Addresses report, and navigate to the AddressLine columns UniqueSuffixes row.
Clicking on the link to view the Addressline Suffixes:
We have these entries with the unique suffixes on the end of the road. We can export this to our clean stage and use it in our rules.
To do this we click Export,
Select a stage and chose our Clean stage. Then we press Export to Stage.
We should then see the message ‘The lookup has been successfully exported.’
This will have added this table to the Lookups within our Clean stage.
To view this, navigate back to our Stages from the Project Ribbon, select our Clean stage. Now we select Scripts, in here we have a Lookups folder.
We open this and we can see the table has been imported as a CSV.
Click on the UniqueSuffixes_AddressLine_Lookup link to edit it.
This will open up our editor in a new tab, as before.
This lookup table has the existing suffix, followed by a comma, the expected replacement value, followed by a comma, and finally the count from the profile, which we can ignore.
Not that there is an Asterix (*) to denote a wildcard, which we will need to retain for the suffixes.
We shall update these to standardise our street, road and avenue suffixes like this:
We can now press save and close.
To apply this replacement transform, we need to open the Addresses Transform.
Selecting the iData browser tab, Clicking on Clean stage from the Project Ribbon, selecting Transforms, and clicking on addresses
We can now edit our Addresses Transforms.
To apply the lookup for AddressLine inset a line creating a logical link to the new lookup before line 24, and update line 25 like this:
Press Save.
Now we can rerun our stage and look at the results.
Select the main iData browser tab. Navigate to the stage in the project breadcrumb ribbon and click on Run Stage and Run.
Once again select the View Report when it is enabled and select the Addresses report on the Reports tab when it has completed. Click on the Details tab to view the results.
We can now see the roads where the abbreviated suffixes have been replaced with our defined full values for them.