Introduction to Obfuscation with iData


Prerequisites

We start this topic with a Project created and a Data Source and Data Entities already defined.


Introduction

In this short article we will be:

Creating a new Validate and Clean stage

Editing transformation rules to:

  1. Replace data with random characters
  2. Replace values with random values within a range
  3. Replace values with random words
  4. Replace email addresses with simulated emails
  5. Replace email addresses using obfuscated name fields
  6. Replace values using patterns
  7. Partially mask fields
  8. Maintain data integrity in field values across tables


We recommend that you cover the previous topics before reading this article.

Steps

Today we are going to look at obfuscating and masking data.


In previous articles we covered topics such as Project Creation, Data Sources and Data Entities. We recommend that you read these first, as this article uses the functionality demonstrated in them.


In this article we will use a project and data source that we created in a previous topic. To start, we need to log in and select the project.


Logging In

From the welcome screen, press the Login button to go to the Login screen.

We enter our iData credentials and press Login.


Open the AcmeData project.

Create a New Stage

We are going to create a new data processing stage.

Click on Data Processing Stages and then New Stage. 


Data masking and obfuscation use the Validate and Clean stage type. Select the Validate and Clean Stage tile.

Then press ‘Setup new processing stage’.

Give the stage a name. In this example we call it Obfuscate.

We will be working with Orders and Clients, so drag these entities into the mapping.



If we had different source and destination tables, we could map their fields by clicking on the ‘Column Mapping’ button. 



Click on Save.


We can now see our new stage listed. 



Simple Masking of the Data

Against our new Obfuscate stage, click Actions and select Edit Transforms.

We will now add some obfuscation transformations to the Client table.


Click on dbo_Client.js to open the Editor.



iData has a large range of transformations that make obfuscation easy. For example, we can mask a field by giving it a mask character and telling it where masking should start. Let's look at this on line 31:

    stream.Make($PAN).Mask('*',4);

This will mask everything from the 5th character onwards.

If we use a negative value, it masks from the end of the field instead.
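To make the position argument concrete, here is a minimal JavaScript sketch of positional masking. The function `maskFrom` is hypothetical and only mimics the behaviour described above; in particular, treating a negative start as counted back from the end of the value is our assumption, not documented iData behaviour.

```javascript
// Hypothetical helper illustrating positional masking; NOT iData's Mask().
// start >= 0: mask from that zero-based index onwards (4 => from the 5th character).
// start < 0: assumed to count back from the end of the value.
function maskFrom(value, maskChar, start) {
  const s = String(value);
  const from = start < 0 ? Math.max(s.length + start, 0) : Math.min(start, s.length);
  return s.slice(0, from) + maskChar.repeat(s.length - from);
}

console.log(maskFrom("4111111111111111", "*", 4));  // "4111************"
console.log(maskFrom("4111111111111111", "*", -4)); // "411111111111****"
```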


We can also use MaskExcept. In this example, we mask all but the last four digits of the account number:

    stream.Make($PAN).MaskExcept('*',-4);

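The same idea applies to MaskExcept. The hypothetical `maskExcept` helper below assumes that a negative argument keeps that many characters at the end of the value, which matches the "all but the last four digits" example; it is a sketch, not iData's implementation.

```javascript
// Hypothetical helper: mask every character except a kept section.
// keep < 0: keep the last |keep| characters (as in MaskExcept('*', -4)).
// keep >= 0: keep the first `keep` characters. Assumed semantics, not iData's code.
function maskExcept(value, maskChar, keep) {
  const s = String(value);
  const n = Math.min(Math.abs(keep), s.length);
  if (keep < 0) {
    return maskChar.repeat(s.length - n) + s.slice(s.length - n);
  }
  return s.slice(0, n) + maskChar.repeat(s.length - n);
}

console.log(maskExcept("4111111111111234", "*", -4)); // "************1234"
```

Values shorter than the kept section are returned unchanged, since there is nothing left to mask.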

We can also apply Obfuscate to the Comments field, which hides its values:

    stream.Make($Comments).Obfuscate();

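The exact algorithm behind Obfuscate is not described here. As an illustration of the "replace data with random characters" idea, the hypothetical `obfuscateText` function below swaps each letter or digit for a random one of the same kind and leaves spaces and punctuation in place, so the text keeps its shape but loses its content.

```javascript
// Hypothetical sketch of character-level obfuscation; an assumption about the
// idea, not iData's Obfuscate implementation.
const UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const LOWER = "abcdefghijklmnopqrstuvwxyz";
const DIGITS = "0123456789";

// Pick one random character from a string of candidates.
function randomFrom(chars) {
  return chars[Math.floor(Math.random() * chars.length)];
}

// Replace letters and digits with random characters of the same class.
function obfuscateText(value) {
  return Array.from(String(value), (c) => {
    if (UPPER.includes(c)) return randomFrom(UPPER);
    if (LOWER.includes(c)) return randomFrom(LOWER);
    if (DIGITS.includes(c)) return randomFrom(DIGITS);
    return c; // keep whitespace and punctuation unchanged
  }).join("");
}

console.log(obfuscateText("Called on 12 May, prefers email."));
```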


We could also replace a value such as Age with a random number in a valid range, here between 18 and 99:
    stream.Make($Age).RndInt(18,99);
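A random value in a valid range is simply a uniformly drawn integer between two bounds. The `randomIntInclusive` helper below is a hypothetical sketch; whether iData's RndInt includes the upper bound (99 here) is an assumption.

```javascript
// Hypothetical helper: uniformly distributed random integer in [min, max],
// both ends included. Not iData's RndInt; inclusivity is an assumption.
function randomIntInclusive(min, max) {
  const lo = Math.ceil(min);
  const hi = Math.floor(max);
  return lo + Math.floor(Math.random() * (hi - lo + 1));
}

// Five random ages, each between 18 and 99.
const ages = Array.from({ length: 5 }, () => randomIntInclusive(18, 99));
console.log(ages);
```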


When we run this we get a new copy of the data with those values obfuscated.

To do this, first press Save.

Now select the main iData browser tab, select the Stage from the Project ribbon, and click on Run Stage, then Run.

When the View Reports button is enabled, select View Reports and choose Client from the Reports tab, and then select the Details tab.

The summary shows changes to Age, Comments and PAN.

In the details we can see the new values: PAN has been changed, Age has been updated, and the content of Comments is different.

More Complex Masking and Built-in Tags

To demonstrate more obfuscation functions, we’re going to make some more changes to the Client transformation.

Click on the browser tab for the Client Transformation editor. 

Update the transform as follows:

    stream.Make($AddressID).RndIntRange(1,100);
    stream.Make($Age).RndIntRangeNormalized(18,100,45,10);
    stream.Make($Comments).RndWords(50);
    stream.Make($CompanyID).RndIntRange(1,100);
    stream.Make($Email).TagAsEmail();
    stream.Make($Firstname).TagAsPersonFirstName();
    stream.Make($JoinedDate).RndDateRange('1/1/2010','1/1/2020').ConvertToText("d","en-UK");
    stream.Make($PAN).RndPatterns("99999999999");
    stream.Make($Phone).TagAsPhoneNumber();
    stream.Make($Surname).TagAsPersonSurname();
    stream.MakeRndPersonCols();
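Two of the generators above are worth unpacking. Assuming that RndIntRangeNormalized takes (min, max, mean, standard deviation) and that "9" in an RndPatterns pattern stands for any digit, the following hypothetical sketch shows how such values could be produced: ages clustered around 45 but kept between 18 and 100, and an 11-digit number in place of the PAN. These are illustrations under stated assumptions, not iData's implementation.

```javascript
// Hypothetical sketches; parameter meanings are assumptions, not iData docs.

// One sample from a normal distribution via the Box-Muller transform.
function randomNormal(mean, stdDev) {
  const u1 = 1 - Math.random(); // in (0, 1], avoids log(0)
  const u2 = Math.random();
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  return mean + stdDev * z;
}

// Read as (min, max, mean, stdDev): redraw until the rounded value is in range.
function randomIntNormalized(min, max, mean, stdDev) {
  for (;;) {
    const v = Math.round(randomNormal(mean, stdDev));
    if (v >= min && v <= max) return v;
  }
}

// Each '9' becomes a random digit; any other character is copied unchanged.
function randomFromPattern(pattern) {
  return Array.from(pattern, (c) =>
    c === "9" ? String(Math.floor(Math.random() * 10)) : c
  ).join("");
}

console.log(randomIntNormalized(18, 100, 45, 10)); // most values fall near 45
console.log(randomFromPattern("99999999999"));      // an 11-digit string
```

Redrawing out-of-range samples, rather than clamping them, avoids piling up artificial values at exactly 18 or 100.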


As discussed in other articles, iData has knowledge of certain data types, and the transformations can handle these for us using tagging.
We can tag fields as types of data that iData has built-in methods for transforming. Here we have tagged Email so iData knows it holds email addresses, Firstname as a first name, Surname as a surname and Phone as a phone number.

The additional command at the end, MakeRndPersonCols(), takes those tagged fields and generates random but interrelated personal details, so the email address should use components of the generated first name and surname.
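The key point of MakeRndPersonCols() is that the generated columns are related rather than independent. The sketch below, with hypothetical name lists, example.com-style domains and a `makeRandomPerson` helper, shows the idea: the names are generated first and the email address is then derived from them, in a First.Surname@domain shape.

```javascript
// Hypothetical sketch of interrelated person columns; lists and helper names
// are made up for illustration and are not iData's reference data.
const FIRST_NAMES = ["Sophia", "Oliver", "Amelia", "Jack"];
const SURNAMES = ["Amin", "Walker", "Patel", "Jones"];
const DOMAINS = ["example.com", "example.org", "example.net"];

// Pick one random element from a list.
function pick(list) {
  return list[Math.floor(Math.random() * list.length)];
}

// Generate the names first, then build the email from them.
function makeRandomPerson() {
  const firstName = pick(FIRST_NAMES);
  const surname = pick(SURNAMES);
  const email = `${firstName}.${surname}@${pick(DOMAINS)}`;
  return { firstName, surname, email };
}

console.log(makeRandomPerson());
```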

Press Save to save the transform, then run the stage again.

Rerun the Report

Select the stage from the Project ribbon, and click Run Stage, then Run and View Report.

Alternatively, as before, select the iData main browser tab, select Stages, click Actions against the stage and select Run.

Click Run again.

Once again, the initiator will start the worker processes against each entity. Once it has started, we can click the View Report button.

The status of each entity transformation will change to a tick. When Client has completed, click the Reports tab and select dbo_Client.

We can see that a synthetic first name and surname have been generated for each client, along with an email address that uses those values.

Maintaining Data Integrity Across Entities

In the previous example we generated an email address in our Client table. The Orders table also contains an email address, and if we obfuscate it we want the result to match the corresponding entries in the Client table.

Select the Obfuscate stage from the Project ribbon and click on Transforms. Select the dbo_Orders.js transform to edit it.



Update line 25 to tag the email address, and add the command that generates the tagged data at the end of the transform:

    stream.Make($Email).TagAsPersonEmail();
    stream.MakeRndPersonCols();

Save the transform and run the stage again from the iData main browser tab as before.


In the Orders table, let's pick out client ID 345297, which has been obfuscated to Holy Walker.

If we then open the Client table, sort by ID, go to page 20 and find client ID 345297, we can see that the email address is different.

This is not what we want. It happens because iData uses the table's primary key as the seed for its obfuscation. In the Orders table the primary key is OrderID, whereas the Client table's primary key is ClientID, so the two tables produce different results.
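Seeding from a key column explains both the mismatch and the fix. In the hypothetical sketch below, a key value is hashed (FNV-1a) to seed a small deterministic generator (mulberry32), which then picks a synthetic name and email. The same key always gives the same person, so an order seeded by OrderID and a client seeded by ClientID can disagree, while seeding both from ClientID makes them agree. The hashing and generator are our illustrative choices, not iData's internals, and the row values are made up.

```javascript
// Hypothetical sketch of key-seeded synthetic generation; not iData's internals.
const GIVEN_NAMES = ["Sophia", "Oliver", "Amelia", "Jack", "Priya", "Noah"];
const FAMILY_NAMES = ["Amin", "Walker", "Patel", "Jones", "Smith", "Chen"];

// 32-bit FNV-1a hash of a string.
function fnv1a(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// mulberry32: small deterministic PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same seed value in, same synthetic email out.
function personFromSeed(seedValue) {
  const rnd = mulberry32(fnv1a(String(seedValue)));
  const first = GIVEN_NAMES[Math.floor(rnd() * GIVEN_NAMES.length)];
  const last = FAMILY_NAMES[Math.floor(rnd() * FAMILY_NAMES.length)];
  return `${first}.${last}@example.com`;
}

// Hypothetical rows: a client and one of its orders.
const client = { ClientID: 345297 };
const order = { OrderID: 1001, ClientID: 345297 };

// Seeded from each table's own primary key: the results can differ.
console.log(personFromSeed(client.ClientID), personFromSeed(order.OrderID));
// Seeded from ClientID in both tables: the results always match.
console.log(personFromSeed(client.ClientID) === personFromSeed(order.ClientID)); // true
```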

Select the dbo_Orders.js transform editor browser tab.

If we make a small adjustment to our transform we can bring things into line.

    stream.SetGenerationSeedCols('ClientID');
We insert this line at line 23, so that values are generated from the ClientID column rather than from the table's primary key.

Save the transform and rerun the stage from the iData main tab once more.


When we open the Orders detail report, and then open the Client detail report and sort to find the client with ID 345297, we can see that both the client and the order now use Sophia.Amin@aol.com.

The only reason these entries match is that both are now generated from the same ClientID value. None of the original data was used to generate these values; they are entirely synthetic.
