How Does iData Perform the End-to-End Process of Obfuscating Production Data for Test Environments?

Problem

Given an organisation's large-scale dependency on production data for testing (due to its complexities and nuances), a proven and trusted method of obfuscation that does not compromise data integrity is vital. How does iData's obfuscation tool achieve this?

Solution

How iData Obfuscation Works

iData has a large selection of obfuscation functions built in. With the flexibility of generating seed keys, or referencing table identity keys as the seed, we provide a no-way-back approach to generating truly obfuscated or masked data in non-production systems.
The majority of these functions, and those for data generation, do not use any part of the original field to generate a new obfuscated value. Instead, the new value is generated by combining a seed value, such as the row's ID, with criteria such as a numeric range, a character template pattern or a regular expression. iData then uses the seed value to randomly generate a completely new value that matches the criteria. The original value of the field is not recoverable because it was not used in any way to generate the obfuscated value. The seed value is usually set from the primary key of the table so that the output for each row is consistent from one run to the next.
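The idea of seed-driven generation can be sketched in a few lines of Python. This is only an illustration of the principle described above, not iData's implementation: the function names, the sample rows and the choice of Python's `random` module are all assumptions made for the example. Note that the original field values are never passed to the generators.

```python
import random
import string

def seeded_number(seed, low, high):
    """Return a number in [low, high] that depends only on the seed."""
    rng = random.Random(seed)  # same seed -> same sequence on every run
    return rng.randint(low, high)

def seeded_text(seed, length, alphabet=string.ascii_uppercase):
    """Return a string of the given length that depends only on the seed."""
    rng = random.Random(seed)
    return "".join(rng.choice(alphabet) for _ in range(length))

# Hypothetical source rows; "id" plays the role of the primary key.
rows = [
    {"id": 101, "salary": 52000, "surname": "Hargreaves"},
    {"id": 102, "salary": 61000, "surname": "Oyelaran"},
]

def obfuscate(row):
    # Only the primary key feeds the generators; the original salary and
    # surname are ignored, so they cannot be derived from the output.
    return {
        "id": row["id"],
        "salary": seeded_number(f"salary:{row['id']}", 20000, 90000),
        "surname": seeded_text(f"surname:{row['id']}", 8),
    }

first_run = [obfuscate(r) for r in rows]
second_run = [obfuscate(r) for r in rows]  # identical to first_run
```

Prefixing the seed with the column name is a design choice in this sketch so that two columns in the same row do not draw the same random sequence.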
iData has specific functions for generating completely new personal details such as names, email addresses, phone numbers and addresses. These fields are populated by randomly selecting values from a pool of names and address components and combining them to give a unique result. Once again, the output values have no relationship to the original values, so there is no method of reversing the output to discover what the original values were.
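As a rough illustration of pool-based generation, the sketch below builds a synthetic person from small lists of name components. The pools, the field layout and the function name `synthetic_person` are hypothetical and far smaller than anything a real tool would use; this toy version also makes no attempt to guarantee uniqueness across rows.

```python
import random

# Tiny, made-up pools purely for illustration.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Priya", "Chen", "Maria"]
LAST_NAMES = ["Smith", "Patel", "Garcia", "Okafor", "Nguyen", "Brown"]
DOMAINS = ["example.com", "example.org", "example.net"]

def synthetic_person(seed):
    """Build a new identity from the pools; no original data is consulted."""
    rng = random.Random(seed)
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    email = f"{first}.{last}{rng.randint(1, 999)}@{rng.choice(DOMAINS)}".lower()
    phone = "07" + "".join(rng.choice("0123456789") for _ in range(9))
    return {"first_name": first, "last_name": last, "email": email, "phone": phone}

person = synthetic_person("customer:42")
```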

By design, some obfuscation functions do use information from the original field. Masking functions can leave part of the original value present in the output, and users can specify how much of the original field should remain accessible. The masked part of the field is replaced with a fixed character, so anything in that part of the field is unrecoverable. However, it is down to the configurator to set these approaches up and to choose what can remain visible and what needs to be masked.
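A masking function of this kind might look like the following sketch. The function name `mask`, its parameters and the sample values are assumptions for illustration only; they are not iData's function names or defaults.

```python
def mask(value, keep_start=0, keep_end=0, mask_char="X"):
    """Keep the chosen leading/trailing characters and mask everything else."""
    # Clamp the requested visible lengths so they never exceed the value.
    keep_start = max(0, min(keep_start, len(value)))
    keep_end = max(0, min(keep_end, len(value) - keep_start))
    masked_len = len(value) - keep_start - keep_end
    end_part = value[len(value) - keep_end:] if keep_end else ""
    # The masked section is a fixed character, so it carries no information.
    return value[:keep_start] + mask_char * masked_len + end_part

card = mask("4111111111111111", keep_end=4)   # last four digits stay visible
postcode = mask("SW1A 1AA", keep_start=4)     # outward code stays visible
```

Here the configurator's decision is expressed through `keep_start` and `keep_end`: whatever is left visible is genuinely the original data, so those values need choosing with care.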

Also by design, there are two obfuscation functions that use the original value only to create a template for generating new values. For example, “abcd-5678” is converted to the pattern “AAAA-9999”, and this pattern is then used to randomly select new characters, numerals or symbols. The only information that is revealed is the original format of the field.
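The template approach can be sketched as two steps: reduce the original value to a pattern, then fill the pattern with seeded random characters. The pattern alphabet ("A" for a letter, "9" for a digit, anything else kept as-is) follows the example above; the function names and the choice of lowercase replacement letters are assumptions.

```python
import random
import string

def to_pattern(value):
    """Reduce a value to its format: letters -> 'A', digits -> '9', rest kept."""
    return "".join(
        "A" if ch.isalpha() else "9" if ch.isdigit() else ch
        for ch in value
    )

def generate_from_pattern(pattern, seed):
    """Fill each 'A' with a random letter and each '9' with a random digit."""
    rng = random.Random(seed)
    return "".join(
        rng.choice(string.ascii_lowercase) if ch == "A"
        else rng.choice(string.digits) if ch == "9"
        else ch
        for ch in pattern
    )

pattern = to_pattern("abcd-5678")                 # "AAAA-9999"
new_value = generate_from_pattern(pattern, seed=101)
```

Only `pattern` is derived from the original; the characters in `new_value` come from the seeded generator, so the output exposes the field's shape and nothing else.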

Clarity on Hashing, Encryption and Tokenisation

With hash or tokenisation functions, the original field value is used and influences the outcome, so that a given input value produces a specific and unique output value.

When iData processes data through the library of obfuscation functions, it does not tokenise, hash or encrypt any of the values, but replaces the original value with a new value selected at random. For example, the input value from a field such as a password is thrown away and has no influence on the output value, so 'Password123' -> 'abcd' on one record and 'Password123' -> 'cdef' on another.
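The difference can be made concrete with a small comparison. In the sketch below, a SHA-256 hash stands in for "hash or tokenisation" and a row-seeded random string stands in for replacement; both functions and the sample records are illustrative assumptions rather than iData functions.

```python
import hashlib
import random
import string

def hashed(value):
    """Hash-style output: derived from the input, so equal inputs collide."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]

def replaced(row_id, length=12):
    """Replacement-style output: derived from the row seed only."""
    rng = random.Random(row_id)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

# Two hypothetical records that happen to share the same password.
records = [(1, "Password123"), (2, "Password123")]

hash_outputs = [hashed(value) for _row_id, value in records]
random_outputs = [replaced(row_id) for row_id, _value in records]

# hash_outputs holds the same string twice, revealing that the inputs matched.
# random_outputs is built without ever reading the password.
```

With the hash, an attacker who can guess candidate inputs can test them against the output; with replacement there is nothing to test against, because the input never reaches the generator.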

This is not to say that an iData user could not decide to build a custom transformation that does so, but using the documented features of iData will ensure the data is not exposed to potential reverse-engineering techniques.

Maintaining Referential Integrity

By selecting a consistent seed value from one table to the next (for example, a Client ID across both Client and Orders), we ensure that the randomiser uses the same seed and therefore produces the same values wherever they are duplicated across tables.
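A minimal sketch of this, assuming hypothetical Client and Orders tables that both carry a client name, is shown below. The table layout, name pools and function name are invented for the example; the point is only that seeding from the shared Client ID gives matching output in both tables.

```python
import random

FIRST_NAMES = ["Alex", "Sam", "Jordan", "Priya", "Chen", "Maria"]
LAST_NAMES = ["Smith", "Patel", "Garcia", "Okafor", "Nguyen", "Brown"]

def name_for_client(client_id):
    """Generate a replacement name from the Client ID alone."""
    rng = random.Random(f"client-name:{client_id}")
    return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"

# Hypothetical tables in which the client name is duplicated.
clients = [
    {"client_id": 7, "name": "Real Name One"},
    {"client_id": 8, "name": "Real Name Two"},
]
orders = [
    {"order_id": 1, "client_id": 7, "client_name": "Real Name One"},
    {"order_id": 2, "client_id": 8, "client_name": "Real Name Two"},
    {"order_id": 3, "client_id": 7, "client_name": "Real Name One"},
]

# Each table is processed independently, but both seed from client_id.
masked_clients = [{**c, "name": name_for_client(c["client_id"])} for c in clients]
masked_orders = [{**o, "client_name": name_for_client(o["client_id"])} for o in orders]
```

Because neither table needs to look up the other, the two can be obfuscated in any order, or in separate runs, and the duplicated values will still agree.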

Security Compliance

The anonymisation techniques provided by iData can provide privacy guarantees and may be used to build efficient anonymisation processes in line with GDPR best practice, but only if their application is engineered appropriately. This means that the prerequisites (context) and the objective(s) of the anonymisation process must be clearly set out in order to achieve the targeted anonymisation while still producing useful data. The optimal solution should be decided on a case-by-case basis, possibly by using a combination of different techniques.

Even when sensitive fields are completely removed, there is still a risk that records could be identified by linking the data to other sources. iData can generate additional data to augment the data sets and overcome this potential risk.

iData will provide this level of assurance if the iData user applies the transformation and governance rules correctly.