Learning Objectives
By the end of this lesson you will be able to:
- Describe the process of translating mapping documents into ETL specifications
- Explain how DQR resolutions are reflected in ETL design
- Identify the key considerations for error handling and fallout management in migration ETL
From Mapping to ETL Specification
The mapping template (produced in GAM) is the input to the ETL design. Each mapping entry translates to one or more ETL components:
| Mapping Rule Type | ETL Component |
|---|---|
| Extraction rule | Source query / view definition |
| Exclusion rule | WHERE clause / filter logic |
| Transformation rule | CASE statements / look-up joins / parsing logic |
| Loading rule | Target insert/update statement and sequence |
| Data lineage rule | Audit record insert |
The translation is not always one-to-one. A single transformation rule may require multiple ETL steps. A complex exclusion may depend on the output of a transformation. The ETL design documents these dependencies explicitly.
The ETL Content Matrix
The ETL Content Matrix is the design specification that bridges mapping and build. It documents, for each entity in scope:
- Source query (or reference to it)
- Transformation logic (or reference to the mapping document version)
- Target load specification
- Error handling approach
- Validation checkpoints
- Data lineage record structure
The Content Matrix is under version control and versioned with each release. When a DQR resolution changes a transformation, the Content Matrix is updated and the affected release is incremented.
Incorporating DQR Resolutions
DQR resolutions feed into ETL design in several ways:
In-flight transformations: A resolved DQR adds a transformation step to the ETL. For example, DQR0012 (Equipment Type codes) produces a cross-reference table that is joined in the transformation to standardise the codes.
Exclusion rules: A DQR that concludes “these records cannot be migrated in this release” adds an exclusion rule to the extraction query.
Validation rules: A DQR that defines an acceptable quality threshold (“at least 90% of maintenance records must have a valid Equipment Type before migration”) adds a validation checkpoint to the ETL.
Default values: A DQR that agrees a default value for missing mandatory fields adds a COALESCE or ISNULL logic to the transformation.
Each DQR resolution is traced to the specific ETL component it produced. This traceability is essential for rollback analysis: if a release is faulty, the team needs to know which DQR resolutions are included in it.
Error Handling and Fallout
Migration ETL must have explicit error handling at every stage. The key design decisions:
On extraction errors: A record that cannot be extracted (locked, inaccessible, corrupt) should be logged and flagged as a DQR item, not silently omitted.
On transformation errors: A record that fails a transformation (e.g. cannot be parsed) should fall out to a fallout table with the error code and record identifier. The fallout table is reviewed in the next DQR Board meeting.
On load errors: A record that fails target validation should be logged with the constraint violation. Load errors may indicate a mapping deficiency (DQR item) or a target system configuration issue.
Fallout management: All fallout records are tracked in the DQR process. The fallout count for each entity is reported as a migration readiness metric. A release is not accepted if the fallout count exceeds the agreed threshold.
Testing ETL Designs
ETL designs should be tested before build using:
Desk review: The mapping analyst, the ETL developer, and a business subject matter expert review the Content Matrix together. Business rules that are correct in the mapping may be misinterpreted in the ETL logic.
Sample data test: A small extract of real legacy data is run through the transformation logic. Results are reviewed by the BDE for the relevant LDS. This catches category errors (the transformation produces plausible but wrong results) that pure technical testing misses.
Volume test: Once the ETL is built, a full extract is run to identify performance issues before the trial run.
Key Takeaways
- The ETL Content Matrix bridges mapping documents and the ETL build
- DQR resolutions feed into the ETL as transformations, exclusions, validations, and defaults
- Error handling must be explicit: extraction errors, transformation errors, and load errors each have distinct handling
- ETL designs should be reviewed with business users before build, not just by technical reviewers
Book Reference
Practical Data Migration by Johny Morris (BCS, The Chartered Institute for IT):
- Chapter 12 - Migration Design and Execution