Learning Objectives
By the end of this lesson you will be able to:
- Explain what Gap Analysis & Mapping (GAM) produces and how it fits within the Practical Data Migration (PDM) process
- Identify the five types of mapping rule in PDM
- Describe the One Way Street problem and its implications
What Is GAM?
Gap Analysis & Mapping (GAM) is the process of mapping legacy data structures to target data structures, identifying the gaps, and producing the extraction, transformation, and load rules that the ETL will execute.
GAM sits between Landscape Analysis (which documents what exists) and Migration Design & Execution (which builds the ETL). It is the analytical bridge from “here is the data we have” to “here is how we move it.”
Inputs to GAM
GAM cannot begin until the following are available:
- Legacy Data Store (LDS) list - the output of Landscape Analysis (LA)
- Legacy data models - the structures of each LDS
- Migration model - the consolidated view of legacy entities
- Target model - the structure of the target system
- Master Data Management decisions - how shared entities are resolved
- System Retirement Plans - the data owner requirements that constrain the mapping
- Key Data Stakeholders - available to answer business questions about the data
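The entry criterion above ("GAM cannot begin until...") amounts to a readiness gate. The sketch below is a minimal illustration; the input identifiers are hypothetical labels that mirror the list, not names defined by PDM.

```python
# Hypothetical identifiers mirroring the GAM input list; not PDM-defined names.
REQUIRED_GAM_INPUTS = (
    "legacy_data_store_list",
    "legacy_data_models",
    "migration_model",
    "target_model",
    "mdm_decisions",
    "system_retirement_plans",
    "key_data_stakeholders",
)


def missing_gam_inputs(available: set[str]) -> list[str]:
    """Return the required inputs not yet available, in list order."""
    return [name for name in REQUIRED_GAM_INPUTS if name not in available]


def gam_can_start(available: set[str]) -> bool:
    """GAM may begin only when no required input is missing."""
    return not missing_gam_inputs(available)


if __name__ == "__main__":
    ready = {"legacy_data_store_list", "legacy_data_models", "target_model"}
    print(gam_can_start(ready))  # False
    print(missing_gam_inputs(ready))
```

Returning the missing items, rather than only a yes/no answer, gives the team a concrete list to chase before GAM starts.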
Types of Gap
GAM identifies the following types of gap:
Data Model Gaps
- Internal inconsistencies - the same entity represented differently in different LDS (e.g. Customer ID format differs between the CRM and the Finance system)
- Legacy model inconsistencies - the legacy model does not reflect business reality (e.g. a field documented as “optional” is actually mandatory in practice)
- Target model inconsistencies - the target model cannot represent data that the legacy actually contains; these are discovered when the target model is compared with the real legacy data
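Each kind of data model gap can be surfaced with simple checks against sample extracts. This is a minimal sketch under stated assumptions: the sample IDs, the "shape" comparison, and the target field length are all hypothetical, chosen only to mirror the examples above.

```python
import re

# Hypothetical extracts; the ID formats are invented for illustration.
CRM_CUSTOMER_IDS = ["C-000123", "C-000456", "C-000789"]
FINANCE_CUSTOMER_IDS = ["123", "456", "0789"]


def id_shapes(ids: list[str]) -> set[str]:
    """Reduce each value to its shape: digits become 9, letters become A."""
    shapes = set()
    for value in ids:
        shape = re.sub(r"[0-9]", "9", value)
        shapes.add(re.sub(r"[A-Za-z]", "A", shape))
    return shapes


def internal_inconsistency(ids_a: list[str], ids_b: list[str]) -> bool:
    """Internal gap: two stores share no ID format for the same entity."""
    return id_shapes(ids_a).isdisjoint(id_shapes(ids_b))


def optional_but_always_populated(rows: list[dict], documented_optional: list[str]) -> list[str]:
    """Legacy model gap: fields documented as optional yet populated in every row."""
    return [
        field
        for field in documented_optional
        if rows and all(row.get(field) not in (None, "") for row in rows)
    ]


def too_long_for_target(values: list[str], target_max_length: int) -> list[str]:
    """Target model gap: legacy values that exceed a (hypothetical) target field length."""
    return [value for value in values if len(value) > target_max_length]


if __name__ == "__main__":
    print(internal_inconsistency(CRM_CUSTOMER_IDS, FINANCE_CUSTOMER_IDS))  # True
```

Checks like these only flag candidates; confirming whether a flagged item is a real gap still needs the Key Data Stakeholders.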
Topographical Gaps
- Gaps that become visible only when actual data volumes and distributions are examined - for example, fields that are populated in theory but empty in practice, and relationships that are enforced in theory but broken in practice
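Topographical gaps come from profiling the data itself rather than reading its documentation. The sketch below measures how often a field is actually populated and finds child records whose relationship is broken; the sample tables and the 5% "practically empty" threshold are assumptions for illustration.

```python
# Hypothetical sample data: orders refer to customers through customer_id.
CUSTOMERS = [
    {"customer_id": 1, "fax": None},
    {"customer_id": 2, "fax": ""},
    {"customer_id": 3, "fax": None},
]
ORDERS = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 4},  # no customer 4 exists
    {"order_id": 12, "customer_id": 2},
]


def population_rate(rows: list[dict], field: str) -> float:
    """Share of rows in which the field holds a non-empty value."""
    if not rows:
        return 0.0
    populated = sum(1 for row in rows if row.get(field) not in (None, ""))
    return populated / len(rows)


def practically_empty(rows: list[dict], fields: list[str], threshold: float = 0.05) -> list[str]:
    """Fields populated in fewer than `threshold` of rows (threshold is an assumption)."""
    return [field for field in fields if population_rate(rows, field) < threshold]


def orphans(children: list[dict], parents: list[dict], key: str) -> list[dict]:
    """Child rows whose foreign key matches no parent row: a broken relationship."""
    parent_keys = {parent[key] for parent in parents}
    return [child for child in children if child.get(key) not in parent_keys]


if __name__ == "__main__":
    print(practically_empty(CUSTOMERS, ["customer_id", "fax"]))  # ['fax']
    print(orphans(ORDERS, CUSTOMERS, "customer_id"))
```

On real volumes this would typically run as SQL against the legacy store; plain Python lists keep the idea visible.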
The Five Mapping Rule Types
PDM defines five types of mapping rule:
| Rule Type | Description |
|---|---|
| Extraction | From where: navigation to the source data, selection criteria |
| Exclusion | What is not migrated: scope boundaries, policy-driven exclusions, DQR fallout |
| Transformation | How: look-ups, external data sources, parsing, combining, data type conversions |
| Loading | To where: destination, sequence, loading method |
| Data Lineage | Audit trail: the rule that records where each piece of migrated data came from in the legacy data stores |
All five types are documented in the mapping template. Missing any one of them creates gaps in the ETL specification that will surface as errors during build and test.
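One way to make "missing any one of them" detectable early is to give each mapping template row one slot per rule type and check for blanks. This is a minimal sketch; the `MappingRule` structure and the sample rule text are hypothetical, not the PDM mapping template itself.

```python
from dataclasses import dataclass

RULE_TYPES = ("extraction", "exclusion", "transformation", "loading", "data_lineage")


@dataclass
class MappingRule:
    """One hypothetical mapping template row: one attribute per PDM rule type."""
    target_field: str
    extraction: str = ""
    exclusion: str = ""  # record "none" explicitly rather than leaving it blank
    transformation: str = ""
    loading: str = ""
    data_lineage: str = ""


def missing_rule_types(rule: MappingRule) -> list[str]:
    """Rule types left blank (or whitespace only) for this target field."""
    return [name for name in RULE_TYPES if not getattr(rule, name).strip()]


if __name__ == "__main__":
    rule = MappingRule(
        target_field="customer.last_name",
        extraction="CRM customer table, NAME column, active customers only",
        transformation="split NAME on whitespace; keep everything after the first token",
        loading="load after the customer header record",
    )
    print(missing_rule_types(rule))  # ['exclusion', 'data_lineage']
```

Treating a blank exclusion as a gap, even when nothing is excluded, forces the decision to be written down rather than assumed.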
Types of Fix
When gaps are identified, the mapping must specify how they are resolved:
- Manual audit (business user reviews and corrects)
- Manual fix (in the legacy system before migration)
- Automated fix (in-flight during extraction, transformation, or load)
- Third-party data enrichment
- Exclusion (the record falls out and is handled by Transitional Business Processes)
- Post-migration fix (accepted as a target-system data quality issue)
- Discovery of a missing LDS (the gap is explained by data being held elsewhere)
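Because every gap must end up with a resolution, a gap log can record exactly one fix type per gap and report any still undecided. This is a minimal sketch; the enum values mirror the list above, while the `Gap` structure and the sample entries are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FixType(Enum):
    """The fix types listed above."""
    MANUAL_AUDIT = "manual audit"
    MANUAL_FIX_IN_LEGACY = "manual fix in legacy"
    AUTOMATED_FIX = "automated fix in-flight"
    THIRD_PARTY_ENRICHMENT = "third-party data enrichment"
    EXCLUSION = "exclusion"
    POST_MIGRATION_FIX = "post-migration fix"
    MISSING_LDS_DISCOVERED = "discovery of a missing LDS"


@dataclass
class Gap:
    """Hypothetical gap log entry; `fix` stays None until a resolution is agreed."""
    gap_id: str
    description: str
    fix: Optional[FixType] = None


GAP_LOG = [
    Gap("G1", "Customer ID format differs between CRM and Finance", FixType.AUTOMATED_FIX),
    Gap("G2", "Orders reference customers that do not exist", FixType.EXCLUSION),
    Gap("G3", "Field documented as optional is mandatory in practice"),
]


def unresolved_gaps(gaps: list[Gap]) -> list[str]:
    """IDs of gaps that have no agreed fix yet."""
    return [gap.gap_id for gap in gaps if gap.fix is None]


if __name__ == "__main__":
    print(unresolved_gaps(GAP_LOG))  # ['G3']
```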
The One Way Street Problem
The One Way Street problem is one of the most important concepts in GAM.
A One Way Street is a series of operations on legacy data that makes it impossible to get back to the original legacy data item from the data stored in the Target.
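Example: a legacy system stores customer names as a single “First Last” value, the target system stores separate first name and last name fields, and the ETL parses one into the other. After migration, the original value is no longer held anywhere. Concatenating the two target fields only approximates it, because details such as irregular spacing or punctuation may not survive parsing. If fallback becomes necessary, the exact original data cannot be reconstructed from the target.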
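A practical test for a One Way Street is a round trip: apply the transformation, apply its best inverse, and compare with the original. The sketch below assumes a hypothetical parsing rule (split on whitespace, first token is the first name) and a hypothetical rebuild rule (join with one space); neither is prescribed by PDM.

```python
def parse_name(full_name: str) -> tuple[str, str]:
    """Hypothetical ETL rule: first whitespace-separated token is the first name."""
    parts = full_name.split()
    if not parts:
        return "", ""
    return parts[0], " ".join(parts[1:])


def rebuild_name(first: str, last: str) -> str:
    """Best-effort inverse: join the target fields with a single space."""
    return f"{first} {last}".strip()


def is_one_way(legacy_value: str) -> bool:
    """True if the original value cannot be recovered from the parsed fields."""
    return rebuild_name(*parse_name(legacy_value)) != legacy_value


def one_way_values(values: list[str]) -> list[str]:
    """Legacy values that this transformation would make unrecoverable."""
    return [value for value in values if is_one_way(value)]


if __name__ == "__main__":
    sample = ["Ada Lovelace", "Ada  Lovelace", " Grace Hopper"]
    print(one_way_values(sample))  # ['Ada  Lovelace', ' Grace Hopper']
```

Running a check like this over a full legacy extract during GAM shows how many records a transformation would strand.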
The One Way Street problem impacts two PDM functions:
- Fallback - if fallback requires recreating the exact original legacy data from what is held in the target, a One Way Street makes this impossible
- Audit/Data Lineage - if regulators require traceability from target data back to source data, a One Way Street breaks that chain
Identifying One Way Streets during GAM - before the ETL is built - allows the team to either avoid them (by keeping the original data elsewhere) or accept them with full awareness of the consequences.
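Avoiding a One Way Street "by keeping the original data elsewhere" can be as simple as writing a lineage record alongside each transformed value. This is a minimal sketch; `LineageRecord`, `TargetCustomer`, and reusing the source key as the target key are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical lineage entry: where a target value came from in the legacy."""
    source_store: str
    source_key: str
    source_field: str
    original_value: str


@dataclass
class TargetCustomer:
    customer_id: str  # simplification: reuses the legacy key as the target key
    first_name: str
    last_name: str


def migrate_customer(source_store: str, source_key: str, full_name: str,
                     lineage_log: list[LineageRecord]) -> TargetCustomer:
    """Parse the name for the target, but keep the untouched original in the log."""
    parts = full_name.split()
    first = parts[0] if parts else ""
    last = " ".join(parts[1:])
    lineage_log.append(LineageRecord(source_store, source_key, "NAME", full_name))
    return TargetCustomer(source_key, first, last)


def original_value(lineage_log: list[LineageRecord], source_store: str,
                   source_key: str, source_field: str) -> Optional[str]:
    """Recover the exact legacy value for fallback or audit, if it was recorded."""
    for record in lineage_log:
        if (record.source_store, record.source_key, record.source_field) == (
            source_store, source_key, source_field
        ):
            return record.original_value
    return None


if __name__ == "__main__":
    log: list[LineageRecord] = []
    migrate_customer("CRM", "C-000123", "Ada  Lovelace", log)
    print(repr(original_value(log, "CRM", "C-000123", "NAME")))  # 'Ada  Lovelace'
```

The trade-off is extra storage and a lineage store that must itself be retained for as long as fallback or audit may need it.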
Key Takeaways
- GAM produces the five types of mapping rule (Extraction, Exclusion, Transformation, Loading, Data Lineage) for each legacy data store
- Gaps are identified as data model gaps (internal, legacy, target) and topographical gaps
- Fixes range from manual and automated correction to third-party enrichment, exclusion, and acceptance as a post-migration issue; each has different implications for timeline and risk
- The One Way Street problem arises when transformation operations destroy the ability to trace migrated data back to its source - it affects fallback and data lineage
Book Reference
Practical Data Migration by Johny Morris (BCS, The Chartered Institute for IT):
- Chapter 11 - Gap Analysis and Mapping