Learning Objectives
By the end of this lesson you will be able to:
- Describe what happens during the ETL build phase in a PDM migration
- Explain how release management governs the build
- Identify the key quality checks before a build release is accepted
📦 The DHGS Story - Episode 5: Cutover weekend. The DHGS build is tested against Whit Bissell’s acceptance criteria, and the migration runs unit by unit across the cutover weekend. The legacy Kelsey Pit spreadsheet is decommissioned only once Whit signs the certificate - the moment the whole project has been working toward since Module 01.
What did DHGS get right, and where did it nearly come unstuck? Module 08 lays the full story out end to end.
What Happens in the Build Phase?
The build phase translates the ETL design (Content Matrix, mapping documents, release plan) into working ETL code. The development team implements the extraction, transformation, and load processes defined in the design, releasing them in fortnightly cycles against the production release plan.
The build phase runs in parallel with ongoing analytical work: DQR items continue to be resolved, new LDSs may still be discovered, and mapping documents continue to be refined. The release management process governs which resolved items enter each release.
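As a rough illustration of that gatekeeping, the sketch below filters a DQR register down to the items that may enter the next release. It is a minimal Python sketch under stated assumptions: the DqrItem record, its fields, the IDs, and the admission rule ("resolved and mapping document updated") are hypothetical, not defined by PDM or by any tool.

```python
from dataclasses import dataclass


@dataclass
class DqrItem:
    """Hypothetical DQR item record; field names are illustrative only."""
    item_id: str
    status: str             # e.g. "open" or "resolved"
    mapping_updated: bool   # has the mapping document been re-versioned?


def scope_release(items: list[DqrItem]) -> list[str]:
    """Return the IDs of DQR items eligible for the next release.

    Only items whose resolution has been agreed AND written into an updated
    mapping document are admitted; everything else waits for a later cycle.
    """
    return [i.item_id for i in items if i.status == "resolved" and i.mapping_updated]


# Hypothetical DQR register at the point the next release is planned.
register = [
    DqrItem("DQR-014", "resolved", True),
    DqrItem("DQR-015", "resolved", False),  # resolution agreed, mapping not yet updated
    DqrItem("DQR-016", "open", False),
]
print(scope_release(register))
```

Only DQR-014 qualifies here: DQR-015 has an agreed resolution, but until its mapping document is updated there is nothing versioned for the build team to implement.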
Release-Driven Build
Every piece of ETL code produced during the build phase is tied to a release. A developer does not simply “fix” a transformation and move on - they:
- Identify the mapping document and version that requires change
- Update the mapping document (version increment)
- Implement the change in the ETL code
- Associate the change with the next planned release
- Test the change against sample data
- Document the DQR item or mapping change that drove the development
This discipline ensures that the team always knows which version of the logic produced which output. Without it, trial-run results cannot be traced back to the logic that produced them.
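A change log is one way to hold the links listed above. The Python sketch below assumes a hypothetical ChangeRecord with one entry per ETL change; the identifiers (DQR-014, MAP-Customer, R07 and so on) are made up. Given a release, it reports which mapping document version each DQR item relies on, and which changes have not yet been tested against sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical record tying one ETL change to the artefacts behind it."""
    dqr_item: str         # DQR item or mapping change that drove the work
    mapping_doc: str      # mapping document identifier
    mapping_version: int  # mapping version after the increment
    release: str          # planned release the change belongs to
    tested: bool          # passed testing against sample data?


def trace(records: list[ChangeRecord], release: str) -> dict[str, tuple[str, int]]:
    """Map each DQR item in a release to the mapping document version it implements."""
    return {
        r.dqr_item: (r.mapping_doc, r.mapping_version)
        for r in records
        if r.release == release
    }


def untested(records: list[ChangeRecord], release: str) -> list[str]:
    """DQR items in a release whose change has not yet been tested."""
    return [r.dqr_item for r in records if r.release == release and not r.tested]


# Hypothetical change log.
log = [
    ChangeRecord("DQR-014", "MAP-Customer", 4, "R07", True),
    ChangeRecord("DQR-021", "MAP-Asset", 2, "R07", True),
    ChangeRecord("DQR-022", "MAP-Asset", 3, "R08", False),
]
print(trace(log, "R07"))
print(untested(log, "R08"))
```

Making the record immutable (frozen=True) mirrors the discipline in the text: changing the logic means a new record against a new mapping version, not an edit to an old entry.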
Build Quality Checks
Before a release is accepted as a release candidate for a trial run, it must pass:
Unit tests: Each ETL component is tested in isolation against sample data. Does the extraction query return the expected records? Does the transformation produce the expected output for a sample of test cases?
Integration tests: The end-to-end ETL pipeline is run against a controlled test dataset. Do record counts match expectations? Do validation checkpoints pass?
Regression tests: Previously passing test cases are re-run against the new release to confirm that the new changes have not broken existing functionality.
Mapping review: The relevant BDE confirms that the transformation output for a sample of records looks correct from a business perspective. Technical tests show that the code works as built; the business review confirms that the data it produces makes sense.
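The four checks can be thought of as a single gate that returns the reasons for rejecting a release. This is a minimal sketch, not a PDM-prescribed procedure: the CheckResults fields are hypothetical, and the integration check assumes a simple reconciliation rule (records loaded plus fallout records must equal the records in the controlled test dataset). Regressions are found by comparing test-case outcomes from the previous release with those of the candidate.

```python
from dataclasses import dataclass


@dataclass
class CheckResults:
    """Hypothetical summary of one release's build quality checks."""
    unit_tests_passed: bool
    dataset_count: int       # records in the controlled test dataset
    loaded_count: int        # records loaded by the end-to-end run
    fallout_count: int       # records that failed transformation or load
    regressions: list[str]   # previously passing cases that now fail
    bde_signed_off: bool     # business mapping review completed by the BDE


def find_regressions(previous: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Cases that passed in the previous release but fail (or are missing) now."""
    return sorted(case for case, passed in previous.items()
                  if passed and not current.get(case, False))


def gate(results: CheckResults) -> list[str]:
    """Return the reasons to reject the release; an empty list means accept."""
    failures = []
    if not results.unit_tests_passed:
        failures.append("unit tests failed")
    # Assumed reconciliation rule: every input record is either loaded or in fallout.
    if results.loaded_count + results.fallout_count != results.dataset_count:
        failures.append("record counts do not reconcile")
    if results.regressions:
        failures.append("regressions: " + ", ".join(results.regressions))
    if not results.bde_signed_off:
        failures.append("business mapping review outstanding")
    return failures


# Hypothetical test-case outcomes for the previous and candidate releases.
previous = {"TC-01": True, "TC-02": True, "TC-03": False}
current = {"TC-01": True, "TC-02": False, "TC-03": True}

results = CheckResults(
    unit_tests_passed=True,
    dataset_count=1_000,
    loaded_count=990,
    fallout_count=10,
    regressions=find_regressions(previous, current),
    bde_signed_off=True,
)
print(gate(results))
```

TC-03 failed before and passes now, so it is not a regression; TC-02 is, and it alone holds the release back. Returning reasons rather than a bare pass/fail also leaves a record of why a release was rejected.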
The DQR-Build Loop
The relationship between the DQR process and the build phase is iterative:
DQR Board → Resolution agreed → Mapping updated → Build release planned → Build release tested → Release accepted → Trial run → Fallout reviewed → back to DQR Board
Each trial run produces fallout records - records that failed transformation or load. These are reviewed by the DQR Board. New DQR items are raised if the fallout reveals previously unknown issues. Resolutions feed into the next release.
This loop continues until the fallout rate is below the agreed threshold and all Priority 1 DQR items are resolved.
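The exit condition of the loop can be written as a single check. In this sketch the fallout rate is assumed to be fallout records divided by extracted records; the 0.5% threshold and the trial-run figures are hypothetical, and the real threshold is whatever the project has agreed.

```python
def fallout_rate(fallout: int, extracted: int) -> float:
    """Share of extracted records that failed transformation or load."""
    if extracted <= 0:
        raise ValueError("extracted must be a positive record count")
    return fallout / extracted


def loop_can_exit(fallout: int, extracted: int, threshold: float, open_p1_items: int) -> bool:
    """True only when fallout is below the agreed threshold AND no
    Priority 1 DQR items remain unresolved."""
    return fallout_rate(fallout, extracted) < threshold and open_p1_items == 0


# Hypothetical trial-run history: (fallout records, extracted records, open P1 items)
trial_runs = [(4_200, 120_000, 6), (1_100, 120_000, 2), (300, 120_000, 0)]
THRESHOLD = 0.005  # hypothetical 0.5%; each project agrees its own figure

for run, (fallout, extracted, p1) in enumerate(trial_runs, start=1):
    rate = fallout_rate(fallout, extracted)
    print(f"Trial run {run}: fallout {rate:.2%}, exit loop: {loop_can_exit(fallout, extracted, THRESHOLD, p1)}")
```

Both conditions must hold: a run with low fallout but an open Priority 1 item (or the reverse) keeps the loop going.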
Managing Build Debt
“Build debt” accumulates when DQR items are resolved with workarounds rather than correct fixes - for example, applying a default value to a field that should be properly populated. PDM does not prohibit workarounds (Golden Rule 3: no perfect quality needed), but the workarounds must be:
- Documented in the mapping as intentional
- Agreed by the Data Owner (through the DQR process)
- Traceable through the data lineage record
Undocumented workarounds become invisible technical debt that resurfaces at the worst possible moment.
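A workaround register can be checked mechanically against the three conditions above. The Workaround fields, DQR IDs, descriptions and lineage references below are hypothetical; the point of the sketch is that any workaround missing a condition is surfaced by name instead of staying invisible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workaround:
    """Hypothetical entry in a workaround register."""
    dqr_item: str
    description: str
    documented_in_mapping: bool   # recorded in the mapping as intentional?
    owner_agreed: bool            # agreed by the Data Owner via the DQR process?
    lineage_ref: Optional[str]    # reference in the data lineage record, if any


def undocumented(workarounds: list[Workaround]) -> dict[str, list[str]]:
    """Return, per DQR item, the conditions its workaround fails to meet."""
    gaps = {}
    for w in workarounds:
        missing = []
        if not w.documented_in_mapping:
            missing.append("not documented in mapping")
        if not w.owner_agreed:
            missing.append("no Data Owner agreement")
        if not w.lineage_ref:
            missing.append("no lineage record")
        if missing:
            gaps[w.dqr_item] = missing
    return gaps


# Hypothetical register: the first entry meets all three conditions.
workaround_log = [
    Workaround("DQR-031", "Default blank contact phone to 'UNKNOWN'", True, True, "LIN-0457"),
    Workaround("DQR-038", "Map unrecognised category codes to 'OTHER'", False, True, None),
]
print(undocumented(workaround_log))
```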
Key Takeaways
- The build phase is governed by the release management process - every code change is tied to a versioned release
- Build quality checks include unit tests, integration tests, regression tests, and business mapping reviews
- The DQR-build loop is iterative: each trial run produces fallout that feeds the next DQR resolution cycle
- Build workarounds must be documented, agreed, and traceable - not silently applied
Book Reference
Practical Data Migration by Johny Morris (BCS, The Chartered Institute for IT):
- Chapter 12 - Migration Design and Execution