Learning Objectives
By the end of this lesson you will be able to:
- Describe how DHGS approached the landscape analysis phase
- Explain how the Gap Analysis Mapping was applied to the DHGS equipment data
- Identify the main data quality issues that the DQR process surfaced
- Understand how the Key Data Stakeholder Map structured business engagement
The Analysis Phase
Strategy tells you what you are doing. Analysis tells you what you are working with. For DHGS, the analysis phase was where the true scale of the data quality challenge became visible.
The analysis phase had four main workstreams running in parallel:
- Landscape Discovery - cataloguing all source systems and confirming what data they held
- Gap Analysis Mapping - mapping source fields to target fields and identifying transformation requirements
- Data Quality Assessment - profiling the data to surface issues that would prevent successful migration
- Key Data Stakeholder Mapping - identifying who in the business owned each area of data and who had the authority to make decisions about it
None of these could wait for the others to finish. The team ran them concurrently, feeding findings between workstreams as they emerged.
Landscape Discovery
The Landscape Discovery Survey (LDS) for DHGS confirmed what the initial scoping had suggested: the data was spread across more systems than anyone had a clear picture of.
The primary source - the KP Equipment Table - was well understood by the technical team but less well understood by the business. Several fields in the table had meanings that were not documented and were known only to specific individuals who had been using the system for years. Capturing that knowledge was as important as profiling the data itself.
Beyond the primary source, the landscape analysis identified:
- Departmental spreadsheets used as working references for field teams, some of which contained updates that had never been reconciled back to the central system
- A historical archive system that was the only source for certain older service records
- Reference data maintained by a third party, where the DHGS team had no direct access to modify values
For each source, the LDS recorded: what it held, who owned it, how it was maintained, what its data quality was known to be, and whether it would be decommissioned after migration. This last point matters because a source that will not be decommissioned is a potential ongoing data divergence problem - the migration does not end the relationship between the two systems.
Gap Analysis Mapping
The Gap Analysis Mapping (GAM) was the core technical document of the analysis phase. It mapped each source field to its target equivalent, defined the transformation rules required, and flagged the fields where source and target did not align cleanly.
For the DHGS equipment data, the GAM was built from the KP Equipment Table as the primary source and the Target Equipment Register as the destination.
Some of the key mapping challenges included:
Equipment type codes. The source used a code set that had evolved over time and contained deprecated values, duplicates, and codes whose meaning had changed. The target required a clean, controlled list. The GAM flagged this as requiring a full reconciliation between source and target code sets, with a business-approved mapping table.
Location identifiers. The source used free-text location descriptions in some records and structured codes in others. The target required structured codes only. Records with free-text locations could not be transformed automatically - they required manual review or a default assignment.
Status fields. The source had multiple overlapping status indicators that had been used inconsistently. The target had a single status field with a defined set of values. The GAM documented the decision logic for deriving the target status from the combination of source fields - a logic that required business sign-off before it could be implemented.
Mandatory fields with no source equivalent. Several fields that were mandatory in the target had no equivalent in the source. For some of these, a default value could be applied. For others, the data needed to be sourced from elsewhere or entered manually.
Each of these issues was captured in the GAM with a status: open, agreed, or resolved. The project team tracked the open issues as a workstream in their own right.
Data Quality Profiling
Running alongside the GAM was a systematic data quality profiling exercise against the KP Equipment Table. The profiling looked at:
- Completeness - were mandatory fields populated?
- Conformance - did values conform to the expected format and reference lists?
- Consistency - were related fields internally consistent?
- Uniqueness - were there duplicate records?
The profiling results were not encouraging. A significant proportion of records had at least one data quality issue. Some issues were straightforward - empty fields that could be populated with defaults. Others were structural - records that referenced locations that no longer existed, or equipment types that had been retired from the reference list.
The most significant finding was the volume of records with non-conforming equipment type codes. This was not a small clean-up job. It required a systematic reconciliation of every equipment type in the source against the target reference list, with a business decision required for each code that did not map directly.
The DQR List
The findings from profiling and mapping were consolidated into a Data Quality Rules (DQR) List. Each item on the list was a defined data quality issue, with:
- A reference number
- A description of the issue
- The volume of records affected
- The source fields involved
- The proposed resolution
- The business owner responsible for approving the resolution
- A status
Two DQR items from the DHGS project illustrate the range of issues the team dealt with:
DQR0004 - Pricing Errors. A subset of equipment records carried price data that was clearly incorrect - values that were either zero, negative, or implausibly high. These records could technically migrate, but they would create problems in the target system’s reporting and contract management functions. The resolution required the business to review affected records and either correct the values or flag them for post-migration review.
DQR0012 - Equipment Type. Records using deprecated or unrecognised equipment type codes. The volume was large enough that an automated remapping approach was needed, but the mapping itself required business approval. The resolution was a mapping table, produced by the business data owners, which the transformation rules would apply during load.
The DQR List became the primary tool for managing the IT-Business interface during the analysis phase. Rather than an ad hoc stream of queries going back and forth between the technical team and the business, the list gave both sides a structured, prioritised view of what needed to be resolved and who was responsible for resolving it.
The Key Data Stakeholder Map
Knowing who owns the data is a prerequisite for getting decisions made. The Key Data Stakeholder Map for DHGS identified a business data owner for each of the Key Business Data Areas, along with the individuals who had operational knowledge of specific parts of the dataset.
For the equipment data, this was more complex than it might appear. Different parts of the equipment estate were owned by different teams. Field operations owned the day-to-day accuracy of equipment locations and status. A central asset management team owned the reference data. Finance had an interest in the pricing fields. Getting decisions made on DQR items that cut across these boundaries required the project team to facilitate rather than simply request.
The stakeholder map also recorded availability - who was available when, and who their deputy was for migration decisions if they were unavailable. In a project running to a fixed deadline, a key decision-maker being unreachable for two weeks is a real risk. The stakeholder map made it possible to plan around it.
Business Transformation Register
One element of the DHGS analysis that deserves particular attention is the Business Transformation Register (BTR). This is the record of planned changes to the business that the migration needs to reflect.
For DHGS, the BTR captured changes such as:
- Sites that were being merged or renamed before go-live
- Equipment categories that were being reclassified
- Organisational changes that affected the customer hierarchy
The BTR meant that the migration team was not simply moving the current state of the data - they were moving a defined future state. Transformation rules had to encode not just the mapping from source to target fields, but also the business changes that would apply at go-live.
This added complexity to the GAM and the ETL design. But it also meant that the target system went live with data that was current, rather than data that reflected a business structure that no longer existed.
Key Takeaways
- The landscape analysis confirmed that DHGS data was spread across more systems than initially understood, with significant implicit knowledge held by individuals rather than documented
- The Gap Analysis Mapping identified transformation challenges around equipment type codes, location identifiers, status fields, and mandatory fields with no source equivalent
- Data profiling revealed a substantial proportion of records with quality issues - the most significant being non-conforming equipment type codes at scale
- The DQR List structured the remediation workload and gave both the technical team and the business a shared, prioritised view of what needed to be resolved
- The Key Data Stakeholder Map enabled the project team to direct the right questions to the right people, and to plan around availability constraints
- The Business Transformation Register ensured the migration reflected the future state of the business, not its current state