Learning Objectives
By the end of this lesson you will be able to:
- Explain why large data migrations must be decomposed into manageable parts
- Describe the four dimensions of project decomposition in PDM
- Explain the role of data models in decomposition
Why Decompose?
A data migration covering multiple legacy systems, multiple business functions, and millions of records cannot be managed as a single undifferentiated mass of work. The complexity is too great, the interdependencies too numerous, and the risk too concentrated.
PDM breaks large migrations into manageable parts through four dimensions:
- Key Business Data Areas - the business domains being migrated
- Policies - the constraints that shape how each area is migrated
- Migration Form - big bang, phased, or parallel for each area
- Technology - the systems and tools involved
This decomposition is captured in the Migration Strategy and Governance (MSG) module and refined as the project progresses. It provides the foundation for planning, resource allocation, and risk management.
A good decomposition is meaningful in both technical and business terms. Most of the time it is intuitive: either the project is small enough to do in one go, or the business is already split cleanly by division, geography, or type of sale. But even when the seams are obvious, you still express the breakdown as Key Business Data Areas, because those areas are how you analyse your legacy stores, organise the work, and find the data owner who signs each system off.
Data Models in PDM
Before decomposing, the project needs a common model of the data being migrated. PDM uses four types of model:
| Model Type | Purpose |
|---|---|
| Conceptual Entity Model | High-level view of key data entities and their relationships - used for decomposition |
| Legacy (or Migration) Data Model | The actual structure of legacy data stores - used in gap analysis |
| Target Model | The structure of the target system - owned by the target system team |
| Individual Data Store Model | Detailed model of a specific legacy store - used for internal consistency checks |
The Conceptual Entity Model is the starting point for decomposition. It identifies the top-level business entities - Customer, Product, Equipment, Order, and so on - and shows how they relate to each other. Key Business Data Areas are then defined around groups of related entities. We cover the four models in detail in the entity diagrams lesson.
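As a rough illustration of how Key Business Data Areas can be defined around groups of related entities, the sketch below holds a tiny conceptual model in Python. The relationships, the grouping into areas, and the names `Relationship`, `unassigned_entities` and `cross_area_relationships` are all assumptions made for this example; they are not part of PDM or taken from the book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """One line on the diagram, joining two entity boxes."""
    source: str
    target: str
    label: str


# Hypothetical top-level entities and relationships, for illustration only.
ENTITIES = {"Customer", "Product", "Equipment", "Order"}
RELATIONSHIPS = [
    Relationship("Customer", "Order", "places"),
    Relationship("Order", "Product", "is for"),
    Relationship("Customer", "Equipment", "holds"),
]

# Hypothetical grouping of related entities into Key Business Data Areas.
DATA_AREAS = {
    "Customer": {"Customer"},
    "Sales": {"Order", "Product"},
    "Equipment": {"Equipment"},
}


def unassigned_entities(entities, areas):
    """Return entities that no Key Business Data Area has claimed yet."""
    assigned = set().union(*areas.values())
    return entities - assigned


def cross_area_relationships(relationships, areas):
    """Return relationships whose two ends sit in different areas."""
    area_of = {entity: name for name, members in areas.items() for entity in members}
    return [
        r for r in relationships
        if area_of.get(r.source) != area_of.get(r.target)
    ]


print(unassigned_entities(ENTITIES, DATA_AREAS))
for r in cross_area_relationships(RELATIONSHIPS, DATA_AREAS):
    print(f"{r.source} {r.label} {r.target}")
```

Relationships that cross an area boundary are worth listing early, because they hint at the interdependencies that later constrain the order of migration.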
Choosing a Model Type
PDM favours Entity Relationship Diagrams (ERDs) drawn in simple box-and-crow’s-feet notation, because they are:
- Understandable by both technical and business stakeholders
- Sufficient for the level of detail needed in migration planning
- Widely known and tool-independent
Object models and ontological models can be used, but they introduce complexity without adding value for most data migrations.
What Gets Decomposed?
The decomposition identifies, for each Key Business Data Area:
- Which legacy data stores contribute to it
- Which business function owns it
- Which policies apply to it
- What migration form is appropriate
- In what sequence it should be migrated (the Unit of Migration)
The DHGS case study illustrates this clearly: Customer data and Equipment data are both in scope, but they have different data owners, different data quality profiles, and different interdependencies, so they are decomposed separately and migrated in a sequence determined by those dependencies.
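One way to picture the result of a decomposition is as one record per Key Business Data Area, with the migration sequence derived from the dependencies between areas. The sketch below is a minimal illustration under stated assumptions: the `DataArea` class, its field names, the store names, owners and policies are all hypothetical, and the direction of the dependency (Equipment after Customer) is assumed for the example rather than taken from the DHGS case study.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class DataArea:
    """Decomposition record for one Key Business Data Area (hypothetical shape)."""
    name: str
    legacy_stores: list[str]   # legacy data stores that contribute to the area
    data_owner: str            # business function that owns the area
    policies: list[str]        # policies that apply to the area
    migration_form: str        # "big bang", "phased" or "parallel"
    depends_on: list[str] = field(default_factory=list)  # areas that must migrate first


def migration_sequence(areas):
    """Order area names so that each area follows the areas it depends on."""
    graph = {area.name: set(area.depends_on) for area in areas}
    return list(TopologicalSorter(graph).static_order())


# Illustrative values only; none of these come from the case study.
areas = [
    DataArea(
        name="Equipment",
        legacy_stores=["asset_register"],
        data_owner="Operations",
        policies=["migrate active equipment only"],
        migration_form="phased",
        depends_on=["Customer"],  # assumed direction of dependency
    ),
    DataArea(
        name="Customer",
        legacy_stores=["crm", "billing"],
        data_owner="Sales",
        policies=["merge duplicate customers before load"],
        migration_form="big bang",
    ),
]

print(migration_sequence(areas))  # ['Customer', 'Equipment']
```

If two areas were ever declared to depend on each other, `static_order()` would raise `graphlib.CycleError`, which is a useful signal that the boundaries between those areas need another look.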
Key Takeaways
- Large migrations are decomposed along four dimensions: Key Business Data Areas, Policies, Migration Form, and Technology
- A good decomposition is meaningful to the business and the technologists alike
- The Conceptual Entity Model is the analytical tool that enables decomposition
- PDM favours simple ERD notation for models - understandable by all stakeholders
Book Reference
Practical Data Migration by Johny Morris (BCS, The Chartered Institute for IT): Chapter 7, “Metadata and Key Business Data Areas”.