Data Migration Glossary
77+ data migration terms defined — from ETL and CDC to agentic AI and zero-downtime migration.
A
- Agentic AI
- An AI system that can autonomously reason, plan, and execute multi-step tasks using tools and memory, without requiring human instruction at each step. In data migration, agentic AI handles profiling, mapping, and validation independently. Agentic Data Migration Guide →
- Agentic Data Migration
- A migration approach that uses autonomous AI agents — each specialised in a specific phase (profiling, mapping, transformation, reconciliation) — to execute the full migration lifecycle without human intervention at each step. What is Agentic Data Migration? →
- API Migration
- The process of moving data via application programming interfaces rather than direct database connections. Common when source systems expose APIs but not direct database access.
- Audit Trail
- A chronological record of all actions, changes, and decisions made during a data migration. Required for compliance in regulated industries (SOX, HIPAA, GDPR).
- Automated Schema Mapping
- The use of AI or rules-based algorithms to automatically match fields in a source schema to corresponding fields in a target schema, reducing or eliminating manual mapping effort.
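A minimal illustration of the rules-based end of this spectrum — not the DataMigration.AI implementation — is pairing fields by name similarity. The field names and the 0.6 cutoff below are hypothetical:

```python
from difflib import get_close_matches

def auto_map(source_fields, target_fields, cutoff=0.6):
    """Propose a source-to-target field mapping by name similarity."""
    lowered = {t.lower(): t for t in target_fields}  # recover original casing later
    mapping = {}
    for field in source_fields:
        # get_close_matches returns candidates scoring at or above the
        # cutoff, best first; take the top one if any.
        match = get_close_matches(field.lower(), list(lowered), n=1, cutoff=cutoff)
        if match:
            mapping[field] = lowered[match[0]]
    return mapping
```

In practice, AI mappers also weigh data types, sample values, and profiling statistics — not just names.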
B
- Batch Migration
- A migration approach that moves data in discrete groups (batches) at scheduled intervals rather than continuously. Suitable for non-time-sensitive data where brief inconsistency is acceptable.
- Big Bang Migration
- A migration strategy that moves all data in a single cutover event, typically during a maintenance window. Offers simplicity but involves downtime and high risk.
- Bulk Load
- A high-throughput method of inserting large volumes of data into a target system, bypassing row-by-row processing. Examples include PostgreSQL COPY, SQL Server BCP, and Oracle SQL*Loader.
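The difference from row-by-row loading can be sketched with Python's built-in sqlite3 as a stand-in for those platform loaders; the table and data are hypothetical:

```python
import sqlite3

def bulk_load(rows):
    """Load a batch of rows in one call rather than one INSERT per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    # executemany binds the whole batch against a single prepared statement;
    # native loaders (COPY, BCP, SQL*Loader) go further by bypassing the
    # SQL layer entirely.
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```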
C
- CDC (Change Data Capture)
- A technology that tracks and captures every change (insert, update, delete) made to a source database in real-time, enabling continuous synchronisation with a target system during migration. Zero Downtime Migration Guide →
- Cardinality
- The number of distinct values in a column relative to the total number of rows. High-cardinality columns (e.g. user IDs) have many unique values; low-cardinality columns (e.g. status flags) have few.
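As a quick sketch of the ratio (values hypothetical):

```python
def cardinality_ratio(values):
    """Distinct values divided by total rows; 1.0 means every value is unique."""
    return len(set(values)) / len(values) if values else 0.0
```

A user-ID column approaches 1.0; a status-flag column sits near 0.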
- Checksum Validation
- A data verification technique that computes a hash or aggregate value for source data and compares it to the same computation on migrated target data to confirm completeness and accuracy.
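One way to make the comparison order-independent is to hash each row, then hash the sorted digests; a minimal sketch using SHA-256:

```python
import hashlib

def table_checksum(rows):
    """Order-independent checksum: hash each row, then hash the sorted digests."""
    # repr() assumes both sides render rows identically; production tools
    # canonicalise types (dates, decimals, encodings) before hashing.
    digests = sorted(
        hashlib.sha256(repr(row).encode()).hexdigest() for row in rows
    )
    return hashlib.sha256("".join(digests).encode()).hexdigest()
```

Running the same computation over source and target rows and comparing the two digests confirms nothing was lost or altered in transit.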
- Cloud Data Migration
- The process of moving data, applications, or workloads from on-premises systems to cloud platforms such as AWS, Azure, or Google Cloud. Includes lift-and-shift, re-platforming, and refactoring approaches. Cloud Migration Guide →
- COBOL Copybook
- A COBOL source file that defines the data layout (record structure) for mainframe flat files. Essential for migrating mainframe data, as the copybook is required to parse binary field formats correctly.
- Column-Level Reconciliation
- A post-migration verification that checks aggregate statistics (sum, min, max, average) for each column in the target against the source to confirm data was not distorted during transformation.
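For a numeric column the check reduces to comparing a handful of aggregates within a tolerance; a sketch with a hypothetical tolerance:

```python
def column_stats(values):
    """Aggregate fingerprint of a numeric column."""
    return {
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
    }

def reconcile_column(source, target, tol=1e-9):
    """True when every aggregate matches within the tolerance."""
    s, t = column_stats(source), column_stats(target)
    return all(abs(s[k] - t[k]) <= tol for k in s)
```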
- Cutover
- The moment at which traffic and application access are switched from the source system to the target system, completing a data migration. A well-executed cutover takes seconds to minutes.
D
- Data Cleansing
- The process of detecting and correcting inaccurate, incomplete, duplicate, or improperly formatted data before or during migration. Also called data scrubbing or data quality remediation.
- Data Dictionary
- A document or database that describes the structure, format, business meaning, and relationships of all data elements in a system. Generated automatically by Profile AI during source profiling.
- Data Integration
- The ongoing process of combining data from multiple sources into a unified view for analytics or operational use. Distinct from data migration (which is typically a one-time or periodic transfer).
- Data Lake Migration
- Moving raw, unstructured, or semi-structured data into a cloud-based data lake (e.g. AWS S3, Azure Data Lake Storage) for analytics and ML workloads.
- Data Lineage
- The documentation of where data originated, how it was transformed, and where it currently resides. Critical for compliance and debugging in complex migration projects.
- Data Mapping
- The process of creating relationships between fields in a source data model and fields in a target data model. Can be performed manually, rule-based, or by AI agents. Automated Migration Tools →
- Data Migration
- The process of transferring data between storage systems, formats, or computing environments. Encompasses planning, extraction, transformation, loading, and validation to ensure complete data integrity. What is Data Migration? →
- Data Profiling
- The automated analysis of source data to understand its structure, quality, completeness, and relationships before migration begins. Performed by Profile AI in the DataMigration.AI platform.
- Data Quality
- A measure of how well data meets the requirements of its intended use. Dimensions include accuracy, completeness, consistency, timeliness, uniqueness, and validity.
- Data Reconciliation
- The process of comparing source and target data after migration to verify accuracy, completeness, and integrity. Can be performed at row, column, or aggregate levels. What is Data Migration? →
- Data Transformation
- The conversion of data from a source format or structure into a target format, including type changes, field renaming, business rule application, and derived field calculation.
- Data Validation
- The process of checking that migrated data conforms to defined rules, formats, and constraints in the target system. Includes null checks, range validation, referential integrity, and business rule verification.
- Data Warehouse Migration
- Moving a data warehouse from one platform to another — for example, from Teradata to Snowflake or from SQL Server to BigQuery. Involves schema translation, query conversion, and ETL pipeline migration.
- Database Migration
- The transfer of data and schema from one database management system (DBMS) to another, or between versions of the same DBMS. Includes heterogeneous (cross-platform) and homogeneous (same platform) migration. Database Migration Guide →
- Deadlock
- A situation during migration where two or more database transactions each wait for another to release locks, stalling them all indefinitely. Automated migration tools detect and resolve deadlocks through retry logic.
- Deduplication
- The identification and removal of duplicate records from a dataset during migration. Cleanse AI performs deduplication using both exact-match and fuzzy-match algorithms.
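The exact-match versus fuzzy-match distinction can be sketched with the standard library's SequenceMatcher; the 0.85 threshold is illustrative, not Cleanse AI's actual algorithm:

```python
from difflib import SequenceMatcher

def dedupe(names, threshold=0.85):
    """Keep the first occurrence; drop later names that match a kept one
    exactly (after normalisation, ratio 1.0) or above the fuzzy threshold."""
    kept = []
    for name in names:
        key = name.strip().lower()
        if any(
            SequenceMatcher(None, key, k.strip().lower()).ratio() >= threshold
            for k in kept
        ):
            continue
        kept.append(name)
    return kept
```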
- Delta Migration
- A migration that moves only the data that has changed since the last migration run, rather than re-migrating the entire dataset. Enabled by CDC or timestamp-based change detection.
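Timestamp-based change detection is the simpler of the two; a self-contained sketch with a hypothetical orders table:

```python
import sqlite3

def delta_since(rows, last_run):
    """Return only the rows updated after the previous migration run."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, updated_at TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    # ISO-8601 strings compare correctly as text, so a plain > suffices here.
    return conn.execute(
        "SELECT id FROM orders WHERE updated_at > ? ORDER BY id", (last_run,)
    ).fetchall()
```

CDC is preferred in production because timestamp columns miss deletes and any update that does not touch the timestamp.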
E
- ETL (Extract, Transform, Load)
- A data integration pattern that extracts data from source systems, transforms it into the target format, and loads it into the destination. The most common pattern for data migration and data warehousing. What is Data Migration? →
- ELT (Extract, Load, Transform)
- A variation of ETL where raw data is loaded into the target first and transformations are applied within the target system. Common in modern cloud data warehouses like Snowflake and BigQuery.
- Eventual Consistency
- A data consistency model used in distributed systems and zero-downtime migrations where the source and target may be temporarily out of sync but will converge to the same state over time.
F
- Failover
- An automatic switching mechanism that redirects application traffic from a failed or unavailable primary system to a standby system. Used in zero-downtime migrations to ensure business continuity during cutover.
- Foreign Key
- A database constraint that enforces referential integrity between tables. Migration tools must preserve foreign key relationships or temporarily disable them during bulk loading.
- Full Load
- A migration approach that copies the entire source dataset to the target in a single operation. Used for initial migration or when the source system is taken offline during migration.
G
- GEO (Generative Engine Optimisation)
- The practice of structuring web content to be accurately cited and summarised by AI-powered answer engines such as ChatGPT, Google AI Overviews, Perplexity, and Microsoft Copilot.
- Governance
- In data migration, governance refers to the policies, procedures, and controls that ensure data quality, security, compliance, and accountability throughout the migration lifecycle.
H
- Heterogeneous Migration
- A migration between different database platforms — for example, Oracle to PostgreSQL, SQL Server to Snowflake, or DB2 to MySQL. Requires schema translation, data type mapping, and often query rewriting.
- Homogeneous Migration
- A migration between instances of the same database platform — for example, Oracle 11g to Oracle 19c. Generally simpler than heterogeneous migration but still requires version-specific handling.
I
- Incremental Migration
- A strategy that moves data in phases or increments rather than all at once. Reduces risk, allows parallel validation, and minimises impact on source system performance.
- Indexing Strategy
- The plan for when and how database indexes are created on the target during migration. Building indexes after bulk loading is typically 5–10x faster than maintaining them during the load.
L
- Legacy System Migration
- The migration of data from outdated systems — mainframes, COBOL applications, AS/400 platforms, or unsupported databases — to modern infrastructure. Requires specialised tools to parse proprietary data formats.
- Lift and Shift
- A cloud migration strategy that moves applications and data to the cloud without modification. Fastest but typically does not realise the full cost or performance benefits of cloud-native architecture.
- Log-Based CDC
- A change data capture method that reads the database transaction log (redo log, WAL, or binlog) to capture changes without impacting source system performance. The most reliable CDC method for production databases.
- LLM (Large Language Model)
- A machine learning model trained on large text datasets that can understand and generate human language. LLMs are the reasoning engine behind agentic AI migration systems like DataMigration.AI.
M
- Mapping Specification
- A document or structured file that defines the field-level relationships between source and target schemas, including transformation rules, default values, and data type conversions.
- Migration Complexity Score
- A quantitative rating assigned by Profile AI to each source table based on factors including row count, column count, data type diversity, quality issues, and relationship depth.
- Migration Cutover Plan
- A detailed sequence of steps, responsibilities, and rollback procedures for the final switch from source to target system. A well-designed cutover plan takes 30 minutes or less to execute.
- Migration Downtime
- The period during which the source system is unavailable while data is being moved to the target. Zero-downtime migration techniques aim to eliminate this window entirely.
- Migration ROI
- The return on investment of a data migration project, calculated as cost savings and business value gains divided by total migration cost. AI-powered migrations typically achieve a 2–4x ROI improvement over manual approaches.
N
- Null Handling
- The rules governing how NULL (missing) values in the source are treated during transformation and loading. Incorrect null handling is a common source of data quality failures in migration.
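A sketch of per-field null rules; the field names and rule vocabulary are hypothetical:

```python
def apply_null_rules(record, rules):
    """Apply per-field NULL rules: substitute a default, keep the NULL,
    or reject the record when the field is required."""
    out = {}
    for field, value in record.items():
        rule = rules.get(field, "keep")  # default: pass NULLs through
        if value is None:
            if rule == "required":
                raise ValueError(f"NULL not allowed in {field}")
            out[field] = None if rule == "keep" else rule
        else:
            out[field] = value
    return out
```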
O
- OLAP (Online Analytical Processing)
- Database workloads optimised for complex analytical queries across large datasets. Data warehouse migrations (e.g. to Snowflake or BigQuery) target OLAP systems.
- OLTP (Online Transaction Processing)
- Database workloads optimised for high-throughput insert, update, and delete operations. Most source database migrations originate from OLTP systems.
P
- Parallel Migration
- A migration approach that runs multiple data streams simultaneously to increase throughput and reduce total migration time. Transform AI uses parallel execution to maximise load performance.
- Phased Migration
- A strategy that migrates data in logical phases — often by business unit, geography, or data domain — rather than all at once. Reduces scope and risk per phase.
- Primary Key
- A column or set of columns that uniquely identifies each row in a database table. Preserving primary keys accurately is critical to maintaining referential integrity after migration.
- Profile AI
- DataMigration.AI's source profiling agent. Automatically discovers schemas, data types, relationships, quality scores, and migration complexity ratings without manual configuration.
R
- Re-platforming
- A cloud migration strategy that moves applications to the cloud with minor optimisations (e.g. switching from SQL Server to Azure SQL Managed Instance). Between lift-and-shift and full refactoring.
- Reconciliation Report
- A document produced after migration that certifies the completeness and accuracy of migrated data by comparing source and target at row, column, and aggregate levels. What is Data Migration? →
- Referential Integrity
- A database constraint that ensures relationships between tables remain consistent: every foreign key must reference a valid primary key. Migration tools must respect referential integrity when sequencing the load order of tables.
- Rollback Plan
- A documented procedure for reverting a migration if critical errors are discovered after cutover. A robust rollback plan is a prerequisite for any production migration.
- Row Count Validation
- A basic reconciliation check that compares the number of rows in each source table to the corresponding target table after migration. A necessary but insufficient check on its own.
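A sketch of the check across a set of tables (counts hypothetical):

```python
def row_count_mismatches(source_counts, target_counts):
    """Return {table: (source_rows, target_rows)} for every mismatch."""
    return {
        table: (source_counts[table], target_counts.get(table, 0))
        for table in source_counts
        if source_counts[table] != target_counts.get(table, 0)
    }
```

An empty result clears this check — but identical counts can still hide corrupted values, which is why column-level reconciliation follows it.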
S
- Schema
- The logical structure of a database, including tables, columns, data types, indexes, and constraints. Schema migration is the process of recreating this structure in the target system.
- Schema Drift
- Unplanned changes to a source schema during migration — such as added columns or changed data types. Agentic migration systems detect and adapt to schema drift automatically.
- Schema Mapping
- The process of defining relationships between source and target schema elements. AI-powered schema mapping can automate 95%+ of mapping decisions that would otherwise be made manually, field by field. Automated Migration Tools →
- Shadow Migration
- A technique where the new system runs in parallel with the old system for a period, with writes going to both, allowing validation before full cutover. Enables safe zero-downtime migration.
- SQL Generation
- The automatic creation of SQL statements (INSERT, CREATE TABLE, UPDATE) from a migration specification. AI agents generate optimised SQL based on the target platform's dialect and performance characteristics.
T
- Table Partitioning
- A database technique that divides large tables into smaller, more manageable segments based on a partition key (e.g. date range). Migration tools must preserve or recreate partitioning strategies.
- Transformation Rule
- A business logic statement that specifies how source data should be modified during migration — for example, converting a date format, concatenating name fields, or deriving a category code.
U
- Upsert
- A database operation that inserts a new record if it does not exist, or updates the existing record if it does. Used in incremental and delta migrations to apply changes without full reloads.
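In SQLite (3.24+) the operation is a single statement; a self-contained sketch with a hypothetical accounts table:

```python
import sqlite3

def demo_upsert():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
    stmt = (
        "INSERT INTO accounts (id, balance) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET balance = excluded.balance"
    )
    conn.execute(stmt, (1, 100.0))  # id 1 absent: plain insert
    conn.execute(stmt, (1, 250.0))  # id 1 present: balance updated in place
    return conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
```

PostgreSQL uses the same ON CONFLICT syntax; SQL Server and Oracle express the equivalent with MERGE.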
V
- Validation Rule
- A constraint that data must satisfy to be considered valid in the target system — such as 'email must contain @', 'age must be 0–150', or 'account balance must be non-negative'.
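The three example rules translate directly into predicates; a minimal sketch:

```python
RULES = {
    "email must contain @": lambda r: "@" in r.get("email", ""),
    "age must be 0-150": lambda r: 0 <= r.get("age", -1) <= 150,
    "balance must be non-negative": lambda r: r.get("balance", 0) >= 0,
}

def failed_rules(record, rules=RULES):
    """Return the names of every rule the record violates."""
    return [name for name, check in rules.items() if not check(record)]
```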
W
- WAL (Write-Ahead Log)
- PostgreSQL's transaction log mechanism, used by CDC tools to capture database changes for replication and zero-downtime migration. Equivalent to Oracle Redo Log or MySQL Binlog.
- Waterfall Migration
- A sequential migration approach that completes each phase (plan → extract → transform → load → validate) before starting the next. Contrasts with iterative or agentic approaches that run phases in parallel.
Z
- Zero Downtime Migration
- A migration approach that keeps the source system fully operational throughout the migration process, using CDC, shadow migration, or dual-write techniques to achieve a sub-minute cutover. Zero Downtime Migration Guide →
Ready to Start Your Migration?
Put these concepts into practice with DataMigration.AI — 8 specialised AI agents, zero manual mapping, 100% accuracy.