# Data pipeline key concepts

Workato data pipelines extract, replicate, and sync data to maintain accurate and up-to-date datasets. Pipelines connect source applications to destination data warehouses, move data in bulk, and preserve schema integrity. The following sections define key concepts that explain how data pipelines process and manage data.

# Source applications and destinations

Data pipelines extract data from a source application, such as Salesforce, and sync it to a destination, such as Snowflake. A single pipeline retrieves data from multiple objects or fields within the source application and replicates that data in the specified destination.

# Object syncs

A sync refers to the overall process where the pipeline extracts data from the source and loads it into the destination. Each sync processes multiple objects in parallel and is one of the following types:

- **Full sync**

  The initial bulk load extracts all records from the source and replicates them in the destination. By default, the pipeline fetches all available records when you leave the **When first started, this pipeline should pick up records from** field blank. To limit the scope, set a specific date so the pipeline syncs records from that point onward.

- **Incremental sync**

  Each scheduled sync extracts only new, updated, or deleted records since the last successful sync.

A data pipeline starts with a full historical sync, which transfers all data from the source or from a specified date. After the initial sync completes, the pipeline switches to incremental syncs to capture new, updated, or deleted records.

Refer to the Sync types and execution guide for more information.

# Pipeline runs

Each data pipeline sync consists of multiple runs, with one run per selected object. A sync represents the entire activity for all objects, while a run tracks the execution for a single object within that sync.

The pipeline executes runs in parallel to improve performance. Each run extracts data from the source and loads it into the destination. Run-level data appears in the Runs tab, which helps you monitor pipeline execution. Refer to the Object runs section for more information.
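The sync/run relationship described above can be pictured as one worker per object executing concurrently, with each run tracked separately. This is a minimal sketch using Python's standard thread pool; `extract_and_load` and the object names are illustrative assumptions, not Workato APIs.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def extract_and_load(obj: str) -> dict:
    # In a real pipeline this would extract from the source and load into
    # the destination; here we only report a per-run result record.
    counts = {"accounts": 120, "contacts": 340, "opportunities": 55}
    return {"object": obj, "records": counts[obj], "status": "succeeded"}

objects = ["accounts", "contacts", "opportunities"]

# One run per selected object, executed in parallel. The list of results
# mirrors what a Runs tab would show: one row per object within the sync.
with ThreadPoolExecutor(max_workers=len(objects)) as pool:
    futures = {pool.submit(extract_and_load, o): o for o in objects}
    results = [f.result() for f in as_completed(futures)]
```

Because runs are independent, one slow or failing object does not block the others, which is the main reason for running them in parallel.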

# Schema replication and schema drift management

Schema drift refers to inconsistencies between the source and destination that occur when the structure of the source data changes. These changes may include added or deleted fields, modified field types, or other structural updates. Unmanaged schema drift can cause transformation errors, data loss, and inaccurate analysis.

Workato pipelines detect schema drift during syncs and apply schema changes based on your pipeline configuration. Use the Auto-sync new fields option to apply schema updates automatically, or use Block new fields to review and manage changes manually. You can configure this behavior during pipeline setup.
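Conceptually, drift detection amounts to diffing the source schema against the destination schema on each sync. The sketch below illustrates that idea with hypothetical field-name-to-type maps; it is not how Workato represents schemas internally.

```python
# Hypothetical schemas: field name -> type, for illustration only.
source_schema = {"id": "string", "email": "string", "score": "float", "region": "string"}
dest_schema   = {"id": "string", "email": "string", "score": "int"}

def detect_drift(source: dict, dest: dict) -> dict:
    """Classify differences between source and destination schemas."""
    return {
        "added":   sorted(set(source) - set(dest)),    # new fields in the source
        "removed": sorted(set(dest) - set(source)),    # fields deleted from the source
        "retyped": sorted(f for f in source.keys() & dest.keys()
                          if source[f] != dest[f]),    # field type changed
    }

drift = detect_drift(source_schema, dest_schema)
# With an auto-sync setting, "added" fields would be created in the
# destination automatically; with a blocking setting, they would be
# held for manual review instead.
```

The same diff drives both configuration modes: the policy only changes what the pipeline does with the detected changes, not how it finds them.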


Last updated: 5/7/2025, 7:07:03 AM