# Data pipelines
PRIVATE BETA
This feature is in private beta. Private beta features are available only to selected customers. Customers must opt in and be accepted into the beta.
During the private beta, Workato may update its functionality or change its availability without prior notice.
Data pipelines automate data replication and sync multiple objects from a source application to a destination in a single workflow. Traditional recipes process records individually or in small batches, which increases sync times and maintenance effort. Data pipelines extract, replicate, and load data in bulk, which improves speed, scalability, and efficiency.
A single data pipeline ingests thousands of objects from a source application or file system into a destination data warehouse and matches schemas automatically.
## Why use a data pipeline?
Traditional data replication requires multiple recipes, each configured for a single object. Separate recipes increase setup time, extend sync durations, and create troubleshooting challenges.
Standard recipes also process records in small batches, which limits throughput and forces users to resolve failures manually across multiple workflows.
Data pipelines consolidate multiple object syncs into a single workflow. Instead of processing small batches, pipelines extract, replicate, and load multiple objects in parallel, which accelerates syncs and enhances reliability. Schema updates apply automatically, so the destination remains consistent. Built-in Change Data Capture (CDC) identifies new, updated, and deleted records, which eliminates the need for manual updates.
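Conceptually, CDC amounts to diffing the current state of a source object against the state captured by the previous sync. The following Python sketch illustrates that idea with a snapshot comparison; the record shapes, keys, and helper names are illustrative assumptions, not Workato's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class ChangeSet:
    """Records grouped by the kind of change detected."""
    inserts: list = field(default_factory=list)
    updates: list = field(default_factory=list)
    deletes: list = field(default_factory=list)

def detect_changes(previous: dict, current: dict) -> ChangeSet:
    """Compare the last-synced snapshot with the current extract.

    Both arguments map a record's primary key to its field values.
    """
    changes = ChangeSet()
    for key, record in current.items():
        if key not in previous:
            changes.inserts.append(record)      # new record
        elif record != previous[key]:
            changes.updates.append(record)      # modified record
    for key, record in previous.items():
        if key not in current:
            changes.deletes.append(record)      # deleted record
    return changes

# Example: one renamed account, one new account, one deleted account.
last_sync = {"001": {"name": "Acme"}, "002": {"name": "Globex"}}
latest    = {"001": {"name": "Acme Corp"}, "003": {"name": "Initech"}}
print(detect_changes(last_sync, latest))
```

In practice, change tracking typically relies on source-side signals such as timestamps or change logs rather than full snapshots, but the classification of records into inserts, updates, and deletes is the same.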
## Key benefits
Data pipelines provide the following capabilities:
- Automated schema management: Detects schema changes and applies them to the destination (see the sketch after this list).
- Optimized change tracking: Uses CDC to capture new, modified, and deleted records.
- Reduced maintenance effort: Replaces multiple recipes with a single pipeline for simplified setup, monitoring, and error handling.
- Improved observability: Logs and run history provide insights into schema changes, data volume, and errors.
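Automated schema management boils down to comparing the source object's schema with the destination table and applying whatever is missing. The sketch below expresses that comparison as generated DDL; the table name, column types, and SQL dialect are assumptions for illustration only, not Workato's implementation.

```python
def schema_migrations(table: str, source_schema: dict, destination_schema: dict) -> list[str]:
    """Return DDL statements that add columns present in the source object
    but missing from the destination table. (Illustrative only; real
    warehouses differ in dialect and type mapping.)
    """
    statements = []
    for column, col_type in source_schema.items():
        if column not in destination_schema:
            statements.append(f"ALTER TABLE {table} ADD COLUMN {column} {col_type}")
    return statements

# A new "industry" field appeared on the source object.
source = {"id": "VARCHAR", "name": "VARCHAR", "industry": "VARCHAR"}
destination = {"id": "VARCHAR", "name": "VARCHAR"}
print(schema_migrations("ACCOUNTS", source, destination))
# ['ALTER TABLE ACCOUNTS ADD COLUMN industry VARCHAR']
```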
## How data pipelines work
A data pipeline follows an extract, replicate, and load process, repeated on a sync schedule, to automate data movement:
- Extract: The trigger retrieves data from the source application, such as Salesforce.
- Replicate: The pipeline replicates the schema and ensures compatibility with the destination.
- Load: The load action transfers records in bulk to the destination, such as Snowflake.
The pipeline syncs data at a scheduled interval, executing the extract, replicate, and load process for every selected object: the trigger extracts data from the source, and the load action replicates the schema and transfers records to the destination.
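Put together, a single scheduled run loops over the selected objects and applies the same three steps to each. The sketch below outlines that flow; the `Source` and `Destination` classes and their methods are hypothetical stand-ins for the trigger and load action, not Workato's API.

```python
from typing import Iterable

# Hypothetical stand-ins for a source connector and a destination warehouse.
class Source:
    def fetch_changes(self, obj: str) -> list[dict]:
        """Return new and changed records for one object since the last sync."""
        return [{"id": "001", "name": "Acme Corp"}]  # stub data for illustration

class Destination:
    def sync_schema(self, obj: str, records: list[dict]) -> None:
        """Ensure the destination table has a column for every source field."""
        print(f"replicating schema for {obj}")

    def bulk_load(self, obj: str, records: list[dict]) -> None:
        """Write the extracted records to the destination in one bulk operation."""
        print(f"loading {len(records)} records into {obj}")

def run_sync(source: Source, destination: Destination, objects: Iterable[str]) -> None:
    """One scheduled run: extract, replicate, and load every selected object."""
    for obj in objects:
        records = source.fetch_changes(obj)     # extract
        destination.sync_schema(obj, records)   # replicate
        destination.bulk_load(obj, records)     # load

run_sync(Source(), Destination(), ["accounts", "contacts", "opportunities"])
```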
## Get started with data pipelines
Refer to the following guides to configure a data pipeline recipe to sync data between applications:
- Connect to sources and destinations: Establish connections to source applications and destination data warehouses.
- Configure a data pipeline: Set up the pipeline, define source objects, and choose sync settings.
- Monitor and manage pipelines: Track sync progress and troubleshoot errors.