How pipelines work
Source, ingestion, target — and everything between.
If you've already built your first pipeline, you've used every concept in this article. This is the longer explanation — what the moving parts are, how to reason about changes, and where the lines are drawn between a pipeline and the tools around it.
The shape of a pipeline
Every pipeline follows the same shape:
Source → ingestion (per object) → Target
- Source — a connector you're reading from. Could be a live database, a SaaS API, a file drop, or an uploaded file.
- Target — a connector you're writing to. Usually a warehouse or a data lake.
- In between — each object you sync gets its own ingestion mode that decides how rows land in the target.
The wizard you used in the first pipeline guide is just a UI for picking those pieces. Pipelines, along with connectors and automations, live in the Integrations hub.
Each object picks its own ingestion mode
When you pick the tables or objects to sync, each one becomes an artifact, Databasin's word for "one table or data set moving through this pipeline." The ingestion mode is set per artifact, not once for the whole pipeline, so one pipeline can full-refresh a small lookup table while incrementally upserting a large orders table right beside it.
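As a rough mental model only, the per-artifact idea can be sketched in Python. The class names, field names, and connector names here are hypothetical and are not Databasin's API; only the five mode names come from this article.

```python
from dataclasses import dataclass, field
from enum import Enum


class IngestionMode(Enum):
    # The five mode names come from this article; the enum itself is illustrative.
    SNAPSHOT = "Snapshot"
    DELTA = "Delta"
    HISTORICAL = "Historical"
    CDC = "CDC"
    STORED_PROCEDURE = "Stored Procedure"


@dataclass
class Artifact:
    # Hypothetical model: one table or data set moving through a pipeline.
    name: str
    mode: IngestionMode
    merge_keys: list[str] = field(default_factory=list)


@dataclass
class Pipeline:
    # Hypothetical model: source and target are connector names.
    source: str
    target: str
    artifacts: list[Artifact]


# One pipeline, two artifacts, two different ingestion modes.
pipeline = Pipeline(
    source="orders_db",
    target="warehouse",
    artifacts=[
        Artifact("country_codes", IngestionMode.SNAPSHOT),
        Artifact("orders", IngestionMode.DELTA, merge_keys=["order_id"]),
    ],
)

modes_by_artifact = {a.name: a.mode.value for a in pipeline.artifacts}
print(modes_by_artifact)  # prints: {'country_codes': 'Snapshot', 'orders': 'Delta'}
```

The point of the sketch is that the mode hangs off the artifact, not the pipeline, so changing one table's mode never touches its neighbors.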
There are five modes — Snapshot, Delta, Historical, CDC, and Stored Procedure. Picking the right one is the single most important decision for each artifact, so it has its own article.
Ingestion modes covers all five, the options each one exposes, and how to translate the old "Full / Incremental / Append" vocabulary.
There's no in-flight transform stage
This surprises people coming from other tools: a Databasin pipeline does not have a visual transform builder. There's no step for joins, derived columns, or pivots between the source read and the target write. A pipeline's job is to move data faithfully.
For light, per-artifact shaping you have two options on the ingestion step:
- Custom SQL — replace the generated source SELECT with your own statement.
- Custom WHERE — keep the generated query but add a filter clause.
Both are set per artifact, and the default ("Auto") just reads the object as-is.
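To make the difference between the three settings concrete, here is a minimal Python sketch of how a source query might be assembled. The function `build_source_query`, its parameters, and the shape of the generated SELECT are assumptions for illustration; the query Databasin actually generates may look different.

```python
from typing import Optional, Sequence


def build_source_query(
    table: str,
    columns: Sequence[str],
    custom_sql: Optional[str] = None,
    custom_where: Optional[str] = None,
) -> str:
    """Return the source query for one artifact (illustrative only)."""
    if custom_sql is not None:
        # Custom SQL: the generated SELECT is replaced wholesale.
        return custom_sql
    query = f"SELECT {', '.join(columns)} FROM {table}"
    if custom_where is not None:
        # Custom WHERE: keep the generated query and append a filter.
        query += f" WHERE {custom_where}"
    # With neither option set, this is the "Auto" case: read the object as-is.
    return query


columns = ["id", "status", "total"]

auto_query = build_source_query("orders", columns)
where_query = build_source_query("orders", columns, custom_where="total > 0")
sql_query = build_source_query(
    "orders", columns, custom_sql="SELECT id, total FROM orders WHERE total > 0"
)

print(auto_query)   # prints: SELECT id, status, total FROM orders
print(where_query)  # prints: SELECT id, status, total FROM orders WHERE total > 0
print(sql_query)    # prints: SELECT id, total FROM orders WHERE total > 0
```

Custom WHERE is the smaller hammer: the column list stays under the pipeline's control and only the row filter is yours.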
For anything heavier — joins across tables, aggregations, business logic — do the raw sync first, then transform afterward:
- Use an automation to run SQL (or chain steps) after the pipeline finishes, or
- Query and reshape the landed data directly in Lakehouse SQL.
The short version: pipelines land clean copies; transformation happens downstream.
Pipelines are project-scoped
A pipeline lives inside one project. It uses connectors from that project, and its runs, logs, and metrics belong to that project. If you need the same sync in two projects, build it in each.
Scheduling and runs
Schedules live in their own article — see Scheduling and triggers. The short version: manual runs, one-click presets, or a custom cron expression, each with a timezone you choose.
Once a pipeline is running, Monitoring and alerts covers where to watch it and how failures reach you.
Common gotchas
"The same row is appearing twice"
This almost always means the artifact is on Historical (append-only: every run adds new rows) when you actually wanted Delta (upsert matched on merge keys). Switch the artifact to Delta and set proper merge keys. If you just need a clean full copy every run, Snapshot is duplicate-safe by design because it truncates and reloads.
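The difference is easy to see in a toy simulation. This Python sketch models the three behaviors with in-memory lists; the function names are hypothetical, and it assumes, for simplicity, that the same two source rows are read on both runs.

```python
def load_historical(target, batch):
    # Append-only: every run adds the batch's rows to what is already there.
    return target + batch


def load_delta(target, batch, merge_keys):
    # Upsert: rows matching on the merge keys are replaced, others are inserted.
    merged = {tuple(row[k] for k in merge_keys): row for row in target}
    for row in batch:
        merged[tuple(row[k] for k in merge_keys)] = row
    return list(merged.values())


def load_snapshot(target, batch):
    # Truncate and reload: the target becomes exactly the batch.
    return list(batch)


batch = [{"id": 1, "status": "new"}, {"id": 2, "status": "paid"}]

historical, delta, snapshot = [], [], []
for _ in range(2):  # run the same sync twice
    historical = load_historical(historical, batch)
    delta = load_delta(delta, batch, merge_keys=["id"])
    snapshot = load_snapshot(snapshot, batch)

print(len(historical), len(delta), len(snapshot))  # prints: 4 2 2
```

After two runs the append-only target holds each row twice, while the upsert and truncate-and-reload targets still hold one copy of each.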
"Why is this pipeline so slow?"
The usual suspects:
- Snapshot on a big table — you're rereading everything every run. Switch that artifact to Delta with a watermark column.
- Network distance between the source and Databasin — the run details show where time is going.
- Target contention — if the target is busy serving queries, writes slow down. Run heavy pipelines during off-peak hours.
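For the first suspect in the list above, the saving from a watermark column can be sketched as follows. This is a simplified Python illustration, not how Databasin implements Delta: the helper names, the `updated_at` column, and the strict greater-than comparison are all assumptions.

```python
from datetime import datetime


def read_snapshot(rows):
    # Snapshot-style read: every row is reread on every run.
    return list(rows)


def read_delta(rows, watermark_column, last_watermark):
    # Delta-style read: only rows changed since the last run are fetched.
    changed = [r for r in rows if r[watermark_column] > last_watermark]
    # Advance the watermark to the newest value seen, or keep the old one.
    new_watermark = max(
        (r[watermark_column] for r in changed), default=last_watermark
    )
    return changed, new_watermark


source_rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 2)},
    {"id": 3, "updated_at": datetime(2024, 1, 3)},
]

last_run = datetime(2024, 1, 2)  # hypothetical watermark saved by the previous run
changed, new_watermark = read_delta(source_rows, "updated_at", last_run)

print(len(read_snapshot(source_rows)), len(changed))  # prints: 3 1
```

On a three-row table the difference is trivial; on a table with millions of rows and a few thousand daily changes, it is most of the run time.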
"My source changed its schema"
Open the pipeline, edit the affected artifact (or add a new one), and re-pick the columns so the artifact matches the source again.
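Before re-picking columns, it can help to know exactly what drifted. The following Python sketch compares an artifact's selected columns with the source's current columns; `diff_columns` and the sample column names are hypothetical, and you would need to obtain both lists yourself (for example from the artifact settings and the source's schema).

```python
def diff_columns(artifact_columns, source_columns):
    # Compare what the artifact expects with what the source has now.
    selected = set(artifact_columns)
    current = set(source_columns)
    return {
        "missing_in_source": sorted(selected - current),
        "new_in_source": sorted(current - selected),
    }


drift = diff_columns(
    artifact_columns=["id", "status", "total"],
    source_columns=["id", "status", "total_amount", "currency"],
)
print(drift)
# prints: {'missing_in_source': ['total'], 'new_in_source': ['currency', 'total_amount']}
```

A rename shows up as one column in each list, which is a hint to re-map it rather than treat it as an unrelated drop and add.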