Under the Hood

Built right, from the ground up.

Databasin wasn't prototyped in a lab and scaled later. It was built in production at WashU Medicine — one of the most demanding data environments in existence — and every architectural decision reflects that.

Interactive Tool
See how we compare
Build your current data stack — Databricks, Snowflake, Fivetran, Tableau, and more — and see how it compares to Databasin on cost and capabilities. No form, no demo required.
Open the Stack Builder →
By the numbers
200+
Connectors — EHR, ERP, CRM, cloud warehouses, AI APIs, and custom sources
80%
Average cost reduction vs. comparable lakehouse and analytics stacks
Day 1
Time to a production-grade governed lakehouse — not a six-month implementation
"The architectural decisions we made at WashU Medicine — open formats, engine-agnostic storage, LLM-agnostic AI — weren't about flexibility for flexibility's sake. They were about never letting the platform become the bottleneck."
Chris Lundeberg · Co-Founder & CPO

Click any layer to see how it's built.

The full Databasin stack — from source systems to AI query layer. Every design decision is documented.

Layer 00
Source Systems
Epic · Workday · Salesforce · DBs · APIs
Layer 01
Connectors
200+ native · API builder · schema-aware
Layer 02
Bronze
Raw · immutable · schema-versioned
Layer 03
Silver
Validated · business rules · lineage
Layer 04
Gold
Governed · documented · trusted
Layer 05
Insights AI
NL queries · LLM-agnostic · gold only
Layer 00 — Source Systems
Every source system. One ingestion layer.
Databasin treats source system diversity as a given, not an exception. Whether you're running Epic Chronicles across a 20-hospital system, Workday Financials across 15 legal entities, or a SaaS stack that changes every quarter — the connector layer absorbs that complexity without passing it downstream.
Supported source categories
Epic EHR · Workday · Salesforce · HubSpot · PostgreSQL · SQL Server · REST APIs · GraphQL · + 190 more
Schema changes upstream don't break pipelines downstream. The connector layer isolates source schema volatility before it reaches your transformation and storage layers.
No polling production systems for analytics. Databasin extracts to a governed staging layer — your OLTP systems stay clean.
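The isolation idea above can be sketched in a few lines, assuming a connector compares the schema it expects against what the source actually returns at extraction time. All field names here are illustrative, not Databasin's actual implementation:

```python
def detect_schema_drift(expected: dict, observed: dict) -> dict:
    """Compare the expected source schema against the schema actually
    observed at extraction time, before anything moves downstream."""
    added = {c: t for c, t in observed.items() if c not in expected}
    removed = {c: t for c, t in expected.items() if c not in observed}
    retyped = {c: (expected[c], observed[c])
               for c in expected.keys() & observed.keys()
               if expected[c] != observed[c]}
    return {"added": added, "removed": removed, "retyped": retyped}

# Upstream added a column and changed a type: flagged at the connector,
# not discovered three dashboards later.
expected = {"patient_id": "bigint", "admit_ts": "timestamp"}
observed = {"patient_id": "bigint", "admit_ts": "string", "payer": "varchar"}
drift = detect_schema_drift(expected, observed)
```

Because the comparison happens at the connector boundary, transformation and storage layers only ever see schemas that have already been reconciled.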
Layer 01 — Connectors
200+ connectors. One schema-aware ingestion layer.
Every connector is built with awareness of the source system's actual data model — not just its API surface. The Epic connector understands Chronicles, Clarity, and Caboodle as distinct data environments. The Workday connector understands the business object hierarchy, effective-date logic, and calculated field limitations.
Key design decisions
Schema-aware Epic · Workday BO model · No-code API builder · Self-healing · 200+ pre-built
Schema-aware means the connector understands the data model, not just the endpoint. Generic REST connectors break at Epic and Workday because they don't understand how those systems actually structure their data.
The no-code API builder makes any HTTP endpoint connectable — no engineering ticket, no custom connector build required.
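As a sketch of what a declarative connector definition might look like, here is a hypothetical spec expanded into a sequence of paginated request URLs. The endpoint, auth reference, and parameter names are all invented for illustration; a real builder would emit something richer:

```python
# Hypothetical declarative connector definition -- the kind of spec a
# no-code builder could produce instead of a hand-written integration.
connector_spec = {
    "base_url": "https://api.example.com/v1",   # illustrative URL
    "resource": "invoices",
    "auth": {"type": "bearer", "secret_ref": "vault://invoices-token"},
    "pagination": {"style": "offset", "page_size": 100},
}

def build_requests(spec: dict, total_records: int) -> list[str]:
    """Expand a paginated spec into the sequence of request URLs."""
    size = spec["pagination"]["page_size"]
    pages = -(-total_records // size)  # ceiling division
    return [
        f'{spec["base_url"]}/{spec["resource"]}?limit={size}&offset={i * size}'
        for i in range(pages)
    ]

urls = build_requests(connector_spec, total_records=250)
```

The point of the declarative shape is that pagination, auth, and retry behavior live in configuration, so connecting a new endpoint is data entry rather than an engineering ticket.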
Layer 02 — Bronze
Raw data. Immutable. Schema-versioned.
Bronze is your audit trail and reprocessing foundation. Every record lands exactly as received — no transformation, no filtering. When upstream systems change their schemas, that change is logged at bronze before anything propagates downstream. This is where you detect issues, not where you discover them three dashboards later.
Bronze layer properties
Immutable storage · Schema versioning · Full audit trail · Reprocessing foundation · Delta Lake / Iceberg
Upstream changes are absorbed at bronze — not cascaded downstream. Pipeline issues at 2am are a schema-versioning problem, not an on-call problem.
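The bronze contract can be sketched minimally: records append as-is and each is stamped with a hash of its observed shape, so a schema change shows up as a new version ID at landing time. This is a simplification of what Delta Lake and Iceberg actually track, with invented names throughout:

```python
import hashlib
import json

class BronzeLog:
    """Minimal sketch of an append-only bronze landing zone: records are
    stored exactly as received, stamped with a schema-version hash."""

    def __init__(self):
        self.records = []  # never updated or deleted, only appended

    @staticmethod
    def schema_version(record: dict) -> str:
        """Hash the record's shape (field names + types), not its values."""
        shape = sorted((k, type(v).__name__) for k, v in record.items())
        return hashlib.sha256(json.dumps(shape).encode()).hexdigest()[:8]

    def land(self, record: dict) -> str:
        version = self.schema_version(record)
        self.records.append({"schema_version": version, "payload": record})
        return version

bronze = BronzeLog()
v1 = bronze.land({"id": 1, "amount": 100})
v2 = bronze.land({"id": 2, "amount": 120})
v3 = bronze.land({"id": 3, "amount": 90, "currency": "USD"})  # upstream change
drifted = v3 != v1  # detected at bronze, before downstream propagation
```

Because nothing is mutated, the log doubles as the audit trail and the reprocessing foundation: replaying silver is just re-reading bronze.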
Layer 03 — Silver
Business rules. Applied once. Trusted everywhere.
Silver is where your institutional definitions live in code. One readmission calculation. One headcount definition. One MRR formula. Applied at transformation time — not inside individual reports where they'll inevitably drift apart and generate arguments about whose number is right.
Silver layer properties
Business rule centralization · Data quality checks · Low-code ELT builder · Full lineage · Validated output
Metric definitions live in the platform, not in people's heads. When an analyst leaves, the definition doesn't leave with them.
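What "applied once" means in practice can be shown with a toy metric: one institutional definition, written at the transformation layer, called by every consumer. The 30-day window and field names are illustrative, not Databasin's or any institution's actual rule:

```python
from datetime import date

READMIT_WINDOW_DAYS = 30  # the single, agreed institutional definition

def is_readmission(prior_discharge: date, next_admit: date) -> bool:
    """True when a new admission falls within the readmission window
    after the prior discharge. Defined once; every report calls this."""
    return 0 <= (next_admit - prior_discharge).days <= READMIT_WINDOW_DAYS

# The same call serves the dashboard, the board deck, and the AI layer,
# so there is no second copy of the logic to drift.
within_window = is_readmission(date(2024, 3, 1), date(2024, 3, 20))
outside_window = is_readmission(date(2024, 3, 1), date(2024, 5, 1))
```

Changing the definition means changing one constant in one place, with lineage showing every output the change touches.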
Layer 04 — Gold
Governed. Documented. Ready for every consumer.
Gold serves every downstream consumer — BI tools, AI queries, financial reports, board decks, investor dashboards. The same number in every place, because every place sources from the same governed mart. No reconciliation, no arguments, no "which version are you using?"
Gold layer consumers
Databasin One AI · BI tools · Direct SQL · Financial reporting · API access · Board decks
AI queries the gold layer only. The model can't hallucinate against raw data because it's architecturally constrained to query validated, documented gold marts.
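The architectural constraint can be illustrated with a toy allow-list guard, assuming AI-generated SQL is checked against a registry of gold marts before execution. The table names are invented, and the regex is a sketch, not a real SQL parser:

```python
import re

GOLD_MARTS = {"gold.revenue_monthly", "gold.readmissions", "gold.headcount"}

def tables_referenced(sql: str) -> set[str]:
    """Naive extraction of table names after FROM/JOIN -- sketch only;
    a production guard would use a proper SQL parser."""
    return set(re.findall(r"(?:from|join)\s+([a-z_.]+)", sql, re.IGNORECASE))

def is_gold_only(sql: str) -> bool:
    """Reject any query touching tables outside the governed gold marts."""
    refs = tables_referenced(sql)
    return bool(refs) and refs <= GOLD_MARTS

allowed = is_gold_only("SELECT month, mrr FROM gold.revenue_monthly")
rejected = is_gold_only("SELECT * FROM bronze.raw_claims")
```

The enforcement point matters: the check runs before the model's SQL ever reaches an engine, so "can't hallucinate against raw data" is a property of the pipeline, not a prompt instruction.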
Layer 05 — Insights AI
Natural language. Governed answers. One click to a dashboard.
The Insights AI layer sits on top of the gold layer — architecturally constrained to query validated, documented data only. LLM-agnostic: GPT-5 via Azure OpenAI, Claude, or your internally approved model. The model runs inside your security boundary. Your data never leaves your governance perimeter to reach an external AI service.
Supported models
GPT-5 / Azure OpenAI · Claude · BYO model · Private deployment
Governance and AI are built together, not bolted together. The quality of the answer is a function of the quality of the data underneath it — and that quality is enforced at silver and gold, not left to the model.
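LLM-agnosticism amounts to an interface boundary: the insights layer codes against one completion contract, and the concrete model becomes configuration. A minimal sketch with a stand-in backend; the class names and the `complete` signature are invented for illustration:

```python
from typing import Protocol

class LLMBackend(Protocol):
    """Whatever model sits behind it, the query layer sees one interface.
    Swapping GPT, Claude, or a private model changes configuration,
    not architecture."""
    def complete(self, prompt: str) -> str: ...

class StubModel:
    """Stand-in backend for the sketch; a real deployment would wrap an
    Azure OpenAI, Anthropic, or self-hosted endpoint here."""
    def __init__(self, name: str):
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] answer to: {prompt}"

def ask(backend: LLMBackend, question: str) -> str:
    # The calling code never changes when the backend does.
    return backend.complete(question)

a = ask(StubModel("azure-gpt"), "Q3 readmission rate?")
b = ask(StubModel("private-model"), "Q3 readmission rate?")
```

Because the model is behind this seam, it can also be deployed inside the tenant's security boundary without the query layer knowing or caring.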

Every architectural decision has a reason.

Open
Delta Lake and Apache Iceberg — vendor-neutral open formats. Your data is readable by any compatible engine, now and in the future. No exit tax.
→ No lock-in at the storage layer, ever.
Flexible
Full platform or individual modules. Drop Databasin into your existing Databricks, Snowflake, or Fabric environment in BYO mode. Built to fit how you work.
→ You choose the deployment that fits your environment.
Secure
Private enterprise install in your own Azure tenant. Your LLM, your endpoints, your governance rules. PHI never leaves your environment — by architecture, not by policy.
→ Sovereignty is a design requirement, not a compliance checkbox.
Intelligent
AI woven into the architecture — not bolted on. The Insights layer queries governed gold data only. Governed data in, trusted answers out. LLM-agnostic by design.
→ Intelligence requires governance. We build both together.

Three engines. One lakehouse. Zero lock-in.

Every engine queries the same governed Apache Iceberg tables. Choose the right engine for each workload — or use all three. Fully managed. No infrastructure to provision or maintain.

Apache Trino
Interactive SQL
Distributed SQL engine optimized for low-latency, ad-hoc analytics. Federated queries across your entire lakehouse with sub-second response times on properly partitioned data.
Low latency · Federated queries · ANSI SQL · Concurrent users · Iceberg native
When to use
Ad-hoc exploration, dashboards, and live queries. Analysts running interactive SQL against gold-layer tables. Best for queries that need answers in seconds, not minutes.
Apache Spark
Heavy Processing
Distributed compute engine built for large-scale data processing. Handles multi-terabyte ETL/ELT transformations, complex joins across billions of rows, and ML workloads — all against the same Iceberg tables.
Large-scale ETL · ML workloads · Batch processing · Multi-TB transforms · Iceberg native
When to use
Scheduled ETL, heavy transformations, and ML pipelines. Data engineers running nightly medallion refreshes, complex aggregations, or training models against lakehouse data.
DuckDB
Lightweight Analytics
In-process analytical engine that runs directly in the browser or on a single node. Instant startup, zero infrastructure overhead. Reads the same Iceberg tables without spinning up a cluster.
In-browser · Zero startup · Embedded analytics · Single-node · Iceberg native
When to use
Quick local analysis, embedded queries, and fast iteration. Analysts prototyping queries or running lightweight aggregations without waiting for a cluster to provision.
Apache Iceberg: the unifying format underneath every engine
All three engines read and write the same Apache Iceberg tables. One storage format, one governance layer, one source of truth — regardless of which engine runs the query. Switch engines per workload without migrating data, changing schemas, or duplicating tables.

See the architecture applied to your specific environment.

Each use case has a dedicated technical breakdown — the data model, the connector pattern, the pipeline design, and the deployment approach specific to that environment.

Use Case 01
Health Systems
Epic EHR Data Management at AMC Scale
Chronicles, Clarity, and Caboodle as three distinct data environments with different access patterns and schema volatility profiles. How Databasin manages all three through a single connector with purpose-built ingestion logic — not generic REST extraction. Co-created at WashU Medicine.
Epic Chronicles · Clarity · Caboodle · HIPAA · Azure private install · Medallion ELT
Use Case 02
Workday Organizations
Workday Business Object Extraction & Governance
The Workday data model is a hierarchy of business objects — not a relational schema — with effective-date logic, calculated field constraints, and Prism limitations that make direct reporting genuinely difficult. How Databasin resolves all of this at the ingestion layer.
Workday Financials · HCM · Business Object model · Effective-date logic · Delta Lake
Use Case 03
Enterprise OpEx
Consolidating a 5-Tool Enterprise Stack onto One Platform
The enterprise data stack accumulation problem: warehouse + connector layer + ETL orchestration + BI tool + AI API, each licensed separately, each with overlapping functionality, none of them governed together. How Databasin replaces all five — and how the migration is sequenced to minimize disruption.
BYO environment · Databricks overlay · Snowflake overlay · Stack consolidation · Governance migration
Your Environment
Custom Architecture
Talk to Chris. See the architecture for your stack.
Technical demos are led by Chris Lundeberg, Co-Founder & CPO — not a sales engineer reading from a script. Bring your current stack, your constraints, and your hardest data problem. We'll map out exactly how Databasin fits.
Your source systems · Your deployment requirements · Your governance posture
Ready to Go Deeper
Chris Lundeberg, Co-Founder & CPO of Databasin
Chris Lundeberg
Co-Founder & CPO
Connect on LinkedIn

Talk to Chris.
See the architecture
for your environment.

Start today with $50 in credits.