Under the Hood

Built right, from the ground up.

Databasin wasn't prototyped in a lab and scaled later. It was built in production at WashU Medicine — one of the most demanding data environments in existence — and every architectural decision reflects that.

Interactive Tool
See how we compare
Build your current data stack — Databricks, Snowflake, Fivetran, Tableau, and more — and see how it compares to Databasin on cost and capabilities. No form, no demo required.
Open the Stack Builder →
By the numbers
200+
Connectors — EHR, ERP, CRM, cloud warehouses, AI APIs, and custom sources
80%
Average cost reduction vs. comparable lakehouse and analytics stacks
Day 1
Time to a production-grade governed lakehouse — not a six-month implementation
"The architectural decisions we made at WashU Medicine — open formats, engine-agnostic storage, LLM-agnostic AI — weren't about flexibility for flexibility's sake. They were about never letting the platform become the bottleneck."
Chris Lundeberg · Co-Founder & CPO

Click any layer to see how it's built.

The full Databasin stack — from source systems to AI query layer. Every design decision is documented.

Layer 00
Source Systems
Epic · Workday · Salesforce · DBs · APIs
Layer 01
Connectors
200+ native · API builder · schema-aware
Layer 02
Bronze
Raw · immutable · schema-versioned
Layer 03
Silver
Validated · business rules · lineage
Layer 04
Gold
Governed · documented · trusted
Layer 05
Insights AI
NL queries · LLM-agnostic · gold only
Layer 00 — Source Systems
Every source system. One ingestion layer.
Databasin treats source system diversity as a given, not an exception. Whether you're running Epic Chronicles across a 20-hospital system, Workday Financials across 15 legal entities, or a SaaS stack that changes every quarter — the connector layer absorbs that complexity without passing it downstream.
Supported source categories
Epic EHR · Workday · Salesforce · HubSpot · PostgreSQL · SQL Server · REST APIs · GraphQL · + 190 more
Schema changes upstream don't break pipelines downstream. The connector layer isolates source schema volatility before it reaches your transformation and storage layers.
No polling production systems for analytics. Databasin extracts to a governed staging layer — your OLTP systems stay clean.
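The isolation idea above can be sketched in a few lines, assuming a connector compares the schema it expects against what the source actually returns at extraction time. All field names here are illustrative, not Databasin's actual implementation:

```python
def detect_schema_drift(expected: dict, observed: dict) -> dict:
    """Compare the expected source schema against the schema actually
    observed at extraction time, before anything moves downstream."""
    added = {c: t for c, t in observed.items() if c not in expected}
    removed = {c: t for c, t in expected.items() if c not in observed}
    retyped = {c: (expected[c], observed[c])
               for c in expected.keys() & observed.keys()
               if expected[c] != observed[c]}
    return {"added": added, "removed": removed, "retyped": retyped}

# Upstream added a column and changed a type: flagged at the connector,
# not discovered three dashboards later.
expected = {"patient_id": "bigint", "admit_ts": "timestamp"}
observed = {"patient_id": "bigint", "admit_ts": "string", "payer": "varchar"}
drift = detect_schema_drift(expected, observed)
```

Because the comparison happens at the connector boundary, transformation and storage layers only ever see schemas that have already been reconciled.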
Layer 01 — Connectors
200+ connectors. One schema-aware ingestion layer.
Every connector is built with awareness of the source system's actual data model — not just its API surface. The Epic connector understands Chronicles, Clarity, and Caboodle as distinct data environments. The Workday connector understands the business object hierarchy, effective-date logic, and calculated field limitations.
Key design decisions
Schema-aware Epic · Workday BO model · No-code API builder · Self-healing · 200+ pre-built
Schema-aware means the connector understands the data model, not just the endpoint. Generic REST connectors break at Epic and Workday because they don't understand how those systems actually structure their data.
The no-code API builder makes any HTTP endpoint connectable — no engineering ticket, no custom connector build required.
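As a sketch of what a declarative connector definition might look like, here is a hypothetical spec expanded into a sequence of paginated request URLs. The endpoint, auth reference, and parameter names are all invented for illustration; a real builder would emit something richer:

```python
# Hypothetical declarative connector definition -- the kind of spec a
# no-code builder could produce instead of a hand-written integration.
connector_spec = {
    "base_url": "https://api.example.com/v1",   # illustrative URL
    "resource": "invoices",
    "auth": {"type": "bearer", "secret_ref": "vault://invoices-token"},
    "pagination": {"style": "offset", "page_size": 100},
}

def build_requests(spec: dict, total_records: int) -> list[str]:
    """Expand a paginated spec into the sequence of request URLs."""
    size = spec["pagination"]["page_size"]
    pages = -(-total_records // size)  # ceiling division
    return [
        f'{spec["base_url"]}/{spec["resource"]}?limit={size}&offset={i * size}'
        for i in range(pages)
    ]

urls = build_requests(connector_spec, total_records=250)
```

The point of the declarative shape is that pagination, auth, and retry behavior live in configuration, so connecting a new endpoint is data entry rather than an engineering ticket.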
Layer 02 — Bronze
Raw data. Immutable. Schema-versioned.
Bronze is your audit trail and reprocessing foundation. Every record lands exactly as received — no transformation, no filtering. When upstream systems change their schemas, that change is logged at bronze before anything propagates downstream. This is where you detect issues, not where you discover them three dashboards later.
Bronze layer properties
Immutable storage · Schema versioning · Full audit trail · Reprocessing foundation · Delta Lake / Iceberg
Upstream changes are absorbed at bronze — not cascaded downstream. Pipeline issues at 2am are a schema-versioning problem, not an on-call problem.
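The bronze contract can be sketched minimally: records append as-is and each is stamped with a hash of its observed shape, so a schema change shows up as a new version ID at landing time. This is a simplification of what Delta Lake and Iceberg actually track, with invented names throughout:

```python
import hashlib
import json

class BronzeLog:
    """Minimal sketch of an append-only bronze landing zone: records are
    stored exactly as received, stamped with a schema-version hash."""

    def __init__(self):
        self.records = []  # never updated or deleted, only appended

    @staticmethod
    def schema_version(record: dict) -> str:
        """Hash the record's shape (field names + types), not its values."""
        shape = sorted((k, type(v).__name__) for k, v in record.items())
        return hashlib.sha256(json.dumps(shape).encode()).hexdigest()[:8]

    def land(self, record: dict) -> str:
        version = self.schema_version(record)
        self.records.append({"schema_version": version, "payload": record})
        return version

bronze = BronzeLog()
v1 = bronze.land({"id": 1, "amount": 100})
v2 = bronze.land({"id": 2, "amount": 120})
v3 = bronze.land({"id": 3, "amount": 90, "currency": "USD"})  # upstream change
drifted = v3 != v1  # detected at bronze, before downstream propagation
```

Because nothing is mutated, the log doubles as the audit trail and the reprocessing foundation: replaying silver is just re-reading bronze.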
Layer 03 — Silver
Business rules. Applied once. Trusted everywhere.
Silver is where your institutional definitions live in code. One readmission calculation. One headcount definition. One MRR formula. Applied at transformation time — not inside individual reports where they'll inevitably drift apart and generate arguments about whose number is right.
Silver layer properties
Business rule centralization · Data quality checks · Low-code ELT builder · Full lineage · Validated output
Metric definitions live in the platform, not in people's heads. When an analyst leaves, the definition doesn't leave with them.
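What "applied once" means in practice can be shown with a toy metric: one institutional definition, written at the transformation layer, called by every consumer. The 30-day window and field names are illustrative, not Databasin's or any institution's actual rule:

```python
from datetime import date

READMIT_WINDOW_DAYS = 30  # the single, agreed institutional definition

def is_readmission(prior_discharge: date, next_admit: date) -> bool:
    """True when a new admission falls within the readmission window
    after the prior discharge. Defined once; every report calls this."""
    return 0 <= (next_admit - prior_discharge).days <= READMIT_WINDOW_DAYS

# The same call serves the dashboard, the board deck, and the AI layer,
# so there is no second copy of the logic to drift.
within_window = is_readmission(date(2024, 3, 1), date(2024, 3, 20))
outside_window = is_readmission(date(2024, 3, 1), date(2024, 5, 1))
```

Changing the definition means changing one constant in one place, with lineage showing every output the change touches.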
Layer 04 — Gold
Governed. Documented. Ready for every consumer.
Gold serves every downstream consumer — BI tools, AI queries, financial reports, board decks, investor dashboards. The same number in every place, because every place sources from the same governed mart. No reconciliation, no arguments, no "which version are you using?"
Gold layer consumers
Databasin One AI · BI tools · Direct SQL · Financial reporting · API access · Board decks
AI queries the gold layer only. The model can't hallucinate against raw data because it's architecturally constrained to query validated, documented gold marts.
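The architectural constraint can be illustrated with a toy allow-list guard, assuming AI-generated SQL is checked against a registry of gold marts before execution. The table names are invented, and the regex is a sketch, not a real SQL parser:

```python
import re

GOLD_MARTS = {"gold.revenue_monthly", "gold.readmissions", "gold.headcount"}

def tables_referenced(sql: str) -> set[str]:
    """Naive extraction of table names after FROM/JOIN -- sketch only;
    a production guard would use a proper SQL parser."""
    return set(re.findall(r"(?:from|join)\s+([a-z_.]+)", sql, re.IGNORECASE))

def is_gold_only(sql: str) -> bool:
    """Reject any query touching tables outside the governed gold marts."""
    refs = tables_referenced(sql)
    return bool(refs) and refs <= GOLD_MARTS

allowed = is_gold_only("SELECT month, mrr FROM gold.revenue_monthly")
rejected = is_gold_only("SELECT * FROM bronze.raw_claims")
```

The enforcement point matters: the check runs before the model's SQL ever reaches an engine, so "can't hallucinate against raw data" is a property of the pipeline, not a prompt instruction.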
Layer 05 — Insights AI
Natural language. Governed answers. One click to a dashboard.
The Insights AI layer sits on top of the gold layer — architecturally constrained to query validated, documented data only. LLM-agnostic: GPT-5 via Azure OpenAI, Claude, or your internally approved model. The model runs inside your security boundary. Your data never leaves your governance perimeter to reach an external AI service.
Supported models
GPT-5 / Azure OpenAI · Claude · BYO model · Private deployment
Governance and AI are built together, not bolted together. The quality of the answer is a function of the quality of the data underneath it — and that quality is enforced at silver and gold, not left to the model.
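LLM-agnosticism amounts to an interface boundary: the insights layer codes against one completion contract, and the concrete model becomes configuration. A minimal sketch with a stand-in backend; the class names and the `complete` signature are invented for illustration:

```python
from typing import Protocol

class LLMBackend(Protocol):
    """Whatever model sits behind it, the query layer sees one interface.
    Swapping GPT, Claude, or a private model changes configuration,
    not architecture."""
    def complete(self, prompt: str) -> str: ...

class StubModel:
    """Stand-in backend for the sketch; a real deployment would wrap an
    Azure OpenAI, Anthropic, or self-hosted endpoint here."""
    def __init__(self, name: str):
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] answer to: {prompt}"

def ask(backend: LLMBackend, question: str) -> str:
    # The calling code never changes when the backend does.
    return backend.complete(question)

a = ask(StubModel("azure-gpt"), "Q3 readmission rate?")
b = ask(StubModel("private-model"), "Q3 readmission rate?")
```

Because the model is behind this seam, it can also be deployed inside the tenant's security boundary without the query layer knowing or caring.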

Every architectural decision has a reason.

Open
Delta Lake and Apache Iceberg — vendor-neutral open formats. Your data is readable by any compatible engine, now and in the future. No exit tax.
→ No lock-in at the storage layer, ever.
Flexible
Full platform or individual modules. Drop Databasin into your existing Databricks, Snowflake, or Fabric environment in BYO mode. Built to fit how you work.
→ You choose the deployment that fits your environment.
Secure
Private enterprise install in your own Azure tenant. Your LLM, your endpoints, your governance rules. PHI never leaves your environment — by architecture, not by policy.
→ Sovereignty is a design requirement, not a compliance checkbox.
Intelligent
AI woven into the architecture — not bolted on. The Insights layer queries governed gold data only. Governed data in, trusted answers out. LLM-agnostic by design.
→ Intelligence requires governance. We build both together.

Three engines. One lakehouse. Zero lock-in.

Every engine queries the same governed Apache Iceberg tables. Choose the right engine for each workload — or use all three. Fully managed. No infrastructure to provision or maintain.

Apache Trino
Interactive SQL
Distributed SQL engine optimized for low-latency, ad-hoc analytics. Federated queries across your entire lakehouse with sub-second response times on properly partitioned data.
Low latency · Federated queries · ANSI SQL · Concurrent users · Iceberg native
When to use
Ad-hoc exploration, dashboards, and live queries. Analysts running interactive SQL against gold-layer tables. Best for queries that need answers in seconds, not minutes.
Apache Spark
Heavy Processing
Distributed compute engine built for large-scale data processing. Handles multi-terabyte ETL/ELT transformations, complex joins across billions of rows, and ML workloads — all against the same Iceberg tables.
Large-scale ETL · ML workloads · Batch processing · Multi-TB transforms · Iceberg native
When to use
Scheduled ETL, heavy transformations, and ML pipelines. Data engineers running nightly medallion refreshes, complex aggregations, or training models against lakehouse data.
DuckDB
Lightweight Analytics
In-process analytical engine that runs directly in the browser or on a single node. Instant startup, zero infrastructure overhead. Reads the same Iceberg tables without spinning up a cluster.
In-browser · Zero startup · Embedded analytics · Single-node · Iceberg native
When to use
Quick local analysis, embedded queries, and fast iteration. Analysts prototyping queries or running lightweight aggregations without waiting for a cluster to provision.
Apache Iceberg: the unifying format underneath every engine
All three engines read and write the same Apache Iceberg tables. One storage format, one governance layer, one source of truth — regardless of which engine runs the query. Switch engines per workload without migrating data, changing schemas, or duplicating tables.

See the architecture applied to your specific environment.

Each use case has a dedicated technical breakdown — the data model, the connector pattern, the pipeline design, and the deployment approach specific to that environment.

Use Case 01
Health Systems
Epic EHR Data Management at AMC Scale
Chronicles, Clarity, and Caboodle as three distinct data environments with different access patterns and schema volatility profiles. How Databasin manages all three through a single connector with purpose-built ingestion logic — not generic REST extraction. Co-created at WashU Medicine.
Epic Chronicles · Clarity · Caboodle · HIPAA · Azure private install · Medallion ELT
Use Case 02
Workday Organizations
Workday Business Object Extraction & Governance
The Workday data model is a hierarchy of business objects — not a relational schema — with effective-date logic, calculated field constraints, and Prism limitations that make direct reporting genuinely difficult. How Databasin resolves all of this at the ingestion layer.
Workday Financials · HCM · Business Object model · Effective-date logic · Delta Lake
Use Case 03
Enterprise OpEx
Consolidating a 5-Tool Enterprise Stack onto One Platform
The enterprise data stack accumulation problem: warehouse + connector layer + ETL orchestration + BI tool + AI API, each licensed separately, each with overlapping functionality, none of them governed together. How Databasin replaces all five — and how the migration is sequenced to minimize disruption.
BYO environment · Databricks overlay · Snowflake overlay · Stack consolidation · Governance migration
Your Environment
Custom Architecture
Talk to Chris. See the architecture for your stack.
Technical demos are led by Chris Lundeberg, Co-Founder & CPO — not a sales engineer reading from a script. Bring your current stack, your constraints, and your hardest data problem. We'll map out exactly how Databasin fits.
Your source systems · Your deployment requirements · Your governance posture
Ready to Go Deeper
Chris Lundeberg, Co-Founder & CPO of Databasin
Chris Lundeberg
Co-Founder & CPO
Connect on LinkedIn

Talk to Chris.
See the architecture
for your environment.

Start today with $50 in credits.