Lakehouse overview
Catalogs, schemas, and the Lakehouse surface.
Lakehouse is the SQL side of Databasin — a full warehouse surface where you query data that's been pipelined in, explore it in a proper editor, and run it across multiple engines from one place.
If Databasin One is "ask a question," Lakehouse is "know the answer and want to express it precisely in SQL."
The pieces
SQL Editor
A multi-tab SQL editor built on Monaco — the same engine behind VS Code. You get:
- Autocomplete on catalogs, schemas, tables, and columns.
- Syntax highlighting with SQL-aware language support.
- Query history and saved queries (toggle buttons in the editor toolbar).
- A results panel you can sort by clicking a column header. (Sorting only: there's no row filter in the results grid, so narrow with a WHERE clause instead.)
Open the SQL editor from a project
Object Explorer
A tree view of everything you can query: catalogs → schemas → tables → columns. Browse what's available without guessing. Toggle it from the editor toolbar.
Saved Queries
Name and save any query so you can rerun it later. A dedicated toolbar toggle opens the saved-queries panel.
Notebooks
An alternate surface for long-form analysis. Notebooks mix SQL cells with markdown and chart outputs — good for handoffs where the query isn't the whole story.
Catalogs and connectors
A catalog in Lakehouse maps to a connector:
- Each lakehouse engine (Trino, Doris, Spark, DuckDB) shows up as its own catalog.
- Inside that catalog, you see whatever schemas and tables the engine exposes.
- Some connectors pull in external sources as read-only catalogs (e.g. Postgres, Snowflake).
So when you see warehouse.public.orders in a query, read it as catalog → schema → table: the warehouse catalog, the public schema, the orders table.
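To make the three-part naming concrete, here is a minimal Python sketch that splits a qualified name into its catalog, schema, and table parts. The names TableRef and parse_table_ref are hypothetical and not part of Databasin. The sketch assumes plain, unquoted identifiers, so a quoted identifier that itself contains a dot would not be handled.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    """The three levels of a fully qualified table name."""
    catalog: str
    schema: str
    table: str


def parse_table_ref(name: str) -> TableRef:
    # Assumption: identifiers are unquoted and contain no dots themselves.
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {name!r}")
    return TableRef(*parts)


ref = parse_table_ref("warehouse.public.orders")
print(ref.catalog, ref.schema, ref.table)
```

A name with fewer than three parts (for example public.orders) raises ValueError in this sketch, which is a choice made for the illustration only.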
The engines
A project can have more than one lakehouse engine at a time. The native set is Trino, Doris, Spark, and DuckDB; Databricks is supported too, as an external/federated connection rather than a native lakehouse engine.
| Engine | Best for |
|---|---|
| Trino | Interactive SQL and federated queries across catalogs. The default. |
| Doris | Real-time OLAP — low-latency, high-concurrency interactive analytics and BI. |
| Spark | Heavy ETL and large-scale processing. |
| DuckDB | Small-to-medium data; fast single-node prototyping. |
| Databricks | Querying an existing Databricks workspace from the same editor. |
The mechanics of switching engines — and which ones stream results — live in Multi-engine SQL. Doris is new enough to have its own guide: Apache Doris.
Clusters
Behind the scenes, Trino, Spark, and Doris answer queries from a running cluster. Clusters cost credits while running and sleep when idle:
- A cold cluster wakes on first query (roughly half a minute).
- It stays warm for a while so follow-up queries are instant.
- It sleeps automatically after a stretch of no activity.
DuckDB is single-node and doesn't ride a managed cluster the same way. Databasin handles wake/sleep transparently — you just see a small "Waking cluster…" indicator on the first query after a nap. See Clusters, wake and sleep.
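The wake/sleep cycle can be pictured as a small state machine. The Python sketch below is only a mental model, not how Databasin implements it. The Cluster class is hypothetical, WAKE_SECONDS takes the "roughly half a minute" figure literally, and IDLE_TIMEOUT is a placeholder because the actual idle window isn't stated here.

```python
from dataclasses import dataclass
from typing import Optional

WAKE_SECONDS = 30.0    # assumption: "roughly half a minute"
IDLE_TIMEOUT = 600.0   # assumption: placeholder; the real idle window is not stated


@dataclass
class Cluster:
    # Time the cluster last finished waking or served a query; None = never woken.
    last_active_at: Optional[float] = None

    def is_warm(self, now: float) -> bool:
        return (
            self.last_active_at is not None
            and now - self.last_active_at < IDLE_TIMEOUT
        )

    def run_query(self, now: float) -> float:
        """Return the wake delay (in seconds) this query pays."""
        delay = 0.0 if self.is_warm(now) else WAKE_SECONDS
        self.last_active_at = now + delay
        return delay


cluster = Cluster()
cold_delay = cluster.run_query(now=0.0)      # cold start: pays the wake delay
warm_delay = cluster.run_query(now=60.0)     # still warm: no delay
slept_delay = cluster.run_query(now=700.0)   # idle too long: wakes again
```

In this model only the first query after a period of inactivity pays the wake cost; everything issued while the cluster is warm runs without it.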
Things that differ from a generic SQL editor
Streaming results
On Trino and Doris, results stream in as they're produced, so you can start reading the shape before the query finishes. Spark, DuckDB, and Databricks return their results in a batch when the query completes.
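The difference between streamed and batched delivery is the same as the difference between a generator and a fully built list. The Python sketch below illustrates that idea under stated assumptions: produce_rows is a hypothetical stand-in for an engine producing rows, and none of these functions are Databasin APIs. Only the engine grouping comes from the text above.

```python
from typing import Iterator, List, Tuple

Row = Tuple[int, str]

# From the text above: Trino and Doris stream; the others return one batch.
STREAMING_ENGINES = {"trino", "doris"}


def streams_results(engine: str) -> bool:
    return engine.lower() in STREAMING_ENGINES


def produce_rows(n: int) -> Iterator[Row]:
    # Hypothetical stand-in for an engine producing rows one at a time.
    for i in range(n):
        yield (i, f"row-{i}")


def streamed(n: int) -> Iterator[Row]:
    # Streaming: each row is handed over as soon as it exists.
    yield from produce_rows(n)


def batched(n: int) -> List[Row]:
    # Batch: nothing is returned until every row has been produced.
    return list(produce_rows(n))


first_row = next(streamed(1_000_000))  # needs only one row of work
all_rows = batched(5)                  # needs all five rows first
```

With streaming, the first row is readable long before the millionth is produced, which is what lets you check the shape of a result early.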
Query limits (optional)
A "limit results" toggle in the toolbar appends LIMIT 1000 to interactive queries. It only touches SELECT, WITH, and VALUES statements; DDL, DML, and SHOW run untouched. It's a cheap guardrail against accidentally running SELECT * on a huge table.
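The rule the toggle follows can be sketched in a few lines of Python. This is an illustration of the behavior described above, not Databasin's actual implementation. The function name apply_result_limit is hypothetical, and the sketch decides by looking only at the first keyword of the statement. It does not handle leading SQL comments, and it does not check whether the query already has a LIMIT, since how the real toggle treats those cases isn't covered here.

```python
import re

# Statement types the toggle applies to, per the description above.
LIMITED_KEYWORDS = {"SELECT", "WITH", "VALUES"}


def apply_result_limit(sql: str, limit: int = 1000) -> str:
    # Drop surrounding whitespace and any trailing semicolons.
    statement = sql.strip().rstrip(";").rstrip()
    match = re.match(r"[A-Za-z]+", statement)
    first_keyword = match.group(0).upper() if match else ""
    if first_keyword not in LIMITED_KEYWORDS:
        return sql  # DDL, DML, SHOW, etc. pass through unchanged
    return f"{statement} LIMIT {limit}"


print(apply_result_limit("SELECT * FROM warehouse.public.orders;"))
print(apply_result_limit("SHOW TABLES"))
```

The first call gains a trailing LIMIT 1000; the second is returned exactly as written.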