Cross-Database SQL for Live Databases, Files, and S3

No exports
No staging tables
No warehouse

Try Cross-Database SQL View Example Queries

Cross-database SQL lets you run a single SQL statement that joins tables, files, and object storage from different systems. Instead of exporting Postgres data to CSV, staging Parquet into a warehouse, or building a replication pipeline, you query the live sources in place.

DBConvert Streams does this with an embedded DuckDB engine: each configured connection is attached as a virtual schema, the query is planned centrally, and results come back without persistent staging or data movement.

The Problem

Why Cross-Database Queries Are Hard in Traditional Tools

Most traditional tools execute queries against a single active connection. Cross-connection joins typically require database-level federation or external pipelines.

Most IDEs Execute Against One Active Connection

DBeaver, DataGrip, and similar tools run queries against a single active connection per SQL editor tab. Cross-database joins depend on database-level federation such as FDW or linked servers.

Compare: DBeaver · DataGrip

Warehouses Centralize Data Before Joins

BigQuery, Snowflake, and Redshift typically require data to be ingested or staged before cross-source joins. External querying is possible but requires additional configuration and managed integrations.

ELT Platforms Focus on Moving Data First

Airbyte, Fivetran, and dbt focus on extracting and loading data into a destination before it can be joined or analyzed. Ad-hoc cross-source queries are not their primary workflow.

Compare: Airbyte · Fivetran

Export-Import Cycles for Every Join

Joining data from two databases typically means exporting one dataset to CSV, importing it into a staging table, running the query, then discarding the staging data. Any scope change means repeating the cycle.

DBConvert Streams removes all of this overhead with a built-in federated SQL engine.

Architecture

Built-In Federated SQL Engine

An embedded DuckDB engine that attaches to your live sources and executes queries in place.

Attach

Each source is attached as a virtual schema

Plan

Query planned centrally by DuckDB optimizer

Execute

Results returned instantly from attached sources

No Staging

No persistent staging or replication required

SQL Console

Query and results in one vertical flow

Results open below the SQL code with page-size controls and pagination for large result sets.

SQL console showing query code above and paginated results below

The SQL console keeps query and results in one vertical flow, with pagination at the bottom of the results grid.

SQL Console docs

Examples

Tested SQL Examples

Copy-paste queries across databases and files.

Note: pg1 and my1 are connection aliases defined in DBConvert Streams. They represent configured PostgreSQL and MySQL connections.

Cross-Database JOIN (PostgreSQL + MySQL)

SELECT
    a.first_name || ' ' || a.last_name AS actor_name,
    f.title AS film_title
FROM pg1.public.actor a
JOIN pg1.public.film_actor fa ON a.actor_id = fa.actor_id
JOIN my1.sakila.film f ON fa.film_id = f.film_id
LIMIT 20;

Joins tables across PostgreSQL and MySQL in a single query — no export, no import, no staging.

Cross-Database UNION (PostgreSQL + MySQL)

(SELECT 'PostgreSQL' as source, film_id, title, release_year
FROM pg1.public.film
WHERE title LIKE 'A%'
LIMIT 5)

UNION ALL

(SELECT 'MySQL' as source, film_id, title, release_year
FROM my1.sakila.film
WHERE title LIKE 'A%'
LIMIT 5);

Combines results from PostgreSQL and MySQL into one dataset.

Use Cases

When to Use Cross-Database SQL

Migration Validation

Compare data between source and target before cutover. Verify row counts, checksums, and content match across migration runs.

Pre-Migration Filtering

Use SQL to inspect and filter source data before starting a migration. Identify schema mismatches or data quality issues upfront.

Join Operational DB with S3 Export

Combine live database records with archived Parquet or CSV exports in S3 for historical analysis without re-importing.

Audit Across Environments

Investigate production vs staging mismatches. Run comparison queries directly without building replication streams first.

Cross-System Reporting

Combine operational data from MySQL and PostgreSQL into unified reports without a data warehouse.

File + Database Analysis

Analyze Parquet or CSV exports together with live databases using full DuckDB SQL - aggregations, window functions, and transformations.

Load Mode

Your Query Becomes Your Pipeline

The SQL you write to explore is the same SQL that powers a stream. Point it at a target — another database, a file, S3 — and the data moves. No hand-off between tools. No rebuild.

Use SQL query results as a source for data migration in load mode
Query file pipeline outputs (CSV, Parquet, JSON) alongside databases

SQL query results used as a stream source in load mode configuration

Related workflows

Cross-database SQL sits between exploration, migration, and automation

Use the surrounding pages when ad-hoc query work turns into something bigger.

Run the queries in the explorer

Use the explorer workspace when you want schema browsing, inline inspection, and SQL in the same UI.

Use the query as a migration source

Move from validation into snapshot migration when the query output is ready to become the transfer source.

Automate the workflow

Use the API when you want to script query-driven workflows, stream setup, or operational checks.

Let your AI write the query

Over MCP, Claude or Cursor drafts the federated SQL, runs it read-only, and hands back the rows.

FAQ

Technical questions

Does cross-database SQL move my data?

No. Sources are attached as virtual schemas, and the query reads from them in place. There is no persistent staging, no copy step, and no warehouse ingest.

Which sources can I join in one query?

PostgreSQL, MySQL, and file sources (Parquet, CSV, JSON), including files in S3. Each configured connection becomes a virtual schema you can reference in SQL.

What SQL dialect is used?

DuckDB SQL. The embedded DuckDB engine plans and executes the query, so window functions, CTEs, aggregations, and file scans all work the same as in DuckDB itself.

Where does the filtering actually happen — in the source database or in DuckDB?

Both, depending on what the query asks for. When you write WHERE country = 'DE' against a PostgreSQL table, the filter is pushed down to PostgreSQL, which uses its own indexes and returns only the matching rows over the network. The same goes for column selection — only the columns you reference are pulled. Cross-source joins are different: PostgreSQL and MySQL do not see each other, so DuckDB pulls the relevant rows from each side (already filtered by the pushdowns above) and performs the join itself.

Do I need a data warehouse?

No. The whole point is that you do not stage data into BigQuery, Snowflake, or Redshift before joining. If you already have a warehouse, you can still query it as one of the sources.

Is this safe to run against production databases?

Queries are read-only against the attached sources and respect the credentials of the configured connection. Treat heavy joins the same way you would treat any analytical query against a live OLTP database.

Run the first query in the explorer

Open the explorer to run queries against your live connections. Move into migration or API automation when the SQL is doing the right job.

Open Data Explorer Read Federated Query Docs

See Migration Workflow Explore REST API