Benchmark Results — Duck Data Master

// HEADLINE BENCHMARK · APRIL 2026 · VERIFIED

1,029,157,161 rows.
7.67 seconds.
One GCP node.

14 years of NYC TLC yellow taxi data (2012–2025) — 168 monthly Parquet files, 19 columns, pulled specifically to cross 1 billion rows — scanned with a full multi-column aggregation query on a single GCP c3-standard-44-lssd instance. 19× faster than a pre-warmed Apache Spark cluster on the same query.

1.03B
Rows scanned
NYC taxi 2012–2025
7.67s
Wall time
cold read · NVMe RAID 0
134M
Rows/sec avg
including I/O + decompress
19×
Faster than Spark
pre-warmed 4-node cluster
Hardware: GCP c3-standard-44-lssd · 44 vCPU Intel Sapphire Rapids · 176 GB DDR5 · RAID 0 8×375GB NVMe (~22 GB/s)
Dataset:   NYC TLC yellow_tripdata_YYYY-MM.parquet · 2012–2025 · 168 monthly files · 19 columns
Query:     SELECT VendorID, SUM(total_amount), AVG(trip_distance), SUM(passenger_count), COUNT(*) FROM … GROUP BY VendorID
Method:    3 cold runs · no page cache warmup · <3% variance · April 8, 2026
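
The headline figures can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative and not part of the product.

```python
# Cross-check of the headline figures, using only numbers quoted above.
ROWS = 1_029_157_161          # rows scanned
WALL_SECONDS = 7.67           # cold-read wall time
YEARS = 2025 - 2012 + 1       # 2012 through 2025, inclusive

monthly_files = YEARS * 12
rows_per_second = ROWS / WALL_SECONDS

print(f"{YEARS} years -> {monthly_files} monthly files")
print(f"{rows_per_second / 1e6:.0f}M rows/sec average")
```

The result matches the 168 files and the 134M rows/sec average shown in the stat cards.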
// INSTANCE BENCHMARKS — ALL FEATURES

Real numbers.
Live instance. No tricks.

Every figure below was measured on a live Duck Data Master Guru instance running on GCP. Not simulated, not cherry-picked — actual API response times from a stress test across every feature.

2ms
SQL COUNT(*)
10 million rows
162ms
Full column profile
SUMMARIZE · 10M rows
52ms
Fuzzy match
Jaro-Winkler · 10M rows
10.8s
RandomForest train + score
2 features · 50k rows
ML-DSA-65
PQC signed exports
NIST FIPS 204 · 50k rows
343 MB
CSV exported
10M rows · streamed
3–6s
NL → SQL → execute
AI call + query end-to-end
90/100
Stability score
3rd benchmark run · all features
// TEST ENVIRONMENT

The instance that ran these tests.

This is an n2-standard-8 — the Standard compute tier. Every customer chooses their machine size. Bigger machines are available if your workload demands it.

Instance type:    GCP n2-standard-8
vCPU:             8 cores
RAM:              31.3 GB
Storage:          300 GB pd-ssd
Region:           us-central1-a
OS:               Ubuntu 24.04 LTS
Analytics engine: DuckDB (latest), file-backed session
Extensions:       httpfs · spatial · delta · iceberg
Dataset:          NYC Taxi 2024 · 12 months · 10M rows · 19 columns
Test date:        2026-05-18
// ALL FEATURES — MEASURED

Every tab. Every endpoint.

Each row is a live API call against the running instance. Times are wall-clock from request to response including network round-trip from Phoenix, AZ to GCP us-central1.

Feature | Tab | Time | Dataset / Notes
SQL query: COUNT(*) | Query | 2 ms | 10,000,000 rows · instant
SQL query: GROUP BY aggregation | Transform | 84 ms | 10M rows · 4 categories · SUM + COUNT
Column profile (SUMMARIZE) | Profile | 162 ms | 10M rows · 5 columns · min/max/avg/std/quartiles/nulls
Fuzzy match (Jaro-Winkler) | Fuzzy | 52 ms | 10M rows · threshold 0.60 · 200 results returned
NL → SQL → execute (AI end-to-end) | Query | 3–6 s | Gemini AI call + 15ms SQL execution · auto-schema injected · AI latency dominates
Python NL → code → execute | Query | 4–8 s | AI generates Python → server executes → returns output + figures
Python execution (direct) | Notebook | < 100 ms | COUNT query via pandas · 10M rows
Notebook cell (GROUP BY → DataFrame) | Notebook | < 150 ms | Aggregation + pandas to_string output
AI notebook suggest | Notebook | 3–5 s | Gemini generates setup + analysis code cells · no extra packages needed
RandomForest train + score | ML Score | 10.8 s | 50,000 rows · 2 features · classification · 100 estimators · scored table written · auto-samples to 100k max
ML feature importance | ML Score | included above | Returned with train result · top features ranked by importance
PQC keypair generate | PQC Sign | < 1 s | ML-DSA-65 (NIST FIPS 204) · 3,904-char hex public key
PQC sign + export | PQC Sign | < 500 ms | 50,000 rows · CSV export · ML-DSA-65 (NIST FIPS 204) signature · .sig file written
Table join (INNER) | Join | 1,563 ms | 50k × 50k rows · 2.5M result rows written · disk guard active
REST API ingest | Extract | < 1 s | External JSON API → DuckDB table · envelope auto-detected · 10 rows · 8 columns
CSV export (streaming) | Export | ~8 s | 10,000,000 rows · 343 MB · streamed directly, no temp file
GCS bucket list | Ingest | < 500 ms | 12 files listed · authenticated GCS client
Duck Master AI chat | Chat | 3–5 s | Full session schema auto-injected · correct table list in reply
AI notebook generate | Notebook | 4–7 s | AI generates full .ipynb · saved directly to /data/notebooks/ · instantly in JupyterLab
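
The Fuzzy row reports Jaro-Winkler matching at a 0.60 threshold. For readers unfamiliar with the metric, here is a pure-Python sketch of the standard Jaro-Winkler similarity (prefix scale 0.1, prefix capped at 4 characters). It only illustrates what the threshold means; it is not the instance's implementation, and the function names and sample strings are hypothetical.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters match only if equal and within this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i in range(len1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s1[i] == s2[j]:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if not matched1[i]:
            continue
        while not matched2[k]:
            k += 1
        if s1[i] != s2[k]:
            out_of_order += 1
        k += 1
    transpositions = out_of_order / 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


THRESHOLD = 0.60  # threshold quoted in the table above
candidates = ["Jonathan Smith", "Jon Smith", "Maria Lopez"]  # made-up sample
matches = [c for c in candidates if jaro_winkler("Jon Smith", c) >= THRESHOLD]

# Textbook check: MARTHA vs MARHTA scores about 0.961.
print(round(jaro_winkler("MARTHA", "MARHTA"), 3))
```

A threshold of 0.60 is fairly permissive: strings with little in common fall well below it, while close variants of the same name clear it easily.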
// RESPONSE TIME — VISUAL

SQL operations finish before
you finish blinking.

On the 10M-row test dataset, the SQL operations charted here finish in 2–162 ms. The multi-second entries are Gemini AI calls, RandomForest training, and the 343 MB CSV export, not query execution.

SQL COUNT(*), 10M rows: 2ms
SQL GROUP BY + SUM, 10M rows: 84ms
SUMMARIZE profile, 10M rows: 162ms
Fuzzy match, 10M rows: 52ms
PQC sign, 50k rows: 204ms
NL → SQL → result (AI): 3–6s
RandomForest train, 50k rows: 10.8s
CSV export, 343 MB: ~8s

* Bar widths are log-scaled for readability. AI call latency (NL, Chat, Suggest) is Gemini network time, not compute time; SQL execution within those calls is on the order of 10 ms.

// VS MANAGED CLUSTERS

What you're replacing.

Duck Data Master is purpose-built for analytics workloads that finish in seconds on a single node. Here's the honest comparison.

⚡ Databricks / Snowflake
Monthly cost: $3,000–$15,000+
Infrastructure: Shared multi-tenant cluster
Startup time: 30s–5min (cold cluster)
Data moves to their cloud: Yes
AI/NL queries: Add-on / extra cost
ML scoring: Databricks ML / MLflow
PQC signed exports: Not available
Built-in notebook: Yes (shared compute)

🦆 Duck Data Master Guru
Monthly cost: $99 + GCP at cost + 10%
Infrastructure: Dedicated GCP VM, your account
Startup time: Always on, 0s
Data moves to their cloud: Never; stays in your region
AI/NL queries: Included · 2,000/day
ML scoring: Built-in · RandomForest / GBM / LR
PQC signed exports: ML-DSA-65 (NIST FIPS 204)
Built-in notebook: Yes · JupyterLab + in-dashboard
// METHODOLOGY

How we measured.

Dataset: NYC Yellow Taxi trip records, January–December 2024 — 12 Parquet files, ~10 million rows total, 19 columns including timestamps, fare amounts, GPS coordinates, and categorical fields. All files loaded into the analytics engine's persistent session before testing.

Timing method: Wall-clock time measured from HTTP request to full JSON response received. Tests run sequentially from Phoenix, AZ (home network) over HTTPS. Times include network round-trip latency (~20ms baseline to GCP us-central1) and exclude browser rendering.
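
A minimal sketch of this kind of sequential wall-clock timing is shown below. It is not the harness used for these benchmarks; `time_call_ms` and `time_sequential` are hypothetical helper names, and the request itself is passed in as a callable so the sketch stays independent of any particular endpoint.

```python
import statistics
import time
from typing import Callable, Dict


def time_call_ms(call: Callable[[], object]) -> float:
    """Wall-clock milliseconds for one invocation of `call`."""
    start = time.perf_counter()
    call()
    return (time.perf_counter() - start) * 1000.0


def time_sequential(call: Callable[[], object], runs: int = 3) -> Dict[str, object]:
    """Run `call` several times back to back and summarize the timings."""
    samples = [time_call_ms(call) for _ in range(runs)]
    return {
        "samples_ms": samples,
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


# Stand-in workload. To time an HTTP endpoint, pass a callable that sends
# the request and reads the whole response body, so the clock stops only
# after the full JSON has arrived.
summary = time_sequential(lambda: time.sleep(0.01), runs=3)
print(summary)
```

Because the clock wraps the whole call, any network round trip is part of each sample, which is why the baseline latency is reported alongside the results.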

AI call latency: NL query, Python NL, Chat, and Notebook Suggest all route through Gemini (Vertex AI). The 3–6 second range reflects typical Gemini response time. SQL execution within those calls is measured separately and is on the order of 10 ms (15 ms in the NL → SQL test above).

ML training: scikit-learn RandomForestClassifier, 100 estimators, 80/20 train/test split, random_state=42. Training auto-samples to 100,000 rows maximum — sufficient for production model quality. A memory guard (4 GB free RAM required) prevents OOM on large feature sets.
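
The row cap and the 80/20 split can be sketched with the standard library alone. This is an illustration under assumptions, not the product's code: uniform random sampling is assumed (the sampling strategy is not stated here), `sample_and_split` is a hypothetical name, and the model fit and memory guard are left out.

```python
import random
from typing import List, Sequence, Tuple

MAX_TRAINING_ROWS = 100_000   # cap stated in the methodology
TEST_FRACTION = 0.20          # 80/20 train/test split
SEED = 42                     # mirrors random_state=42


def sample_and_split(
    rows: Sequence,
    max_rows: int = MAX_TRAINING_ROWS,
    test_fraction: float = TEST_FRACTION,
    seed: int = SEED,
) -> Tuple[List, List]:
    """Cap the input at `max_rows`, then split it into (train, test)."""
    rng = random.Random(seed)
    if len(rows) > max_rows:
        # Assumption: a uniform random sample without replacement.
        working = rng.sample(rows, max_rows)
    else:
        working = list(rows)
    rng.shuffle(working)
    n_test = int(round(len(working) * test_fraction))
    return working[n_test:], working[:n_test]


train, test = sample_and_split(range(250_000))
print(len(train), len(test))
```

With scikit-learn, the split step would typically be `train_test_split(..., test_size=0.2, random_state=42)`; the stdlib version is used here only to keep the sketch dependency-free.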

PQC signatures: ML-DSA-65 (CRYSTALS-Dilithium3, NIST FIPS 204) implemented via dilithium-py. Signs the SHA-256 digest of the exported CSV. Public key is 1,952 bytes (3,904 hex chars). Signature files (.sig) are written alongside the export.
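
The digest step and the key-size arithmetic can be illustrated with the standard library. The sketch below covers only the SHA-256 hashing of an export stream; the ML-DSA-65 signing call is deliberately omitted, and `sha256_digest` and the sample CSV bytes are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO

ML_DSA_65_PUBLIC_KEY_BYTES = 1952  # public key size quoted above


def sha256_digest(stream: BinaryIO, chunk_size: int = 1 << 20) -> bytes:
    """Hash a binary stream in chunks so a large export never sits in memory."""
    hasher = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.digest()


# Made-up CSV content standing in for a real export file.
export = io.BytesIO(b"VendorID,total_amount\n1,12.50\n2,8.75\n")
digest = sha256_digest(export)

# The 32-byte digest is what would be handed to the signing step.
print(len(digest), "byte digest:", digest.hex())

# Each byte is two hex characters, so 1,952 bytes -> 3,904 hex chars.
print(len(bytes(ML_DSA_65_PUBLIC_KEY_BYTES).hex()))
```

Signing the fixed-size digest rather than the raw file keeps the signing cost independent of export size.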

90/100 score: Measured across 13 features in 3 benchmark runs. The 10 points deducted reflect real-world constraints, not system failures: GCS ingest requires a configured bucket, ML has no streaming progress bar yet (a visual gap only; results are correct), and DB attach is environment-dependent. Core analytics features (SQL, NL, Profile, Fuzzy, Join, Export, PQC, Chat, Notebook, Extract) each scored between 88 and 95.

Honest note: These numbers represent a lightly loaded single-user instance. Performance scales with your GCP machine tier. The n2-standard-8 is the Standard tier — larger instances (up to 176 vCPU / 704 GB RAM) deliver proportionally faster results.
// VM TIERS — PICK YOUR COMPUTE

More machine when you need it.

All benchmarks above were run on the Standard tier (n2-standard-8). Scale up to your workload — you pay GCP's exact rate + 10%. Turn it off when you're done.

Tier | vCPU | RAM | Rate | Best for
Starter | 4 | 16 GB | $0.19/hr | Up to ~25M rows, light analytics
Standard (these benchmarks) | 8 | 32 GB | $0.39/hr | Up to ~100M rows, typical analytics
Pro | 22 | 88 GB | $1.07/hr | Up to ~500M rows, heavy ETL
Power | 44 | 176 GB | $2.14/hr | Billion-row datasets, ML at scale
Ultra | 88 | 352 GB | $4.29/hr | Full data warehouse replacement
Guru | 176 | 704 GB | $8.57/hr | Enterprise-scale, real-time at any volume

Stopped instance: ~$8–12/mo disk-only cost. Zero compute when idle. We tell you to stop it — our revenue doesn't depend on you forgetting.
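
A rough monthly estimate follows from the platform fee, the hourly rate, and the hours the instance actually runs. The sketch below is a back-of-envelope calculator, not a billing formula: this page does not say whether the listed hourly rates already include the 10% markup, so the caller must state that assumption explicitly, and disk cost for stopped hours is left out. `estimate_monthly_cost` is a hypothetical name.

```python
PLATFORM_FEE_USD = 99.00   # monthly platform fee
MARKUP = 0.10              # "GCP cost + 10%"


def estimate_monthly_cost(
    hourly_rate: float,
    hours_running: float,
    *,
    rate_includes_markup: bool,
    platform_fee: float = PLATFORM_FEE_USD,
    markup: float = MARKUP,
) -> float:
    """Estimate one month's bill in USD, excluding stopped-instance disk cost."""
    compute = hourly_rate * hours_running
    if not rate_includes_markup:
        # Assumption: the listed rate is raw GCP cost and markup is added on top.
        compute *= 1 + markup
    return round(platform_fee + compute, 2)


# Standard tier at $0.39/hr, running 160 hours in the month, both readings.
print(estimate_monthly_cost(0.39, 160, rate_includes_markup=True))
print(estimate_monthly_cost(0.39, 160, rate_includes_markup=False))
```

Either way, the estimate makes the point of the stop-when-idle advice concrete: compute cost scales with running hours, not calendar hours.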

// GET STARTED

Run these benchmarks on your data.

3-day free trial. Full Guru access. Your GCP instance provisions automatically. No credit card required.

Start Free Trial →
$99/mo platform fee · compute billed at GCP cost + 10% · cancel any time