Duck Data Master is a full-stack GCP analytics platform — AI text-to-SQL, a high-performance columnar engine, and elastic cloud infrastructure that grows with your workload. Here is the mission, the story, and what we will not compromise.
Databricks, Snowflake, and Redshift were built for petabyte-scale workloads at companies with dedicated data engineering teams. Most businesses are not those companies. They have tens or hundreds of millions of rows — not trillions — and they are paying $5k–$15k per month for a distributed cluster to run dashboards that should complete in milliseconds on a single machine.
Duck Data Master exists to end that overpayment.
Our mission is to give any business — any size, any dataset — the power of enterprise-grade analytics without the enterprise price tag. The platform runs entirely on Google Cloud Platform: AI text-to-SQL on Vertex AI, authentication on Firebase, and a high-performance columnar analytics engine that proved itself at 1,029,157,161 rows in 7.67 seconds — 134M rows/sec, 19× faster than pre-warmed Spark. Elastic by design. Scales with your workload. No cluster to manage. No DevOps tax. No data engineering degree required.
We believe your data belongs to you. The analytics engine processes your files directly — nothing is permanently stored on our infrastructure. Upload, ask, answer. That's the product. And when your workload grows, the platform grows with it — GCP scales automatically, you never hit a wall, and you never pay for idle capacity.
One flat monthly fee. No per-query billing. No cluster overhead. Start with a 3-day free trial — full Guru access — and cancel any time.
Scott Baker's relationship with computing started in the 1980s — Haverford Preparatory School, a Mac SE/30, and HyperCard. That early instinct for building systems never left.
He earned the Databricks Certified Associate Developer for Apache Spark certification (Scala track) and built a multi-node Apache Spark 3.5 cluster using pure functional programming patterns in Scala, specifically to benchmark it against our analytics engine in a single-node configuration. The result was unambiguous: 1,029,157,161 rows on a single GCP cloud instance, with 5 analytical queries completing in 7.67 seconds at an average of 134M rows/sec, and 6.9B rows/sec on the full cross-dataset scan. That is 19× faster than pre-warmed Spark. No cluster. No JVM. No DevOps overhead.
That benchmark became the foundation of this platform. The question was not whether our engine could beat Spark for the workloads most businesses actually run — it clearly could. The question was: how many companies are overpaying for distributed infrastructure they do not need?
Since late 2022, a long-term disability has kept Scott from working onsite. Rather than stepping back from engineering, he stepped all the way in: mastering Rust, C23, and production data engineering from first principles, then building the platform from scratch, including the analytics engine, distributed gateway, cloud infrastructure, and the product itself. No frameworks borrowed from elsewhere. Every layer owned.
The result is Duck Data Master — built for companies overpaying for cluster compute on workloads that belong on a single cloud instance.
1,029,157,161 rows in 7.67 seconds on a single cloud instance is not a marketing claim — it is a measured, reproducible benchmark on real NYC taxi data. 134M rows/sec average. 6.9B rows/sec peak. We build for verifiable performance, not impressive-sounding architecture diagrams.
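The average figure can be checked with simple arithmetic. The sketch below assumes the 134M rows/sec average is the total row count divided by the total wall-clock time; the function name `rows_per_second` is illustrative and not part of the platform.

```python
def rows_per_second(rows: int, seconds: float) -> float:
    """Average throughput: rows processed per second of wall-clock time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return rows / seconds


# Figures quoted in the benchmark above.
ROWS = 1_029_157_161
SECONDS = 7.67

avg = rows_per_second(ROWS, SECONDS)
print(f"{avg / 1e6:.1f}M rows/sec")  # prints "134.2M rows/sec"
```

Under that assumption the quoted numbers are consistent with each other: the row count and elapsed time give roughly 134M rows/sec.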
Your files load into the analytics engine running inside your own cloud account. Nothing passes through Duck Data Master infrastructure. Data never leaves your cloud region — not during load, not during execution, not ever. Audit-safe by architecture.
Built on GCP from day one. Cloud Run scales to zero when idle and scales out automatically under load. Vertex AI handles the LLM inference. Firebase handles auth. You never manage infrastructure, you never hit capacity limits, and you never pay for idle compute.
We built a multi-node Spark cluster in Scala and benchmarked it honestly against our engine. We own every layer of the platform — analytics engine, gateway, cloud infrastructure, and product. No wrappers, no SaaS dependencies, no black boxes.
We do not sell methodology. We deliver outcomes. Upload your file, ask a question, get the answer. No analyst required. No cluster to manage. No waiting. The platform works or you cancel — no lock-in.
December 2022 – Present: Scott has been managing a long-term medical disability while aggressively evolving his technical stack. He is currently seeking a remote role in Data Systems Engineering with Apache Spark (Scala).