Documentation — Duck Data Master Analytics Platform
Quick Start
Duck Data Master Guru is self-service — one command deploys the full analytics stack to your cloud instance. You can go from signup to querying your data in under five minutes.
Sign Up & Trial
The 3-day free trial gives you full Guru access from minute one. No credit card is required to start.
Starting your trial
- Go to signup.duckdatamaster.guru
- Create an account with your email and password
- Select Duck Data Master Guru — $99/mo platform fee
- Enter payment details. You will not be charged until the 3-day trial ends
- Cancel any time before the trial ends and pay nothing
Trial converts automatically. If you don't cancel before day 3, your card is charged for the first month. You can cancel at any time from the account screen inside the app.
Ingest Tab
The Ingest tab is the first tab — step one of every pipeline. Three data entry points are available inline, no modals:
- File Upload — drag-and-drop or click to browse. Loads directly into the analytics engine on your instance.
- Local Path — enter an absolute file path on your instance (e.g. /data/orders.parquet).
- GCS Bucket Browser — browse your dedicated GCS bucket, create folders, delete files/folders, and load selected files directly. See GCS File Manager below.
Every file loaded becomes a named table queryable immediately in the Query, Transform, Profile, Join, ML, and Fuzzy tabs.
GCS Bucket File Manager Guru
Your Guru instance includes a dedicated GCS bucket, auto-connected via the instance's service account. The Ingest tab shows a full file manager for that bucket — no Cloud Console required.
Navigation
- Breadcrumb path at the top — click any segment to jump to it
- ../ button — go up one folder level
- Click any folder row to navigate into it
Operations
- 📁 New Folder — creates a folder at the current path (uses a .keep placeholder blob in GCS)
- 🗑 Delete file — per-row delete button on every file row
- 🗑 Delete folder — deletes the folder and all blobs inside it recursively
- ☑ Select + Load Selected → — check any files and load them into the analytics engine in one click
Folders in GCS are prefixes, not real objects. The dashboard creates a folder/.keep placeholder to make the folder visible. Deleting a folder removes all blobs with that prefix, including the .keep file.
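The prefix behaviour described above can be modelled in a few lines of plain Python. This is a minimal sketch, not the dashboard's implementation: the bucket is represented as an in-memory set of blob names, and the function names create_folder and delete_folder are hypothetical.

```python
from __future__ import annotations


def create_folder(blobs: set[str], path: str) -> None:
    """Make a folder visible by adding a '<path>/.keep' placeholder blob."""
    prefix = path.strip("/") + "/"
    blobs.add(prefix + ".keep")


def delete_folder(blobs: set[str], path: str) -> int:
    """Remove every blob under the folder prefix and return how many were removed."""
    prefix = path.strip("/") + "/"
    doomed = {name for name in blobs if name.startswith(prefix)}
    blobs.difference_update(doomed)  # mutates the caller's set in place
    return len(doomed)


# Usage: the "folder" exists only as a shared name prefix.
bucket = {"raw/orders.parquet"}
create_folder(bucket, "reports/2024")
bucket.add("reports/2024/q1.csv")
removed = delete_folder(bucket, "reports")
```

Because the trailing slash is part of the prefix, deleting "reports" in this sketch does not touch a sibling such as "reports_old/".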
Extract Tab Guru
The Extract tab goes beyond standard file upload — connect to open table formats, run geospatial queries, and read cloud data lakes directly without pre-loading files into memory.
Spatial Analytics
The spatial extension enables full geospatial SQL with 50+ ST_ functions.
Supports GeoJSON, WKT, and coordinate point geometry types.
Delta Lake & Apache Iceberg
Read open table format data lakes directly from GCS or S3, with no Spark cluster required.
Direct Cloud Query (httpfs)
Query Parquet, CSV, or JSON files on S3/GCS/Azure without loading them into memory first. The engine streams only what it needs.
Nothing lands permanently on disk. httpfs streams data directly for each query — ideal for large data lakes where you only need a subset of files per session.
File Upload
The Ingest tab accepts files via drag-and-drop or click to browse. Load multiple files in one session — each becomes a separate named table.
How it works
When you drop a file, it loads directly into the analytics engine running on your cloud instance. Your file never leaves your cloud environment. No upload to Duck Data Master servers, no data movement, no shared compute.
Table naming
The table name is derived from the filename. sales_2024.csv becomes the table sales_2024. You can query it immediately after loading.
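As a rough illustration of the naming rule, the sketch below derives a table name from a filename. Only the sales_2024.csv example comes from this documentation. The function name table_name_for is hypothetical, and the handling of spaces and punctuation is an assumption, since the platform's exact sanitization rules are not stated here.

```python
import re
from pathlib import PurePosixPath


def table_name_for(filename: str) -> str:
    """Derive a table name from a file path (illustrative only)."""
    stem = PurePosixPath(filename).stem  # drop directories and the extension
    # ASSUMPTION: runs of characters that are not letters, digits, or
    # underscores are collapsed to a single underscore.
    return re.sub(r"\W+", "_", stem).strip("_")


print(table_name_for("sales_2024.csv"))
print(table_name_for("/data/orders.parquet"))
```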
Loading multiple files
Drop multiple files at once or load them one at a time. Each becomes a separate table. You can then JOIN across tables in a single query.
Load from URL
Paste any public HTTPS URL pointing to a supported file into the URL box below the drop zone and click Load. Give the table a name first.
The file is fetched and loaded directly into the analytics engine. It does not pass through our backend.
Cloud Storage Connectors Guru
Connect to files stored in Amazon S3, Google Cloud Storage, or Azure Blob Storage from the Extract tab (Direct Cloud Query section) or the Ingest tab (GCS bucket browser). Enter your credentials in the Extract tab to authenticate against any external bucket or S3 path.
Your credentials are never stored. They are used to authenticate your cloud instance directly against your bucket. The file loads from your cloud storage into your cloud instance. Duck Data Master servers never touch your data.
Amazon S3
- Access Key ID — your AWS IAM access key
- Secret Access Key — your AWS IAM secret
- Region — e.g. us-east-1 (defaults to us-east-1 if blank)
- Path — s3://your-bucket/path/to/file.parquet
- Endpoint — optional, only needed for S3-compatible storage like MinIO
Google Cloud Storage
- Service Account JSON — paste the full contents of your .json key file
- Path — gs://your-bucket/path/to/file.parquet
Azure Blob Storage
- Account Name — your storage account name
- Account Key — your storage account access key
- Path — az://your-container/path/to/file.parquet
Supported File Formats
| Format | Extension | Notes |
|---|---|---|
| CSV | .csv | Auto-detects delimiter, types, encoding |
| TSV | .tsv, .txt | Tab-separated values |
| Excel | .xlsx | First sheet loaded by default |
| JSON | .json | Array of objects or newline-delimited |
| NDJSON | .ndjson, .jsonl | Newline-delimited JSON |
| Parquet | .parquet | Column-oriented, ideal for large datasets |
| Apache Arrow | .arrow, .ipc | Zero-copy columnar format |
Query Tab — SQL NL Mode
The AI bar at the top of the dashboard converts plain-English questions into SQL queries. Select SQL mode, type your question, press Enter, and the generated SQL runs immediately — results appear in the Query tab as a paginated table.
How it works
Your question and the schema of your loaded tables are sent to our AI backend (Gemini on Vertex AI). The AI returns SQL only — your actual data rows are never sent to the AI.
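To make the schema-only idea concrete, here is a minimal sketch of a request body that carries a question plus column names and types, and nothing else. The function name build_ai_request and the JSON field names are hypothetical; the real wire format used by the AI backend is not documented here.

```python
from __future__ import annotations

import json


def build_ai_request(question: str, schema: dict[str, dict[str, str]]) -> str:
    """Serialize the question with column names and types only; no row values."""
    payload = {
        "question": question,
        "tables": [
            {
                "name": table,
                "columns": [{"name": col, "type": typ} for col, typ in cols.items()],
            }
            for table, cols in schema.items()
        ],
    }
    return json.dumps(payload)


# Usage: the schema dict maps table name -> {column name: column type}.
request = build_ai_request(
    "Total revenue by region?",
    {"sales_2024": {"region": "VARCHAR", "revenue": "DOUBLE"}},
)
```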
Example questions
Results panel
Results appear in the Query tab with row count and elapsed time. From there you can: Save as Table (registers result as a named session table), Export CSV, or push to → Notebook to inject the result as a pandas DataFrame cell.
AI query limits
Guru subscribers get 2,000 AI queries per day (resets at midnight UTC). The counter appears in the top-right RAM/usage bar of the dashboard.
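If you need to know when the counter resets in your own time zone, the calculation is simple. The sketch below is an illustration only: next_quota_reset is a hypothetical helper, and it expects a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def next_quota_reset(now: datetime) -> datetime:
    """Return the next midnight UTC strictly after `now` (must be timezone-aware)."""
    now_utc = now.astimezone(timezone.utc)
    midnight_today = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight_today + timedelta(days=1)


# Usage: 22:30 at UTC-5 is already 03:30 UTC on the following day.
local = datetime(2024, 3, 1, 22, 30, tzinfo=timezone(timedelta(hours=-5)))
reset_at = next_quota_reset(local)
```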
Query Tab — Python NL Mode
Switch the top bar to Python mode to ask questions that return executable Python — pandas transformations, matplotlib charts, statistical summaries, and more. The AI generates a complete Python script, runs it server-side on your instance, and streams stdout and figures back to the Query tab.
How it works
Your question, the schema of your loaded tables, and any SQL you've already run are sent to the AI. The AI returns a Python script using the pre-loaded con (analytics connection) object. The script runs in an isolated exec environment on your VM — output, print statements, and matplotlib/plotly figures all render inline.
Pre-loaded in every Python execution
Example questions
Sending results to Notebook
The → Notebook button in the Query tab injects the generated Python code directly into a new Notebook cell — ready to edit, extend, and re-run with full Jupyter-style keyboard shortcuts.
AI query limits
Python NL queries share the same 2,000/day Guru quota as SQL NL queries.
Chat Tab — Duck Master AI
Duck Master is a conversational AI assistant in the Chat tab (last tab, 💬). Unlike the NL bar at the top (which generates and runs one query at a time), Duck Master maintains a full conversation — ask follow-up questions, request data profiles, get cleaning suggestions, and iterate.
How to use it
Click the Chat tab. Type your question. Duck Master responds in conversation, referencing your loaded tables by name. Chat history is preserved in localStorage for the session.
Pipeline walkthrough
Ask Duck Master: "walk me through a data pipeline step by step" — he'll guide you tab by tab: Ingest → Extract → Query → Transform → Profile → Join → ML Score → Fuzzy → Export → PQC Sign → Notebook.
What Duck Master can do
- Profile a table — identify column types, null %, unique counts, value ranges
- Spot data quality issues — mixed types, suspicious nulls, case inconsistencies
- Write transform SQL — CREATE OR REPLACE TABLE with CAST, TRIM, REGEXP_REPLACE
- Answer questions about your specific data — references actual column names and counts
- Generate analysis queries — pivot tables, time series, ranked lists, cohort analysis
- Inject answers into Notebook — click "→ Notebook" to send any response directly into a new notebook cell
Load a file before asking about your data. Once a table is loaded, Duck Master sees its schema and row count and gives much more specific, actionable responses.
Transform Tab — SQL Editor
The Transform tab is a full CodeMirror SQL editor with syntax highlighting. Write any SQL query or multi-statement script, then click Run or press Ctrl+Enter.
Keyboard shortcuts
- Ctrl+Enter — run the current query
- Tab — insert 2-space indent
Results appear inline below the editor. Use CREATE OR REPLACE TABLE cleaned AS SELECT ... to save a transformed table for use in other tabs.
SQL Reference
Duck Data Master uses a full analytical SQL engine — not a subset. The following features are all supported:
Window functions
PIVOT
CTEs
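As a rough illustration of a CTE feeding a window function, the sketch below runs a small query end to end. It uses Python's built-in sqlite3 module purely as a stand-in so the example is self-contained; that is not the platform's engine, and the sales table and its rows are invented for the example. The SQL itself uses only standard WITH and RANK() OVER syntax.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # stand-in engine, NOT the platform's engine
con.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", "ana", 120.0),
        ("east", "bo", 90.0),
        ("east", "ana", 30.0),
        ("west", "cy", 200.0),
        ("west", "di", 50.0),
    ],
)

query = """
WITH rep_totals AS (               -- CTE: one row per region and rep
    SELECT region, rep, SUM(amount) AS total
    FROM sales
    GROUP BY region, rep
)
SELECT region, rep, total,
       RANK() OVER (PARTITION BY region ORDER BY total DESC) AS rank_in_region
FROM rep_totals
ORDER BY region, rank_in_region
"""
rows = con.execute(query).fetchall()
for row in rows:
    print(row)
```

The CTE names an intermediate result (per-rep totals), and the window function then ranks those totals within each region without collapsing the rows.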
Other capabilities
- Joins — INNER, LEFT, RIGHT, FULL, CROSS, ASOF
- Aggregates — SUM, COUNT, AVG, MEDIAN, PERCENTILE_CONT, STDDEV, VARIANCE
- Regex — REGEXP_MATCHES, REGEXP_REPLACE, REGEXP_EXTRACT
- Time series — DATE_TRUNC, DATE_DIFF, AT TIME ZONE
- Nested data — LIST, STRUCT, UNNEST
- Data profiling — SUMMARIZE table_name
Export Tab
The Export tab handles downloads and cloud write-back. Select any loaded table, choose a format, and download or push to cloud storage.
Download formats
- CSV — opens in Excel, Sheets, or any downstream tool
- Parquet — columnar format ideal for large datasets and pipelines
- JSON — records-oriented, ready for APIs or document stores
Files are generated on your cloud instance and transferred directly to your browser — they never pass through Duck Data Master servers.
Write back to cloud storage
Use the GCS Write-Back section in the Export tab to push any table directly to your GCS bucket, or use COPY TO in the Transform tab for full control.
PQC-signed exports
Check Sign with ML-DSA-65 before exporting to produce a tamper-evident .sig file alongside the data file. See Post-Quantum Signing for full details.
Fuzzy Match Tab Guru
The Fuzzy tab finds approximate string matches across two tables using Jaro-Winkler similarity — without exact-match SQL. Practical for deduplication, entity resolution, and joining messy real-world data where names aren't standardized.
When to use it
- "Acme Corp" vs "ACME Corporation" vs "Acme Corp." — same company, different strings
- Customer name matching across two CRM exports
- Product name deduplication in a merged catalog
- Address matching without exact zip/street agreement
Workflow
- Select Table A and the string column to match from
- Select Table B and the string column to match against
- Set the similarity threshold (0.0–1.0 — default 0.85)
- Click Run Fuzzy Match
- Results show matched pairs with their Jaro-Winkler score — export or save as a table
Threshold guidance: 0.95+ for near-exact matches. 0.85–0.94 for typical name variants. 0.75–0.84 for loose matching (more false positives). Review results and adjust.
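To get a feel for how scores relate to the threshold, the following is a self-contained sketch of the standard Jaro-Winkler formula in plain Python. It is not the Fuzzy tab's code: the function names are hypothetical, comparison here is case-sensitive, and any normalization the dashboard applies (case folding, trimming) is not covered.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)  # how far apart a match may sit
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def fuzzy_pairs(left, right, threshold=0.85):
    """Return (a, b, score) for every pair scoring at or above the threshold."""
    pairs = []
    for a in left:
        for b in right:
            score = jaro_winkler(a, b)
            if score >= threshold:
                pairs.append((a, b, round(score, 4)))
    return pairs


print(fuzzy_pairs(["Acme Corp"], ["Acme Corp.", "Zenith Ltd"]))
```

In this sketch "Acme Corp" against "Acme Corp." scores 0.98, which falls in the near-exact band, while "Zenith Ltd" is filtered out at the default 0.85 threshold.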
AI Notebook Tab Guru
The Notebook tab is a full code + markdown cell environment with Jupyter-compatible keyboard shortcuts. Use it for custom analysis scripts, Python pipelines, and annotation — all running on your dedicated instance.
Jupyter keyboard shortcuts
- Shift+Enter — run cell and move to next
- Ctrl+Enter — run cell in place
- Esc — enter command mode (amber border)
- Enter — enter edit mode (green border)
- A — insert cell above (command mode)
- B — insert cell below (command mode)
- M — convert cell to Markdown
- Y — convert cell to Code
- D, D — delete cell (double-tap D within 500ms)
- ↑ / ↓ — navigate cells in command mode
Cell features
- Auto-growing cells — CodeMirror expands as you type, no scroll needed
- Collapsible cells — click ▼ in the left gutter to fold long outputs
- Cell number [N] shown in the left gutter
- Amber left border = selected in command mode · Green border = edit mode
AI cell assist
Use the ✦ Suggest button (top-right of toolbar) to have Duck Master write a cell based on your description. The "→ Notebook" button in the Query tab and Chat tab injects results directly into a new cell.
Save & export
Click Save .ipynb to download the notebook as a standard Jupyter .ipynb file. It can be reopened in JupyterLab on your instance or any Jupyter environment.
Join Builder Guru
Build cross-table joins without writing SQL. The Join tab presents a visual form:
- Select Table A and Table B from any loaded tables
- Choose the join key column from each table
- Choose join type: INNER, LEFT, RIGHT, or FULL OUTER
- Optionally name the result — it saves as a new table you can query immediately
The generated SQL is shown before execution. Results display inline and are saved as last_result for export or further querying.
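For a sense of what the form produces, the sketch below assembles a join statement from the same four inputs. It is an assumption about the general shape of the generated SQL, not the dashboard's actual output; build_join_sql and quote_ident are hypothetical names.

```python
JOIN_TYPES = {"INNER", "LEFT", "RIGHT", "FULL OUTER"}


def quote_ident(name: str) -> str:
    """Double-quote an identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_join_sql(table_a, key_a, table_b, key_b,
                   join_type="INNER", result="last_result"):
    """Build a CREATE OR REPLACE TABLE ... AS SELECT join statement."""
    if join_type not in JOIN_TYPES:
        raise ValueError(f"unsupported join type: {join_type!r}")
    a, b = quote_ident(table_a), quote_ident(table_b)
    return (
        f"CREATE OR REPLACE TABLE {quote_ident(result)} AS "
        f"SELECT * FROM {a} {join_type} JOIN {b} "
        f"ON {a}.{quote_ident(key_a)} = {b}.{quote_ident(key_b)}"
    )


print(build_join_sql("orders", "customer_id", "customers", "id", "LEFT"))
```

Restricting the join type to a fixed set and quoting every identifier keeps form input from being interpreted as arbitrary SQL.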
Tip: Load multiple Parquet files from your data lake, join them in the Join Builder, then write the joined result back to S3/GCS via the Export tab. The full pipeline runs in the dashboard, no code.
ML Scoring Guru
Train and score machine learning models directly on any loaded table — no separate ML platform required. The ML tab in the dashboard provides end-to-end model training and inference.
Supported models
- Classification: Random Forest, Gradient Boosting, Logistic Regression
- Regression: Random Forest, Gradient Boosting, Linear Regression
Workflow
- Select the target column (what you want to predict)
- Select feature columns (numeric columns are used automatically)
- Choose task type (Classification or Regression) and model
- Set train/test split percentage
- Click Train & Score
Output
- Model accuracy (classification) or R² + MAE (regression) on the test split
- Feature importance bar chart (Random Forest and Gradient Boosting)
- All rows scored and saved as a new table: <table>_scored with column ddm_prediction
Example: Load a customer churn CSV, select churned as the target, train a Random Forest — the dashboard writes customers_scored with a churn probability for every customer. Export to Parquet and write back to S3 — all in the dashboard.
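The three reported metrics have simple definitions. The sketch below computes them in plain Python so you can sanity-check a scored table yourself; it is independent of whatever library the dashboard uses internally, and the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
print(mae([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))
print(r2([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))
```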
RAM & Performance
Performance scales with your cloud instance size. The analytics engine uses up to 85% of available instance RAM by default. Right-size your instance to your workload — you can scale up or down at any time.
A RAM gauge in the top-right of the dashboard header shows current usage. If RAM is running low, clear tables you no longer need before loading more files.
Instance sizing guide: A 4 vCPU / 16 GB instance handles hundreds of millions of rows comfortably. For billion-row workloads, use 16+ vCPU / 64 GB RAM with SSD-backed storage. A 10 GB Parquet file on a 32 GB instance is routine.
Cloud Analytics Instance Guru
When you sign up, a dedicated GCP analytics instance is provisioned automatically — no command required. The full Duck Data Master stack is live in under 5 minutes.
What gets provisioned
- GCP Compute Engine VM (your choice of tier — Starter 4 vCPU to Guru 176 vCPU)
- Analytics engine with all extensions: httpfs, spatial, delta, iceberg
- Python virtual environment — pandas, pyarrow, polars, plotly, scikit-learn, dilithium-py
- 12-tab FastAPI + Vanilla JS analytics dashboard (Duck Master AI built in)
- JupyterLab for custom notebook pipelines
- Caddy + Let's Encrypt TLS — your instance gets a subdomain at uid.inst.duckdatamaster.guru
- Dedicated GCS bucket auto-connected via the instance's service account
- systemd services — dashboard on :8000, Jupyter on :8888, Caddy reverse-proxy
Supported tiers
Starter (c3-standard-4 · 4 vCPU · 16 GB) up to Guru Top (c3-standard-176 · 176 vCPU · 704 GB). All Intel Sapphire Rapids. Scale up or down at any time from the portal.
Start / Stop Your VM Guru
Your portal shows your instance state in real time — LIVE, STOPPING, or STOPPED. The portal auto-refreshes every 30 seconds so you never need to reload the page to see a state change.
Stopping your instance
Click Stop Instance in the portal. The VM shuts down within seconds. You are no longer billed for compute — only the disk charge continues (~$8/mo). Stop your instance whenever you're done for the day.
We actively encourage you to stop your instance when you're not using it. Our revenue doesn't depend on you forgetting. Stop it, save money.
Starting your instance
When your instance is stopped, the dashboard button changes to ▶ Start VM + Open Dashboard. Click it — the portal starts your VM, polls every 8 seconds, and restores the button to Open Analytics Dashboard ↗ as soon as the instance is live. Boot time is typically 20–40 seconds.
Crash recovery
If your instance crashes unexpectedly (not a manual stop), the system detects it within 15 minutes and restarts it automatically. You receive an email notification when this happens. Manual stops are never auto-restarted.
Health-check reboot
If your instance is running but the analytics dashboard stops responding, the system performs a hard reboot automatically — stop then start — and notifies you by email. This clears stuck processes without data loss.
Monthly Budget Control Guru
Set a monthly compute spending limit directly in the portal. The system enforces it automatically — no surprise bills.
Setting your budget
- In the portal, locate the Monthly Budget gauge
- Click the amber dollar amount (e.g. $100 ✎)
- Type your new monthly limit and press Enter
- The gauge rescales immediately to reflect your new limit
Budget can be set anywhere from $1 to $10,000. Changes take effect immediately — the next patrol cycle (within 15 minutes) will enforce the new value.
How enforcement works
- $5 remaining — you receive a warning email. Instance keeps running.
- $0 remaining (limit reached) — instance is stopped automatically and you receive a notification email. No further compute charges accrue.
- To resume: raise your budget in the portal, then start your instance manually.
Budget resets on the 1st of each month when a new billing period begins. The warning flag also resets — you'll receive a fresh $5 warning if you approach the limit again the following month.
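The enforcement rules above amount to a small decision function that runs on each patrol cycle. The sketch below is one way to express them, assuming the warning is sent at most once per billing period; BudgetState and patrol are hypothetical names, not the platform's code.

```python
from dataclasses import dataclass

WARNING_THRESHOLD = 5.00  # dollars remaining that triggers the warning email


@dataclass
class BudgetState:
    limit: float          # monthly budget set in the portal
    spent: float          # compute spend so far this billing period
    warned: bool = False  # ASSUMPTION: reset to False on the 1st of each month


def patrol(state: BudgetState) -> str:
    """Decide what one patrol cycle should do for this account."""
    remaining = state.limit - state.spent
    if remaining <= 0:
        return "stop_and_notify"  # limit reached: stop the instance, send email
    if remaining <= WARNING_THRESHOLD and not state.warned:
        state.warned = True       # remember that the warning has been sent
        return "warn"
    return "ok"


# Usage: walk one account through a month of spending.
state = BudgetState(limit=100.0, spent=90.0)
actions = [patrol(state)]
state.spent = 96.0
actions.append(patrol(state))
actions.append(patrol(state))
state.spent = 100.0
actions.append(patrol(state))
```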
Compute cost reference
Compute is billed at GCP list price plus a 10% margin: Google's list rate is passed through with 10% added. Approximate hourly rates by instance tier:
| Tier | vCPU / RAM | Rate | ~Daily (8 hrs) |
|---|---|---|---|
| Starter | 4 vCPU · 16 GB | $0.19/hr | ~$1.56 |
| Standard | 8 vCPU · 32 GB | $0.39/hr | ~$3.12 |
| Pro | 22 vCPU · 88 GB | $1.07/hr | ~$8.58 |
| Power | 44 vCPU · 176 GB | $2.14/hr | ~$17.15 |
| Ultra | 88 vCPU · 352 GB | $4.29/hr | ~$34.30 |
| Guru | 176 vCPU · 704 GB | $8.57/hr | ~$68.60 |
Stop your instance when you're done. A stopped instance costs ~$8/mo in disk only — zero compute.
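For budgeting, a monthly estimate can be put together from the figures on this page. The sketch below is approximate: it uses the rounded hourly rates from the table, treats them as the billed rates, and adds the $99 platform fee and the ~$8 disk charge. The function name and the choice to sum these three items are assumptions.

```python
# Approximate hourly rates from the table above (USD per hour).
HOURLY_RATE = {
    "Starter": 0.19,
    "Standard": 0.39,
    "Pro": 1.07,
    "Power": 2.14,
    "Ultra": 4.29,
    "Guru": 8.57,
}
PLATFORM_FEE = 99.00   # monthly platform fee
DISK_PER_MONTH = 8.00  # approximate disk charge, billed even when stopped


def estimate_monthly_cost(tier: str, hours_per_day: float, days_per_month: int) -> float:
    """Estimate one month's total: platform fee + disk + running hours."""
    compute = HOURLY_RATE[tier] * hours_per_day * days_per_month
    return round(PLATFORM_FEE + DISK_PER_MONTH + compute, 2)


# Usage: a Standard instance run 8 hours a day for 20 working days.
print(estimate_monthly_cost("Standard", 8, 20))
```

Setting hours_per_day to 0 shows the floor for a month in which the instance stays stopped.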
Duck Master AI on Your Instance Guru
Duck Master AI is built into the dashboard — accessible from the NL bar at the top of every tab and from the dedicated Chat tab. It runs against your instance's data via a persistent API key written at provisioning time.
Only your question and table schema (column names and types) are sent to the AI backend. Your actual data rows never leave your instance.
Post-Quantum Signed Exports Guru
Every Guru cloud instance ships with an ML-DSA-65 signing keypair (NIST FIPS 204 — post-quantum secure). You can sign any exported file with one click and provide tamper-evident proof of data provenance to any downstream system.
Keypair location
- ~/.duckpqc/signing.sec — secret key (mode 600 — never share)
- ~/.duckpqc/duckpqc.pub — public key (share with clients for verification)
Keys persist across dashboard restarts. Use the PQC Sign tab to manage the full keypair lifecycle without touching the command line.
Keypair lifecycle (PQC Sign tab)
- ⚡ Generate Keypair — creates a new ML-DSA-65 keypair on your instance
- ↻ Rotate — generates a new keypair, overwriting the old one (recipients will need the new public key)
- ↑ Save to Bucket — backs up the public key to your GCS bucket for safe-keeping
- ↓ Restore from Bucket — restores a previously saved keypair from GCS
- 🗑 Delete Keys — removes both keys from the instance (use before rotating on a shared system)
- Copy — copies the public key to your clipboard for sharing with recipients
Signing an export
- Open the Export tab in the dashboard
- Choose your format (CSV, Parquet, or JSON)
- Check Sign with ML-DSA-65 (NIST FIPS 204 · post-quantum)
- Click Prepare download
- Download the data file and the .sig file — send both to your recipient along with duckpqc.pub
Signature format
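The signature covers: sha256(file) | filename | unix_timestamp. Each signature is therefore bound to one file's contents and name and carries a signing time, so it cannot be reused for a different file, and a verifier can reject signatures it considers too old.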
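The message that gets signed can be sketched with the standard library alone. The exact byte layout is an assumption here: this sketch uses a hex digest, a bare "|" separator, and UTF-8 encoding, and build_signed_message is a hypothetical name. The ML-DSA-65 signing step itself is omitted.

```python
from __future__ import annotations

import hashlib
import time


def build_signed_message(data: bytes, filename: str,
                         timestamp: int | None = None) -> bytes:
    """Assemble sha256(file) | filename | unix_timestamp as bytes to be signed."""
    if timestamp is None:
        timestamp = int(time.time())  # signing time, seconds since the epoch
    digest = hashlib.sha256(data).hexdigest()
    # ASSUMPTION: fields joined with "|" and encoded as UTF-8.
    return f"{digest}|{filename}|{timestamp}".encode("utf-8")


# Usage: a fixed timestamp makes the message reproducible.
message = build_signed_message(b"a,b\n1,2\n", "report.csv", 1700000000)
```

A verifier would recompute the digest from the received file, rebuild the same message, and check the signature against it with the sender's public key.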
What ML-DSA-65 means
- NIST FIPS 204 — standardized post-quantum digital signature algorithm
- Security level 3 — equivalent to AES-192, resistant to both classical and quantum attacks
- Signature size: ~3.3 KB. Public key: ~1.9 KB. Verification is fast (<1ms).
- No PKI, no certificate authority, no certificate chain — self-sovereign key management
Use cases
- Client deliverables — proof the report came from your instance, unmodified
- Regulatory audit trail — time-stamped, cryptographically verifiable data provenance
- Data supply chain — downstream systems verify inputs before processing
- Compliance — demonstrate data integrity without sharing raw data
No other analytics platform at this price point offers post-quantum signed exports. Databricks and Snowflake do not include this. This is a Guru-exclusive feature, included at no extra charge.
Privacy, Security & Compliance
Duck Data Master is a SaaS product. Your dedicated analytics instance runs in Google Cloud Platform (GCP) infrastructure managed by Duck Data Master — you do not need a GCP account. Your instance is isolated; no other customer shares your compute, memory, or storage.
Data flow
- Data files — uploaded directly into the analytics engine on your dedicated instance. They never touch Duck Data Master application servers — only your dedicated compute node.
- SQL execution — runs entirely on your dedicated instance. Query results are returned to your browser session.
- AI queries — your plain-English question and your table schema (column names and types only — never data rows) are sent to Google AI (Vertex AI / Gemini) via our Cloud Run backend. Your actual data is never sent to the AI.
- Cloud storage credentials — used only to authenticate your instance against your S3/GCS/Azure bucket. Credentials are used per-session and are never stored by Duck Data Master.
- Exports — generated on your instance, transferred directly to your browser. Never routed through our application servers.
GCP infrastructure & billing
Your dedicated VM runs in a Duck Data Master GCP project (Google Cloud Platform billing account). You are billed for compute at GCP list price + 10% markup via Stripe, transparently. You do not need your own GCP account — we provision, manage, and maintain the infrastructure for you.
GCP data center: Your instance runs in us-central1 (Council Bluffs, Iowa, USA) by default. Google Cloud's data centers are physically secured, SOC 2 audited, and ISO 27001 certified.
Infrastructure compliance
Duck Data Master runs on Google Cloud Platform infrastructure, which holds the following certifications and authorizations. These apply to the underlying infrastructure — not Duck Data Master as an application:
- SOC 2 Type II — GCP data centers are independently audited for security, availability, and confidentiality controls
- ISO 27001 — GCP holds ISO 27001 information security management certification
- PCI DSS Level 1 — GCP infrastructure meets PCI DSS standards (note: Duck Data Master does not process cardholder data — Stripe handles all payment processing)
- HIPAA Eligible — GCP offers HIPAA-eligible services and executes BAAs with qualifying customers. Duck Data Master does not execute BAAs directly; if your workload requires a BAA, contact us to discuss your options
- FedRAMP — Certain GCP services are FedRAMP authorized. Duck Data Master is not itself a FedRAMP-authorized product
Duck Data Master data practices
- Your data rows are never stored by Duck Data Master application systems — only on your dedicated instance
- Your dedicated instance is deleted when your subscription ends (data is your responsibility to export first)
- Duck Data Master does not sell, share, or analyze your data
- All traffic between your browser and your instance is TLS-encrypted (Let's Encrypt certificate, auto-renewed by Caddy)
Privacy by isolation, not policy. One customer per VM. Your compute, memory, and GCS bucket are isolated from all other customers by GCP's virtualization layer. There is no shared multi-tenant database holding your data.
Billing & Cancellation
Plan
- Guru — $99/month platform fee + GCP compute at cost + 10%. Dedicated GCP analytics instance, auto-provisioned. 12-tab analytics dashboard + JupyterLab. All file formats. S3/GCS/Azure connectors. Spatial analysis, Delta/Iceberg, Fuzzy Match, ML Scoring, PQC signed exports. Direct engineer support.
3-day free trial. No credit card required to start. Full Guru access from minute one.
Compute billing
Compute is billed at GCP list price + 10%, charged via Stripe. You are only billed for hours your instance is running. Set a monthly budget cap in the portal — the instance stops automatically when you hit your limit. See Monthly Budget Control for details.
Cancelling
You can cancel at any time from the account screen inside the app. After cancellation, access continues through the end of your current billing period. No cancellation fees.
Payments
Payments are processed by Stripe. Duck Data Master never stores your card details.
Support
Email support@duckdatamaster.guru — most issues are resolved automatically within minutes. Complex questions reach the engineer directly.
Guru subscribers get priority response. For billing issues, include your account email.