// FEATURE DEEP DIVE · MAY 2026

How to Load CSV, Parquet, and JSON Into Cloud Analytics Without a Data Engineer

Scott Baker, Founder, Duck Data Master · May 2026 · 7 min read · Databricks Certified Associate Developer · AWS Solutions Architect Associate

TL;DR: File format choice has a large impact on query performance. For analytical queries that touch only a few columns, Parquet is typically 10–50x faster than CSV, and sometimes more. The Duck Data Master Ingest Tab loads CSV, Parquet, JSON, Arrow, and Delta via file upload, URL, or direct cloud storage connection, with no pipeline engineering required.

The first question any analytics workflow has to answer is: how do I get my data in? For most teams, the answer is a mess: a CSV export from the CRM, a Parquet file from the data lake, and a JSON dump from an API. That means three formats, three loading workflows, and a data engineer in the loop for each one.

It doesn't have to be that way. Here's what you actually need to know about data formats and loading — and how to do it yourself.

File Format Comparison: What You're Actually Choosing

Format	Type	Analytics performance	File size	Human-readable?	Best for
Parquet	Columnar, binary	Excellent — 10–50x faster than CSV	Very small (4–10x compressed vs CSV)	No	Analytics workloads — use this whenever possible
CSV	Row-based, text	Slow — reads every column even if you only need one	Large	Yes	Excel exports, data sharing with non-technical users
JSON	Row-based, text	Slow — nested structures add parsing overhead	Large	Yes	API responses, config data, semi-structured records
Arrow (IPC)	Columnar, binary	Excellent — zero-copy in memory	Small	No	High-speed in-memory transfer between systems
Delta Lake	Parquet + transaction log	Excellent + ACID transactions	Small	No	Lakehouse workloads with upserts and deletes
Excel (.xlsx)	Row-based, zipped XML	Slow, size-limited	Medium	With a spreadsheet app only	Business reporting; convert to CSV/Parquet for analytics

The single most impactful change most analytics teams can make is converting their CSVs to Parquet. A 1GB CSV typically shrinks to roughly 100MB as Parquet. A query that takes 8 seconds to scan the 1GB CSV can scan the Parquet equivalent in under 200ms, because a Parquet reader fetches only the columns the query needs, while a CSV reader has to parse every byte of every row.
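
To see why reading only the needed columns matters, here is a toy sketch using only the Python standard library. It is not Parquet itself: it stores the same made-up table once as CSV text and once as one packed byte blob per column, then counts how many bytes each layout has to touch to sum a single column. The table contents and the helper names (`sum_amount_from_csv`, `sum_amount_from_columns`) are assumptions for illustration.

```python
import csv
import io
import struct

# Hypothetical table: 1,000 rows with three columns (id, name, amount).
rows = [(i, f"customer_{i}", i * 1.5) for i in range(1000)]

# Row-based layout: one CSV text blob; every row carries every column.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["id", "name", "amount"])
writer.writerows(rows)
csv_bytes = buf.getvalue().encode("utf-8")

# Column-based layout: one independent byte blob per column.
columns = {
    "id": struct.pack(f"<{len(rows)}q", *(r[0] for r in rows)),
    "name": "\n".join(r[1] for r in rows).encode("utf-8"),
    "amount": struct.pack(f"<{len(rows)}d", *(r[2] for r in rows)),
}


def sum_amount_from_csv(data: bytes) -> tuple[float, int]:
    """Parse the whole CSV blob; return (sum of amount, bytes touched)."""
    reader = csv.DictReader(io.StringIO(data.decode("utf-8")))
    return sum(float(rec["amount"]) for rec in reader), len(data)


def sum_amount_from_columns(cols: dict[str, bytes]) -> tuple[float, int]:
    """Decode only the 'amount' blob; return (sum of amount, bytes touched)."""
    blob = cols["amount"]
    values = struct.unpack(f"<{len(blob) // 8}d", blob)
    return sum(values), len(blob)


csv_total, csv_read = sum_amount_from_csv(csv_bytes)
col_total, col_read = sum_amount_from_columns(columns)
print(f"CSV layout:      sum={csv_total}, bytes touched={csv_read}")
print(f"Columnar layout: sum={col_total}, bytes touched={col_read}")
```

Both layouts give the same sum, but the columnar version never looks at the `id` or `name` bytes. Real Parquet adds compression, encodings, and row-group statistics on top of this basic idea.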

Three Ways to Load Data in Duck Data Master

1. File Upload

Drag and drop any file — CSV, Parquet, JSON, Arrow, Excel — directly into the Ingest Tab. The analytics engine detects the format automatically, infers the schema, and loads the table. A 500MB Parquet file loads in seconds. Available on all plans.
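
As a rough idea of how automatic format detection can work, the sketch below guesses a format from the first bytes of a file. This is an assumption about one common approach, not a description of how the Ingest Tab is implemented, and `sniff_format` is a hypothetical helper. The magic bytes themselves are standard: Parquet files start with `PAR1`, Arrow IPC files with `ARROW1`, and `.xlsx` files are ZIP archives starting with `PK\x03\x04`.

```python
def sniff_format(head: bytes) -> str:
    """Guess a file format from its leading bytes (hypothetical helper)."""
    if head.startswith(b"PAR1"):
        return "parquet"  # Parquet magic bytes
    if head.startswith(b"ARROW1"):
        return "arrow"    # Arrow IPC file magic bytes
    if head.startswith(b"PK\x03\x04"):
        return "xlsx"     # ZIP container; other ZIP-based files match too
    # Text formats: drop an optional UTF-8 byte order mark and whitespace.
    text = head.removeprefix(b"\xef\xbb\xbf").lstrip()
    if text[:1] in (b"{", b"["):
        return "json"
    return "csv"          # fallback: treat anything else as delimited text


samples = {
    "sales.parquet": b"PAR1\x15\x04\x15",
    "events.json": b'[{"id": 1}]',
    "export.csv": b"id,name\n1,Ada\n",
}
for name, head in samples.items():
    print(name, "->", sniff_format(head))
```

A production detector would also look at the file extension and validate the rest of the file; the byte check alone cannot tell an `.xlsx` from any other ZIP archive.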

2. Load From URL

Paste any public or authenticated URL — an S3 presigned link, a GitHub raw file, a public data portal download — and the engine fetches and loads it directly. No intermediate download to your laptop. Available on all plans.

3. Cloud Storage Connectors (Guru Plan)

Connect directly to your GCS bucket, S3 bucket, or Azure Blob container. Browse files and load tables straight from cloud storage. The analytics engine reads the files in place, with no ETL pipeline and no copy step, so you can query data that lives in your data lake directly from the Query Tab.

Schema Inference and Type Detection

One of the most painful parts of data loading — especially with CSVs — is schema inference. Is that column a date or a string? Is that number an integer or a float? Did the export tool wrap numbers in quotes?

The Duck Data Master Ingest Tab handles this automatically. It samples the file, infers column types, and creates the table with the inferred schema. If a column has mixed types (a common CSV problem), it falls back to VARCHAR so that no values are lost. You can override any inferred type in the schema editor before finalizing the load.
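
The sample-then-fall-back idea can be sketched in a few lines of standard-library Python. This is a simplified illustration, not the Ingest Tab's actual logic: the type names, the 100-row sample size, and the helpers `infer_value_type` and `infer_schema` are all assumptions.

```python
import csv
import io
from datetime import date


def infer_value_type(value: str) -> str:
    """Classify one raw CSV cell as BIGINT, DOUBLE, DATE, or VARCHAR."""
    try:
        int(value)
        return "BIGINT"
    except ValueError:
        pass
    try:
        float(value)
        return "DOUBLE"
    except ValueError:
        pass
    try:
        date.fromisoformat(value)
        return "DATE"
    except ValueError:
        return "VARCHAR"


def infer_schema(csv_text: str, sample_rows: int = 100) -> dict[str, str]:
    """Infer one type per column from the first `sample_rows` data rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen: dict[str, set[str]] = {name: set() for name in reader.fieldnames or []}
    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        for name, value in row.items():
            if name in seen and value:  # empty cells are treated as NULL
                seen[name].add(infer_value_type(value))
    schema = {}
    for name, types in seen.items():
        if len(types) == 1:
            schema[name] = next(iter(types))
        elif types == {"BIGINT", "DOUBLE"}:
            schema[name] = "DOUBLE"   # integers widen to floats in this sketch
        else:
            schema[name] = "VARCHAR"  # mixed or empty column: keep the raw text
    return schema


sample = """id,amount,signup_date,zip
1,19.99,2026-01-15,02139
2,5,2026-02-01,N/A
3,7.5,2026-03-09,94105
"""
print(infer_schema(sample))
```

In this sample the `zip` column mixes numeric-looking values with `N/A`, so it stays VARCHAR, which also preserves the leading zero in `02139`.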

Performance: What Format Choice Actually Means

Operation	CSV (1M rows)	Parquet (1M rows)	Improvement
COUNT(*)	~400ms	2ms	200x faster
SUM on one column	~600ms	8ms	75x faster
GROUP BY + aggregate	~1.2s	38ms	32x faster
Full table scan (SELECT *)	~2s	~1.8s	Similar (reads all columns)

The full table scan is the only case here where format barely matters, because you're reading every column anyway. For typical analytical queries (aggregations, filters, GROUP BY), Parquet wins by a large margin.
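
If you want to check numbers like these on your own data, a small timing harness is enough. The sketch below is an assumption about how one might measure it, not the method used to produce the table: `median_ms` and `speedup` are hypothetical helpers, and the last loop only recomputes the "Improvement" column from the timings quoted above.

```python
import statistics
import time
from typing import Callable


def median_ms(fn: Callable[[], object], repeats: int = 5) -> float:
    """Run `fn` several times and return the median wall-clock time in ms."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


def speedup(slow_ms: float, fast_ms: float) -> float:
    """How many times faster the fast run is than the slow run."""
    return slow_ms / fast_ms


# Timings from the table above, in milliseconds: (CSV, Parquet).
table_ms = {
    "COUNT(*)": (400, 2),
    "SUM on one column": (600, 8),
    "GROUP BY + aggregate": (1200, 38),
    "Full table scan (SELECT *)": (2000, 1800),
}
for operation, (csv_ms, parquet_ms) in table_ms.items():
    print(f"{operation}: {speedup(csv_ms, parquet_ms):.1f}x")

# Stand-in workload; replace the lambda with your own CSV or Parquet query.
print(f"example workload: {median_ms(lambda: sum(range(100_000))):.2f} ms")
```

Using the median rather than a single run reduces the effect of cold caches and background noise on the result.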

Converting CSV to Parquet

If your data arrives as CSV, convert it once and store the Parquet version. In Python:

    import pandas as pd

    # to_parquet needs a Parquet engine installed, e.g. pyarrow or fastparquet
    df = pd.read_csv('your_data.csv')
    df.to_parquet('your_data.parquet', index=False)

Or use the Python NL Mode in Duck Data Master — type "convert my loaded CSV table to Parquet and save it" and it generates and runs the conversion code automatically.

Load your data in minutes

Any format. Any size. Zero pipeline engineering. 3-day free trial.

Questions? support@duckdatamaster.guru