// FEATURE DEEP DIVE · MAY 2026

How to Load CSV, Parquet, and JSON Into Cloud Analytics Without a Data Engineer

Scott Baker — Founder, Duck Data Master
TL;DR: File format choice has a massive impact on query performance. Parquet is 10–50x faster than CSV for analytics. The Duck Data Master Ingest Tab loads any format — CSV, Parquet, JSON, Arrow, Delta — via file upload, URL, or direct cloud storage connection, with zero pipeline engineering required.

The first question any analytics workflow has to answer is: how do I get my data in? For most teams, the answer is a mess — a CSV export from the CRM, a Parquet file from the data lake, a JSON dump from an API, three different formats, three different loading workflows, and a data engineer in the loop for every one of them.

It doesn't have to be that way. Here's what you actually need to know about data formats and loading — and how to do it yourself.

File Format Comparison: What You're Actually Choosing

| Format | Type | Analytics performance | File size | Human-readable? | Best for |
|---|---|---|---|---|---|
| Parquet | Columnar, binary | Excellent: 10–50x faster than CSV | Very small (4–10x compressed vs CSV) | No | Analytics workloads; use this whenever possible |
| CSV | Row-based, text | Slow: reads every column even if you only need one | Large | Yes | Excel exports, data sharing with non-technical users |
| JSON | Row-based, text | Slow: nested structures add parsing overhead | Large | Yes | API responses, config data, semi-structured records |
| Arrow (IPC) | Columnar, binary | Excellent: zero-copy in memory | Small | No | High-speed in-memory transfer between systems |
| Delta Lake | Parquet + transaction log | Excellent, plus ACID transactions | Small | No | Lakehouse workloads with upserts and deletes |
| Excel (.xlsx) | Row-based, binary | Slow, size-limited | Medium | With Excel only | Business reporting; convert to CSV/Parquet for analytics |

The single most impactful thing most analytics teams can do is convert their CSVs to Parquet. A 1 GB CSV typically becomes about 100 MB of Parquet. A query that takes 8 seconds to scan the 1 GB CSV can scan the Parquet equivalent in under 200 ms, because Parquet reads only the columns the query needs, while a CSV reader has to parse every byte of every row.
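
To make the column-pruning argument concrete, here is a toy sketch in plain Python. It is not Parquet itself: it simply stores the same made-up table once as row-oriented CSV text and once as one blob per column, then counts how many bytes a "sum one column" query has to touch in each layout. The table contents and helper names are invented for illustration.

```python
import csv
import io

# Made-up sample table: 1,000 orders with three columns.
COLUMNS = ["order_id", "customer", "amount"]
rows = [
    {"order_id": i, "customer": f"customer-{i % 50}", "amount": i * 1.5}
    for i in range(1000)
]

# Row layout: a single CSV blob. Any query has to parse every line.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=COLUMNS)
writer.writeheader()
writer.writerows(rows)
csv_blob = buf.getvalue().encode("utf-8")

# Columnar layout: one blob per column (a toy stand-in for column chunks).
column_blobs = {
    name: "\n".join(str(r[name]) for r in rows).encode("utf-8")
    for name in COLUMNS
}

def sum_amount_from_csv(blob):
    """Return (sum of amount, bytes read) for the row-oriented layout."""
    reader = csv.DictReader(io.StringIO(blob.decode("utf-8")))
    return sum(float(r["amount"]) for r in reader), len(blob)

def sum_amount_from_columns(blobs):
    """Return (sum of amount, bytes read) when only one column is touched."""
    blob = blobs["amount"]
    return sum(float(v) for v in blob.decode("utf-8").split("\n")), len(blob)

csv_total, csv_bytes = sum_amount_from_csv(csv_blob)
col_total, col_bytes = sum_amount_from_columns(column_blobs)
print(f"row layout read {csv_bytes} bytes; columnar layout read {col_bytes} bytes")
```

Both paths produce the same total, but the columnar path reads only the `amount` blob. Real Parquet adds compression, encodings, and per-column statistics on top of this idea, so the gap on real data is usually wider than this toy suggests.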

Three Ways to Load Data in Duck Data Master

1. File Upload

Drag and drop any file — CSV, Parquet, JSON, Arrow, Excel — directly into the Ingest Tab. The analytics engine detects the format automatically, infers the schema, and loads the table. A 500MB Parquet file loads in seconds. Available on all plans.
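
If you are curious how format detection can work, here is a minimal sketch based on leading "magic bytes". It is an illustration under stated assumptions, not the Ingest Tab's actual logic: `sniff_format` is a hypothetical helper, and the CSV fallback is a simplification. The signatures themselves are standard: Parquet files start with `PAR1`, Arrow IPC files with `ARROW1`, `.xlsx` files are ZIP archives, and a Delta table is a directory containing `_delta_log`.

```python
from pathlib import Path

# Leading byte signatures for the binary formats discussed in this article.
MAGIC_PREFIXES = [
    (b"PAR1", "parquet"),      # Parquet files begin (and end) with PAR1
    (b"ARROW1", "arrow"),      # Arrow IPC file format (the stream format has no magic)
    (b"PK\x03\x04", "excel"),  # .xlsx is a ZIP archive; other ZIP files match too
]

def sniff_format(path):
    """Guess a format from the first bytes of a file (hypothetical helper)."""
    p = Path(path)
    # A Delta table is a directory of Parquet files plus a _delta_log folder.
    if p.is_dir():
        return "delta" if (p / "_delta_log").is_dir() else "unknown"
    with open(p, "rb") as f:
        head = f.read(64)
    for prefix, name in MAGIC_PREFIXES:
        if head.startswith(prefix):
            return name
    # Text formats: JSON documents start with an object or an array.
    if head.lstrip()[:1] in (b"{", b"["):
        return "json"
    # Simplification: treat any other text as delimited data.
    return "csv"
```

Checking content rather than the file extension means a Parquet file that was renamed to `.csv` is still recognized as Parquet.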

2. Load From URL

Paste any public or authenticated URL — an S3 presigned link, a GitHub raw file, a public data portal download — and the engine fetches and loads it directly. No intermediate download to your laptop. Available on all plans.

3. Cloud Storage Connectors (Guru Plan)

Connect directly to your GCS bucket, S3 bucket, or Azure Blob container, then browse files and load tables without moving the data. The analytics engine reads from your cloud storage in place: no ETL pipeline, no copy. Query files that live in your data lake directly from the Query Tab.

Schema Inference and Type Detection

One of the most painful parts of data loading — especially with CSVs — is schema inference. Is that column a date or a string? Is that number an integer or a float? Did the export tool wrap numbers in quotes?

The Duck Data Master Ingest Tab handles this automatically. It samples the file, infers column types, and creates the table with the correct schema. If a column has mixed types (a common CSV problem), it casts conservatively to VARCHAR to avoid data loss. You can override any inferred type from the schema editor before finalizing the load.
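
The sample-then-infer approach described above can be sketched in a few lines of standard-library Python. Treat this as an illustration, not the Ingest Tab's implementation: the type names, the `classify` and `infer_schema` helpers, the 1,000-row sample size, and the integer-to-double widening rule are all assumptions made for the example.

```python
import csv
import io
import re
from datetime import date

INT_RE = re.compile(r"[+-]?\d+")

def classify(value):
    """Classify one raw CSV cell as INTEGER, DOUBLE, DATE, or VARCHAR."""
    if INT_RE.fullmatch(value):
        return "INTEGER"
    try:
        float(value)
        return "DOUBLE"
    except ValueError:
        pass
    try:
        date.fromisoformat(value)  # accepts YYYY-MM-DD
        return "DATE"
    except ValueError:
        return "VARCHAR"

def infer_schema(csv_text, sample_rows=1000):
    """Infer a column -> type mapping from the first `sample_rows` rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {name: set() for name in (reader.fieldnames or [])}
    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        for name in seen:
            value = row.get(name)
            if value:  # empty cells are treated as NULL and do not vote
                seen[name].add(classify(value))
    schema = {}
    for name, types in seen.items():
        if len(types) == 1:
            schema[name] = next(iter(types))
        elif types == {"INTEGER", "DOUBLE"}:
            schema[name] = "DOUBLE"   # assumed widening rule for mixed numerics
        else:
            schema[name] = "VARCHAR"  # mixed or all-empty column: conservative fallback
    return schema

sample = """id,signup_date,amount,zip
1,2024-01-15,19.99,02139
2,2024-02-01,5,N/A
3,,7.5,10001
"""
print(infer_schema(sample))
```

In this sample the `zip` column mixes digits with `N/A`, so it falls back to VARCHAR, which also happens to preserve the leading zero in `02139`. Because the `csv` module strips quoting before classification, a quoted `"42"` is still seen as an integer.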

Performance: What Format Choice Actually Means

| Operation | CSV (1M rows) | Parquet (1M rows) | Improvement |
|---|---|---|---|
| COUNT(*) | ~400ms | 2ms | 200x faster |
| SUM on one column | ~600ms | 8ms | 75x faster |
| GROUP BY + aggregate | ~1.2s | 38ms | 32x faster |
| Full table scan (SELECT *) | ~2s | ~1.8s | Similar (reads all columns) |

The full table scan is the only case where format barely matters — because you're reading every column anyway. For every real analytical query (aggregations, filters, GROUP BY), Parquet wins by a large margin.

Converting CSV to Parquet

If your data arrives as CSV, convert it once and store the Parquet version. In Python:

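# Requires pandas plus a Parquet engine such as pyarrow (pip install pandas pyarrow)
import pandas as pd

df = pd.read_csv('your_data.csv')  # loads the whole CSV into memory
df.to_parquet('your_data.parquet', index=False)  # index=False omits the pandas row index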

Or use the Python NL Mode in Duck Data Master — type "convert my loaded CSV table to Parquet and save it" and it generates and runs the conversion code automatically.

Load your data in minutes

Any format. Any size. Zero pipeline engineering. 3-day free trial.

Questions? support@duckdatamaster.guru