JSON vs YAML vs TOML: Which Config Format Fits Which Project?
Three formats, three worlds: JSON, YAML, and TOML all solve the same problem -- structured data in plain text -- but they do it in fundamentally different ways. Pick the wrong one for your project and you pay with bugs, merge conflicts, and frustrated teammates. Pick the right one and the config quietly disappears into the background. This comparison shows -- with real examples, the relevant specifications (RFC 8259, YAML 1.2.2, TOML 1.0.0), and a clear decision tree -- when each format is the right call, and what pitfalls each one carries.
Three formats, three origin stories
JSON was extracted from a subset of JavaScript by Douglas Crockford in the early 2000s and first specified in RFC 4627 (2006). The current authoritative spec is RFC 8259 (2017), published alongside ECMA-404. The design idea: a minimal format that any programming language can parse without ambiguity. JSON was never meant for human-written configuration files -- it was always a wire format for machine-to-machine exchange.
YAML started in 2001 as a reaction to XML, designed by Clark Evans, Ingy döt Net, and Oren Ben-Kiki. The name -- originally Yet Another Markup Language, later recursively YAML Ain't Markup Language -- captures its ambition to be human-readable first. The current spec, YAML 1.2.2 (October 2021), is designed as a strict superset of JSON.
TOML (Tom's Obvious Minimal Language) was introduced in 2013 by Tom Preston-Werner, co-founder of GitHub, because he found YAML too complex and JSON too strict for human configuration. Version 1.0.0 shipped in January 2021.
JSON: strict, fast, machine-friendly
JSON has six types: object, array, string, number, boolean, and null. That's it. No comments, no trailing commas, no multi-line strings, no native date or time types. All strings must be double-quoted, all keys too. That strictness is simultaneously the biggest strength and the biggest weakness: machine-side, a working JSON parser fits in a few hundred lines of code; human-side, it's tedious to write and impossible to comment.
That's exactly why JSON dominates where programs talk to other programs: REST APIs, GraphQL responses, NoSQL stores like MongoDB, web storage, browser messaging. Where JSON gets painful: any sizeable hand-edited configuration file. That's why Microsoft introduced JSONC (JSON with Comments) for VS Code settings, and why JSON5 exists as an unofficial extension. Neither is part of the standard, and stock parsers will reject them.
The following example shows a typical database configuration with feature flags in JSON:
{
  "database": {
    "host": "db.example.com",
    "port": 5432,
    "user": "app",
    "password": "s3cret",
    "ssl": true
  },
  "features": [
    "dark_mode",
    "beta_search",
    "new_checkout"
  ],
  "timeout_ms": 5000
}
Notice: no comments allowed, the last list item must not have a trailing comma, every key needs quotes. Edit by hand, forget a comma, and the parser hands back a cryptic error instead of a helpful one.
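To make that strictness concrete, here is a minimal sketch using Python's standard json module. The sample strings and the helper name try_parse are made up for illustration; other languages' parsers report the same problems with different wording.

```python
import json

# Three hypothetical payloads: one valid, two with common hand-editing mistakes.
valid = '{"features": ["dark_mode", "beta_search"], "timeout_ms": 5000}'
trailing_comma = '{"features": ["dark_mode", "beta_search",], "timeout_ms": 5000}'
with_comment = '{"timeout_ms": 5000 // five seconds\n}'


def try_parse(text: str):
    """Return the parsed value, or a short error description on failure."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno point at the character the parser choked on.
        return f"line {err.lineno}, column {err.colno}: {err.msg}"


for sample in (valid, trailing_comma, with_comment):
    print(try_parse(sample))
```

Only the first sample parses. The other two are rejected outright, which is the behavior to expect from any parser that follows RFC 8259.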
YAML: maximally readable, maximally tricky
YAML uses indentation instead of braces, supports comments with #, multi-line strings with | (literal) or > (folded), anchors (&name) and aliases (*name) for reuse, and merge keys (<<:). Lists use a dash, maps use a colon. Once you know YAML you can write a Kubernetes manifest in far fewer keystrokes than JSON would need.
That very convenience is also the source of a long list of infamous bugs. YAML tries to infer types automatically -- and sometimes gets it spectacularly wrong. The 1.2 spec tightened things up, but many tools still ship 1.1 parsers.
The same configuration in YAML, this time with comments:
# Database connection
database:
  host: db.example.com
  port: 5432
  user: app
  password: s3cret
  ssl: true
# Active feature flags
features:
  - dark_mode
  - beta_search
  - new_checkout
timeout_ms: 5000
The classic YAML pitfalls, each one of which has shown up in real production outages:
- The Norway problem: in YAML 1.1, NO parses as boolean false. A list of country codes such as countries: [DE, FR, NO, IT] turns into ["DE", "FR", false, "IT"]. Fix: quote strings explicitly or use a YAML 1.2 parser.
- Version numbers as numbers: version: 1.10 parses as the float 1.1, not as a string. Storing two-part version numbers unquoted silently loses digits.
- Tabs are forbidden: YAML allows only spaces for indentation. An editor that silently inserts tabs breaks the file -- the error message is usually useless.
- Trailing whitespace and encoding: an invisible space at end of line can silently truncate block strings. CRLF vs LF differs across parsers.
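The first two pitfalls both come from implicit typing of unquoted scalars. The following is a deliberately simplified sketch of that mechanism, not a YAML parser: the function name resolve_scalar, the word list, and the regular expressions are assumptions chosen to mimic YAML 1.1-style behavior for a few common cases.

```python
import re

# Boolean words that YAML 1.1-era parsers commonly accept (simplified list).
_BOOL_WORDS = {"yes": True, "no": False, "true": True,
               "false": False, "on": True, "off": False}
# Each word is accepted in lower case, Capitalized, and UPPER CASE.
YAML11_BOOLS = {
    spelling: value
    for word, value in _BOOL_WORDS.items()
    for spelling in (word, word.capitalize(), word.upper())
}
INT_RE = re.compile(r"[-+]?\d+")
FLOAT_RE = re.compile(r"[-+]?\d+\.\d*")


def resolve_scalar(token: str):
    """Guess a type for one scalar, roughly the way a YAML 1.1 parser would."""
    # Quoted scalars are never reinterpreted: they stay strings.
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "\"'":
        return token[1:-1]
    if token in YAML11_BOOLS:
        return YAML11_BOOLS[token]
    if INT_RE.fullmatch(token):
        return int(token)
    if FLOAT_RE.fullmatch(token):
        return float(token)
    return token


print([resolve_scalar(t) for t in ["DE", "FR", "NO", "IT"]])
print(resolve_scalar("1.10"), resolve_scalar('"1.10"'))
```

The unquoted NO and 1.10 lose their original meaning, while the quoted forms survive as strings. That is why quoting is the usual defensive fix.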
TOML: explicit, type-safe, unambiguous
TOML was designed with a clear goal: a minimal, readable, unambiguously parseable configuration format without YAML's pitfalls. Keys can be written without quotes, strings are always explicit, sections live inside square brackets. TOML has strong types: integer, float, boolean, string, datetime (RFC 3339), array, and table. Date and time are native -- not the case in JSON or YAML 1.2.
Particularly useful: arrays of tables with double brackets [[users]] -- this lets you write lists of objects without indentation blowing up. TOML excels for flat to medium-nested configuration. At very deep nesting it gets ugly because the full section path must be repeated on every block.
The same configuration in TOML:
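# Top-level keys go before the first table header;
# anything after [database] would belong to that table.
timeout_ms = 5000
# Active feature flags
features = [
  "dark_mode",
  "beta_search",
  "new_checkout",
]
# Database connection
[database]
host = "db.example.com"
port = 5432
user = "app"
password = "s3cret"
ssl = true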
Notice the differences: trailing commas are allowed in TOML arrays (like Python), comments start with #, strings are always quoted, section headers make the database block visually obvious. This is exactly the format that won in Rust (Cargo.toml) and Python (pyproject.toml) -- not because it's trendy, but because at medium-size configs it tends to produce the fewest bugs.
Decision tree: which format when?
The choice is not a taste war, it's a technical decision. Four rules of thumb that hold up in practice:
- Machine-to-machine API or HTTP data exchange: always JSON. It's the lowest common denominator, practically every language ships a parser for it, and performance trumps comfort when no human ever sees the payload.
- DevOps and infrastructure (Kubernetes, Helm, Ansible, GitHub Actions, Docker Compose): YAML. Not because it's perfect, but because the whole tooling ecosystem lives there. Don't fight it -- bring linters and schema validation.
- Application configuration edited by humans: TOML. Clearly readable, hard to write wrong, native datetime, boolean, and integer types. Ideal for CLI tools, desktop apps, build systems.
- Follow the ecosystem: Rust project -> TOML. Helm chart -> YAML. JavaScript tool -> JSON or TOML. Don't reopen every decision -- consistent tooling outweighs personal preference.
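The four rules above can be written down as a tiny lookup, shown here as a sketch. The function name recommend_format, its parameters, and the precedence (ecosystem first, then machine-to-machine, then hand-edited config) are one possible reading of the rules, not a definitive algorithm.

```python
from typing import Optional

# Ecosystems named in this article and the format their tooling expects.
ECOSYSTEM_DEFAULTS = {
    "rust": "TOML",
    "python": "TOML",
    "kubernetes": "YAML",
    "helm": "YAML",
    "ansible": "YAML",
    "github actions": "YAML",
    "docker compose": "YAML",
}


def recommend_format(ecosystem: Optional[str] = None,
                     machine_to_machine: bool = False) -> str:
    """Apply the rules of thumb: ecosystem first, then audience."""
    if ecosystem is not None:
        known = ECOSYSTEM_DEFAULTS.get(ecosystem.strip().lower())
        if known is not None:
            return known  # consistent tooling outweighs preference
    if machine_to_machine:
        return "JSON"  # payloads no human ever reads
    return "TOML"  # application config edited by humans


print(recommend_format(ecosystem="Helm"))
print(recommend_format(machine_to_machine=True))
print(recommend_format())
```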
Performance: why JSON wins when it counts
Parsing is not a polite contest. Benchmarks of widely used parsers in Python, Go, and Rust typically show JSON parsers 5 to 10 times faster than YAML parsers on equally sized documents, though the exact factor depends on the library and the document. The reason: JSON has only six types and no type inference, no anchors, no multi-line block scalars. TOML tends to sit in the middle -- noticeably faster than YAML, slightly slower than JSON.
Practically: for a 200-line service config read once at boot, performance is irrelevant. For an API parsing JSON bodies across millions of requests per day, 5x to 10x is a hard business case -- which is exactly why nobody proposes YAML as an API format. Using a tool like the YAML-to-JSON converter to serialize YAML configs once and ship JSON at runtime pushes that overhead out of the hot path.
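The convert-once idea can be sketched as a build step plus a runtime loader. This is a minimal illustration under assumptions: compile_config and load_runtime_config are hypothetical names, and the dict stands in for whatever a YAML library returned when the source file was parsed at build time.

```python
import json
import tempfile
from pathlib import Path


def compile_config(parsed: dict, target: Path) -> None:
    """Build step: write an already-parsed config as compact JSON."""
    text = json.dumps(parsed, separators=(",", ":"), sort_keys=True)
    target.write_text(text, encoding="utf-8")


def load_runtime_config(path: Path) -> dict:
    """Runtime step: only the JSON parser runs in the hot path."""
    return json.loads(path.read_text(encoding="utf-8"))


# Stand-in for the result of parsing the YAML source at build time.
parsed_at_build_time = {
    "database": {"host": "db.example.com", "port": 5432, "ssl": True},
    "features": ["dark_mode", "beta_search", "new_checkout"],
    "timeout_ms": 5000,
}

with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "config.json"
    compile_config(parsed_at_build_time, artifact)
    runtime_config = load_runtime_config(artifact)

print(runtime_config["database"]["port"])
```

Humans keep editing the commented source file, while the service only ever reads the generated JSON artifact.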
Schema validation: when you want to check what you read
JSON Schema (currently Draft 2020-12) is the most mature standard for structural validation. Implementations exist in every major language, and editor integration (VS Code, JetBrains) delivers autocomplete for complex configs. YAML typically reuses the same schemas because a parsed YAML document maps onto the same data model as JSON -- validation happens after parsing.
TOML currently has no widely adopted schema standard. There is taplo (a TOML toolchain with schema support via JSON Schema), but it's not a replacement for a versioned, normative schema format like JSON has. For configs that must be structurally validated (for example by a CI system), that's a real downside -- many projects work around it by converting TOML to JSON at build time and validating against a JSON Schema.
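To illustrate validation after parsing, the sketch below checks a converted config against a schema. It is a hand-rolled stand-in that understands only a tiny subset of JSON Schema keywords (type, required, properties, items); a real project would use a full JSON Schema validator. The schema and both sample documents are made up.

```python
import json

TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


def validate(value, schema, path="$"):
    """Return a list of error strings; an empty list means valid."""
    expected = schema.get("type")
    if expected and not TYPE_CHECKS[expected](value):
        return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: missing required key")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors.extend(validate(value[key], sub, f"{path}.{key}"))
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


SCHEMA = {
    "type": "object",
    "required": ["database"],
    "properties": {
        "database": {
            "type": "object",
            "required": ["host", "port"],
            "properties": {"host": {"type": "string"},
                           "port": {"type": "integer"}},
        },
        "features": {"type": "array", "items": {"type": "string"}},
    },
}

# JSON as it might come out of a build-time conversion of the config file.
good = json.loads('{"database": {"host": "db.example.com", "port": 5432}}')
bad = json.loads('{"database": {"host": "db.example.com", "port": "5432"}}')

print(validate(good, SCHEMA))
print(validate(bad, SCHEMA))
```

Because the check runs on plain dicts and lists, the same schema works no matter whether the source file was JSON, YAML, or TOML.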
Tooling: a short field guide
Whichever format you choose -- the right toolchain saves hours. These four are the established standard:
- jq is the Swiss Army knife for JSON on the command line. Filter, transform, output raw text with jq -r. Once you learn jq, you never write a shell script that greps JSON again.
- yq is the YAML counterpart, with similar syntax and JSON-convertible output. Two implementations exist (mikefarah's yq and kislyuk's yq) -- syntax differs slightly, mikefarah's yq covers most projects.
- taplo formats, validates, and beautifies TOML. Schema support, LSP integration, CLI -- a no-brainer for any TOML project beyond a single Cargo.toml.
- For quick in-browser conversion with no local install: the YAML-JSON converter, the JSON formatter, and the CSV-to-JSON converter on CalcSI all run fully client-side -- nothing leaves your browser.
Frequently asked questions
Why does Kubernetes use YAML if it has so many pitfalls?
Historical and pragmatic. When Kubernetes started in 2014/15, YAML was already widespread in infrastructure tooling (Ansible playbooks, Puppet's Hiera data). Plus Kubernetes config is deeply nested -- a 500-line YAML Helm chart would be around 800 lines as JSON. The community fix is not to change format, but to build tooling around it: schemas (kubectl explain, kube-linter), templates (Helm, Kustomize), validators (conftest, datree). If you write Kubernetes YAML, learn the standard pitfalls -- the YAML-JSON converter helps with quick cross-checks on a manifest.
Can I use comments in JSON?
Not in the standard. RFC 8259 has no comment syntax, period. Two popular extensions exist: JSONC (JSON with Comments, introduced by VS Code) allows // and /* */. JSON5 goes further with trailing commas, unquoted keys, hex literals. Both are useful but not official JSON -- a language's built-in parser will reject them. If you need comments and can choose the format, pick TOML or YAML instead.
Why isn't the old INI format good enough?
INI has no standardized spec -- each parser handles edge cases differently. More importantly: INI can't nest. A list of databases each with host, port, and credentials is not cleanly representable in INI; you end up with crutches like db1_host, db1_port, db2_host. That's precisely the gap TOML fills, often described as INI with types and a spec -- unambiguous, nestable, with clear data types, without YAML's complexity.
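The missing types and missing nesting are easy to see with Python's configparser, which implements one INI dialect among many. The section names and values below are invented for illustration.

```python
import configparser

# Two databases, expressed the only way INI allows: as separate flat sections.
INI_TEXT = """
[db1]
host = db1.example.com
port = 5432

[db2]
host = db2.example.com
port = 5433
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# Every value comes back as a string; the caller must know the intended type.
port = parser["db1"]["port"]
print(repr(port))

# Conversion is an explicit, per-key decision made in application code.
print(parser["db1"].getint("port"))

# "List of databases" exists only as a naming convention on section names.
databases = [name for name in parser.sections() if name.startswith("db")]
print(databases)
```

In TOML the same data would be an array of tables with an integer port, so neither the type conversion nor the naming convention has to live in application code.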