Converting CSV to Excel without breaking dates, numbers, or leading zeros
CSV-to-Excel looks like the simplest conversion possible, until Excel decides your phone numbers are scientific notation and your ZIP codes have lost their leading zeros. A guide to the gotchas and how to avoid them.
CSV-to-Excel is the conversion most people assume is trivial, until they actually do it on real data and discover that Excel has 'helpfully' interpreted their data in ways they didn't ask for. ZIP codes lose their leading zeros. Phone numbers turn into scientific notation. Long strings of digits get rounded. Dates get reinterpreted as dates from the wrong locale.
Excel's auto-detection of column types is well-meaning but aggressive, and the wreckage it leaves in scientific data, identifiers, and international datasets is responsible for a non-trivial fraction of data integrity issues in finance, biology, and analytics. (A widely cited 2016 study found that roughly 20% of genetics papers with supplementary Excel gene lists contained gene names corrupted by auto-conversion.)
This guide explains what's actually happening and how to control it.
The fundamental problem: CSV has no types, Excel insists
CSV is plain text — every value is a string, including 'numbers' like 12345 and 'dates' like 2026-04-27. The CSV file format has no concept of types; it just stores characters separated by commas.
Excel, in contrast, has rich type inference: it tries to figure out whether each value is a number, date, currency, percentage, or text. When you open a CSV in Excel, it scans each column and guesses the type — and once it's guessed, it converts the displayed values into its internal numeric or date representation. This is the source of every CSV-to-Excel surprise.
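Python's standard csv module makes the point concrete: every field comes back as a plain string, leading zeros and all, because the format itself carries no type information.

```python
import csv
import io

# A three-column CSV: on paper, a ZIP code, an amount, and a date.
raw = "zip,amount,date\n00501,1234.56,2026-04-27\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Every field is a str — the CSV layer makes no type decisions at all.
assert all(isinstance(v, str) for v in rows[0].values())
print(rows[0]["zip"])  # 00501 — the leading zero survives as long as it stays text
```

Every surprise in the sections below comes from a consumer layering type guesses on top of this untyped text.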
Common corruption cases
Leading zeros disappear
ZIP codes (00501, 02134), product SKUs (007, 0042), and any identifier stored as digits with leading zeros get interpreted as numbers — and numbers don't have leading zeros, so they're stripped. The 5-digit ZIP 00501 becomes the 3-digit number 501, with no obvious connection to the original.
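The mechanics are easy to reproduce in any language: numeric coercion discards leading zeros, and re-padding afterwards only works if you still know the original width. A minimal sketch (not Excel's actual code path):

```python
zip_code = "00501"
as_number = int(zip_code)       # 501 — numeric types have no leading zeros
restored = str(as_number)       # "501": three digits, not five

# Recovery requires out-of-band knowledge — here, that ZIPs are 5 digits wide.
repadded = restored.zfill(5)    # "00501", correct only because we knew the width
```

In real datasets that width knowledge is usually gone by the time anyone notices, which is why the damage is effectively irreversible.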
Long numbers become scientific notation
Phone numbers, credit card numbers, IMEI numbers, long gene and sample identifiers — anything with 12 or more digits gets displayed in scientific notation (1.23457E+13). Worse, Excel stores at most 15 significant digits, so a 16-digit credit card number has its last digit silently replaced with a zero. The full precision is lost; the original number cannot be recovered from what Excel stores or displays.
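Excel stores numbers as IEEE 754 doubles, the same representation behind Python's float, so the precision cliff is easy to demonstrate: a 20-digit identifier does not survive a round trip through a floating-point number.

```python
ident = "12345678901234567890"      # a 20-digit identifier, stored as text

# Coercing to float (an IEEE 754 double, ~15-16 significant digits)
# and back silently changes the trailing digits.
roundtrip = str(int(float(ident)))
assert roundtrip != ident           # the tail digits are gone for good
```

The leading digits look fine at a glance, which is exactly what makes this corruption hard to spot in a spreadsheet.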
Numeric-looking strings get coerced
Gene names like 'OCT4', 'MARCH1', and 'SEPT2' get auto-converted to dates ('4-Oct', '1-Mar', '2-Sep') because Excel reads them as date abbreviations. This is so common in genetics that several journals require authors to confirm they haven't lost data to Excel auto-conversion.
Decimal separators flip
In European locales, the comma is the decimal separator and the period is the thousands separator. A CSV produced in the US with values like '1,234.56' (one thousand two hundred thirty-four and fifty-six cents) gets misread or broken in a European Excel, depending on the column type; conversely, European CSVs with '1.234,56' break in US Excel. Worse, many European locales use the semicolon rather than the comma as the CSV field separator, so a comma-delimited file may not even split into columns when double-clicked.
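If you know which convention a file uses, normalize it explicitly rather than trusting locale guesses. A minimal sketch for the European convention (the function name is ours, not a library API):

```python
def parse_european(value: str) -> float:
    """Parse '1.234,56' (European) as 1234.56: drop the thousands
    dots, then turn the decimal comma into a decimal point."""
    return float(value.replace(".", "").replace(",", "."))

parse_european("1.234,56")  # 1234.56
parse_european("0,5")       # 0.5
```

The key point is that the convention is declared in code, so the same file parses identically on every machine.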
Date format depends on locale
A CSV value of '03/04/2026' is March 4 in the US (MM/DD/YYYY) and April 3 in most of the rest of the world (DD/MM/YYYY). Excel uses the operating system's regional settings to decide which interpretation to apply. The same CSV file produces different dates in different locales.
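The only safe way to handle such values is an explicit format string, never a locale-dependent default. Python's datetime shows how the same bytes yield two different dates:

```python
from datetime import datetime

raw = "03/04/2026"

us = datetime.strptime(raw, "%m/%d/%Y")    # March 4, 2026
intl = datetime.strptime(raw, "%d/%m/%Y")  # April 3, 2026

# Same string, two valid parses — only the format string disambiguates.
assert (us.month, us.day) == (3, 4)
assert (intl.month, intl.day) == (4, 3)
```

This is also why ISO 8601 dates (2026-04-27) are the safest thing to put in a CSV: there is only one way to read them.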
Solution 1: format columns as text before importing
The robust way to import CSV into Excel without auto-conversion: use Data → Get Data → From Text/CSV (or 'From File' → 'Text/CSV'). This opens the Power Query editor, which lets you specify each column's type explicitly before the data is imported. Mark columns containing identifiers, gene names, or anything with leading zeros as 'Text'. Excel will then preserve the values exactly as they appear in the CSV.
This is the right approach for any CSV with mixed-type data, and especially for any data where you don't fully control the source. Power Query's import dialog is the safety mechanism Excel needs but doesn't apply by default when you double-click a CSV.
Solution 2: prefix problem values in the CSV
If you control the CSV generation, you can prefix problem values with a single quote (') in the source. Excel treats single-quote-prefixed values as text and skips the auto-conversion. The single quote doesn't appear in the displayed value — it's just a hint to Excel.
This works but it's a CSV-meets-Excel hack rather than a proper CSV convention. Other tools that read the same CSV (Pandas, R, BigQuery, etc.) will see the single quote as a literal character and may need to be told to strip it. Use this trick only if Excel is the primary consumer.
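In practice that means the producer writes the prefix and every non-Excel consumer strips it. A sketch of both halves with the standard csv module:

```python
import csv
import io

# Producer side: prefix values Excel would otherwise mangle.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["zip", "gene"])
writer.writerow(["'00501", "'MARCH1"])  # the leading ' is an Excel-only hint

# Consumer side (pandas, R, etc. need the same cleanup): strip the hint.
reader = csv.reader(io.StringIO(buf.getvalue()))
cleaned = [[cell.lstrip("'") for cell in row] for row in reader]
```

The asymmetry is the cost of the hack: one producer, but every other consumer in the pipeline has to know about the stripping step.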
Solution 3: convert directly to XLSX
If you convert your CSV to XLSX before opening, the XLSX file can carry explicit per-column type metadata. The conversion can preserve text-typed columns as text and numeric columns as numbers — without going through Excel's import-time auto-detection. This is fundamentally what MegaConvert's CSV-to-XLSX pipeline does.
The key is for the conversion tool to make conservative type-inference decisions. By default, MegaConvert keeps any column with leading zeros, mixed-format values, or 12+ digit numbers as text. Numeric-looking columns where every value is unambiguously a number become numeric. Date-looking columns become dates only when the format is unambiguous. The result is an XLSX that opens in Excel without auto-conversion drama.
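The decision logic can be sketched in a few lines. This is a hypothetical illustration of the rules described above, not MegaConvert's actual implementation:

```python
import re

def infer_column_type(values: list[str]) -> str:
    """Conservative inference: prefer 'text' unless every value is
    unambiguously numeric; risky patterns force text immediately."""
    for v in values:
        if re.fullmatch(r"0\d+", v):      # leading zero: ZIPs, SKUs
            return "text"
        if re.fullmatch(r"\d{12,}", v):   # 12+ digits: phone/card numbers
            return "text"
    if values and all(re.fullmatch(r"-?\d+(\.\d+)?", v) for v in values):
        return "number"
    return "text"  # when in doubt, text loses nothing

infer_column_type(["00501", "02134"])    # "text" — leading zeros preserved
infer_column_type(["1", "2.5", "-3"])    # "number" — unambiguous
infer_column_type(["4111111111111111"])  # "text" — 16 digits, too long
```

The asymmetry is deliberate: typing a number as text costs a little convenience, while typing text as a number destroys data.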
Solution 4: use UTF-8 with BOM
Excel on Windows defaults to assuming a CSV is in Windows-1252 encoding. If your CSV has accented characters (José, Müller), special symbols (€, £), or non-Latin scripts (русский, 中文), they'll appear corrupted unless you save the CSV with a UTF-8 byte-order mark (BOM) at the start. The BOM tells Excel to treat the file as UTF-8.
Most modern CSV exporters can be configured to write a BOM. If you're producing CSVs that will end up in Excel, write them as 'UTF-8 with BOM' (sometimes labelled 'UTF-8 (with signature)' in tools). It's the single best one-line fix for international-character corruption in Excel.
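In Python, the 'utf-8-sig' codec writes exactly this: the three BOM bytes EF BB BF followed by UTF-8 content. A minimal example (the filename is illustrative):

```python
import csv

# "utf-8-sig" = UTF-8 with a leading byte-order mark, which Excel recognizes.
with open("export.csv", "w", encoding="utf-8-sig", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "amount"])
    writer.writerow(["José Müller", "€100"])

# The first three bytes of the file are the BOM Excel looks for.
with open("export.csv", "rb") as f:
    head = f.read(3)
print(head)  # b'\xef\xbb\xbf'
```

Reading such a file back with encoding="utf-8-sig" strips the BOM again, so the marker never leaks into your data.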
The general rule
Excel is a presentation tool, not a data integrity tool. If your data must be preserved exactly as it appears — for scientific reproducibility, for legal records, for identifiers that downstream systems will join on — don't double-click a CSV in Excel. Use Power Query for explicit type control, or convert to XLSX with conservative type inference, or stay in CSV until you reach a tool that handles types correctly.
Convert CSV to XLSX — preserves text-typed columns, handles UTF-8 properly, doesn't lose your leading zeros.