Foundations | January 15, 2026 | 6 min read | By Christopher Floied

Seven common file conversion mistakes (and how to avoid each one)

Lossy double-encoding. Ignoring transparency. Forgetting OCR. Trusting auto-detected types. The most common file conversion mistakes and the simple practices that prevent them.

File conversion looks simple in the moment — pick the input, pick the output, click convert. But there are a handful of recurring mistakes that turn what should be a clean operation into corrupted data, lost quality, or unexpected output. Here are the seven most common ones, why they happen, and how to avoid each.

1. Re-encoding lossy formats

MP3 to MP3, JPEG to JPEG, AAC to MP3: every time you re-encode a lossy file, you throw away a little more data. The first encode is invisible to most listeners or viewers; by the third or fourth, the artifacts become obvious. This cumulative degradation is sometimes called 'generation loss' (a term borrowed from VHS-era video copying), and it's one of the most common quality-degradation mistakes in conversion workflows.

Avoid by: keeping a lossless master (WAV, FLAC, PNG, TIFF) and only encoding to lossy formats once, at the moment of distribution. If you need to make changes, edit the master and re-encode from scratch — don't edit the lossy distribution copy.
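A conversion script can guard against this automatically. The sketch below is a minimal illustration rather than a complete solution: the function name and the short list of lossy extensions are assumptions chosen for the example, and container formats that can hold either lossy or lossless streams are deliberately left out.

```python
from pathlib import Path

# Illustrative, incomplete set of extensions that are always lossy.
# Containers that may hold lossless data (for example .m4a or .webp)
# are intentionally omitted from this sketch.
LOSSY_EXTENSIONS = {".mp3", ".aac", ".jpg", ".jpeg"}


def is_lossy_reencode(source: str, target: str) -> bool:
    """Return True when both the source and the target are lossy formats."""
    src_ext = Path(source).suffix.lower()
    dst_ext = Path(target).suffix.lower()
    return src_ext in LOSSY_EXTENSIONS and dst_ext in LOSSY_EXTENSIONS


print(is_lossy_reencode("podcast.mp3", "podcast.aac"))  # True
print(is_lossy_reencode("master.flac", "release.mp3"))  # False
```

A pipeline could refuse to run, or at least print a warning, whenever this check returns True and a lossless master is available.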

2. Converting scanned PDFs without OCR

Trying to convert a scanned PDF (an image of pages saved as PDF) to DOCX without running OCR first results in a Word document containing pictures of pages, not editable text. The converter has nothing to work with — there's no text in the source, only pixels.

Avoid by: checking whether your PDF has selectable text before converting. If text selection works in your PDF viewer, you're good. If selecting grabs whole-page rectangles, you need OCR first. Use Adobe Acrobat, Apple Preview, or a free tool like OCRmyPDF to add a text layer, then convert.
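The same check can be scripted for batches of files. The following is a rough sketch under stated assumptions: `page_texts` is whatever per-page text your existing PDF library extracts, the 20-character threshold is an arbitrary cutoff picked for illustration, and both function names are hypothetical. The command it builds uses OCRmyPDF's `--skip-text` option, which leaves pages that already contain text untouched.

```python
def looks_scanned(page_texts: list[str], min_chars_per_page: int = 20) -> bool:
    """Guess whether a PDF is image-only, based on extracted per-page text.

    A page counts as "sparse" when it yields fewer than min_chars_per_page
    characters. The document is treated as scanned when more than half of
    its pages are sparse, or when no pages were extracted at all.
    """
    if not page_texts:
        return True
    sparse = sum(1 for text in page_texts if len(text.strip()) < min_chars_per_page)
    return sparse > len(page_texts) / 2


def ocr_command(src: str, dst: str) -> list[str]:
    """Build an OCRmyPDF command line that adds a text layer to src."""
    return ["ocrmypdf", "--skip-text", src, dst]


print(looks_scanned(["", "   "]))
print(ocr_command("scan.pdf", "scan-ocr.pdf"))
```

If OCRmyPDF is installed, the returned list can be passed to `subprocess.run(..., check=True)` before handing the output file to the DOCX converter.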

3. Losing transparency by converting to the wrong format

Converting a transparent PNG to JPEG flattens all transparent pixels against a solid background (typically white). The result is a JPEG with a hard-edged white rectangle around what used to be a soft, anti-aliased shape. The original transparency is lost and can't be recovered without redoing the work.

Avoid by: checking whether your image has transparency before choosing a target format. If transparency matters, target a format that supports it: PNG, WebP, AVIF, TIFF. JPEG has no transparency at all, and GIF supports only fully transparent or fully opaque pixels, so neither can hold partial transparency.
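For PNG sources, one way to run that check without an image library is to read the file's chunk headers. This is a minimal sketch, not a full PNG parser: the function name is made up for the example, it does not verify checksums, and it only reports that the file is able to carry transparency (an alpha channel or a tRNS chunk), not that any pixel is actually transparent.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_may_have_transparency(data: bytes) -> bool:
    """Return True if PNG bytes declare an alpha channel or a tRNS chunk."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk is: 4-byte length, 4-byte type, body, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"IHDR" and len(body) >= 10:
            color_type = body[9]
            if color_type in (4, 6):  # grayscale+alpha or RGBA
                return True
        elif chunk_type == b"tRNS":  # transparency for palette/gray/RGB images
            return True
        elif chunk_type in (b"IDAT", b"IEND"):
            break  # tRNS, if present, appears before the image data
        pos += 12 + length
    return False
```

Typical use would be `png_may_have_transparency(Path("logo.png").read_bytes())`, with a True result steering the conversion away from JPEG.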

4. Letting Excel auto-convert your CSV columns

Open a CSV in Excel and Excel will 'helpfully' guess a type for each column. ZIP codes lose their leading zeros, long phone numbers are displayed in scientific notation, gene names such as MARCH1 or SEPT2 become dates, and numeric identifiers longer than 15 digits are silently rounded. The data in the resulting XLSX is corrupted in subtle ways that can persist through downstream analysis without anyone noticing.

Avoid by: using Excel's Data → From Text/CSV import (instead of double-clicking the CSV), which lets you set explicit column types. Or convert CSV to XLSX with a tool that preserves text-typed columns by default. Never trust Excel's automatic type inference for data with identifiers, scientific values, or anything where leading zeros or precision matter.
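To see why text-typed columns matter, here is a small sketch using Python's standard `csv` module, which returns every cell as a string and leaves typing to you. The sample row and column names are invented for the example; the last two lines show what naive numeric inference would do to the same cells.

```python
import csv
import io

# Made-up sample data: a ZIP code with a leading zero and a 20-digit identifier.
raw = "zip,gene,account_id\n00501,MARCH1,12345678901234567890\n"


def read_csv_as_text(text: str) -> list[dict[str, str]]:
    """Parse CSV text, keeping every cell as the string that was in the file."""
    return list(csv.DictReader(io.StringIO(text)))


row = read_csv_as_text(raw)[0]
print(row["zip"])   # 00501 (leading zeros intact)
print(row["gene"])  # MARCH1 (not a date)

# What automatic numeric conversion would do instead:
print(int(row["zip"]))                                          # 501
print(int(float(row["account_id"])) == int(row["account_id"]))  # False
```

Keeping identifiers as strings from the first read, and converting only the columns that are genuinely numeric, avoids both kinds of damage.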

5. Forgetting to embed fonts before exporting to PDF

If your DOCX or PPTX uses a custom font that isn't embedded, the resulting PDF may use a substituted font on devices that don't have the original installed. Layout shifts; the document looks different than intended. The fix happens at the source-document level, not at the conversion step.

Avoid by: enabling 'Embed fonts in the file' in your authoring tool's options before generating the PDF. In Microsoft Word: File → Options → Save → tick 'Embed fonts in the file'. In PowerPoint: same path. The embedded fonts increase the file size slightly but make the PDF self-contained.

6. Targeting overly modern formats without fallbacks

AVIF and WebP are excellent image formats: they typically produce smaller files than JPEG at comparable visual quality. But they don't work in every context. Ship AVIF-only images and your audience on slightly older browsers, email clients, or specialised tools sees broken images. Modern formats need fallbacks in production.

Avoid by: using HTML's <picture> element with multiple <source> entries: AVIF first, WebP second, JPEG fallback. The browser picks whichever format it supports. For non-web contexts (email, print, legacy systems), stick with universally-supported formats: JPEG, PNG, PDF, MP3.
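The fallback chain described above can be generated by a small helper. This is a sketch only: the function name is hypothetical, and it assumes the same image has already been exported as `.avif`, `.webp`, and `.jpg` under one base path.

```python
from html import escape


def picture_markup(base: str, alt: str) -> str:
    """Build a <picture> element: AVIF first, WebP second, JPEG fallback.

    Assumes base + ".avif", base + ".webp" and base + ".jpg" all exist.
    """
    safe_base = escape(base)
    return (
        "<picture>\n"
        f'  <source srcset="{safe_base}.avif" type="image/avif">\n'
        f'  <source srcset="{safe_base}.webp" type="image/webp">\n'
        f'  <img src="{safe_base}.jpg" alt="{escape(alt)}">\n'
        "</picture>"
    )


print(picture_markup("/img/hero", "Mountain lake at sunrise"))
```

The order of the `<source>` entries matters: the browser uses the first one whose type it supports, and anything that understands neither falls through to the plain `<img>`.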

7. Trusting that 'lossless format = same file'

Converting between lossless formats preserves the data exactly, but doesn't preserve the file size or the metadata. A PNG and a TIFF of the same image have identical pixels but very different file sizes. A FLAC and a WAV have identical audio but different sizes. Metadata fields supported by one format but not the other are lost. The visible content is preserved; the wrapper changes meaningfully.

Avoid by: understanding that 'lossless' means 'no data loss', not 'identical file'. If file size matters, choose the lossless format with the best compression for your content type. If specific metadata matters (EXIF GPS data, ID3 tags, custom XMP fields), check that the target format preserves it.
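The distinction is easy to demonstrate with general-purpose lossless compressors standing in for image or audio formats. In this sketch the payload is synthetic data (an assumption for illustration, not real pixels or samples); the point is that the stored bytes differ while the decoded content is identical.

```python
import hashlib
import lzma
import zlib

# Synthetic stand-in for decoded pixels or audio samples.
payload = bytes(range(256)) * 64

packed_zlib = zlib.compress(payload, 9)
packed_lzma = lzma.compress(payload)


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of data."""
    return hashlib.sha256(data).hexdigest()


# The files on disk differ in bytes and in size...
print(len(payload), len(packed_zlib), len(packed_lzma))
print(digest(packed_zlib) == digest(packed_lzma))  # False

# ...but the decoded content is bit-for-bit identical.
print(digest(zlib.decompress(packed_zlib)) == digest(lzma.decompress(packed_lzma)))  # True
```

When verifying a lossless conversion, compare checksums of the decoded content, not of the files themselves, and check metadata separately.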

The general principle

Most conversion mistakes come from misunderstanding what the conversion is actually doing. Lossy formats lose data; lossless ones don't. Scanned content needs OCR; digital content doesn't. Excel auto-detects types; CSV doesn't have any. Modern formats are smaller; older ones are more compatible. Understanding these distinctions ahead of time prevents nearly every common conversion error.
