
UnicodeDecodeError

Python Programming Language

Severity: Moderate

What Does This Error Mean?

A UnicodeDecodeError means Python tried to turn bytes from a file or other data source into text but found a byte sequence that is not valid in the encoding it was using (the one you passed, or the system default if you passed none). Text files store characters as numbers (bytes), and an encoding is the rule for how those numbers map to characters. Different encodings map the same bytes differently. If the encoding Python uses does not match the one the file was actually written in, decoding either fails with this error or, with a more permissive encoding, produces garbled characters.
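A minimal sketch of the mismatch, using an illustrative word: "café" saved as cp1252 stores 'é' as the single byte 0xE9, which is not valid UTF-8 at that position. The same bytes decode cleanly with the encoding they were actually written in.

```python
# Encode "café" the way a cp1252 (Western Windows) editor would save it.
raw = "café".encode("cp1252")
print(raw)  # b'caf\xe9'

try:
    raw.decode("utf-8")  # wrong encoding for these bytes
except UnicodeDecodeError as exc:
    # The exception records which codec failed and the position of the bad byte.
    print(exc.encoding, exc.start, exc.reason)

# Decoding with the encoding the bytes were actually written in succeeds.
print(raw.decode("cp1252"))  # café
```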

Affected Models

  • Python 3.x
  • Python 2.x

Common Causes

  • Opening a file with the wrong encoding: the file is cp1252 or Latin-1 but you opened it as UTF-8, or it is UTF-8 but you opened it as ASCII
  • Opening a file created on a different operating system: Python's default encoding on Windows is often cp1252, while on macOS and Linux it is usually UTF-8
  • Downloading data from the internet where the server's actual encoding differs from the one you assumed (or the one it declared)
  • Reading a binary file (like an image) as if it were text
  • A file containing special characters like é, ñ, or Chinese/Japanese characters that was saved in a non-UTF-8 encoding
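A sketch of the binary-file cause, assuming a throwaway temporary file: the standard 8-byte PNG signature starts with 0x89, which can never begin a UTF-8 character, so reading it in text mode fails. Binary mode ('rb') does no decoding and returns the raw bytes.

```python
import os
import tempfile

# The standard 8-byte PNG signature; 0x89 is not a valid first byte in UTF-8.
png_signature = b"\x89PNG\r\n\x1a\n"

with tempfile.NamedTemporaryFile(delete=False, suffix=".png") as tmp:
    tmp.write(png_signature)
    path = tmp.name

text_mode_failed = False
try:
    # Text mode decodes bytes into str, so this raises UnicodeDecodeError.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
    except UnicodeDecodeError as exc:
        text_mode_failed = True
        print("text mode failed:", exc.reason)

    # Binary mode performs no decoding and returns bytes.
    with open(path, "rb") as f:
        data = f.read()
    print(data[:4])  # b'\x89PNG'
finally:
    os.remove(path)
```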

How to Fix It

  1. When opening a file, specify the encoding explicitly. The most common fix is to use UTF-8: open('file.txt', encoding='utf-8')

    UTF-8 is the modern standard encoding and handles virtually all characters from all languages.

  2. If UTF-8 does not work and the file was created on Windows, try the Windows encoding: open('file.txt', encoding='cp1252') or open('file.txt', encoding='latin-1')

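    latin-1 (also called ISO-8859-1) maps every possible byte value to a character, so it never raises this error. That makes it useful for a first look at what is in the file, but if the file is not really Latin-1, some characters will come out wrong without any warning.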

  3. Use the chardet library to detect the file's encoding: install it with pip install chardet, then run chardetect myfile.txt in your terminal. The result is a guess with a confidence score, not a guarantee.

    Once you know the encoding, use it in your open() call.

  4. As a temporary workaround, open the file with errors='ignore' or errors='replace': open('file.txt', encoding='utf-8', errors='ignore'). This skips or replaces unreadable characters.

    Use errors='ignore' if you just want to skip bad characters. Use errors='replace' to substitute them with a placeholder (?).

  5. If you control the file that is being created, always save it as UTF-8. Most text editors have an option to choose encoding when saving — select UTF-8 (without BOM).

    UTF-8 without BOM is the safest choice for cross-platform compatibility.
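Steps 1, 2 and 4 can be combined into a small reader. This is a sketch, not a standard function: read_text_with_fallback is a hypothetical helper name, and the candidate list (UTF-8, then cp1252) is an assumption to adapt to where your files come from. latin-1 is deliberately left out of the list because it never fails and would hide every later problem; errors='replace' is used only as a last resort.

```python
import os
import tempfile

def read_text_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Hypothetical helper: return (text, encoding_used)."""
    for encoding in encodings:
        try:
            with open(path, encoding=encoding) as f:
                return f.read(), encoding
        except UnicodeDecodeError:
            continue  # wrong guess; try the next candidate
    # Last resort (step 4): decode as UTF-8, replacing bad bytes with U+FFFD.
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.read(), "utf-8 (with replacements)"

# Usage: a file saved as cp1252 fails as UTF-8, then succeeds as cp1252.
with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".txt") as tmp:
    tmp.write("café".encode("cp1252"))
    path = tmp.name
try:
    text, used = read_text_with_fallback(path)
    print(text, used)  # café cp1252
finally:
    os.remove(path)
```

Trying UTF-8 first is a reasonable default because text in other encodings rarely happens to form valid UTF-8 by accident.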
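To see what "without BOM" means in Python terms: the 'utf-8-sig' codec writes a 3-byte byte order mark (EF BB BF) at the start of the file, while plain 'utf-8' does not. Reading a BOM file with plain 'utf-8' leaves an invisible '\ufeff' at the start of the text; 'utf-8-sig' strips it. The temporary file is only for illustration.

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Plain UTF-8: no BOM is written.
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello")
    with open(path, "rb") as f:
        no_bom = f.read()
    print(no_bom)  # b'hello'

    # utf-8-sig prepends the byte order mark EF BB BF.
    with open(path, "w", encoding="utf-8-sig") as f:
        f.write("hello")
    with open(path, "rb") as f:
        with_bom = f.read()
    print(with_bom)  # b'\xef\xbb\xbfhello'

    # Reading the BOM file as plain UTF-8 keeps the BOM as U+FEFF...
    with open(path, encoding="utf-8") as f:
        plain = f.read()
    # ...while utf-8-sig removes it.
    with open(path, encoding="utf-8-sig") as f:
        stripped = f.read()
    print(repr(plain), repr(stripped))  # '\ufeffhello' 'hello'
finally:
    os.remove(path)
```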

When to Call a Professional

UnicodeDecodeErrors are always something you can fix yourself. The most important step is figuring out what encoding the file actually uses. The chardet library can usually guess it automatically, and the error message itself tells you which byte failed and at what position.
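A sketch of using chardet from Python instead of the terminal, assuming the third-party chardet package is installed: chardet.detect() takes raw bytes and returns a dictionary that includes 'encoding' and 'confidence' keys. guess_and_decode is a hypothetical helper, and falling back to UTF-8 when chardet is missing or returns no guess is an assumption, not chardet behavior.

```python
try:
    import chardet  # third-party: pip install chardet
except ImportError:
    chardet = None

def guess_and_decode(raw: bytes):
    """Hypothetical helper: return (text, encoding, confidence)."""
    encoding, confidence = None, 0.0
    if chardet is not None:
        guess = chardet.detect(raw)  # dict with 'encoding' and 'confidence'
        encoding = guess["encoding"]
        confidence = guess["confidence"]
    if encoding is None:
        encoding = "utf-8"  # assumption: fall back to UTF-8
    try:
        # errors="replace" keeps a wrong guess from raising a second error.
        return raw.decode(encoding, errors="replace"), encoding, confidence
    except LookupError:  # chardet named a codec Python does not provide
        return raw.decode("utf-8", errors="replace"), "utf-8", confidence

# Detection works on bytes, so read the file in binary mode first.
sample = "naïve café déjà vu".encode("utf-8")
text, encoding, confidence = guess_and_decode(sample)
print(encoding, confidence)
```

Short samples give weak guesses, so checking the confidence before trusting the result is worthwhile.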

Frequently Asked Questions

What is UTF-8 and why is it so common?

UTF-8 is a text encoding that can represent every character in the Unicode standard, which covers virtually every writing system in use. It uses between 1 and 4 bytes per character and is backward-compatible with ASCII: any ASCII file is also a valid UTF-8 file. It became the universal standard for the web and modern software because it handles all languages in a single encoding.
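A quick check of the byte counts and the ASCII compatibility described above; the sample characters are arbitrary.

```python
# UTF-8 uses 1 to 4 bytes per character, depending on the code point.
samples = ["A", "é", "中", "😀"]
for ch in samples:
    encoded = ch.encode("utf-8")
    print(ch, f"U+{ord(ch):04X}", len(encoded), encoded)

lengths = [len(ch.encode("utf-8")) for ch in samples]
print(lengths)  # [1, 2, 3, 4]

# Backward compatibility: ASCII text has identical bytes in both encodings.
print("hello".encode("ascii") == "hello".encode("utf-8"))  # True
```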

Why does the same file work on my colleague's computer but fail on mine?

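Different operating systems use different default encodings. When you call open() without encoding=, Python uses the system's preferred encoding: often cp1252 on Windows, usually UTF-8 on macOS and Linux. If your colleague saved the file as cp1252 on Windows and you open it on Linux without specifying the encoding, Python decodes it as UTF-8 and fails on characters such as é that cp1252 stores as single bytes. The reverse also happens: a UTF-8 file opened with the Windows default can raise the error or show garbled characters.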
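To check what your own machine does: locale.getpreferredencoding(False) reports the encoding open() falls back to when encoding= is omitted. Python's UTF-8 Mode (python -X utf8, or the PYTHONUTF8=1 environment variable) makes that default UTF-8 on every platform, and sys.flags.utf8_mode shows whether it is enabled.

```python
import locale
import sys

# The encoding open() uses when you omit encoding=...
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)
print("UTF-8 Mode enabled:", bool(sys.flags.utf8_mode))
```

Because this value differs between machines, code that shares files is safest when it always passes encoding= explicitly.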

What is the difference between UnicodeDecodeError and UnicodeEncodeError?

UnicodeDecodeError happens when reading text — converting bytes into characters fails. UnicodeEncodeError happens when writing text — converting characters into bytes fails. Decoding is reading in. Encoding is writing out.
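Both directions can be triggered with the ASCII codec, which only covers bytes 0 to 127. Both exceptions are subclasses of UnicodeError, which is itself a subclass of ValueError.

```python
# Decoding (bytes -> str) can raise UnicodeDecodeError...
try:
    b"\xff".decode("ascii")
except UnicodeDecodeError as exc:
    decode_error = exc

# ...while encoding (str -> bytes) can raise UnicodeEncodeError.
try:
    "é".encode("ascii")
except UnicodeEncodeError as exc:
    encode_error = exc

print(type(decode_error).__name__, type(encode_error).__name__)
print(isinstance(decode_error, UnicodeError), isinstance(encode_error, ValueError))
```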