mimetypes raises UnicodeDecodeError when map files are not unicode encoded #117807

@kulikjak

Bug report

Bug description:

Hi, when I use the mimetypes module and one of the known mime.types files includes a comment that is not UTF-8 encoded, the operation fails with UnicodeDecodeError:

......
  File "/usr/lib/python3.9/urllib/request.py", line 1506, in open_local_file
    mtype = mimetypes.guess_type(filename)[0]
  File "/usr/lib/python3.9/mimetypes.py", line 289, in guess_type
    init()
  File "/usr/lib/python3.9/mimetypes.py", line 362, in init
    db.read(file)
  File "/usr/lib/python3.9/mimetypes.py", line 204, in read
    self.readfp(fp, strict)
  File "/usr/lib/python3.9/mimetypes.py", line 215, in readfp
    line = fp.readline()
  File "/usr/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x83 in position 168: invalid start byte

The same can be forced with:

import mimetypes
mimetypes.init(files=["mimefile"]) 

and occurs because the file is opened in text mode with UTF-8 encoding and the default strict error handling:

with open(filename, encoding='utf-8') as fp:
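To illustrate the cause in isolation, here is a minimal sketch that does not involve mimetypes at all. The file contents are a hypothetical stand-in for a mime.types file (the real file from the report is not shown); it only assumes a comment containing byte 0x83, as in the traceback.

```python
import os
import tempfile

# Hypothetical stand-in for a mime.types file: a comment holding byte 0x83,
# which is not a valid UTF-8 start byte, followed by one ordinary entry.
data = b"# vendor comment \x83\napplication/x-example exmpl\n"

with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

try:
    # Same kind of open() call as quoted above; errors defaults to "strict".
    with open(path, encoding="utf-8") as fp:
        fp.readline()
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print(exc)
finally:
    os.remove(path)
```

The failure happens while decoding the comment line, before any parsing that could have skipped the comment.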

I am not sure whether there is a convention for which encoding a mime.types file should use, but I feel that at least comments should be allowed in any encoding.
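A possible user-side workaround, sketched under assumptions: open the map file yourself with a lenient error handler and pass the file object to `MimeTypes.readfp()`. The helper name `read_mime_types_lenient`, the sample file contents, and the `.exmpl` extension are all hypothetical. This is not a proposed fix for the stdlib, and it does not help when the failing file is one of the known system files read by `init()`.

```python
import mimetypes
import os
import tempfile


def read_mime_types_lenient(db, path):
    """Load a mime.types-style file, replacing undecodable bytes.

    Hypothetical helper: errors="replace" turns invalid bytes into U+FFFD
    instead of raising UnicodeDecodeError.
    """
    with open(path, encoding="utf-8", errors="replace") as fp:
        db.readfp(fp)


# Sample map file (made up for this sketch) with a non-UTF-8 byte in a comment.
with tempfile.NamedTemporaryFile("wb", suffix=".types", delete=False) as tmp:
    tmp.write(b"# comment with a stray byte: \x83\n")
    tmp.write(b"application/x-example exmpl\n")
    path = tmp.name

db = mimetypes.MimeTypes()
read_mime_types_lenient(db, path)
guessed = db.guess_type("report.exmpl")[0]
os.remove(path)
```

Because comments are discarded during parsing, replacing their undecodable bytes should not affect the resulting type map; a stray byte inside an actual type or extension entry would still be altered.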

CPython versions tested on:

3.9, 3.11

Operating systems tested on:

Linux, Other

Labels: stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)
