-
-
Notifications
You must be signed in to change notification settings - Fork 32.4k
Open
Labels
3.13bugs and security fixesbugs and security fixes3.14bugs and security fixesbugs and security fixes3.15new features, bugs and security fixesnew features, bugs and security fixesstdlibPython modules in the Lib dirPython modules in the Lib dirtopic-unicodetype-bugAn unexpected behavior, bug, or errorAn unexpected behavior, bug, or error
Description
Bug report
#83518 changed handling of non-ASCII characters in encodings.normalize_encoding()
, but it is still inconsistent with codecs.lookup()
, and not even self-consistent. For example:
>>> import encodings
>>> encodings.normalize_encoding('a¤b')
'a_b'
>>> encodings.normalize_encoding('aæb')
'ab'
>>> encodings.normalize_encoding('a-¤')
'a'
>>> encodings.normalize_encoding('a-æ')
'a_'
>>> encodings.normalize_encoding('a-¤-b')
'a_b'
>>> encodings.normalize_encoding('a-æ-b')
'a__b'
You can even get an underscore at the end or repeated underscores in the middle.
cc @malemburg, @vstinner, @shihai1991
Linked PRs
Metadata
Metadata
Assignees
Labels
3.13bugs and security fixesbugs and security fixes3.14bugs and security fixesbugs and security fixes3.15new features, bugs and security fixesnew features, bugs and security fixesstdlibPython modules in the Lib dirPython modules in the Lib dirtopic-unicodetype-bugAn unexpected behavior, bug, or errorAn unexpected behavior, bug, or error
Projects
Status
No status