bpo-30588: document codecs.escape_decode #14747
@@ -242,6 +242,13 @@ wider range of codecs when working with binary files:
    :func:`iterencode`.


+.. function:: escape_decode(data, errors=None)
+
+   Decode the bytes-like object *data* and return a tuple (decoded object,
+   length consumed). This is useful for decoding ascii escape sequences mixed
+   with unicode characters.
Comment on lines +248 to +249

What is "mixed with unicode characters" supposed to mean in this context? data is a bytes-like object, it can't contain unicode runes. We should include an example of what this does that is different than using one of the text encoding codecs.

As Matthieu Dartiailh described on bugs.python.org, an example of ascii decode characters mixed with unicode is

Here is the difference:

    >>> import codecs
    >>> codecs.unicode_escape_decode('Δ\nΔ')
    ('Î\x94\nÎ\x94', 5)
    >>> codecs.escape_decode('Δ\nΔ')
    (b'\xce\x94\n\xce\x94', 5)
    >>> codecs.escape_decode('Δ\nΔ')[0].decode('utf-8')
    'Δ\nΔ'

A real-world example. We can assume that many more homegrown 'solutions' exist. Wouldn't it be great if the kludgy, slow, error-prone workarounds people have come up with were replaced with something elegant and Python-worthy? Please consider that this function is so rarely seen outside the Python developer world because it is kept almost a secret.
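To make the difference discussed in this thread concrete, here is a minimal sketch in which the input really does contain a backslash escape (a literal backslash followed by `n`) next to UTF-8 encoded text. It assumes CPython 3, where `codecs.escape_decode` is available but undocumented; the variable names are only illustrative.

```python
import codecs

# UTF-8 bytes containing a literal backslash followed by "n" (two bytes),
# not a real newline. The sample text is an illustrative assumption.
raw = "Δ\\nΔ".encode("utf-8")

# escape_decode resolves the backslash escape and passes all other bytes
# through unchanged, so the result is still valid UTF-8.
unescaped, consumed = codecs.escape_decode(raw)
text = unescaped.decode("utf-8")

# unicode_escape_decode also resolves the escape, but it interprets every
# other byte as Latin-1, which mangles the multi-byte UTF-8 characters.
latin1_text, _ = codecs.unicode_escape_decode(raw)

print(repr(text))         # 'Δ\nΔ'
print(repr(latin1_text))  # 'Î\x94\nÎ\x94'
```

The second result is the behaviour that the proposed note on `unicode_escape` ("This does not work in the general case") warns about.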
+
+
 The module also provides the following constants which are useful for reading
 and writing to platform dependent files:

@@ -1313,7 +1320,10 @@ encodings.
 |                    |         | Latin-1 source code.      |
 |                    |         | Beware that Python source |
 |                    |         | code actually uses UTF-8  |
-|                    |         | by default.               |
+|                    |         | by default. This does not |
+|                    |         | work in the general case, |
+|                    |         | see:                      |
+|                    |         | :func:`escape_decode`.    |
 +--------------------+---------+---------------------------+

 .. versionchanged:: 3.8
"bytes-like object" seems incorrect; it accepts strings too.
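A quick check of this point, as a sketch that assumes CPython 3 behaviour for the undocumented `codecs.escape_decode`: a `str` argument appears to be encoded to UTF-8 first, and the result is `bytes` either way. The sample string is illustrative.

```python
import codecs

sample = "Δ\\nΔ"  # str with a literal backslash followed by "n"

# Pass the same data once as str and once as UTF-8 encoded bytes.
from_str, consumed_str = codecs.escape_decode(sample)
from_bytes, consumed_bytes = codecs.escape_decode(sample.encode("utf-8"))

# Both calls return bytes and report the length in bytes, not characters.
print(type(from_str).__name__)
print(from_str == from_bytes)
print(consumed_str, consumed_bytes)
```

If that holds, the documentation could describe *data* as "a bytes-like object or a string", but the exact wording is a call for the PR author.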