bpo-30588: document codecs.escape_decode #14747

Closed
wants to merge 1 commit into from
12 changes: 11 additions & 1 deletion Doc/library/codecs.rst
@@ -242,6 +242,13 @@ wider range of codecs when working with binary files:
:func:`iterencode`.


.. function:: escape_decode(data, errors=None)

Decode the bytes-like object *data* and return a tuple (decoded object,
Member

"bytes-like object" seems incorrect; it accepts strings too.

length consumed). This is useful for decoding ascii escape sequences mixed
with unicode characters.
Comment on lines +248 to +249
Member

What is "mixed with unicode characters" supposed to mean in this context? data is a bytes-like object, so it can't contain unicode runes. We should include an example of what this does differently from one of the text encoding codecs.

Contributor Author

As Matthieu Dartiailh described on bugs.python.org, an example of ASCII escape characters mixed with Unicode is 'Δ\nΔ'.

Here is the difference:

>>> codecs.unicode_escape_decode('Δ\nΔ')
('Î\x94\nÎ\x94', 5)
>>> codecs.escape_decode('Δ\nΔ')
(b'\xce\x94\n\xce\x94', 5)
>>> codecs.escape_decode('Δ\nΔ')[0].decode('utf-8')
'Δ\nΔ'
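
For comparison, here is a minimal sketch of the case where the input holds a literal backslash escape (a backslash followed by `n`) rather than a real newline. It assumes CPython, where `codecs.escape_decode` is importable but undocumented, so its behaviour is not guaranteed. The variable names are illustrative only.

```python
import codecs

# Four characters: "Δ", a literal backslash, "n", "Δ".
raw = "Δ\\nΔ"
raw_bytes = raw.encode("utf-8")

# unicode_escape reads non-escape bytes as Latin-1, so the two UTF-8
# bytes of each "Δ" turn into two unrelated characters.
mangled = raw_bytes.decode("unicode_escape")

# escape_decode (undocumented) resolves the backslash escape at the byte
# level and leaves the other bytes untouched, so decoding the result as
# UTF-8 afterwards recovers the original non-ASCII text.
decoded_bytes, consumed = codecs.escape_decode(raw_bytes)
restored = decoded_bytes.decode("utf-8")

print(repr(mangled))
print(repr(restored), consumed)
```

Here the escape sequence becomes a real newline in both cases, but only the `escape_decode` route keeps "Δ" intact.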

A real-world example. We can assume that many more homegrown 'solutions' exist.

Wouldn't it be great if kludgy, slow, error-prone workarounds people have come up with were replaced with something elegant and Python-worthy?

Please consider that this function is so rarely seen outside the Python developer world because it is kept almost a secret.



The module also provides the following constants which are useful for reading
and writing to platform dependent files:

@@ -1313,7 +1320,10 @@ encodings.
| | | Latin-1 source code. |
| | | Beware that Python source |
| | | code actually uses UTF-8 |
| | | by default. |
| | | by default. This does not |
| | | work in the general case, |
| | | see: |
| | | :func:`escape_decode`. |
+--------------------+---------+---------------------------+

.. versionchanged:: 3.8