Closed
Labels: stdlib (Python modules in the Lib dir), topic-parser, type-bug (an unexpected behavior, bug, or error)
Description
Bug report
Bug description:
Code which contains backslash line continuations is not round-trip invariant:
import tokenize, io
source_code = r"""
1 + \
2
"""
tokens = list(tokenize.generate_tokens(io.StringIO(source_code).readline))
x = tokenize.untokenize(tokens)
print(x)
# 1 +\
# 2
Notice that the space between + and \ is now missing. The current untokenizer code simply inserts a backslash and a newline, with no preceding whitespace, when it encounters two consecutive tokens with differing row numbers:
Lines 179 to 182 in 9c2bb7d
row_offset = row - self.prev_row
if row_offset:
    self.tokens.append("\\\n" * row_offset)
    self.prev_col = 0
I think this should be fixed. The docstring of tokenize.untokenize says:
Round-trip invariant for full input:
Untokenized source will match input source exactly
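The invariant quoted above can be checked mechanically. The following is a minimal sketch, not part of the original report; `round_trips` is a hypothetical helper name, and the result for the continuation case depends on the CPython version in use.

```python
import io
import tokenize


def round_trips(source: str) -> bool:
    """Return True if untokenizing the full token stream reproduces `source`.

    Hypothetical helper for illustration; it only compares the two strings.
    """
    readline = io.StringIO(source).readline
    tokens = list(tokenize.generate_tokens(readline))
    return tokenize.untokenize(tokens) == source


# A line without continuations is expected to round-trip.
simple_ok = round_trips("x = 1 + 2\n")

# The case from this report: whitespace before the trailing backslash.
# Whether this is True depends on the CPython version being run.
continuation_ok = round_trips("1 + \\\n2\n")
```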
To fix this, it will probably be necessary to inspect the raw line contents and count how much whitespace there is at the end of the line.
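A rough sketch of that idea follows, assuming the raw source lines are available next to the token stream. `continuation_padding` is a hypothetical helper, not an existing `tokenize` function. It is deliberately naive: it treats any physical line ending in a backslash as a continuation, so a comment that ends in a backslash would be misread.

```python
import io
import tokenize


def continuation_padding(raw_line: str) -> int:
    """Count spaces and tabs directly before a trailing backslash.

    Returns 0 when the line does not end with a backslash.
    Hypothetical helper: it inspects only the raw text of the line.
    """
    stripped = raw_line.rstrip("\r\n")
    if not stripped.endswith("\\"):
        return 0
    body = stripped[:-1]  # drop the backslash itself
    return len(body) - len(body.rstrip(" \t"))


source_code = "1 + \\\n2\n"
lines = source_code.splitlines(keepends=True)
tokens = list(tokenize.generate_tokens(io.StringIO(source_code).readline))

# Find the last token before the continuation and look up its raw line.
plus = next(tok for tok in tokens if tok.string == "+")
raw_line = lines[plus.end[0] - 1]  # token rows are 1-based
padding = continuation_padding(raw_line)

# The backslash sits `padding` columns after the end of the "+" token.
backslash_col = plus.end[1] + padding
```

An untokenizer could emit `" " * padding` before `"\\\n"` when the row changes. That would restore the missing space in the example above, although tabs would need to be copied verbatim rather than counted.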
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux