Skip to content

[lex] Provide unicode name for all control characters #7404

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

AlisdairM
Copy link
Contributor

This commit does not touch the new-line character as paper P2348. It resricts itself to consistent use of the unicode character name for space, horizontal tab, and vertical tab. Compared to PR #7359 it deliberately does not touch the grammar that would necessitate a review by core review. The intent is to rebase that PR if this one lands.

@AlisdairM
Copy link
Contributor Author

Note that I also did a review of all usage of the word "space" in [lex] and [cpp]. Some uses were not relevant, such as "name space". Others applied to the contents of literals, which might not be in the basic source encoding. This PR is considered to completely address the concern for control characters throughout the standard.

@cor3ntin
Copy link
Contributor

This looks reasonable and correct

source/lex.tex Outdated
@@ -824,7 +824,7 @@
\end{footnote}
operators, and other separators.
\indextext{whitespace}%
Blanks, horizontal and vertical tabs, newlines, formfeeds, and comments
Comments, \unicode{0020}{space}s, \unicode{0009}{character tabulation}s and \unicode{0009}{line tabulation}s, \unicode{000c}{form feed}s, and newlines
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think plural works for the way we present Unicode chars. Please rephrase.

Also, we seem to use "new-line" elsewhere (e.g. near the changes above). Note the hyphen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I specifically copied newlines from the original list, as I had seen both terms in the standard. Now the I search again, there are only 4 remaining, and half are in code comments. I will fix this one here, and file a separate PR to fix the rest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewording for plurals attempted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it turns out the remaining three uses of "newline" are all in library clauses, which outnumber the one use of "new-line" in the library, so I will defer that question for now. The fix for new-line above is now in this PR though.

@AlisdairM
Copy link
Contributor Author

Responded to Jens's comments and force-pushed to avoid badness in the history.

@AlisdairM AlisdairM force-pushed the unicode_for_space branch from 758aa5d to ffdadc7 Compare July 17, 2025 14:16
@tkoeppe tkoeppe requested a review from jensmaurer July 17, 2025 14:34
This commit does not touch the new-line character as paper
P2348.  It resricts itself to consistent use of the unicode
character name for space, horizontal tab, and vertical tab.
Compared to PR cplusplus#7359 it deliberately does not touch the
grammar that would necessitate a review by core review.
The intent is to rebase that PR if this one lands.
@AlisdairM AlisdairM force-pushed the unicode_for_space branch from ffdadc7 to bd46297 Compare July 17, 2025 14:45
@tkoeppe
Copy link
Contributor

tkoeppe commented Jul 17, 2025

note to self: lex.phases, lex.token

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants