[DOC] Tweaks for String#dump #13883
base: master
doc/string/dump.rdoc
Outdated
s # => "\a\b\t\n\v\f\r"
s.dump # => "\"\\a\\b\\t\\n\\v\\f\\r\""

Multi-byte characters are rendered in unicode notation:
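A minimal sketch of the behavior described in the excerpt above, assuming a UTF-8 source file (Ruby's default). Control characters come out as backslash escapes, non-ASCII characters of a UTF-8 string come out in `\uXXXX` form, and `String#undump` turns the dumped text back into the original. The variable names are illustrative only.

```ruby
# Sketch (assumes a UTF-8 source file, Ruby's default):
# String#dump escapes control characters, shows non-ASCII characters of a
# UTF-8 string as \uXXXX, and String#undump reverses the transformation.
controls = "\a\b\t\n\v\f\r"
cyrillic = 'тест'

[controls, cyrillic].each do |s|
  dumped = s.dump
  puts dumped               # quoted, escaped form
  puts dumped.ascii_only?   # the dump itself contains only ASCII characters
  puts dumped.undump == s   # undump restores the original string
end
```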
It is only for Unicode encodings.
Does this mean that there are multi-byte characters that are not in Unicode encodings? If so, I'll need examples.
Fixed.
@peterzhu2118, I'll need help with this:
For example:
'тест'.dump # => "\"\\u0442\\u0435\\u0441\\u0442\""
'тест'.encode('utf-16le').dump # => "\"B\\x045\\x04A\\x04B\\x04\".dup.force_encoding(\"UTF-16LE\")"
Sorry, I don't understand what you mean by this.
Thanks, @peterzhu2118. What you've written above answers both questions (however poorly they're posed).
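A small sketch that mirrors @peterzhu2118's example: the same text is dumped once from UTF-8 and once from UTF-16LE. In these two cases only the UTF-8 dump uses `\uXXXX`; the UTF-16LE dump shows the raw bytes as `\xNN` and ends with an encoding suffix. The variable names are hypothetical, chosen for the illustration.

```ruby
# Sketch mirroring the reviewer's example: the same text dumped from UTF-8
# and from UTF-16LE.
utf8    = 'тест'
utf16le = utf8.encode('UTF-16LE')

utf8_dump    = utf8.dump
utf16le_dump = utf16le.dump

puts utf8_dump
puts utf16le_dump

# '\u' and '\x' are single-quoted: a literal backslash followed by a letter.
utf8_has_u    = utf8_dump.include?('\u')
utf16le_has_u = utf16le_dump.include?('\u')
utf16le_has_x = utf16le_dump.include?('\x')

p [utf8_has_u, utf16le_has_u, utf16le_has_x]
```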
s = 'hello'
s.encoding # => #<Encoding:UTF-8>
s.dump # => "\"hello\""
s.encode('utf-16').dump # => "\"\\xFE\\xFF\\x00h\\x00e\\x00l\\x00l\\x00o\".dup.force_encoding(\"UTF-16\")"
s.encode('utf-16le').dump # => "\"h\\x00e\\x00l\\x00l\\x00o\\x00\".dup.force_encoding(\"UTF-16LE\")"

s = 'тест'
s.encoding # => #<Encoding:UTF-8>
s.dump # => "\"\\u0442\\u0435\\u0441\\u0442\""
s.encode('utf-16').dump # => "\"\\xFE\\xFF\\x04B\\x045\\x04A\\x04B\".dup.force_encoding(\"UTF-16\")"
s.encode('utf-16le').dump # => "\"B\\x045\\x04A\\x04B\\x04\".dup.force_encoding(\"UTF-16LE\")"

s = 'こんにちは'
s.encoding # => #<Encoding:UTF-8>
s.dump # => "\"\\u3053\\u3093\\u306B\\u3061\\u306F\""
s.encode('utf-16').dump # => "\"\\xFE\\xFF0S0\\x930k0a0o\".dup.force_encoding(\"UTF-16\")"
s.encode('utf-16le').dump # => "\"S0\\x930k0a0o0\".dup.force_encoding(\"UTF-16LE\")"
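A sketch of the round trip for the sample strings above, assuming Ruby 2.5 or later (where `dump` emits the `.dup.force_encoding(...)` suffix and `String#undump` reads it back). It checks that undumping restores both the bytes and the encoding. The BOM-prefixed `'utf-16'` case is left out here, since `UTF-16` is a dummy encoding in Ruby and its undump behavior is not exercised in this sketch.

```ruby
# Sketch: dump, then undump, and compare with the original string.
# Assumes Ruby 2.5+ (undump understands the ".dup.force_encoding" suffix).
samples = ['hello', 'тест', 'こんにちは']

results = samples.flat_map do |s|
  [s, s.encode('UTF-16LE')].map do |str|
    restored = str.dump.undump
    # [original encoding, restored encoding, same content?]
    [str.encoding.name, restored.encoding.name, restored == str]
  end
end

results.each { |row| p row }
```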
I think it would be better to move the examples of non-UTF-8 encodings to a separate section, with some text describing how they differ (e.g., bytes are shown in hexadecimal format and `.dup.force_encoding(<encoding name>)` is appended). This is because non-UTF-8 strings are more of an edge case than a commonly used one.
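One way such a separate section could be checked: a sketch showing that, for the sample strings in this PR, the `.dup.force_encoding("...")` suffix appears exactly for the encodings that are not ASCII-compatible (`UTF-16` and `UTF-16LE` here). The `SUFFIX` regex and the `dump_summary` helper are hypothetical names for this illustration, not part of Ruby's API, and the rule is only verified for these samples.

```ruby
# Sketch: relate the dump suffix to Encoding#ascii_compatible?.
# SUFFIX and dump_summary are illustrative names, not Ruby API.
SUFFIX = /\.dup\.force_encoding\("([^"]+)"\)\z/

def dump_summary(str)
  suffix = SUFFIX.match(str.dump)
  {
    encoding: str.encoding.name,
    ascii_compatible: str.encoding.ascii_compatible?,
    suffix_encoding: suffix && suffix[1] # nil when no suffix is appended
  }
end

['hello', 'тест', 'こんにちは'].each do |s|
  [s, s.encode('UTF-16'), s.encode('UTF-16LE')].each do |str|
    p dump_summary(str)
  end
end
```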
I've moved the cited lines to the end. I think you want other changes, but I'm not sure what exactly is needed. Can you fix up one, as a guide for me?
No description provided.