Closed
Feature or enhancement
Currently, ElementTree.tostring(root, encoding="unicode", xml_declaration=True)
writes the locale encoding name into the XML declaration.
I think ElementTree should use UTF-8, instead of locale encoding.
Example:
$ LANG=ja_JP.eucJP ./python.exe
Python 3.11.0a7+ (heads/bytes-alloc-dirty:7fbc7f6128, Apr 19 2022, 16:53:54) [Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import xml.etree.ElementTree as ET
>>> et = ET.fromstring("<t>hello</t>")
>>> ET.tostring(et, encoding="unicode", xml_declaration=True)
"<?xml version='1.0' encoding='eucJP'?>\n<t>hello</t>"
Code:
cpython/Lib/xml/etree/ElementTree.py
Lines 732 to 742 in bcf14ae
    with _get_writer(file_or_filename, enc_lower) as write:
        if method == "xml" and (xml_declaration or
                (xml_declaration is None and
                 enc_lower not in ("utf-8", "us-ascii", "unicode"))):
            declared_encoding = encoding
            if enc_lower == "unicode":
                # Retrieve the default encoding for the xml declaration
                import locale
                declared_encoding = locale.getpreferredencoding()
            write("<?xml version='1.0' encoding='%s'?>\n" % (
                declared_encoding,))
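As a rough illustration of the proposed behavior, the branch quoted above could pick a fixed "utf-8" instead of calling locale.getpreferredencoding(). The sketch below is standalone and is not the actual CPython patch; the helper names declared_encoding_for and format_declaration are hypothetical and exist only for this example.

```python
def declared_encoding_for(encoding: str) -> str:
    """Return the encoding name to write into the XML declaration.

    Hypothetical helper: follows the quoted branch, except that
    encoding="unicode" maps to a fixed "utf-8" rather than the
    locale's preferred encoding.
    """
    if encoding.lower() == "unicode":
        return "utf-8"
    # Any other encoding is declared exactly as the caller passed it.
    return encoding


def format_declaration(encoding: str) -> str:
    """Build the declaration line in the same format as the quoted code."""
    return "<?xml version='1.0' encoding='%s'?>\n" % (
        declared_encoding_for(encoding),)


if __name__ == "__main__":
    print(format_declaration("unicode"), end="")
```

With this approach the declaration no longer depends on LANG or other locale settings when the caller asks for a str result.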
Pitch
- UTF-8 is the most common encoding for XML.
- The locale encoding name (e.g. cp932 or eucJP) can differ from the XML encoding name recommended by the W3C (e.g. Shift_JIS or EUC-JP).
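Until the default changes, one possible workaround is to request UTF-8 explicitly and decode the resulting bytes, so the declaration does not depend on the locale. This is only a sketch; the wrapper name tostring_utf8_declared is hypothetical, while ET.tostring and its encoding and xml_declaration parameters are the existing standard library API.

```python
import xml.etree.ElementTree as ET


def tostring_utf8_declared(element: ET.Element) -> str:
    """Serialize element to str with a UTF-8 XML declaration.

    Hypothetical wrapper: passing encoding="utf-8" makes ET.tostring
    return bytes whose declaration names utf-8, independent of the
    locale. Decoding those bytes yields the str form.
    """
    data = ET.tostring(element, encoding="utf-8", xml_declaration=True)
    return data.decode("utf-8")


if __name__ == "__main__":
    root = ET.fromstring("<t>hello</t>")
    print(tostring_utf8_declared(root))
```

The trade-off is an extra encode and decode round trip, which is usually negligible for small documents.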