@@ -126,40 +126,30 @@ The Interpreter and Its Environment
Source Code Encoding
--------------------

- It is possible to use encodings different than ASCII in Python source files. The
- best way to do it is to put one more special comment line right after the ``#!``
- line to define the source file encoding::
-
-    # -*- coding: encoding -*-
+ By default, Python source files are treated as encoded in UTF-8. In that
+ encoding, characters of most languages in the world can be used simultaneously
+ in string literals, identifiers and comments --- although the standard library
+ only uses ASCII characters for identifiers, a convention that any portable code
+ should follow. To display all these characters properly, your editor must
+ recognize that the file is UTF-8, and it must use a font that supports all the
+ characters in the file.

+ To declare an encoding other than the default one, a special comment line
+ should be added as the *first* line of the file. The syntax is as follows::

- With that declaration, all characters in the source file will be treated as
- having the encoding *encoding*, and it will be possible to directly write
- Unicode string literals in the selected encoding. The list of possible
- encodings can be found in the Python Library Reference, in the section on
- :mod:`codecs`.
+    # -*- coding: encoding -*-

- For example, to write Unicode literals including the Euro currency symbol, the
- ISO-8859-15 encoding can be used, with the Euro symbol having the ordinal value
- 164. This script, when saved in the ISO-8859-15 encoding, will print the value
- 8364 (the Unicode code point corresponding to the Euro symbol) and then exit::
+ where *encoding* is one of the valid :mod:`codecs` supported by Python.

-    # -*- coding: iso-8859-15 -*-
+ For example, to declare that Windows-1252 encoding is to be used, the first
+ line of your source code file should be::

-    currency = u"€"
-    print ord(currency)
+    # -*- coding: cp1252 -*-

- If your editor supports saving files as ``UTF-8`` with a UTF-8 *byte order mark*
- (aka BOM), you can use that instead of an encoding declaration. IDLE supports
- this capability if ``Options/General/Default Source Encoding/UTF-8`` is set.
- Notice that this signature is not understood in older Python releases (2.2 and
- earlier), and also not understood by the operating system for script files with
- ``#!`` lines (only used on Unix systems).
+ One exception to the *first line* rule is when the source code starts with a
+ :ref:`UNIX "shebang" line <tut-scripts>`. In this case, the encoding
+ declaration should be added as the second line of the file. For example::

- By using UTF-8 (either through the signature or an encoding declaration),
- characters of most languages in the world can be used simultaneously in string
- literals and comments. Using non-ASCII characters in identifiers is not
- supported. To display all these characters properly, your editor must recognize
- that the file is UTF-8, and it must use a font that supports all the characters
- in the file.
+    #!/usr/bin/env python
+    # -*- coding: cp1252 -*-

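The rules described in the added text (UTF-8 by default, a declaration on the first line, or on the second line after a shebang) can be checked with the standard library. The following is a minimal sketch, assuming Python 3. It relies on `tokenize.detect_encoding`, which reads at most the first two lines of a source file. The helper name `declared_encoding` and the sample byte strings are illustrative only and are not part of this change.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding name detected for the given source bytes.

    tokenize.detect_encoding() looks for a UTF-8 BOM or an encoding
    declaration in the first two lines and falls back to UTF-8.
    """
    encoding, _consumed_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# No declaration: the default applies.
print(declared_encoding(b"print('hi')\n"))

# Declaration on the first line.
print(declared_encoding(b"# -*- coding: cp1252 -*-\nprint('hi')\n"))

# Declaration on the second line, after a shebang line.
print(declared_encoding(b"#!/usr/bin/env python3\n# -*- coding: cp1252 -*-\nprint('hi')\n"))
```

A declaration that names an unknown codec makes `tokenize.detect_encoding` raise `SyntaxError`, so the same helper can serve as a quick check that a chosen encoding name is spelled the way Python expects.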