Commit 40ba60f

Issue python#29381: Clarify ordering of UNIX shebang line as source encoding line
1 parent 3b23004 commit 40ba60f

1 file changed, +19 -29 lines changed

Doc/tutorial/interpreter.rst

Lines changed: 19 additions & 29 deletions
@@ -126,40 +126,30 @@ The Interpreter and Its Environment
 Source Code Encoding
 --------------------

-It is possible to use encodings different than ASCII in Python source files. The
-best way to do it is to put one more special comment line right after the ``#!``
-line to define the source file encoding::
-
-    # -*- coding: encoding -*-
+By default, Python source files are treated as encoded in UTF-8. In that
+encoding, characters of most languages in the world can be used simultaneously
+in string literals, identifiers and comments --- although the standard library
+only uses ASCII characters for identifiers, a convention that any portable code
+should follow. To display all these characters properly, your editor must
+recognize that the file is UTF-8, and it must use a font that supports all the
+characters in the file.

+To declare an encoding other than the default one, a special comment line
+should be added as the *first* line of the file. The syntax is as follows::

-With that declaration, all characters in the source file will be treated as
-having the encoding *encoding*, and it will be possible to directly write
-Unicode string literals in the selected encoding. The list of possible
-encodings can be found in the Python Library Reference, in the section on
-:mod:`codecs`.
+    # -*- coding: encoding -*-

-For example, to write Unicode literals including the Euro currency symbol, the
-ISO-8859-15 encoding can be used, with the Euro symbol having the ordinal value
-164. This script, when saved in the ISO-8859-15 encoding, will print the value
-8364 (the Unicode code point corresponding to the Euro symbol) and then exit::
+where *encoding* is one of the valid :mod:`codecs` supported by Python.

-    # -*- coding: iso-8859-15 -*-
+For example, to declare that Windows-1252 encoding is to be used, the first
+line of your source code file should be::

-    currency = u"€"
-    print ord(currency)
+    # -*- coding: cp-1252 -*-

-If your editor supports saving files as ``UTF-8`` with a UTF-8 *byte order mark*
-(aka BOM), you can use that instead of an encoding declaration. IDLE supports
-this capability if ``Options/General/Default Source Encoding/UTF-8`` is set.
-Notice that this signature is not understood in older Python releases (2.2 and
-earlier), and also not understood by the operating system for script files with
-``#!`` lines (only used on Unix systems).
+One exception to the *first line* rule is when the source code starts with a
+:ref:`UNIX "shebang" line <tut-scripts>`. In this case, the encoding
+declaration should be added as the second line of the file. For example::

-By using UTF-8 (either through the signature or an encoding declaration),
-characters of most languages in the world can be used simultaneously in string
-literals and comments. Using non-ASCII characters in identifiers is not
-supported. To display all these characters properly, your editor must recognize
-that the file is UTF-8, and it must use a font that supports all the characters
-in the file.
+    #!/usr/bin/env python
+    # -*- coding: cp-1252 -*-
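
The placement rules in the new text (UTF-8 by default, a declaration on the first line, or on the second line when a shebang comes first) can be checked with the standard library function `tokenize.detect_encoding()`, which applies the PEP 263 encoding-cookie rules to the first two lines of a source file. What follows is a minimal sketch, not part of this commit: the helper `declared_encoding()` and the sample sources are hypothetical, and the samples spell the codec `cp1252`, the name under which Python's codec registry lists Windows-1252.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Hypothetical helper: report the encoding used to decode *source*."""
    # detect_encoding() reads at most the first two lines through the given
    # readline callable and returns (encoding, lines_read).
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# No declaration at all: the UTF-8 default applies.
no_declaration = b"print('hello')\n"

# Declaration on the first line of the file.
first_line = b"# -*- coding: cp1252 -*-\nprint('hello')\n"

# Shebang on the first line, declaration on the second.
after_shebang = b"#!/usr/bin/env python\n# -*- coding: cp1252 -*-\nprint('hello')\n"

# Declaration on the third line: too late, so it is not seen.
third_line = b"#!/usr/bin/env python\n\n# -*- coding: cp1252 -*-\nprint('hello')\n"

for label, source in [
    ("no declaration", no_declaration),
    ("first line", first_line),
    ("after shebang", after_shebang),
    ("third line", third_line),
]:
    print(f"{label}: {declared_encoding(source)}")
```

Run as a script, this prints `utf-8`, `cp1252`, `cp1252` and `utf-8` for the four samples, which is why the shebang case moves the declaration to the second line rather than anywhere later.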
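
A declaration changes how the bytes of the file are decoded, not just how they are labelled. The sketch below is an illustration, not part of this commit: it relies on the built-in `compile()` accepting `bytes` and honouring an encoding declaration in them. Byte `0x80` is the euro sign in Windows-1252, so with the declaration the literal decodes to code point 8364 (the value the removed ISO-8859-15 example printed), while the same bytes without a declaration are read as UTF-8, where `0x80` is not a valid start byte. The variable names are arbitrary.

```python
# Source bytes as an editor saving in Windows-1252 would write them:
# 0x80 inside the quotes is the euro sign in that encoding.
declared = b"# -*- coding: cp1252 -*-\ncurrency = '\x80'\n"
undeclared = b"currency = '\x80'\n"

# compile() accepts bytes and decodes them using the encoding declaration.
namespace = {}
exec(compile(declared, "<declared>", "exec"), namespace)
currency = namespace["currency"]
print(f"U+{ord(currency):04X}", ord(currency))

# Without a declaration the bytes are decoded as UTF-8, where 0x80 is
# invalid, so compilation fails.
try:
    compile(undeclared, "<undeclared>", "exec")
    rejected = False
except (SyntaxError, UnicodeDecodeError):
    rejected = True
print("undeclared source rejected:", rejected)
```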
