Commit 60a1b35

Issue python#12067: Rewrite Comparisons section in the language reference
Some of the details of comparing mixed types were incorrect or ambiguous. Added default behaviour and consistency suggestions for user-defined classes. Based on patch from Andy Maier.
1 parent 19048c3 commit 60a1b35

2 files changed: +169 -39 lines changed

Doc/reference/expressions.rst

Lines changed: 161 additions & 39 deletions
@@ -1058,10 +1058,6 @@ must be plain or long integers. The arguments are converted to a common type.
10581058

10591059

10601060
.. _comparisons:
1061-
.. _is:
1062-
.. _is not:
1063-
.. _in:
1064-
.. _not in:
10651061

10661062
Comparisons
10671063
===========
@@ -1101,48 +1097,158 @@ The forms ``<>`` and ``!=`` are equivalent; for consistency with C, ``!=`` is
11011097
preferred; where ``!=`` is mentioned below ``<>`` is also accepted. The ``<>``
11021098
spelling is considered obsolescent.
11031099

1100+
Value comparisons
1101+
-----------------
1102+
11041103
The operators ``<``, ``>``, ``==``, ``>=``, ``<=``, and ``!=`` compare the
1105-
values of two objects. The objects need not have the same type. If both are
1106-
numbers, they are converted to a common type. Otherwise, objects of different
1107-
types *always* compare unequal, and are ordered consistently but arbitrarily.
1108-
You can control comparison behavior of objects of non-built-in types by defining
1109-
a ``__cmp__`` method or rich comparison methods like ``__gt__``, described in
1110-
section :ref:`specialnames`.
1104+
values of two objects. The objects do not need to have the same type.
1105+
1106+
Chapter :ref:`objects` states that objects have a value (in addition to type
1107+
and identity). The value of an object is a rather abstract notion in Python:
1108+
For example, there is no canonical access method for an object's value. Also,
1109+
there is no requirement that the value of an object should be constructed in a
1110+
particular way, e.g. comprised of all its data attributes. Comparison operators
1111+
implement a particular notion of what the value of an object is. One can think
1112+
of them as defining the value of an object indirectly, by means of their
1113+
comparison implementation.
1114+
1115+
Types can customize their comparison behavior by implementing
1116+
a :meth:`__cmp__` method or
1117+
:dfn:`rich comparison methods` like :meth:`__lt__`, described in
1118+
:ref:`customization`.
1119+
1120+
The default behavior for equality comparison (``==`` and ``!=``) is based on
1121+
the identity of the objects. Hence, equality comparison of instances with the
1122+
same identity results in equality, and equality comparison of instances with
1123+
different identities results in inequality. A motivation for this default
1124+
behavior is the desire that all objects should be reflexive (i.e. ``x is y``
1125+
implies ``x == y``).
1126+
1127+
The default order comparison (``<``, ``>``, ``<=``, and ``>=``) gives a
1128+
consistent but arbitrary order.
11111129

11121130
(This unusual definition of comparison was used to simplify the definition of
11131131
operations like sorting and the :keyword:`in` and :keyword:`not in` operators.
11141132
In the future, the comparison rules for objects of different types are likely to
11151133
change.)
11161134

1117-
Comparison of objects of the same type depends on the type:
1118-
1119-
* Numbers are compared arithmetically.
1120-
1121-
* Strings are compared lexicographically using the numeric equivalents (the
1122-
result of the built-in function :func:`ord`) of their characters. Unicode and
1123-
8-bit strings are fully interoperable in this behavior. [#]_
1124-
1125-
* Tuples and lists are compared lexicographically using comparison of
1126-
corresponding elements. This means that to compare equal, each element must
1127-
compare equal and the two sequences must be of the same type and have the same
1128-
length.
1129-
1130-
If not equal, the sequences are ordered the same as their first differing
1131-
elements. For example, ``cmp([1,2,x], [1,2,y])`` returns the same as
1132-
``cmp(x,y)``. If the corresponding element does not exist, the shorter sequence
1133-
is ordered first (for example, ``[1,2] < [1,2,3]``).
1134-
1135-
* Mappings (dictionaries) compare equal if and only if their sorted (key, value)
1136-
lists compare equal. [#]_ Outcomes other than equality are resolved
1135+
The behavior of the default equality comparison, that instances with different
1136+
identities are always unequal, may be in contrast to what types will need that
1137+
have a sensible definition of object value and value-based equality. Such
1138+
types will need to customize their comparison behavior, and in fact, a number
1139+
of built-in types have done that.
1140+
1141+
The following list describes the comparison behavior of the most important
1142+
built-in types.
1143+
1144+
* Numbers of built-in numeric types (:ref:`typesnumeric`) and of the standard
1145+
library types :class:`fractions.Fraction` and :class:`decimal.Decimal` can be
1146+
compared within and across their types, with the restriction that complex
1147+
numbers do not support order comparison. Within the limits of the types
1148+
involved, they compare mathematically (algorithmically) correct without loss
1149+
of precision.
1150+
1151+
* Strings (instances of :class:`str` or :class:`unicode`)
1152+
compare lexicographically using the numeric equivalents (the
1153+
result of the built-in function :func:`ord`) of their characters. [#]_
1154+
When comparing an 8-bit string and a Unicode string, the 8-bit string
1155+
is converted to Unicode. If the conversion fails, the strings
1156+
are considered unequal.
1157+
1158+
* Instances of :class:`tuple` or :class:`list` can be compared only
1159+
within each of their types. Equality comparison across these types
1160+
results in unequality, and ordering comparison across these types
1161+
gives an arbitrary order.
1162+
1163+
These sequences compare lexicographically using comparison of corresponding
1164+
elements, whereby reflexivity of the elements is enforced.
1165+
1166+
In enforcing reflexivity of elements, the comparison of collections assumes
1167+
that for a collection element ``x``, ``x == x`` is always true. Based on
1168+
that assumption, element identity is compared first, and element comparison
1169+
is performed only for distinct elements. This approach yields the same
1170+
result as a strict element comparison would, if the compared elements are
1171+
reflexive. For non-reflexive elements, the result is different than for
1172+
strict element comparison.
1173+
1174+
Lexicographical comparison between built-in collections works as follows:
1175+
1176+
- For two collections to compare equal, they must be of the same type, have
1177+
the same length, and each pair of corresponding elements must compare
1178+
equal (for example, ``[1,2] == (1,2)`` is false because the type is not the
1179+
same).
1180+
1181+
- Collections are ordered the same as their
1182+
first unequal elements (for example, ``cmp([1,2,x], [1,2,y])`` returns the
1183+
same as ``cmp(x,y)``). If a corresponding element does not exist, the
1184+
shorter collection is ordered first (for example, ``[1,2] < [1,2,3]`` is
1185+
true).
1186+
1187+
* Mappings (instances of :class:`dict`) compare equal if and only if they have
1188+
equal `(key, value)` pairs. Equality comparison of the keys and elements
1189+
enforces reflexivity.
1190+
1191+
Outcomes other than equality are resolved
11371192
consistently, but are not otherwise defined. [#]_
11381193

11391194
* Most other objects of built-in types compare unequal unless they are the same
11401195
object; the choice whether one object is considered smaller or larger than
11411196
another one is made arbitrarily but consistently within one execution of a
11421197
program.
11431198

1199+
User-defined classes that customize their comparison behavior should follow
1200+
some consistency rules, if possible:
1201+
1202+
* Equality comparison should be reflexive.
1203+
In other words, identical objects should compare equal:
1204+
1205+
``x is y`` implies ``x == y``
1206+
1207+
* Comparison should be symmetric.
1208+
In other words, the following expressions should have the same result:
1209+
1210+
``x == y`` and ``y == x``
1211+
1212+
``x != y`` and ``y != x``
1213+
1214+
``x < y`` and ``y > x``
1215+
1216+
``x <= y`` and ``y >= x``
1217+
1218+
* Comparison should be transitive.
1219+
The following (non-exhaustive) examples illustrate that:
1220+
1221+
``x > y and y > z`` implies ``x > z``
1222+
1223+
``x < y and y <= z`` implies ``x < z``
1224+
1225+
* Inverse comparison should result in the boolean negation.
1226+
In other words, the following expressions should have the same result:
1227+
1228+
``x == y`` and ``not x != y``
1229+
1230+
``x < y`` and ``not x >= y`` (for total ordering)
1231+
1232+
``x > y`` and ``not x <= y`` (for total ordering)
1233+
1234+
The last two expressions apply to totally ordered collections (e.g. to
1235+
sequences, but not to sets or mappings). See also the
1236+
:func:`~functools.total_ordering` decorator.
1237+
1238+
* The :func:`hash` result should be consistent with equality.
1239+
Objects that are equal should either have the same hash value,
1240+
or be marked as unhashable.
1241+
1242+
Python does not enforce these consistency rules.
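
Illustration (not part of the commit): the consistency rules listed above can be sketched with a small user-defined class. The class name ``Version`` and its ``_key`` helper are hypothetical and chosen only for this example. It defines ``__eq__``, ``__ne__``, ``__lt__`` and ``__hash__`` by hand and lets :func:`functools.total_ordering` derive the remaining ordering methods. ``__ne__`` is written out explicitly because Python 2 does not derive it from ``__eq__``.

```python
from functools import total_ordering


@total_ordering
class Version(object):
    """Hypothetical example class that compares by (major, minor)."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        # Single source of truth for equality, ordering and hashing.
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __ne__(self, other):
        # Inverse comparison is the boolean negation of __eq__.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self):
        # Equal objects share a key, so they share a hash value.
        return hash(self._key())


x = Version(1, 2)
y = Version(1, 2)
z = Version(1, 3)
```

With these definitions, ``x == y`` and ``not x != y`` agree, ``x < z`` and ``z > x`` agree, and ``hash(x) == hash(y)``. Nothing in Python checks this; the class is consistent only because every method is built on the same key.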
1243+
1244+
1245+
.. _in:
1246+
.. _not in:
11441247
.. _membership-test-details:
11451248

1249+
Membership test operations
1250+
--------------------------
1251+
11461252
The operators :keyword:`in` and :keyword:`not in` test for collection
11471253
membership. ``x in s`` evaluates to true if *x* is a member of the collection
11481254
*s*, and false otherwise. ``x not in s`` returns the negation of ``x in s``.
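
Illustration (not part of the commit): the default identity-based equality and the reflexivity enforced by built-in collections, both described in the hunk above, can be observed with a short sketch. The class name ``Point`` is hypothetical. The results noted in the comments are what CPython is expected to produce.

```python
class Point(object):
    """Hypothetical class with no comparison methods, so defaults apply."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


a = Point(1, 2)
b = Point(1, 2)

same_object_equal = (a == a)        # True: same identity
distinct_objects_equal = (a == b)   # False: same data, different identity

nan = float("nan")
nan_equals_itself = (nan == nan)       # False: NaN is not reflexive
list_equals_itself = ([nan] == [nan])  # True: element identity is checked first
nan_in_list = (nan in [nan])           # True: membership also checks identity first
```

The last three lines show the case the text calls out: for a non-reflexive element such as NaN, comparing collections gives a different answer than comparing the elements strictly.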
@@ -1192,6 +1298,13 @@ The operator :keyword:`not in` is defined to have the inverse true value of
11921298
operator: is not
11931299
pair: identity; test
11941300

1301+
1302+
.. _is:
1303+
.. _is not:
1304+
1305+
Identity comparisons
1306+
--------------------
1307+
11951308
The operators :keyword:`is` and :keyword:`is not` test for object identity: ``x
11961309
is y`` is true if and only if *x* and *y* are the same object. ``x is not y``
11971310
yields the inverse truth value. [#]_
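
Illustration (not part of the commit): a minimal sketch of how identity comparison differs from value comparison for a built-in type that customizes equality. The variable names are chosen only for this example.

```python
first = [1, 2, 3]
second = [1, 2, 3]   # a separate list object with equal contents
alias = first        # a second name for the same list object

equal_by_value = (first == second)      # True: lists compare element by element
same_identity = (first is second)       # False: two distinct objects
alias_is_first = (alias is first)       # True: both names refer to one object
alias_is_not_second = (alias is not second)  # True
```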
@@ -1418,15 +1531,24 @@ groups from right to left).
14181531
cases, Python returns the latter result, in order to preserve that
14191532
``divmod(x,y)[0] * y + x % y`` be very close to ``x``.
14201533
1421-
.. [#] While comparisons between unicode strings make sense at the byte
1422-
level, they may be counter-intuitive to users. For example, the
1423-
strings ``u"\u00C7"`` and ``u"\u0043\u0327"`` compare differently,
1424-
even though they both represent the same unicode character (LATIN
1425-
CAPITAL LETTER C WITH CEDILLA). To compare strings in a human
1426-
recognizable way, compare using :func:`unicodedata.normalize`.
1427-
1428-
.. [#] The implementation computes this efficiently, without constructing lists or
1429-
sorting.
1534+
.. [#] The Unicode standard distinguishes between :dfn:`code points`
1535+
(e.g. U+0041) and :dfn:`abstract characters` (e.g. "LATIN CAPITAL LETTER A").
1536+
While most abstract characters in Unicode are only represented using one
1537+
code point, there is a number of abstract characters that can in addition be
1538+
represented using a sequence of more than one code point. For example, the
1539+
abstract character "LATIN CAPITAL LETTER C WITH CEDILLA" can be represented
1540+
as a single :dfn:`precomposed character` at code position U+00C7, or as a
1541+
sequence of a :dfn:`base character` at code position U+0043 (LATIN CAPITAL
1542+
LETTER C), followed by a :dfn:`combining character` at code position U+0327
1543+
(COMBINING CEDILLA).
1544+
1545+
The comparison operators on unicode strings compare at the level of Unicode code
1546+
points. This may be counter-intuitive to humans. For example,
1547+
``u"\u00C7" == u"\u0043\u0327"`` is ``False``, even though both strings
1548+
represent the same abstract character "LATIN CAPITAL LETTER C WITH CEDILLA".
1549+
1550+
To compare strings at the level of abstract characters (that is, in a way
1551+
intuitive to humans), use :func:`unicodedata.normalize`.
14301552
14311553
.. [#] Earlier versions of Python used lexicographic comparison of the sorted (key,
14321554
value) lists, but this was very expensive for the common case of comparing for
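
Illustration (not part of the commit): the Unicode footnote added above can be sketched as follows. The choice of the ``"NFC"`` normalization form is an assumption made for this example; the footnote only names :func:`unicodedata.normalize` without prescribing a form.

```python
import unicodedata

precomposed = u"\u00C7"        # LATIN CAPITAL LETTER C WITH CEDILLA
decomposed = u"\u0043\u0327"   # LATIN CAPITAL LETTER C + COMBINING CEDILLA

# Comparison works on code points, so the two spellings differ.
raw_equal = (precomposed == decomposed)

# After normalizing both strings to the same form, they compare equal.
normalized_equal = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)
```

Here ``raw_equal`` is ``False`` and ``normalized_equal`` is ``True``. Both operands must be normalized to the same form for the comparison to be meaningful.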

Misc/NEWS

Lines changed: 8 additions & 0 deletions
@@ -73,6 +73,14 @@ C API
7373

7474
- Issue #27867: Function PySlice_GetIndicesEx() is replaced with a macro.
7575

76+
Documentation
77+
-------------
78+
79+
- Issue #12067: Rewrite Comparisons section in the Expressions chapter of the
80+
language reference. Some of the details of comparing mixed types were
81+
incorrect or ambiguous. Added default behaviour and consistency suggestions
82+
for user-defined classes. Based on patch from Andy Maier.
83+
7684
Build
7785
-----
7886
