67 | 67 | "isinstance(my_dict, abc.Mapping)\n",
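68 | 68 | "# isinstance is used here rather than type() to check whether the argument is a dict, because the argument might not be a dict at all but some less common mapping type."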
69 | 69 | ]
| 70 | + }, |
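A minimal sketch of why the cell above prefers `isinstance(obj, abc.Mapping)` over an exact `type` check: an `OrderedDict` (a `dict` subclass) and a `types.MappingProxyType` (a read-only mapping that is not a `dict` at all) both fail `type(obj) is dict`, yet both pass the `abc.Mapping` check. The helper `describe` is a hypothetical name used only for this illustration.

```python
from collections import OrderedDict, abc
from types import MappingProxyType


def describe(obj):
    """Hypothetical helper: (exact type check, ABC check) for obj."""
    return type(obj) is dict, isinstance(obj, abc.Mapping)


plain = {'a': 1}
ordered = OrderedDict(a=1)           # a dict subclass
read_only = MappingProxyType(plain)  # a read-only mapping that is not a dict

for obj in (plain, ordered, read_only):
    print(type(obj).__name__, describe(obj))
```

Only the plain `dict` passes the exact type check, so code that tests `type(arg) is dict` would reject perfectly usable mappings.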
| 71 | + { |
| 72 | + "cell_type": "markdown", |
| 73 | + "metadata": {}, |
| 74 | + "source": [ |
| 75 | + "What are hashable objects?\n", |
| 76 | + "- An object is hashable if it has a hash value which never changes during its lifetime (it needs a `__hash__()` method), and can be compared to other objects (it needs an `__eq__()` method). Hashable objects which **compare equal must have the same hash value**.\n", |
| 77 | + "\n", |
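| 78 | + "The atomic immutable types (`str`, `bytes`, and the numeric types) are hashable. A `tuple` is hashable only **if all its elements are hashable**.\n", |
| 79 | + "\n", |
| 80 | + "By default, instances of user-defined classes are hashable: their hash value is derived from their `id()`, and each instance compares equal only to itself. If a class implements a custom `__eq__()` that takes the internal state into account, Python sets its `__hash__` to `None` unless `__hash__()` is also defined; such instances can be safely hashable only if all the attributes used by `__eq__()` and `__hash__()` are immutable.\n", |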
| 81 | + "\n", |
| 82 | + "Here are different ways to construct a dictionary:" |
| 83 | + ] |
| 84 | + }, |
| 85 | + { |
| 86 | + "cell_type": "code", |
| 87 | + "execution_count": 2, |
| 88 | + "metadata": {}, |
| 89 | + "outputs": [ |
| 90 | + { |
| 91 | + "data": { |
| 92 | + "text/plain": [ |
| 93 | + "True" |
| 94 | + ] |
| 95 | + }, |
| 96 | + "execution_count": 2, |
| 97 | + "metadata": {}, |
| 98 | + "output_type": "execute_result" |
| 99 | + } |
| 100 | + ], |
| 101 | + "source": [ |
| 102 | + "a = dict(one=1, two=2, three=3)\n", |
| 103 | + "b = {'one': 1, 'two': 2, 'three': 3}\n", |
| 104 | + "c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))\n", |
| 105 | + "d = dict([('two', 2), ('one', 1), ('three', 3)])\n", |
| 106 | + "e = dict({'three': 3, 'one': 1, 'two': 2})\n", |
| 107 | + "a == b == c == d == e" |
| 108 | + ] |
| 109 | + }, |
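Every dict construction above relies on the keys being hashable, as defined in the markdown cell on hashable objects. The sketch below checks that definition on a few objects. `is_hashable`, `Point`, and `Plain` are hypothetical names for illustration; the last lines show that because `1 == 1.0` (and so their hashes match), they address the same dict entry.

```python
def is_hashable(obj):
    """Hypothetical helper: True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


class Point:
    """Defines __eq__ but not __hash__, so Python sets __hash__ to None."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


class Plain:
    """No custom __eq__: hashable by default, hash derived from id()."""


tt = (1, 2, (30, 40))  # all elements hashable
tl = (1, 2, [30, 40])  # contains a list, so the tuple is not hashable

print(is_hashable(tt), is_hashable(tl))                # True False
print(is_hashable(Plain()), is_hashable(Point(1, 2)))  # True False
print(Point.__hash__ is None)                          # True

# Equal keys must hash equally: 1 == 1.0, so 1.0 updates the existing entry.
d = {1: 'int'}
d[1.0] = 'float'
print(d)  # {1: 'float'}
```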
| 110 | + { |
| 111 | + "cell_type": "markdown", |
| 112 | + "metadata": {}, |
| 113 | + "source": [ |
| 114 | + "## 3.2 Dict comprehensions\n" |
| 115 | + ] |
| 116 | + }, |
| 117 | + { |
| 118 | + "cell_type": "code", |
| 119 | + "execution_count": 4, |
| 120 | + "metadata": {}, |
| 121 | + "outputs": [ |
| 122 | + { |
| 123 | + "name": "stdout", |
| 124 | + "output_type": "stream", |
| 125 | + "text": [ |
| 126 | + "d1: dict_keys([86, 91, 1, 62, 55, 92, 880, 234, 7, 81])\n", |
| 127 | + "d2: dict_keys([1, 7, 55, 62, 81, 86, 91, 92, 234, 880])\n", |
| 128 | + "d3: dict_keys([880, 55, 86, 91, 62, 81, 234, 92, 7, 1])\n" |
| 129 | + ] |
| 130 | + } |
| 131 | + ], |
| 132 | + "source": [ |
| 133 | + "# dialcodes.py\n", |
| 134 | + "# BEGIN DIALCODES\n", |
| 135 | + "# dial codes of the top 10 most populous countries\n", |
| 136 | + "DIAL_CODES = [\n", |
| 137 | + " (86, 'China'),\n", |
| 138 | + " (91, 'India'),\n", |
| 139 | + " (1, 'United States'),\n", |
| 140 | + " (62, 'Indonesia'),\n", |
| 141 | + " (55, 'Brazil'),\n", |
| 142 | + " (92, 'Pakistan'),\n", |
| 143 | + " (880, 'Bangladesh'),\n", |
| 144 | + " (234, 'Nigeria'),\n", |
| 145 | + " (7, 'Russia'),\n", |
| 146 | + " (81, 'Japan'),\n", |
| 147 | + " ]\n", |
| 148 | + "\n", |
| 149 | + "d1 = dict(DIAL_CODES)  # <1> tuples in the original list order\n", |
| 150 | + "print('d1:', d1.keys())\n", |
| 151 | + "d2 = dict(sorted(DIAL_CODES))  # <2> tuples sorted by dial code\n", |
| 152 | + "print('d2:', d2.keys())\n", |
| 153 | + "d3 = dict(sorted(DIAL_CODES, key=lambda x: x[1]))  # <3> tuples sorted by country name\n", |
| 154 | + "print('d3:', d3.keys())\n", |
| 155 | + "assert d1 == d2 and d2 == d3  # <4> equal: same key-value pairs, even though the key order differs\n", |
| 156 | + "# END DIALCODES\n" |
| 157 | + ] |
| 158 | + }, |
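The cell above builds dicts with `dict(...)` calls rather than with a comprehension. As a minimal sketch of what the section heading refers to, here are two dict comprehensions over the same `DIAL_CODES` pairs: one swaps each pair so the country becomes the key, the other filters and transforms while building. The variable names `country_code` and `small_codes` are illustrative choices.

```python
DIAL_CODES = [
    (86, 'China'), (91, 'India'), (1, 'United States'), (62, 'Indonesia'),
    (55, 'Brazil'), (92, 'Pakistan'), (880, 'Bangladesh'), (234, 'Nigeria'),
    (7, 'Russia'), (81, 'Japan'),
]

# Swap each (code, country) pair so the country name becomes the key.
country_code = {country: code for code, country in DIAL_CODES}

# Filter while building: keep codes below 66, keyed by code, names upper-cased.
small_codes = {code: country.upper()
               for country, code in country_code.items()
               if code < 66}

print(country_code['Japan'])  # 81
print(small_codes)
```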
| 159 | + { |
| 160 | + "cell_type": "markdown", |
| 161 | + "metadata": {}, |
| 162 | + "source": [ |
| 163 | + "## 3.3 Common mapping methods\n", |
| 164 | + "Examples of common methods of `dict`, `defaultdict`, and `OrderedDict`.\n", |
| 165 | + "\n", |
| 166 | + "> The latter two types are variants of `dict` and live in the `collections` module.\n", |
| 167 | + "\n", |
| 168 | + "`d.update(m, [**kwargs])` relies on duck typing: `m` can be a mapping or an iterable of key-value pairs. The method first checks whether `m` has a `keys()` method; if it does, `m` is treated as a mapping, otherwise `update` iterates over `m`, assuming its items are key-value pairs.\n", |
| 169 | + "\n", |
| 170 | + "The difference between `d[k]` and `d.get(k, default)`: if the key `k` is not in the dict, `d[k]` raises `KeyError`, whereas `d.get(k, default)` returns `default` (or `None` when no default is given)." |
| 171 | + ] |
| 172 | + }, |
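A small sketch of the two behaviors described above, assuming Python 3.7+ so the printed dict shows insertion order. `KeysOnly` is a hypothetical class that is not a `dict` but provides `keys()` and `__getitem__`, which is enough for `update` to treat it as a mapping; a list of tuples has no `keys()`, so `update` iterates over it as pairs.

```python
class KeysOnly:
    """Hypothetical non-dict class: update() only needs keys() and __getitem__."""

    def __init__(self, data):
        self._data = dict(data)

    def keys(self):
        return self._data.keys()

    def __getitem__(self, key):
        return self._data[key]


d = {'a': 1}
d.update(KeysOnly({'b': 2}))         # has keys(): treated as a mapping
d.update([('c', 3), ('d', 4)], e=5)  # no keys(): iterated as (key, value) pairs;
                                     # keyword arguments are applied afterwards
print(d)  # {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

print(d.get('z'))     # None
print(d.get('z', 0))  # 0
try:
    d['z']
except KeyError as exc:
    print('KeyError:', exc)
```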
| 173 | + { |
| 174 | + "cell_type": "code", |
| 175 | + "execution_count": null, |
| 176 | + "metadata": {}, |
| 177 | + "outputs": [], |
| 178 | + "source": [ |
| 179 | + "# index0.py with slight modification\n", |
| 180 | + "\"\"\"Build an index mapping word -> list of occurrences\"\"\"\n", |
| 181 | + "\n", |
| 182 | + "import sys\n", |
| 183 | + "import re\n", |
| 184 | + "\n", |
| 185 | + "WORD_RE = re.compile(r'\\w+')\n", |
| 186 | + "\n", |
| 187 | + "index = {}\n", |
| 188 | + "with open(sys.argv[1], encoding='utf-8') as fp:\n", |
| 189 | + "    for line_no, line in enumerate(fp, 1):\n", |
| 190 | + "        for match in WORD_RE.finditer(line):\n", |
| 191 | + "            word = match.group()\n", |
| 192 | + "            column_no = match.start() + 1\n", |
| 193 | + "            location = (line_no, column_no)\n", |
| 194 | + "            # this is ugly; coded like this to make a point\n", |
| 195 | + "            occurrences = index.get(word, [])  # <1> the list for word, or [] if absent\n", |
| 196 | + "            occurrences.append(location)  # <2> append the new location\n", |
| 197 | + "            index[word] = occurrences  # <3> store it back, which costs a second lookup\n", |
| 198 | + "\n", |
| 199 | + "# print in alphabetical order\n", |
| 200 | + "for word in sorted(index, key=str.upper):  # <4>\n", |
| 201 | + "    print(word, index[word])\n", |
| 202 | + "\n", |
| 203 | + "# <4> str.upper is not called here; a reference to the method is passed to sorted\n", |
| 204 | + "# so that words are normalized to the same case when compared during sorting" |
| 205 | + ] |
| 206 | + }, |
| 207 | + { |
| 208 | + "cell_type": "code", |
| 209 | + "execution_count": null, |
| 210 | + "metadata": {}, |
| 211 | + "outputs": [], |
| 212 | + "source": [ |
| 213 | + "import sys\n", |
| 214 | + "import re\n", |
| 215 | + "\n", |
| 216 | + "WORD_RE = re.compile(r'\\w+')\n", |
| 217 | + "\n", |
| 218 | + "index = {}\n", |
| 219 | + "with open(sys.argv[1], encoding='utf-8') as fp:\n", |
| 220 | + "    for line_no, line in enumerate(fp, 1):\n", |
| 221 | + "        for match in WORD_RE.finditer(line):\n", |
| 222 | + "            word = match.group()\n", |
| 223 | + "            column_no = match.start() + 1\n", |
| 224 | + "            location = (line_no, column_no)\n", |
| 225 | + "            index.setdefault(word, []).append(location)  # <1> get or create the list and append in one line, with a single lookup on the key\n", |
| 226 | + "\n", |
| 227 | + "# print in alphabetical order\n", |
| 228 | + "for word in sorted(index, key=str.upper):\n", |
| 229 | + "    print(word, index[word])" |
| 230 | + ] |
| 231 | + }, |
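To make the `setdefault` line above concrete without needing a file path in `sys.argv[1]`, this sketch runs it on a few in-memory `(word, location)` pairs (made-up sample data) and compares it with a hypothetical helper, `setdefault_equivalent`, that spells out the same semantics with separate lookups. Printed dict order assumes Python 3.7+.

```python
def setdefault_equivalent(d, key, default):
    """Hypothetical helper spelling out the semantics of d.setdefault(key, default)."""
    if key not in d:      # one lookup
        d[key] = default  # insert the default only when the key is missing
    return d[key]         # another lookup; dict.setdefault does all of this in one call


locations = [('a', (1, 1)), ('b', (1, 3)), ('a', (2, 1))]
via_setdefault = {}
via_helper = {}
for word, location in locations:
    via_setdefault.setdefault(word, []).append(location)
    setdefault_equivalent(via_helper, word, []).append(location)

print(via_setdefault)                # {'a': [(1, 1), (2, 1)], 'b': [(1, 3)]}
print(via_setdefault == via_helper)  # True

# When the key already exists, the default is ignored and the stored list is returned.
print(via_setdefault.setdefault('a', ['unused']))  # [(1, 1), (2, 1)]
```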
| 232 | + { |
| 233 | + "cell_type": "markdown", |
| 234 | + "metadata": {}, |
| 235 | + "source": [ |
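| 236 | + "## 3.4 Mappings with flexible key lookup (handling missing keys)\n", |
| 237 | + "- Use a `defaultdict`\n", |
| 238 | + "- Subclass `dict` and implement the `__missing__` method\n", |
| 239 | + "\n", |
| 240 | + "### 3.4.1 `defaultdict`: handling missing keys\n", |
| 241 | + "\n", |
| 242 | + "> Specifically, when instantiating a `defaultdict`, you pass the constructor a callable; it is invoked whenever `__getitem__` is given a key that is not found, so that `__getitem__` returns a default value instead of raising `KeyError`.\n", |
| 243 | + "\n", |
| 244 | + "For example, given `dd = defaultdict(list)`, if the key `'new-key'` is not yet in `dd`, the expression `dd['new-key']` performs the following steps:\n", |
| 245 | + "\n", |
| 246 | + "1. Calls `list()` to create a new list.\n", |
| 247 | + "2. Inserts that list into `dd`, with `'new-key'` as its key.\n", |
| 248 | + "3. Returns a reference to that list.\n", |
| 249 | + "\n", |
| 250 | + "> The callable that produces default values is held in an instance attribute named `default_factory`." |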
| 251 | + ] |
| 252 | + }, |
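The three steps listed above can be observed directly. In this sketch the same list object comes back on every later lookup, so appending to it mutates the stored value.

```python
from collections import defaultdict

dd = defaultdict(list)
print(dd.default_factory)  # <class 'list'>
print('new-key' in dd)     # False

value = dd['new-key']          # steps 1-2: list() is called and stored under 'new-key'
print('new-key' in dd)         # True
print(value is dd['new-key'])  # step 3: the same list object is returned -> True

dd['new-key'].append('x')  # mutating the returned list mutates the stored value
print(dd)                  # defaultdict(<class 'list'>, {'new-key': ['x']})
```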
| 253 | + { |
| 254 | + "cell_type": "code", |
| 255 | + "execution_count": null, |
| 256 | + "metadata": {}, |
| 257 | + "outputs": [], |
| 258 | + "source": [ |
| 259 | + "import sys\n", |
| 260 | + "import re\n", |
| 261 | + "import collections\n", |
| 262 | + "\n", |
| 263 | + "WORD_RE = re.compile(r'\\w+')\n", |
| 264 | + "\n", |
| 265 | + "index = collections.defaultdict(list)  # <1> the list type is the default_factory\n", |
| 266 | + "with open(sys.argv[1], encoding='utf-8') as fp:\n", |
| 267 | + "    for line_no, line in enumerate(fp, 1):\n", |
| 268 | + "        for match in WORD_RE.finditer(line):\n", |
| 269 | + "            word = match.group()\n", |
| 270 | + "            column_no = match.start() + 1\n", |
| 271 | + "            location = (line_no, column_no)\n", |
| 272 | + "            index[word].append(location)  # <2> always succeeds: a missing word first gets a new empty list\n", |
| 273 | + "\n", |
| 274 | + "# print in alphabetical order\n", |
| 275 | + "for word in sorted(index, key=str.upper):\n", |
| 276 | + "    print(word, index[word])" |
| 277 | + ] |
| 278 | + }, |
| 279 | + { |
| 280 | + "cell_type": "markdown", |
| 281 | + "metadata": {}, |
| 282 | + "source": [ |
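| 283 | + "> If no `default_factory` is provided when the `defaultdict` is created, looking up a missing key raises `KeyError` as usual.\n", |
| 284 | + "\n", |
| 285 | + "`default_factory` is only invoked to provide defaults for `__getitem__` calls, not for other methods. For example, when `k` is missing, `dd.get(k)` returns `None` and does not call `default_factory`.\n", |
| 286 | + "\n", |
| 287 | + "Behind the scenes, all of this works through the `__missing__` method.\n", |
| 288 | + "\n", |
| 289 | + "### 3.4.2 The `__missing__` method\n" |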
| 290 | + ] |
70 | 291 | }
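A sketch of the points made in the last markdown cell: a `defaultdict` with no factory raises `KeyError`, `get()` never calls the factory, and `dd[k]` on a missing key goes through `__missing__` (called directly here only to show what the lookup triggers). `ZeroDict` is a hypothetical `dict` subclass showing that the same hook works for any `dict` subclass that defines `__missing__`.

```python
from collections import defaultdict

# No default_factory: missing keys raise KeyError, as with a plain dict.
no_factory = defaultdict()
print(no_factory.default_factory)  # None
try:
    no_factory['missing']
except KeyError:
    print('KeyError raised')

# With a factory, get() neither calls it nor inserts anything.
dd = defaultdict(list)
print(dd.get('k'))  # None
print('k' in dd)    # False

# A missing-key lookup via dd[k] calls __missing__, which calls default_factory.
print(dd.__missing__('k'))  # []
print('k' in dd)            # True


class ZeroDict(dict):
    """Hypothetical dict subclass: missing keys read as 0 without being stored."""

    def __missing__(self, key):
        return 0


z = ZeroDict(a=1)
print(z['a'], z['b'], 'b' in z)  # 1 0 False
print(z.get('b'))                # None: get() does not call __missing__
```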
71 | 292 | ],
72 | 293 | "metadata": {