updating 03-notes

linearalgebrayhz · linearalgebrayhz · commit 2440bc02d23b · 2024-07-15T01:53:41.000+08:00
diff --git a/03-dict-set/03-notes.ipynb b/03-dict-set/03-notes.ipynb
@@ -67,6 +67,227 @@
     "isinstance(my_dict, abc.Mapping)\n",
     "# Use isinstance rather than type() to check whether an argument is a dict: the argument may not be a dict at all but some alternative mapping type."
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "What are hashable objects?\n",
+    "- An object is hashable if it has a hash value which never changes during its lifetime (it needs a `__hash__()` method), and can be compared to other objects (it needs an `__eq__()` method). Hashable objects which **compare equal must have the same hash value**.\n",
+    "\n",
+    "`str`, `bytes`, and the numeric types are hashable. A `tuple` is hashable **only if all its elements are hashable**.\n",
+    "\n",
+    "By default, instances of user-defined classes are hashable: their hash value is derived from their `id()`, and each instance compares equal only to itself. If a class implements a custom `__eq__()` that takes internal state into account, its instances should be hashable only if that state never changes.\n",
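+    "\n",
+    "A minimal sketch of these rules. `Point` is a hypothetical class, not part of the original notes; it defines `__eq__()` without `__hash__()`, which in Python 3 leaves `__hash__` set to `None` and so makes its instances unhashable:\n",
+    "\n",
+    "```python\n",
+    "# A tuple is hashable only when every element is hashable.\n",
+    "tuple_of_ints = (1, 2, (30, 40))\n",
+    "tuple_with_list = (1, 2, [30, 40])\n",
+    "print(isinstance(hash(tuple_of_ints), int))\n",
+    "\n",
+    "try:\n",
+    "    hash(tuple_with_list)\n",
+    "except TypeError as exc:\n",
+    "    print('unhashable:', exc)\n",
+    "\n",
+    "\n",
+    "class Point:\n",
+    "    # Hypothetical class: equality depends on state, no __hash__ defined.\n",
+    "    def __init__(self, x, y):\n",
+    "        self.x = x\n",
+    "        self.y = y\n",
+    "\n",
+    "    def __eq__(self, other):\n",
+    "        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)\n",
+    "\n",
+    "\n",
+    "# Defining __eq__ without __hash__ sets __hash__ to None.\n",
+    "print(Point.__hash__ is None)\n",
+    "```\n",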
+    "\n",
+    "Here are different ways to construct a dictionary:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "True"
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "a = dict(one = 1, two = 2, three = 3)\n",
+    "b = {'one':1,'two':2,\"three\":3}\n",
+    "c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))\n",
+    "d = dict([('two', 2), ('one', 1), ('three', 3)])\n",
+    "e = dict({'three': 3, 'one': 1, 'two': 2}) \n",
+    "a == b == c == d == e"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3.2 Dict comprehensions\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "d1: dict_keys([86, 91, 1, 62, 55, 92, 880, 234, 7, 81])\n",
+      "d2: dict_keys([1, 7, 55, 62, 81, 86, 91, 92, 234, 880])\n",
+      "d3: dict_keys([880, 55, 86, 91, 62, 81, 234, 92, 7, 1])\n"
+     ]
+    }
+   ],
+   "source": [
+    "# dialcodes.py\n",
+    "# BEGIN DIALCODES\n",
+    "# dial codes of the top 10 most populous countries\n",
+    "DIAL_CODES = [\n",
+    "        (86, 'China'),\n",
+    "        (91, 'India'),\n",
+    "        (1, 'United States'),\n",
+    "        (62, 'Indonesia'),\n",
+    "        (55, 'Brazil'),\n",
+    "        (92, 'Pakistan'),\n",
+    "        (880, 'Bangladesh'),\n",
+    "        (234, 'Nigeria'),\n",
+    "        (7, 'Russia'),\n",
+    "        (81, 'Japan'),\n",
+    "    ]\n",
+    "\n",
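+    "d1 = dict(DIAL_CODES)  # <1> built from the pairs in their original order\n",
+    "print('d1:', d1.keys())\n",
+    "d2 = dict(sorted(DIAL_CODES))  # <2> built from the pairs sorted by key (dial code)\n",
+    "print('d2:', d2.keys())\n",
+    "d3 = dict(sorted(DIAL_CODES, key=lambda x:x[1]))  # <3> built from the pairs sorted by value (country name)\n",
+    "print('d3:', d3.keys())\n",
+    "assert d1 == d2 and d2 == d3  # <4> equal: insertion order does not affect dict equality\n",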
+    "# END DIALCODES\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3.3 Common mapping methods\n",
+    "Examples of common methods of `dict`, `defaultdict`, and `OrderedDict`.\n",
+    "\n",
+    "> The latter two types are variants of `dict` and live in the `collections` module.\n",
+    "\n",
+    "`update(m, [**kargs])` relies on duck typing: `m` can be a mapping or an iterable of key-value pairs. The method first checks whether `m` has a `keys()` method; if it does, `m` is treated as a mapping, otherwise `update` iterates over `m`, assuming it yields key-value pairs.\n",
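+    "\n",
+    "A minimal sketch of the argument forms `update` accepts; the dictionary `d` and its keys are illustrative only:\n",
+    "\n",
+    "```python\n",
+    "d = {'one': 1}\n",
+    "d.update({'two': 2})      # a mapping: it has a keys() method\n",
+    "d.update([('three', 3)])  # an iterable of key-value pairs\n",
+    "d.update(four=4)          # keyword arguments\n",
+    "print(d)\n",
+    "```\n",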
+    "\n",
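+    "The difference between `d[k]` and `d.get(k)`: if the key `k` is not in the dictionary, `d[k]` raises `KeyError`, whereas `d.get(k, default)` returns `default` (or `None` when no default is given)."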
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# index0.py with slight modification\n",
+    "\"\"\"Build an index mapping word -> list of occurrences\"\"\"\n",
+    "\n",
+    "import sys\n",
+    "import re\n",
+    "\n",
+    "WORD_RE = re.compile(r'\\w+')\n",
+    "\n",
+    "index = {}\n",
+    "with open(sys.argv[1], encoding='utf-8') as fp:\n",
+    "    for line_no, line in enumerate(fp, 1):\n",
+    "        for match in WORD_RE.finditer(line):\n",
+    "            word = match.group()\n",
+    "            column_no = match.start()+1\n",
+    "            location = (line_no, column_no)\n",
+    "            # this is ugly; coded like this to make a point\n",
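+    "            occurrences = index.get(word, [])  # <1> get the list for word, or [] if it is missing\n",
+    "            occurrences.append(location)       # <2> append the new location to the list\n",
+    "            index[word] = occurrences          # <3> store the list back; a second lookup on the key\n",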
+    "\n",
+    "# print in alphabetical order\n",
+    "for word in sorted(index, key=str.upper):  # <4> \n",
+    "    print(word, index[word])\n",
+    "    \n",
+    "# <4> str.upper is not called here; a reference to the method is passed to sorted\n",
+    "#     so that words are normalized to a uniform form while sorting"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import sys\n",
+    "import re\n",
+    "\n",
+    "WORD_RE = re.compile(r'\\w+')\n",
+    "\n",
+    "index = {}\n",
+    "with open(sys.argv[1], encoding='utf-8') as fp:\n",
+    "    for line_no, line in enumerate(fp, 1):\n",
+    "        for match in WORD_RE.finditer(line):\n",
+    "            word = match.group()\n",
+    "            column_no = match.start()+1\n",
+    "            location = (line_no, column_no)\n",
+    "            index.setdefault(word, []).append(location)  # <1> a single line and a single key lookup\n",
+    "\n",
+    "# print in alphabetical order\n",
+    "for word in sorted(index, key=str.upper):\n",
+    "    print(word, index[word])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
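+    "## 3.4 Mappings with flexible key lookup (handling missing keys)\n",
+    "- Use a `defaultdict`\n",
+    "- Define a subclass of `dict` that implements the `__missing__` method\n",
+    "\n",
+    "### 3.4.1 `defaultdict`: handling missing keys\n",
+    "\n",
+    "> Specifically, when instantiating a `defaultdict`, you pass the constructor a callable. That callable is invoked whenever `__getitem__` is given a key that does not exist, so that `__getitem__` can return a default value.\n",
+    "\n",
+    "For example, given `dd = defaultdict(list)`, if the key `'new-key'` is not yet in `dd`, the expression `dd['new-key']` does the following:\n",
+    "\n",
+    "1. Calls `list()` to create a new list.\n",
+    "2. Inserts the new list into the `defaultdict`, with `'new-key'` as its key.\n",
+    "3. Returns a reference to that list.\n",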
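+    "\n",
+    "The three steps can be observed directly. This is a minimal sketch; the names `dd` and `value` are illustrative:\n",
+    "\n",
+    "```python\n",
+    "from collections import defaultdict\n",
+    "\n",
+    "dd = defaultdict(list)\n",
+    "print('new-key' in dd)         # the key does not exist yet\n",
+    "value = dd['new-key']          # list() is called, stored under the key, and returned\n",
+    "print('new-key' in dd)         # the key now exists\n",
+    "print(value is dd['new-key'])  # the returned object is the stored list itself\n",
+    "value.append(1)                # so mutating it changes the dict's value\n",
+    "print(dd['new-key'])\n",
+    "```\n",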
+    "\n",
+    "> The callable that produces the default values is stored in an instance attribute named `default_factory`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import sys\n",
+    "import re\n",
+    "import collections\n",
+    "\n",
+    "WORD_RE = re.compile(r'\\w+')\n",
+    "\n",
+    "index = collections.defaultdict(list)     # <1> the list constructor as default_factory\n",
+    "with open(sys.argv[1], encoding='utf-8') as fp:\n",
+    "    for line_no, line in enumerate(fp, 1):\n",
+    "        for match in WORD_RE.finditer(line):\n",
+    "            word = match.group()\n",
+    "            column_no = match.start()+1\n",
+    "            location = (line_no, column_no)\n",
+    "            index[word].append(location)  # <2> always succeeds: a missing key gets a new list first\n",
+    "\n",
+    "# print in alphabetical order\n",
+    "for word in sorted(index, key=str.upper):\n",
+    "    print(word, index[word])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
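+    "> If no `default_factory` is given when the `defaultdict` is created, looking up a missing key raises `KeyError`.\n",
+    "\n",
+    "`default_factory` is invoked only by `__getitem__`, never by other methods. For example, when the key does not exist, `dd.get(k)` returns `None` without calling `default_factory`.\n",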
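+    "\n",
+    "A minimal sketch of both behaviours; the names `dd` and `no_factory` and the key `'missing'` are illustrative:\n",
+    "\n",
+    "```python\n",
+    "from collections import defaultdict\n",
+    "\n",
+    "dd = defaultdict(list)\n",
+    "print(dd.get('missing'))   # get() does not call default_factory\n",
+    "print(len(dd))             # so nothing was inserted\n",
+    "\n",
+    "no_factory = defaultdict()  # default_factory defaults to None\n",
+    "try:\n",
+    "    no_factory['missing']\n",
+    "except KeyError as exc:\n",
+    "    print('KeyError:', exc)\n",
+    "```\n",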
+    "\n",
+    "Behind all of this is the `__missing__` method.\n",
+    "\n",
+    "### 3.4.2 The `__missing__` method\n"
+   ]
   }
  ],
  "metadata": {