This repository was archived by the owner on Dec 2, 2021. It is now read-only.

Commit 2440bc0

updating 03-notes
1 parent 66b9a66 commit 2440bc0

1 file changed

+221
-0
lines changed

03-dict-set/03-notes.ipynb

Lines changed: 221 additions & 0 deletions
@@ -67,6 +67,227 @@
6767
"isinstance(my_dict, abc.Mapping)\n",
6868
"# Use isinstance rather than type() to check whether an argument is a mapping: the argument may not be a dict but some other, less conventional mapping type."
6969
]
70+
},
71+
{
72+
"cell_type": "markdown",
73+
"metadata": {},
74+
"source": [
75+
"What are hashable objects?\n",
76+
"- An object is hashable if it has a hash value which never changes during its lifetime (it needs a `__hash__()` method), and can be compared to other objects (it needs an `__eq__()` method). Hashable objects which **compare equal must have the same hash value**.\n",
77+
"\n",
78+
"`str`, `bytes`, and the numeric types are all hashable. A `tuple` is hashable only **if all its elements are hashable**.\n",
79+
"\n",
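80+
"By default, instances of user-defined classes are hashable: their hash value is derived from their `id()`, and they compare equal only to themselves. If a class implements a custom `__eq__()` that takes internal state into account, its instances should be hashable only if the attributes involved are immutable. Note that a class which defines `__eq__()` without also defining `__hash__()` has its `__hash__` set to `None`, so its instances are not hashable.\n",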
81+
"\n",
82+
"Here are different ways to construct a dictionary:"
83+
]
84+
},
85+
{
86+
"cell_type": "code",
87+
"execution_count": 2,
88+
"metadata": {},
89+
"outputs": [
90+
{
91+
"data": {
92+
"text/plain": [
93+
"True"
94+
]
95+
},
96+
"execution_count": 2,
97+
"metadata": {},
98+
"output_type": "execute_result"
99+
}
100+
],
101+
"source": [
102+
"a = dict(one = 1, two = 2, three = 3)\n",
103+
"b = {'one':1,'two':2,\"three\":3}\n",
104+
"c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))\n",
105+
"d = dict([('two', 2), ('one', 1), ('three', 3)])\n",
106+
"e = dict({'three': 3, 'one': 1, 'two': 2}) \n",
107+
"a == b == c == d == e"
108+
]
109+
},
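The hashability rules noted earlier can be checked directly. The following is a minimal sketch, not part of the original notebook; `is_hashable`, `Plain`, and `Point` are hypothetical names introduced only for illustration.

```python
# Equal values must share a hash value, even across numeric types.
assert hash(1) == hash(1.0)  # 1 == 1.0, so their hashes must match


def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


nested_tuple = (1, 2, (30, 40))     # every element is hashable
tuple_with_list = (1, 2, [30, 40])  # contains a mutable list


class Plain:
    """Inherits the identity-based __eq__ and __hash__ from object."""


class Point:
    """Defines __eq__ but not __hash__, so Python sets __hash__ to None."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


print(is_hashable(nested_tuple))     # True
print(is_hashable(tuple_with_list))  # False
print(is_hashable(Plain()))          # True
print(is_hashable(Point(1, 2)))      # False
```

To make `Point` usable as a dict key, one option would be to add a `__hash__` that hashes the same fields `__eq__` compares, and to treat those fields as read-only.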
110+
{
111+
"cell_type": "markdown",
112+
"metadata": {},
113+
"source": [
114+
"## 3.2 Dict comprehensions\n"
115+
]
116+
},
117+
{
118+
"cell_type": "code",
119+
"execution_count": 4,
120+
"metadata": {},
121+
"outputs": [
122+
{
123+
"name": "stdout",
124+
"output_type": "stream",
125+
"text": [
126+
"d1: dict_keys([86, 91, 1, 62, 55, 92, 880, 234, 7, 81])\n",
127+
"d2: dict_keys([1, 7, 55, 62, 81, 86, 91, 92, 234, 880])\n",
128+
"d3: dict_keys([880, 55, 86, 91, 62, 81, 234, 92, 7, 1])\n"
129+
]
130+
}
131+
],
132+
"source": [
133+
"# dialcodes.py\n",
134+
"# BEGIN DIALCODES\n",
135+
"# dial codes of the top 10 most populous countries\n",
136+
"DIAL_CODES = [\n",
137+
" (86, 'China'),\n",
138+
" (91, 'India'),\n",
139+
" (1, 'United States'),\n",
140+
" (62, 'Indonesia'),\n",
141+
" (55, 'Brazil'),\n",
142+
" (92, 'Pakistan'),\n",
143+
" (880, 'Bangladesh'),\n",
144+
" (234, 'Nigeria'),\n",
145+
" (7, 'Russia'),\n",
146+
" (81, 'Japan'),\n",
147+
" ]\n",
148+
"\n",
149+
"d1 = dict(DIAL_CODES) # <1>\n",
150+
"print('d1:', d1.keys())\n",
151+
"d2 = dict(sorted(DIAL_CODES)) # <2>\n",
152+
"print('d2:', d2.keys())\n",
153+
"d3 = dict(sorted(DIAL_CODES, key=lambda x:x[1])) # <3>\n",
154+
"print('d3:', d3.keys())\n",
155+
"assert d1 == d2 and d2 == d3 # <4>\n",
156+
"# END DIALCODES\n"
157+
]
158+
},
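The cell above builds dictionaries with the `dict` constructor; the section title refers to dict comprehensions, which are not shown. As a small sketch, assuming a shortened copy of the `DIAL_CODES` list (the variable names `country_code` and `low_codes` are illustrative), a comprehension can build a mapping from any iterable of pairs:

```python
# A shortened copy of the DIAL_CODES list, so the example is self-contained.
DIAL_CODES = [
    (86, 'China'),
    (91, 'India'),
    (1, 'United States'),
    (62, 'Indonesia'),
    (55, 'Brazil'),
]

# Swap each (code, country) pair so that the country becomes the key.
country_code = {country: code for code, country in DIAL_CODES}

# Comprehensions can also filter and transform while building the dict.
low_codes = {code: country.upper()
             for country, code in country_code.items()
             if code < 66}

print(country_code['India'])  # 91
print(low_codes)
```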
159+
{
160+
"cell_type": "markdown",
161+
"metadata": {},
162+
"source": [
163+
"## 3.3 Common mapping methods\n",
164+
"Examples of methods common to `dict`, `defaultdict`, and `OrderedDict`.\n",
165+
"\n",
166+
"> The latter two types are variations of `dict` and live in the `collections` module.\n",
167+
"\n",
168+
"`update(m, [**kargs])` handles its first argument `m` by duck typing: `m` can be a mapping or an iterable of key-value pairs. The method first checks whether `m` has a `keys()` method and, if so, treats it as a mapping; otherwise it iterates over `m`, assuming its items are `(key, value)` pairs.\n",
169+
"\n",
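170+
"The difference between `d[k]` and `d.get(k)`: if the key `k` is not in the dictionary, `d[k]` raises `KeyError`, whereas `d.get(k, default)` returns `default` (or `None` when no default is given)."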
171+
]
172+
},
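The duck typing in `update` and the difference between `d[k]` and `d.get(k)` can be seen in a few lines. This is an illustrative sketch; `KeysOnly` is a hypothetical class that is not a real mapping but provides just enough (`keys()` and `__getitem__`) for `update` to treat it as one.

```python
d = {'one': 1}

d.update({'two': 2})      # a mapping: it has a keys() method
d.update([('three', 3)])  # an iterable of (key, value) pairs
d.update(four=4)          # keyword arguments


class KeysOnly:
    """Hypothetical class: not a dict, but it has keys() and __getitem__."""

    def keys(self):
        return ['five']

    def __getitem__(self, key):
        return 5


d.update(KeysOnly())      # accepted because it has keys()

# d[k] raises KeyError for a missing key; d.get(k) does not.
try:
    d['six']
    lookup_raised = False
except KeyError:
    lookup_raised = True

print(d)
print(lookup_raised)      # True
print(d.get('six'))       # None
print(d.get('six', 0))    # 0
```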
173+
{
174+
"cell_type": "code",
175+
"execution_count": null,
176+
"metadata": {},
177+
"outputs": [],
178+
"source": [
179+
"# index0.py with slight modification\n",
180+
"\"\"\"Build an index mapping word -> list of occurrences\"\"\"\n",
181+
"\n",
182+
"import sys\n",
183+
"import re\n",
184+
"\n",
185+
"WORD_RE = re.compile(r'\\w+')\n",
186+
"\n",
187+
"index = {}\n",
188+
"with open(sys.argv[1], encoding='utf-8') as fp:\n",
189+
"    for line_no, line in enumerate(fp, 1):\n",
190+
"        for match in WORD_RE.finditer(line):\n",
191+
"            word = match.group()\n",
192+
"            column_no = match.start()+1\n",
193+
"            location = (line_no, column_no)\n",
194+
"            # this is ugly; coded like this to make a point\n",
195+
"            occurrences = index.get(word, [])  # <1>\n",
196+
"            occurrences.append(location)  # <2>\n",
197+
"            index[word] = occurrences  # <3>\n",
198+
"\n",
199+
"# print in alphabetical order\n",
200+
"for word in sorted(index, key=str.upper): # <4> \n",
201+
"    print(word, index[word])\n",
202+
" \n",
203+
"# <4> str.upper is not called here; a reference to the method is passed to sorted\n",
204+
"# so that the words are normalized to a uniform case while sorting"
205+
]
206+
},
207+
{
208+
"cell_type": "code",
209+
"execution_count": null,
210+
"metadata": {},
211+
"outputs": [],
212+
"source": [
213+
"import sys\n",
214+
"import re\n",
215+
"\n",
216+
"WORD_RE = re.compile(r'\\w+')\n",
217+
"\n",
218+
"index = {}\n",
219+
"with open(sys.argv[1], encoding='utf-8') as fp:\n",
220+
"    for line_no, line in enumerate(fp, 1):\n",
221+
"        for match in WORD_RE.finditer(line):\n",
222+
"            word = match.group()\n",
223+
"            column_no = match.start()+1\n",
224+
"            location = (line_no, column_no)\n",
225+
"            index.setdefault(word, []).append(location)  # <1> a single line and a single lookup of the key\n",
226+
"\n",
227+
"# print in alphabetical order\n",
228+
"for word in sorted(index, key=str.upper):\n",
229+
"    print(word, index[word])"
230+
]
231+
},
232+
{
233+
"cell_type": "markdown",
234+
"metadata": {},
235+
"source": [
236+
"## 3.4 Mappings with flexible key lookup (handling missing keys)\n",
237+
"- Use a `defaultdict`\n",
238+
"- Subclass `dict` and implement the `__missing__` method\n",
239+
"\n",
240+
"### 3.4.1 `defaultdict`: handling missing keys\n",
241+
"\n",
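242+
"> Specifically, when instantiating a `defaultdict`, you pass the constructor a callable. That callable is invoked whenever `__getitem__` is given a key that does not exist, so that `__getitem__` can return a default value.\n",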
243+
"\n",
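244+
"For example, given `dd = defaultdict(list)`, if the key `'new-key'` is not yet in `dd`, the expression `dd['new-key']` performs the following steps:\n",
245+
"\n",
246+
"1. Calls `list()` to create a new list.\n",
247+
"2. Inserts the new list into `dd`, using `'new-key'` as the key.\n",
248+
"3. Returns a reference to that same list.\n",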
249+
"\n",
250+
"> The callable that produces the default values is stored in an instance attribute named `default_factory`."
251+
]
252+
},
253+
{
254+
"cell_type": "code",
255+
"execution_count": null,
256+
"metadata": {},
257+
"outputs": [],
258+
"source": [
259+
"import sys\n",
260+
"import re\n",
261+
"import collections\n",
262+
"\n",
263+
"WORD_RE = re.compile(r'\\w+')\n",
264+
"\n",
265+
"index = collections.defaultdict(list)  # <1> the list constructor is the default_factory\n",
266+
"with open(sys.argv[1], encoding='utf-8') as fp:\n",
267+
"    for line_no, line in enumerate(fp, 1):\n",
268+
"        for match in WORD_RE.finditer(line):\n",
269+
"            word = match.group()\n",
270+
"            column_no = match.start()+1\n",
271+
"            location = (line_no, column_no)\n",
272+
"            index[word].append(location)  # <2> always succeeds: a missing key gets a new list\n",
273+
"\n",
274+
"# print in alphabetical order\n",
275+
"for word in sorted(index, key=str.upper):\n",
276+
"    print(word, index[word])"
277+
]
278+
},
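The three steps that `dd['new-key']` performs on a missing key can be observed in isolation. This is a minimal sketch; the variable names `value`, `stored`, and `same_object` are illustrative.

```python
from collections import defaultdict

dd = defaultdict(list)
assert 'new-key' not in dd

# Steps 1-3: list() is called, the new list is stored under 'new-key',
# and a reference to that same list is returned.
value = dd['new-key']
value.append(1)

stored = dd['new-key']       # the key exists now, so no new list is made
same_object = value is stored

print(same_object)           # True
print(dict(dd))              # {'new-key': [1]}
print(dd.default_factory)    # <class 'list'>
```

Because the returned list is the very object stored in the dictionary, appending to it is enough; there is no need to assign it back.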
279+
{
280+
"cell_type": "markdown",
281+
"metadata": {},
282+
"source": [
283+
"> If no `default_factory` is provided when the `defaultdict` is created, looking up a missing key raises the usual `KeyError`.\n",
284+
"\n",
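285+
"The `default_factory` is invoked only by `__getitem__`, never by the other methods. For example, when the key `k` is missing, `dd[k]` calls `default_factory` to create a default value, but `dd.get(k)` still returns `None` without calling it.\n",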
286+
"\n",
287+
"The mechanism behind all of this is the `__missing__` special method.\n",
288+
"\n",
289+
"### 3.4.2 The `__missing__` method\n"
290+
]
70291
}
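The notes on `default_factory` and `__missing__` can be illustrated with a short sketch. It is not taken from the notebook: `ZeroDict` is a hypothetical `dict` subclass whose `__missing__` returns `0` without storing anything, chosen only to show that `d[k]` consults `__missing__` while `get` and `in` do not.

```python
from collections import defaultdict

dd = defaultdict(list)
got = dd.get('absent')           # get() does not call default_factory
present_after_get = 'absent' in dd
created = dd['absent']           # __getitem__ does call it and stores the result

no_factory = defaultdict()       # default_factory is None
try:
    no_factory['absent']
    raised = False
except KeyError:
    raised = True


class ZeroDict(dict):
    """Hypothetical subclass: missing keys read as 0 and are not stored."""

    def __missing__(self, key):
        # Called by dict.__getitem__ when key is not found.
        return 0


z = ZeroDict(a=1)

print(got, present_after_get, created)  # None False []
print(raised)                           # True
print(z['a'], z['b'])                   # 1 0
print('b' in z, z.get('b'))             # False None
```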
71292
],
72293
"metadata": {
