Chap3. Dictionaries and Sets
The dict type is a fundamental part of Python’s implementation. Class and instance attributes, module namespaces, and function keyword arguments are some of the core Python constructs represented by dictionaries in memory. The __builtins__.__dict__ stores all built-in types, objects, and functions.
Because of their crucial role, Python dicts are highly optimized. Hash tables are the engines behind Python’s high-performance dicts. Other built-in types based on hash tables are set and frozenset.
0x01 Modern dict Syntax
Dict Comprehension
A dictcomp (dict comprehension) builds a dict instance by taking key:value pairs from any iterable.
dial_codes = [ ...
country_dial.items() -> [(k1, v1), (k2, v2), ...]
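As a minimal sketch of the dictcomp described above (the four pairs are illustrative sample data, not the original listing), the following builds a country-to-code dict from (code, country) pairs, then builds a second dict by filtering and transforming its items():

```python
# Illustrative sample data: (dial code, country) pairs.
dial_codes = [
    (880, 'Bangladesh'),
    (55, 'Brazil'),
    (86, 'China'),
    (91, 'India'),
]

# Swap each pair so that the country becomes the key.
country_dial = {country: code for code, country in dial_codes}
print(country_dial)

# items() yields (key, value) pairs, so another dictcomp can
# sort, filter, and transform them in one expression.
small_codes = {code: country.upper()
               for country, code in sorted(country_dial.items())
               if code < 70}
print(small_codes)
```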
Unpacking Mappings
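A ** prefix on a function parameter collects the keyword arguments into a dict; a ** prefix on an argument in a call unpacks a mapping into keyword arguments.
This works when keys are all strings and unique across all arguments. Otherwise the call raises one of these errors:
TypeError: keywords must be strings
TypeError: __main__.dump() got multiple values for keyword argument 'x'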
def dump(**kwargs): ...
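The listing above is cut off, so here is a hedged sketch of how such a function might look. The body of dump (simply returning kwargs) is an assumption, and the exact wording of the error messages can vary between Python versions:

```python
def dump(**kwargs):
    # ** on a parameter collects all keyword arguments into a dict.
    # Assumed body: return the collected dict unchanged.
    return kwargs

# ** on an argument unpacks a mapping into keyword arguments.
result = dump(**{'x': 1}, y=2, **{'z': 3})
print(result)

# Keys that are not strings are rejected.
try:
    dump(**{1: 'a'})
except TypeError as exc:
    non_string_error = str(exc)
    print('TypeError:', non_string_error)

# A key repeated across the arguments is rejected too.
try:
    dump(**{'x': 1}, **{'x': 2})
except TypeError as exc:
    duplicate_error = str(exc)
    print('TypeError:', duplicate_error)
```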
** can also be used inside a dict literal.
In this context duplicate keys are allowed: later occurrences overwrite previous ones.
print({'a': 0, **{'x': 1}, 'y': 2, **{'z': 3, 'x': 4}})
Merging Mappings with |
Python 3.9 and later support using | and |= to merge mappings. The | operator creates a new mapping, while |= updates the left operand in place.
d1 = {'a': 1, 'b': 3}
...
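A short sketch of both operators follows. Only d1 comes from the listing above; the second mapping d2 is an assumption made for illustration:

```python
d1 = {'a': 1, 'b': 3}
d2 = {'a': 2, 'b': 4, 'c': 6}  # assumed second mapping

# | builds a new dict; values from the right operand win on duplicate keys.
merged = d1 | d2
print(merged)

# The operands themselves are left untouched by |.
d1_unchanged = (d1 == {'a': 1, 'b': 3})

# |= updates the left operand in place.
d1 |= d2
print(d1)
```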
0x02 Pattern Matching with Mappings
Thanks to destructuring, pattern matching is a powerful tool to process records structured like nested mappings and sequences, which we often need to read from JSON APIs and databases with semi-structured schemas, like MongoDB, EdgeDB, or PostgreSQL.
def get_creators(record: dict) -> list: ...
- The order of the keys in the patterns is irrelevant, even if the subject is an OrderedDict such as b2.
- In contrast with sequence patterns, mapping patterns succeed on partial matches: a subject can match even when it has keys, such as title, that the case pattern does not mention.
- There is no need to use **extra to match extra key-value pairs. Because partial matches succeed, it is optional, but you can add it to capture the remaining pairs as a dict.
0x03 Standard API of Mapping Types
What Is Hashable
An object is hashable if it has a hash code which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash code.
- Numeric types and the flat immutable types str and bytes are all hashable.
- Container types are hashable if they are immutable and all contained objects are also hashable.
- A frozenset is always hashable because every element it contains must be hashable.
- User-defined types are hashable by default:
  - their hash code is derived from their id()
  - the __eq__() method inherited from the object class simply compares the object IDs
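These rules can be checked with a small experiment. The helper is_hashable and the class Point are hypothetical names introduced only for this sketch:

```python
def is_hashable(obj) -> bool:
    # Hypothetical helper: hash() raises TypeError for unhashable objects.
    try:
        hash(obj)
    except TypeError:
        return False
    return True

tt = (1, 2, (30, 40))              # tuple holding only hashable items
tl = (1, 2, [30, 40])              # tuple holding a list
tf = (1, 2, frozenset([30, 40]))   # tuple holding a frozenset
print(is_hashable(tt), is_hashable(tl), is_hashable(tf))

class Point:
    # Hypothetical user-defined type with no __eq__ or __hash__ of its own.
    def __init__(self, x, y):
        self.x = x
        self.y = y

p1, p2 = Point(1, 2), Point(1, 2)
# The inherited __eq__ compares identity, so equal-looking points differ.
print(p1 == p2, p1 == p1)
# Instances are hashable by default, so they can be set members.
seen = {p1}
print(p1 in seen, p2 in seen)
```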
Inserting or Updating Mutable Values
Dict access with d[k] raises KeyError when k is not an existing key.
d.get(k, default) avoids this: it returns the default value when the key is not found.
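A minimal comparison of the two access styles, using a hypothetical stock dict:

```python
stock = {'apple': 3}  # hypothetical sample data

# Bracket access raises KeyError for a missing key.
try:
    stock['pear']
except KeyError as exc:
    missing_key = exc.args[0]
    print('KeyError:', missing_key)

# get returns None, or the given default, instead of raising.
no_default = stock.get('pear')
with_default = stock.get('pear', 0)
print(no_default, with_default)

# get never inserts the key into the dict.
print('pear' in stock)
```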
import re
...
Output:
a [(19, 48), (20, 53)]
Although [(11, 1), (16, 1), (18, 1)]
ambiguity [(14, 16)]
and [(15, 23)]
are [(21, 12)]
aren [(10, 15)]
at [(16, 38)]
bad [(19, 50)]
be [(15, 14), (16, 27), (20, 50)] …
The code above records where each word occurs in the Zen of Python. Each tuple in a list is a (line number, column number) pair.
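As a hedged reconstruction of the kind of script that produces such an index, here is a version based on dict.get. The two-line sample text and the variable names are assumptions, standing in for the full Zen of Python:

```python
import re

WORD_RE = re.compile(r'\w+')

# Assumed sample input: two lines instead of the whole Zen of Python.
text = """Beautiful is better than ugly.
Explicit is better than implicit."""

index = {}
for line_no, line in enumerate(text.splitlines(), 1):
    for match in WORD_RE.finditer(line):
        word = match.group()
        column_no = match.start() + 1
        location = (line_no, column_no)
        occurrences = index.get(word, [])  # first search for the key
        occurrences.append(location)
        index[word] = occurrences          # second search for the key

# Print the words in case-insensitive alphabetical order.
for word in sorted(index, key=str.upper):
    print(word, index[word])
```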
Note that each loop iteration searches the dict twice. For updating a mutable value stored in a dict there is a more elegant idiom: setdefault.
import re
...
setdefault returns the value, so it can be updated without requiring a second search.
my_dict.setdefault(key, []).append(new_value)
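A sketch of a word index built with setdefault might look like this. The two-line sample text and the variable names are assumptions made for illustration:

```python
import re

WORD_RE = re.compile(r'\w+')

# Assumed sample input: two lines instead of a whole file.
text = """Beautiful is better than ugly.
Explicit is better than implicit."""

index = {}
for line_no, line in enumerate(text.splitlines(), 1):
    for match in WORD_RE.finditer(line):
        word = match.group()
        location = (line_no, match.start() + 1)
        # One search: fetch the list for word, inserting [] first if absent.
        index.setdefault(word, []).append(location)

for word in sorted(index, key=str.upper):
    print(word, index[word])

# setdefault returns the very object stored in the dict, not a copy.
stored = index.setdefault('ugly', [])
same_list = stored is index['ugly']
print(same_list)
```

The design point is that the list is created, stored, and returned in a single call, so appending to the returned value updates the dict entry directly.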
0x04 Automatic Handling of Missing Keys
Sometimes it is convenient to have mappings that return some made-up value when a missing key is searched.
There are two approaches to this:
- use defaultdict instead of a plain dict
- subclass dict and add a __missing__ method
defaultdict
A collections.defaultdict instance creates items with a default value on demand whenever a missing key is searched using d[k] syntax.
When instantiating a defaultdict, you provide a callable to produce a default value whenever __getitem__ is passed a nonexistent key argument. Given a defaultdict created as dd = defaultdict(list), if 'new-key' is not in dd, the expression dd['new-key'] does the following steps:
- Calls list() to create a new list.
- Inserts the list into dd using ‘new-key’ as key.
- Returns a reference to that list.
import collections
...
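The three steps can be observed directly in a short sketch. The key names are arbitrary, and the last lines illustrate that the default factory is triggered by d[k] access, not by get or the in operator:

```python
import collections

dd = collections.defaultdict(list)

# Missing key: list() is called, the new list is inserted, and returned.
value = dd['new-key']
value.append(1)
print(dd['new-key'])

# The returned reference is the same object that is stored in dd.
same_object = value is dd['new-key']
print(same_object)

# get and in do not call the default factory or insert anything.
got = dd.get('other-key')
print(got, 'other-key' in dd)
print(dict(dd))
```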
__missing__ method
If you subclass dict and provide a __missing__ method, the standard dict.__getitem__ will call it whenever a key is not found, instead of raising KeyError.
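As an illustration, the hypothetical subclass below (the name StrKeyDict and its fallback rule are assumptions for this sketch) retries a failed lookup with the key converted to str. It also shows that only d[k] access goes through __missing__:

```python
class StrKeyDict(dict):
    # Hypothetical mapping that falls back to str(key) on a failed lookup.
    def __missing__(self, key):
        if isinstance(key, str):
            # The str form was already tried: give up to avoid recursion.
            raise KeyError(key)
        return self[str(key)]

d = StrKeyDict({'2': 'two', '4': 'four'})

print(d['2'])   # found directly, __missing__ is not called
print(d[4])     # not found, so __missing__ retries with '4'

try:
    d[1]
except KeyError as exc:
    failed_key = exc.args[0]
    print('KeyError:', failed_key)

# dict.get and the in operator do not call __missing__.
print(d.get(4), 4 in d)
```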