Fluent Python (Part 3)
2023-07-08 14:39:36

Chapter 3. Dictionaries and Sets

The dict type is a fundamental part of Python's implementation. Class and instance attributes, module namespaces, and function keyword arguments are some of the core Python constructs represented by dictionaries in memory. The __builtins__.__dict__ mapping stores all built-in types, objects, and functions.

Because of their crucial role, Python dicts are highly optimized. Hash tables are the engines behind Python's high-performance dicts. Other built-in types based on hash tables are set and frozenset.

0x01 Modern dict Syntax

Dict Comprehension

A dictcomp (dict comprehension) builds a dict instance by taking key:value pairs from any iterable.

dial_codes = [
    (880, 'Bangladesh'),
    (55, 'Brazil'),
    (86, 'China'),
    (91, 'India'),
    (62, 'Indonesia'),
    (81, 'Japan'),
    (234, 'Nigeria'),
    (92, 'Pakistan'),
    (7, 'Russia'),
    (1, 'United States'),
]
country_dial = {country: code for code, country in dial_codes}
print(country_dial)
# {'Bangladesh': 880, 'Brazil': 55, 'China': 86, 'India': 91, 'Indonesia': 62, 'Japan': 81, 'Nigeria': 234, 'Pakistan': 92, 'Russia': 7, 'United States': 1}
print({code: country.upper() for country, code in sorted(country_dial.items()) if code < 70})
# {55: 'BRAZIL', 62: 'INDONESIA', 7: 'RUSSIA', 1: 'UNITED STATES'}

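country_dial.items() returns a view of the (key, value) pairs. Iterating over it yields [(k1, v1), (k2, v2), ...], so sorted(country_dial.items()) orders the pairs by key.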
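The following minimal sketch checks what items() really returns. It is a dict_items view, not a list, and it reflects changes made to the dict after the view was created.

```python
country_dial = {'Brazil': 55, 'China': 86}
items = country_dial.items()

print(type(items).__name__)  # dict_items
print(list(items))           # [('Brazil', 55), ('China', 86)]

country_dial['India'] = 91   # the view is live: no need to call items() again
print(list(items))           # [('Brazil', 55), ('China', 86), ('India', 91)]
```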

Unpacking Mappings

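In a function definition, a parameter prefixed with ** collects keyword arguments into a dict. In a call, an argument prefixed with ** unpacks a mapping into keyword arguments.

This works only when the keys are all strings and are unique across all arguments. Otherwise the call fails with errors such as:

TypeError: keywords must be strings

TypeError: __main__.dump() got multiple values for keyword argument 'x'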

def dump(**kwargs):
    print(kwargs)

dump(**{'x': 1}, y=2, **{'z': 3})  # same as dump(x=1, y=2, z=3)
# {'x': 1, 'y': 2, 'z': 3}
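This sketch triggers both failure modes described above. It assumes a variant of dump that returns kwargs instead of printing them, so the results are easy to check. The exact wording of the TypeError messages varies between Python versions, so the code records only which calls failed.

```python
def dump(**kwargs):
    # variant of dump() that returns instead of printing
    return kwargs

failed = []
for label, call in [
    ('non-string key', lambda: dump(**{1: 'one'})),           # key is an int
    ('duplicate key', lambda: dump(**{'x': 1}, **{'x': 2})),  # 'x' given twice
]:
    try:
        call()
    except TypeError as exc:
        failed.append(label)
        print(f'{label}: TypeError: {exc}')

print(dump(**{'x': 1}, y=2))  # {'x': 1, 'y': 2}
```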

** can also be used inside a dict literal, even more than once.

In this case duplicate keys are allowed: later occurrences overwrite previous ones.

print({'a': 0, **{'x': 1}, 'y': 2, **{'z': 3, 'x': 4}})
# {'a': 0, 'x': 4, 'y': 2, 'z': 3}

Merging Mappings with |

Since Python 3.9, | merges two mappings into a new one, and |= updates the left operand in place.

>>> d1 = {'a':1, 'b':3}
>>> d2 = {'a':2, 'b':4, 'c':6}
>>> d1 | d2
{'a': 2, 'b': 4, 'c': 6}
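Here is a minimal sketch that extends the session above to show the difference between the two operators (Python 3.9+). The | operator leaves d1 untouched, while |= modifies it. Like dict.update, |= also accepts an iterable of (key, value) pairs.

```python
d1 = {'a': 1, 'b': 3}
d2 = {'a': 2, 'b': 4, 'c': 6}

merged = d1 | d2     # builds a new dict; on key clashes the right operand wins
print(merged)        # {'a': 2, 'b': 4, 'c': 6}
print(d1)            # {'a': 1, 'b': 3}  (d1 is unchanged)

d1 |= d2             # updates d1 in place, like d1.update(d2)
print(d1)            # {'a': 2, 'b': 4, 'c': 6}

d1 |= [('d', 8)]     # |= also takes an iterable of (key, value) pairs
print(d1)            # {'a': 2, 'b': 4, 'c': 6, 'd': 8}
```

The plain | operator is stricter. Both operands must be dicts, so d1 | [('d', 8)] raises TypeError.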

0x02 Pattern Matching with Mappings

Thanks to destructuring, pattern matching is a powerful tool to process records structured like nested mappings and sequences, which we often need to read from JSON APIs and databases with semi-structured schemas, like MongoDB, EdgeDB, or PostgreSQL.

def get_creators(record: dict) -> list:
    match record:
        case {'type': 'book', 'api': 2, 'authors': [*names]}:
            return names
        case {'type': 'book', 'api': 1, 'author': name}:
            return [name]
        case {'type': 'book'}:
            raise ValueError(f"Invalid 'book' record: {record!r}")
        case {'type': 'movie', 'director': name}:
            return [name]
        case _:
            raise ValueError(f'Invalid record: {record!r}')

b1 = dict(api=1, author='Douglas Hofstadter', type='book', title='Gödel, Escher, Bach')
print(get_creators(b1))
# ['Douglas Hofstadter']

from collections import OrderedDict
b2 = OrderedDict(api=2, type='book', title='Python in a Nutshell', authors='Martelli Ravenscroft Holden'.split())
print(get_creators(b2))
# ['Martelli', 'Ravenscroft', 'Holden']
  • The order of the keys in the patterns is irrelevant, even if the subject is an OrderedDict such as b2.
  • In contrast with sequence patterns, mapping patterns succeed on partial matches. b1 and b2 both have a 'title' key that no case mentions, yet they still match.
  • Because partial matches succeed, there is no need to use **extra to match extra key-value pairs. You can still use it if you want those pairs collected in a dict.

0x03 Standard API of Mapping Types

What Is Hashable

An object is hashable if it has a hash code which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash code.

  • Numeric types and flat immutable types str and bytes are all hashable
  • Container types are hashable if they are immutable and all contained objects are also hashable.
  • A frozenset is always hashable because every element it contains must be hashable
  • User-defined types are hashable by default, because:
    • their hash code is derived from their id()
    • the __eq__() method inherited from the object class simply compares the object IDs
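The rules above can be checked with this minimal sketch. The is_hashable helper and the two classes are hypothetical names used only for illustration. One consequence is worth noting: a class that defines __eq__ without also defining __hash__ has its __hash__ set to None, so its instances stop being hashable.

```python
def is_hashable(obj) -> bool:
    """Hypothetical helper: True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable((1, 2, (30, 40))))             # True: tuple of hashables
print(is_hashable((1, 2, [30, 40])))             # False: contains a list
print(is_hashable((1, 2, frozenset([30, 40]))))  # True: frozenset is hashable

class Plain:
    pass

class EqOnly:
    def __eq__(self, other):
        return isinstance(other, EqOnly)

print(is_hashable(Plain()))   # True: inherits object.__hash__ and __eq__
print(is_hashable(EqOnly()))  # False: defining __eq__ sets __hash__ to None
```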

Inserting or Updating Mutable Values

Accessing a dict with d[k] raises KeyError when k is not an existing key.

d.get(k, default) avoids this problem by returning default when k is missing.

import re

WORD_RE = re.compile(r'\w+')
index = {}
with open('zen.txt', 'r') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            location = (line_no, column_no)
            # fetch the list (or a new empty one), append, then store it back
            occurrences = index.get(word, [])
            occurrences.append(location)
            index[word] = occurrences

# display words in alphabetical order, ignoring case
for word in sorted(index, key=str.upper):
    print(word, index[word])

Output:

a [(19, 48), (20, 53)]
Although [(11, 1), (16, 1), (18, 1)]
ambiguity [(14, 16)]
and [(15, 23)]
are [(21, 12)]
aren [(10, 15)]
at [(16, 38)]
bad [(19, 50)]
be [(15, 14), (16, 27), (20, 50)] …

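The code above records where each word appears in the Zen of Python (zen.txt). Each tuple in a word's list is a (line number, column number) pair.

Each iteration searches the dict twice: once in index.get and again in the assignment index[word] = occurrences. setdefault offers a more elegant way to update mutable values stored in a dict: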

import re

WORD_RE = re.compile(r'\w+')
index = {}
with open('zen.txt', 'r') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            location = (line_no, column_no)
            # get the list for word, creating it if missing, and append in one step
            index.setdefault(word, []).append(location)

for word in sorted(index, key=str.upper):
    print(word, index[word])
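The script above assumes that a zen.txt file exists. The minimal sketch below is a self-contained variant you can run without that file. It wraps the same setdefault logic in a hypothetical build_index function and feeds it the first two lines of the Zen of Python through io.StringIO, which iterates line by line just like a text file.

```python
import io
import re

WORD_RE = re.compile(r'\w+')

def build_index(fp) -> dict:
    """Hypothetical helper: map each word to its (line_no, column_no) list."""
    index = {}
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            location = (line_no, match.start() + 1)
            # single lookup: fetch (or create) the list, then append to it
            index.setdefault(match.group(), []).append(location)
    return index

sample = io.StringIO('Beautiful is better than ugly.\n'
                     'Explicit is better than implicit.\n')
index = build_index(sample)
for word in sorted(index, key=str.upper):
    print(word, index[word])
```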

setdefault returns the value, so it can be updated without requiring a second search.

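my_dict.setdefault(key, []).append(new_value)

is equivalent to:

if key not in my_dict:
    my_dict[key] = []
my_dict[key].append(new_value)

The only difference is that the second version searches for key at least twice (three times if it is missing), while setdefault does all the work with a single lookup.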

0x04 Automatic Handling of Missing Keys

Sometimes it is convenient to have mappings that return some made-up value when a missing key is searched.

There are two approaches to this:

  • use defaultdict instead of a plain dict
  • subclass dict and add a __missing__ method

defaultdict

A collections.defaultdict instance creates items with a default value on demand whenever a missing key is searched using d[k] syntax.

When instantiating a defaultdict, you provide a callable that produces a default value whenever __getitem__ is passed a nonexistent key argument.

Given a defaultdict created as dd = defaultdict(list), if 'new-key' is not in dd, the expression dd['new-key'] does the following steps:

  1. Calls list() to create a new list.
  2. Inserts the list into dd using 'new-key' as key.
  3. Returns a reference to that list.
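The three steps can be observed directly in this minimal sketch. It also shows a limitation: default_factory is invoked only by d[k] (that is, __getitem__). Calling dd.get(k) or testing k in dd with a missing key does not create an entry.

```python
from collections import defaultdict

dd = defaultdict(list)

value = dd['new-key']          # calls list(), stores it, returns it
print(value)                   # []
print('new-key' in dd)         # True: the key was inserted
print(dd['new-key'] is value)  # True: the very same list object

print(dd.get('other-key'))     # None: get() does not use default_factory
print('other-key' in dd)       # False: no entry was created
print(dd.default_factory)      # <class 'list'>
```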
import collections
import re

WORD_RE = re.compile(r'\w+')
index = collections.defaultdict(list)  # list is the default_factory
with open('zen.txt', 'r') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            location = (line_no, column_no)
            # a missing word gets a fresh empty list, which is then appended to
            index[word].append(location)

for word in sorted(index, key=str.upper):
    print(word, index[word])

__missing__ method

If you subclass dict and provide a __missing__ method, the standard dict.__getitem__ will call it whenever a key is not found, instead of raising KeyError.
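Here is a minimal sketch of such a subclass. StrKeyDict is a hypothetical name. For lookups with non-string keys, it falls back to str(key), so d[2] finds the entry stored under '2'. The base class does not route dict.get or the in operator through __missing__, so the sketch overrides get and __contains__ to keep them consistent with d[k].

```python
class StrKeyDict(dict):
    """Hypothetical dict that retries missing non-str keys as str(key)."""

    def __missing__(self, key):
        if isinstance(key, str):  # already a str: give up, avoiding recursion
            raise KeyError(key)
        return self[str(key)]     # retry the lookup with the str form

    def get(self, key, default=None):
        # dict.get bypasses __missing__, so delegate to __getitem__ instead
        try:
            return self[key]
        except KeyError:
            return default

    def __contains__(self, key):
        # search the keys view directly to avoid recursive calls
        return key in self.keys() or str(key) in self.keys()


d = StrKeyDict([('2', 'two'), ('4', 'four')])
print(d['2'])            # two
print(d[4])              # four: found via str(4)
print(d.get(1, 'N/A'))   # N/A
print(2 in d, 1 in d)    # True False
```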