Fluent Python-3

2023-07-08 14:39:36 #python #code

Chap3. Dictionaries and Sets

The dict type is a fundamental part of Python’s implementation. Class and instance attributes, module namespaces, and function keyword arguments are some of the core Python constructs represented by dictionaries in memory. The __builtins__.__dict__ stores all built-in types, objects, and functions.

Because of their crucial role, Python dicts are highly optimized. Hash tables are the engines behind Python’s high-performance dicts. Other built-in types based on hash tables are set and frozenset.

0x01 Modern dict Syntax

Dict Comprehension

A dictcomp (dict comprehension) builds a dict instance by taking key:value pairs from any iterable.

dial_codes = [
    (880, 'Bangladesh'),
    (55, 'Brazil'),
    (86, 'China'),
    (91, 'India'),
    (62, 'Indonesia'),
    (81, 'Japan'),
    (234, 'Nigeria'),
    (92, 'Pakistan'),
    (7, 'Russia'),
    (1, 'United States'),
]
country_dial = {country: code for code, country in dial_codes}
print(country_dial)
# {'Bangladesh': 880, 'Brazil': 55, 'China': 86, 'India': 91, 'Indonesia': 62, 'Japan': 81, 'Nigeria': 234, 'Pakistan': 92, 'Russia': 7, 'United States': 1}
print({code: country.upper() for country, code in sorted(country_dial.items()) if code < 70})
# {55: 'BRAZIL', 62: 'INDONESIA', 7: 'RUSSIA', 1: 'UNITED STATES'}

country_dial.items() returns a view of the dict's (key, value) pairs: (k1, v1), (k2, v2), …

Unpacking Mappings

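A ** prefix on a function parameter collects the keyword arguments into a dict; a ** prefix on an argument in a call unpacks a mapping into keyword arguments.

This works when all keys are strings and unique across all arguments. Otherwise the call raises one of these errors:

TypeError: keywords must be strings

TypeError: __main__.dump() got multiple values for keyword argument 'x'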

def dump(**kwargs):
    print(kwargs)

dump(**{'x': 1}, y=2, **{'z': 3}) # dump(x=1, y=2, z=3)
# {'x': 1, 'y': 2, 'z': 3}
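The two failure cases can be reproduced with a short sketch. It assumes a variant of dump that returns its kwargs instead of printing them, so the results are easy to inspect; the variable names are only illustrative.

```python
def dump(**kwargs):
    """Return the keyword arguments collected into a dict."""
    return kwargs

# Keys from several mappings and explicit keywords are merged.
merged = dump(**{'x': 1}, y=2, **{'z': 3})

# A non-string key cannot become a keyword argument.
try:
    dump(**{1: 'one'})
except TypeError as exc:
    non_string_error = str(exc)

# The same key arriving from two mappings is rejected.
try:
    dump(**{'x': 1}, **{'x': 2})
except TypeError as exc:
    duplicate_error = str(exc)

print(merged)
print(non_string_error)
print(duplicate_error)
```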

** can also be used inside a dict literal.

Duplicate keys are allowed in this case; later occurrences overwrite previous ones.

print({'a': 0, **{'x': 1}, 'y': 2, **{'z': 3, 'x': 4}})
# {'a': 0, 'x': 4, 'y': 2, 'z': 3}

Merging Mappings with `|`

Python 3.9 and later support using | and |= to merge mappings. The | operator creates a new mapping, while |= updates the left operand in place.

>>> d1 = {'a':1, 'b':3}
>>> d2 = {'a':2, 'b':4, 'c':6}
>>> d1 | d2
{'a': 2, 'b': 4, 'c': 6}
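A minimal sketch of how operand order and the in-place form behave, reusing d1 and d2 from the session above (requires Python 3.9 or later). The other variable names are only for illustration.

```python
d1 = {'a': 1, 'b': 3}
d2 = {'a': 2, 'b': 4, 'c': 6}

# | builds a new dict; on duplicate keys the right operand wins.
left_then_right = d1 | d2
right_then_left = d2 | d1

# Neither operand is modified by |.
d1_unchanged = d1 == {'a': 1, 'b': 3}

# |= updates the left operand in place.
d1 |= d2

print(left_then_right)
print(right_then_left)
print(d1)
```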

0x02 Pattern Matching with Mappings

Thanks to destructuring, pattern matching is a powerful tool to process records structured like nested mappings and sequences, which we often need to read from JSON APIs and databases with semi-structured schemas, like MongoDB, EdgeDB, or PostgreSQL.

def get_creators(record: dict) -> list:
    match record:
        case {'type': 'book', 'api': 2, 'authors': [*names]}:
            return names
        case {'type': 'book', 'api': 1, 'author': name}:
            return [name]
        case {'type': 'book'}:
            raise ValueError(f"Invalid 'book' record: {record!r}")
        case {'type': 'movie', 'director': name}:
            return [name]
        case _:
            raise ValueError(f'Invalid record: {record!r}')
            
b1 = dict(api=1, author='Douglas Hofstadter', type='book', title='Gödel, Escher, Bach')
print(get_creators(b1))
# ['Douglas Hofstadter']

from collections import OrderedDict
b2 = OrderedDict(api=2, type='book', title='Python in a Nutshell', authors='Martelli Ravenscroft Holden'.split())
print(get_creators(b2))
# ['Martelli', 'Ravenscroft', 'Holden']

The order of the keys in the patterns is irrelevant, even if the subject is an OrderedDict such as b2.
In contrast with sequence patterns, mapping patterns succeed on partial matches: b1 and b2 both contain a 'title' key that appears in none of the patterns, yet they still match.
Because partial matches succeed, there is no need to use **extra to match extra key-value pairs, although you can add it to capture them as a dict.

0x03 Standard API of Mapping Types

What Is Hashable

An object is hashable if it has a hash code which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash code.

Numeric types and the flat immutable types str and bytes are all hashable.
Container types are hashable if they are immutable and all contained objects are also hashable.
A frozenset is always hashable because every element it contains must be hashable.
User-defined types are hashable by default:
- their hash code is derived from their id()
- the __eq__() method inherited from the object class simply compares the object IDs
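These rules can be checked with hash(). The sketch below uses a hypothetical helper is_hashable and two illustrative classes; it also shows a related rule of the Python data model: a class that defines __eq__ without __hash__ gets __hash__ set to None and stops being hashable.

```python
def is_hashable(obj) -> bool:
    """Report whether hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

class Plain:
    """Inherits __hash__ and __eq__ from object."""

class EqOnly:
    """Defines __eq__ only, so Python sets __hash__ to None."""
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, EqOnly) and self.value == other.value

results = {
    'tuple of tuples': is_hashable((1, 2, (30, 40))),
    'tuple holding a list': is_hashable((1, 2, [30, 40])),
    'tuple holding a frozenset': is_hashable((1, 2, frozenset([30, 40]))),
    'plain instance': is_hashable(Plain()),
    'eq-only instance': is_hashable(EqOnly(1)),
}
for label, ok in results.items():
    print(label, ok)
```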

Inserting or Updating Mutable Values

dict access with d[k] raises KeyError when k is not an existing key.

d.get(k, default) avoids this problem: it returns default when the key is not found.

import re

WORD_RE = re.compile(r'\w+')
index = {}
with open('zen.txt', 'r') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            location = (line_no, column_no)
            occurrences = index.get(word, [])
            occurrences.append(location)
            index[word] = occurrences

for word in sorted(index, key=str.upper):
    print(word, index[word])

Output:

a [(19, 48), (20, 53)]
Although [(11, 1), (16, 1), (18, 1)]
ambiguity [(14, 16)]
and [(15, 23)]
are [(21, 12)]
aren [(10, 15)]
at [(16, 38)]
bad [(19, 50)]
be [(15, 14), (16, 27), (20, 50)] …

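The code above records where each word occurs in the Zen of Python; every tuple in a list is a (line number, column number) pair.

Note that each iteration searches the dict twice: once in index.get(word, []) and once in the assignment index[word] = occurrences. For updating mutable values in a dict there is a more elegant option: setdefault.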

import re

WORD_RE = re.compile(r'\w+')
index = {}
with open('zen.txt', 'r') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            location = (line_no, column_no)
            index.setdefault(word, []).append(location)

for word in sorted(index, key=str.upper):
    print(word, index[word])

setdefault returns the value, so it can be updated without requiring a second search.

my_dict.setdefault(key, []).append(new_value)

is equivalent to

if key not in my_dict:
    my_dict[key] = []
my_dict[key].append(new_value)
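The difference between get and setdefault shows up on a small dict. This is a minimal sketch; the key 'python' and the locations are made-up sample data.

```python
index = {}

# get returns the default but does not store it in the dict.
occurrences = index.get('python', [])
occurrences.append((1, 1))
after_get = dict(index)  # snapshot: the dict is still empty

# setdefault stores the default when the key is absent, then returns
# the stored value, so append mutates the list held by the dict.
index.setdefault('python', []).append((1, 1))
index.setdefault('python', []).append((3, 5))

print(after_get)
print(index)
```

This is why the get version needs the extra assignment index[word] = occurrences, while the setdefault version does not.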

0x04 Automatic Handling of Missing Keys

Sometimes it is convenient to have mappings that return some made-up value when a missing key is searched.

There are two approaches to this:

Use defaultdict instead of a plain dict.
Subclass dict and add a __missing__ method.

defaultdict

A collections.defaultdict instance creates items with a default value on demand whenever a missing key is searched using d[k] syntax.

When instantiating a defaultdict, you provide a callable to produce a default value whenever __getitem__ is passed a nonexistent key argument.

Given a defaultdict created as dd = defaultdict(list), if 'new-key' is not in dd, the expression dd['new-key'] does the following steps:

Calls list() to create a new list.

Inserts the list into dd using 'new-key' as key.

Returns a reference to that list.

import collections
import re

WORD_RE = re.compile(r'\w+')
index = collections.defaultdict(list)
with open('zen.txt', 'r') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            location = (line_no, column_no)
            index[word].append(location)

for word in sorted(index, key=str.upper):
    print(word, index[word])
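The three steps listed earlier can be observed on a tiny defaultdict. This sketch also shows that default_factory is only invoked by d[k] lookups; dd.get(k) and k in dd leave the mapping untouched. Variable names are only illustrative.

```python
from collections import defaultdict

dd = defaultdict(list)

# Neither get nor `in` calls default_factory or inserts anything.
got = dd.get('new-key')
present = 'new-key' in dd
size_before = len(dd)

# dd[k] calls list(), inserts the new list under k, and returns it.
value = dd['new-key']
value.append(1)

print(got, present, size_before)
print(dict(dd))
```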

`__missing__` method

If you subclass dict and provide a __missing__ method, the standard dict.__getitem__ will call it whenever a key is not found, instead of raising KeyError.
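A minimal sketch of this hook, assuming a mapping that should accept non-string keys by falling back to their str() form. The class name StrKeyDict and the sample data are illustrative only.

```python
class StrKeyDict(dict):
    """Look up missing non-string keys by their str() form."""

    def __missing__(self, key):
        # A missing str key is really missing: avoid infinite recursion.
        if isinstance(key, str):
            raise KeyError(key)
        # Retry with the string form; this goes through __getitem__ again.
        return self[str(key)]

d = StrKeyDict([('2', 'two'), ('4', 'four')])

by_str = d['2']      # found directly, __missing__ is not called
by_int = d[2]        # not found, so __missing__ retries with '2'
via_get = d.get(2)   # dict.get does not call __missing__

try:
    d[1]
    missing_raises = False
except KeyError:
    missing_raises = True

print(by_str, by_int, via_get, missing_raises)
```

Only d[k] triggers __missing__ here; making get and in consistent with it would require overriding those methods as well.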