
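Dataclasses were added in Python 3.7 as a concise way to define classes that mainly store data. This article covers:
(1) the basic definition and features of dataclass
(2) sorting dataclasses with a priority queue
(3) configuring dataclass fields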

1. dataclasses.dataclass (defining data classes)

1.1 dataclass vs. class

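A dataclass is similar to a regular Python class, but it automatically generates the basic methods for instantiation, comparison, and printing. The signature of the dataclass decorator (as of Python 3.7; Python 3.10 added match_args, kw_only and slots) is:

dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False)
 - init: if true, an __init__() method is generated.
 - repr: if true, a __repr__() method is generated.
 - eq: if true, an __eq__() method is generated; it compares instances as tuples of their fields, in definition order.
 - order: if true, __lt__(), __le__(), __gt__() and __ge__() methods are generated.
 - unsafe_hash: if false (the default), __hash__() is generated according to how eq and frozen are set; if true, a __hash__() method is generated even for a mutable class.
 - frozen: if true, assigning to fields raises an exception.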
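The interplay between eq, frozen and unsafe_hash is easy to miss. The following sketch (the class names Mutable, Frozen and Forced are hypothetical) shows that with the defaults eq=True, frozen=False the decorator sets __hash__ to None, so instances are unhashable, while frozen=True or unsafe_hash=True produce a field-based __hash__().

```python
from dataclasses import dataclass

@dataclass                    # defaults: eq=True, frozen=False
class Mutable:
    x: int

@dataclass(frozen=True)       # eq=True and frozen=True
class Frozen:
    x: int

@dataclass(unsafe_hash=True)  # force a __hash__() on a mutable class
class Forced:
    x: int

# __hash__ is set to None, so hash(Mutable(1)) would raise TypeError
print(Mutable.__hash__)                      # None
# The generated __hash__() hashes the tuple of fields, so equal objects hash equally
print(hash(Frozen(1)) == hash(Frozen(1)))    # True
print(hash(Forced(1)) == hash(Forced(1)))    # True
```

unsafe_hash=True is called "unsafe" because changing a field after the object has been put in a set or used as a dict key changes its hash.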

First, here is how instantiation, comparison, and printing look with a regular class:

class Employee:
    def __init__(self, name, age, city):
        self.name = name
        self.age = age
        self.city = city

    def __repr__(self):
        return f'employee name:{self.name}, age:{self.age}, city:{self.city}'

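    def __eq__(self, other):
        # Only compare against other Employee objects
        if not isinstance(other, Employee):
            return NotImplemented
        return (self.name, self.age, self.city) == (other.name, other.age, other.city)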


e1 = Employee('zoey', 18, 'patna')
e2 = Employee('mike', 20, 'delhi')
e3 = Employee('zoey', 18, 'patna')

print('employee information:')
print(e1)
print(e2)
print(f'e1 and e3 same? {e1 == e3}')
print(f'e1 and e2 same? {e1 == e2}')
employee information:
employee name:zoey, age:18, city:patna
employee name:mike, age:20, city:delhi
e1 and e3 same? True
e1 and e2 same? False

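The __init__ method initializes the object, __repr__ returns its printable representation, and __eq__ checks whether two objects have equal contents. The main drawback is that each of these methods repeats the attribute list by hand. That is manageable for a few fields, but becomes tedious and error-prone as the number of fields grows, which is the problem dataclass solves: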

from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    age: int
    city: str

e1 = Employee('zoey', 18, 'patna')
e2 = Employee('mike', 20, 'delhi')
e3 = Employee('zoey', 18, 'patna')

print('employee information:')
print(e1)
print(e2)
print(f'e1 and e3 same? {e1 == e3}')
print(f'e1 and e2 same? {e1 == e2}')
employee information:
Employee(name='zoey', age=18, city='patna')
Employee(name='mike', age=20, city='delhi')
e1 and e3 same? True
e1 and e2 same? False

With the same content, the dataclass version needs no hand-written __init__, __repr__ or __eq__.
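One detail worth knowing: the decorator discovers fields from type annotations, so a class attribute without an annotation is not a field. A small check, where the attribute company is a hypothetical addition:

```python
import inspect
from dataclasses import dataclass, fields

@dataclass
class Employee:
    name: str
    age: int
    city: str
    company = 'acme'   # hypothetical attribute with no type annotation: NOT a field

# Only annotated class variables become fields
print([f.name for f in fields(Employee)])              # ['name', 'age', 'city']
# The generated __init__() takes exactly those fields as parameters
print(list(inspect.signature(Employee).parameters))    # ['name', 'age', 'city']
```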

1.2 Creating immutable data objects

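By default, the fields of a dataclass instance can be modified after creation. To make the data object immutable, set frozen=True; assigning to a field then raises an error: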

from dataclasses import dataclass

@dataclass(frozen=True)
class Employee:
    name: str
    age: int
    city: str


e1 = Employee('zoey', 18, 'patna')
e1.name = 'mike'
dataclasses.FrozenInstanceError: cannot assign to field 'name'
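Because a frozen instance cannot be changed in place, the usual way to get a modified copy is dataclasses.replace(), which builds a new object. Frozen instances with eq=True also get a generated __hash__(), so they can be put in sets. A sketch reusing the Employee fields above:

```python
from dataclasses import dataclass, replace, FrozenInstanceError

@dataclass(frozen=True)
class Employee:
    name: str
    age: int
    city: str

e1 = Employee('zoey', 18, 'patna')
blocked = False
try:
    e1.name = 'mike'
except FrozenInstanceError:
    blocked = True              # direct assignment is rejected

e2 = replace(e1, name='mike')   # new object with one field changed; e1 is untouched
print(e1)   # Employee(name='zoey', age=18, city='patna')
print(e2)   # Employee(name='mike', age=18, city='patna')

# Equal frozen instances hash equally, so the set keeps only two distinct objects
print(len({e1, e2, Employee('zoey', 18, 'patna')}))   # 2
```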

1.3 Dataclass inheritance

Like a regular class, a dataclass inherits all of its parent's fields; the subclass's own fields are added after them:

from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class Staff:
    name: str
    age: int
    city: str

@dataclass
class Employee(Staff):
    salary: int

e1 = Employee('zoey', 18, 'patna', 20000)
print(e1)
Employee(name='zoey', age=18, city='patna', salary=20000)
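Since inherited fields come first in the generated __init__(), a parent field with a default followed by a child field without one is rejected with a TypeError when the subclass is created. A sketch of the pitfall and one workaround (giving the child field a default); the default city='patna' and the Manager class are hypothetical. On Python 3.10+, declaring the subclass with kw_only=True is another option.

```python
from dataclasses import dataclass

@dataclass
class Staff:
    name: str
    age: int
    city: str = 'patna'   # parent field with a default

rejected = False
try:
    @dataclass
    class Employee(Staff):
        salary: int       # no default, but it would follow city in __init__
except TypeError as exc:
    rejected = True
    print('TypeError:', exc)

@dataclass
class Manager(Staff):
    salary: int = 0       # a default keeps the parameter order valid

print(Manager('zoey', 18))   # Manager(name='zoey', age=18, city='patna', salary=0)
```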

1.4 Custom initialization

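If a field's value depends on other fields, compute it in the __post_init__ method, and use field(init=False) so that it is not a parameter of __init__. field is covered in more detail later.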

from dataclasses import dataclass, field

@dataclass
class Employee:
    name: str
    age: int
    city: str
    adult: bool = field(init=False)

    def __post_init__(self):
        # adult is True when age is between 18 and 70 inclusive
        self.adult = 18 <= self.age <= 70

e1 = Employee('zoey', 18, 'patna')
print(e1)
Employee(name='zoey', age=18, city='patna', adult=True)

adult is derived from age, but __post_init__ runs only once, during initialization, so if age is changed afterwards, adult is not updated:

e1 = Employee('zoey', 18, 'patna')
print(e1)
e1.age = 8
print(e1)
Employee(name='zoey', age=18, city='patna', adult=True)
Employee(name='zoey', age=8, city='patna', adult=True)

After age is changed to 8, adult is still True.
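If a derived value must always follow the current fields, one alternative to __post_init__ is a read-only property computed on access. A sketch under that assumption; note that a property is not a field, so it no longer appears in the generated __repr__().

```python
from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    age: int
    city: str

    @property
    def adult(self) -> bool:
        # Recomputed on every access, so it always reflects the current age
        return 18 <= self.age <= 70

e1 = Employee('zoey', 18, 'patna')
print(e1.adult)   # True
e1.age = 8
print(e1.adult)   # False
print(e1)         # Employee(name='zoey', age=8, city='patna')
```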

1.5 Custom ordering of data objects

Python's rich comparison methods, which any class can implement, are:

  • object.__lt__(self, other): x<y
  • object.__le__(self, other): x<=y
  • object.__eq__(self, other): x==y
  • object.__ne__(self, other): x!=y
  • object.__gt__(self, other): x>y
  • object.__ge__(self, other): x>=y

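To sort data objects, you can use a priority queue. Python offers two implementations, queue.PriorityQueue and heapq; queue.PriorityQueue is itself built on heapq. heapq implements a binary heap (the structure behind heapsort), and its push and pop functions do not accept a custom comparison function. You can still customize the ordering by overriding the dataclass's __lt__(self, other) method, which corresponds to x<y: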

from dataclasses import dataclass, field
from queue import PriorityQueue

@dataclass
class Employee:
    name: str = field(compare=False)
    age: int
    city: str = field(compare=False)
    work: int

    def __lt__(self, other):
        # Smaller age comes first; for equal ages, larger work comes first
        if self.age != other.age:
            return self.age < other.age
        return self.work > other.work

e1 = Employee('zoey', 18, 'patna', 20)
e2 = Employee('joe', 19, 'patna', 21)
e3 = Employee('mike', 19, 'deli', 20)
e4 = Employee('judy', 17, 'india', 22)

q = PriorityQueue()
q.put(e1)
q.put(e2)
q.put(e3)
q.put(e4)

while not q.empty():
    next_item = q.get()
    print(next_item)
Employee(name='judy', age=17, city='india', work=22)
Employee(name='zoey', age=18, city='patna', work=20)
Employee(name='joe', age=19, city='patna', work=21)
Employee(name='mike', age=19, city='deli', work=20)

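By overriding __lt__(self, other), a smaller age gets higher priority and, among employees of the same age, a larger work value gets higher priority. Note that the work comparison is self.work > other.work, which is what puts larger work values first. When you define your own comparison method, do not set order=True: dataclass raises a TypeError if the class already defines __lt__ and order=True is set. This is different from the compare parameter of field, introduced later.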
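Because queue.PriorityQueue uses heapq internally, the same __lt__ also works with heapq directly, and with sorted(), which likewise only needs <. A sketch with the four employees from the example above:

```python
import heapq
from dataclasses import dataclass, field

@dataclass
class Employee:
    name: str = field(compare=False)
    age: int
    city: str = field(compare=False)
    work: int

    def __lt__(self, other):
        # Smaller age first; for equal ages, larger work first
        if self.age != other.age:
            return self.age < other.age
        return self.work > other.work

staff = [
    Employee('zoey', 18, 'patna', 20),
    Employee('joe', 19, 'patna', 21),
    Employee('mike', 19, 'deli', 20),
    Employee('judy', 17, 'india', 22),
]

heap = []
for e in staff:
    heapq.heappush(heap, e)          # heapq compares items with <, i.e. __lt__
popped = [heapq.heappop(heap).name for _ in range(len(heap))]
print(popped)                            # ['judy', 'zoey', 'joe', 'mike']
print([e.name for e in sorted(staff)])   # same order
```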

2. dataclasses.astuple and asdict (converting data classes to tuples and dicts)

The dataclasses module also provides astuple() and asdict(), which convert a dataclass instance into a tuple and a dict:

from dataclasses import dataclass, astuple, asdict


@dataclass(unsafe_hash=True)
class Employee:
    name: str
    age: int
    city: str

e1 = Employee('zoey', 18, 'patna')

print(astuple(e1))
print(asdict(e1))
('zoey', 18, 'patna')
{'name': 'zoey', 'age': 18, 'city': 'patna'}
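asdict() and astuple() also recurse into nested dataclasses, lists, tuples and dicts, and they build new containers rather than sharing the originals. A sketch with a hypothetical Address class:

```python
from dataclasses import dataclass, asdict, astuple

@dataclass
class Address:            # hypothetical nested dataclass
    city: str
    country: str

@dataclass
class Employee:
    name: str
    address: Address
    skills: list

e1 = Employee('zoey', Address('patna', 'india'), ['python'])
d = asdict(e1)
print(d)            # {'name': 'zoey', 'address': {'city': 'patna', 'country': 'india'}, 'skills': ['python']}
print(astuple(e1))  # ('zoey', ('patna', 'india'), ['python'])

d['skills'].append('sql')   # modifies the copy only
print(e1.skills)            # ['python']
```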

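3. dataclasses.field (configuring data class fields)

dataclasses.field() customizes an individual field of a dataclass; the resulting Field objects describe each defined field. Its signature (Python 3.10 added kw_only) is: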

dataclasses.field(*, default=MISSING, default_factory=MISSING, repr=True, hash=None, init=True, compare=True, metadata=None)

(1) Parameter 1: default, the field's default value

from dataclasses import dataclass, field

@dataclass
class Employee:
    name: str
    age: int
    city: str
    work: str = field(default='china')

e1 = Employee('zoey', 18, 'patna')
print(e1)
Employee(name='zoey', age=18, city='patna', work='china')

The work field defaults to 'china'.

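(2) Parameter 2: default_factory, a zero-argument callable that is called to produce the field's initial value (default and default_factory cannot both be given)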

from dataclasses import dataclass, field

def get_work():
    return 'china'

@dataclass
class Employee:
    name: str
    age: int
    city: str = field(default='patna')
    work: str = field(default_factory=get_work)

e1 = Employee('zoey', 18)
print(e1)
Employee(name='zoey', age=18, city='patna', work='china')

The work field is given the function get_work, which returns 'china'.
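The main reason default_factory exists is mutable defaults: a list, dict or set given as default= would be shared by all instances, so the decorator rejects it with a ValueError (Python 3.11+ rejects any unhashable default). A sketch with a hypothetical Team class:

```python
from dataclasses import dataclass, field

rejected = False
try:
    @dataclass
    class Team:
        members: list = []   # a shared mutable default is refused at class creation
except ValueError as exc:
    rejected = True
    print('ValueError:', exc)

@dataclass
class Team:
    members: list = field(default_factory=list)  # list() is called for each new instance

t1, t2 = Team(), Team()
t1.members.append('zoey')
print(t1.members, t2.members)   # ['zoey'] []
```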

(3) Parameter 3: init. If true (the default), the field is included as a parameter of the generated __init__().

from dataclasses import dataclass, field

@dataclass
class Employee:
    name: str
    age: int
    city: str
    work: str = field(init=False, default='china')

e1 = Employee('zoey', 18, 'patna')
print(e1)
Employee(name='zoey', age=18, city='patna', work='china')

Because work has init=False, it cannot be passed when creating e1; doing so raises an error. Now change the field to init=True:

work: str = field(init=True, default='china')

e1 = Employee('zoey', 18, 'patna', 'korea')
print(e1)
Employee(name='zoey', age=18, city='patna', work='korea')

With init=True, the argument can be passed and its value is kept.
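To confirm the init=False behavior described above: passing work fails with a TypeError because the generated __init__() has no such parameter, but the attribute can still be set once the object exists. A short sketch:

```python
from dataclasses import dataclass, field

@dataclass
class Employee:
    name: str
    age: int
    city: str
    work: str = field(init=False, default='china')

rejected = False
try:
    Employee('zoey', 18, 'patna', 'korea')   # work is not an __init__ parameter
except TypeError as exc:
    rejected = True
    print('TypeError:', exc)

e1 = Employee('zoey', 18, 'patna')
print(e1.work)     # china (filled in from the default)
e1.work = 'korea'  # assignment after creation still works
print(e1)          # Employee(name='zoey', age=18, city='patna', work='korea')
```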

(4) Parameter 4: repr. If true (the default), the field is included in the string returned by the generated __repr__().

from dataclasses import dataclass, field

@dataclass
class Employee:
    name: str
    age: int
    city: str
    work: str = field(init=False, default='china', repr=False)

e1 = Employee('zoey', 18, 'patna')
print(e1)
Employee(name='zoey', age=18, city='patna')

Because work has repr=False, printing e1 does not show work='china'.

work: str = field(init=False, default='china', repr=True)

e1 = Employee('zoey', 18, 'patna')
print(e1)
Employee(name='zoey', age=18, city='patna', work='china')

With repr=True, printing e1 shows work='china'.
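A common use of repr=False is keeping sensitive values out of logs. Note that repr only affects the printed form: the field is still stored and still compared by __eq__(). A sketch with a hypothetical User class:

```python
from dataclasses import dataclass, field

@dataclass
class User:               # hypothetical example class
    name: str
    password: str = field(repr=False)   # hidden from __repr__, but still a normal field

u = User('zoey', 's3cret')
print(u)                             # User(name='zoey')
print(u.password)                    # s3cret
print(u == User('zoey', 'other'))    # False: password still takes part in __eq__
```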

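(5) Parameter 5: compare. If true (the default), the field is included in the generated equality and comparison methods (__eq__(), __gt__(), and so on).

To use ordering comparisons such as <, first set order=True on the decorator, then use compare to choose which fields take part in the comparison; compare defaults to True.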

from dataclasses import dataclass, field

@dataclass(order=True)
class Employee:
    name: str = field(compare=False)
    age: int = field(compare=False)
    city: str = field(compare=False)
    work: int

e2 = Employee('joe', 19, 'patna', 21)
e4 = Employee('judy', 17, 'india', 22)
print(e2 < e4)
True

Only the work field is compared. To order by several attributes with custom rules, see section 1.5.
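With order=True the generated methods compare the fields whose compare is true as a tuple, in definition order. So marking only name and city with compare=False orders by age first and then by work, both ascending; unlike the custom __lt__ in section 1.5, larger work does not come first here. A sketch with the same four employees:

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class Employee:
    name: str = field(compare=False)
    age: int                          # compared first
    city: str = field(compare=False)
    work: int                         # compared second, only when ages are equal

staff = [
    Employee('zoey', 18, 'patna', 20),
    Employee('joe', 19, 'patna', 21),
    Employee('mike', 19, 'deli', 20),
    Employee('judy', 17, 'india', 22),
]
# sorted() uses the generated __lt__, i.e. (age, work) < (other.age, other.work)
print([e.name for e in sorted(staff)])   # ['judy', 'zoey', 'mike', 'joe']
```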
