References:
Understanding Python Dataclasses
docs.python.org
Python dataclass
heapq (Heap queue algorithm)
The dataclasses module was added in Python 3.7 as a convenient way to define classes that mainly hold data. This article covers the following:
(1) The basic definition and features of dataclass
(2) Ordering dataclass instances with a priority queue
(3) Configuring dataclass fields
1. dataclass in dataclasses (defining a data class)
1.1 dataclass vs. class
A dataclass is similar to an ordinary Python class, but it generates the basic methods for instantiation, comparison, and printing for you. The decorator's signature is:
dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False)
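- init: if true, an __init__() method is generated
- repr: if true, a __repr__() method is generated
- eq: if true, an __eq__() method is generated
- order: if true, __lt__(), __le__(), __gt__(), and __ge__() methods are generated (this requires eq to be true as well)
- unsafe_hash: if false (the default), __hash__() is generated according to how eq and frozen are set
- frozen: if true, assigning to a field raises an exception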
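The interaction between eq, frozen, and __hash__() in the unsafe_hash entry is easy to miss, so here is a minimal sketch of it. The class names MutablePoint and FrozenPoint and the helper is_hashable are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass  # eq=True, frozen=False: __hash__ is set to None
class MutablePoint:
    x: int
    y: int


@dataclass(frozen=True)  # eq=True, frozen=True: __hash__ is generated from the fields
class FrozenPoint:
    x: int
    y: int


def is_hashable(obj) -> bool:
    """Return True if hash(obj) works, False if it raises TypeError."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False


print(is_hashable(MutablePoint(1, 2)))           # False
print(is_hashable(FrozenPoint(1, 2)))            # True
print(FrozenPoint(1, 2) in {FrozenPoint(1, 2)})  # True
```

With the default settings an instance cannot be used as a dict key or set member; freezing the class is one way to make it hashable without resorting to unsafe_hash=True.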
First, look at how an ordinary class handles instantiation, comparison, and printing:
class Employee:
    def __init__(self, name, age, city):
        self.name = name
        self.age = age
        self.city = city
    def __repr__(self):
        return f'employee name:{self.name}, age:{self.age}, city:{self.city}'
    def __eq__(self, other):
        return (self.name, self.age, self.city) == (other.name, other.age, other.city)
e1 = Employee('zoey', 18, 'patna')
e2 = Employee('mike', 20, 'delhi')
e3 = Employee('zoey', 18, 'patna')
print('employee information:')
print(e1)
print(e2)
print(f'e1 and e3 same? {e1 == e3}')
print(f'e1 and e2 same? {e1 == e2}')
employee information:
employee name:zoey, age:18, city:patna
employee name:mike, age:20, city:delhi
e1 and e3 same? True
e1 and e2 same? False
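__init__ instantiates the object, __repr__ controls how it is printed, and __eq__ compares two objects for equality. The main drawback of writing these methods by hand is that every attribute has to be repeated in each of them. That is acceptable for a class with a few attributes, but it becomes tedious and error-prone as the number of attributes grows. dataclass exists to solve this problem: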
from dataclasses import dataclass
@dataclass
class Employee:
    name: str
    age: int
    city: str
e1 = Employee('zoey', 18, 'patna')
e2 = Employee('mike', 20, 'delhi')
e3 = Employee('zoey', 18, 'patna')
print('employee information:')
print(e1)
print(e2)
print(f'e1 and e3 same? {e1 == e3}')
print(f'e1 and e2 same? {e1 == e2}')
employee information:
Employee(name='zoey', age=18, city='patna')
Employee(name='mike', age=20, city='delhi')
e1 and e3 same? True
e1 and e2 same? False
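With the same content, the dataclass version does not need hand-written __init__, __repr__, and __eq__ methods.
1.2 Creating immutable data objects
By default, the field values of a dataclass instance can be modified after creation. To make the object immutable, set frozen=True; assigning to a field then raises an error.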
@dataclass(frozen=True)
class Employee:
    name: str
    age: int
    city: str
e1 = Employee('zoey', 18, 'patna')
e1.name = 'mike'
dataclasses.FrozenInstanceError: cannot assign to field 'name'
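A frozen instance cannot be changed in place, but a modified copy can still be created. The sketch below is an addition to the original example: it catches the error and then uses dataclasses.replace(), which is part of the standard library, to build a new instance with one field changed.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class Employee:
    name: str
    age: int
    city: str


e1 = Employee('zoey', 18, 'patna')

try:
    e1.name = 'mike'  # not allowed on a frozen dataclass
except FrozenInstanceError as exc:
    print(f'assignment rejected: {exc}')

# replace() returns a new object; e1 itself is left untouched
e2 = replace(e1, name='mike')
print(e1)  # Employee(name='zoey', age=18, city='patna')
print(e2)  # Employee(name='mike', age=18, city='patna')
```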
1.3 dataclass inheritance
Like an ordinary class, a dataclass can inherit from a parent class, and it picks up all of the parent's fields.
@dataclass(unsafe_hash=True)
class Staff:
    name: str
    age: int
    city: str
@dataclass
class Employee(Staff):
    salary: int
e1 = Employee('zoey', 18, 'patna', 20000)
print(e1)
Employee(name='zoey', age=18, city='patna', salary=20000)
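The order of inherited fields matters once defaults are involved. As a small sketch (the default values 'patna' and 0 are assumptions added for illustration), dataclasses.fields() shows that parent fields come first, so a child field must have a default if any parent field has one:

```python
from dataclasses import dataclass, fields


@dataclass
class Staff:
    name: str
    age: int
    city: str = 'patna'  # illustrative default, not in the original example


@dataclass
class Employee(Staff):
    # A default is required here: a non-default field may not follow
    # the defaulted parent field `city` (dataclass raises TypeError otherwise).
    salary: int = 0


print([f.name for f in fields(Employee)])  # ['name', 'age', 'city', 'salary']
print(Employee('zoey', 18))  # Employee(name='zoey', age=18, city='patna', salary=0)
```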
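1.4 Custom initialization
If a field's initial value depends on other fields, compute it in the __post_init__ method and declare that field with field(init=False) so that it is left out of the generated __init__. field is covered in more detail in section 3.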
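from dataclasses import dataclass, field
@dataclass
class Employee:
    name: str
    age: int
    city: str
    adult: bool = field(init=False)
    def __post_init__(self):
        # derive adult from age once, right after the generated __init__ runs
        self.adult = 18 <= self.age <= 70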
e1 = Employee('zoey', 18, 'patna')
print(e1)
Employee(name='zoey', age=18, city='patna', adult=True)
The adult field is computed from the age field, but if age is modified after instantiation, adult is not updated along with it.
e1 = Employee('zoey', 18, 'patna')
print(e1)
e1.age = 8
print(e1)
Employee(name='zoey', age=18, city='patna', adult=True)
Employee(name='zoey', age=8, city='patna', adult=True)
age was changed to 8, yet adult is still True.
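One possible way to keep the derived value in sync is to compute it on access with a property instead of storing it in __post_init__. This is a sketch of an alternative, not the original approach, and it assumes an inclusive age range of 18 to 70.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    age: int
    city: str

    @property
    def adult(self) -> bool:
        # Recomputed on every access, so it always reflects the current age.
        # The inclusive 18..70 bounds are an assumption for this sketch.
        return 18 <= self.age <= 70


e1 = Employee('zoey', 18, 'patna')
print(e1.adult)  # True
e1.age = 8
print(e1.adult)  # False
print(e1)        # Employee(name='zoey', age=8, city='patna')
```

The trade-off is that adult is no longer a dataclass field, so it does not appear in the generated __repr__ or take part in comparisons.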
1.5 Custom ordering of data objects
Python's rich comparison methods are listed below; they apply to objects of any type.
object.__lt__(self, other): x<y
object.__le__(self, other): x<=y
object.__eq__(self, other): x==y
object.__ne__(self, other): x!=y
object.__gt__(self, other): x>y
object.__ge__(self, other): x>=y
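To order data objects, you can combine them with a priority queue. Python offers two implementations, queue.PriorityQueue and heapq. queue.PriorityQueue is itself built on heapq, which implements the heap queue algorithm. heapq does not accept a custom comparison function, but you can get custom ordering by overriding __lt__(self, other) on the data class; __lt__(self, other) corresponds to the expression x<y.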
from dataclasses import dataclass, field
from queue import PriorityQueue
@dataclass
class Employee:
    name: str = field(compare=False)
    age: int
    city: str = field(compare=False)
    work: int
    def __lt__(self, other):
        # smaller age first; when ages are equal, larger work first
        if self.age != other.age:
            return self.age < other.age
        return self.work > other.work
e1 = Employee('zoey', 18, 'patna', 20)
e2 = Employee('joe', 19, 'patna', 21)
e3 = Employee('mike', 19, 'deli', 20)
e4 = Employee('judy', 17, 'india', 22)
q = PriorityQueue()
q.put(e1)
q.put(e2)
q.put(e3)
q.put(e4)
while not q.empty():
    next_item = q.get()
    print(next_item)
print('\n')
Employee(name='judy', age=17, city='india', work=22)
Employee(name='zoey', age=18, city='patna', work=20)
Employee(name='joe', age=19, city='patna', work=21)
Employee(name='mike', age=19, city='deli', work=20)
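The overridden __lt__(self, other) ranks a smaller age first and, as the second criterion, a larger work first. Note that __lt__ uses self.work > other.work, which is what places the larger work in front. When you define your own comparison methods you cannot also set order=True, because dataclass raises a TypeError if the class already defines one of the ordering methods. This is different from the compare parameter of field described later, which only selects the fields used by the generated comparison methods.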
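The same ordering can be used with heapq directly, since queue.PriorityQueue relies on it. The following is a minimal sketch under that assumption; the __lt__ shown here treats work strictly as a tie-breaker for equal ages, which is one reasonable reading of the rule above.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Employee:
    name: str = field(compare=False)
    age: int
    city: str = field(compare=False)
    work: int

    def __lt__(self, other):
        # smaller age first; on equal age, larger work first
        if self.age != other.age:
            return self.age < other.age
        return self.work > other.work


heap = []
for e in (
    Employee('zoey', 18, 'patna', 20),
    Employee('joe', 19, 'patna', 21),
    Employee('mike', 19, 'deli', 20),
    Employee('judy', 17, 'india', 22),
):
    heapq.heappush(heap, e)  # heapq only needs `<` between items

popped = []
while heap:
    popped.append(heapq.heappop(heap).name)

print(popped)  # ['judy', 'zoey', 'joe', 'mike']
```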
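2. astuple and asdict in dataclasses (converting a data class to a tuple or a dict)
The dataclasses module also provides astuple() and asdict(), which convert a dataclass instance to a tuple and a dict respectively.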
from dataclasses import dataclass, astuple, asdict
@dataclass(unsafe_hash=True)
class Employee:
    name: str
    age: int
    city: str
e1 = Employee('zoey', 18, 'patna')
print(astuple(e1))
print(asdict(e1))
('zoey', 18, 'patna')
{'name': 'zoey', 'age': 18, 'city': 'patna'}
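Both functions recurse into nested dataclasses, which the flat example does not show. As a short sketch (the Address class and its country field are hypothetical additions for illustration):

```python
from dataclasses import dataclass, asdict, astuple


@dataclass
class Address:  # hypothetical nested dataclass
    city: str
    country: str


@dataclass
class Employee:
    name: str
    age: int
    address: Address


e1 = Employee('zoey', 18, Address('patna', 'india'))

# Nested dataclass instances are converted as well, not left as objects
print(asdict(e1))   # {'name': 'zoey', 'age': 18, 'address': {'city': 'patna', 'country': 'india'}}
print(astuple(e1))  # ('zoey', 18, ('patna', 'india'))
```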
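3. field in dataclasses (configuring data class fields)
dataclasses.field() describes an individual field defined in a dataclass and controls how that field is handled. Its signature is: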
dataclasses.field(*, default=MISSING, default_factory=MISSING, repr=True, hash=None, init=True, compare=True, metadata=None)
(1) Parameter 1: default. Specifies the default value of the field.
from dataclasses import dataclass, field
@dataclass
class Employee:
    name: str
    age: int
    city: str
    work: str = field(default='china')
e1 = Employee('zoey', 18, 'patna')
print(e1)
Employee(name='zoey', age=18, city='patna', work='china')
The work field defaults to 'china'.
(2) Parameter 2: default_factory. The field takes a zero-argument callable, which is called to produce the field's initial value.
from dataclasses import dataclass, field
def get_work():
    return 'china'
@dataclass
class Employee:
    name: str
    age: int
    city: str = field(default='patna')
    work: str = field(default_factory=get_work)
e1 = Employee('zoey', 18)
print(e1)
Employee(name='zoey', age=18, city='patna', work='china')
The work field is given the function get_work, which returns 'china'.
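A common reason to reach for default_factory is a mutable default such as a list. The sketch below uses a hypothetical skills field to show that dataclass rejects a plain list default and that default_factory=list gives every instance its own list.

```python
from dataclasses import dataclass, field
from typing import List

mutable_default_rejected = False
try:
    @dataclass
    class Broken:
        skills: list = []  # a shared mutable default is not allowed
except ValueError as exc:
    mutable_default_rejected = True
    print(f'rejected: {exc}')


@dataclass
class Employee:
    name: str
    # list() is called once per instance, so the lists are independent
    skills: List[str] = field(default_factory=list)


e1 = Employee('zoey')
e2 = Employee('mike')
e1.skills.append('python')
print(e1.skills)  # ['python']
print(e2.skills)  # []
```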
(3) Parameter 3: init. If true, the field is included as a parameter of the generated __init__() method.
from dataclasses import dataclass, field
@dataclass
class Employee:
    name: str
    age: int
    city: str
    work: str = field(init=False, default='china')
e1 = Employee('zoey', 18, 'patna')
print(e1)
Employee(name='zoey', age=18, city='patna', work='china')
With init=False on the work field, this argument cannot be passed when creating e1; doing so raises an error.
work: str = field(init=True, default='china')
e1 = Employee('zoey', 18, 'patna', 'korea')
print(e1)
Employee(name='zoey', age=18, city='patna', work='korea')
With init=True, the argument can be passed and its value is kept.
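For contrast, here is a minimal sketch of what the init=False case looks like when the extra argument is passed anyway. The generated __init__() has no work parameter, so the call fails with a TypeError, while the attribute can still be assigned afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class Employee:
    name: str
    age: int
    city: str
    work: str = field(init=False, default='china')


init_rejected = False
try:
    Employee('zoey', 18, 'patna', 'korea')  # one positional argument too many
except TypeError as exc:
    init_rejected = True
    print(f'rejected: {exc}')

e1 = Employee('zoey', 18, 'patna')
e1.work = 'korea'  # init=False only affects __init__, not later assignment
print(e1)  # Employee(name='zoey', age=18, city='patna', work='korea')
```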
(4) Parameter 4: repr. If true, the field is included in the string returned by the generated __repr__() method.
from dataclasses import dataclass, field
@dataclass
class Employee:
    name: str
    age: int
    city: str
    work: str = field(init=False, default='china', repr=False)
e1 = Employee('zoey', 18, 'patna')
print(e1)
Employee(name='zoey', age=18, city='patna')
With repr=False on the work field, printing e1 does not show work='china'.
work: str = field(init=False, default='china', repr=True)
e1 = Employee('zoey', 18, 'patna')
print(e1)
Employee(name='zoey', age=18, city='patna', work='china')
With repr=True on the work field, printing e1 shows work='china'.
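(5) Parameter 5: compare. If true, the field is used by the generated comparison methods.
To get ordering comparisons, first set order=True on the decorator, then use compare to choose which fields take part in the comparison. compare defaults to True.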
@dataclass(order=True)
class Employee:
    name: str = field(compare=False)
    age: int = field(compare=False)
    city: str = field(compare=False)
    work: int
e2 = Employee('joe', 19, 'patna', 21)
e4 = Employee('judy', 17, 'india', 22)
print(e2 < e4)
True
Only the work field is compared. To order by several attributes with custom rules, see section 1.5 on custom ordering of data objects.
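Note that compare also affects the generated __eq__(), not only the ordering methods. In the sketch below, e5 is a hypothetical extra instance that differs from e2 in every field except work, yet the two compare equal.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Employee:
    name: str = field(compare=False)
    age: int = field(compare=False)
    city: str = field(compare=False)
    work: int


e2 = Employee('joe', 19, 'patna', 21)
e4 = Employee('judy', 17, 'india', 22)
e5 = Employee('amy', 30, 'delhi', 21)  # hypothetical: same work as e2

print(e2 == e5)  # True, because only work takes part in __eq__
print(e2 < e4)   # True, 21 < 22

# sorted() relies on the generated __lt__, so it orders by work alone
print([e.name for e in sorted([e4, e2])])  # ['joe', 'judy']
```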