In my design framework business, dictionary categories, departmental agencies, system menus and so on these tables, there are id, pid fields, mainly as a self-referential relationship, the realization of the tree list of data processing, because this can achieve unlimited levels of the tree list. In the actual use of Pydantic and SqlAlchemy to directly deal with nested relationships, there is always data in the Pydantic object conversion validation, climb the pit for a period of time to find out that it is the use of the model definition of the problem, this accompanying introduction to the use of Pydantic and SqlAlchemy to achieve a tree list of data (self-referential table relationship) processing, and recursive way to deal with data discrepancies. and recursive way to deal with data differences.
1, the use of Pydantic and SqlAlchemy to realize the tree list data (self-referential table relationship) processing
The model definition of the default organization table for sqlalchemy is shown below.
class Ou(Base): """Institutional (departmental) information-table model""" __tablename__ = "t_acl_ou" id = Column(Integer, primary_key=True, comment="primary key", autoincrement=True) pid = Column(Integer, ForeignKey("t_acl_ou.id"), comment="Parent organization ID")
**** additional information
# Define parent relationship parent = relationship( "Ou", remote_side=[id], back_populates="children") # Define children relationship children = relationship("Ou", back_populates="parent")
The corresponding DTO (Schema) data class is then defined as follows.
class OuDto(BaseModel): id: Optional[int] = None pid: Optional[int] = None
***Other information class OuNodeDto(OuDto): """Sectoral Institutional Nodal Objects""" children: Optional[List["OuNodeDto"]] = None # Here we use Optional class Config: orm_mode = True # Enable orm_mod from_attributes = True extra = "allow"
Then I defined a get_children function inside the organization's Crud class as follows
async def get_children(self, db: AsyncSession, id: int) -> Ou: """ Get the sublist :param db. :param id. :return. """ result = await ( select(Ou).options(selectinload()).where( == id) ) result = result.scalar_one_or_none() return result
The selectinload loading method allows you to retrieve the data and get the sublists processed at the same time.
In order to verify that the data can be properly retrieved from the CRUD, I tested the interface of the CRUD class and queried to obtain the collection of children, the code is as follows
async def test_list_ou(): async with async_session() as db: ou = await ou_crud.get_children(db, "3") print(vars(ou)) for o in : print(vars(o)) await ()
One of the organization ids is 3, which is the Guangzhou branch, and it reads out all the organizations under that company normally, so there is no problem at the bottom.
But using FastAPI's interface processing, the data conversion through pydantic just can't be obtained properly, the following is FastAPI's routing interface implementation.
@( "/get-children", response_model=AjaxResponse[OuNodeDto | None], summary="Get customers by name", dependencies=[DependsJwtAuth], ) async def get_children( id: Annotated[int | None, Query()] = None, db: AsyncSession = Depends(get_db), ): ou = await ou_crud.get_children(db, id) try: result = OuNodeDto.model_validate(ou) except Exception as e: print(()) return AjaxResponse( success=False, result=None, errorInfo=ErrorInfo(message=str(e)) ) return AjaxResponse(result)
Note here that I used OuNodeDto.model_validate(ou) to transform the nested list object, and this is where the error occurs.
We can debug the error messages in the Swagger interface.
I double-checked that the Model Model and Schema objects are one-to-one, and that the error is not a field name issue, so it may be a data processing issue, but it's hard to get more useful information out of it. The error message obtained in Chatgpt is shown below.
Why is it that my nested lists can be queried, but when converted via Schema's model_validate, it prompts validation errors?
The issue you are experiencing may be due to inconsistencies or problems encountered during Pydantic's model validation process related to nested data structures. Here are some common causes and their solutions:
1)mismatch of nested structures
Ensure that your nested data structure is consistent with the Pydantic model definition. If your model expects a nested list, but the database query returns a result with some fields missing or a type mismatch, it may cause validation to fail.
cure:- Check the data structure queried from the database to make sure it agrees with the nested definitions in the Pydantic model.
2)Forward Reference (Forward References
)
If your Pydantic model has self-references or cross-references, make sure you are properly using theupdate_forward_refs()
. If forward references are not handled correctly in nested references, validation may fail.
cure:
Make sure you call the Pydantic model definition after the
, especially when the model contains recursive references.update_forward_refs(),New version of pydantic using model_rebuild
from_attributes
options (as in computer software settings)
If your data comes from ORM objects (or any non-dictionary objects) and you're using Pydantic v2, try using thefrom_attributes
Properties.
cure:
- exist
Config
enablefrom_attributes
option to ensure that the Pydantic model can extract data from attributes.
4) Problems with the type of data returned by the database
Ensure that the type of data returned from the database (especially thechildren
field) is the type you expect. If thechildren
Returning anything other than a list or an object containing children will cause the validation to fail.
cure:
Check your query logic to make sure thatchildren
field returns a list and each item in the list conforms to the OuNodeDto model.
5) Failure to properly handle relationship loading
When using SQLAlchemy, make sure that relevant data (such as thechildren
) has been loaded correctly. If an attempt is made to access an unloaded relationship, it may throw theStatementError
。
result = await ( select(Ou).options(selectinload()).where( == id) )
I finally realized that I had done all of this, and my pydantic model definition is shown below, and I still get errors.
class OuNodeDto(OuDto): """Sectoral Institutional Nodal Objects""" children: Optional[List["OuNodeDto"]] = None # Here we use Optional class Config: orm_mode = True # Enable orm_mod from_attributes = True extra = "allow" # Updating forward references OuNodeDto.model_rebuild(force=True)
If the object returned by the record is normal, but is not used when using theOuNodeDto.model_validate(ou)
A possible problem with errors in conversion is that the definition of the Pydantic model or the structure of the object does not exactly match the format expected by the Pydantic model.
Finally, it turns out that it'srelationship
(used form a nominal expression)lazy
The effect of the loading strategy of the parameter。
lazy
Overview of loading strategies
In SQLAlchemy.relationship
(used form a nominal expression)lazy
Parameters determine how and when the associated objects are loaded. The commonlazy
The loading strategy has:
-
lazy="immediate"
:- define: This strategy loads the associated child object as soon as the parent object is loaded. That is, the parent object and its children are loaded in the same query.
-
Why it works.: because
lazy="immediate"
will immediately load all relevant objects when you use Pydantic'smodel_validate
When data validation is performed, the relevant objects are already loaded and available for access.
-
lazy="select"
:- define: This strategy loads the relevant objects via a separate query when you access the relationship. That is, SQLAlchemy will only launch an additional query to get the child objects when you access them.
-
Why it may not work: Since subobjects are loaded on access, they are not loaded until you make the
model_validate
At the time of validation, it is possible that the child object has not been loaded yet, causing the validation to fail.
-
lazy="dynamic"
:- define: This strategy returns a query object instead of an actual child object. You need to explicitly execute this query to get the relevant subobjects.
-
Why it may not work:
lazy="dynamic"
The query object is returned instead of the actual object instance. Therefore, Pydantic'smodel_validate
It is not possible to process these query objects directly; you must first execute the query to get the actual object.
utilizationlazy="immediate"
class Ou(Base): __tablename__ = "t_acl_ou" id = Column(Integer, primary_key=True) pid = Column(Integer, ForeignKey("t_acl_ou.id")) name = Column(String) parent = relationship("Ou", remote_side=[id], back_populates="children", lazy="immediate") children = relationship("Ou", back_populates="parent", lazy="immediate")
In this case, when you query aOu
object when thechildren
has been loaded immediately and can be used directly in Pydantic'smodel_validate
。
utilizationlazy="select"
class Ou(Base): __tablename__ = "t_acl_ou" id = Column(Integer, primary_key=True) pid = Column(Integer, ForeignKey("t_acl_ou.id")) name = Column(String) parent = relationship("Ou", remote_side=[id], back_populates="children", lazy="select") children = relationship("Ou", back_populates="parent", lazy="select")
Under this strategy, thechildren
will only be loaded when accessed, which can lead to a problem when doing themodel_validate
verificationchildren
It has not yet finished loading.
utilizationlazy="dynamic"
class Ou(Base): __tablename__ = "t_acl_ou" id = Column(Integer, primary_key=True) pid = Column(Integer, ForeignKey("t_acl_ou.id")) name = Column(String) parent = relationship("Ou", remote_side=[id], back_populates="children", lazy="dynamic") children = relationship("Ou", back_populates="parent", lazy="dynamic")
lazy="dynamic"
What is returned is a query object, not the actualchildren
objects. In order to validate with Pydantic, you must first execute this query to get the actual object.
lazy
Loading Strategy Summary
-
lazy="immediate"
: Immediately loads the relevant object so that it is in Pydantic'smodel_validate
Available in. -
lazy="select"
: The relevant objects need to be loaded at the time of access and may not have been loaded at the time of validation. -
lazy="dynamic"
: return the query object, requiring an additional query step.model_validate
Cannot be dealt with directly.
In SQLAlchemy.lazy
The schema is part of the model definition and determines how related objects are loaded.
In order to ensuremodel_validate
For proper operation, it is usually recommended to uselazy="immediate"
to ensure that all relevant data has been fully loaded at the time of validation.
Last modified.lazy="immediate"
After that, it worked fine and the conversion of the list went smoothly.
2, recursive way to deal with the tree list data
To ensure that Pydantic'smodel_validate
Can handle nested relationships correctly, recommendedlazy="immediate"
Policy. This ensures that when Pydantic performs data validation, all relevant data has been fully loaded.
What's the best way to handle the tree list other than doing this to it?
Common methods for working with tree lists (or tree structures) include the following, in addition to directly using SQLAlchemy'srelationship
cap (a poem)lazy="immediate"
In addition to loading strategies, a number of other techniques and tools can be employed to process and manipulate tree data structures.
The following describes the processing of list data using recursive search.In Python, a common way to recursively process a tree list is to use a recursive function to traverse the tree structure.
Suppose you have a department (Ou
) model where each department can have multiple sub-departments (tree structure). You want to use a recursive method to traverse and populate each node in the tree structure.
via Pydantic'sBaseModel
classes for traversal filling, you can combine recursion and Pydantic's model to work with tree-structured data. Here is a detailed example showing how to use Pydantic'sBaseModel
class to recursively traverse and populate the tree structure.
First, a Pydantic model is defined to represent the nodes in a tree structure. To handle nested child nodes, recursive type annotations can be used in the model.
from typing import List, Optional from pydantic import BaseModel class OuNodeDto(BaseModel): id: int name: str children: Optional[List['OuNodeDto']] = None # Recursive type annotations class Config: orm_mode = True
Suppose we have a nested set of dictionary data representing a tree structure:
ou_data = [ { "id": 1, "name": "Root Department", "children": [ { "id": 2, "name": "Child Department 1", "children": [ {"id": 4, "name": "Grandchild Department 1"}, {"id": 5, "name": "Grandchild Department 2"} ] }, {"id": 3, "name": "Child Department 2"} ] } ]
via Pydantic'sBaseModel
, you can just do a recursive fill. Supposeou_data
is a list of dictionaries obtained from a database or other external source, which you can construct recursively with theOuNodeDto
Example.
def build_tree(data: List[dict]) -> List[OuNodeDto]: """ Recursively traverse and build the Pydantic model tree. :param data: A list of dictionaries containing the tree structure. :return: List of populated Pydantic models. """ tree = [] for node_data in data: # Handling child node recursive padding children = build_tree(node_data.get("children", [])) # Model Validation and Object Creation with Pydantic node = OuNodeDto( id=node_data["id"], name=node_data["name"], children=children if children else None ) (node) return tree
Use the abovebuild_tree
function, you can recursively traverse the data and use the Pydantic model to build the entire tree structure.
ou_tree = build_tree(ou_data) # Pydantic model of the output tree structure for node in ou_tree: print(node)
After running the above code, the output will be a tree-structured list of Pydantic models, populated with all the data.
OuNodeDto(id=1, name='Root Department', children=[ OuNodeDto(id=2, name='Child Department 1', children=[ OuNodeDto(id=4, name='Grandchild Department 1', children=None), OuNodeDto(id=5, name='Grandchild Department 2', children=None) ]), OuNodeDto(id=3, name='Child Department 2', children=None) ])
In this way, you can use Pydantic'sBaseModel
Combined with recursive functions to handle data filling and validation of tree structures, Pydantic provides powerful data validation and parsing capabilities, making it easier to handle complex nested structures.
How should I recursively traverse the model classes of SqlAlchemy if they are obtained by retrieving them from the database?
To retrieve and recursively traverse SQLAlchemy's model classes from the database and then convert them to Pydantic's model classes, you can follow these steps. Assuming you have defined the SQLAlchemy model class and the corresponding Pydantic model class, the next section shows how to recursively traverse and populate this data.
First, we define a SQLAlchemy model that contains self-referential relationships.
from sqlalchemy import Column, Integer, String, ForeignKey from import relationship, declarative_base Base = declarative_base() class Ou(Base): __tablename__ = "t_acl_ou" id = Column(Integer, primary_key=True, autoincrement=True) pid = Column(Integer, ForeignKey("t_acl_ou.id")) name = Column(String) # self-reference relation parent = relationship("Ou", remote_side=[id], back_populates="children") children = relationship("Ou", back_populates="parent")
Then define a corresponding Pydantic model that supports nested child nodes.
from typing import List, Optional from pydantic import BaseModel class OuNodeDto(BaseModel): id: int name: str children: Optional[List['OuNodeDto']] = None # Recursive type annotations class Config: orm_mode = True # Allow conversion from ORM objects
Use SQLAlchemy's queries to get organizational structure data from the database. To handle nested relationships, you can use theselectinload
or other similar loading strategies to preload child nodes.
from import select from import selectinload, sessionmaker from import AsyncSession, create_async_engine # Assuming an asynchronous engine is used engine = create_async_engine('sqlite+aiosqlite:///./') Session = sessionmaker(engine, class_=AsyncSession, expire_on_commit=False) async def get_ou_tree_by_id(ou_id: int): async with Session() as session: # Preloading child nodes with selectinload result = await ( select(Ou).options(selectinload()).filter_by(id=ou_id) ) ou = ().first() return ou
Write a recursive function that converts SQLAlchemy's ORM objects to Pydantic model objects.
def build_tree(node: Ou) -> OuNodeDto: """ Recursively traverse the SQLAlchemy model and convert it to a Pydantic model. :param node: SQLAlchemy model instance :return: Populated Pydantic model instance """ # Validating and Creating Objects with Pydantic Models node_dto = OuNodeDto( id=, name=, # Recursive filling of child nodes children=[build_tree(child) for child in ] if else None ) return node_dto
Combine the above steps to get the data from the database and build the tree structure.
import asyncio async def main(): # Get the organization tree for the specified ID ou = await get_ou_tree_by_id(1) # Recursive Construction of Tree-Structured Pydantic Models ou_tree = build_tree(ou) # Output tree structure print(ou_tree) # Running an asynchronous task (main())
After executing the above code, theou_tree
will be a Pydantic model object containing a tree of organizational structures, and all nested child nodes will be recursively transformed and populated.
Which performs better, using Pydantic's model_validate processing and relationships loading strategy lazy=immediate or model class filling via recursive SqlAlchemy?
In comparing the use of Pydantic'smodel_validate
The main points of consideration when comparing the performance of manually populating SQLAlchemy model classes via recursion are the efficiency of the data loading strategy and the way the models are transformed. Both are analyzed in detail below:
1) Pydanticmodel_validate
+ lazy="immediate"
-
Working Principle:
- utilization
lazy="immediate"
Associated child objects are loaded as soon as SQLAlchemy instantiates the object. This means that the child objects are already fully loaded when the main object is accessed, and no additional database queries need to be initiated. - Using Pydantic's
model_validate
It is possible to convert SQLAlchemy's ORM objects directly into Pydantic models.
- utilization
-
Performance considerations:
-
Data loading:
lazy="immediate"
will automatically load the relevant data when the object is accessed, which is suitable for scenarios where the full data structure needs to be accessed immediately, but it can lead to unnecessary loading, especially if not all the child objects are needed. -
conversion overhead:
model_validate
is a single-step operation that automatically handles conversions of complex nested objects. Although convenient, its overhead depends on the complexity of the objects and the depth of nesting. For a large number of deeply nested objects, the conversion may take longer.
-
Data loading:
-
vantage:
- Simplicity: reduces the need to write recursive code manually.
- Automation: the ability to automate the conversion of nested relationships.
-
drawbacks:
- May result in unnecessary data loading.
- Performance may be degraded for massively nested objects.
2)Recursively manually populating SQLAlchemy model classes
-
Working Principle:
- Manually write recursive functions to traverse SQLAlchemy's ORM objects and convert them to Pydantic models.
- Flexibility to choose when to load subobjects (via the
selectinload
,joinedload
etc.), thus reloading the data when needed.
-
Performance considerations:
-
Data loading: Manually controlling the data loading strategy optimizes query performance and avoids unnecessary data loading. Loading data only when it is needed avoids the
lazy="immediate"
Bring on the overloading. - conversion overhead: Manual recursion usually deals only with the current node and its direct children and is relatively efficient. You can reduce unnecessary object creation and conversion by optimizing the recursion logic.
-
Data loading: Manually controlling the data loading strategy optimizes query performance and avoids unnecessary data loading. Loading data only when it is needed avoids the
-
vantage:
- Performance optimization: You can optimize the loading strategy and recursive processing as needed to avoid unnecessary calculations and data loading.
- Flexibility: more flexible control over object transformations and data access.
-
drawbacks:
- Recursive logic needs to be written manually, adding complexity to the code.
- not as good as
model_validate
Automated and error prone.
3)Summary and Options
-
Lesser data size and complexity:
model_validate
+lazy="immediate"
More convenient and with little performance impact, automated Pydantic model transformations can be quickly implemented. -
When data size is large and nesting depth is high: It may be more efficient to fill manually and recursively, especially if you need to finely control the data loading strategy. This avoids overloading and optimizes performance.
reach a verdict: If your application scenario has high performance requirements and more complex data structures, manual recursion may be preferable. If you prioritize code simplicity and development efficiency, and the data size is not large, thenmodel_validate
become man and wifelazy="immediate"
is the better choice.