Location>code7788 >text

Using Pydantic and SqlAlchemy to implement tree-list data (self-referencing table relationships) and recursive ways to handle data discrepancies

Popularity:341 ℃/2024-10-15 19:39:01

In my design framework business, dictionary categories, departmental agencies, system menus and so on these tables, there are id, pid fields, mainly as a self-referential relationship, the realization of the tree list of data processing, because this can achieve unlimited levels of the tree list. In the actual use of Pydantic and SqlAlchemy to directly deal with nested relationships, there is always data in the Pydantic object conversion validation, climb the pit for a period of time to find out that it is the use of the model definition of the problem, this accompanying introduction to the use of Pydantic and SqlAlchemy to achieve a tree list of data (self-referential table relationship) processing, and recursive way to deal with data discrepancies. and recursive way to deal with data differences.

1, the use of Pydantic and SqlAlchemy to realize the tree list data (self-referential table relationship) processing

The model definition of the default organization table for sqlalchemy is shown below.

class Ou(Base):
    """Institutional (departmental) information-table model"""

    __tablename__ = "t_acl_ou"

    id = Column(Integer, primary_key=True, comment="primary key", autoincrement=True)
    pid = Column(Integer, ForeignKey("t_acl_ou.id"), comment="Parent organization ID")

**** additional information
# Define parent relationship parent = relationship( "Ou", remote_side=[id], back_populates="children") # Define children relationship children = relationship("Ou", back_populates="parent")

The corresponding DTO (Schema) data class is then defined as follows.

class OuDto(BaseModel):
    id: Optional[int] = None
    pid: Optional[int] = None
***Other information
class OuNodeDto(OuDto): """Sectoral Institutional Nodal Objects""" children: Optional[List["OuNodeDto"]] = None # Here we use Optional class Config: orm_mode = True # Enable orm_mod from_attributes = True extra = "allow"

Then I defined a get_children function inside the organization's Crud class as follows

    async def get_children(self, db: AsyncSession, id: int) -> Ou:
        """
        Get the sublist
        :param db.
        :param id.
        :return.
"""
        result = await (
            select(Ou).options(selectinload()).where( == id)
        )
        result = result.scalar_one_or_none()
        return result

The selectinload loading method allows you to retrieve the data and get the sublists processed at the same time.

In order to verify that the data can be properly retrieved from the CRUD, I tested the interface of the CRUD class and queried to obtain the collection of children, the code is as follows

async def test_list_ou():
    async with async_session() as db:
        ou = await ou_crud.get_children(db, "3")
        print(vars(ou))
        for o in :
            print(vars(o))

        await ()

One of the organization ids is 3, which is the Guangzhou branch, and it reads out all the organizations under that company normally, so there is no problem at the bottom.

But using FastAPI's interface processing, the data conversion through pydantic just can't be obtained properly, the following is FastAPI's routing interface implementation.

@(
    "/get-children",
    response_model=AjaxResponse[OuNodeDto | None],
    summary="Get customers by name",
    dependencies=[DependsJwtAuth],
)
async def get_children(
    id: Annotated[int | None, Query()] = None,
    db: AsyncSession = Depends(get_db),
):
    ou = await ou_crud.get_children(db, id)
    try:
        result = OuNodeDto.model_validate(ou)
    except Exception as e:
        print(())
        return AjaxResponse(
            success=False, result=None, errorInfo=ErrorInfo(message=str(e))
        )
    return AjaxResponse(result)

Note here that I used OuNodeDto.model_validate(ou) to transform the nested list object, and this is where the error occurs.

We can debug the error messages in the Swagger interface.

I double-checked that the Model Model and Schema objects are one-to-one, and that the error is not a field name issue, so it may be a data processing issue, but it's hard to get more useful information out of it. The error message obtained in Chatgpt is shown below.

Why is it that my nested lists can be queried, but when converted via Schema's model_validate, it prompts validation errors?

The issue you are experiencing may be due to inconsistencies or problems encountered during Pydantic's model validation process related to nested data structures. Here are some common causes and their solutions:

1)mismatch of nested structures

Ensure that your nested data structure is consistent with the Pydantic model definition. If your model expects a nested list, but the database query returns a result with some fields missing or a type mismatch, it may cause validation to fail.

cure:
  • Check the data structure queried from the database to make sure it agrees with the nested definitions in the Pydantic model.

2)Forward Reference (Forward References)

If your Pydantic model has self-references or cross-references, make sure you are properly using theupdate_forward_refs(). If forward references are not handled correctly in nested references, validation may fail.

cure:

Make sure you call the Pydantic model definition after theupdate_forward_refs(),New version of pydantic using model_rebuild, especially when the model contains recursive references.

3)utilizationfrom_attributes options (as in computer software settings)

If your data comes from ORM objects (or any non-dictionary objects) and you're using Pydantic v2, try using thefrom_attributes Properties.

cure:

  • existConfig enablefrom_attributes option to ensure that the Pydantic model can extract data from attributes.

4) Problems with the type of data returned by the database

Ensure that the type of data returned from the database (especially thechildren field) is the type you expect. If thechildren Returning anything other than a list or an object containing children will cause the validation to fail.

cure:

Check your query logic to make sure thatchildren field returns a list and each item in the list conforms to the OuNodeDto model.

5) Failure to properly handle relationship loading

When using SQLAlchemy, make sure that relevant data (such as thechildren) has been loaded correctly. If an attempt is made to access an unloaded relationship, it may throw theStatementError

result = await (
    select(Ou).options(selectinload()).where( == id)
)

 

I finally realized that I had done all of this, and my pydantic model definition is shown below, and I still get errors.

class OuNodeDto(OuDto):
    """Sectoral Institutional Nodal Objects"""

    children: Optional[List["OuNodeDto"]] = None  # Here we use Optional

    class Config:
        orm_mode = True  # Enable orm_mod
        from_attributes = True
        extra = "allow"


# Updating forward references
OuNodeDto.model_rebuild(force=True)

If the object returned by the record is normal, but is not used when using theOuNodeDto.model_validate(ou) A possible problem with errors in conversion is that the definition of the Pydantic model or the structure of the object does not exactly match the format expected by the Pydantic model.

Finally, it turns out that it'srelationship (used form a nominal expression)lazy The effect of the loading strategy of the parameter

lazy Overview of loading strategies

In SQLAlchemy.relationship (used form a nominal expression)lazy Parameters determine how and when the associated objects are loaded. The commonlazy The loading strategy has:

  1. lazy="immediate":

    • define: This strategy loads the associated child object as soon as the parent object is loaded. That is, the parent object and its children are loaded in the same query.
    • Why it works.: becauselazy="immediate" will immediately load all relevant objects when you use Pydantic'smodel_validate When data validation is performed, the relevant objects are already loaded and available for access.
  2. lazy="select":

    • define: This strategy loads the relevant objects via a separate query when you access the relationship. That is, SQLAlchemy will only launch an additional query to get the child objects when you access them.
    • Why it may not work: Since subobjects are loaded on access, they are not loaded until you make themodel_validate At the time of validation, it is possible that the child object has not been loaded yet, causing the validation to fail.
  3. lazy="dynamic":

    • define: This strategy returns a query object instead of an actual child object. You need to explicitly execute this query to get the relevant subobjects.
    • Why it may not worklazy="dynamic" The query object is returned instead of the actual object instance. Therefore, Pydantic'smodel_validate It is not possible to process these query objects directly; you must first execute the query to get the actual object.

utilizationlazy="immediate"

class Ou(Base):
    __tablename__ = "t_acl_ou"
    id = Column(Integer, primary_key=True)
    pid = Column(Integer, ForeignKey("t_acl_ou.id"))
    name = Column(String)
    parent = relationship("Ou", remote_side=[id], back_populates="children", lazy="immediate")
    children = relationship("Ou", back_populates="parent", lazy="immediate")

In this case, when you query aOu object when thechildren has been loaded immediately and can be used directly in Pydantic'smodel_validate

utilizationlazy="select"

class Ou(Base):
    __tablename__ = "t_acl_ou"
    id = Column(Integer, primary_key=True)
    pid = Column(Integer, ForeignKey("t_acl_ou.id"))
    name = Column(String)
    parent = relationship("Ou", remote_side=[id], back_populates="children", lazy="select")
    children = relationship("Ou", back_populates="parent", lazy="select")

Under this strategy, thechildren will only be loaded when accessed, which can lead to a problem when doing themodel_validate verificationchildren It has not yet finished loading.

utilizationlazy="dynamic"

class Ou(Base):
    __tablename__ = "t_acl_ou"
    id = Column(Integer, primary_key=True)
    pid = Column(Integer, ForeignKey("t_acl_ou.id"))
    name = Column(String)
    parent = relationship("Ou", remote_side=[id], back_populates="children", lazy="dynamic")
    children = relationship("Ou", back_populates="parent", lazy="dynamic")

lazy="dynamic" What is returned is a query object, not the actualchildren objects. In order to validate with Pydantic, you must first execute this query to get the actual object.

lazy Loading Strategy Summary

  • lazy="immediate": Immediately loads the relevant object so that it is in Pydantic'smodel_validate Available in.
  • lazy="select": The relevant objects need to be loaded at the time of access and may not have been loaded at the time of validation.
  • lazy="dynamic": return the query object, requiring an additional query step.model_validate Cannot be dealt with directly.

In SQLAlchemy.lazy The schema is part of the model definition and determines how related objects are loaded.

In order to ensuremodel_validate For proper operation, it is usually recommended to uselazy="immediate" to ensure that all relevant data has been fully loaded at the time of validation.

Last modified.lazy="immediate" After that, it worked fine and the conversion of the list went smoothly.

 

2, recursive way to deal with the tree list data

To ensure that Pydantic'smodel_validate Can handle nested relationships correctly, recommendedlazy="immediate" Policy. This ensures that when Pydantic performs data validation, all relevant data has been fully loaded.

What's the best way to handle the tree list other than doing this to it?

Common methods for working with tree lists (or tree structures) include the following, in addition to directly using SQLAlchemy'srelationship cap (a poem)lazy="immediate" In addition to loading strategies, a number of other techniques and tools can be employed to process and manipulate tree data structures.

The following describes the processing of list data using recursive search.In Python, a common way to recursively process a tree list is to use a recursive function to traverse the tree structure.

Suppose you have a department (Ou) model where each department can have multiple sub-departments (tree structure). You want to use a recursive method to traverse and populate each node in the tree structure.

via Pydantic'sBaseModel classes for traversal filling, you can combine recursion and Pydantic's model to work with tree-structured data. Here is a detailed example showing how to use Pydantic'sBaseModel class to recursively traverse and populate the tree structure.

First, a Pydantic model is defined to represent the nodes in a tree structure. To handle nested child nodes, recursive type annotations can be used in the model.

from typing import List, Optional
from pydantic import BaseModel

class OuNodeDto(BaseModel):
    id: int
    name: str
    children: Optional[List['OuNodeDto']] = None  # Recursive type annotations

    class Config:
        orm_mode = True

Suppose we have a nested set of dictionary data representing a tree structure:

ou_data = [
    {
        "id": 1,
        "name": "Root Department",
        "children": [
            {
                "id": 2,
                "name": "Child Department 1",
                "children": [
                    {"id": 4, "name": "Grandchild Department 1"},
                    {"id": 5, "name": "Grandchild Department 2"}
                ]
            },
            {"id": 3, "name": "Child Department 2"}
        ]
    }
]

via Pydantic'sBaseModel, you can just do a recursive fill. Supposeou_data is a list of dictionaries obtained from a database or other external source, which you can construct recursively with theOuNodeDto Example.

def build_tree(data: List[dict]) -> List[OuNodeDto]:
    """
    Recursively traverse and build the Pydantic model tree.

    :param data: A list of dictionaries containing the tree structure.
    :return: List of populated Pydantic models.
"""
    tree = []
    for node_data in data:
        # Handling child node recursive padding
        children = build_tree(node_data.get("children", []))
        
        # Model Validation and Object Creation with Pydantic
        node = OuNodeDto(
            id=node_data["id"],
            name=node_data["name"],
            children=children if children else None
        )
        (node)
    return tree

Use the abovebuild_tree function, you can recursively traverse the data and use the Pydantic model to build the entire tree structure.

ou_tree = build_tree(ou_data)

# Pydantic model of the output tree structure
for node in ou_tree:
    print(node)

After running the above code, the output will be a tree-structured list of Pydantic models, populated with all the data.

OuNodeDto(id=1, name='Root Department', children=[
    OuNodeDto(id=2, name='Child Department 1', children=[
        OuNodeDto(id=4, name='Grandchild Department 1', children=None),
        OuNodeDto(id=5, name='Grandchild Department 2', children=None)
    ]),
    OuNodeDto(id=3, name='Child Department 2', children=None)
])

In this way, you can use Pydantic'sBaseModel Combined with recursive functions to handle data filling and validation of tree structures, Pydantic provides powerful data validation and parsing capabilities, making it easier to handle complex nested structures.

 

How should I recursively traverse the model classes of SqlAlchemy if they are obtained by retrieving them from the database?

To retrieve and recursively traverse SQLAlchemy's model classes from the database and then convert them to Pydantic's model classes, you can follow these steps. Assuming you have defined the SQLAlchemy model class and the corresponding Pydantic model class, the next section shows how to recursively traverse and populate this data.

First, we define a SQLAlchemy model that contains self-referential relationships.

from sqlalchemy import Column, Integer, String, ForeignKey
from  import relationship, declarative_base

Base = declarative_base()

class Ou(Base):
    __tablename__ = "t_acl_ou"

    id = Column(Integer, primary_key=True, autoincrement=True)
    pid = Column(Integer, ForeignKey("t_acl_ou.id"))
    name = Column(String)

    # self-reference relation
    parent = relationship("Ou", remote_side=[id], back_populates="children")
    children = relationship("Ou", back_populates="parent")

Then define a corresponding Pydantic model that supports nested child nodes.

from typing import List, Optional
from pydantic import BaseModel

class OuNodeDto(BaseModel):
    id: int
    name: str
    children: Optional[List['OuNodeDto']] = None  # Recursive type annotations

    class Config:
        orm_mode = True  # Allow conversion from ORM objects

Use SQLAlchemy's queries to get organizational structure data from the database. To handle nested relationships, you can use theselectinload or other similar loading strategies to preload child nodes.

from  import select
from  import selectinload, sessionmaker
from  import AsyncSession, create_async_engine

# Assuming an asynchronous engine is used
engine = create_async_engine('sqlite+aiosqlite:///./')
Session = sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)

async def get_ou_tree_by_id(ou_id: int):
    async with Session() as session:
        # Preloading child nodes with selectinload
        result = await (
            select(Ou).options(selectinload()).filter_by(id=ou_id)
        )
        ou = ().first()
        return ou

Write a recursive function that converts SQLAlchemy's ORM objects to Pydantic model objects.

def build_tree(node: Ou) -> OuNodeDto:
    """
    Recursively traverse the SQLAlchemy model and convert it to a Pydantic model.

    :param node: SQLAlchemy model instance
    :return: Populated Pydantic model instance
"""
    # Validating and Creating Objects with Pydantic Models
    node_dto = OuNodeDto(
        id=,
        name=,
        # Recursive filling of child nodes
        children=[build_tree(child) for child in ] if  else None
    )
    return node_dto

Combine the above steps to get the data from the database and build the tree structure.

import asyncio

async def main():
    # Get the organization tree for the specified ID
    ou = await get_ou_tree_by_id(1)
    
    # Recursive Construction of Tree-Structured Pydantic Models
    ou_tree = build_tree(ou)
    
    # Output tree structure
    print(ou_tree)

# Running an asynchronous task
(main())

After executing the above code, theou_tree will be a Pydantic model object containing a tree of organizational structures, and all nested child nodes will be recursively transformed and populated.

 

Which performs better, using Pydantic's model_validate processing and relationships loading strategy lazy=immediate or model class filling via recursive SqlAlchemy?

In comparing the use of Pydantic'smodel_validate The main points of consideration when comparing the performance of manually populating SQLAlchemy model classes via recursion are the efficiency of the data loading strategy and the way the models are transformed. Both are analyzed in detail below:

1) Pydanticmodel_validate + lazy="immediate"

  • Working Principle:

    • utilizationlazy="immediate" Associated child objects are loaded as soon as SQLAlchemy instantiates the object. This means that the child objects are already fully loaded when the main object is accessed, and no additional database queries need to be initiated.
    • Using Pydantic'smodel_validate It is possible to convert SQLAlchemy's ORM objects directly into Pydantic models.
  • Performance considerations:

    • Data loading: lazy="immediate" will automatically load the relevant data when the object is accessed, which is suitable for scenarios where the full data structure needs to be accessed immediately, but it can lead to unnecessary loading, especially if not all the child objects are needed.
    • conversion overhead: model_validate is a single-step operation that automatically handles conversions of complex nested objects. Although convenient, its overhead depends on the complexity of the objects and the depth of nesting. For a large number of deeply nested objects, the conversion may take longer.
  • vantage:

    • Simplicity: reduces the need to write recursive code manually.
    • Automation: the ability to automate the conversion of nested relationships.
  • drawbacks:

    • May result in unnecessary data loading.
    • Performance may be degraded for massively nested objects.

2)Recursively manually populating SQLAlchemy model classes

  • Working Principle:

    • Manually write recursive functions to traverse SQLAlchemy's ORM objects and convert them to Pydantic models.
    • Flexibility to choose when to load subobjects (via theselectinload, joinedload etc.), thus reloading the data when needed.
  • Performance considerations:

    • Data loading: Manually controlling the data loading strategy optimizes query performance and avoids unnecessary data loading. Loading data only when it is needed avoids thelazy="immediate" Bring on the overloading.
    • conversion overhead: Manual recursion usually deals only with the current node and its direct children and is relatively efficient. You can reduce unnecessary object creation and conversion by optimizing the recursion logic.
  • vantage:

    • Performance optimization: You can optimize the loading strategy and recursive processing as needed to avoid unnecessary calculations and data loading.
    • Flexibility: more flexible control over object transformations and data access.
  • drawbacks:

    • Recursive logic needs to be written manually, adding complexity to the code.
    • not as good asmodel_validate Automated and error prone.

3)Summary and Options

  • Lesser data size and complexity: model_validate + lazy="immediate" More convenient and with little performance impact, automated Pydantic model transformations can be quickly implemented.

  • When data size is large and nesting depth is high: It may be more efficient to fill manually and recursively, especially if you need to finely control the data loading strategy. This avoids overloading and optimizes performance.

reach a verdict: If your application scenario has high performance requirements and more complex data structures, manual recursion may be preferable. If you prioritize code simplicity and development efficiency, and the data size is not large, thenmodel_validate become man and wifelazy="immediate" is the better choice.