
How Short Videos Get Uploaded | Writing a File Service That Supports Slice Upload, Resumable Upload, and Instant Upload

Popularity:435 ℃/2024-08-17 11:13:55

Preface

The short-video apps you use every day, and image-and-text communities such as WeChat and Weibo, all rely on the ability to upload posts, images, and videos. That ability is core to their business.

Essentially, it all comes down to file uploading. This article walks through writing a file upload service and explores its core principles, which I hope will be of some help to you.

Thanks to my friend Trembling for supporting this article.

The service mainly provides the following capabilities:

  • File upload
  • File download
  • Slice upload
  • Resumable upload
  • Instant upload

Past video walkthroughs 📺: Bilibili: ShirasawaTalk, WeChat official account: Shirazetalk


FileService Main Capabilities

Protobuf definition:

service FileService {
  // pre sign a file url for user get it
  rpc PreSignGet(PreSignGetRequest) returns (PreSignGetResponse);
  // pre sign a file url for user put it
  rpc PreSignPut(PreSignPutRequest) returns (PreSignPutResponse);
  // report a file has been uploaded
  rpc ReportUploaded(ReportUploadedRequest) returns (ReportUploadedResponse);
  // pre sign a file url for user put it with slicing
  rpc PreSignSlicingPut(PreSignSlicingPutRequest) returns (PreSignSlicingPutResponse);
  // get upload progress rate for slicing put
  rpc GetProgressRate4SlicingPut(GetProgressRate4SlicingPutRequest) returns (GetProgressRate4SlicingPutResponse);
  // merge a slicing uploading file
  rpc MergeFileParts(MergeFilePartsRequest) returns (MergeFilePartsResponse);
  // remove a file
  rpc RemoveFile(RemoveFileRequest) returns (RemoveFileResponse);
}

FileService provides file upload and download capabilities to the various business domains. Every FileService interface takes a structure named file_context, which carries the file-related information. Its structure is as follows:

message FileContext {
  // The domain to which the bucket is assigned
  string domain = 1;
  // The name of the business to which the file belongs
  string biz_name = 2;
  // File id
  int64 file_id = 3;
  // File sha256 hash
  string hash = 4;
  // File type
  string file_type = 5;
  // File size in bytes
  int64 size = 6;
  // Expiration time of the file access link, in seconds
  int64 expire_seconds = 7;
  // File name
  string filename = 8;
}

In every request, the two parameters domain and biz_name are mandatory. Together they drive the first level of sharding: the tables FileService relies on are first split by business domain and business name. On top of that, you can configure a second level of sharding for each of them, specifying how many sub-tables it is split into.

Quick Start

The number of files grows at different rates across business domains and business names, and the resulting data volumes can differ greatly. So before using FileService, you need to configure the number of sub-tables.

In the service configuration, you can set the following:

data:
  db_sharding_config:
    file_shortvideo_short_video:
      sharding: file_shortvideo_short_video
      sharding_number: 5

The db_sharding_config item configures the number of sub-tables. file_shortvideo_short_video is the sub-table key, where "file" is a fixed prefix, shortvideo is the business domain, and short_video is the business name. sharding_number is the number of sub-tables. The business domain and business name must be passed in when using FileService. If nothing is configured, the default number of sub-tables is 1.
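To make the key layout concrete, here is a minimal sketch of how a sub-table key and a physical table name could be derived from the configuration above. The function names and the modulo routing on file_id are assumptions made for illustration; the article does not describe how FileService actually picks a sub-table.

```python
DEFAULT_SHARDING_NUMBER = 1  # default from the article when nothing is configured


def sharding_key(domain: str, biz_name: str) -> str:
    """Build the sub-table key: fixed prefix "file" + business domain + business name."""
    return f"file_{domain}_{biz_name}"


def table_for(file_id: int, domain: str, biz_name: str, config: dict) -> str:
    """Pick a table name. Hypothetical routing: file_id modulo the shard count."""
    key = sharding_key(domain, biz_name)
    entry = config.get(key, {})
    base = entry.get("sharding", key)
    number = entry.get("sharding_number", DEFAULT_SHARDING_NUMBER)
    return f"{base}_{file_id % number}"


# Mirrors the db_sharding_config example above
config = {
    "file_shortvideo_short_video": {
        "sharding": "file_shortvideo_short_video",
        "sharding_number": 5,
    }
}
print(table_for(1824123073628999680, "shortvideo", "short_video", config))
```

With a scheme like this, an unconfigured domain and business name fall back to a single table with suffix 0.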

Main Flows

Normal Upload/Download

sequenceDiagram
    participant up as Upstream Service
    participant fs as FileService
    participant minio as Minio Server
    up->>fs: PreSignPut
    fs-->>up: PreSignPutResponse (Minio upload link)
    up->>minio: Upload file
    minio-->>up: Upload result
    up->>fs: ReportUploaded
    fs-->>up: ReportUploadedResponse
    up->>fs: PreSignGet
    fs-->>up: PreSignGetResponse (download link)
    up->>minio: Download file
    minio-->>up: Return file

Slice Upload / Resumable Upload

sequenceDiagram
    participant up as Upstream Service
    participant fs as FileService
    participant minio as Minio Server
    up->>fs: PreSignSlicingPut
    fs-->>up: PreSignSlicingPutResponse (array of slice upload links)
    up->>minio: Upload some of the slices
    minio-->>up: Upload result
    up->>up: Have a cup of coffee
    up->>fs: GetProgressRate4SlicingPut
    fs-->>up: GetProgressRate4SlicingPutResponse (slice upload progress)
    up->>minio: Upload the remaining slices
    minio-->>up: Upload result
    up->>fs: MergeFileParts
    fs-->>up: MergeFilePartsResponse
    up->>fs: PreSignGet
    fs-->>up: PreSignGetResponse (download link)
    up->>minio: Download file
    minio-->>up: Return file

Key Capabilities

Normal Upload

The PreSignPut interface provides the most basic upload capability. Besides domain and biz_name, the parameters it requires include:

  • hash: sha256 value of the file
  • file_type: file type
  • size: file size (in bytes)
  • expire_seconds: expiration time for file upload links

For example, you can pass this interface parameters like the following:

{
    "file_context": {
        "domain": "shortvideo",
        "biz_name": "short_video",
        "hash": "8D6BB0819A2C1E66F846031DC54AAF47",
        "file_type": "pdf",
        "size": 1181178,
        "expire_seconds": 86400
    }
}
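The hash and size fields in a request like this can be derived from the file itself. The following is a minimal sketch of building the request body in Python; build_file_context is a hypothetical helper, and the uppercase hex encoding of the digest is an assumption based on the casing in the example above.

```python
import hashlib
import os


def build_file_context(path, domain, biz_name, file_type, expire_seconds=86400):
    """Build the request body, hashing the file in 1MB blocks to bound memory use."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1024 * 1024), b""):
            sha256.update(block)
    return {
        "file_context": {
            "domain": domain,
            "biz_name": biz_name,
            "hash": sha256.hexdigest().upper(),  # assumption: uppercase hex
            "file_type": file_type,
            "size": os.path.getsize(path),  # size in bytes
            "expire_seconds": expire_seconds,
        }
    }
```

Reading in blocks rather than calling f.read() once keeps memory use flat even for large video files.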

When called, the interface returns an upload link (HTTP), and the upstream service or frontend uploads the file directly to that link. Sample upload code (Python):

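import requests


def upload_file(file_path, minio_url):
    # minio_url is the upload link returned by the interface
    with open(file_path, 'rb') as file_data:
        # The pre-signed link expects an HTTP PUT with the raw file body
        response = requests.put(
            minio_url,
            data=file_data,
            headers={"Content-Type": "application/octet-stream"}
        )
    print(response)
    return response.status_code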

After uploading the file with the Python code above, you cannot access it right away. You first need to confirm the upload through the ReportUploaded interface; only a confirmed file is marked as uploaded in the database and can be accessed. For the specific usage of ReportUploaded, see the Upload Confirmation section.

Slice Upload

In some cases, the files to be uploaded are large and the following problems may occur if they are uploaded directly:

  1. Slow uploading
  2. If something goes wrong during the upload process, the entire file needs to be re-uploaded

So FileService provides the ability to upload a file in slices.

First, call PreSignSlicingPut to pre-register a slice upload task. This interface takes the same parameters as PreSignPut. The main fields it returns include:

  • urls: array of upload links for each slice, already sorted by slice number
  • upload_id: id of the upload task
  • parts: total number of slices
  • file_id: file id

Parameter example:

{
    "file_context": {
        "domain": "shortvideo",
        "biz_name": "short_video",
        "hash": "8D6BB0819A2C1E66F846031DC54AAF47",
        "file_type": "pdf",
        "size": 72246060,
        "expire_seconds": 86400
    }
}

At this point, the upstream service or frontend splits the file into slices (each slice is 5MB in size) and then uploads the slices one by one. An example of slicing a file (Python):

def slicing(filename):
    slice_size = 5 * 1024 * 1024  # 5MB per slice

    files = list()

    # Open the file
    with open(filename, 'rb') as f:
        index = 0
        while True:
            # Locate the position to read
            f.seek(index * slice_size)
            # Read the data
            data = f.read(slice_size)
            # If the end of the file has been reached, exit the loop
            if not data:
                break
            # Write the slice to its own file
            with open(f'{filename}_{index}', 'wb') as f1:
                f1.write(data)
            files.append(data)
            # Update the position
            index += 1
    return files
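When the slices do not need to be written to disk, it can be enough to compute their byte ranges up front. The sketch below assumes the fixed 5MB slice size mentioned above and slice numbers starting at 1 (as in the parts map returned by GetProgressRate4SlicingPut); slice_ranges is a hypothetical helper, not part of FileService.

```python
import math

SLICE_SIZE = 5 * 1024 * 1024  # 5MB, the slice size stated in the article


def slice_ranges(size, slice_size=SLICE_SIZE):
    """Return one (slice_number, offset, length) tuple per slice, numbered from 1."""
    total = math.ceil(size / slice_size)
    return [
        (n + 1, n * slice_size, min(slice_size, size - n * slice_size))
        for n in range(total)
    ]


ranges = slice_ranges(72246060)  # the file size used in the parameter example
print(len(ranges))   # 14
print(ranges[-1])    # (14, 68157440, 4088620)
```

For the 72246060-byte file in the parameter example this yields 14 slices, with only the last one shorter than 5MB.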

After all the slices have been uploaded, call MergeFileParts to merge them. Key parameters include:

  • file_id
  • upload_id

Parameter example:

{
    "upload_id": "ZDNlOWI2MjktMjAzOC00NzJkLWE0ODYtOGMzZTBlZmJlODUwLmRmN2M5ZWQyLTYxMzMtNDM4NS1hNTljLWEwMzRlNTI5NWNkNHgxNzIzNzM5ODA2MTM2NzU3MzE5",
    "file_context": {
        "file_id": 1824123073628999680,
        "domain": "shortvideo",
        "biz_name": "short_video"
    }
    
}

Unlike a normal upload, a slice upload needs no separate confirmation step: once the slices are uploaded and MergeFileParts is called, the upload confirmation is performed automatically, so there is no need to call ReportUploaded.

Resumable Upload

During the slice upload described above, you can call FileService.GetProgressRate4SlicingPut to get the upload status of each slice. The main parameters include:

  • file_id
  • upload_id

The return value of this interface contains a map named parts: the key is the slice number, and the value indicates whether that slice has been uploaded. The upstream service or frontend can use this information to decide which slices need to be re-uploaded.

Parameter example:

{
    "upload_id": "ZDNlOWI2MjktMjAzOC00NzJkLWE0ODYtOGMzZTBlZmJlODUwLjA3ZTgyNmY0LWE4YjQtNDQxMC04M2QzLWY4ODQ4MTRiZGM4Y3gxNzIzNzM4ODUzMTMzMjY4ODQ3",
    "file_context": {
        "file_id": 1824119076553756672,
        "domain": "shortvideo",
        "biz_name": "short_video"
    }
    
}

Return value example:

{
    "parts": {
        "1": true,
        "2": true,
        "3": true,
        "4": true,
        "5": true,
        "6": true,
        "7": true,
        "8": true,
        "9": true,
        "10": true,
        "11": true,
        "12": true,
        "13": true,
        "14": true
    },
    "meta": {
        "reason": [],
        "biz_code": 0,
        "message": "success",
        "domain": ""
    },
    "progress_rate": 100
}
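Given a response shaped like the one above, the caller can work out which slices still need to be uploaded. This is a minimal sketch; missing_parts and urls_to_retry are hypothetical helpers, and it assumes the urls array from PreSignSlicingPut is ordered so that urls[0] belongs to slice 1.

```python
def missing_parts(parts):
    """Return the slice numbers (as ints, ascending) that are not uploaded yet."""
    return sorted(int(number) for number, uploaded in parts.items() if not uploaded)


def urls_to_retry(parts, urls):
    """Map each missing slice number to its upload link (assumes urls[0] is slice 1)."""
    return {number: urls[number - 1] for number in missing_parts(parts)}


# Example: slices 2 and 4 were interrupted
progress = {"1": True, "2": False, "3": True, "4": False}
print(missing_parts(progress))
```

Converting the keys to integers before sorting matters because the map keys are strings, and "10" would otherwise sort before "2".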

Upload Confirmation

When an upload completes, you need to call ReportUploaded to report it (except for slice uploads, where confirmation is performed automatically once the slices are merged). The mandatory parameter for this interface is:

  • file_id

This interface does one thing: it checks the hash of the file and, if the check passes, marks the file as "uploaded successfully". Otherwise the file cannot be queried.

Parameter Example:

{
    "file_context": {
        "domain": "shortvideo",
        "biz_name": "short_video",
        "file_id": "1824118603822141440"
    }
}

Download file

Call the PreSignGet interface to get a link for downloading the file. The main parameters passed to the interface include:

  • file_id
  • expire_seconds

Parameter example:

{
    "file_context": {
        "domain": "shortvideo",
        "biz_name": "short_video",
        "file_id": "1824123073628999680",
        "expire_seconds": 86400,
        "filename": "data.mp4"
    }
}

Instant Upload

In the PreSignPut and PreSignSlicingPut interfaces, if the passed-in hash already exists, a file_id is returned. This file_id can be used to get a download link, so the file does not need to be uploaded again.
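The idea can be sketched with an in-memory index standing in for the file table. Everything here is hypothetical and for illustration only: the FileIndex class, the choice to deduplicate per (domain, biz_name, hash), and the rule that only confirmed uploads count as existing are assumptions, not details taken from FileService.

```python
import itertools


class FileIndex:
    """In-memory stand-in for the file table (hypothetical, for illustration only)."""

    def __init__(self):
        self._uploaded = {}  # (domain, biz_name, hash) -> file_id
        self._ids = itertools.count(1)

    def mark_uploaded(self, domain, biz_name, file_hash, file_id):
        """Record a confirmed upload so later requests with the same hash can reuse it."""
        self._uploaded[(domain, biz_name, file_hash)] = file_id

    def pre_sign_put(self, domain, biz_name, file_hash):
        """Return (file_id, needs_upload)."""
        existing = self._uploaded.get((domain, biz_name, file_hash))
        if existing is not None:
            return existing, False  # instant upload: reuse the stored file
        return next(self._ids), True  # new file: the caller must upload it


index = FileIndex()
file_id, needs_upload = index.pre_sign_put("shortvideo", "short_video", "abc123")
index.mark_uploaded("shortvideo", "short_video", "abc123", file_id)
print(index.pre_sign_put("shortvideo", "short_video", "abc123"))
```

The second call returns the same file_id with needs_upload set to False, which is the signal for the caller to skip the upload.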