Preface
The short-video apps you use every day, and image-and-text communities such as WeChat and Weibo, all depend on the ability to upload posts with images and videos. This capability is core to their business.
Essentially, it all comes down to file uploading. This article walks you through writing a file upload service and explores its core principles. I hope you find it helpful.
Thanks to my friend Trembling for supporting this article.
The service mainly provides the following capabilities:
- File Upload
- File Download
- Split Upload
- Resumable Upload (continue after an interruption)
- Instant Upload (skip the transfer when the file already exists)
Earlier video walkthroughs 📺: Bilibili: ShirasawaTalk, WeChat official account: Shirazetalk
FileService Main Capabilities
Protobuf (pb) definition:
service FileService {
  // pre sign a file url for user get it
  rpc PreSignGet(PreSignGetRequest) returns (PreSignGetResponse);
  // pre sign a file url for user put it
  rpc PreSignPut(PreSignPutRequest) returns (PreSignPutResponse);
  // report a file has been uploaded
  rpc ReportUploaded(ReportUploadedRequest) returns (ReportUploadedResponse);
  // pre sign a file url for user put it with slicing
  rpc PreSignSlicingPut(PreSignSlicingPutRequest) returns (PreSignSlicingPutResponse);
  // get upload progress rate for slicing put
  rpc GetProgressRate4SlicingPut(GetProgressRate4SlicingPutRequest) returns (GetProgressRate4SlicingPutResponse);
  // merge a slicing uploading file
  rpc MergeFileParts(MergeFilePartsRequest) returns (MergeFilePartsResponse);
  // remove a file
  rpc RemoveFile(RemoveFileRequest) returns (RemoveFileResponse);
}
FileService provides file upload and download capabilities to the various business domains. Every FileService interface takes a structure named file_context, which carries the file-related information. Its structure is as follows:
message FileContext {
  // The domain the file belongs to (used for the bucket)
  string domain = 1;
  // The name of the business the file belongs to
  string biz_name = 2;
  // File id
  int64 file_id = 3;
  // File sha256 hash
  string hash = 4;
  // File type
  string file_type = 5;
  // File size in bytes
  int64 size = 6;
  // Expiration time of the file access link, in seconds
  int64 expire_seconds = 7;
  // File name
  string filename = 8;
}
In every request, the domain and biz_name parameters are mandatory. Together they drive the first level of sharding: the tables FileService relies on are split by business domain and business name. On top of that, each domain/business combination can be given a second level of sharding, with a configurable number of sub-tables.
Quick Start
The number of files grows at different rates across business domains and businesses, and the resulting data volumes can differ widely. To use FileService, you therefore need to configure the number of sub-tables for each of them.
In the service configuration, you can set the following:
data:
  db_sharding_config:
    file_shortvideo_short_video:
      sharding: file_shortvideo_short_video
      sharding_number: 5
The db_sharding_config item configures the number of sub-tables. file_shortvideo_short_video is the sub-table key, where file is a fixed prefix, shortvideo is the business domain, and short_video is the business name. sharding_number is the number of sub-tables. The business domain and business name must be passed in when calling FileService. If a key is not configured, the number of sub-tables defaults to 1.
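To make the key format concrete, here is a minimal Python sketch of how a sub-table name could be derived. The functions sharding_key and sub_table_name are hypothetical, and routing by file_id modulo sharding_number is an assumption for illustration; the article does not say how FileService picks a sub-table.

```python
def sharding_key(domain: str, biz_name: str) -> str:
    # "file" is the fixed prefix, followed by business domain and business name
    return f"file_{domain}_{biz_name}"


def sub_table_name(domain: str, biz_name: str, file_id: int, sharding_config: dict) -> str:
    key = sharding_key(domain, biz_name)
    # Fall back to a single sub-table when the key is not configured
    number = sharding_config.get(key, {}).get("sharding_number", 1)
    # Assumption: the sub-table index is file_id modulo the number of sub-tables
    return f"{key}_{file_id % number}"


config = {
    "file_shortvideo_short_video": {
        "sharding": "file_shortvideo_short_video",
        "sharding_number": 5,
    }
}
print(sub_table_name("shortvideo", "short_video", 1824123073628999680, config))
```

An unconfigured domain/business pair always lands in index 0, which mirrors the default of one sub-table.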
Main Flows
Normal upload / download
Split upload / resumable upload
Key Capabilities
Normal Upload
This is the most basic upload capability, exposed through the PreSignPut interface. In addition to domain and biz_name, the parameters to pass include:
- hash: sha256 value of the file
- file_type: file type
- size: file size (in bytes)
- expire_seconds: expiration time for file upload links
For example, you can pass the following parameters to this interface:
{
"file_context": {
"domain": "shortvideo",
"biz_name": "short_video",
"hash": "8D6BB0819A2C1E66F846031DC54AAF47",
"file_type": "pdf",
"size": 1181178,
"expire_seconds": 86400
}
}
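The hash and size fields have to be computed by the caller before the request is sent. The sketch below shows one way to do that in Python with the standard library. The helper names sha256_of_file and build_file_context are hypothetical, and hexdigest() returns lowercase hex; whether the service expects a particular case is not stated in this article.

```python
import hashlib
import os


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    # Hash the file in chunks so large files are never fully loaded into memory
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_file_context(path: str, domain: str, biz_name: str,
                       file_type: str, expire_seconds: int = 86400) -> dict:
    # Assemble the request body in the same shape as the example above
    return {
        "file_context": {
            "domain": domain,
            "biz_name": biz_name,
            "hash": sha256_of_file(path),
            "file_type": file_type,
            "size": os.path.getsize(path),
            "expire_seconds": expire_seconds,
        }
    }
```

Reading in fixed-size chunks keeps memory use flat regardless of file size, which matters for the video files this service targets.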
On success, the interface returns an upload link (HTTP), to which the upstream service or front end uploads the file directly. Sample upload code (Python):
import requests

def upload_file(file_path, minio_url):
    # Send the file body to the pre-signed link with an HTTP PUT
    with open(file_path, 'rb') as file_data:
        response = requests.put(
            minio_url,  # The upload link returned by the interface
            data=file_data,
            headers={"Content-Type": "application/octet-stream"}
        )
    print(response)
    return response.status_code
After uploading the file with the Python code above, you cannot access it right away. You first need to confirm the upload through the ReportUploaded interface. Only a confirmed file is marked as uploaded in the database and becomes accessible. For details on using ReportUploaded, see the Upload Confirmation section.
Split Upload
In some cases, the files to be uploaded are large and the following problems may occur if they are uploaded directly:
- Slow uploading
- If something goes wrong during the upload process, the entire file needs to be re-uploaded
For this reason, FileService provides the ability to split an upload into slices.
First, register a slice upload task in advance through the PreSignSlicingPut interface. It takes the same parameters as PreSignPut. The main fields it returns include:
- urls: array of upload links for each slice, already sorted by slice number
- upload_id: id of the upload task
- parts: total number of slices
- file_id: file id
Parameter example:
{
"file_context": {
"domain": "shortvideo",
"biz_name": "short_video",
"hash": "8D6BB0819A2C1E66F846031DC54AAF47",
"file_type": "pdf",
"size": 72246060,
"expire_seconds": 86400
}
}
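As a rough check on what to expect back, the number of slices can be estimated from the file size. The sketch below assumes the 5 MB slice size this article uses; expected_parts and slice_ranges are hypothetical helper names, and the service's own part count is authoritative.

```python
SLICE_SIZE = 5 * 1024 * 1024  # 5 MB per slice, as used in this article


def expected_parts(size: int, slice_size: int = SLICE_SIZE) -> int:
    # Ceiling division: a final partial slice still counts as one part
    return (size + slice_size - 1) // slice_size


def slice_ranges(size: int, slice_size: int = SLICE_SIZE) -> list:
    # (offset, length) of every slice, in upload order
    return [
        (offset, min(slice_size, size - offset))
        for offset in range(0, size, slice_size)
    ]


print(expected_parts(72246060))      # 14
print(slice_ranges(72246060)[-1])    # (68157440, 4088620)
```

For the 72246060-byte file in the example above, this gives 13 full slices and one final slice of 4088620 bytes.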
At this point, the upstream service or front end splits the file into slices (5 MB each) and uploads them one by one. An example of slicing a file (Python):
def slicing(filename):
    slice_size = 5 * 1024 * 1024  # 5 MB per slice
    files = list()
    # Open the file
    with open(filename, 'rb') as f:
        index = 0
        while True:
            # Locate where to read
            f.seek(index * slice_size)
            # Read the data
            data = f.read(slice_size)
            # If the end of the file has been reached, exit the loop
            if not data:
                break
            # Write the split file
            part_name = f'{filename}_{index}'
            with open(part_name, 'wb') as f1:
                f1.write(data)
            # Record the path of the slice that was written
            files.append(part_name)
            # Update the position
            index += 1
    return files
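Once the slices exist, each one goes to the link at the same position in urls. The following is a sketch under assumptions: upload_slices is a hypothetical helper, the HTTP call is passed in as a put callable so the pairing logic stays independent of any HTTP library, and numbering slices from 1 is inferred from the progress response shown later in this article.

```python
from typing import Callable, Dict, List


def upload_slices(slice_paths: List[str], urls: List[str],
                  put: Callable[[str, bytes], int]) -> Dict[int, int]:
    # urls is already sorted by slice number, so pair the lists by position
    if len(slice_paths) != len(urls):
        raise ValueError("number of slices does not match number of upload links")
    statuses = {}
    for number, (path, url) in enumerate(zip(slice_paths, urls), start=1):
        with open(path, "rb") as f:
            # put(url, body) performs the upload and returns the HTTP status code
            statuses[number] = put(url, f.read())
    return statuses
```

With the requests library, put could be lambda url, body: requests.put(url, data=body).status_code. The returned map of slice number to status code makes it easy to see which slices need another attempt.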
After all the slices have been uploaded, call the MergeFileParts interface to merge them. Key parameters include:
- file_id
- upload_id
Parameter example:
{
"upload_id": "ZDNlOWI2MjktMjAzOC00NzJkLWE0ODYtOGMzZTBlZmJlODUwLmRmN2M5ZWQyLTYxMzMtNDM4NS1hNTljLWEwMzRlNTI5NWNkNHgxNzIzNzM5ODA2MTM2NzU3MzE5",
"file_context": {
"file_id": 1824123073628999680,
"domain": "shortvideo",
"biz_name": "short_video"
}
}
Unlike a normal upload, a split upload is confirmed automatically once all slices are uploaded and MergeFileParts has been called, so there is no need to call ReportUploaded.
Resumable Upload
During the split upload described above, you can call FileService.GetProgressRate4SlicingPut to get the status of each slice. The main input parameters include:
- file_id
- upload_id
The return value of this interface contains a map named parts. Each key is a slice number and each value indicates whether that slice has been uploaded. Based on this information, the upstream service or front end can decide which slices need to be uploaded again.
Parameter example:
{
"upload_id": "ZDNlOWI2MjktMjAzOC00NzJkLWE0ODYtOGMzZTBlZmJlODUwLjA3ZTgyNmY0LWE4YjQtNDQxMC04M2QzLWY4ODQ4MTRiZGM4Y3gxNzIzNzM4ODUzMTMzMjY4ODQ3",
"file_context": {
"file_id": 1824119076553756672,
"domain": "shortvideo",
"biz_name": "short_video"
}
}
Example of return value:
{
"parts": {
"1": true,
"2": true,
"3": true,
"4": true,
"5": true,
"6": true,
"7": true,
"8": true,
"9": true,
"10": true,
"11": true,
"12": true,
"13": true,
"14": true
},
"meta": {
"reason": [],
"biz_code": 0,
"message": "success",
"domain": ""
},
"progress_rate": 100
}
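Resuming then comes down to reading the parts map and re-sending only what is missing. A minimal sketch, assuming the response has already been parsed from JSON into a Python dict (so the keys are strings, as in the example above); slices_to_reupload is a hypothetical helper name.

```python
from typing import Dict, List


def slices_to_reupload(parts: Dict[str, bool]) -> List[int]:
    # Keys are slice numbers encoded as strings; convert so they sort numerically
    return sorted(int(number) for number, uploaded in parts.items() if not uploaded)


response = {"parts": {"1": True, "2": False, "3": True, "10": False}}
print(slices_to_reupload(response["parts"]))  # [2, 10]
```

Converting the keys to integers before sorting avoids the string ordering in which "10" would come before "2".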
Upload Confirmation
Once an upload has finished, you need to call the ReportUploaded interface to report it. The exception is a split upload, where confirmation is performed automatically after the slices are merged. The mandatory parameters for this interface are:
- file_id
This interface does one thing: it checks the hash of the file. If the check passes, the file is marked as "uploaded successfully". Otherwise, the file cannot be queried.
Parameter example:
{
"file_context": {
"domain": "shortvideo",
"biz_name": "short_video",
"file_id": "1824118603822141440"
}
}
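The check described above can be pictured as follows. This is only an illustrative sketch, not the service's actual code: confirm_upload, the record dict, and its status values are hypothetical, and the stored object is passed in as bytes for simplicity.

```python
import hashlib


def confirm_upload(record: dict, stored_bytes: bytes) -> bool:
    # Recompute the sha256 of what was actually stored
    actual = hashlib.sha256(stored_bytes).hexdigest()
    # Compare case-insensitively against the hash declared at pre-sign time
    if actual == record["hash"].lower():
        record["status"] = "uploaded"
        return True
    # On mismatch the record is left unconfirmed, so the file stays unavailable
    return False
```

Verifying against the hash declared up front is what lets the service reject a truncated or corrupted upload before anyone can fetch it.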
Download File
Use the PreSignGet interface to get a link for downloading the file. The main parameters passed to the interface include:
- file_id
- expire_seconds
Parameter example:
{
"file_context": {
"domain": "shortvideo",
"biz_name": "short_video",
"file_id": "1824123073628999680",
"expire_seconds": 86400,
"filename": "data.mp4"
}
}
Instant Upload
In the PreSignPut and PreSignSlicingPut interfaces, if the hash that is passed in already exists, the interface returns a file_id. This file_id can be used to get a download link, so the file does not need to be uploaded again.
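The idea can be sketched as a lookup from hash to file_id that is consulted before any upload link is issued. Everything here is hypothetical and kept in memory for illustration: HashIndex, pre_sign_put, the make_url callable, and the shape of the returned dict are assumptions, not the real FileService response.

```python
from typing import Callable, Dict, Optional


class HashIndex:
    """In-memory stand-in for the table that maps a file hash to its file_id."""

    def __init__(self) -> None:
        self._uploaded: Dict[str, int] = {}

    def mark_uploaded(self, file_hash: str, file_id: int) -> None:
        self._uploaded[file_hash] = file_id

    def lookup(self, file_hash: str) -> Optional[int]:
        return self._uploaded.get(file_hash)


def pre_sign_put(index: HashIndex, file_hash: str,
                 make_url: Callable[[str], str]) -> dict:
    existing = index.lookup(file_hash)
    if existing is not None:
        # Hash already known: hand back the existing file_id, no upload needed
        return {"file_id": existing, "url": None}
    # Unknown hash: issue an upload link as usual
    return {"file_id": None, "url": make_url(file_hash)}
```

A caller that receives a file_id without a link can skip the transfer entirely and go straight to requesting a download link.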