Background
When learning FUSE, I started from the demos in libfuse to understand the framework of communication between userspace and the kernel. Those demos only cover the most basic communication: the userspace file system implemented there is just a trivial read-only file system, and the other operations are stub interfaces.
To go deeper, jumping straight into a highly integrated, highly polished codebase such as CephFS means being overwhelmed by details. It is better to proceed step by step, and at the same time compare implementations horizontally: what problems each one solves, how file indexes are organized in userspace, how caching is implemented, how cache consistency between client and server is handled, how distributed locks are implemented, whether the implementation is elegant, and where it could be improved.
I found a Google Drive file system built on FUSE. Its code structure is relatively simple and suitable for introductory learning, so this article uses it as the example.
Project address:
GitHub - robin-thomas/GDFS: Google Drive File System
Note: this code interacts with Google Drive through its API, so you also need some familiarity with the Google API:
gdrive api:
/drive/api/guides/about-sdk?hl=zh-cn
google drive go sdk quickstart (the Go SDK is used as the example here; Google provides no official C++ SDK, so the author hand-rolls the API calls with libcurl):
/drive/api/quickstart/go?hl=zh-cn
google cloud console:
/apis/credentials?hl=zh-cn&inv=1&invt=AbpcxA&project=wangxz-proj00049
Environment construction
# Pull up the centos8 container:
IMAGE_ID='5d0da3dc9764'
NAME=centos8_demo
docker run --privileged -idt \
--name $NAME \
-v /data:/data \
--net host \
${IMAGE_ID} \
/usr/sbin/init
docker exec -it $NAME /bin/bash
# Configure the package repos:
mkdir /etc//orig && mv /etc//*.repo /etc//orig && curl -o /etc/// /repo/
# Install the compilation tool:
yum install automake gcc-c++ cmake libfuse-devel libfuse3-devel curl libcurl -y -q
# Compile:
# Re-run autoreconf to match the installed automake version, otherwise the build fails with "automake-1.15 not found":
autoreconf -f -i
./configure
make
make install
# Configuration:
vim /opt/gdfs/
# Log location:
tail -f /opt/gdfs/
# Start:
/opt/gdfs/ start
# or
/usr/local/bin/ -m /tmp/robin -f -s --log_path /opt/gdfs --log_level DEBUG -o allow_other -o allow_root -o direct_io
# gdb breakpoints used for tracing:
b mount_fs
b get_node
b gdfs_read
b fuse_ll_process_buf
b LRUCache::get
b File::get
b File::read_file
#--
cat /tmp/robin/_opt_gdfs_/haha
Architectural Analysis
Google auth
This project uses Google OAuth 2.0 for authentication. An example API call:
# google api example:
curl '/drive/v3/about?fields=storageQuota(limit%2CusageInDrive)&key=[YOUR_API_KEY]' \
--header 'Authorization: Bearer [YOUR_ACCESS_TOKEN]' \
--header 'Accept: application/json' \
--compressed
API simulation: /drive/api/reference/rest/v3/about/get?hl=zh-cn&apix_params=%7B%22fields%22%3A%22storageQuota(limit%2CusageInDrive)%22%7D
Google Drive /drive/v3/files API
You need to know how to use the Google API to access files in Google Drive.
● Each file or folder has a file_id as its unique identifier. For example, the modification time of a file can be obtained through the following interface:
/drive/v3/files/<file_id>?fields=modifiedTime
gdfs uses this interface to check whether a file's modification time matches the local record. If they differ, the remote side has been modified: you must fetch the full list of the directory's children (otherwise you cannot tell which children changed), then check each child for modifications; anything modified must be fetched with a GET.
A complete list of the new children is required to accurately identify deleted files (items that exist in the old list but not in the new one).
A complete list is also required to detect and handle file-name conflicts (when multiple clients concurrently create files with the same name).
Conversely, if the parent directory is unmodified, the files under it are unmodified as well.
For example, a directory containing 5000 files takes 5 requests to list (1000 entries per page). When 10 files are deleted in the cloud, the modification time alone cannot tell you which ones were removed; and when another client adds files, the new items can only be discovered through the full listing.
// Construct the URL to send the request.
url = GDFS_FILE_URL_ + std::string("?pageSize=1000&q='") + parent_file_id;
url += "'+in+parents+and+trashed+%3D+false&orderBy=name&spaces=drive";
url += "&fields=files(id%2CmimeType%2CmodifiedTime%2Cname%2Csize%2CviewedByMeTime)%2CnextPageToken";
Data structure
Public functions
// A global static variable file_id_node maintains the mapping from file_id to GDFSNode pointer:
std::unordered_multimap <std::string, GDFSNode *> file_id_node;
// This design probably has problems when multiple clients mount at the same time. TODO: test.
// One client mount point corresponds to one GDrive instance, which holds the uid/gid, capacity information, the mount directory name rootDir, the Auth instance used to call Google APIs, the LRUCache instance for cache management, the thread pool instance, and a GDFSNode * root pointer. root is initialized in GDrive::get_root(), points to the root directory of the current file system, and is added to the file_id_node map.
class GDrive {
public:
uid_t uid;
gid_t gid;
time_t mounting_time;
uint64_t bytes_used;
uint64_t bytes_free;
uint64_t bytes_total;
std::string rootDir; // Root directory path
std::string change_id; // change_id of the root directory; fetched from the remote via the API to tell whether the root has been modified
Auth auth; // Authentication instance
LRUCache cache; // Cache management instance
Threadpool threadpool; // Thread pool instance
struct GDFSNode * root; // Pointer to the root directory's GDFSNode
};
// Analogous to an inode: the basic data structure describing a file. From it you can reach the parent node (parent) and the child nodes (via the children map).
// (Linux puts file_name into the dentry, but this project puts it directly into the node.)
struct GDFSNode {
char link;
std::string file_name;
std::string sym_link;
GDFSEntry * entry; // Pointer to this file's entry information
GDFSNode * parent; // Parent node pointer
std::unordered_map <std::string, struct GDFSNode *> children; // Map of child nodes in this directory
};
// Describes the in-memory form of a file; analogous to a dentry
struct GDFSEntry {
std::string file_id; // The file identifier in Google Drive. Every file has a unique file_id; all API interactions use it to identify the file.
// The generation rules for file_id are described below.
uint64_t file_size;
time_t ctime;
time_t mtime;
time_t atime;
time_t cached_time;
uid_t uid;
gid_t gid;
mode_t file_mode;
dev_t dev;
bool is_dir; // Whether this is a dir; used to distinguish folder handling from file handling
int ref_count;
std::string mime_type;
bool g_doc;
bool dirty;
bool pending_create; // Set while a create request has been issued but not yet completed remotely
bool file_open;
bool write;
bool pending_get;
};
sem_t req_item_sem;
pthread_mutex_t worker_lock;
std::list <struct req_item> req_queue;
std::queue <std::string> file_id_q; // Global file_id queue; IDs are generated in batches
Generation of file_id
In GDrive::make_file(), if file_name starts with '.', it is a hidden file and its file_id is defined as file_id = gdfs_name_prefix + rand_str(); otherwise it is a normal file or folder, and a file_id is taken from file_id_q.
The queue file_id_q is refilled by calling generate_file_id() on every GDrive::make_file() or GDrive::make_dir(); the queue is expected to hold more than 100 file_ids.
When a file or folder is created, generate_file_id() checks whether the queue is empty or holds fewer than 100 entries; if so, it calls the 'generateIds' API and pushes the returned file_ids into the file_id_q queue.
void
GDrive::make_file (const std::string & file_name,
mode_t file_mode,
struct GDFSNode * parent_node,
uid_t uid_,
gid_t gid_)
void
GDrive::make_dir (const std::string & file_name,
mode_t file_mode,
struct GDFSNode * parent_node,
uid_t uid_,
gid_t gid_)
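The batching idea behind file_id_q can be sketched as follows. This is a minimal sketch: FileIdPool, fake_generate_ids, and the refill wiring are illustrative names, not the project's exact code; only the 100-entry low watermark comes from the text above.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Sketch of batched ID generation: every create first tops the queue
// up to 100 IDs, so most creates take an ID locally instead of waiting
// for a 'generateIds' API round trip.
class FileIdPool {
public:
    // fetch stands in for one 'generateIds' API call returning n IDs.
    explicit FileIdPool(std::vector<std::string> (*fetch)(size_t))
        : fetch_(fetch) {}

    void refill(size_t low_watermark = 100) {
        if (q_.size() >= low_watermark) return;
        for (const auto& id : fetch_(low_watermark - q_.size()))
            q_.push(id);
    }

    std::string take() {
        refill();                    // called on every make_file/make_dir
        std::string id = q_.front();
        q_.pop();
        return id;
    }

    size_t size() const { return q_.size(); }

private:
    std::queue<std::string> q_;
    std::vector<std::string> (*fetch_)(size_t);
};

// A local stand-in for the remote generateIds call.
std::vector<std::string> fake_generate_ids(size_t n) {
    static int counter = 0;
    std::vector<std::string> out;
    for (size_t i = 0; i < n; ++i)
        out.push_back("id-" + std::to_string(counter++));
    return out;
}
```

The point of the design: the latency of one batched API call is amortized over up to 100 local creates.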
pending_create: delaying the remote create
When the client creates a file, GDrive::make_file() obtains a file_id as described above, news a GDFSEntry, adds it to parent_node, and adds it to file_id_node:
// Add to the directory tree.
entry = new GDFSEntry(file_id, 0, false, mtime, mtime, uid_, gid_, file_mode);
assert(entry != NULL);
node = parent_node->insert(new GDFSNode(file_name, entry, parent_node));
assert(node != NULL);
file_id_node.emplace(file_id, node);
Then a check is made: if this is not a system file (i.e. the file name does not start with gdfs_name_prefix) but a user file, entry->pending_create is set to true, indicating that creation is in progress, and an INSERT request is sent to the Google API:
if (file_id.compare(0, gdfs_name_prefix.size(), gdfs_name_prefix) != 0) {
entry->pending_create = true;//Set pending_create=true to indicate that it is being created.
threadpool.build_request(file_id, INSERT, node, url, query);
}
After the INSERT request to the Google API completes, the flag is set back to false, indicating that the create request for this file has finished:
bool
Threadpool::send_insert_req (std::string & url,
std::string & query,
struct GDFSEntry * entry) {
//...
//Send a request:
resp = this->(url, INSERT, query);
//Update mtime:
mtime = rfc3339_to_sec(val["modifiedTime"].get());
entry->mtime = entry->ctime = mtime;
//Put pending_create back to false:
entry->pending_create = false;
//...
}
This design exists because building the GDFSNode data structure in memory and sending the create request to Google Drive are not atomic.
So when is entry->pending_create actually used?
In get_children(), whose last step checks against Google Drive's existing file list.
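The non-atomic two-step create that pending_create records can be sketched like this. All names here are hypothetical, and the project dispatches the remote call through its Threadpool rather than a raw std::thread; the sketch only shows the flag's lifecycle.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Sketch of the pending_create pattern: the in-memory node is usable
// immediately, while a background request performs the remote create;
// the flag records that the two steps are not atomic.
struct Entry {
    std::atomic<bool> pending_create{false};
};

void remote_insert(Entry* e) {
    // ... here the real code sends the INSERT request to Google Drive ...
    e->pending_create = false;  // remote create finished
}

Entry* make_file_sketch() {
    Entry* e = new Entry();
    e->pending_create = true;        // local structure exists first
    std::thread t(remote_insert, e); // asynchronous remote create
    t.join();                        // joined immediately, only for the sketch
    return e;
}
```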
get_children: Create data structure, update and modify it locally as needed
/*
* Retrieve the children list,
* given a pointer to the node of the directory,
* in the directory tree.
*/
void
GDrive::get_children (struct GDFSNode * parent)
In GDrive::get_children(struct GDFSNode *parent), the children of the passed-in parent are queried. If nodes are missing in memory, they are rebuilt from the listing and metadata fetched from the remote. The function also checks whether the remote differs from the local state; if the remote is newer, the latest data is pulled and the node updated.
It performs the following work:
● Check whether the directory has been modified remotely but not yet synced locally, i.e. whether it is in the pending_get state. If parent->entry->pending_get == true, reset it to false and set dir_modified = true.
○ dir_modified is checked later; if it is never set, the directory has not been modified and no update is needed.
○ pending_get is set inside get_children() when the entry is a dir (is_dir == true) and entry->mtime < mtime (the mtime recorded in the entry is smaller than the mtime returned by the API, i.e. the directory was modified remotely and not yet synced to the client).
○ pending_get is only set on dir entries, never at the file level.
● If the parent passed in is the root directory (parent->file_name == "/"):
○ Call "/drive/v3/changes/startPageToken?fields=startPageToken" to get the change_id and compare it with the change_id recorded in the current GDrive.
○ If they differ, the root directory has been modified: set dir_modified = true and store the new change_id in GDrive->change_id.
● If neither of the above two branches is taken, the current folder is not the root directory but some subdirectory or file:
○ Call "/drive/v3/files/
○ If the mtime obtained is greater than the mtime recorded in the entry (mtime > entry->mtime), or the current parent has no children at all, set dir_modified = true.
■ When parent->is_empty() == true, the dir is also considered possibly modified, because the remote side of this directory may contain new files that have not been synced locally; a forced check is required.
● If dir_modified == false, the parent_node has not been modified: goto out directly. Otherwise the parent_node has been modified and processing continues.
The above handles parent itself. What follows handles parent's subdirectories and files, treating parent as a directory.
● Call the API to fetch the latest list of parent's children into child_items.
The listing is paged at 1000 entries, similar to the paged object listing in RGW, to avoid a huge single-directory file count consuming large amounts of memory or causing the query to time out.
url = GDFS_FILE_URL_ + std::string("?pageSize=1000&q='") + parent_file_id;
url += "'+in+parents+and+trashed+%3D+false&orderBy=name&spaces=drive";
url += "&fields=files(id%2CmimeType%2CmodifiedTime%2Cname%2Csize%2CviewedByMeTime)%2CnextPageToken";
● Traverse child_items:
○ Get each item's file_id and mtime; record atime = mtime.
○ Determine whether the child is a dir (if (mime_type == "application/")); if so, set is_dir = true.
○ If it is not a dir, check whether it is Google Docs; if so, append a .pdf suffix:
if (mime_type == "application/" ||
mime_type == "application/" ||
mime_type == "application/" ||
mime_type == "application/") {
g_doc = true;
file_name += ".pdf";
}
○ If it is neither a dir nor Google Docs but an ordinary file, file_size is set from the returned size field.
○ Look up the file_id in the file_id_node map.
■ If it is present, obtain its entry.
● If the file is dirty data, i.e. entry->dirty == true, skip it.
○ entry->dirty is set to true when deleting a file, in GDrive::delete_file(GDFSNode *node, bool delete_req) (hard-link handling is also involved here):
■ If node->parent exists, disconnect the pointer from the parent node to this node.
■ If the current node is a dir, iterate over all its children and add them to the q_nodes pending queue.
■ Otherwise it is not a directory; if entry->ref_count == 1, delete its cache (file_id).
■ If the deleted file is not a special file (gdfs_name_prefix), the delete request flag (delete_req) is set, and entry->ref_count == 1 (or the entry is a dir with ref_count == 2), mark it dirty: entry->dirty = true, and queue it for processing: threadpool.build_request(file_id, DELETE, node, url);
■ Otherwise there is a hard link, i.e. multiple nodes share the same file_id, and care is needed to delete the correct node:
● Look up the GDFSNodes for this file_id in file_id_node, traverse them, find the one corresponding to the current node, and erase it from file_id_node.
■ Free the node's memory and set it to NULL.
○ ref_count is initialized to 2 for a dir and 1 for a file.
○ ref_count is decremented in the ~GDFSNode() destructor.
○ ref_count is incremented in gdfs_link(path, new_path):
■ Find the new_parent and new_file_name corresponding to new_path.
■ Call state->get_node() with path, uid, gid, assign the result to node, and get entry = node->entry;
■ Validate: if entry->is_dir == true, return EPERM; hard links to directories are not allowed.
■ Obtain the node for new_parent via state->get_node() with uid, gid, and confirm that new_parent exists.
■ Check permissions.
■ Check whether the new path already exists: tmp_node = node->find(new_file_name).
■ Increment the entry's reference count: ++entry->ref_count.
■ For new_file_name, create a new GDFSNode and add it to file_id_node. Note that the file_id here is still the old file_id; this is how hard links are implemented.
● If the file is being written, i.e. entry->write == true, add its file_id to set s1.
○ entry->write is set to true, case 1: gdfs_write(_path, *buf, size, offset, fuse_file_info):
■ Get the node from the path: node = state->get_node(path, uid, gid);
■ Get the entry from the node: entry = node->entry;
■ Check permissions.
■ Set the current time as entry->mtime and update entry->file_size.
■ Cache the current file (see "Write File to Cache: LRUCache::put" for details):
ret = state->(entry->file_id, const_cast<char*>(buf), offset, size, node, false);
entry->write = true;
■ Return size to the write system call.
○ entry->write is set to true, case 2: gdfs_truncate(path, newsize):
■ Get node = state->get_node(path, uid, gid);
■ Get entry = node->entry;
■ Branch on newsize:
● If newsize == 0, clear this file_id's cache:
state->(entry->file_id, NULL, 0, 0, NULL);
● Otherwise a truncate size is given; set the write flag just as a write operation would: entry->write = true. Then compute the start position and size of the truncate, memset a buffer of the given size, and update it into the cache.
○ If newsize > entry->file_size, zero-fill the extension:
state->(entry->file_id, buf, start, size, node, false)
○ Otherwise, if newsize < entry->file_size, shrink the cache: state->(entry->file_id, newsize);
■ Update the file's mtime in the cache: state->cache.set_time(entry->file_id, mtime);
■ Then update entry->mtime and entry->file_size.
● If the file is Google Docs (g_doc == true) and the remote file was modified but not yet synced locally (mtime > entry->mtime), call download_file() to download it.
● If the file is a dir (is_dir == true) and a remote modification has not been synced locally, set entry->pending_get = true.
● Check whether the remote file_name matches the locally recorded file_name:
○ If the names match, update atime and mtime in the entry.
○ Otherwise, handle the name conflict, then update atime and mtime in the entry.
■ If the file_id is not in file_id_node, the GDFSEntry and GDFSNode structures must be rebuilt, inserted into the parent's GDFSNode children, and the file downloaded locally.
This is also what happens when a file is accessed for the first time and the client has no local copy: the function lists the parent folder, finds the file, builds the client-side metadata, and pulls the data locally.
■ Add this file_id to s1.
○ From parent->get_children(), collect all children into s2.
○ Compute the set difference of s1 and s2 into s3.
○ Traverse s3, determine whether each item is a deleted child, and add it to deleted_child.
○ Traverse deleted_child and call child->parent->remove_child(child_node), deleting the node from the GDFSNode tree.
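The s1/s2/s3 step can be sketched with std::set_difference. This is a simplified sketch: find_deleted is a hypothetical helper, and it assumes s2 holds the local children's file_ids while s1 holds the file_ids still present in the fresh remote listing (plus dirty/writing ones that must be kept).

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Deletion detection: anything local (s2) that is no longer in the
// remote listing (s1) must have been deleted in the cloud.
std::vector<std::string> find_deleted(const std::set<std::string>& s1,
                                      const std::set<std::string>& s2) {
    std::vector<std::string> s3;
    std::set_difference(s2.begin(), s2.end(),   // local children
                        s1.begin(), s1.end(),   // ids seen remotely
                        std::back_inserter(s3));
    return s3;                                  // candidates for removal
}
```

This is exactly why the full child listing is required: mtime comparison alone can never produce the "gone remotely" set.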
file_id_node:
file_id_node is a hash table (unordered_multimap<string, GDFSNode*>) that allows duplicate keys, used to store the mapping from Google Drive file IDs to node pointers.
When entries are added:
In make_dir(), make_file(), get_children(), etc., whenever in-memory data structures need to be built.
When it is read:
The only lookup is in get_children(), used to check whether a file_id already exists locally. If present, it is operated on; if absent, new GDFSNode and GDFSEntry objects rebuild the in-memory structures. Other reads are simple lookups during traversal, delete, or rename operations.
When entries are modified:
rename.
When entries are deleted:
delete.
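A minimal sketch of why the multimap matters for hard links: several nodes can share one file_id, so deletion must match the node pointer, not just the key. Node and erase_exact are illustrative stand-ins for GDFSNode and the deletion loop described above.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

struct Node { std::string file_name; };

using FileIdMap = std::unordered_multimap<std::string, Node*>;

// Erase only the entry whose value is the node being removed;
// equal_range yields all nodes sharing the same Drive file_id.
bool erase_exact(FileIdMap& m, const std::string& file_id, Node* victim) {
    auto range = m.equal_range(file_id);
    for (auto it = range.first; it != range.second; ++it) {
        if (it->second == victim) {  // match the pointer, not just the key
            m.erase(it);
            return true;
        }
    }
    return false;
}
```

With a plain unordered_map, creating a hard link would silently overwrite the original node's mapping; the multimap keeps both and lets deletion pick the right one.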
get_node implementation
This function resolves a path and finds the corresponding GDFSNode in the cache. If it is not cached, get_children is called to build it and add it to the cache.
struct GDFSNode *
GDrive::get_node (const std::string & path,
uid_t uid,
gid_t gid,
bool search)
● In the GDrive instance, start from the root node pointer, the starting point of the whole search tree: struct GDFSNode * node = this->root;
● Copy the incoming path into tmp and check whether it ends with '/':
○ If tmp itself is '/', it is the root directory: call get_children(node) to fetch its children and goto out;
○ Otherwise it is not the root directory: strip the trailing '/' with tmp.pop_back();
● Loop over tmp, splitting on '/', parsing each file name on the path into next_dir:
○ In the parent node, look up the next-level component: child = node->find(next_dir), i.e. search node->children by file_name.
■ If it does not exist, call GDrive::get_children(node) to rebuild, then search again: child = node->find(next_dir). If it is still not found, report that the directory does not exist.
■ Otherwise the next level was found: node = child;
■ Check permissions: GDrive->file_access(uid, gid, X_OK, node->entry);
■ If it is not a directory, report an error: node->entry->is_dir == false;
● Handle pending DELETE requests in the queue: if node->entry->dirty == true, the node is waiting in the queue to be deleted, so directly return "does not exist" and goto out;
● If a component is not found in the cache, call get_children(node) to fetch from Google Drive, and new GDFSNode and GDFSEntry objects are added to the cache:
// Check if the path component exists.
child = node->find(next_dir);
if (child == NULL) {
search = ((search && () == true) ? true : false);
try {
this->get_children(node);
} catch (GDFSException & err) {
err_num = errno;
error = ();
goto out;
}
child = node->find(next_dir);
if (child == NULL) {
err_num = ENOENT;
error = "path component " + next_dir + " does not exist";
goto out;
}
}
node = child;
● Finally, some update_node(node) or get_children(node) logic:
if (search &&
node != NULL &&
node->entry->mtime > 0 &&
node->entry->file_id.compare(0, gdfs_name_prefix.size(), gdfs_name_prefix) != 0 &&
(node->entry->is_dir || node->link == 0) &&
node->entry->write == false) {
this->update_node(node);
} else if (node != NULL &&
node->entry->is_dir &&
node->entry->pending_get) {
get_children(node);
}
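The component-by-component walk above can be sketched as follows. This is heavily simplified: no permission checks, no dirty-flag handling, and no get_children() fallback on a miss; Node and resolve are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

struct Node {
    std::string name;
    std::map<std::string, Node*> children;
    Node* find(const std::string& n) {           // lookup in children map
        auto it = children.find(n);
        return it == children.end() ? nullptr : it->second;
    }
};

// Walk the path one '/'-separated component at a time, descending
// through each node's children map; a miss means ENOENT (the real
// get_node would first retry after calling get_children()).
Node* resolve(Node* root, std::string path) {
    if (path.size() > 1 && path.back() == '/') path.pop_back();
    if (path == "/") return root;
    Node* node = root;
    std::istringstream ss(path.substr(1));        // skip the leading '/'
    std::string next_dir;
    while (std::getline(ss, next_dir, '/')) {
        Node* child = node->find(next_dir);
        if (child == nullptr) return nullptr;     // would call get_children() here
        node = child;
    }
    return node;
}
```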
Cache management data structure
The cache is managed by three data structures:
Page: the data storage unit.
File: file-level cache management. One file is managed by one File, which may consist of multiple disjoint (start, stop) pages. This design enables partial loading of large files, reducing memory and other resource usage.
LRUCache: system-level cache management.
// A part of a file
struct Page { // Data storage unit
char* mem; // Memory block pointer
off_t start; // Start offset
off_t stop; // End offset
size_t size; // Data block length
};
struct File { // File-level management: one file, made up of a set of pages
std::set<Page*> pages; // Ordered paging collection
pthread_mutex_t lock; // File-level lock
size_t size; // Total file size
time_t mtime; // Modify time
};
class LRUCache { // Global cache management; each GDrive has one instance
std::list<std::pair<std::string, File*>> cache; // Access sequential link table
std::unordered_map<std::string, decltype(())> map; // File index
};
Cache read logic: LRUCache::get
size_t
LRUCache::get (const std::string & file_id,
char * buffer,
off_t offset,
size_t len,
struct GDFSNode * node)
The basic logic is as follows:
● Look up the incoming file_id in LRUCache->map:
○ If absent, create a new empty File instance, push it to the front of LRUCache->cache, and add it to LRUCache->map, keyed by the given file_id.
○ If present, move the corresponding File object to the front of the cache list. If the to_delete passed in is true, all of the file's pages were downloaded from Google Drive and the whole file may have been modified, so all pages are removed (f->delete_pages()).
● Free len bytes in the cache to guarantee capacity: this->free_cache(len);
● Copy the buffer into a newly allocated new_buf and add it to the cache via File->put(new_buf, start, stop, node->entry).
● Update LRUCache->size: this->size += len;
The implementation of the File->put interface here is interesting: it shows how a File manages its space through multiple pages.
Write Page to File: File::put
struct Page *
File::put (char * buf,
off_t start,
off_t stop,
struct GDFSEntry * entry)
As mentioned above, a File represents one file. For scenarios where large files are partially read, a range-based scheme is used to save space and network bandwidth: each File consists of multiple Pages, stored as
std::set <struct Page *, page_cmp> pages; where page_cmp is defined as a->start < b->start, i.e. a set ordered by ascending start. Keys are unique, and stored elements cannot be modified in place.
Here are the following possibilities:
# There are two pages as follows, [start,stop] are [99,199] and [299,399] respectively. The middle is empty, indicating that the client caches the two ranges of the file.
# There are 7 possibilities at this time:
|99--------199| |299----------399|
1) |20-80|
2) |20----------------150|
3) |20-----------------------230|
4) |100--150|
5) |100------------230|
6) |100----------------------------350|
7) |200---280|
Cases 1), 2), and 3) above are subsumed by cases 5), 6), and 7), so four possibilities remain in the end:
|99--------199| |299--------399|
1) |100--150|
2) |100------------230|
3) |100----------------------------350|
4) |200---280|
Each remaining case:
1) The new content lies entirely within an existing page's range: overwrite that range of the page with the new content.
2) The new content starts inside a page but extends past it into empty space: the part inside the page is modified in place; for the overhang, a new Page is allocated and added to the File.
3) Building on 2), the overhang also covers part of the next page: modify inside the first page, allocate a new Page for the gap and add it to the File, and modify the overlapping part of the next page.
4) The new content falls entirely in the gap between two existing pages: simply allocate a new Page and add it to the File.
5) The File has no pages yet: the simplest case, just add a new Page to the File.
In this way, File->pages manages a cache of parts of a single file, achieving a "fetch only the parts you need" effect and avoiding full copies of the data.
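The cases above can be exercised with a simplified range store. This is a sketch under the assumption of half-open [start, stop) offsets (the diagram above uses inclusive ends); the real File::put also copies page memory, which is omitted here so only the range bookkeeping remains.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

// Non-overlapping ranges keyed by start; inserting a new range merges
// every existing range it overlaps, which collapses the five textual
// cases into one insert-and-merge loop.
class PageRanges {
public:
    void put(long start, long stop) {            // half-open [start, stop)
        auto it = pages_.lower_bound(start);
        if (it != pages_.begin() && std::prev(it)->second >= start)
            --it;                                // previous page overlaps/touches
        while (it != pages_.end() && it->first <= stop) {
            start = std::min(start, it->first);  // absorb the overlapping page
            stop  = std::max(stop,  it->second);
            it = pages_.erase(it);
        }
        pages_[start] = stop;
    }
    size_t count() const { return pages_.size(); }
    bool covers(long off) const {                // is this offset cached?
        auto it = pages_.upper_bound(off);
        return it != pages_.begin() && std::prev(it)->second > off;
    }
private:
    std::map<long, long> pages_;                 // start -> stop
};
```

Writing into a gap adds a page (case 4); writing across pages merges them, exactly the "fetch only the parts you need" behavior described above.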
Write File to Cache: LRUCache::put
This function implements the LRU cache writing mechanism. Mainly including:
● Maintain the order of LRU elimination
● Cache space management
● File data page update
bool
LRUCache::put (const std::string & file_id,
char * buffer,
off_t offset,
size_t len,
struct GDFSNode * node,
bool to_delete)
The specific flow is as follows:
● Look up file_id in cache->map. If it is not found, the file is not in the cache: new a File, associate it with file_id, and add it to the map.
● Otherwise it is in the cache: adjust the cache list with the splice function, moving the File to the front of the list (the LRU effect). Then check to_delete: if true, the cached data is already stale and must be cleared by calling f->delete_pages().
to_delete is set at the call sites according to the scenario:
○ Scenarios with to_delete == true:
■ GDrive::download_file(). This path is only for Google Docs; that format may have special requirements, so it is not discussed further here.
■ gdfs_truncate() with newsize == 0: a full clear is needed.
○ Scenarios with to_delete == false:
■ gdfs_truncate() with newsize != 0: a partial truncate.
■ gdfs_write().
● According to the incoming len, free cache space to avoid running out of memory.
● Call File->put() to add the Page to the cache.
● Update cache->size.
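The list-plus-map bookkeeping, including the splice move-to-front and a free_cache-style eviction, can be sketched as follows. LruSketch is illustrative: the File payload is reduced to a byte count, and eviction simply drops the tail entry instead of freeing pages.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// list keeps recency order (front = most recent); the map stores list
// iterators so a hit can be moved to the front in O(1) via splice(),
// which never invalidates list iterators.
class LruSketch {
public:
    explicit LruSketch(size_t capacity) : cap_(capacity) {}

    void put(const std::string& file_id, size_t len) {
        auto it = map_.find(file_id);
        if (it == map_.end()) {
            cache_.emplace_front(file_id, len);          // new entry
            map_[file_id] = cache_.begin();
        } else {
            cache_.splice(cache_.begin(), cache_, it->second); // move to front
            cache_.begin()->second += len;
        }
        size_ += len;
        while (size_ > cap_ && !cache_.empty()) {        // free_cache()
            size_ -= cache_.back().second;
            map_.erase(cache_.back().first);
            cache_.pop_back();                           // evict LRU tail
        }
    }

    bool cached(const std::string& file_id) const {
        return map_.count(file_id) != 0;
    }
    size_t size() const { return size_; }

private:
    size_t cap_, size_ = 0;
    std::list<std::pair<std::string, size_t>> cache_;
    std::unordered_map<std::string,
        std::list<std::pair<std::string, size_t>>::iterator> map_;
};
```

splice() is the key design choice: moving a node between positions of the same std::list relinks pointers without reallocating, so the iterators stored in the map stay valid.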
// e.g., inspecting file_id_node in gdb:
(gdb) p file_id_node
$49 = std::unordered_multimap with 2 elements = {["1wKdLp3fiUv3fKVj6ibTnFhUv1MSMOwmY"] = 0x13046f0, ["root"] = 0x1221520}
FUSE interface implementation
Once the "Public functions" part is understood, implementing the FUSE interface is straightforward. An example:
# Take echo "123" > robin/_opt_gdfs_/1234 as an example
# Call get_attr first
# gdfs_create
state->make_file() # Creates the file in Google Drive: gets parent_file_id and a file_id for the new file, calls the API to create the file in Google Drive, creates a GDFSEntry and GDFSNode, adds them to the parent's structures, and adds <file_id, node> to the file_id_node map.
# gdfs_open
# gdfs_write
## Add data to cache
ret = state->(entry->file_id, const_cast<char*>(buf), offset, size, node, false);
entry->write = true;
# gdfs_release
## Get node,entry:
node = state->get_node(path, uid, gid);
entry = node->entry;
## Check access:
## Upload to Google Drive:
state->write_file(node); # Calls the UPLOAD API. The author added an optimization to upload in GDFS_UPLOAD_CHUNK_SIZE shards, but failure handling is only a simple retry
entry->write = false;
entry->file_open = false;
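The write-then-flush-on-release flow traced above can be sketched as follows. sketch_write and sketch_release are hypothetical stand-ins for gdfs_write and gdfs_release; the cache is reduced to a std::string and the UPLOAD call to a counter.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// gdfs_write only fills the local cache and sets entry->write; the
// upload to Google Drive happens once, in gdfs_release, on close.
struct EntrySketch {
    std::string data;    // stands in for the page cache
    bool write = false;  // entry->write (dirty flag)
    int uploads = 0;     // how many UPLOAD requests were sent
};

size_t sketch_write(EntrySketch& e, const std::string& buf, size_t off) {
    if (e.data.size() < off + buf.size())
        e.data.resize(off + buf.size());
    e.data.replace(off, buf.size(), buf);  // write into the cache only
    e.write = true;                        // remember there is dirty data
    return buf.size();                     // returned to the write(2) caller
}

void sketch_release(EntrySketch& e) {
    if (e.write) {
        ++e.uploads;                       // would call write_file()/UPLOAD here
        e.write = false;
    }
}
```

This buffering is why `echo "123" > file` triggers exactly one upload, no matter how many write calls preceded the close.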
libfuse framework
#4 0x00007fdabc02b818 in fuse_ll_process_buf () from /lib64/.2
#5 0x00007fdabc028013 in fuse_session_loop () from /lib64/.2
#6 0x00007fdabc0200b6 in fuse_loop () from /lib64/.2
#7 0x00007fdabc030a87 in fuse_main_common () from /lib64/.2
#8 0x0000000000410cf4 in initGDFS (rootDir="/tmp/robin", path="/opt/gdfs/", argc=6, argv=0x7ffcc2bca2e0) at :1612
#9 0x0000000000406df8 in mount_::mount_fs () at :142
#10 0x00000000004064fc in main (argc=15, argv=0x7ffcc2bca578) at :172
API authentication
class Auth {
private:
Request reqObj;
};
class Request {
private:
std::string confFile;
std::string redirectUri;
std::string clientId;
std::string clientSecret;
std::string accessToken;
std::string refreshToken;
time_t expiresIn;
};
enum requestType {
GET,
POST,
DELETE,
UPDATE,
INSERT,
DOWNLOAD,
UPLOAD_SESSION,
UPLOAD,
GENERATE_ID,
};
Take gdfs_read as an example
read:
#0 GDrive::get_node (this=0x9c1ce0, path="/_opt_gdfs_/haha", uid=0, gid=0, search=false) at :126
#0 gdfs_read (path=0xaf08c0 "/_opt_gdfs_/haha", buf=0x7fdabc84f010 "", size=131072, offset=0, fi=0x7ffcc2bc9e80) at :1310
#1 0x00007fdabc0224e8 in fuse_fs_read_buf () from /lib64/.2
#2 0x00007fdabc0226b6 in fuse_lib_read () from /lib64/.2
#3 0x00007fdabc02b0df in do_read () from /lib64/.2
#4 0x00007fdabc02b818 in fuse_ll_process_buf () from /lib64/.2
#5 0x00007fdabc028013 in fuse_session_loop () from /lib64/.2
#6 0x00007fdabc0200b6 in fuse_loop () from /lib64/.2
#7 0x00007fdabc030a87 in fuse_main_common () from /lib64/.2
#8 0x0000000000410cf4 in initGDFS (rootDir="/tmp/robin", path="/opt/gdfs/", argc=6, argv=0x7ffcc2bca2e0) at :1612
#9 0x0000000000406df8 in mount_::mount_fs () at :142
#10 0x00000000004064fc in main (argc=15, argv=0x7ffcc2bca578) at :172
Known problems
● Multi-client concurrent modification and deletion are not handled; that is, no distributed locks are implemented.
● There is no strong consistency between local clients and Google Drive; at best it is eventual consistency.
Appendix
The original out-of-band (OOB) auth flow was fully blocked in 2023:
February 28, 2022 - new OAuth clients are prohibited from using the OOB flow
September 5, 2022 - a user-facing warning message may be displayed for non-compliant OAuth requests
October 3, 2022 - the OOB flow is deprecated for OAuth clients created before February 28, 2022
January 31, 2023 - all existing clients are blocked (including exempted clients)
/identity/protocols/oauth2/resources/oob-migration?hl=zh-cn