How can I tell if a database and object store are being used by JuiceFS?

As users spend more time with JuiceFS, many end up with a number of filesystems created on a variety of databases and object stores. Some are purely cloud-based, some are purely local, and some combine the two. Some hold real files, while others were created ad hoc for testing. Juggling multiple filesystems can get confusing, especially when several of them live in the same database instance.

For example, suppose you create filesystems in both database 0 and database 1 of a Redis instance, one for normal use and the other for testing. When the time comes to delete the test filesystem, you may delete the one in normal use by mistake.

In this post, we share some day-to-day management tips for avoiding problems like this, in the hope that they help more JuiceFS users.

Prior knowledge

Before we begin, let's go over some basic JuiceFS concepts so that the techniques are easier to follow.

As shown in the figure below, JuiceFS uses a technical architecture that stores data and metadata separately, with data stored in the object store and metadata stored in the database.

From a management perspective, this involves the management of both the database and the object store.

Scenario 1: Is the database being used by JuiceFS?

If you're like me and like to experiment with different databases as a metadata engine for JuiceFS, then you may run into a situation where you have a lot of databases, but aren't sure which one is being used by JuiceFS.

There are two ways to check:

Perform the check with the JuiceFS client;
Perform the check with a database client.

Either method works. Some databases are simply better suited to the first, and others to the second.

Databases suitable for checking with the JuiceFS client

For standalone databases like SQLite3 and Badger, it is more appropriate to check them directly with the JuiceFS client. Since a database corresponds to only one JuiceFS filesystem, as long as the JuiceFS client can connect to the database, it will be able to check whether this database is used by JuiceFS.

For example, I found a database file in a directory on my local computer, and I'm not sure whether it holds metadata for a JuiceFS file system or belongs to another application. In this case, I can use the status subcommand of the JuiceFS client to check:

juicefs status sqlite3://

From the JSON output of the command, which contains the name of the file system, UUID, storage type, object storage, and other information, it can be determined that this database is a metadata engine for the JuiceFS file system.

If no JSON is printed and the command instead reports that the database is not formatted, then this is not a metadata engine used by JuiceFS.

Similar to SQLite3, Badger is a standalone database, but instead of a single file, its database is a directory.

For example, I found a local directory called myjfs, and I can't remember whether it's a regular directory or a Badger database directory. I can use the status subcommand of the JuiceFS client to check:

juicefs status badger://myjfs
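The check described in this section can be wrapped in a small helper. The following is only a sketch: the function name is_juicefs_meta and the JFS_BIN override variable are made up for illustration, and it assumes that the JSON printed for a formatted database includes a "UUID" field, as the status output described above does.

```shell
#!/bin/sh
# JFS_BIN is a hypothetical hook that lets the client binary be swapped out;
# by default the real juicefs client is used.
JFS_BIN="${JFS_BIN:-juicefs}"

# is_juicefs_meta URL
# Succeeds only if `juicefs status URL` exits cleanly and its output
# contains a "UUID" field (assumed marker of a formatted file system).
is_juicefs_meta() {
    out=$("$JFS_BIN" status "$1" 2>&1) || return 1
    printf '%s\n' "$out" | grep -q '"UUID"'
}

# Usage:
#   if is_juicefs_meta "badger://myjfs"; then
#       echo "myjfs is a JuiceFS metadata engine"
#   else
#       echo "myjfs is not used by JuiceFS"
#   fi
```

The same helper works for any metadata URL the client can reach, so it can be looped over a list of candidate files or directories.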

Databases suitable for checking with a database client

Network databases can also be checked with the JuiceFS client, but when there are many of them and you can't remember which ones exist, it is more intuitive and convenient to inspect them with a database client.

Redis

By default, a Redis instance has 16 databases, numbered 0 to 15. Unless you keep a note of what each database is for, it becomes hard to tell later which ones are used by JuiceFS.

For example, I have the following Redis instance:

Item	Value
Address	192.168.1.88
Port	6379
Password	password

I'm not sure which databases are used for JuiceFS, or even whether any of them are. At this point, the easiest approach is to connect to the Redis instance with the redis-cli client and check the databases one by one:

# Connect to the Redis instance
redis-cli -h 192.168.1.88 -p 6379 -a password

# Check which databases are storing data
192.168.1.88:6379> info keyspace

The info keyspace command shows that db0, db1, and db3 of the current instance have data stored in them. You can then switch to each of these databases in turn with select followed by the database number (for example, select 1) and run get setting to check whether it is used by JuiceFS.

As shown in the figure, db0 has no information about JuiceFS, which means it is not a database used by JuiceFS. db1 and db3 both contain information about JuiceFS, which means they are databases used by JuiceFS.
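The manual walk through the databases can also be scripted. This is a rough sketch under a few assumptions: the function names keyspace_dbs and scan_redis_for_jfs are invented here, the password is supplied through the REDISCLI_AUTH environment variable rather than on the command line, and a value of the setting key that contains a "UUID" field is taken to mean the database belongs to JuiceFS.

```shell
#!/bin/sh
# keyspace_dbs: read `info keyspace` output on stdin and print the
# number of every database that holds keys (lines like "db1:keys=...").
keyspace_dbs() {
    tr -d '\r' | sed -n 's/^db\([0-9][0-9]*\):.*/\1/p'
}

# scan_redis_for_jfs HOST PORT
# For each non-empty database, fetch the `setting` key and report
# whether it looks like JuiceFS metadata (assumption: it mentions "UUID").
scan_redis_for_jfs() {
    host=$1
    port=$2
    for n in $(redis-cli -h "$host" -p "$port" info keyspace | keyspace_dbs); do
        if redis-cli -h "$host" -p "$port" -n "$n" get setting | grep -q '"UUID"'; then
            echo "db$n: used by JuiceFS"
        else
            echo "db$n: no JuiceFS setting found"
        fi
    done
}

# Usage:
#   export REDISCLI_AUTH=password
#   scan_redis_for_jfs 192.168.1.88 6379
```

The -n option selects the database for a single command, which replaces the interactive select step.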

Postgres, MySQL, MariaDB

For these three databases, there are many graphical client tools that can be used directly, such as pgAdmin, Adminer, Navicat, etc., to visualize the tables, data, and other information in the database.

I prefer Adminer, a very lightweight database management tool that can be deployed directly with Docker, is accessed through a browser, and supports all three of the databases mentioned above.

Assuming that Docker is already installed on your local computer, you can deploy Adminer with the following command:

docker run -d -p 8080:8080 --name adminer adminer

Once deployed, Adminer can be opened in a browser at http://localhost:8080.

Taking the Postgres database as an example, after connecting to the Postgres database, you can see the tables, data, and other information in the database.

In the list of databases, if you are not sure which database is used by JuiceFS, you can click into the databases one by one to view the tables in them.

As shown below, JuiceFS table names usually start with jfs_.
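If you would rather stay on the command line, the same prefix check can be sketched as a filter over a list of table names. The function name list_jfs_tables is made up for this example, and the psql invocation in the usage comment assumes the tables live in the public schema of a database named mydb, which may not match your setup.

```shell
#!/bin/sh
# list_jfs_tables: read table names (one per line) on stdin, print the
# ones that start with jfs_, and fail if there are none.
list_jfs_tables() {
    grep '^jfs_'
}

# Usage with PostgreSQL (HOST, USER and mydb are placeholders):
#   psql -h HOST -U USER -d mydb -At \
#        -c "SELECT tablename FROM pg_tables WHERE schemaname = 'public'" \
#     | list_jfs_tables && echo "mydb contains JuiceFS tables"
```

Here -A and -t make psql print bare values, one table name per line, so the output can be piped straight into the filter.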

MySQL and MariaDB work the same way: simply select the database system on the Adminer login screen.

In addition, JuiceFS also supports other databases, such as TiKV, etcd, FoundationDB, etc. The method of checking whether they are used for JuiceFS is similar, so I won't repeat it here.

Scenario 2: Is the object store being used by JuiceFS?

For the JuiceFS file system, the metadata engine records information about all the files, and the object store holds all the actual files. The two complement each other and you can't have one without the other.

From an administrative point of view, as soon as the metadata engine of the JuiceFS file system can be determined, the corresponding object store can be determined.

That is, you can find the object store associated with a file system simply by running juicefs status against the corresponding database.

In the output, the Bucket field shows the object store the file system uses. Of course, if you have multiple accounts on the same cloud platform, it's best to also inspect the buckets one by one in the cloud platform's file manager.
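Pulling the object store details out of the status output can be sketched with sed. The function name jfs_object_store is hypothetical, and the script assumes the JSON is pretty-printed with one "Storage" and one "Bucket" field per line; if the layout differs, a real JSON parser would be the safer choice.

```shell
#!/bin/sh
# jfs_object_store: read `juicefs status` JSON on stdin and print the
# storage type and bucket as Storage=... and Bucket=... lines.
jfs_object_store() {
    sed -n \
        -e 's/^[[:space:]]*"Storage":[[:space:]]*"\([^"]*\)".*/Storage=\1/p' \
        -e 's/^[[:space:]]*"Bucket":[[:space:]]*"\([^"]*\)".*/Bucket=\1/p'
}

# Usage:
#   juicefs status badger://myjfs | jfs_object_store
```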

As shown in the figure, JuiceFS creates a folder with the same name as the file system as the root directory in the Bucket root directory. When you open this directory, you can see the file named juicefs_uuid, which is the key to identifying the JuiceFS file system.

In addition, in the root directory of a file system, there are usually two directories, chunks and meta, which store the data blocks and metadata backups of the files, respectively. With these characteristics, you can determine whether a bucket belongs to the JuiceFS file system.
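These characteristics can be checked against a plain listing of object keys. The sketch below uses an invented function name, jfs_markers, and treats the juicefs_uuid file as the deciding marker, with chunks and meta reported as supporting evidence. The aws command in the usage comment is just one way to produce a key listing, and BUCKET is a placeholder.

```shell
#!/bin/sh
# jfs_markers NAME
# Read object keys (one per line) on stdin, where NAME is the file system
# name used as the top-level folder. Prints which markers were found and
# succeeds only if NAME/juicefs_uuid exists.
jfs_markers() {
    awk -v p="$1/" '
        $0 == (p "juicefs_uuid")      { uuid = 1 }
        index($0, p "chunks/") == 1   { chunks = 1 }
        index($0, p "meta/") == 1     { meta = 1 }
        END {
            printf "juicefs_uuid=%d chunks=%d meta=%d\n", uuid, chunks, meta
            exit !uuid
        }
    '
}

# Usage (keys containing spaces would need more careful parsing):
#   aws s3 ls s3://BUCKET --recursive | awk '{print $4}' | jfs_markers myjfs
```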

Summary

The two scenarios above show how to determine whether an existing database or object store is being used by JuiceFS. With these techniques, I believe readers can manage their JuiceFS file systems better and avoid problems such as accidental deletion and misuse.

To conclude, here are a few more suggestions for creating a JuiceFS file system:

When using a standalone database as the metadata engine, give the database a name that is both meaningful and short. This makes it easy to recognize and easy to use later;
Databases and buckets that are expected to be used for a long time should be kept exclusive to JuiceFS and not shared with other applications, to avoid potential misuse and conflicts;
Give the file system an easily recognizable name to simplify later management, for example by including jfs in the name, as in my-jfs or test-jfs.

If you have any other questions, please feel free to join the JuiceFS WeChat group to ask them, and if you have any other management tips, please feel free to share them with us.