I have been putting this together for a while. Today I would like to share the MongoDB interview questions organized by V Brother; save them and they are sure to come in handy.
1. What is a NoSQL database, and what are the differences between NoSQL and RDBMS? Why use (or not use) a NoSQL database, and what are its advantages?
NoSQL ("Not Only SQL") databases are database management systems that are different from traditional relational databases (RDBMS).NoSQL was originally designed to handle structured, semi-structured, and unstructured data on a large scale, providing a more flexible way of storing data. It does not follow the "table-row-column" structure of relational databases, and commonly used data models are key-value, column family, document and graph types.
Differences between NoSQL and RDBMS
- Data structure:
  - RDBMS: uses a table structure; data is organized into rows and columns, and tables can be related to each other by foreign keys.
  - NoSQL: provides a variety of data models such as document, key-value, column-family, and graph. Data can be semi-structured or unstructured, allowing greater flexibility.
- Data consistency:
  - RDBMS: follows the ACID properties (atomicity, consistency, isolation, durability) to ensure strong consistency.
  - NoSQL: leans toward availability and partition tolerance in the CAP theorem; some systems tolerate weak consistency in exchange for better performance.
- Scalability:
  - RDBMS: most support vertical scaling (by increasing hardware capacity).
  - NoSQL: usually supports horizontal scaling (by adding more commodity servers) and is better suited to large-scale data.
- Query language:
  - RDBMS: uses the standard SQL query language.
  - NoSQL: usually does not use SQL; queries vary by product, for example MongoDB's query syntax or Cassandra's CQL.
Reasons to use NoSQL
- Suitable for massive data storage: NoSQL can efficiently handle large amounts of data with fast reads and writes, making it suitable for big data scenarios such as social media and the Internet of Things.
- Supports horizontal scaling: NoSQL databases can be scaled by adding servers, which is cheaper and better suited to distributed architectures.
- Flexible data modeling: NoSQL databases support a variety of data models such as document, key-value, column-family, and graph storage. The data structure is highly flexible, suiting applications that require rapid iteration.
- High performance and scalability: for low-latency, high-concurrency requirements, NoSQL often performs better than traditional relational databases.
Scenarios where NoSQL is not applicable
- Strong consistency requirements: if the application requires strong consistency and complex transaction processing (e.g., bank transfers), the ACID properties of a relational database are more appropriate.
- Complex queries: for complex SQL queries (e.g., multi-table joins, complex aggregations) over structured data, an RDBMS outperforms NoSQL.
- Data normalization: when highly normalized data management is required (to avoid data redundancy, etc.), an RDBMS is the better choice.
Benefits of NoSQL
- Flexible data structures: NoSQL does not enforce a data schema and can store data in as many different formats as needed.
- Easy to scale: supports distributed architectures and horizontal scaling, adapting well to cloud computing and big data scenarios.
- High performance: optimized for specific data access patterns, delivering better performance especially in read/write-heavy scenarios.
- Suited to rapid iteration: if the data structure changes frequently during development, NoSQL's flexibility meets that need better.
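To make the "flexible data structures" point concrete, here is a minimal shell sketch (the users collection and its fields are purely illustrative): documents of different shapes can live in the same MongoDB collection without any schema change.
db.users.insertOne({ name: "Alice", age: 25 })
db.users.insertOne({ name: "Bob", email: "bob@example.com", tags: ["admin", "beta"] })
// Both inserts succeed even though the two documents have different fields.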
2. What are the types of NoSQL databases?
Types of NoSQL databases are usually categorized according to the data model, and there are four main categories:
1. Key-Value Store (KVS)
- Characteristics: uses simple key-value pairs for storage, similar to a dictionary or hash table.
- Advantages: fast lookups, good scalability, well suited to simple read/write operations.
- Disadvantages: only simple query operations are supported; the data model is too simple for complex queries.
- Use cases: session management, caching, simple configuration data, etc.
- Representative databases: Redis, Memcached, DynamoDB (Amazon), etc.
2. Document Store
- Characteristics: stores data as documents in a JSON- or BSON-like format; each record can have different fields, so the data structure is flexible.
- Advantages: supports nested structures and flexible queries; suitable for unstructured and semi-structured data.
- Disadvantages: cross-document query support is limited; not suitable for complex transactions.
- Use cases: content management systems, log management, social networking, etc.
- Representative databases: MongoDB, CouchDB, RavenDB, and so on.
3. Column-Family Store
- Characteristics: data is stored by column; each row can contain a different number of columns, and column data is grouped into column families. Suitable for large-scale distributed storage.
- Advantages: handles large-scale data efficiently, supports horizontal scaling, and queries over specific column families are fast.
- Disadvantages: the data model is more complex and not well suited to frequent updates or complex queries.
- Use cases: time-series data, IoT data, data analysis, etc.
- Representative databases: Cassandra, HBase, ScyllaDB, and so on.
4. Graph Database
- Characteristics: stores data as a graph of nodes, edges, and properties, suitable for storing complex relationships.
- Advantages: ideal for relationship-dense queries, supporting fast graph traversal.
- Disadvantages: storage is complex, and performance may suffer when data is distributed across multiple nodes.
- Use cases: social networks, recommender systems, path optimization, and other scenarios requiring complex relational queries.
- Representative databases: Neo4j, JanusGraph, TigerGraph, and others.
Other types (additional)
- Time-Series Database (TSD): Specialized for storing time series data, such as IoT device data, financial market data. Representatives include InfluxDB, TimescaleDB and so on.
- Object store: used to store and manage large amounts of unstructured data such as images, audio, and video. Common examples include Amazon S3 and MinIO.
Different NoSQL database types are suitable for different data structures and scenarios, and users can choose the appropriate type according to the application requirements.
3. What are the basic differences between MySQL and MongoDB?
MySQL and MongoDB are two popular database systems, but there are some fundamental differences in their design philosophies and how they handle data:
1. data model
- MySQL: A relational database that uses a table-row-column structure to store data, enforcing a fixed data schema. Data can be related to each other by foreign keys, and the data structure is normalized.
- MongoDB: a NoSQL document database that stores data as JSON/BSON documents. Each document can have a different structure, the data structure is flexible, and nested structures are supported.
2. Query language
- MySQL: Uses standard SQL language, supports complex multi-table joins and transactions, and is suitable for structured data queries.
- MongoDB: Uses its proprietary query language with a syntax similar to JavaScript's object queries. Supports simple queries and aggregations, but has limitations on complex queries across collections.
3. Transaction support
- MySQL: Supports ACID transactions, which ensures strong consistency and is suitable for application scenarios that require strong transaction guarantees, such as financial systems.
- MongoDB: Transaction support is also provided (version 4.0 and above), but the transaction mechanism is newer in distributed environments and may be slightly less performant than MySQL when operating.
4. scalability
- MySQL: has traditionally favored vertical scaling (adding hardware resources to improve performance); horizontal scaling is possible through sharding and similar techniques but is relatively complex to implement.
- MongoDB: natively supports horizontal scaling; the sharding mechanism makes large-scale distributed data storage easy, so scalability is better.
5. data consistency
- MySQL: Strong consistency mode is adopted by default, the data is more reliable and suitable for systems with high consistency requirements.
- MongoDB: defaults to an eventual-consistency model, suiting applications that prioritize high availability and partition tolerance. The consistency level can be configured as needed.
6. Applicable Scenarios
- MySQL: Suitable for structured data storage, application scenarios with strong data consistency needs and complex query requirements, such as banking systems, ERP systems.
- MongoDB: Suitable for scenarios that deal with large-scale, unstructured data with frequent changes in data patterns, such as social media, real-time analytics, and content management systems.
MySQL is better suited to applications with structured data and strong consistency requirements, while MongoDB is better suited to flexible, fast-changing, large-volume big data scenarios that require high scalability.
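As a concrete illustration of the query-language difference above, here is a small sketch (the users table/collection and its fields are illustrative, not from the original article):
// MySQL:   SELECT name, age FROM users WHERE age > 25 ORDER BY age DESC;
// MongoDB equivalent in the shell:
db.users.find({ age: { $gt: 25 } }, { name: 1, age: 1 }).sort({ age: -1 })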
4. How do you compare MongoDB, CouchDB and CouchBase?
MongoDB, CouchDB, and Couchbase are all common NoSQL databases, and while they all support document storage, there are significant differences in architectural design, performance, scalability, and application scenarios.
1. Data Models and Storage Structures
- MongoDB: stores documents in BSON (a JSON-like binary format), supporting nested structures and rich data types. It uses a dynamic schema, suiting data structures that change frequently.
- CouchDB: stores documents in JSON format with strong structural consistency and support for nested documents. CouchDB focuses on data integrity and uses multi-version concurrency control (MVCC) to handle concurrent access.
- Couchbase: It supports JSON format storage and combines document database and caching features. It focuses more on high-performance data access and can provide efficient read and write speed.
2. Query Language and Interface
- MongoDB: Provides its own query language and rich query capabilities with JavaScript-like syntax, support for complex queries, aggregation frameworks and multi-field indexes, and support for MapReduce.
- CouchDB: MapReduce is used as the query engine, which is designed for simple queries and has relatively limited capability for complex queries. Queries require JavaScript code and have weak aggregation capabilities.
- Couchbase: Provides N1QL query language, similar to SQL, which allows complex queries while retaining the flexibility of NoSQL. It supports advanced features such as full-text search and aggregate queries.
3. Data consistency and synchronization mechanisms
- MongoDB: provides eventual consistency by default and supports single-document transactions (multi-document transactions from version 4.0 onward). Its sharding mechanism helps achieve high scalability but works against strong consistency.
- CouchDB: emphasizes eventual consistency; the design focuses on multi-node synchronization, suiting distributed, multi-device data synchronization scenarios. It supports multi-master replication and conflict resolution.
- Couchbase: provides strong consistency and supports ACID transactions on top of a high-performance foundation, suiting applications with high consistency requirements. Couchbase integrates a caching layer to balance data consistency and access speed.
4. Scalability and Distributed Support
- MongoDB: Horizontal scaling is natively supported and large-scale data can be managed through sharding. Its replication and sharding mechanisms allow for greater scalability.
- CouchDB: It is more suitable for multi-location distributed scenarios, supports multi-master replication, and has better data synchronization and conflict resolution mechanisms.
- Couchbase: Focus on horizontal scaling, using a distributed architecture of storage layer and cache layer separation, suitable for high concurrency, high throughput application scenarios.
5. Performance and Application Scenarios
- MongoDB: Suitable for application scenarios that are read-write intensive and require complex queries, such as social networks, real-time analytics, content management systems, etc.
- CouchDB: Suitable for distributed, multi-end data synchronization scenarios, such as mobile applications, IoT, etc. Due to its data synchronization characteristics, it is suitable for scenarios with high tolerance for network conditions and data offline.
- Couchbase: Outstanding performance for high-concurrency, low-latency scenarios such as online gaming, e-commerce, real-time ad recommendations, and other applications that require high-performance data access.
6. Comparison of advantages and disadvantages
| Database | Advantages | Disadvantages |
|---|---|---|
| MongoDB | Highly scalable, flexible queries, rich community support | Performance under high concurrency may be limited by the locking mechanism; strong consistency is more complicated to achieve |
| CouchDB | Powerful multi-site synchronization and multi-master replication, easy offline access and data synchronization | Limited query complexity and relatively slow data access |
| Couchbase | High performance, low latency, combines caching and persistence, ACID support | High resource consumption and relatively complex deployment and management |
Summary
- MongoDB Suitable for flexible data models and read- and write-intensive applications, it excels at handling large-scale unstructured data.
- CouchDB Specializes in offline synchronization for multi-device or distributed applications, suitable for application scenarios that require data synchronization and conflict resolution.
- Couchbase Combines the advantages of caching and document storage for scenarios with high concurrency and low latency requirements.
5. What makes MongoDB the best NoSQL database?
MongoDB is considered to be one of the best NoSQL databases mainly because of its flexibility, high performance, and excellent performance in big data scenarios. Here are a few key reasons why MongoDB is a top NoSQL database:
1. Flexible data modeling
- Dynamic Architecture: MongoDB uses the BSON format to store data, which supports a flexible document structure and does not require a predefined schema. This dynamic architecture allows MongoDB to change the data structure at any time, adapting to scenarios where requirements change frequently.
- nested data structure: Support for nested documents and array structures allows for a more natural representation of complex objects and relationships, reducing the need for table associations.
2. Rich query capabilities
- Flexible query language: MongoDB's query language supports a variety of query conditions, projections, sorting, paging, and other features that enable rich query operations.
- aggregation framework: MongoDB provides a powerful aggregation framework that supports complex aggregation operations and can efficiently handle data summarization, filtering and transformation tasks.
- full text search: Built-in full-text search function, can quickly complete the text search task, which is very useful for some search applications.
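As an illustration of the aggregation framework mentioned above, a minimal pipeline sketch might look like this (the orders collection and its fields are assumptions for the example):
db.orders.aggregate([
  { $match: { status: "paid" } },                                   // filter documents
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },   // summarize per customer
  { $sort: { total: -1 } }                                          // order by total, descending
])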
3. High Performance and Scalability
- Built-in sharding: MongoDB natively supports horizontal scaling; data can be sharded across multiple nodes, and a sharding strategy makes it easy to manage large amounts of data. The sharding mechanism is relatively simple to operate.
- Automatic load balancing: distributed clusters support automatic balancing, effectively distributing data and load to avoid hot-spot nodes.
- Replica sets: high availability and disaster recovery are achieved through replica sets, which keep data available in the event of hardware failure.
4. Wide range of scenario adaptations
- Suitable for a wide range of scenarios: MongoDB can handle massive amounts of data and is suitable for most big data and real-time application scenarios, such as content management systems (CMS), social networks, real-time data analytics, IoT data, and more.
- distributed architecture: Supports distributed database architecture, ideal for modern distributed application scenarios such as cloud applications and global deployments.
5. Community support and widespread use
- Open source with an active community: MongoDB is open source, has a globally active developer community, and is rich in resources to help developers get up to speed and solve problems faster.
- Commercial support: MongoDB, Inc. offers commercial services such as MongoDB Atlas, an automated, managed, and scalable cloud database service.
6. Transaction and consistency support
- Transaction support: Starting from version 4.0, MongoDB supports multi-document transactions, which further improves its adaptability in complex application scenarios, especially financial, order processing and other systems that require transaction support.
- Configurable consistency levels: support for different levels of read consistency gives the flexibility to choose between performance and consistency, making MongoDB more well-rounded with respect to the CAP theorem.
7. Multi-language driver support
- MongoDB provides multiple language drivers, including Python, Java, C#, PHP, etc. Almost all major programming languages can seamlessly use MongoDB, suitable for a variety of development needs.
Summary
What makes MongoDB a great NoSQL database is its flexible data model, high scalability, excellent querying capabilities, and extensive support and adaptability. Advantages in multiple data models, multi-language drivers, and automated deployment make MongoDB the first choice for many developers and organizations.
6. What are the nuances of MongoDB on 32-bit systems?
There are some limitations to using MongoDB on 32-bit systems, mainly due to memory addressing limitations on 32-bit systems. Here are the main differences and limitations of MongoDB on 32-bit systems:
1. Data Storage Size Limit
- Maximum storage size: on a 32-bit system, the total size of each MongoDB database (data plus indexes) is limited to about 2 GB. This is mainly because 32-bit systems have a limited memory address space, so MongoDB cannot use enough memory to manage larger data sets.
- Storage Engine Limitations: On 32-bit systems, MongoDB only supports the MMAPv1 storage engine and not the more modern WiredTiger engine, further limiting performance and functionality.
2. Performance Limits
- memory limit: Since 32-bit systems have about 4GB of memory addressing space, MongoDB can only use less than 4GB of memory, and the actual available memory is usually less, which does not fully utilize caching and memory mapping, and may result in slower data access.
- Data Read/Write Restrictions: As the amount of data approaches the 2GB limit, MongoDB's performance may degrade significantly, slowing down data writes and potentially causing service instability.
3. Not recommended for production environments
- Easy to reach the upper limit: Due to storage limitations and performance bottlenecks, MongoDB is not officially recommended for production deployments on 32-bit systems. 32-bit environments are better suited for small development or testing environments than for applications that need to handle large amounts of data.
4. Version Support Limitations
- Newer versions of MongoDB no longer support 32-bit systems: starting with MongoDB 3.2, official support for 32-bit systems was dropped. Newer versions run only on 64-bit systems, further reducing the use of MongoDB on 32-bit platforms.
Summary
The use of MongoDB on 32-bit systems is limited by storage size, memory, and performance, so it is only suitable for small, non-production environments. For larger or more demanding applications, a 64-bit system is recommended to fully utilize MongoDB's performance and scalability.
7. Is there a problem with journal replay when entries are incomplete (e.g., if one fails in the middle)?
In MongoDB, if journal entries are incomplete because a write failed partway through, MongoDB's recovery mechanism handles the situation. Specifically, the journal uses sequential writes and write-ahead logging (WAL), and entries are idempotent, so incomplete entries do not cause problems.
Recovery mechanism and handling
- Sequential writes and write-ahead logging: journal entries are written to disk sequentially, so operations are recorded in the journal file in order. This ordering ensures the recovery process can proceed in an orderly way even after a failure.
- Idempotence and entry checking: MongoDB checks journal entry completeness to avoid replaying incomplete entries. Each journal entry contains a checksum; during recovery, entries are verified one by one, and any incomplete entry or entry with a mismatched checksum is skipped to avoid incorrect replay.
- Transaction-level consistency: during recovery MongoDB replays only complete journal entries and discards incomplete ones, maintaining data consistency.
Typical process
- At startup, MongoDB checks the entries in the journal file.
- If it detects an incomplete entry caused by a mid-write failure, it automatically skips that entry and replays only complete entries.
- Through this mechanism, MongoDB preserves data safety and consistency even after a crash or power outage.
Summary
As a result, incomplete journal entries do not lead to data corruption in MongoDB. The recovery mechanism ensures incomplete entries are skipped, maintaining data consistency and reliability.
8. What is the role of analyzers in MongoDB?
In MongoDB, the analyzer is mainly used for full-text indexing and full-text search. Its role is to process and normalize text data so that MongoDB can execute text search queries more efficiently and accurately.
Core Functions of the Analyzer
- Tokenization: splits the input text into words or phrases, for example breaking sentences into individual words for word-level indexing and searching. This is especially important for multi-word matching and keyword extraction.
- Stemming: reduces words to their root forms. For example, "running" and "ran" are reduced to the root "run", so searches also match inflected forms.
- Stop-word removal: common stop words (e.g., "the", "is", "at") are automatically removed because they usually do not affect the core meaning of a search. Removing them reduces unnecessary matches and improves precision.
- Character normalization: converts different character formats (e.g., case folding) so that text is processed uniformly and words that differ only in formatting, such as case, still match.
- Language support: MongoDB supports analyzers for multiple languages. Each language has its own tokenization, stemming, and stop-word rules to keep the analysis accurate.
Analyzer in MongoDB
To use full-text search in MongoDB you create a text index. The analyzer comes into play both when the text index is built and when it is queried, specifically in the following scenarios:
- Creating a full-text index: when a text index is created on a field, the analyzer preprocesses the text, tokenizes it, and generates the index entries.
- Executing text queries: when a text query runs, the analyzer applies the same processing to the query keywords, so that search results match on the same stems or phrases.
Example
Suppose we have a document containing the text "Running is fun" and a text index on that field. At query time, the analyzer reduces "running" to "run", so a query for "run" also matches the variant "running".
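A minimal sketch of this in the shell, assuming a hypothetical articles collection:
db.articles.createIndex({ body: "text" })             // build a text index on the body field
db.articles.insertOne({ body: "Running is fun" })
db.articles.find({ $text: { $search: "run" } })        // matches "Running" thanks to stemming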
Summary
In MongoDB, the analyzer's role is to optimize text processing and indexing to improve the efficiency and accuracy of text search. Through tokenization, stemming, stop-word removal, and character normalization, the analyzer makes MongoDB's full-text search more intelligent and semantically aware.
9. What is namespace?
In MongoDB, a namespace is the combination of a database name and a collection name, used to uniquely identify a collection or index in the database. Internally, MongoDB represents a namespace in the format "database name.collection name". For example, a collection named students in the school database has the namespace school.students.
The role of namespaces
- Uniquely identifies a collection or index: by combining database name and collection name, a namespace ensures that a collection or index is unique across the deployment, avoiding name conflicts between databases or collections.
- Internal storage management: MongoDB manages collection and index data through namespaces behind the scenes. For example, it uses different namespaces to distinguish a collection from its indexes, and each index has its own namespace for storage and retrieval.
- Distinguishing data from metadata: system collections in MongoDB (such as the system.* collections) also separate their metadata through namespaces, helping MongoDB manage data and indexes more efficiently.
Length limit for namespaces
In MongoDB, namespace length is limited, typically to 120 characters (the exact limit varies slightly across versions). This is mainly because MongoDB reserves storage for namespaces and wants to keep performance predictable.
Example
Suppose we have an inventory collection in the store database. Then:
- The namespace of the inventory collection is store.inventory.
- If we create an index on item_id in the inventory collection, the namespace of that index might be store.inventory.$item_id.
Summary
Namespaces are used in MongoDB to uniquely identify collections and indexes in the database, ensuring the uniqueness of collection and index names and helping to effectively manage and organize data within MongoDB.
10. If a user removes an attribute of an object, is the attribute deleted from the storage tier?
Yes. If a user removes a property of an object in MongoDB and saves the change back to the database, the property is physically deleted from the storage tier; that is, the field and its value are no longer stored in the document.
Specific operation process
- Remove the attribute: the user deletes an attribute (field) of a MongoDB document in the application, for example with the $unset operator or by removing it from the object.
- Update the database: the deletion must be committed to MongoDB via an update operation, for example an update that uses $unset to explicitly remove the field, or one that replaces the whole document.
- Change in the storage layer: once the update succeeds, MongoDB physically removes the field from the storage tier, so it no longer occupies space in the data file.
Example
Suppose there is a document as follows:
{ "_id": 1, "name": "Alice", "age": 25, "city": "New York" }
If you execute the following command to delete the city field:
db.collection.updateOne({ "_id": 1 }, { $unset: { "city": "" } })
After this operation, the city field is removed from storage and the document becomes:
{ "_id": 1, "name": "Alice", "age": 25 }
Notes
- Physical deletion of attributes: fields that are not defined in MongoDB take up no storage space, so documents with deleted fields occupy less storage.
- Schema flexibility: MongoDB is schema-less, and deleting a field does not raise any structural exception, so field deletion is very flexible.
Summary
In MongoDB, after deleting a field attribute in a document, if the change is committed to the database, the field is physically deleted from the storage tier and is not retained in the datastore.
11. Can I use logging features for secure backups?
Yes, logging features can be used for secure backups. This is especially true for the transaction log (e.g., MongoDB's journal), which is critical for ensuring data consistency, resilience, and failure recovery.
In MongoDB, the role of logging is primarily to ensure data persistence and consistency. Log features are not only used for playback of storage operations, but also help in data recovery after a failure. Here are some key points on how to utilize log features for secure backups:
1. MongoDB's Logging Mechanism (Journal)
MongoDB uses write-ahead logging (WAL); the log file is commonly referred to as the journal. Before each write operation (insert, update, delete) is persisted to the data files, it is first recorded in the journal. This mechanism ensures:
- data consistency: Even in the event of a sudden power outage or crash, MongoDB can be restored to its last consistent state via a journal file.
- incremental backup: By using a journal file, you can implement an incremental backup that saves only the changes since the last backup. This is more efficient than a full backup, especially if the data is large.
2. How to use logging features for secure backups
- Enabling Persistent Logging: First, you need to make sure that MongoDB's journal feature is enabled so that all data writes are logged to the journal. By default, MongoDB refreshes the journal file after each write operation.
- Backup log files: While performing a full backup, you can periodically back up the journal file. In this way, you can capture changes to the data since the last backup and ensure that the data can be recovered even if there is a failure during the backup.
- Log playback: When restoring data, if an incremental backup (including the journal file) is used, you can play back the journal entries in the journal and restore the backup to the last consistent state. This means that point-in-time (PIT) recovery, i.e. restoring to the state of the data at a specific point in time, is possible.
3. Using MongoDB's oplog for backups
In a distributed MongoDB cluster (especially a replica set), you can use the oplog (operation log) for secure backups. The oplog is a capped, ring-like collection in a MongoDB replica set that records all write operations. By backing up and analyzing the oplog, you can:
- incremental backup: Back up the change log in oplog to maintain consistency with the master database.
- Point-in-time recovery: Data can be restored to a specific point in time by restoring it from a backup and playing back the oplog logs.
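As a quick way to see what the oplog contains, the following sketch reads the most recent entry on a replica set member (purely illustrative; it queries the local database):
db.getSiblingDB("local").oplog.rs.find().sort({ $natural: -1 }).limit(1)
// Each entry records one write operation, which is what incremental and point-in-time tooling replays.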
4. Use of backup tools in conjunction with logs
MongoDB provides a variety of backup tools, such as mongodump and mongorestore, along with incremental backup capabilities for distributed environments. With these tools, you can:
- Regular backups: Make regular full backups, along with journal files or oplogs.
- Recovery mechanisms: Ensure data consistency when recovering by utilizing a backed up journal file or oplog recovery operation.
5. Security and encryption of logs
For added security, log files (including the journal and oplog) should be stored encrypted to ensure that data is not leaked or tampered with during backup and restore. MongoDB supports data encryption; data can be protected via an encrypted storage engine or file-system encryption.
Summary
By using logging features such as journal or oplog, MongoDB enables effective and secure backups. Not only does journaling help enable incremental backups and restores, but it also ensures that data can be restored to a consistent state even in the event of a failure. Backups that utilize these journal features are a key component of a database security strategy.
12. Are null values allowed?
In MongoDB, null values are allowed. MongoDB places no strict constraints on field values unless you explicitly define them; null is a legal data type and can appear as the value of a field in a document. The key points about how null is handled are:
1. Permissibility of null values
- MongoDB allows a field's value to be set to null, which means the field can store null.
- For example, in the following document the age field is set to null:
  { "_id": 1, "name": "Alice", "age": null }
2. null vs. a non-existent field
- In MongoDB, a field whose value is null is different from a field that does not exist at all. A field that exists with the value null has been explicitly set to null, while a completely absent field means no value was ever provided for it.
- Example:
  - Document 1: { "_id": 1, "name": "Alice", "age": null } (the age field exists and its value is null).
  - Document 2: { "_id": 2, "name": "Bob" } (the age field does not exist).
3. Querying null values
- When querying, you can use null to find documents whose field value is null. Example:
  db.collection.find({ "age": null })
  This finds documents where the age field is null or does not exist at all.
4. null vs. undefined
- null: an explicit null value, meaning "no value".
- undefined: indicates that a field is not defined. At query time, such fields can be found with { $exists: false }.
  - For example, to query documents where the field does not exist:
    db.collection.find({ "age": { $exists: false } })
5. Implications of using null
- Storage and indexing: a field with a null value can be indexed, and the index treats null as a valid value.
- Performance: setting a field value to null usually has no significant impact on performance unless a very large number of such queries or updates is involved.
6. null and schema design
- MongoDB is schema-less: even if you set no explicit constraints on a field, MongoDB still allows storing a null value.
- If you use Mongoose (a popular ODM library for MongoDB), you can control whether a field allows null:
  const userSchema = new mongoose.Schema({ age: { type: Number, required: false, default: null } });
- This means the age field can be null and is optional.
Summary
In MongoDB, fields are allowed to be null. null is a valid data type indicating that a field's value is null, which is not the same as the field not existing at all. null values can be queried and indexed, and their impact on performance is usually minimal. If tighter control is needed, you can restrict whether a field may be null at the application level or with a tool such as Mongoose.
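If it helps to tell these cases apart in practice, here is a minimal query sketch (the users collection and the age field are illustrative):
db.users.find({ age: null })                     // matches documents where age is null OR missing
db.users.find({ age: { $type: "null" } })        // matches only documents where age is explicitly null
db.users.find({ age: { $exists: false } })       // matches only documents where age is missing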
13. Do update operations immediately fsync to disk?
In MongoDB, update operations are not always immediately written to disk via fsync (file-system synchronization). Whether fsync is involved depends on several factors, including MongoDB's write mode, the write concern settings, and the use of the journal.
1. Default behavior
- Default behavior of update operations: MongoDB's default write operations (such as updates) typically write data to memory first, and durability is then ensured in the background via the write-ahead log (WAL), i.e., the journal.
- fsync is not executed immediately: by default, the update operation itself does not immediately call fsync to flush data to disk; data is first written to memory and to the journal file. fsync is mainly used to guarantee persistence at the file-system level.
2. The role of fsync
- fsync is a file-system-level operation that forces all data in the file buffers to be written to disk. It ensures data is persisted at the file-system level, not just held in MongoDB's memory or journal cache.
- In MongoDB, if you want to force an fsync, you can use the following command, which locks the database and performs a file-system sync, ensuring all data is written to disk:
  db.fsyncLock()
3. Write Concern
- write concern is a setting on MongoDB write operations that defines what acknowledgement a write needs before it is confirmed. Depending on the write concern configuration, MongoDB may wait for multiple nodes to acknowledge the write, or even for the data to be persisted to disk.
- write concern settings:
  - w: 1 : the write is acknowledged once the primary has applied it; persistence to disk is not required.
  - w: "majority" : requires a majority of replica set members to acknowledge the write, possibly including disk synchronization.
  - j: true : requires the journal to be synced to disk before the write is acknowledged. With j: true, MongoDB forces the operation to be written to the on-disk journal, but this is not done by default for every update.
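As a sketch of how these settings are passed on a single write (the orders collection is illustrative), a write concern can request journal durability per operation:
db.orders.insertOne(
  { item: "Laptop", qty: 1 },
  { writeConcern: { w: "majority", j: true } }
)
// The insert is acknowledged only after a majority of members have applied it and it is in the on-disk journal.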
4. Durability guarantees: journal and fsync
- If journaling is enabled, MongoDB first writes the operation to the journal file, and the journal is flushed to disk asynchronously. This means that even without an immediate fsync, data can be recovered from the journal after a MongoDB crash.
- When the write concern includes j: true, MongoDB waits for the write to be flushed to the journal (i.e., to disk) before confirming that the operation succeeded.
5. Performance considerations
- Executing fsync affects performance because it forces all data to be written to disk, which is a relatively expensive operation. By default, MongoDB does not fsync on every write; instead it relies on the journal to ensure consistency and durability, balancing performance and data safety.
- If every operation must be synchronized to disk, performance may suffer, so this is usually enabled only in very critical scenarios.
Summary
Update operations do not call fsync immediately; durability is normally ensured through the journal and in-memory buffers. If you need to force synchronization to disk, you can adjust the write concern (e.g., j: true) so that write operations are flushed to the journal on disk. Frequent fsync use hurts performance, so MongoDB relies on a more efficient write mechanism to balance data durability with performance.
14. How are transactions/locking performed?
In MongoDB, transactions and locks are the two mechanisms used to ensure consistency across multiple operations in a distributed database. The following describes how transactions and locking are performed in MongoDB:
1. Transactions
MongoDB has supported multi-document transactions since version 4.0, making it possible to perform atomic operations on multiple documents. Transactions allow you to perform a set of operations that either all succeed or all fail, ensuring data consistency.
Basic concepts of transactions:
- atomicity: All operations in a transaction either succeed or fail altogether.
- consistency: The database state is valid at the beginning of the transaction and remains valid after the transaction ends.
- isolation: The operation of a transaction is not visible to other operations until it completes.
- durability: Once a transaction is committed, the data is stored permanently.
How to use transactions in MongoDB:
1. Open a session:
Before executing a transaction, you first need to create a session. A session is the basis of a transaction; multiple operations bound to the same session form the transaction.
const session = client.startSession(); // "client" is an already-connected MongoClient
2. Start a transaction:
Use the session to start a transaction. MongoDB then tracks the operations within the transaction until you commit or roll it back.
session.startTransaction();
3. Execute operations:
A series of operations (e.g., inserts, updates, deletes) are performed within the transaction. All of them must be passed the session.
try {
  db.collection('users').updateOne({ _id: 1 }, { $set: { name: "Alice" } }, { session });
  db.collection('orders').insertOne({ user_id: 1, item: "Laptop" }, { session });
} catch (error) {
  console.error("Error executing transaction:", error);
  session.abortTransaction();
}
4. Commit or roll back the transaction:
- Commit the transaction: if all operations complete successfully, commit the transaction.
- Roll back the transaction: if an error occurs, roll back to undo all operations.
session.commitTransaction();   // Commit the transaction
// or
session.abortTransaction();    // Roll back the transaction
5. End the session:
Remember to end the session when the transaction is complete.
session.endSession();
Example:
const session = client.startSession();   // "client" is an already-connected MongoClient
try {
  session.startTransaction();
  // Perform multiple operations in the transaction
  db.collection('users').updateOne({ _id: 1 }, { $set: { name: "Bob" } }, { session });
  db.collection('orders').insertOne({ user_id: 1, item: "Smartphone" }, { session });
  // Commit the transaction
  session.commitTransaction();
} catch (error) {
  // If an error occurs, roll back the transaction
  session.abortTransaction();
} finally {
  session.endSession();
}
// Note: with the Node.js driver these calls return promises and should be awaited.
Limitations of the transaction:
- Transactions in a sharded cluster: MongoDB supports transactions across multiple shards, but executing transactions in a sharded environment can introduce performance overhead.
- Performance Impact: Transactions add some performance overhead, so the need to use them should be weighed against the specific application scenario.
2. Locks
MongoDB provides different levels of locks to ensure data consistency, especially in the case of concurrent access. Although MongoDB uses locks to ensure data consistency and security, it is a highly concurrent database and does not lock all operations like a traditional RDBMS.
The type of lock:
- Global lock: early versions of MongoDB used a global lock, meaning only one operation could execute at a time. This was inefficient, and as MongoDB evolved, the lock granularity was refined.
- Database-level locks: MongoDB may use database-level locks for certain operations, such as database backups.
- Collection-level locks: for most operations, the lock granularity has been refined to the collection level, so different collections in the same database can be read and written simultaneously without interfering with each other.
- Document-level locks: starting with MongoDB 3.0, locking is at the document level, so only operations on the same document contend and other documents can be accessed in parallel. This greatly improves concurrency.
- Write locks: a write operation acquires a write lock to ensure that no other operation modifies the same data while it is in progress.
Example: Explicit Locking
Although MongoDB's locking mechanism is managed automatically, there may be situations where you need to explicitly control locks or ensure the atomicity of operations. In transactions, MongoDB handles locks automatically without requiring the user to explicitly add locks.
3. Collection-level locking (writeConcern)
MongoDB provides the writeConcern parameter to control the acknowledgement requirements of a write operation. By setting w to 1 or "majority", you ensure the write is acknowledged before it is considered complete, which to some extent acts like a "lock" on the write.
For example, using writeConcern you can require that a write is not considered successful until a majority of replicas have acknowledged it, which indirectly has a locking-like effect:
db.collection('users').updateOne({ _id: 1 }, { $set: { name: "Alice" } }, { writeConcern: { w: "majority" } });
Summary
- Transactions: MongoDB supports multi-document transactions to ensure the atomicity and consistency of a group of operations. Use a session to start a transaction, then commit or roll it back when the operations are complete.
- Locks: MongoDB automatically applies locking (e.g., collection-level and document-level locks) to keep data consistent. Although MongoDB's lock granularity is fine, for some operations you can explicitly use writeConcern or manage consistency through transactions.
Transactions and locking are the two main ways that MongoDB provides to ensure data consistency and concurrent operations, and you can choose the appropriate strategy based on your application scenario.
15. Why is my data file so large?
There can be a variety of reasons why data files become very large in MongoDB, usually related to factors such as how the data is stored, updated, index management, and space reclamation. Here are some common reasons and possible solutions:
1. Space not reclaimed after document deletion
- MongoDB UsageWiredTiger Storage EngineIt marks the data as deleted when the document is deleted, but does not reclaim disk space immediately. The deleted data still occupies space until the document is compressed by background operations.
- Solution: periodically run the compact command to compress a collection and reclaim space, for example db.runCommand({ compact: "<collection>" }). Note, however, that compaction can degrade performance, so it should be run during off-peak hours.
2. Update operations do not compress storage space
- When updating a document, if the document becomes larger, MongoDB allocates space in the file for the new data and does not automatically reclaim the old space. This leads to space fragmentation, especially for large document updates.
- Solution: if data is updated frequently and document sizes change significantly between updates, run compact periodically or consider compressed storage.
3. Indexes take up a lot of space
- Indexes in MongoDB take up storage space. In some cases, too many indexes or unnecessary indexes can cause data files to bloat.
- Solution: check the database for unnecessary indexes and remove them. You can view all current indexes with db.collection.getIndexes() and delete an index that is no longer used with db.collection.dropIndex("index_name").
4. Frequent insertion and deletion operations
- If you have a large number of insertion and deletion operations in your application without an effective space management strategy, MongoDB's data files can become very large.
- Solution: periodically run a compaction or repair operation to reclaim unused space. This reorganizes and compresses the database files, but it can also hurt performance and may require downtime for maintenance.
5. Inconsistent document size
- In MongoDB, document sizes can vary widely. For example, if a document is inserted and then updated frequently, and the size of the fields varies widely from one update to the next, MongoDB creates a lot of space fragmentation on disk.
- Solution: optimize the document structure to avoid documents becoming too large, or reduce document-size fluctuations with an appropriate update strategy.
6. WiredTiger Cache
- When MongoDB uses the WiredTiger storage engine, it allocates a certain amount of in-memory cache to optimize performance. This cached data may remain in the file for some time, resulting in a temporary expansion of the file size.
- Solution: if you are using the WiredTiger storage engine, set the cache size configuration option (for example storage.wiredTiger.engineConfig.cacheSizeGB) to limit the maximum cache size; adjusting this parameter lets you manage the balance between memory and disk usage.
- The current cache usage can be inspected with db.serverStatus().wiredTiger.cache.
7. Low data fill rate
- MongoDB allocates a fixed size of space when inserting data, and may waste space by not filling it completely during the space allocation process for the data.
- Solution: ensure even data distribution and avoid wasting space on particular nodes through a sensible sharding or data-distribution strategy.
8. The database has not been maintained
- If the database in MongoDB does not perform any maintenance operations for a long time, the storage files may become very large. Operations including document deletion, updates, etc. can lead to poor utilization of data file space.
- Solution: perform regular database maintenance tasks, such as running a repair or compaction operation or compacting individual collections, to help reclaim disk space.
9. Replica set members and write operations
- If you are using a Replica Set, each replica set member needs to store the full data set. If data compression is not properly configured or optimization operations are not performed on a regular basis, the replica set's data files can balloon.
- Solution: ensure replica set members have adequate hardware resources and periodically run data compression or space-reclamation operations.
10. Fragmentation issues
- Storage engines (especially WiredTiger) may create fragmentation in data files, especially after deletion of documents or large-scale updates, where space in the file is not reclaimed, resulting in an enlarged file.
- Solution: defragment and reclaim space by periodically running the compact command.
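Before deciding which of the causes above applies, it can help to check where the space actually goes. A minimal diagnostic sketch (the collection name is illustrative):
db.stats()            // database-level storage statistics
db.orders.stats()     // per-collection sizes, including storageSize and totalIndexSize
// Comparing storageSize with the logical data size hints at fragmentation or unreclaimed space.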
Summary
Excessively large data files are usually associated with the following factors:
- Deleted or updated spaces are not reclaimed in a timely manner.
- There are too many indexes or invalid indexes in the database.
- Inconsistent document sizes, frequent update operations and drastic changes in document size.
- The storage engine is not configured properly, leading to caching and fragmentation.
- The database is not regularly maintained and compressed for operations.
Solving the problem of large data files usually requires a combination of approaches: performing compression on a regular basis, cleaning up useless indexes, optimizing the document structure, and configuring storage engine parameters appropriately.
16. How long does it take to enable backup failback?
The time needed to enable backup failure recovery depends on a number of factors, including the size of the database, the backup strategy, the backup tools and methods used, and the hardware and network environment. Failure recovery from a MongoDB backup involves the data backup itself, backup storage, and the recovery process. The following key factors and general steps help estimate the time:
1. Backup Strategy
MongoDB supports a variety of backup strategies, including:
- Full Backup: Backup of the entire database, including all data and configuration. Ideal for smaller databases or scenarios where a full recovery is required.
- incremental backup: Backs up only data that has changed since the last backup. Ideal for large databases to reduce backup time and storage requirements.
- Copy set backup: In a replica set environment, backups can be made from any replica set member. It is common practice to back up from a secondary node (secondary) to avoid impacting the performance of the primary node.
2. Backup Tools
MongoDB provides a variety of backup tools:
- mongodump: the command-line tool provided by MongoDB for creating full backups.
- Mongosnapshot: a tool for cloud backup services.
- File system snapshot: backups taken with the operating system or a cloud provider's snapshot service (e.g., AWS, Google Cloud). This approach is very fast but requires the system to support fast snapshots.
- Ops Manager / Cloud Manager: MongoDB provides an enterprise-class backup solution that supports automatic, incremental, and periodic backups.
3. Backup time estimation
The time to enable backups varies depending on the following factors:
- Database size: The storage size of the database directly affects the time required for backups. Larger data sets usually take longer to complete a backup.
- Backup method: a full backup with mongodump may be slower than a file system snapshot; a snapshot usually takes only a few minutes, while mongodump may take considerably longer.
- Incremental backup: incremental backups are faster because only the changes since the last backup are saved, so restores are also faster.
- Storage Performance: The speed of backup to disk is closely related to the read and write performance of the storage hardware. For example, SSDs are usually faster than traditional hard disks.
- Backup copy set members: Backing up from the secondary node of a replica set avoids impacting the performance of the master node.
In an optimal environment, a full backup of several gigabytes of data can take roughly 10 to 30 minutes. For larger databases (e.g., terabytes), a full backup can take several hours, especially when performed with mongodump.
4. Failure recovery time
Recovery time is affected by the following factors:
- Backup availability: If the backup is stored in a remote location (e.g. cloud storage), the recovery time will be limited by network bandwidth.
- Type of recovery: Recovering the entire database is not the same as recovering a specific collection, and recovering a specific collection may be much faster.
- incremental recovery: If you use incremental backups, the recovery process may be more complicated, but it is usually more efficient because it only needs to restore changed data.
- hardware performance: Recovery operations are dependent on hardware performance, especially when recovering large databases.
- Other operations during recovery: e.g. data integrity validation, index rebuilding, etc., may increase recovery time.
Estimated recovery time: restoring a database of a few GB usually takes between a few minutes and 30 minutes, depending on the size of the backup and the recovery method. For a TB-scale database, the recovery process can take hours.
5. High Availability Configuration
If MongoDB is configured as a replica set and automatic failover is enabled, the system can automatically switch to another member of the replica set in the event of a failure, minimizing downtime. At this point, the backup and recovery process does not affect application availability.
6. Complexity of recovery
- Restore from full backup: It takes longer, but the process is relatively simple.
- Recovering from an Incremental Backup: Recovery is faster, but you may need to restore multiple incremental backups based on point-in-time.
- Cross-Data Center Recovery: If the backup is stored in a remote location, the recovery time will be affected by network bandwidth and latency.
Summary
The time to enable backup failback depends primarily on the following factors:
- The size of the database and the backup method (full or incremental).
- Storage performance and network bandwidth.
- The backup tools used and the level of automation.
- Whether the database is configured with replica sets and high availability mechanisms.
Generally, enabling backup failover does not take much time, but if it is the first time performing a backup or if the data set is very large, it may take longer to complete the initial backup process. The recovery time also depends on the size of the data and the complexity of the recovery and usually ranges from a few minutes to a few hours.
17. What is master or primary?
In database systems, especially in distributed databases and replica sets, master and primary are the two terms used for the main node: the node in a cluster responsible for handling write operations (and often read/write traffic).
1. Master / Primary Role
- Master (in some databases) and Primary (in databases such as MongoDB) refer to the main node in a database cluster, which handles all write operations (e.g., inserts, updates, deletes). The state of this node determines the eventual consistency of the data.
- The Primary node is the only node in the system that can accept write operations. It is usually contrasted with replica nodes (such as Secondary nodes), which copy the data on the Primary and serve queries from that backup data.
2. In MongoDB
In a MongoDB replica set, the Primary node is the only node allowed to perform write operations. The other nodes (called Secondary nodes) replicate data from the Primary node to ensure data consistency and availability.
- Primary node: All write operations occur on the Primary node. This node handles write requests from clients and stores the data.
- Secondary node: Secondary nodes replicate data asynchronously from Primary nodes. These nodes provide read-only access and can take over in the event that the Primary node fails (becoming the new Primary node through a failover mechanism).
Primary node election and failover: In MongoDB, if the current Primary node fails, the replica set elects a new Primary node through an election mechanism to ensure high availability of the database (see the sketch below).
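For illustration, here is a minimal mongosh sketch (assuming you are connected to a member of the replica set) that lists each member and its current role; rs.status() and rs.stepDown() are standard shell helpers:
// List replica set members and their roles.
rs.status().members.forEach(function (m) {
  print(m.name + " -> " + m.stateStr); // e.g. "PRIMARY" or "SECONDARY"
});
// rs.stepDown(), run on the current Primary, forces it to step down and triggers a new election.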
3. Role of Master / Primary
- write operation: In most databases, the Master (or Primary) node is the only node allowed to accept write requests. All writes to data are centralized on this node.
- read-write separation: By distributing read requests to the Secondary node of a replica set, the database can reduce the burden on the Primary node and improve query throughput. This strategy is calledread-write separation, which helps to improve performance.
- data consistency: The Primary node ensures data consistency. After a write operation occurs, the data is synchronized to the Secondary node, ensuring that the data is consistent across nodes.
- High availability and fault tolerance: When the Primary node fails, the replica set automatically elects a new Primary node, ensuring high availability of the database and continuous operation of the business.
4. Difference from Master
- Master and Primary are used interchangeably in many database systems, but there are sometimes subtle differences. For example, in some traditional relational databases (e.g., MySQL), Master is the usual term for the node responsible for write operations; in MongoDB, Primary is the more common term.
- Also, in the Master-Slave model (such as MySQL's replication model), the Master node handles write operations and the Slave nodes handle read operations. In a MongoDB replica set, the Primary is the only node that accepts writes, and all Secondary nodes are read-only and synchronize data through replication.
5. Summary
- The Primary (or Master) node is the only node in the database that is allowed to handle write operations.
- In distributed databases such as MongoDB, the Primary node is also responsible for propagating data to replica nodes.
- Secondary nodes are backup nodes that serve read operations and maintain a consistent copy of the data.
- The Primary election mechanism and automatic failover ensure high availability of the database system.
18. What is a secondary or slave?
In a distributed database or replica set, a Secondary or Slave node is a node that stores a copy of the data. These nodes work together with the Primary (or Master) node, replicating its data to provide redundant backups, serve read operations, and support high availability.
1. Secondary node (in MongoDB)
- A Secondary node is a node in a MongoDB replica set that replicates data from the Primary node. These nodes only handle read requests and do not receive write operations (unless they are elected as the new Primary).
- Data replication: Secondary nodes keep their data consistent by replicating from the Primary node. Replication is asynchronous, which means a Secondary node's data may lag slightly behind the Primary node's latest data, but in most cases this delay is very small.
- read-only operation: By default, Secondary nodes can only perform read operations. In MongoDB, you can reduce the burden on the Primary node by directing certain read requests to the Secondary node through a specific configuration.
- Election mechanism: If the Primary node fails, the replica set elects a new Primary node through an automatic election process, ensuring high availability of the system.
2. Slave node (in a traditional database system)
In some traditional relational database systems, such as MySQL, a Slave node is a node that replicates data from the Master node.
- Data replication: The Slave node receives data updates from the Master node and keeps them synchronized. Slave nodes are typically read-only and cannot perform write operations directly.
- Use: Slave nodes are typically used to offload read operations; they provide data redundancy and can be promoted to a new Master node in the event of a Master node failure.
- asynchronous replication: Similar to MongoDB's Secondary node, Slave nodes in traditional databases can have some latency because they copy data asynchronously from the Master node.
3. Similarities and Differences between Secondary and Slave Nodes
- Similarities:
- data synchronization: Both Secondary nodes in MongoDB and Slave nodes in traditional databases need to synchronize data from the primary node (Primary or Master).
- read-only operation: They are mainly used to handle read requests to reduce the burden on the master node.
- Fault tolerance and high availability: These nodes provide redundant data to ensure data security and high availability in case of failure of the master node.
- Differences:
- Replication method: MongoDB's Secondary nodes replicate asynchronously and data synchronization is managed automatically, often allowing more flexible read routing (e.g., read preference configuration). In traditional databases, replication to Slave nodes is usually asynchronous as well, and some databases allow Slave nodes to accept writes under certain circumstances (e.g., particular MySQL master-slave setups).
- Election mechanism: MongoDB replica sets have an automatic election mechanism: if the Primary node fails, the remaining nodes automatically elect a new Primary. The Master-Slave model in traditional databases usually has no automatic failover mechanism unless additional tools or manual intervention are used.
4. Advantages of Secondary / Slave Nodes
- high availability: By having multiple Secondary nodes, data is not lost, ensuring that if one node fails, the others can continue to provide service.
- load balancing: By distributing read operations to the Secondary node, you can reduce the load on the Primary node and increase the throughput of the entire system.
- Fault recovery: If the Primary node fails, the replica set automatically elects a new Primary node, reducing downtime and improving system reliability.
- Data redundancy and backup: Secondary nodes provide data backup for the system, which is important to prevent data loss or disaster recovery.
5. Secondary Node Configuration in MongoDB
- Read preferences: In MongoDB, you can configure the read preference to control how read operations are routed. Read requests can be directed to Secondary nodes to improve read performance, especially under heavy read load. The available modes are listed below (a usage sketch follows the list):
- primary: Reads data only from the Primary node.
- secondary: Reads data only from the Secondary node.
- primaryPreferred: Reads data from the Primary node first, or from the Secondary node if the Primary node is not available.
- secondaryPreferred: Reads data from the Secondary node first, or from the Primary node if there is no available Secondary node.
- nearest: Reads data from the node with the shortest response time, either Primary or Secondary.
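As a quick illustration, here is a hedged mongosh sketch (the "orders" collection is just an example) showing how a read preference can be applied per query or in the connection string:
// Route this query to a secondary if one is available, otherwise fall back to the primary.
db.orders.find({ status: "done" }).readPref("secondaryPreferred")
// The same preference can be set for a whole connection, e.g.:
// mongodb://host1,host2,host3/?replicaSet=rs0&readPreference=secondaryPreferred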
6. Summary
- A Secondary (or Slave) node is a backup node in the replica set, mainly responsible for data replication, serving read requests, and supporting high availability.
- Secondary nodes replicate data from the Primary node to maintain consistency and provide redundancy for the system. They can handle read-only operations, and an automatic election takes place if the Primary node fails, keeping the system highly available.
19. Do I have to call getLastError to make sure the write is working?
In MongoDB, calling getLastError is one way to confirm that a write operation succeeded, but it is not required. MongoDB provides different ways to ensure that a write operation succeeds, depending on the write acknowledgement mechanism you choose.
1. Role of getLastError
getLastError is a command used to check whether the most recent write operation on a connection succeeded. It returns the result of the operation, including whether it succeeded and whether any error was raised.
In earlier versions of MongoDB, developers would explicitly call getLastError to confirm that a write operation succeeded, because MongoDB's default behavior did not always wait for a write to be acknowledged, especially when using unacknowledged mode or the old default write concern.
2. MongoDB's Write Acknowledgement Mechanism
MongoDB provides several ways to ensure that write operations succeed, primarily through the write concern level. The write concern determines how many replica set members must acknowledge a write operation before the operation returns.
- w: 1: The write requires acknowledgement from the Primary node only. Once the Primary node has received and applied the write, it is considered successful, without waiting for acknowledgement from other nodes.
- w: 2: The write requires acknowledgement from two replica set members, i.e. the Primary plus at least one Secondary node.
- w: "majority": The write requires acknowledgement from a majority of replica set members. This is the default write concern in recent MongoDB versions and generally ensures data consistency and high availability.
- w: 0: The write does not wait for any acknowledgement. The operation may have been applied, but success is not guaranteed.
By adjusting the write concern, MongoDB ensures that write operations succeed according to your requirements. For example, with w: "majority", MongoDB does not consider a write successful until a majority of the replica set members have acknowledged it, which generally ensures the reliability of the write.
3. Alternatives to getLastError
In modern versions of MongoDB, getLastError is no longer necessary, because write operations rely on the write concern to confirm success automatically. You can ensure that a write operation succeeded in the following ways:
- Use the writeConcern option: Each write operation can specify a writeConcern option that determines which nodes must acknowledge it. Example:
db.collection.insertOne({ name: "example" }, { writeConcern: { w: "majority" } });
In this example, the write operation requires confirmation from at least a majority of the replica set members for the operation to be considered successful.
- Use acknowledged write operations: Modern CRUD methods such as insertOne, updateOne, and deleteOne are acknowledged by default. They return a result (or throw an error) when the operation completes, so there is no need to call getLastError manually.
- Exception handling: You can detect write failures by catching the exceptions thrown by MongoDB. When a write operation does not execute successfully, MongoDB raises an error indicating that the operation failed (see the sketch below).
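The following is a minimal mongosh sketch of this pattern (the "users" collection is hypothetical): the write must be acknowledged by a majority of members within 5 seconds, otherwise an error is thrown and caught:
try {
  db.users.insertOne(
    { name: "example" },
    { writeConcern: { w: "majority", wtimeout: 5000 } } // wait for majority acknowledgement
  );
  print("write acknowledged");
} catch (e) {
  print("write failed: " + e);
}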
4. Is it mandatory to use getLastError?
In most cases, you do not need to call getLastError explicitly, because MongoDB's modern write concern mechanism handles write acknowledgement automatically. You can ensure reliability and consistency by setting an appropriate writeConcern.
However, in certain specific cases, such as custom validation logic or very old MongoDB versions, getLastError may still be useful.
5. Summary
- There is no need to call getLastError: In modern MongoDB releases, write success is confirmed by setting an appropriate write concern, without calling getLastError manually.
- Use writeConcern to ensure writes succeed: Options such as w, j (journal acknowledgement), and wtimeout control how write operations are acknowledged.
- getLastError can still be used to check the status of write operations, but the write concern mechanism in modern MongoDB already handles most write-validation requirements automatically.
During development, it is recommended to choose the write concern that matches your application requirements and rely on MongoDB's built-in mechanisms to ensure data consistency and reliability.
20. Should I start a sharded or non-sharded MongoDB environment?
Whether to start a sharded or a non-sharded MongoDB environment depends on a number of factors, such as your application needs, data volume, performance requirements, and scalability needs. Here is a comparison of the two deployment options to help you decide:
1. Non-sharded MongoDB environments
A non-sharded environment usually means a single MongoDB instance or a small replica set. It is suitable for scenarios with a small data volume, modest performance requirements, or no particular need for scalability.
Applicable Scenarios:
- Small applications or development environments: If you are developing a small application, or if the application has a relatively small amount of data, a single MongoDB instance (or set of replicas) is sufficient to handle the requirements.
- A single node can meet performance requirements: If you have a moderately sized dataset and a lightly loaded system, a non-sharded environment will suffice.
- Simple architecture and low maintenance costs: There is no need to configure and maintain a sharded cluster, the architecture is simpler, and the management burden is lighter.
Pros and Cons:
- Advantages:
- Simpler and easier to deploy and manage.
- Without the complexity and high maintenance costs of cluster sharding.
- Ideal for applications with small data volumes that do not require horizontal scaling.
- Disadvantages:
- Scalability is poor and performance may be limited as data volumes grow.
- Load balancing across nodes is not supported and may lead to a single point of bottleneck.
- Performance problems may occur under high loads and large data volumes.
2. Sharded MongoDB Environments
The sharded-cluster model is suitable for large applications that require horizontal scaling and high availability. In this model, data is distributed across multiple shards: each shard stores a portion of the data, the config servers manage the metadata, and the router servers (mongos) process client requests and route them to the appropriate shard.
Applicable Scenarios:
- Large-scale data storage: When the amount of data becomes too large for a single MongoDB instance to handle, cluster sharding provides the ability to scale horizontally (add nodes) by spreading the data across multiple nodes.
- High throughput and low latency requirements: In the case of heavy data and query loads, sharding can help spread the load and improve query performance and write throughput.
- Requires deployment across data centers: Sharded clusters are capable of scaling and redundancy across multiple data centers and geographic locations, improving availability and disaster recovery.
- distributed load balancing: When load balancing and management of multiple nodes is required, sharded clusters are efficiently scheduled through an automatic load distribution mechanism.
Pros and Cons:
- Advantages:
- Horizontal scaling: A sharded cluster can scale storage and compute capacity by adding more shards, supporting very large data sets.
- High availability: Cluster mode supports replica sets, typically with multiple replicas per shard, ensuring data redundancy and fault tolerance.
- Load balancing: MongoDB automatically distributes data across shards for load balancing, which improves performance.
- Disadvantages:
- Complex deployment: A sharded cluster requires configuring multiple shards, config servers, router servers, and so on, making deployment and management more complex.
- High maintenance costs: A sharded cluster involves more nodes and components and requires more operational support, including monitoring, troubleshooting, and scaling.
- Network latency: Since data is distributed over multiple nodes, queries and writes across shards may introduce additional network latency.
3. Basis for decision-making
The choice between a sharded and a non-sharded environment depends on several factors:
- Data volume:
- If you have a small amount of data, consider a non-sharded (single node or replica set) environment.
- If you have a very large amount of data, or expect it to grow significantly in the future, choose a sharded cluster.
- Query and write load:
- For lighter query and write loads, a non-sharded environment is simple and easy to manage.
- For high-throughput query and write requirements, sharding helps spread the load and increase the throughput of the system.
- Scalability:
- If you need to scale horizontally in the future, a sharded cluster provides better scalability.
- If scaling is not needed in the near future, a non-sharded environment is sufficient.
- High availability and disaster recovery:
- Sharded clusters support data redundancy and high availability for environments with strict availability and disaster-recovery requirements.
- Non-sharded environments usually provide redundancy and backup only through a replica set, with more limited scalability and failure recovery.
- Management and operations:
- Non-sharded environments are simple to deploy and manage, suitable for small teams or development environments with limited resources.
- Sharded clusters are more complex to manage and require dedicated operations work for configuration, monitoring, and troubleshooting.
4. Summary
- If your data volume is small, your query load is light, and you do not need horizontal scaling, a non-sharded MongoDB environment is simpler and more efficient.
- If you face large data sets, or have high-throughput and high-availability requirements, a sharded cluster is the more appropriate choice: it provides greater scalability, disaster recovery, and load balancing, at the cost of higher complexity and operational overhead.
21. How do sharding and replication work?
Sharding and replication are two key mechanisms in MongoDB for achieving high data availability and horizontal scalability. They work differently and serve different purposes, but can be used together to improve system performance, reliability, and scalability. The following is a detailed description of each:
1. Replication
Replication is used in MongoDB to provide data redundancy and high availability. It copies data from a master node (Primary) to one or more slave nodes (Secondary), ensuring backups and redundancy. MongoDB's replication mechanism is based on the replica set.
How it works:
- Master node (Primary): Each replica set has a master node, and all write operations and read operations (unless specific read preferences are enabled) reach the master node first. The master node is responsible for receiving write requests from clients and applying them to its own dataset.
- Slave node (Secondary): Slave nodes replicate data from the master node, including the operation log (oplog). These slave nodes keep the data synchronized with the master node. Write operations are first executed on the master node and then replicated to all slave nodes.
- automatic election: If the master node fails, the replica set will automatically conduct an election to elect a new master node to ensure high availability of the system.
- Oplog: Each replica set node (master and slave) has an operation log (oplog) that records all write operations to the database. Slave nodes synchronize data by reading the oplog of the master node. (A configuration sketch follows this list.)
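As a rough sketch (host names and the set name "rs0" are placeholders, and each mongod is assumed to have been started with --replSet rs0), a three-member replica set can be initiated from mongosh like this:
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo1:27017" },
    { _id: 1, host: "mongo2:27017" },
    { _id: 2, host: "mongo3:27017" }
  ]
});
// rs.status() then shows which member was elected Primary.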
Advantages and disadvantages of replication:
- Advantages:
- data redundancy: Storing data through multiple replica nodes guarantees high availability and disaster tolerance.
- high availability: The replica set automatically switches the master node and maintains service continuity in the event of a master node failure.
- load balancing: You can reduce the pressure on the master node by setting read preferences to assign certain read requests to slave nodes.
- Disadvantages:
- storage cost: Data will be stored on multiple replica nodes, requiring more storage space.
- synchronization delay: The synchronization of the slave nodes is asynchronous, so there may be delays between the master and slave nodes (data consistency issues).
2. Sharding
Sharding is used for horizontal scaling: it distributes data across multiple shards, balancing storage and query load for large data sets. Sharding allows MongoDB to handle very large data sets while improving read and write performance.
How it works:
- Shard Key: The core of sharding is the shard key, which determines how documents are split across shards. Each shard stores a certain range of the data, and the distribution depends on the shard key values.
- Shards: Each shard is a MongoDB instance or replica set that stores a portion of the data.
- Config Servers: The config servers store metadata about the entire cluster, including the location of each chunk and the allocation of shards. This metadata ensures that clients can find which shard the data is on when they query it.
- Routing server (mongos): The routing server is the entry point to the MongoDB cluster and is responsible for routing client requests to the correct shard. mongos sends requests to the appropriate shard based on the shard key value. Clients do not connect directly to the shard nodes, but communicate through the routing server.
- Data allocation: MongoDB distributes data to different shards based on the shard key value, using either range-based sharding or hash-based sharding (a command sketch follows this list).
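As a minimal sketch (run through mongos; the "shop.orders" namespace and the key are only examples), enabling sharding and choosing a shard key looks like this:
sh.enableSharding("shop");                            // enable sharding for the database
sh.shardCollection("shop.orders", { customerId: 1 }); // range-based shard key
sh.status();                                          // inspect shards and chunk distribution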
Advantages and disadvantages of sharding:
- Advantages:
- Horizontal scaling: Storage and compute capacity can be scaled by adding more shard nodes without compromising performance.
- Load balancing: Data and requests are distributed among multiple shards, avoiding single-point bottlenecks.
- Large data set support: For applications with large data volumes, sharding can handle data sets that exceed the storage and compute capacity of a single node.
- Disadvantages:
- Complex configuration: A sharded cluster requires configuring and managing multiple components (shards, router servers, config servers, etc.) and is more complex to deploy and maintain than a single replica set.
- Cross-shard queries: While MongoDB can handle cross-shard queries, they incur a performance overhead, especially when the data must be gathered from multiple shards.
- Shard key selection: Choosing the right shard key is critical; a poor choice can lead to uneven data distribution, which in turn affects query performance.
3. Combining replication and sharding
In practice, sharding and replication are often used together to achieve both horizontal scaling and high availability.
- Each shard is usually a replica set, so sharding provides not only horizontal scaling but also high availability through the replica set mechanism.
- A sharded cluster is composed of shards, config servers, and router servers (mongos). Replica sets are used within each shard to ensure data redundancy and high availability.
- If the primary node of a shard fails, its replica set automatically elects a new primary to keep the data available. If a config server or router server fails, the MongoDB cluster can also recover automatically.
4. Summary
- Replication: Ensures data redundancy, disaster recovery capability, and high availability through replica sets. Suitable for data backup, failure recovery, and load balancing.
- Sharding: Achieves horizontal scaling by splitting data across multiple shards, suitable for storing and processing large-scale data sets. Each shard can use a replica set for data redundancy, combining the two to provide high availability.
Both can be used together, combining the horizontal scaling of sharding with the high availability of replication, providing large-scale data storage while ensuring data reliability and fault tolerance.
22. At what point does the data expand into multiple shards?
In a MongoDB cluster, data expands onto multiple shards based on the shard key. MongoDB divides data into chunks according to the selected shard key; when the data volume reaches a certain level, the data is spread across multiple shards. Exactly when data expands to multiple shards depends on several factors:
1. Shard key selection
In MongoDB, the shard key determines how data is distributed to different shards. The choice of shard key affects data distribution, performance, and scalability. A poorly chosen shard key can concentrate data on a few shards and hurt system performance.
How the data is divided:
- When creating a sharded collection, you need to specify a shard key. The shard key is a field in the documents, and MongoDB decides how to allocate the data based on the value of that field.
- MongoDB divides data into ranges (chunks) based on the shard key values, and these chunks are assigned to different shards.
The two ways data is divided (a declaration sketch follows the list):
- Range Sharding: MongoDB divides data into intervals (chunks) according to the value range of the shard key. For example, if the shard key is a timestamp, data is split into chunks by time interval. The rule for assigning data to each chunk is based on the value range.
- Hash Sharding: MongoDB hashes the shard key values and assigns the data to shards based on the hash values. Hashed sharding helps ensure that data is evenly distributed across all shards.
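The two strategies differ only in how the shard key is declared. A hedged sketch (the namespace names are placeholders):
sh.shardCollection("logs.events", { createdAt: 1 });     // range sharding on a timestamp field
sh.shardCollection("logs.clicks", { userId: "hashed" }); // hashed sharding for an even spread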
2. Growth in data volume
Once the amount of data grows to a certain size, MongoDB will distribute the data into multiple shards. The process is described below:
- Early stage: In a small cluster, data may live on only one shard. When new data is inserted, it is assigned to that shard.
- Expanding to multiple shards: As the amount of data continues to grow and reaches a specific threshold (usually when a chunk exceeds MongoDB's configured chunk size), MongoDB splits the data into multiple chunks and distributes these chunks across different shards.
- Dynamic adjustment: MongoDB dynamically rebalances data across shards based on load and data volume. Even after the data has been spread over multiple shards, MongoDB keeps adjusting the distribution (e.g., when some shards hold more data than others) to maintain load balance (a configuration sketch follows this list).
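As a hedged sketch, the split threshold can be tuned through the chunk size setting in the config database (run via mongos; the default chunk size varies by MongoDB version):
// Lower the chunk size to 64 MB so chunks split sooner.
const configDb = db.getSiblingDB("config");
configDb.settings.updateOne(
  { _id: "chunksize" },
  { $set: { value: 64 } },
  { upsert: true }
);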
3. Data migration and rebalancing
MongoDB monitors the amount of data in each shard and performs automatic data migration and rebalancing to keep data evenly distributed across all shards. When the data stored on a shard exceeds a predefined threshold, MongoDB migrates some of the data to other shards.
- Rebalancing process: MongoDB automatically moves chunks around based on the storage on each shard in the cluster, distributing the data evenly. This process is transparent and requires no manual intervention.
- Changing the shard key: If the initial choice of shard key results in an uneven distribution, or if the data grows to the point where some shards become overloaded, the distribution can be improved by choosing a different shard key.
4. Effects of the shard key
- Even data distribution: If a suitable shard key is chosen (e.g. a field with well-dispersed values), the data will be evenly distributed across the shards. When new data is inserted, it is assigned to the appropriate shard based on the shard key value.
- Uneven distribution: If the shard key is chosen poorly (e.g., a field with few distinct values or a commonly repeated fixed value), the data may be concentrated on a few shards, causing uneven load and hurting query performance and system scalability.
5. When does data expand to multiple shards:
- Initial insertion phase: Initially, data may be inserted into a single shard; only as the data volume grows does MongoDB automatically spread the data across multiple shards.
- Data growth: As the amount of data grows, MongoDB creates new chunks and assigns them to different shards once the data in a single chunk exceeds the threshold.
- During rebalancing: When some shards in the cluster are heavily loaded, MongoDB migrates and rebalances chunks from one shard to another to achieve a more even data distribution.
6. Summary
- The process of expanding data onto multiple shards in MongoDB is dynamic and depends on the shard key and the growth in data volume.
- After the data volume reaches a certain size, MongoDB automatically divides the data into chunks and assigns these chunks to multiple shards.
- MongoDB automatically performs data migration and rebalancing based on the distribution of the shard key, ensuring an even distribution of data and preserving the scalability and performance of the system.
Thus, data does not expand to multiple shards from the beginning; rather, as the data volume grows, MongoDB automatically handles chunk splitting and migration according to the chosen shard key.
23. What happens when I try to update a document on a chunk that is being migrated?
When you try to update a document in MongoDB on a chunk that is being migrated, MongoDB automatically handles the situation to ensure data consistency and correct operation. Specifically, MongoDB takes the following steps to deal with this issue:
1. Concurrency of block migration and update operations
When MongoDB performs a block migration, it locks the block of data that is being migrated to ensure that no other writes affect that portion of the data at the same time. During this process, MongoDB's sharding architecture ensures consistency.
2. Behavior during block migration
- Temporary write pause: During the final phase of a chunk migration, MongoDB briefly pauses writes to that chunk. This pause is usually transparent and requires no user intervention.
- Restart writing after migration is complete: When the block migration is complete, MongoDB re-enables writes to the block. At this point, the block's data is fully migrated to the target slice and all subsequent writes are sent to the target slice.
3. Specific behaviors:
- Locking before the update: If you try to update a document on a chunk that is being migrated, MongoDB temporarily locks that chunk during the critical phase of the migration. Any update operations against the chunk are held in a pending queue until the chunk is fully migrated and the write lock is released.
- Operation redirection: If a write operation targets data whose chunk has already been migrated, MongoDB automatically redirects the operation to the new shard. The routing server (mongos) knows the location of the target shard, so it sends the write to the correct shard even while the document is moving from one shard to another.
- Consistency guarantee: MongoDB guarantees that data consistency is not broken during a chunk migration. When a write is performed during a migration, MongoDB ensures that the operation ultimately succeeds and is neither lost nor misplaced (a monitoring sketch follows this list).
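If you want to observe migrations while this is happening, a small mongosh sketch (run via mongos; the changelog format is internal and may vary by version) is:
sh.isBalancerRunning();            // is a balancing round currently in progress?
db.getSiblingDB("config").changelog
  .find({ what: /moveChunk/ })     // recent moveChunk events
  .sort({ time: -1 })
  .limit(5);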
4. What can go wrong:
- Network issues during migration: If the network fails during a migration, MongoDB automatically recovers. Typically, these failures do not result in data loss because MongoDB recovers the data through logging (oplog) and resynchronization mechanisms.
- Lock contention: In highly concurrent environments, multiple write operations may try to access the chunk being migrated. MongoDB handles this situation, but under extremely high load it can cause transient write delays or performance bottlenecks.
5. Transparency and auto-recovery
MongoDB's block migration process is generally transparent to the client. Regardless of how the blocks are migrated, the client application only needs to focus on normal write requests, and MongoDB automatically manages the location and consistency of the data. There is no need for the client to explicitly intervene or process block migrations; MongoDB automatically ensures that the data is transferred correctly through routing services.
6. Summary
- In MongoDB, if you try to update a document on a block that is being migrated, MongoDB automatically handles the update request.
- Write operations are redirected to the new sliceto ensure that the update can be executed successfully.
- During block migration, write operations are temporarily suspended, but the system ensures that no data is lost and data consistency is maintained during migration.
- These behaviors are usually transparent and do not require special handling by the application.
As a result, MongoDB handles write operations during chunk migrations transparently and with consistency guarantees, ensuring that the system keeps working and no data is lost in the face of data migration and concurrent operations.
24. What happens if I launch a query when a shard is down or slow?
When a shard in a MongoDB cluster stops or responds very slowly, the initiated query will be affected in some way, depending on several factors, such as the type of query, the configuration of the cluster, and whether a specific fault-tolerance mechanism is enabled. The following are a few scenarios in which this can happen:
1. Routing of queries
MongoDB uses the mongos router to coordinate query requests from clients. When you initiate a query, mongos routes it to the appropriate shard based on the query's shard key and the cluster's sharding configuration. The exact behavior of the query depends on whether it involves the faulty shard.
2. If a shard stops or responds slowly:
- The shard stops completely:
- The query cannot be routed to this shard: If a shard stops working altogether (e.g., its nodes crash or lose power), mongos cannot send query requests to that shard. mongos obtains shard information from the cluster's configuration and will find that the target shard is unavailable.
- Query failure or degradation: In this case, the query may fail, or MongoDB may return an error indicating that the shard is unavailable. The application can handle such errors through a retry mechanism or appropriate error-handling logic.
- The shard responds slowly:
- Timeouts or long waits: If a shard responds slowly, client queries may experience long delays or even time out. MongoDB waits for a response from that shard according to the query configuration, and the query fails if the wait exceeds the default (or a custom) timeout.
- Timeout setting: You can set a timeout on the client query to prevent it from hanging indefinitely due to a slow response. If the shard does not respond in time, mongos returns an error informing the client that the query failed (see the sketch below).
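A hedged sketch of such a client-side timeout (the "orders" collection is hypothetical): maxTimeMS caps how long the query may run server-side, so a slow or unresponsive shard cannot hang the client indefinitely:
db.orders.find({ status: "pending" }).maxTimeMS(2000); // fail after roughly 2 seconds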
3. Query types and shard involvement
The type of query also affects the behavior when a shard stops or responds slowly:
- Queries on the shard key:
If the query includes the shard key in its conditions, MongoDB routes it directly to one or a few specific shards. If one of those shards does not respond, the query against that shard fails (or is served by another member of that shard's replica set, if one is available).
- Range queries:
For queries that do not constrain the shard key, MongoDB may need to query all shards. If one shard is unavailable or slow, the whole query can become very slow, because every shard must participate and a slow or down shard delays completion.
- Aggregation queries:
Aggregation queries typically involve several shards working together. If one or more participating shards are down or slow, the aggregation is affected and may slow down or fail.
4. High availability and fault tolerance mechanisms for clusters
MongoDB clusters are typically configured with replica sets, which means each shard usually has multiple replicas. A replica set allows MongoDB to fail over when a shard or its primary node fails. Specifically:
- Primary failover: If the primary node of a shard stops responding or crashes, the replica set automatically performs a failover and elects a new primary. Queries are then rerouted to the new primary and the system continues working, possibly with a short delay.
- Read-only replicas: If the primary of a shard is unavailable, read requests can be routed to that shard's secondaries (if reads from secondaries are allowed). In this way, queries can continue to fetch data from replicas even when the primary is down.
5. Query failure or retry mechanism
If a query fails, MongoDB may retry it automatically. For example, MongoDB's client drivers support retryable operations when a shard is temporarily unavailable, which helps high availability by allowing requests to be retried automatically once the shard recovers.
6. How to Handle Slicing Failures in Queries
- failover: In the case of using a replica set, MongoDB automatically performs a failover of the master node and reroutes requests to the new master node. This minimizes query failures.
- Backup and degradation: If a shard is unavailable for an extended period, manual intervention may be required to resolve the issue. In some cases, an application can fall back to backup data or degrade to queries that do not depend on that shard.
7. Summary
- If a shard stops or slows down, MongoDB handles it according to the shard's configuration and its replica set. A query may fail, or be served by another member of that shard's replica set, if part of the shard becomes unavailable.
- If the query involves multiple shards and one of them is unavailable or slow, the query may become very slow or fail with a timeout.
- The use of replica sets improves fault tolerance by allowing data to continue to be fetched from replica nodes in the event of a primary node failure, reducing the impact of the failure.
- The cluster will try to ensure high availability, but in extreme cases, certain queries may not complete due to a shard failure.
25. Can I delete old files from the moveChunk directory?
In MongoDB, the moveChunk operation is used to move chunks between shards. After a chunk migration completes, MongoDB stores the data on the target shard and deletes the old chunk data on the source shard.
Old files in the moveChunk directory:
When MongoDB executes a moveChunk operation, the following steps are involved:
- Data migration: MongoDB migrates the chunk from the source shard to the target shard.
- Source cleanup: After the data migration completes, MongoDB deletes the old chunk data on the source shard.
Deletion of old files:
- Manual deletion is not recommended: MongoDB manages the deletion of these files automatically. After a moveChunk operation completes, the old files on the source shard should be cleaned up automatically. If you see old files left in the file system during a moveChunk, deleting them manually may result in data corruption or other problems.
- Deletion condition: MongoDB automatically deletes the old chunk data on the source shard after the migration completes and the target shard confirms it has received the data. The system disposes of these old files during the cleanup phase that follows the migration.
Why not delete it manually?
- Data consistency issues: Deleting a file manually may compromise its integrity, especially if MongoDB still needs it for certain operations, and can leave the database in an inconsistent or unrecoverable state.
- Replica set synchronization issues: In a replica set environment, consistency between nodes is critical. Manually deleting files may cause problems with replica set synchronization and affect data availability.
- Automatic management: MongoDB manages the deletion of old files automatically. In most cases these files are cleaned up after the migration completes without causing any problems for the cluster.
If the file is not automatically deleted:
If you find that old files remain after a moveChunk operation completes, the likely causes are:
- The migration did not fully complete: Check the MongoDB logs to ensure that the migration was not interrupted and that the data is complete.
- File system issues: In some cases, file system anomalies may prevent MongoDB from deleting the files. You can try to clean them up manually, but make sure no other operations are running in the cluster before doing so.
Conclusion:
- Do not manually delete old files in the moveChunk directory unless you are certain that the migration completed successfully and no other operations are in progress.
- Let MongoDB clean up automatically: If MongoDB fails to clean up old files after a moveChunk operation, checking the logs or restarting the shard node usually resolves the problem.
26. How can I see which connections MongoDB is using?
In MongoDB, to view information about the connections that are currently in use, you can use the following methods:
1. Viewing Connections via the MongoDB shell
In the MongoDB shell, you can use the db.currentOp() method to view current operations and connections. This is a very useful tool for inspecting in-progress operations, connections, and long-running queries that may be causing problems.
db.currentOp()
- db.currentOp(): This command returns a document containing all current operations, including queries, inserts, updates, deletes, and so on. The results include information about each connection, such as the type of operation and how long it has been running.
Example:
db.currentOp({ "active": true }) // list only active operations
This command lists all the operations being performed. You can filter further to see specific connections and operations.
2. Viewing MongoDB Connections
MongoDB maintains a connection pool to handle all connections to clients. If you want to see how many connections are currently established with your MongoDB instance, you can use the following command:
db.serverStatus().connections
- db.serverStatus(): This command returns runtime statistics for the MongoDB instance, including details about connections.
- connections: Returns information about the current connections, including:
- current: the number of currently open connections.
- available: the number of additional connections available.
- totalCreated: the total number of connections created since startup.
3. Using netstat to view system-level connections
You can also use operating system tools such as netstat to view network connections to MongoDB. This shows all network connections at the system level, including TCP connections to MongoDB.
netstat -an | grep 27017
This command displays all connections to MongoDB's default port (27017). From this output you can see connections from different clients.
4. Viewing MongoDB Logs
MongoDB also records information about connections in its log files. You can view the log files to get detailed information about the connection, especially when high loads or connection problems occur.
- Log files are usually located in /var/log/mongodb/, but the exact path depends on the log path configured when MongoDB was installed.
tail -f /var/log/mongodb/
5. Using mongostat to view connections in real time
MongoDB provides a command-line tool, mongostat, which displays the running status of a MongoDB instance in real time, including the number of connections, operation counts, memory usage, and so on.
mongostat --host <your_mongo_host>
mongostat periodically prints various performance metrics, including connection and operation counts, helping you monitor the health of your MongoDB instance.
6. View connection pool information (in client code)
If your application uses MongoDB connection pooling (for example, with the Node.js or Java driver), you can inspect the client connection pool. The exact method depends on the driver you are using.
Example (using the Node.js driver):
const { MongoClient } = require("mongodb");
async function checkConnections() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const adminDb = client.db('admin');
  const stats = await adminDb.command({ serverStatus: 1 });
  console.log(stats.connections);
  await client.close();
}
checkConnections();
Summary:
- Use db.currentOp() to view current operations and connections.
- Use db.serverStatus().connections to view the number of connections to your MongoDB instance.
- Use the netstat command to view system-level connections.
- Check the MongoDB log files for connection information.
- Use the mongostat command to view connections and other performance metrics in real time.
- If you use connection pooling in your code, you can inspect the connection pool status through the driver.
These methods help you monitor and diagnose the connections that MongoDB is currently using.
27. If the moveChunk operation fails, do I need to manually clear some of the transferred documents?
If a moveChunk operation fails, there is usually no need to manually purge partially transferred documents. MongoDB handles partial migrations automatically to ensure data consistency and integrity. Here are some possible scenarios and suggestions for handling them:
1. MongoDB's Auto-Recovery Mechanism
- Atomicity and consistency: The moveChunk operation is atomic: it either succeeds completely or fails completely. If it fails partway through, MongoDB rolls back the changes and ensures that no partially migrated documents remain visible on the source or target shard.
- Rollback mechanism: If a failure occurs during migration (such as a network problem or a shard node going down), MongoDB tries to roll back all changes to the pre-migration state, so partially migrated documents do not affect data consistency.
2. Possible errors and recovery
- Network failure or node downtime: If a network failure occurs or the target shard becomes unavailable during the migration, MongoDB logs the error and the migration fails. MongoDB then attempts to roll back the migrated data so that the source and target shards remain consistent.
- Data consistency issues: If the migration fails, there may briefly be partially migrated documents between the source and target shards, but MongoDB handles these internally. You usually do not need to remove them manually; MongoDB keeps the operation consistent.
3. Check migration status
- View the logs: If you suspect that a moveChunk operation did not complete successfully, check MongoDB's log files (usually under /var/log/mongodb/) for detailed error information.
- Check data integrity: You can run validation queries on the source and target shards to confirm that the data is complete and consistent.
4. Cases of manual intervention
If MongoDB fails to fully recover from a migration operation, or if you find that some of the documents were not handled correctly during the migration process (a relatively rare occurrence), consider the following steps:
- Manual cleanup of inconsistent data: If it is confirmed that the migration failed and MongoDB cannot repair it automatically, you may need to manually inspect the data on the shards and remove leftover documents.
- Re-run the migration: You can try to re-execute the moveChunk operation to ensure that the data is migrated to the correct shard.
5. Strategies to prevent partial migration failures
- Ensure a stable network connection: Ensure stable network connectivity between MongoDB's shard nodes to minimize failures due to network issues during migration.
- Optimize migrations through the sharding policy: Choose a suitable shard key to avoid migration failures caused by hotspot data. Hotspot data increases the load during migration, leading to failures or delays.
6. Summary
In most cases, MongoDB automatically handles moveChunk failures, and there is no need to manually purge partially migrated documents. Manual cleanup is usually only necessary when MongoDB cannot recover automatically. Best practice:
- Check the MongoDB logs to confirm that an error has occurred.
- Migrations can be manually cleaned or re-executed if necessary while ensuring data consistency.
28. If I'm using replication, can I use journaling for some parts and not others?
In MongoDB, journaling is a key feature for ensuring data consistency and durability. MongoDB's replication and journaling are separate but closely related mechanisms; however, journaling cannot be partially enabled or disabled on different nodes of a replica set.
1. Journaling in Replica Sets
- In a replica set, all nodes are expected to use the journal so that data can be recovered after a failure. MongoDB's journaling feature logs write operations, ensuring that data can be recovered after a server crash or power failure. The journal helps MongoDB keep operations atomic and durable, and allows automatic recovery after a system crash.
- Relationship between the journal and replication: In a MongoDB replica set, the primary records all write operations in its oplog and the secondaries replay these operations from it, while each node's journal protects its local writes. In this way, MongoDB ensures data consistency across all nodes.
2. Journaling cannot be selectively disabled.
MongoDB does not support enabling or disabling journaling on only some nodes of a replica set. The journaling mechanism applies to all nodes (primary and secondaries) and cannot be turned off for individual nodes.
- Reasons to keep journaling enabled:
- data consistency: MongoDB uses logging to ensure atomicity of transactions and consistency of data. In a replica set, each node needs to ensure data persistence to prevent data loss or corruption due to node crashes or power outages.
- Fault recovery: Logs help MongoDB recover data after a system crash. Nodes without logs may lose data, leading to data consistency issues.
3. Side Effects of Disabling Journaling
Although MongoDB does not allow journaling to be disabled on only some nodes of a replica set, in some scenarios users may choose to disable journaling entirely to improve performance, especially in development environments where durability and consistency are not a concern. Disabling journaling can improve write performance, but it carries significant risk.
Side effects of disabling journaling:
- data loss: If logging is disabled, data not written to disk will be lost if MongoDB crashes.
- inconsistency: Disabling logging prevents MongoDB from ensuring data consistency and recoverability, which is undesirable in a production environment.
4. Journal Settings
MongoDB allows journaling options to be set at startup. Here are the relevant flags:
- Enable journaling: --journal (enabled by default)
- Disable journaling: --nojournal (only for specific scenarios; generally not recommended in production environments)
mongod --nojournal # Disable the journaling feature
mongod --journal # Enable journaling (default)
5. Summary
- In MongoDB, all nodes in a replica set should have journaling enabled; it cannot be selectively enabled or disabled for individual nodes.
- Disabling the journal is not recommended in production environments, because it sacrifices data durability and consistency and increases the risk of data loss.
- If you want to disable journaling to optimize performance, consider it only for single-node or non-production deployments; in production, keeping journaling enabled is standard practice for data safety and consistency.
29. What happens when you update a document on a chunk that is being migrated?
When updating a document on a Chunk that is being migrated, MongoDB ensures that the operation is atomic and consistent and uses internal mechanisms to handle the situation. The following is a detailed explanation of what happens when you update a document on a Chunk that is being migrated:
1. Overview of the Chunk Migration Process
In MongoDB, sharding splits the data into multiple chunks and distributes them across shards. MongoDB uses the moveChunk operation to move a chunk from one shard to another. This operation runs in the background and is usually transparent.
- Chunk migration is a time-consuming operation because it requires moving data from one shard to another.
- During the migration, MongoDB copies the data between the source and target shards and ensures data consistency.
2. Updating Documents on a Chunk Being Migrated
During the migration process, some documents may still receive update requests. Assuming that a chunk is being migrated from the source shard to the target shard, how does MongoDB handle an update request from an application during this process?
2.1 Locking and coordination
- Updates targeting the migrating chunk: MongoDB coordinates between the source and target shards while the migration is in progress. If an update targets a document in the chunk being migrated, MongoDB uses a locking mechanism to ensure the update is neither lost nor in conflict during the migration.
- Coordination process: Requests are coordinated by the mongos router, which sends each request to the shard that currently owns the chunk based on its routing information. If the chunk has already been handed over, mongos sends the request to the target shard; if the chunk is still owned by the source shard, the request goes to the source shard and the change is carried over to the target shard when the migration completes.
2.2 Impact of update operations
- Source (donor) shard: while the chunk is being migrated, update operations continue to be applied on the source shard, and MongoDB keeps track of the changes made to documents in the migrating chunk.
- Target (recipient) shard: before the migration is committed, the target shard catches up by replaying those tracked changes, so that once it takes ownership of the chunk its documents reflect every update, keeping the data consistent.
2.3 Atomicity Guarantee
MongoDB uses internal distributed coordination and locking to keep each operation atomic even while a migration is running. This means that even if documents are updated during a migration, MongoDB keeps the data consistent and correct; the final hand-over of the chunk happens in a short critical section during which writes to that chunk are briefly blocked.
3. Concurrency handling during migration
During migration, MongoDB takes the following steps to handle concurrent operations:
- Concurrent request control: MongoDB limits concurrent writes to the migrating chunk during the final stage of the migration to avoid write conflicts and data inconsistency.
- Operation log (oplog) synchronization: during the migration, MongoDB uses its operation log to keep the source and target shards synchronized. Even if update operations arrive mid-migration, every change is recorded and applied to the target shard.
4. Failure recovery during migration
If a failure occurs during a migration, such as a node going down or a network problem, MongoDB attempts to roll back the operation and restore data consistency. It will ensure that all operations that were not successfully migrated are re-executed, thus avoiding data loss or inconsistency.
5. Summary
When you update documents on a chunk that is being migrated, MongoDB preserves data consistency through the following mechanisms:
- Locking and coordination to handle concurrent updates.
- Continuing to apply updates on the source (donor) shard and tracking them until the migration completes.
- Replaying the tracked updates on the target (recipient) shard so that nothing is lost.
- Using the oplog and internal transactional machinery to keep the data consistent.
As a result, update operations do not break data consistency during a migration, and concurrent operations are handled correctly.
30. MongoDB builds an index on A:{B,C}, will queries A:{B,C} and A:{C,B} use the index?
In MongoDB, whether an index is used depends on how well the query matches the index. If a compound index {A: 1, B: 1, C: 1} is created, what matters is how the query's fields line up with the index's field order.
1. Field order in a compound index
- A compound index stores its keys in the order of the index's fields. If you create the compound index {A: 1, B: 1, C: 1}, queries are optimized according to that order, so query conditions should follow the index's field order as closely as possible.
- The queries A:{B,C} and A:{C,B} therefore interact with this index differently, because their field order differs.
2. Will the query A:{B,C} use the index?
Suppose the query is {A: <value>, B: <value>, C: <value>}. This query matches the compound index {A: 1, B: 1, C: 1} exactly, so MongoDB uses the index to speed it up.
For example, with the query condition {A: 1, B: 2, C: 3}, MongoDB uses the {A: 1, B: 1, C: 1} index to execute the query, because the index covers every field in the condition.
3. Will the query A:{C,B} use the index?
Suppose the query condition is {A: <value>, C: <value>, B: <value>}. The fields B and C both exist in the index, but the index is organized as {A: 1, B: 1, C: 1}, so how useful it is depends on how the query lines up with that order.
A compound index can only be exploited efficiently in its left-to-right field order. For plain equality predicates, the query planner matches query fields to index fields by name, so merely writing C before B inside the query document does not by itself disqualify the index; but as soon as the query skips B, or needs results sorted as {C: 1, B: 1}, the index beyond the A (or A, B) prefix can no longer be used directly.
4. Prefix rules for indexing
MongoDB will follow aPrefix rules, i.e., the query condition must be derived from the index'sfar left Starts matching. Assuming the index is{A: 1, B: 1, C: 1}
, the following are the rules for query and index matching:
- consult (a document etc)
{A: <value>, B: <value>, C: <value>}
: will match exactly, using the index. - consult (a document etc)
{A: <value>, C: <value>}
: will match{A: 1, B: 1, C: 1}
Indexed, but not providedB
If the indexes are not indexed, MongoDB uses the indexes and scans them.C
。 - consult (a document etc)
{A: <value>, B: <value>}
: Indexes will be used. - consult (a document etc)
{C: <value>, B: <value>}
:will not (act, happen etc)This index is used because it does not start at the leftmost point of the index (i.e., it is not specified that theA
(field).
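A quick way to check these rules is to run explain() against the index. A minimal mongosh sketch, assuming an illustrative collection db.items (the collection name and values are assumptions):
db.items.createIndex({ A: 1, B: 1, C: 1 })

// full match on the index: expect an IXSCAN on A_1_B_1_C_1 in the winning plan
db.items.find({ A: 1, B: 2, C: 3 }).explain("queryPlanner")

// same equality predicates written in a different order: inspect the winning plan
db.items.find({ A: 1, C: 3, B: 2 }).explain("queryPlanner")

// no leftmost field A, so this query cannot use the index (expect a collection scan)
db.items.find({ C: 3, B: 2 }).explain("queryPlanner")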
5. Summary
- The query A:{B,C} uses the {A: 1, B: 1, C: 1} index, because its fields follow the index's order.
- The query A:{C,B}: with pure equality predicates the planner can still match the index by field name, but whenever the fields present (or the required sort order) do not follow the index's left-to-right organization, the {A: 1, B: 1, C: 1} index cannot be fully exploited.
- If you need to support queries or sorts in the order A, C, B, consider creating an additional index {A: 1, C: 1, B: 1}, so that such queries have an index whose field order matches.
31. What happens if a query is initiated when a shard is stopped or very slow?
When a shard stops or becomes very slow, MongoDB relies on its sharded architecture and fault-tolerance mechanisms to keep the system running and to minimize the impact on queries. Here is what happens to a query when a shard is in trouble:
1. Query handling when a shard stops or slows down
- Replica set fault tolerance: in MongoDB, each shard is usually a replica set, which gives the shard high availability. If a shard's primary stops working or becomes very slow, the replica set automatically elects a new primary. Even while the primary is unavailable, secondary nodes can still serve read requests (possibly with some delay). If a query targets this shard, the mongos router can send it to a secondary in the replica set, subject to the configured read preference, so the query does not necessarily fail just because the shard's primary has stalled; a brief read-preference sketch is shown after this list.
- Impact on query routing: if a shard stops completely, the mongos router tries to route the query to other healthy shards. mongos monitors shard status and routes queries only to shards that are online and responding. With multiple shards, a query may return partial data from the other shards, depending on the query type and the data ranges involved.
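One way to keep reads working while a shard's primary is struggling is to relax the read preference. A minimal mongosh sketch; the collection name db.orders is an assumption for illustration:
// allow this query to be served by a secondary of the shard's replica set
// if the primary is slow or unavailable (secondary reads may be slightly stale)
db.orders.find({ status: "A" }).readPref("secondaryPreferred")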
2. Impact of a slow shard
If a particular shard becomes very slow, it can degrade query performance. The exact impact depends on whether the query touches that shard. There are two typical cases:
- Scatter-gather queries: if the query spans multiple shards (for example, an aggregation or lookup across all shards) and one shard is particularly slow, that shard can slow down the entire query, because MongoDB must wait for all relevant shards to finish before merging the results.
- Shard-targeted queries: if the query only involves the slow shard (for example, it targets a range of data that lives on that shard), the response time of that shard increases accordingly, which raises the overall latency of the query. MongoDB keeps waiting for the slow shard to respond until a timeout is reached or the result is returned.
3. Query timeouts
- If a shard responds very slowly, the query may run into timeout issues, especially when a query time limit such as maxTimeMS is set to a short value. A slow shard can cause the query to time out, or to keep waiting on that shard even after the other shards in the cluster have already returned their results. A short maxTimeMS sketch follows below.
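To keep a slow shard from holding a query open indefinitely, a time limit can be attached to the query. A minimal mongosh sketch; db.orders is again an illustrative collection name:
// abort the query with a MaxTimeMSExpired error if it cannot complete within 2 seconds
db.orders.find({ status: "A" }).maxTimeMS(2000)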
4. mongos fault tolerance
- The mongos router chooses query routing based on the health of the cluster. mongos regularly talks to the config servers to obtain the latest cluster metadata. If a shard is unavailable or faulty, mongos avoids routing queries to that shard and, where possible, serves the query from the remaining shards so the request does not fail outright.
- Load balancing: when a shard fails, MongoDB's balancing and routing mechanisms automatically adjust so that traffic shifts to other healthy shards. An overloaded shard may respond more slowly, but as long as the other shards in the cluster are working properly, queries can still proceed.
5. Data loss and consistency
- If a shard stops completely and there are no backups, or the replica set is not configured properly, data loss can occur. In practice MongoDB usually avoids this with replica sets, which keep redundant copies of the data.
- If the query involves data that lives on an unreachable shard, the result will be incomplete: partial data may be returned, or in extreme cases the query may fail.
6. How to mitigate the effects of slow slicing
- Ensure slice equalization: Ensure that the load on the cluster is evenly distributed. If a slice has a high load, it may cause that slice to become slow. Load balancing can be optimized by adjusting the slice key or manually migrating chunks.
-
Monitoring Cluster Status: Using the MongoDB-suppliedMonitoring Tools(e.g.
mongostat
maybemongotop
) to monitor the health of the cluster. If you find a slower response from a slice, you can take timely action to increase hardware resources or optimize queries. - Enhanced replica set configuration: Ensure that each slice has multiple replicas, and in particular configure the replica set for each slice so that even if the master node stalls or fails, the replica nodes can still handle query requests.
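A minimal command-line sketch of the two monitoring tools mentioned above; host and port are assumptions for illustration:
# print cluster-wide operation counters every 5 seconds
mongostat --host localhost --port 27017 5

# print per-collection read/write activity every 5 seconds
mongotop --host localhost --port 27017 5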
7. Summary
- If a shard stops working or is very slow, MongoDB relies on the shard's replica set to keep the data available; if the shard is not completely down, queries can be served by its secondary nodes.
- The mongos router dynamically adjusts query routing to avoid sending queries to unavailable or slow shards, minimizing the impact on queries.
- A slow shard can still increase query latency, especially for queries that span multiple shards. Proper load balancing and monitoring help mitigate these issues.
32. Does MongoDB support stored procedures? If so, how does it work?
MongoDB does not directly support traditional stored procedures. Unlike stored procedures in relational databases (RDBMS), MongoDB is a document database focused on flexible document storage and querying, so it has no server-side "packaged execution" stored-procedure feature like MySQL or SQL Server.
However, MongoDB provides JavaScript support, and embedded scripts together with the aggregation framework can be used to achieve stored-procedure-like functionality. Specifically, MongoDB offers the following ways to handle procedure-like operations:
1. JavaScript execution in MongoDB
MongoDB historically supported executing JavaScript code inside the database, either through the eval method or through mapReduce for more complex operations.
- eval() method: db.eval() could run a piece of JavaScript code that manipulates data in the database. Example (the collection name is illustrative):
db.eval(function() {
  var result = db.myCollection.find().toArray();
  return result;
});
Note: eval() has long been deprecated and is removed in newer MongoDB versions, so avoid using it.
2. MapReduce operations
MongoDB provides MapReduce, which is typically used to aggregate and transform data in a collection. You define a map function that processes each document and a reduce function that aggregates the results.
Example (collection and field names are illustrative):
var mapFunction = function() {
  emit(this.category, 1);   // group by category, emit a count of 1
};
var reduceFunction = function(key, values) {
  return Array.sum(values); // total the counts for each category
};
db.items.mapReduce(mapFunction, reduceFunction, { out: "result" });
This lets you implement custom aggregation logic in MongoDB, but performance is usually worse than the aggregation framework.
3. MongoDB Aggregation Framework
MongoDB provides an aggregation framework that handles complex data-processing tasks such as grouping, sorting, filtering, and transforming. It is more efficient and more powerful than mapReduce and can be used to implement stored-procedure-like business logic, especially on large data sets.
Example (the orders collection and its fields are illustrative):
db.orders.aggregate([
  { $match: { status: "A" } },
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } }
]);
The aggregation framework lets you build complex query logic and run it directly in MongoDB, without separate stored procedures.
4. Transactions
MongoDB supports multi-document transactions, which let you execute multiple operations in a single transaction and guarantee their atomicity. This is not the same as a stored procedure in a traditional database, but it can be used to wrap complex multi-document processing logic in one transactional unit.
Example (Node.js driver style; assumes client and db are an already-connected MongoClient and Db, and the collection names are illustrative):
const session = client.startSession();
session.startTransaction();
try {
  await db.collection('orders').updateOne({ _id: 1 }, { $set: { status: 'A' } }, { session });
  await db.collection('inventory').insertOne({ item: 'ABC', qty: 100 }, { session });
  await session.commitTransaction();
} catch (error) {
  await session.abortTransaction();
} finally {
  await session.endSession();
}
With transactions, you can execute several operations together, much like a stored procedure, and guarantee their atomicity.
5. Custom JavaScript scripts
If you need to run complex business logic, MongoDB lets you store JavaScript functions in the database and call them from your application, either as client-side scripts or as server-side stored functions. Server-side functions live in the system.js collection; this mechanism is legacy and deprecated in modern MongoDB versions.
Example: store a JavaScript function in system.js and execute it from a legacy mongo shell:
db.system.js.insertOne({
  _id: "myFunction",
  value: new Code("function(a, b) { return a + b; }")
});
db.loadServerScripts(); // loads the stored functions into the shell
myFunction(5, 10);      // returns 15
6. Other alternatives
You can also use MongoDB Change Streams to listen for data changes and trigger actions when they occur, implementing stored-procedure-like behavior at the application level. For example, when certain data is updated, you can automatically run follow-up logic (such as calling an external API or updating other data); a brief sketch follows below.
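A minimal mongosh-style sketch of a change stream, assuming an illustrative collection db.orders; the loop blocks and processes each change as it arrives:
// open a change stream that only reports update operations on the collection
const changeStream = db.orders.watch([{ $match: { operationType: "update" } }]);
while (changeStream.hasNext()) {
  const change = changeStream.next();
  printjson(change.documentKey); // trigger follow-up logic here, e.g. call an external API
}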
Summary
Although MongoDB does not support stored procedures in the traditional sense, it provides a variety of ways (e.g., JavaScript execution, MapReduce, aggregation frameworks, transactions, etc.) to implement complex data processing and manipulation logic. Therefore, you can choose the appropriate way to implement stored procedure-like functionality according to your business needs.
33. How do you understand the GridFS mechanism in MongoDB and why does MongoDB use GridFS to store files?
In MongoDB, GridFS is a mechanism for storing and retrieving large files (such as audio, video, images, and documents). Since MongoDB is not suited to storing large files directly in a single document (documents are limited to 16MB), GridFS was introduced to store and manage large files as many small data blocks.
How the GridFS mechanism works
GridFS splits a large file into several chunks (data blocks) and stores them in MongoDB collections. Each chunk is typically 255KB (the default), which keeps every block small enough to store and handle easily.
Core components of GridFS:
- The fs.chunks collection: stores the actual file data. Each chunk holds a portion of the file and contains the following fields:
  - files_id: the ID of the file the chunk belongs to.
  - n: the position of the chunk within the file.
  - data: the actual bytes of the chunk.
  For example, a file may be split into several 255KB chunks; every chunk of the file shares the same files_id, but each has a different n so the file can be reassembled in order.
- The fs.files collection: stores the metadata about each file. Every file has one entry in this collection recording its ID, file name, upload date, size, and other meta information; this collection is what basic file operations (looking up file information, retrieving a file, and so on) work against. An fs.files document typically contains the following fields:
  - _id: the unique identifier of the file.
  - length: the total size of the file.
  - chunkSize: the size of each chunk.
  - uploadDate: the date the file was uploaded.
  - filename: the file name.
  - metadata: additional metadata for the file (e.g., file type, author).
How GridFS stores files
When you upload a large file to MongoDB, GridFS will:
- Split the file into multiple chunks (255KB each by default) and store these chunks in the fs.chunks collection.
- Store the file's metadata (file name, size, upload time, etc.) in the fs.files collection.
- Associate every chunk with the file's metadata through the files_id field.
The command-line sketch below shows the same flow using the mongofiles tool.
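A minimal sketch using the mongofiles tool that ships with the MongoDB database tools; the host, database, and file name are assumptions for illustration:
# upload a local file into GridFS (fs.files / fs.chunks) of the mydb database
mongofiles --host localhost --port 27017 --db mydb put example.txt

# list the files currently stored in GridFS
mongofiles --host localhost --port 27017 --db mydb list

# download the file back to the local disk
mongofiles --host localhost --port 27017 --db mydb get example.txt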
Why does MongoDB use GridFS to store files?
MongoDB uses GridFS to store files, primarily to overcome several limitations:
1. 16MB Document size limit
MongoDB can store at most 16MB of data in a single document. Because many files (such as video, audio, or high-resolution images) are far larger than that, GridFS splits them into multiple smaller chunks, stores each chunk individually, and uses files_id to associate the chunks with the original file.
2. Supports large file storage
GridFS splits files into smaller chunks, allowing MongoDB to store files of any size. Each block of data can be stored as a separate document in MongoDB, avoiding performance problems caused by a single file being too large.
3. Easy retrieval
GridFS provides a structured way to store and retrieve large files. Each file gets a unique _id, and every chunk of the file can be looked up by its files_id. You can retrieve the whole file by its ID, just like a normal MongoDB query.
4. Provide document metadata support
GridFS stores not only the file's content but also its metadata (e.g., file name, upload time, size), which makes file management more efficient and flexible. The metadata lives in the fs.files collection, where it is easy to query and manage; a short query sketch follows below.
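A minimal mongosh sketch for inspecting that metadata; the file name example.txt is an assumption for illustration:
// look up a file's metadata in the default fs.files collection
db.fs.files.find({ filename: "example.txt" }, { filename: 1, length: 1, uploadDate: 1 })

// count how many chunks the file was split into in fs.chunks
const f = db.fs.files.findOne({ filename: "example.txt" })
db.fs.chunks.countDocuments({ files_id: f._id })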
5. Distributed storage and replication
Files and blocks stored by GridFS follow MongoDB's distributed architecture. Files and blocks are distributed and replicated across a MongoDB cluster, which improves the availability, reliability, and scalability of file storage. You can utilize MongoDB's replication feature (Replication) to ensure redundant backups of files.
6. Load files on demand
GridFS supports loading file chunks on demand. When you request a file, MongoDB fetches the corresponding chunks from the fs.chunks collection and assembles them into the complete file. Loading on demand reduces memory and storage pressure, which suits the handling of large files.
Storing Files with GridFS
The following is an example of using MongoDB's GridFS from the Node.js driver to store and read files (connection details and file names are illustrative):
Store the file:
// using MongoDB's GridFS from the Node.js driver
const { MongoClient, GridFSBucket } = require('mongodb');
const fs = require('fs');

// connect to the MongoDB deployment
async function storeFile() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const db = client.db('mydb');
  const bucket = new GridFSBucket(db, { bucketName: 'myfiles' });

  // read the local file and upload it into GridFS
  const uploadStream = bucket.openUploadStream('example.txt');
  fs.createReadStream('example.txt').pipe(uploadStream);
  uploadStream.on('finish', () => console.log('File uploaded successfully!'));
}

storeFile();
Read the file:
const { MongoClient, GridFSBucket } = require('mongodb');
const fs = require('fs');

// connect to the MongoDB deployment
async function readFile(fileId) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const db = client.db('mydb');
  const bucket = new GridFSBucket(db, { bucketName: 'myfiles' });

  // read the file back from GridFS and write it to disk
  const downloadStream = bucket.openDownloadStream(fileId);
  downloadStream.pipe(fs.createWriteStream('downloaded_example.txt'));
  downloadStream.on('end', () => console.log('File downloaded successfully!'));
}

readFile('some-file-id'); // pass the file's _id (an ObjectId)
Summary
- GridFS is a mechanism provided by MongoDB for storing large files, which overcomes MongoDB's 16MB document size limitation by splitting the file into multiple chunks and storing them in different documents.
- It is characterized by high availability, easy management and retrieval, and is suitable for storing large files such as audio and video.
- With GridFS, MongoDB can manage large files like ordinary data and provide support for file metadata, making file storage more efficient and flexible.
Finally
The above is V Brother's collection of MongoDB interview questions. Corrections are welcome wherever something is wrong; follow "Vigo loves programming" for more.