Important premise: the GitLab data disk must still be readable, and the data under /var/opt/gitlab/git-data/repositories must be fully copyable.
Recovering project code becomes critical when a GitLab server goes down unexpectedly and no backup exists. The process below recovers the code directly from the repository data on disk, which is simpler and faster than the traditional method.
1. Data copying and preparation
- Mount the data disk
  Mount the data disk of the failed server on another running host or server, and make sure that everything under /var/opt/gitlab/git-data can be read and copied from there.
      sudo mount /dev/sdX /mnt/data  # sample mount command; adjust the device and mount point to your setup
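- Copy data
  Copy the entire contents of /var/opt/gitlab/git-data to a directory on the new host, for example /mnt/recovery.
      sudo cp -r /mnt/data/var/opt/gitlab/git-data /mnt/recovery/
  If /mnt/recovery already exists, this command places the data in /mnt/recovery/git-data. The later examples use /mnt/recovery/repositories, so adjust the paths to match your layout.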
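Since the whole procedure depends on the copy being complete, it can be worth checking it before going further. The following is a minimal sketch, not part of the original procedure: it compares the relative file lists and file sizes of two directory trees. The function names `list_files` and `compare_copies` are hypothetical, and matching sizes only suggest, rather than prove, identical content.

```python
from pathlib import Path
from typing import Dict, List


def list_files(root: Path) -> Dict[str, int]:
    """Map each file's path relative to root to its size in bytes."""
    return {
        str(p.relative_to(root)): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_copies(source: Path, copy: Path) -> List[str]:
    """Return relative paths that are missing from the copy or differ in size."""
    src_files = list_files(source)
    dst_files = list_files(copy)
    return sorted(
        rel for rel, size in src_files.items()
        if dst_files.get(rel) != size
    )


# Example call (paths are assumptions based on the sample commands above):
# compare_copies(Path("/mnt/data/var/opt/gitlab/git-data"),
#                Path("/mnt/recovery/git-data"))
```

An empty result means every file in the source tree has a counterpart of the same size in the copy.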
2. Identify project data
GitLab stores project data under /var/opt/gitlab/git-data/repositories/@hashed. The folder names in this directory are hashes, so the project a folder belongs to cannot be identified from its name. However, the config file inside each project folder (a directory whose name ends in .git) stores project information, from which you can extract the repository owner and repository name.
Note: the wiki and design repository folders (those ending in .wiki.git and .design.git) can usually be ignored, because they do not contain important code data and their config files do not contain the repository owner and repository name.
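To see which project each hashed folder belongs to before cloning anything, the config files can be read in bulk. The sketch below assumes, as the recovery script later in this article does, that the project path is stored under a `gitlab` section with a `fullpath` key. The function names `read_full_path` and `map_hashed_repos` are hypothetical.

```python
import configparser
from pathlib import Path
from typing import Dict, Optional


def read_full_path(config_path: Path) -> Optional[str]:
    """Return the gitlab.fullpath value from a repository config, or None."""
    # strict=False tolerates repeated keys; interpolation=None keeps '%' literal.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read(config_path)  # a missing file is silently ignored
    return parser.get("gitlab", "fullpath", fallback=None)


def map_hashed_repos(hashed_dir: Path) -> Dict[str, str]:
    """Map each hashed *.git folder to the project path found in its config."""
    mapping = {}
    for repo in sorted(hashed_dir.rglob("*.git")):
        full_path = read_full_path(repo / "config")
        if full_path:  # folders without a fullpath (e.g. wiki, design) are skipped
            mapping[str(repo.relative_to(hashed_dir))] = full_path
    return mapping


# Example call (the path is an assumption; use your own recovery directory):
# for folder, project in map_hashed_repos(Path("/mnt/recovery/repositories/@hashed")).items():
#     print(folder, "->", project)
```

Printing this mapping first gives a quick inventory of what can be recovered and makes it easy to spot folders whose config lacks the expected key.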
3. Simplify recovery methods
Traditional recovery methods usually require building a new GitLab server and mirroring data, but this method has the following problems:
- You need to ensure that the GitLab versions of new and old servers are exactly the same, otherwise the data may not be mirrored correctly.
- The operation steps are cumbersome, time-consuming and error-prone.
In fact, we can use a simpler method to directly recover code without building a new server.
Taking the project folder 73/47/ as an example, the specific steps are as follows:
- Mark the directory as safe
  Git may treat the copied GitLab project directory as unsafe (for example, because it is owned by a different user), so add it to the safe.directory list with the following command:
      git config --global --add safe.directory /mnt/recovery/repositories/@hashed/73/47/
- Clone the project
  As mentioned above, the config file stores the full repository owner and repository name (e.g. author/project_name). The project can be restored to a local directory by cloning. Assuming the target project path is your_clone_dir/author/project_name, run the following command to clone it:
      git clone /mnt/recovery/repositories/@hashed/73/47/ your_clone_dir/author/project_name
4. Automatic recovery script
To simplify this further, the following Python script performs the steps above for every repository. You only need to provide the source directory containing the hashed repositories and the target directory for the clones.
#!/usr/bin/env python
# -*-coding:utf-8 -*-
# =======================================================================================================
# Copyright (c) 2025 laugh12321 Authors. All Rights Reserved.
#
# Licensed under the GNU General Public License v3.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# /licenses/gpl-3.
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the license.
# =======================================================================================================
# File : hashed_repo_cloner.py
# Version: 1.0
# Author : laugh12321
# Contact : laugh12321@
# Date : 2025/03/31 14:51:38
# Desc : None
# =======================================================================================================
from pathlib import Path
import configparser
import subprocess
import argparse
from typing import Optional
from rich.progress import track  # third-party progress bar: pip install rich
import sys


def extract_repo_name_from_config(config_path: Path) -> str:
    """
    Extract the full repository path from a Git configuration file
    :param config_path: Git configuration file path
    :return: the full path of the repository (e.g. author/project_name)
    :raises ValueError: if the configuration is missing the gitlab section or fullpath key
    :raises FileNotFoundError: if the configuration file does not exist
    """
    if not config_path.is_file():
        raise FileNotFoundError(f"Git config file not found: {config_path}")
    config = configparser.ConfigParser()
    config.read(config_path)
    if 'gitlab' not in config or 'fullpath' not in config['gitlab']:
        raise ValueError(f"Config file missing required gitlab section or fullpath key: {config_path}")
    return config.get('gitlab', 'fullpath')


def add_safe_directory(git_dir: Path) -> None:
    """
    Add a Git directory to the safe.directory list
    :param git_dir: Git repository path
    """
    subprocess.run(
        ["git", "config", "--global", "--add", "safe.directory", str(git_dir)],
        check=True,
        stdout=subprocess.DEVNULL,  # Redirect stdout to /dev/null
        stderr=subprocess.DEVNULL   # Redirect stderr to /dev/null
    )


def clone_repository(source_dir: Path, target_dir: Path, repo_name: str) -> None:
    """
    Clone the repository to the target directory
    :param source_dir: source Git repository path
    :param target_dir: target directory path
    :param repo_name: repository name
    """
    target_path = target_dir / repo_name
    subprocess.run(
        ["git", "clone", str(source_dir), str(target_path)],
        check=True,
        stdout=subprocess.DEVNULL,  # Redirect stdout to /dev/null
        stderr=subprocess.DEVNULL   # Redirect stderr to /dev/null
    )


def process_git_repositories(hashed_repos_dir: Path, output_dir: Path) -> None:
    """
    Process all hashed Git repositories and clone them to the output directory
    :param hashed_repos_dir: directory containing the hashed repositories
    :param output_dir: output directory
    """
    # Pre-filter .git directories, excluding wiki and design repositories
    git_folders = [
        folder for folder in hashed_repos_dir.rglob("*.git")
        if not folder.name.endswith((".wiki.git", ".design.git"))
    ]
    if not git_folders:
        print("No valid Git repositories found to process.")
        return
    for git_folder in track(git_folders, description="Processing repositories"):
        config_path = git_folder / "config"
        try:
            repo_name = extract_repo_name_from_config(config_path)
            add_safe_directory(git_folder)
            clone_repository(git_folder, output_dir, repo_name)
        except Exception as e:
            print(f"Error processing {git_folder.name}: {e}")
            sys.exit(1)  # Terminate the program with a non-zero exit code


def validate_directory(path: Optional[str]) -> Path:
    """
    Verify a path string and convert it to a Path object
    :param path: path string
    :return: Path object
    :raises ValueError: if the path does not exist or is not a directory
    """
    if path is None:
        raise ValueError("Path cannot be None")
    path_obj = Path(path)
    if not path_obj.exists():
        raise ValueError(f"Path does not exist: {path}")
    if not path_obj.is_dir():
        raise ValueError(f"Path is not a directory: {path}")
    return path_obj


def parse_arguments():
    """
    Parse command line arguments
    :return: namespace containing the arguments
    """
    parser = argparse.ArgumentParser(
        description="Clone the GitLab hashed repositories to the target directory",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument(
        "--source",
        type=str,
        required=True,
        help="Source directory containing the hashed repositories (required)"
    )
    parser.add_argument(
        "--output",
        type=str,
        required=True,
        help="Target directory for the cloned repositories (required)"
    )
    return parser.parse_args()


def main():
    args = parse_arguments()
    try:
        source_dir = validate_directory(args.source)
        output_dir = Path(args.output)
        process_git_repositories(source_dir, output_dir)
    except ValueError as e:
        print(f"Argument error: {e}")
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
How to use
Run the following command to start the script:
python hashed_repo_cloner.py --source gitlab_hashed_dir --output project_out_dir
5. Follow-up operations
- Verify recovery results
  Go to the cloned project directory and check that the code is complete and that all branches and commit history were restored correctly.
      cd project_out_dir/author/project_name
      git log        # view commit history
      git branch -a  # view all branches
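- Rehost to GitLab or another platform
  To host the recovered code on GitLab or another code hosting platform:
  - Create a new repository on the target platform.
  - Point the local clone at the new repository and push. git clone has already created a remote named origin that points at the hashed directory, so update its URL instead of adding it again:
        git remote set-url origin <URL of the new repository>
        git push -u origin --all
        git push -u origin --tags
    Note that --all pushes only local branches, and a fresh clone has only the default branch as a local branch. Check out any other branches you need (listed as remotes/origin/... by git branch -a) before pushing.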
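The branch check from the verification step can also be automated. The sketch below is an illustration, not part of the original script: on a fresh clone, before its remote is changed, it compares the branches in the hashed source repository with the remote-tracking branches of the clone, using `git for-each-ref`. The function names `list_refs` and `missing_branches` are hypothetical. It assumes git is on the PATH and that the source directory has already been marked as safe.

```python
import subprocess
from pathlib import Path
from typing import Set


def list_refs(repo: Path, prefix: str, strip: int) -> Set[str]:
    """List ref names under a prefix, dropping the first `strip` path components."""
    result = subprocess.run(
        ["git", "-C", str(repo), "for-each-ref",
         f"--format=%(refname:lstrip={strip})", prefix],
        check=True, capture_output=True, text=True,
    )
    return {line for line in result.stdout.splitlines() if line}


def missing_branches(source_repo: Path, clone_dir: Path) -> Set[str]:
    """Return branches of the source repository that the clone does not track."""
    source = list_refs(source_repo, "refs/heads", 2)
    # A fresh clone tracks source branches as refs/remotes/origin/<branch>.
    cloned = list_refs(clone_dir, "refs/remotes/origin", 3)
    return source - cloned


# Example call (paths are placeholders; substitute your own):
# missing_branches(Path("<hashed repository folder>"),
#                  Path("project_out_dir/author/project_name"))
```

An empty set means every branch in the source repository is present in the clone as a remote-tracking branch.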
With this method there is no need to build a new server or worry about version compatibility, and the GitLab project code can be restored quickly.