Giving an old GitLab a new lease on life: how to implement newer-version API functionality on an old GitLab version

Preface: This article records the pitfalls we hit, and the solutions we found, while implementing in-line comments on an old GitLab version (v3 API). In principle, any API that an old GitLab lacks can be implemented the same way, as long as the operation can be performed on the web page and the underlying requests can be captured, which gives an old GitLab a new lease on life.

Background

We recently put AI code review (AICR) into practice. When a merge request (MR) is created in the CI/CD pipeline, AICR runs on the MR's diff and the issues it finds are submitted as a comment, as shown below:

We wanted to improve the experience further: in addition to the overall comment, each issue should also be inserted as an in-line comment on the corresponding line of the code, so that developers doing code review (CR) see the code and the issue together, without switching back and forth between the Discussion list and the Changes panel.

However, for historical reasons we still run two GitLab instances, an old one and a new one. The old instance is on a low version (8.16, v3 API) and has no ready-made in-line comment API. The rest of this article describes, in two phases, how we implemented in-line comments on the old GitLab and how we handled several special scenarios to improve the user experience.

Phase I (initial implementation)

Requirement:

Attach each issue as an in-line comment in the Changes panel at the line number returned by the AI, so that developers can see the relevant issues while reviewing the code. The desired effect is shown in the figure below:

Problem:

Old GitLab versions do not have an in-line comment API.

Solution:

Capture the requests that the web page sends for this operation, then simulate a logged-in session, obtain all the required parameters, and construct an in-line comment request equivalent to the one the web page sends.

Solution Process:

1) Interface analysis

Capturing the in-line comment action on the web page shows that it sends a request to the `notes` endpoint. After analyzing and replaying that request, we found that the following mandatory parameters have to be obtained dynamically:

merge_request_diff_head_sha (head_sha)
target_id (unique ID of the Merge Request)
note[noteable_id] (same value as target_id)
note[position] (old/new file paths, base_sha, start_sha, head_sha, new_line, old_line, etc.; new_line and old_line caused the most trouble, as described below)
note[note] (content of the comment)
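
As a minimal sketch of the note[position] parameter listed above: the captured request carries it as a JSON string, so it can be assembled as a dictionary and serialized. The helper name `build_position` and all sample values are placeholders for illustration, not part of the original code.

```python
import json


def build_position(old_path, new_path, base_sha, start_sha, head_sha,
                   new_line=None, old_line=None):
    """Serialize the diff position as a JSON string for the note[position] field."""
    position = {
        "old_path": old_path,
        "new_path": new_path,
        "old_line": old_line,  # None is serialized as JSON null
        "new_line": new_line,
        "base_sha": base_sha,
        "start_sha": start_sha,
        "head_sha": head_sha,
    }
    return json.dumps(position)


# Placeholder values, for illustration only.
position_json = build_position(
    old_path="app/models/user.rb",
    new_path="app/models/user.rb",
    base_sha="aaa111",
    start_sha="aaa111",
    head_sha="bbb222",
    new_line=55,
)
print(position_json)
```

Keeping the serialization in one place makes it harder to forget a field such as old_line when the payload is built.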

2) Parameter acquisition

Parameters such as base_sha, start_sha and head_sha can be obtained from the `/projects/:id/merge_requests/:merge_request_id/versions` interface.
Parameters such as old_path and new_path can be obtained from the `/projects/:id/merge_requests/:merge_request_id/changes` interface.
authenticity_token has to be taken from the page on which the operation is performed: request that page, then parse it (for example with BeautifulSoup).
A logged-in state also has to be constructed: log in with a privileged account, save the session, and reuse that session (a `requests.Session()` in the code below) for the subsequent requests.
new_line: at this stage we used the line number provided by the AI directly. old_line was not passed, because the test scenario was purely new code, where omitting old_line still succeeds. Only later did we realize this was a pitfall.
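
The token extraction described above can be sketched without third-party dependencies. This example deliberately swaps in the standard-library `HTMLParser` for BeautifulSoup so that it runs on its own; the class name, function name, and sample page are hypothetical.

```python
from html.parser import HTMLParser


class AuthTokenParser(HTMLParser):
    """Collects the value of the first <input name="authenticity_token">."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (self.token is None and tag == "input"
                and attributes.get("name") == "authenticity_token"):
            self.token = attributes.get("value")


def extract_authenticity_token(page_html):
    """Return the token found in page_html, or None if the page has none."""
    parser = AuthTokenParser()
    parser.feed(page_html)
    return parser.token


# Hypothetical page fragment, for illustration only.
sample_page = """
<form action="/group/project/notes" method="post">
  <input name="utf8" type="hidden" value="&#x2713;">
  <input type="hidden" name="authenticity_token" value="abc123==">
  <textarea name="note[note]"></textarea>
</form>
"""
print(extract_authenticity_token(sample_page))
```

Returning None when no token is found lets the caller detect an expired session (for example a redirect to the login page) before sending the comment request.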

3) Request construction

Assemble these parameters into the structure the page uses, producing an in-line comment request equivalent to the page action.

This completes the basic in-line comment functionality.

Main Code:

in-line comment request

    # Assumed module-level names (not shown in the original): json, BeautifulSoup (from bs4)
    # and the project's settings module.
    def gitlab_v3_in_line_comment(self, project_id, mr_id, mr_iid, comment, file_name, old_line, new_line):
        '''Old GitLab versions have no ready-made in-line comment API, so we take a detour through
        the request captured from the web page (this requires a simulated login state and the
        related parameters).
        comment: the content of the comment, i.e. the description of the code issue
        file_name: the file name returned by the AI
        old_line / new_line: the old/new line numbers to comment on
        '''
        # Get the GitLab session
        session = self.gitlab_v3_get_login_session()
        # Assumed setting name: the base URL of the old GitLab is missing in the original
        base_url = settings.GITLAB_URL
        # Step 3: extract the authenticity_token of the MR page (the login page's token cannot be reused)
        path_with_namespace = self.gitlab_get_project_by_id(project_id).get('path_with_namespace', '')
        comment_page = session.get(f"{base_url}/{path_with_namespace}/merge_requests/{mr_iid}")
        soup = BeautifulSoup(comment_page.text, "html.parser")
        authenticity_token = soup.find("input", {"name": "authenticity_token"})["value"]

        # Step 4: get the MR-related information
        base_sha, start_sha, head_sha = self.gitlab_v3_get_mr_diff_versions(project_id, mr_id)  # diff version info of the MR
        # Look up old_path and new_path by the file name returned by the AI
        result = self.gitlab_v3_get_single_mr_changes(project_id, mr_id, file_name)
        if result:
            old_path = result[0].get('old_path', '')
            new_path = result[0].get('new_path', '')
        else:
            old_path = new_path = file_name

        # Step 5: construct the comment request
        comment_url = f"{base_url}/{path_with_namespace}/notes"
        comment_payload = {
            "utf8": "✓",
            "authenticity_token": authenticity_token,
            "view": "inline",
            "line_type": "",    # "new" for added code, empty for existing code (empty works in both cases)
            "merge_request_diff_head_sha": head_sha,
            "target_type": "merge_request",
            "target_id": f"{mr_id}",  # unique ID of the Merge Request
            "note[commit_id]": "",
            # "note[line_code]": "25d1157f6eee34be77947760c3d83e1f34efeb31_235_238",    # page anchor id_oldline_newline, optional
            "note[noteable_id]": f"{mr_id}",   # required; same value as target_id
            "note[noteable_type]": "MergeRequest",
            "note[type]": "DiffNote",
            # Example: "note[position]": '{"old_path":"","new_path":"","new_line":55,"base_sha":"29f230c853e957e049c7ef3cf8ba7435f82479ef","start_sha":"29f230c853e957e049c7ef3cf8ba7435f82479ef","head_sha":"3160984ed116026742a911ec7d8cf332e4fd4c3c"}',
            "note[position]": json.dumps({
                "old_path": old_path,   # old file path, from gitlab_v3_get_single_mr_changes
                "new_path": new_path,   # new file path, same source
                "old_line": old_line,   # if the old and new line numbers differ, both must be passed or the request fails (taken from the parsed page; omitting it was the Phase I pitfall)
                "new_line": new_line,  # for newly added code this alone is enough; old_line may be None
                "base_sha": base_sha,   # diff version info of the MR, from gitlab_v3_get_mr_diff_versions
                "start_sha": start_sha,  # obtained the same way as base_sha
                "head_sha": head_sha  # obtained the same way as base_sha
            }),
            "note[note]": comment,   # the comment content
            "commit": "Comment"
        }

        # Step 6: send the comment request
        response = session.post(comment_url, data=comment_payload)
        return response

GitLab session handling

    # Assumed module-level names (not shown in the original): requests, BeautifulSoup (from bs4),
    # the project's settings module, a cache object with get/set, and a logger.
    def gitlab_v3_get_login_session(self):
        '''
        GitLab login state.
        If valid session data is in the cache, rebuild the session from it.
        Otherwise log in again, then cache the new session data.
        '''
        SESSION_CACHE_KEY = "gitlab_session_for_AICR"
        SESSION_TIMEOUT = 3600 * 8

        # Reuse the authenticated session if it exists in the cache
        session_data = cache.get(SESSION_CACHE_KEY)
        if session_data:
            # Rebuild the session from the cached cookies and headers
            session = requests.Session()
            session.cookies.update(session_data.get("cookies", {}))
            session.headers.update(session_data.get("headers", {}))
            return session

        # Not cached: log in and cache again
        # Step 1: get the login page and extract authenticity_token
        login_url = f"{settings.GITLAB_URL}/users/sign_in"  # GITLAB_URL is an assumed setting name
        session = requests.Session()
        login_page = session.get(login_url)
        soup = BeautifulSoup(login_page.text, "html.parser")
        auth_token_input = soup.find('input', {'name': 'authenticity_token', 'type': 'hidden'})
        if auth_token_input:
            authenticity_token = auth_token_input['value']
        else:
            # No token on the login page; callers must handle the empty result
            return ''

        # Step 2: log in
        login_payload = {
            "user[login]": settings.AICR_USER,  # AI-CodeReviewer username
            "user[password]": settings.AICR_PASS,  # AI-CodeReviewer password
            "authenticity_token": authenticity_token  # authenticity_token of the login page
        }
        response = session.post(login_url, data=login_payload)
        # Check whether login succeeded (based on the status code and the returned page)
        if response.status_code != 200 or "Sign in" in response.text:
            logger.error("gitlab Login failed. Please check credentials.")
        else:
            # Cache the session (session objects cannot be serialized directly, so cookies and headers are extracted for later reconstruction)
            session_data = {
                "cookies": session.cookies.get_dict(),
                "headers": dict(session.headers)
            }
            cache.set(SESSION_CACHE_KEY, session_data, timeout=SESSION_TIMEOUT)
        return session
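
The caching pattern used above (rebuild from cached cookies and headers on a hit, log in only on a miss) can be isolated in a small dependency-free sketch. `InMemoryCache`, `load_session_data`, and `fake_login` are hypothetical stand-ins for the real cache backend and login request; only plain dictionaries are cached, since session objects cannot be serialized directly.

```python
import time


class InMemoryCache:
    """Minimal stand-in for the real cache backend (illustration only)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: behave as a cache miss
            return None
        return value

    def set(self, key, value, timeout):
        self._store[key] = (value, time.monotonic() + timeout)


def load_session_data(cache, login, key="gitlab_session_for_AICR", timeout=3600 * 8):
    """Return cached session data, calling login() only on a cache miss."""
    session_data = cache.get(key)
    if session_data:
        return session_data
    session_data = login()  # expected to return {"cookies": {...}, "headers": {...}} or None
    if session_data:
        cache.set(key, session_data, timeout=timeout)
    return session_data


login_calls = []


def fake_login():
    # Hypothetical login that records each call and returns serializable data.
    login_calls.append(1)
    return {"cookies": {"_gitlab_session": "placeholder"}, "headers": {}}


cache = InMemoryCache()
first = load_session_data(cache, fake_login)
second = load_session_data(cache, fake_login)
print(len(login_calls))
```

The second call is served from the cache, so the login function runs only once within the timeout window.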

Phase II (handling special scenarios)

Problem:

After the Phase I functionality had been running in production for some time, we found that many AICR issues were not successfully submitted as in-line comments. Analysis showed two main causes:

The line number returned by the AI is collapsed (hidden) in the GitLab MR Changes panel. By default such a line cannot be selected on the web page, so naturally a comment cannot be submitted on it.
For changed code, old_line and new_line can differ. In that case both the old and the new line number must be passed for the comment to be added successfully, and these line numbers can only be obtained from the page; there is no corresponding API.

Expected behavior:

To address these problems, we wanted to achieve the following to improve the user experience:

1) When the line number returned by the AI is collapsed in the MR Changes panel, attach the AICR issue to the nearest displayed line, as shown below:

2) When old_line and new_line differ for changed code, use the actual old/new line numbers displayed on the page to construct the request.

Solution:

1) Using the logged-in session, request the interface that returns the HTML source of all diff files

2) Parse the HTML from step 1 with BeautifulSoup, find the div of the corresponding file, and collect the data-position information of all its lines

3) Build line_mapping_dict, which is used to check whether an incoming line number is valid and to apply the fallback handling

Solution Process:

1) Page analysis

In the MR Changes panel each file is one large div block in which every line of code corresponds to a tr, and every tr carries the line number information. We only need to parse the page and collect those line numbers.

There was a small detour along the way. In practice we found that for rows where the MR page does not display a line number, the corresponding line number parameter of the in-line comment must be passed as null; passing the underlying line number found in the <tr> tag causes an error. Looking further for a pattern on the page, we found that the data-position attribute of the <td> tag fits the need (for a line number the page does not display, its value is null), so we adjusted the parsing logic accordingly.

2) Dynamic interface information acquisition and parsing

We also found that the MR Changes page is not purely static: the code diff is loaded dynamically through an interface. The diff content therefore cannot be obtained by requesting the MR URL directly; the corresponding interface has to be called and its response parsed. Fortunately, the content it returns is also HTML, so BeautifulSoup can process it quickly.
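
To illustrate the two observations above, here is a minimal sketch that unwraps the HTML from a JSON response and reads the data-position attribute of each <td>. It uses the standard-library `HTMLParser` instead of BeautifulSoup so that it runs without dependencies. The response body and markup are hypothetical samples that only mimic the shape described in this article, not captured GitLab output.

```python
import json
from html.parser import HTMLParser


class DiffPositionParser(HTMLParser):
    """Collects the decoded data-position of every <td> that has one."""

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        raw = dict(attrs).get("data-position")
        if raw:
            # HTMLParser has already unescaped &quot; in the attribute value.
            self.positions.append(json.loads(raw))


# Hypothetical response: a JSON envelope whose "html" field holds the diff markup.
sample_response = json.dumps({
    "html": (
        '<table>'
        '<tr><td class="line_content noteable_line" '
        'data-position="{&quot;old_line&quot;:357,&quot;new_line&quot;:360}">unchanged</td></tr>'
        '<tr><td class="line_content new noteable_line" '
        'data-position="{&quot;old_line&quot;:null,&quot;new_line&quot;:361}">added</td></tr>'
        '</table>'
    )
})

html_fragment = json.loads(sample_response).get("html", "")
parser = DiffPositionParser()
parser.feed(html_fragment)
line_pairs = [(p.get("old_line"), p.get("new_line")) for p in parser.positions]
print(line_pairs)
```

The added row yields None for old_line, which is the value that has to be sent as null in the comment request.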

3) Determine the valid new/old line number based on the line number given by the AI

The old/new line numbers displayed on the page are stored for each file, and the line number given by the AI is then validated against them:

If the given line number is not displayed, return the old/new line numbers of the nearest displayed line
If it is displayed, return the corresponding old/new line numbers directly

After working through the various pitfalls, the final result is as follows:

Main Code:

GitLab MR Changes page: getting the line number information of a code snippet

    # Assumed module-level names (not shown in the original): json, BeautifulSoup (from bs4),
    # the project's settings module, and a logger.
    def get_mr_changefile_display_lines(self, project_id, mr_iid, file_name):
        '''
        Get all displayed line numbers of the given file in the MR Changes panel, so that the
        following special cases can be handled later:
        1. The line number returned by the AI is collapsed (hidden) in the MR Changes panel.
        2. For changed code the old and new line numbers differ, and the in-line comment needs both old_line and new_line.
        return: line_mapping_dict, e.g. {4: (4, 4), 5: (5, 5), 357: (357, 360), 360: (357, 360)}
        '''
        session = self.gitlab_v3_get_login_session()
        path_with_namespace = self.gitlab_get_project_by_id(project_id).get('path_with_namespace', '')
        # The endpoint name after the trailing slash is missing in the original; use the diff
        # request captured from the page. settings.GITLAB_URL is an assumed setting name.
        resp = session.get(f"{settings.GITLAB_URL}/{path_with_namespace}/merge_requests/{mr_iid}/")
        html_content = resp.json().get('html', '')

        # Parse the HTML fragment with BeautifulSoup
        soup = BeautifulSoup(html_content, 'html.parser')

        # Keyword identifying the target file
        target_keyword = f"{file_name}/diff"  # e.g. services/numeric/numeric_service.go/diff
        # Find the div whose data-blob-diff-path attribute contains the keyword
        target_div = soup.find(
            'div',
            attrs={"data-blob-diff-path": lambda value: value and target_keyword in value}
        )
        # If the target div is found, parse it further
        if target_div:
            # Find all <td> under this div whose classes include both names, e.g. <td class="line_content new noteable_line old">
            td_elements = target_div.find_all(
                'td',
                attrs={'class': lambda x: x and set(['line_content', 'noteable_line']).issubset(x.split())}
            )
            # Build the line mapping dictionary, e.g. {4: (4, 4), 5: (5, 5), 357: (357, 360), 360: (357, 360)}
            line_mapping_dict = {}
            for td in td_elements:
                # Get the data-position attribute
                data_position = td.get('data-position')
                if data_position:
                    try:
                        # Parse the JSON string into a dictionary
                        position_data = json.loads(data_position)
                        old_line = position_data.get('old_line')
                        new_line = position_data.get('new_line')
                        # Store the pair under both old_line and new_line, so that either line number can be looked up later
                        if old_line:
                            line_mapping_dict[old_line] = (old_line, new_line)  # skip None keys such as None: (None, 357)
                        if new_line:
                            line_mapping_dict[new_line] = (old_line, new_line)
                    except json.JSONDecodeError:
                        logger.error("Failed to decode JSON in data-position: %s", data_position)
            return line_mapping_dict
        else:
            # No div containing target_keyword was found
            return None

Line number validation and fallback logic

    def find_line_mapping(line_number, line_mapping_dict):
        """
        Finds the associated old/new line numbers in line_mapping_dict based on the line number provided by the AI.
        If the line number exists, return its pair directly; if not, return the pair of the closest line number.
        return: (old_line, new_line)
        """
        if not line_number or not line_mapping_dict:
            return None
        if line_number in line_mapping_dict:
            return line_mapping_dict[line_number]
        # Get all the line numbers and sort them
        sorted_lines = sorted(line_mapping_dict.keys())
        # Find the closest line number (on a tie, the smaller line number wins)
        closest_line = None
        min_diff = float('inf')  # Initially infinity, so the first comparison always succeeds
        for line in sorted_lines:
            diff = abs(line - line_number)
            if diff < min_diff:
                min_diff = diff
                closest_line = line
        return line_mapping_dict[closest_line]
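
As an alternative sketch of the same lookup, the nearest displayed line can be found with a binary search from the standard-library `bisect` module instead of a linear scan. The function name `find_nearest_line` is hypothetical, and the sample mapping reuses the example from the docstring above. Ties resolve to the smaller line number, matching the loop-based version.

```python
from bisect import bisect_left


def find_nearest_line(line_number, line_mapping_dict):
    """Return the (old_line, new_line) pair for line_number, or for the nearest displayed line."""
    if not line_number or not line_mapping_dict:
        return None
    if line_number in line_mapping_dict:
        return line_mapping_dict[line_number]
    keys = sorted(line_mapping_dict)
    idx = bisect_left(keys, line_number)
    # Candidates: the displayed line just below and just above the requested one.
    candidates = keys[max(idx - 1, 0):idx + 1]
    # min() keeps the first minimum, so a tie resolves to the smaller line number.
    nearest = min(candidates, key=lambda line: abs(line - line_number))
    return line_mapping_dict[nearest]


line_mapping = {4: (4, 4), 5: (5, 5), 357: (357, 360), 360: (357, 360)}
print(find_nearest_line(5, line_mapping))    # line 5 is displayed: exact match
print(find_nearest_line(100, line_mapping))  # line 100 is collapsed: falls back to line 5
print(find_nearest_line(359, line_mapping))  # line 359 is not a key: falls back to line 360
```

For the handful of lines displayed per file, either version is fast enough; the binary search mainly makes the "look at the two neighbours" intent explicit.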

That is the whole process of implementing in-line comments on an old GitLab version. I hope it offers a useful reference for anyone with similar needs.