Cracking Slider CAPTCHAs with YOLO

Disclaimer: The ideas and techniques in this case are for learning and communication only. Please do not use them for illegal behavior.

I. Training the model

For the detailed training steps and the exported model, see "Slider CAPTCHA Recognition Model Training".

II. Trying out the model

Run the model through YoloDotNet, calculate the position of the slider notch, and return the coordinates to other applications through a RESTful interface. For a YoloDotNet example, see "A First Look at YoloDotNet, an Object Detection Framework".

Main Steps:

1. Create a Web API (C#) project

# -n: set the project name to slider_api
dotnet new webapi -n slider_api
# Switch to the project directory
cd slider_api
# Add two dependencies
dotnet add package 
dotnet add package YoloDotNet

2. Create the controller that will be used to calculate the slider position and return it. A few points to note:

The OnnxModel parameter passed when creating the Yolo instance must point to the model file trained in step I.
The model's recognition result is a BoundingBox object, not a coordinate, so it has to be converted manually. BoundingBox is of type SKRectI, which can be thought of as a rectangle with properties such as Left, Right, Top, Bottom, and Width. The X coordinate of the center of the gap in the background image is therefore Left + (Width / 2).
The X coordinate computed this way may differ from the real coordinate obtained by dragging the slider manually, but repeated tests showed that the error fluctuates little and stays within a relatively fixed range. The computed value therefore needs a fixed correction. To find it, run the same CAPTCHA images through both the real environment and the model, and average the differences. In this case only about 10 samples were used.
The model returns a pair of coordinates, but only the X coordinate is used for validation here, because the CAPTCHA background image and the slider image have exactly the same height and the slider only moves horizontally when dragged.
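The center-point arithmetic and the averaging of the correction value described above can be sketched as follows. This is only an illustration in Java (the language used later in this post), not the controller code itself: the class GapCalibration and its method names are hypothetical, and the sample numbers in the usage note are made up.

```java
import java.util.List;

/** Hypothetical helper: box-center arithmetic and offset calibration. */
final class GapCalibration {

    /** X coordinate of the box center: Left + Width / 2 (integer division). */
    static int centerX(int left, int width) {
        return left + width / 2;
    }

    /** Mean of (predicted - actual) over paired samples, rounded to the nearest integer. */
    static int averageOffset(List<Integer> predicted, List<Integer> actual) {
        if (predicted.isEmpty() || predicted.size() != actual.size()) {
            throw new IllegalArgumentException("need equal-sized, non-empty samples");
        }
        long sum = 0;
        for (int i = 0; i < predicted.size(); i++) {
            sum += predicted.get(i) - actual.get(i);
        }
        return (int) Math.round((double) sum / predicted.size());
    }

    /** Calibrated X: box center minus the measured average offset. */
    static int calibratedX(int left, int width, int offset) {
        return centerX(left, width) - offset;
    }
}
```

For example, with made-up predictions 150, 161, 139 and real values 130, 140, 120, averageOffset gives 20, so a box with Left = 100 and Width = 40 yields a calibrated X of 100.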

Core Code:

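    using Microsoft.AspNetCore.Mvc;
    using SkiaSharp;
    using YoloDotNet;
    using YoloDotNet.Enums;
    using YoloDotNet.Models;

    [ApiController]
    [Route("api/yolo")]
    public class YoloController : ControllerBase
    {
        private readonly Yolo yolo;

        public YoloController()
        {
            yolo = new Yolo(new YoloOptions
            {
                // Path to the ONNX model exported in step I
                OnnxModel = $"e:/4-code-space//models/",
                ModelType = ModelType.ObjectDetection,
                Cuda = false,
                GpuId = 0,
                PrimeGpu = false,
            });
        }

        [HttpPost]
        public IActionResult Detect([FromForm] IFormFile image)
        {
            try
            {
                // Copy the uploaded file into memory and decode it with SkiaSharp
                using var memoryStream = new MemoryStream();
                image.CopyTo(memoryStream);
                var imageBytes = memoryStream.ToArray();
                using var imageStream = new MemoryStream(imageBytes);
                using var skBitmap = SKBitmap.Decode(imageStream);
                using var skImage = SKImage.FromBitmap(skBitmap);

                var results = yolo.RunObjectDetection(skImage, confidence: 0.25, iou: 0.7);
                // Center X of the detected gap: Left + Width / 2.
                // If nothing is detected, results[0] throws and the catch block returns 400.
                int x = results[0].BoundingBox.Left + (results[0].BoundingBox.Width / 2);
                x -= 20; // Calibration: compared with wq yolov5 on the same images, the value needs to be reduced by 20 on average
                Console.WriteLine($"Recognition result: {x}");
                return Ok(x);
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Recognition error: {ex.Message}");
                return BadRequest();
            }
        }
    }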

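3. Route HTTP requests to the controller by modifying Program.cs: add a call to app.MapControllers() before app.Run(). (MapControllers only works if the controller services are registered with builder.Services.AddControllers(), so add that call too if the template did not generate it.) The SDK version used in this project is 8.0.403.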

III. Validating the model's results

1. Generate simulated validation data from the result calculated in step II, assemble an HTTP message in the data format the interface requires, and send it to the server to check whether the recognition result is correct. Observing the real validation data in Chrome DevTools over many experiments revealed the following characteristics:

Slider CAPTCHA verification is performed on the server. Dragging the slider with the mouse produces a series of trajectory points, which are recorded by JavaScript. During verification, the CAPTCHA id and the trajectory data are sent to the server, and if verification passes, the server returns the target data.
Each trajectory point has four attributes, x, y, t, and type, which represent the horizontal coordinate, the vertical coordinate, the time, and the mouse action (down, move, up).
The y coordinate does not take part in validation, but it is not constant in the trajectory data. Dragging the mouse horizontally produces an approximately horizontal line rather than a perfectly horizontal one, so y fluctuates around 0, and the fluctuation must not be too large.
The spacing between the t values of the trajectory data changes from small to large and then from large to small. This is because dragging the slider goes from slow to fast and then from fast to slow before finally stopping. The x value of the last trajectory point only needs to be close to the true value.
The x value increases gradually from 0 until it is close to the actual value (the gap position returned by the model). After getting close, it may move back a small distance in the opposite direction, because in a real environment the slider is often dragged slightly too far and then pulled back.

Simulated data is generated based on the analysis above. The Java implementation is as follows.

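// Requires: java.util.ArrayList, java.util.List, java.util.Random
private List<MouseTrajectory> generateTrajectory(Integer width) {
    List<MouseTrajectory> trajectories = new ArrayList<>();
    Random random = new Random();
    int x = 0, y = 0, tmp = 0, t = 0;
    // Initial delay before the mouse is pressed: 50 to 99
    t = random.nextInt(50) + 50;
    // Press the mouse.
    trajectories.add(new MouseTrajectory(x, y, "down", t));
    // Drag the mouse.
    while (x < width) {
        // Advance x by 2 to 6 per step
        x += random.nextInt(5) + 2;
        // In roughly one step out of four, pick a new y in [-3, 3]
        tmp = random.nextInt(100) % 4;
        if (tmp == 0) {
            y = random.nextInt(7) - 3;
        }
        // Advance t by 20 to 79 per step
        t += random.nextInt(60) + 20;
        // Do not move past the target position
        if (x > width) {
            x = width;
        }
        trajectories.add(new MouseTrajectory(x, y, "move", t));
    }
    // Release the mouse.
    trajectories.add(new MouseTrajectory(x, y, "up", t));
    return trajectories;
}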
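The MouseTrajectory type used above is not shown in this post. A minimal sketch of what it might look like is given below, assuming a Java 16+ record whose component order (x, y, type, t) matches the constructor calls in generateTrajectory. The validation rules are an assumption added for illustration, not something the target interface is known to require.

```java
/**
 * Assumed shape of one trajectory point; the original type is not shown.
 * Component order follows the calls new MouseTrajectory(x, y, "down", t).
 */
record MouseTrajectory(int x, int y, String type, int t) {

    MouseTrajectory {
        // Only the three mouse actions named in the text are accepted.
        if (!"down".equals(type) && !"move".equals(type) && !"up".equals(type)) {
            throw new IllegalArgumentException("unknown type: " + type);
        }
        // Time values are assumed to be non-negative.
        if (t < 0) {
            throw new IllegalArgumentException("t must be non-negative");
        }
    }
}
```

A record keeps the four attributes immutable and exposes them through the accessors x(), y(), type(), and t(). How the points are serialized for the request depends on the target interface and is not covered here.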

2. The generated trajectory data and the other query parameters are assembled in the format of the target interface and sent to the server. The HTTP header values are kept exactly the same as those shown in Chrome DevTools, to simulate the real environment as closely as possible. The server finally returns the correct response. In this case the returned data is a Base64-encoded JSON string, which needs to be decoded.

To avoid exposing the system's interface address, only a small portion of the returned data and of the simulated trajectory data is shown in the screenshots:
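Decoding the Base64-encoded response can be sketched with the JDK's java.util.Base64. The class name ResponseDecoder is hypothetical, and the sketch assumes standard (not URL-safe) Base64 and UTF-8 text, which the post does not state.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper for turning the Base64 response body back into JSON text. */
final class ResponseDecoder {

    /** Decodes a standard Base64 string and interprets the bytes as UTF-8 (assumed). */
    static String decodeBody(String base64Body) {
        byte[] raw = Base64.getDecoder().decode(base64Body.trim());
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

For example, decodeBody("eyJvayI6dHJ1ZX0=") returns the JSON text {"ok":true}. If the server used the URL-safe alphabet instead, Base64.getUrlDecoder() would be the decoder to use.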

Two key points for the success of the experiment:

1. Model recognition accuracy. About 300 CAPTCHA images from the real environment were used to train the model.

2. Trajectory validation data that is as close as possible to the real data.