Reversing the VirusTotal Search Interface: X-VT-Anti-Abuse-Header
Search example
Searching for "123" takes you to the page URL /gui/search/123/comments, which issues the following request.
Request interface
GET /ui/search?limit=20&relationships%5Bcomment%5D=author%2Citem&query=123 HTTP/1.1
Accept-Encoding: gzip, deflate, br, zstd
Accept-Ianguage: en-US,en;q=0.9,es;q=0.8
Accept-Language: zh-CN,zh;q=0.9
Cache-Control: no-cache
Connection: keep-alive
Cookie: _gid=GA1.2.1662779803.1728383656; _ga=GA1.2.686372046.1728383655; _gat=1; _ga_BLNDV9X2JR=GS1.1.1728383655.1.1.1728383759.0.0.0
DNT: 1
Host:
Pragma: no-cache
Referer: /
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36
X-Tool: vt-ui-main
X-VT-Anti-Abuse-Header: MTgwNjgyNDI1ODItWkc5dWRDQmlaU0JsZG1scy0xNzI4MzgzNzYxLjMxMg==
accept: application/json
content-type: application/json
sec-ch-ua: "Google Chrome";v="129", "Not=A?Brand";v="8", "Chromium";v="129"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Windows"
x-app-version: v1x304x0
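Before touching any JavaScript, a quick sanity check: the captured X-VT-Anti-Abuse-Header value is plain base64, and decoding it already reveals its structure (this snippet just decodes the value captured above):

```python
import base64

# The X-VT-Anti-Abuse-Header value captured in the request above.
captured = 'MTgwNjgyNDI1ODItWkc5dWRDQmlaU0JsZG1scy0xNzI4MzgzNzYxLjMxMg=='

# Decoding shows a simple dash-separated layout:
# <random number>-<fixed string>-<unix timestamp>
decoded = base64.b64decode(captured).decode()
print(decoded)  # 18068242582-ZG9udCBiZSBldmls-1728383761.312
```

The trailing component (1728383761.312) is a Unix timestamp that matches the capture time of the request, which is a strong hint the value is generated client-side on every request.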
Note the parameter in the headers above: X-VT-Anti-Abuse-Header. This is the one that needs reversing; it is visibly base64-encoded, and its meaning is clear from the name: an "anti-abuse header". When implementing the crawler, also fill in User-Agent, X-Tool, and x-app-version, since they are site-specific or common anti-crawl identification parameters.
It is worth noting that X-Tool and x-app-version are fixed values. x-app-version may change as the site updates, so check the official site for it periodically; an outdated value might still work, but test it yourself.
In other words, we currently get the following Headers:
{
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36',
'X-Tool': 'vt-ui-main',
'x-app-version': 'v1x304x0',
}
Starting the reverse
Next, let's reverse X-VT-Anti-Abuse-Header. Overall it is not difficult, but the site has anti-debugging measures (you cannot set breakpoints or backtrack from the request initiator). I didn't bother bypassing the anti-debugging and instead just searched for the string directly.
Searching for X-VT-Anti-Abuse-Header gives:
Go directly into that js file:
As you can see, our target parameter is computed by the method computeAntiAbuseHeader(). Do a global search for computeAntiAbuseHeader:
Result 1 is what we need, the implementation of the function; result 2 is the call site, i.e. where the X-VT-Anti-Abuse-Header above comes from. Go into the js file containing result 1.
As you can see, the method is really very simple: no backtracking or breakpoints needed, it can be re-implemented by hand straight away. The anti-crawling on this site mostly "guards against gentlemen, not villains", as the saying goes: it deters the casual, not the determined.
Getting back to this method:
- First, get the current Unix timestamp in seconds
- Then generate a random number n = 1e10 * (1 + random % 5e4); if n is less than 50, return the string "-1", otherwise return the integer part of n
- Finally, join the generated number, the fixed string "ZG9udCBiZSBldmls" (which decodes to "dont be evil"), and the timestamp with dashes, and base64-encode the result
An interesting detail:
> atob('ZG9udCBiZSBldmls')
< 'dont be evil'
The fixed string tells us not to do evil... a nice touch, hahaha
Encoding implementation
import base64
import random
import time

def computeAntiAbuseHeader():
    e = time.time()
    n = 1e10 * (1 + random.random() % 5e4)
    raw = f'{n:.0f}-ZG9udCBiZSBldmls-{e:.3f}'
    res = base64.b64encode(raw.encode())
    return res.decode()

if __name__ == '__main__':
    print(computeAntiAbuseHeader())
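A side note on the random component: random.random() always returns a value below 1, so the % 5e4 has no effect, n always lands in [1e10, 2e10), and the branch that returns "-1" when n < 50 is dead code. A quick empirical check of that claim:

```python
import random

# random.random() is in [0, 1), so `% 5e4` leaves it unchanged and
# n = 1e10 * (1 + r) always falls in [1e10, 2e10); the "-1" branch never fires.
for _ in range(10_000):
    r = random.random()
    n = 1e10 * (1 + r % 5e4)
    assert r % 5e4 == r
    assert 1e10 <= n < 2e10
print("n is always in [1e10, 2e10)")
```

So in practice the server only ever sees the ~11-digit integer form, which is consistent with the captured value decoded earlier.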
Is it over?
Think it's over here? Oh no, not quite. Despite all the analysis and implementation above, you will find that no matter how you make the request, it is useless: the data still isn't returned. Why?
Move your eyes back to the request headers once more:
Accept-Ianguage: en-US,en;q=0.9,es;q=0.8
Accept-Language: zh-CN,zh;q=0.9
There is a sneaky request header here that is easy to overlook: Accept-Ianguage, spelled with a capital "I" where the "l" of Accept-Language would be. The request will not return data without it.
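The trick works because in most sans-serif fonts a capital "I" renders almost identically to a lowercase "l", so Accept-Ianguage hides in plain sight next to the real Accept-Language. Comparing the two names character by character makes the difference visible (a small illustrative check, not part of the site's code):

```python
real = 'Accept-Language'
fake = 'Accept-Ianguage'  # the header the site actually checks for

# The names differ at a single position: an uppercase 'I' (U+0049)
# sits where the 'L' (U+004C) of the real header would be.
diff = [(i, real[i], fake[i]) for i in range(len(real)) if real[i] != fake[i]]
print(diff)                # [(7, 'L', 'I')]
print(ord('I'), ord('l'))  # 73 108
```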
Well, that's the end of it. Take a look at the full code example below.
"""
Page Flip Capture Implementation
2024surname Nian9moon27date solved
/gui/
"""
import time
import requests
import header
from import urlencode, urlparse
base_url = "/ui/search"
initial_params = {
"limit": 20,
"relationships[comment]": "author,item",
"query": "baidu"
}
proxies = {
'http': None,
'https': None
}
def build_url(url, params): # ☆
return urlparse(url)._replace(query=urlencode(params)).geturl()
def get_headers():
return {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36',
'X-Tool': 'vt-ui-main',
'X-VT-Anti-Abuse-Header': (),
'x-app-version': 'v1x304x0',
'accept-ianguage': 'en-US,en;q=0.9,es;q=0.8'
}
def fetch_data(url):
response = (url, headers=get_headers(), proxies=proxies)
return ()
def process_data(data):
for item in data['data']:
print(f"ID: {item['id']}, Type: {item['type']}")
# primary cycle
next_url = build_url(base_url, initial_params)
while next_url:
print(f"Fetching: {next_url}")
json_data = fetch_data(next_url)
# Checking for the availability of data
if not json_data.get('data'):
print("No more data.")
break
# Processing data on the current page
process_data(json_data)
# Get the next page of the URL
next_url = json_data.get('links', {}).get('next')
if not next_url:
print("No more pages.")
break
(1)
print("Finished fetching all pages.")