Reversing the VirusTotal Search Interface: X-VT-Anti-Abuse-Header
Search example
Searching for "123" takes you to the page URL /gui/search/123/comments, which issues the following request.
Request interface
GET /ui/search?limit=20&relationships%5Bcomment%5D=author%2Citem&query=123 HTTP/1.1
Accept-Encoding: gzip, deflate, br, zstd
Accept-Ianguage: en-US,en;q=0.9,es;q=0.8
Accept-Language: zh-CN,zh;q=0.9
Cache-Control: no-cache
Connection: keep-alive
Cookie: _gid=GA1.2.1662779803.1728383656; _ga=GA1.2.686372046.1728383655; _gat=1; _ga_BLNDV9X2JR=GS1.1.1728383655.1.1.1728383759.0.0.0
DNT: 1
Host:
Pragma: no-cache
Referer: /
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36
X-Tool: vt-ui-main
X-VT-Anti-Abuse-Header: MTgwNjgyNDI1ODItWkc5dWRDQmlaU0JsZG1scy0xNzI4MzgzNzYxLjMxMg==
accept: application/json
content-type: application/json
sec-ch-ua: "Google Chrome";v="129", "Not=A?Brand";v="8", "Chromium";v="129"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Windows"
x-app-version: v1x304x0
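Before touching any JavaScript, a quick sanity check: the captured X-VT-Anti-Abuse-Header value is plain base64, and decoding it already reveals its structure (this snippet just decodes the value captured above):

```python
import base64

# The X-VT-Anti-Abuse-Header value captured in the request above.
captured = 'MTgwNjgyNDI1ODItWkc5dWRDQmlaU0JsZG1scy0xNzI4MzgzNzYxLjMxMg=='

# Decoding shows a simple dash-separated layout:
# <random number>-<fixed string>-<unix timestamp>
decoded = base64.b64decode(captured).decode()
print(decoded)  # 18068242582-ZG9udCBiZSBldmls-1728383761.312
```

The trailing component (1728383761.312) is a Unix timestamp that matches the capture time of the request, which is a strong hint the value is generated client-side on every request.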
Note the parameter in the headers above: X-VT-Anti-Abuse-Header. This is the one that needs reversing; it is visibly base64-encoded, and its meaning is clear from the name: an "anti-abuse header". When implementing the crawler, also fill in User-Agent, X-Tool, and x-app-version, since they are site-specific or common anti-crawl identification parameters.
It is worth noting that X-Tool and x-app-version are fixed values. x-app-version may change as the site updates, so check the official site for it periodically; an outdated value might still work, but test it yourself.
In other words, we currently get the following Headers:
{
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36',
'X-Tool': 'vt-ui-main',
'x-app-version': 'v1x304x0',
}
Starting the reverse
Next, let's reverse X-VT-Anti-Abuse-Header. Overall it is not difficult, but the site has anti-debugging measures (you cannot set breakpoints or backtrack from the request initiator). I didn't bother bypassing the anti-debugging and instead just searched for the string directly.
Searching for X-VT-Anti-Abuse-Header gives:
Go directly into that js file:
As you can see, our target parameter is computed by the method computeAntiAbuseHeader(). Do a global search for computeAntiAbuseHeader:
Result 1 is what we need, the implementation of the function; result 2 is the call site, i.e. where the X-VT-Anti-Abuse-Header above comes from. Go into the js file containing result 1.
As you can see, the method is really very simple: no backtracking or breakpoints needed, it can be re-implemented by hand straight away. The anti-crawling on this site mostly "guards against gentlemen, not villains", as the saying goes: it deters the casual, not the determined.
Getting back to this method:
- First, get the current Unix timestamp in seconds
- Then generate a random number n = 1e10 * (1 + random % 5e4); if n is less than 50, return the string "-1", otherwise return the integer part of n
- Finally, join the generated number, the fixed string "ZG9udCBiZSBldmls" (which decodes to "dont be evil"), and the timestamp with dashes, and base64-encode the result
An interesting detail:
> atob('ZG9udCBiZSBldmls')
< 'dont be evil'
The fixed string tells us not to do evil... a nice touch, hahaha
Encoding implementation
import base64
import random
import time

def computeAntiAbuseHeader():
    e = time.time()
    n = 1e10 * (1 + random.random() % 5e4)
    raw = f'{n:.0f}-ZG9udCBiZSBldmls-{e:.3f}'
    res = base64.b64encode(raw.encode())
    return res.decode()

if __name__ == '__main__':
    print(computeAntiAbuseHeader())
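A side note on the random component: random.random() always returns a value below 1, so the % 5e4 has no effect, n always lands in [1e10, 2e10), and the branch that returns "-1" when n < 50 is dead code. A quick empirical check of that claim:

```python
import random

# random.random() is in [0, 1), so `% 5e4` leaves it unchanged and
# n = 1e10 * (1 + r) always falls in [1e10, 2e10); the "-1" branch never fires.
for _ in range(10_000):
    r = random.random()
    n = 1e10 * (1 + r % 5e4)
    assert r % 5e4 == r
    assert 1e10 <= n < 2e10
print("n is always in [1e10, 2e10)")
```

So in practice the server only ever sees the ~11-digit integer form, which is consistent with the captured value decoded earlier.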
Is it over?
Think it's over here? Oh no, not quite. Despite all the analysis and implementation above, you will find that no matter how you make the request, it is useless: the data still isn't returned. Why?
Move your eyes back to the request headers once more:
Accept-Ianguage: en-US,en;q=0.9,es;q=0.8
Accept-Language: zh-CN,zh;q=0.9
There is a sneaky request header here that is easy to overlook: Accept-Ianguage, spelled with a capital "I" where the "l" of Accept-Language would be. The request will not return data without it.
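The trick works because in most sans-serif fonts a capital "I" renders almost identically to a lowercase "l", so Accept-Ianguage hides in plain sight next to the real Accept-Language. Comparing the two names character by character makes the difference visible (a small illustrative check, not part of the site's code):

```python
real = 'Accept-Language'
fake = 'Accept-Ianguage'  # the header the site actually checks for

# The names differ at a single position: an uppercase 'I' (U+0049)
# sits where the 'L' (U+004C) of the real header would be.
diff = [(i, real[i], fake[i]) for i in range(len(real)) if real[i] != fake[i]]
print(diff)                # [(7, 'L', 'I')]
print(ord('I'), ord('l'))  # 73 108
```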
Well, that's the end of it. Take a look at the full code example below.
"""
Page Flip Capture Implementation
2024surname Nian9moon27date solved
/gui/
"""
import time
import requests
import header
from import urlencode, urlparse
base_url = "/ui/search"
initial_params = {
"limit": 20,
"relationships[comment]": "author,item",
"query": "baidu"
}
proxies = {
'http': None,
'https': None
}
def build_url(url, params): # ☆
return urlparse(url)._replace(query=urlencode(params)).geturl()
def get_headers():
return {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36',
'X-Tool': 'vt-ui-main',
'X-VT-Anti-Abuse-Header': (),
'x-app-version': 'v1x304x0',
'accept-ianguage': 'en-US,en;q=0.9,es;q=0.8'
}
def fetch_data(url):
response = (url, headers=get_headers(), proxies=proxies)
return ()
def process_data(data):
for item in data['data']:
print(f"ID: {item['id']}, Type: {item['type']}")
# primary cycle
next_url = build_url(base_url, initial_params)
while next_url:
print(f"Fetching: {next_url}")
json_data = fetch_data(next_url)
# Checking for the availability of data
if not json_data.get('data'):
print("No more data.")
break
# Processing data on the current page
process_data(json_data)
# Get the next page of the URL
next_url = json_data.get('links', {}).get('next')
if not next_url:
print("No more pages.")
break
(1)
print("Finished fetching all pages.")