
Use C# to crawl a Kuaishou author's homepage and download videos/galleries (with source code and software download links)

Popularity:433 ℃/2024-08-26 13:50:33

I recently came across some Kuaishou authors whose work is quite good, and for learning and research purposes I decided to see how to crawl the data. There are some crawler tools on the Internet, but most of them no longer work or are not open source, so I wrote a small tool myself. First, a look at the results:
image
image
The software only needs the author's uid and the request cookie from the web version to download automatically. Files are saved to the Download folder in the program's root directory.
Because Kuaishou's risk control is fairly strict, the software also has a fallback. It requires the user to click the prompt text in the software to copy a URL, paste it into the browser, and save the returned JSON to a local file. The software's parse-local-JSON button then parses the file and downloads the works. If the returned JSON is very short or contains no data, refresh any Kuaishou page in the browser; this tells Kuaishou's risk control that you are browsing normally and not acting like a robot.
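The "very short or no data" check can be automated before handing a saved file to the parser. The following is only a minimal sketch: it assumes the response wraps the works in a `data.list` array (inferred from the `model?.Data?.List` accesses in the code later in this article, not confirmed), it uses `System.Text.Json` instead of the Newtonsoft library the tool itself uses, and `SavedJsonCheck` and `CountWorks` are hypothetical names.

```csharp
using System;
using System.Text.Json;

public static class SavedJsonCheck
{
    // Returns the number of entries in data.list, or 0 when the field is
    // missing or the text is not valid JSON.
    // ASSUMPTION: the field names "data" and "list" are guesses based on the
    // model properties used in the article.
    public static int CountWorks(string json)
    {
        try
        {
            using JsonDocument doc = JsonDocument.Parse(json);
            JsonElement root = doc.RootElement;
            if (root.ValueKind == JsonValueKind.Object
                && root.TryGetProperty("data", out JsonElement data)
                && data.ValueKind == JsonValueKind.Object
                && data.TryGetProperty("list", out JsonElement list)
                && list.ValueKind == JsonValueKind.Array)
            {
                return list.GetArrayLength();
            }
            return 0;
        }
        catch (JsonException)
        {
            // Not JSON at all, for example an HTML error page.
            return 0;
        }
    }
}
```

A result of 0 would be the cue to refresh a Kuaishou page in the browser and save the JSON again.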

Here's the idea behind building the whole app.

1. Preparation on the Kuaishou website

  1. Open the Kuaishou web version and search for the nickname of the author you want to crawl at the top to get to the author's homepage. You can also share the link to the author's homepage from the app and paste it into the browser. Once the author's homepage is loaded, the address in the address bar should look like /profile/xxxxxx. The xxxxxx at the end is the author's user id. Copy it; you will use it later.

  2. Press F12 to open your browser's developer tools (I've said before that developer tools are good, a must for researching crawlers, so be sure to learn them).

  3. Select "Network" at the top of the developer tools, then "All", as shown in the picture. Find the request named after the user id in the request list and click it; the request headers appear on the right. Copy the cookie from there. If the request is not in the list, refresh the page.
    image

  4. The list contains many requests. We need the one that loads the list of works shown on the page, whose name starts with public. You can also type public into the filter box in the upper left corner to filter out most of the irrelevant requests. The response of this request contains the complete JSON for the author's works.
    image

You can right-click the request and open it in a new tab; the address bar then shows the complete URL of the request. Note this URL down, as it is used later. The count parameter defaults to 12 or 20; when we use it, we simply max it out, and 9999 is enough.
image
image
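How the uid and the count end up in the request can be sketched as a small helper. The path and query parameter names are the ones that appear in the request code later in this article; the host is left out, and `ProfileRequest` and `BuildPath` are hypothetical names used only for illustration.

```csharp
using System;

public static class ProfileRequest
{
    // Builds the path and query string of the "public" request.
    // count defaults to 9999 so that all works come back in one response.
    public static string BuildPath(string principalId, int count = 9999)
    {
        if (string.IsNullOrWhiteSpace(principalId))
            throw new ArgumentException("principalId must not be empty", nameof(principalId));

        // pcursor is left empty to start from the first page.
        return "/live_api/profile/public"
            + $"?count={count}&pcursor=&principalId={Uri.EscapeDataString(principalId)}&hasMore=true";
    }
}
```

For example, `ProfileRequest.BuildPath("xxxxxx")` yields `/live_api/profile/public?count=9999&pcursor=&principalId=xxxxxx&hasMore=true`.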

2. Postman intercepts the request, simulates the request, and generates C# request code

  1. Install the Postman Interceptor extension at /detail/postman-interceptor/aicmkgpgakddgnaphhhpliifpcfhicfo. Needless to say, this is another great tool; paired with the developer tools, it can in theory handle almost any crawling need.

  2. Open Postman and click Start Proxy in the lower right corner.
    image
    After starting the interception, go back to the author's homepage in the web version, refresh the page, and stop the interception once the page has finished loading. Otherwise the list keeps growing, because it intercepts every web request from the computer. At this point the Postman interceptor has captured a whole bunch of requests. As before, find the public request, or type public in the upper left corner to filter out the one we need.
    image
    Click on this request link
    image
    Postman then opens a new window containing all the parameters and header information of this request.
    image
    Click on the code tool on the far right of Postman to generate the code we need. You can choose from C#, python, js, curl, and more.
    image

3. Using WPF to write the interface and download logic

  1. Create a new WPF project. To make the interface look good, this time I used the open source WPF UI; I have previously used HandyControl and MicaWPF, which are also good UI control libraries.
    Downloading uses the open source Downloader, requests use RestSharp, and JSON parsing uses NewtonsoftJson. I also recommend FlatIcon, a free icon library.
    The interface is as follows:
<ui:FluentWindow
  x:Class=""
  xmlns="/winfx/2006/xaml/presentation"
  xmlns:x="/winfx/2006/xaml"
  xmlns:d="/expression/blend/2008"
  xmlns:local="clr-namespace:KuaishouDownloader"
  xmlns:mc="/markup-compatibility/2006"
  xmlns:ui="/wpfui/2022/xaml"
  Title="MainWindow"
  Width="900"
  Height="760"
  ExtendsContentIntoTitleBar="True"
  WindowBackdropType="Mica"
  WindowCornerPreference="Default"
  WindowStartupLocation="CenterScreen"
  mc:Ignorable="d">
  <Grid>
    <Grid.RowDefinitions>
      <RowDefinition Height="Auto" />
      <RowDefinition Height="*" />
    </Grid.RowDefinitions>
    <ui:TitleBar Title="Kuaishou author homepage works crawler" Height="32" />
    <ui:Button
      x:Name="themeButton"
      Grid.Row="1"
      Width="32"
      Height="32"
      Margin="0,0,8,0"
      Padding="0"
      HorizontalAlignment="Right"
      VerticalAlignment="Top"
      Click="Theme_Click"
      CornerRadius="16"
      FontSize="24"
      Icon="{ui:SymbolIcon WeatherMoon48}"
      ToolTip="Toggle Theme" />
    <ui:SnackbarPresenter
      x:Name="snackbarPresenter"
      Grid.Row="1"
      VerticalAlignment="Bottom" />
    <StackPanel
      Grid.Row="1"
      HorizontalAlignment="Center"
      VerticalAlignment="Center">
      <Border
        Width="200"
        Height="200"
        HorizontalAlignment="Center"
        CornerRadius="100">
        <ui:Image
          x:Name="imgHeader"
          Width="200"
          Height="200"
          CornerRadius="100" />
      </Border>
      <ui:TextBlock
        x:Name="tbNickName"
        Margin="0,12,0,0"
        HorizontalAlignment="Center" />
      <StackPanel Margin="0,12,0,0" Orientation="Horizontal" >
        <ui:TextBlock
          Width="60"
          Margin="0,12,0,0"
          VerticalAlignment="Center"
          Text="uid" />
        <ui:TextBox
          x:Name="tbUid"
          Width="660"
          Height="36"
          VerticalContentAlignment="Center"
          ToolTip="In the app, open the author's homepage, choose share homepage - copy link, and open the link in a browser. The address bar usually changes to something starting with /profile/xxxxxx/; copy the xxxxxx here" />
      </StackPanel>
      <StackPanel Margin="0,12,0,0" Orientation="Horizontal">
        <ui:TextBlock
          Width="60"
          VerticalAlignment="Center"
          Text="cookie" />
        <ui:TextBox
          x:Name="tbCookie"
          Width="660"
          Height="36"
          VerticalContentAlignment="Center"
          ToolTip="Get from the web-request header using browser developer tools" />
      </StackPanel>
      <StackPanel
        Margin="0,12,0,0"
        HorizontalAlignment="Center"
        Orientation="Horizontal">
        <ui:Button
          x:Name="btnDownload"
          Height="32"
          Appearance="Primary"
          Click="Download_Click"
          Content="Start Download"
          CornerRadius="4 0 0 4"
          ToolTip="Downloads to the program's root directory by default; the file date is set to the publication date of the work" />
        <ui:Button
          x:Name="btnParseJson"
          Height="32"
          Appearance="Primary"
          Click="ParseJson_Click"
          Content="..."
          CornerRadius="0 4 4 0"
          ToolTip="Parse json data saved from web or postman" />
      </StackPanel>
      <TextBlock
        Width="700"
        Margin="0,12,0,0"
        Foreground="Gray"
        MouseDown="CopyUrl"
        Text="Don't panic if Kuaishou's risk control blocks you. Open the web version of Kuaishou in your browser, scan the code to log in, click this text to copy the URL, and paste it into your browser to open it. If a very long json response comes back, it worked. Copy the json and save it to a local json file, then use the second button to parse the json data and download."
        TextWrapping="Wrap" />
      <Expander Margin="0,12,0,0" Header="More Options">
        <StackPanel Orientation="Horizontal">
          <CheckBox
            x:Name="cbAddDate"
            Margin="12,0,0,0"
            VerticalAlignment="Center"
            Content="Add date to filename"
            IsChecked="True"
            ToolTip="Prefix filenames with a timestamp like 2024-01-02 13-00-00 for easy sorting" />
          <CheckBox
            x:Name="cbLongInterval"
            Margin="12,0,0,0"
            VerticalAlignment="Center"
            Content="Increase download delay between works"
            IsChecked="True"
            ToolTip="Checked by default: the delay between works is a random 5~10 seconds. Unchecked: a random 1~5 seconds, which may trigger risk control" />
        </StackPanel>
      </Expander>
    </StackPanel>
    <StackPanel
      Grid.Row="1"
      Margin="0,0,0,-2"
      VerticalAlignment="Bottom">
      <TextBlock x:Name="tbProgress" HorizontalAlignment="Center" />
      <ProgressBar x:Name="progress" Height="8" />
    </StackPanel>
    <ui:Button
      x:Name="infoButton"
      Grid.Row="1"
      Width="32"
      Height="32"
      Margin="0,0,8,8"
      Padding="0"
      HorizontalAlignment="Right"
      VerticalAlignment="Bottom"
      Click="Info_Click"
      CornerRadius="16"
      FontSize="24"
      Icon="{ui:SymbolIcon Info28}"
      ToolTip="Acknowledgments" />
    <ui:Flyout
      x:Name="flyout"
      Grid.Row="1"
      HorizontalAlignment="Right">
      <ui:TextBlock Text="Acknowledgments: &#xA;1. Microsoft Presentation Foundation&#xA;2. WPF-UI&#xA;3. RestSharp&#xA;4. &#xA;5. Downloader&#xA;6. Icon from FlatIcon" />
    </ui:Flyout>
  </Grid>
</ui:FluentWindow>

  2. The code-behind logic. It doesn't use MVVM; everything is simply written the most convenient way.
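using Newtonsoft.Json;
using RestSharp;
using System.Diagnostics;
using System.IO;
using System.Text.RegularExpressions;
using System.Windows;
using Wpf.Ui;
using Wpf.Ui.Appearance;
using Wpf.Ui.Controls;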

namespace KuaishouDownloader
{
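    /// <summary>
    /// Interaction logic for MainWindow.xaml
    /// </summary>
    public partial class MainWindow
    {
        // Root directory of the program (the exact expression is elided in the source; AppContext.BaseDirectory is an assumption)
        string downloadFolder = AppContext.BaseDirectory;
        SnackbarService? snackbarService = null;

        public MainWindow()
        {
            InitializeComponent();
            this.Loaded += MainWindow_Loaded;
        }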

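        private void MainWindow_Loaded(object sender, RoutedEventArgs e)
        {
            snackbarService = new SnackbarService();
            snackbarService.SetSnackbarPresenter(snackbarPresenter);

            // Restore the uid and cookie saved by the previous run.
            // The config file name is elided in the source and left empty here.
            if (File.Exists(""))
            {
                var model = JsonConvert.DeserializeObject<AppConfig>(File.ReadAllText(""));
                if (model != null)
                {
                    tbUid.Text = model.Uid;
                    tbCookie.Text = model.Cookie;
                }
            }
        }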

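        private void Theme_Click(object sender, RoutedEventArgs e)
        {
            // Toggle between the light and dark themes and swap the button icon.
            // The theme calls are elided in the source; the WPF UI theme manager is assumed here.
            if (ApplicationThemeManager.GetAppTheme() == ApplicationTheme.Light)
            {
                themeButton.Icon = new SymbolIcon(SymbolRegular.WeatherSunny48);
                ApplicationThemeManager.Apply(ApplicationTheme.Dark);
            }
            else
            {
                themeButton.Icon = new SymbolIcon(SymbolRegular.WeatherMoon48);
                ApplicationThemeManager.Apply(ApplicationTheme.Light);
            }
        }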

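        private async void Download_Click(object sender, RoutedEventArgs e)
        {
            try
            {
                // Disable both buttons while a download is running.
                btnDownload.IsEnabled = false;
                btnParseJson.IsEnabled = false;

                if (string.IsNullOrEmpty(tbUid.Text) || string.IsNullOrEmpty(tbCookie.Text))
                {
                    // The appearance value is elided in the source; Info is an assumption.
                    snackbarService?.Show("Notice", "Please enter the uid and cookie", ControlAppearance.Info, null, TimeSpan.FromSeconds(3));
                    return;
                }

                // Save the inputs for the next start (the file name is elided in the source).
                var json = JsonConvert.SerializeObject(new AppConfig() { Uid = tbUid.Text, Cookie = tbCookie.Text }, Formatting.Indented);
                File.WriteAllText("", json);

                // The base URL is elided in the source and left empty here.
                var options = new RestClientOptions("")
                {
                    // The time unit is elided in the source; seconds are an assumption.
                    Timeout = TimeSpan.FromSeconds(15),
                    UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36",
                };
                var client = new RestClient(options);
                var request = new RestRequest($"/live_api/profile/public?count=9999&pcursor=&principalId={tbUid.Text}&hasMore=true", Method.Get);
                request.AddHeader("host", "");
                request.AddHeader("connection", "keep-alive");
                request.AddHeader("cache-control", "max-age=0");
                request.AddHeader("sec-ch-ua", "\"Not)A;Brand\";v=\"99\", \"Google Chrome\";v=\"127\", \"Chromium\";v=\"127\"");
                request.AddHeader("sec-ch-ua-mobile", "?0");
                request.AddHeader("sec-ch-ua-platform", "\"Windows\"");
                request.AddHeader("upgrade-insecure-requests", "1");
                request.AddHeader("accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7");
                request.AddHeader("sec-fetch-site", "none");
                request.AddHeader("sec-fetch-mode", "navigate");
                request.AddHeader("sec-fetch-user", "?1");
                request.AddHeader("sec-fetch-dest", "document");
                request.AddHeader("accept-encoding", "gzip, deflate, br, zstd");
                request.AddHeader("accept-language", "zh,en;q=0.9,zh-CN;q=0.8");
                request.AddHeader("cookie", tbCookie.Text);
                request.AddHeader("x-postman-captr", "9467712");
                RestResponse response = await client.ExecuteAsync(request);
                Console.WriteLine(response.Content);

                var model = JsonConvert.DeserializeObject<KuaishouModel>(response.Content!);
                if (model == null || model?.Data?.List == null || model?.Data?.List?.Count == 0)
                {
                    snackbarService?.Show("Notice", "Failed to get data. This may have triggered Kuaishou's risk control; please wait a while and try again.", ControlAppearance.Info, null, TimeSpan.FromSeconds(3));
                    return;
                }

                await Download(model!);
            }
            finally
            {
                btnDownload.IsEnabled = true;
                btnParseJson.IsEnabled = true;
            }
        }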

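        private async void ParseJson_Click(object sender, RoutedEventArgs e)
        {
            try
            {
                btnDownload.IsEnabled = false;
                btnParseJson.IsEnabled = false;

                // Let the user pick the json file saved from the browser or Postman.
                var dialog = new Microsoft.Win32.OpenFileDialog();
                dialog.Filter = "Json file (.json)|*.json";
                bool? result = dialog.ShowDialog();
                if (result == false)
                {
                    return;
                }
                var model = JsonConvert.DeserializeObject<KuaishouModel>(File.ReadAllText(dialog.FileName)!);
                if (model == null || model?.Data?.List == null || model?.Data?.List?.Count == 0)
                {
                    // The appearance value is elided in the source; Info is an assumption.
                    snackbarService?.Show("Notice", "Invalid json", ControlAppearance.Info, null, TimeSpan.FromSeconds(3));
                    return;
                }

                await Download(model!);
            }
            finally
            {
                btnDownload.IsEnabled = true;
                btnParseJson.IsEnabled = true;
            }
        }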

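        private async Task Download(KuaishouModel model)
        {
            progress.Value = 0;
            progress.Minimum = 0;
            progress.Maximum = (double)model?.Data?.List?.Count!;
            // The appearance value is elided in the source; Info is an assumption.
            snackbarService?.Show("Notice", $"Parsed {model?.Data?.List?.Count!} items, starting download", ControlAppearance.Info, null, TimeSpan.FromSeconds(5));

            // Show the author's avatar and nickname.
            imgHeader.Source = new System.Windows.Media.Imaging.BitmapImage(new Uri(model?.Data?.List?[0]?.Author?.Avatar!));
            tbNickName.Text = model?.Data?.List?[0]?.Author?.Name;

            // Matches the yyyy/MM/dd/HH part of a URL.
            string pattern = @"\d{4}/\d{2}/\d{2}/\d{2}";

            for (int i = 0; i < model?.Data?.List!.Count; i++)
            {
                // Fallback when no date is found (elided in the source; DateTime.Now is an assumption).
                DateTime dateTime = DateTime.Now;
                string fileNamePrefix = "";
                var item = model?.Data?.List[i]!;
                // The matched property is elided in the source; Poster is an assumed name.
                Match match = Regex.Match(item.Poster!, pattern);
                if (match.Success)
                {
                    string[] parts = match.Value.Split("/");
                    dateTime = new DateTime(int.Parse(parts[0]), int.Parse(parts[1]),
                        int.Parse(parts[2]), int.Parse(parts[3]), 0, 0);
                    if (cbAddDate.IsChecked == true)
                        fileNamePrefix = parts[0] + "-" + parts[1] + "-" + parts[2]
                            + " " + parts[3] + "-00-00 ";
                }
                downloadFolder = Path.Combine(AppContext.BaseDirectory, "Download", item?.Author?.Name! + "(" + item?.Author?.Id! + ")");
                Directory.CreateDirectory(downloadFolder);

                // The class that holds Download is elided in the source; DownloadHelper is a placeholder name.
                switch (item?.WorkType)
                {
                    case "single":
                    case "vertical":
                    case "multiple":
                        {
                            await DownloadHelper.Download(item?.ImgUrls!, dateTime, downloadFolder, fileNamePrefix);
                        }
                        break;
                    case "video":
                        {
                            await DownloadHelper.Download(new List<string>() { item?.PlayUrl! }, dateTime, downloadFolder, fileNamePrefix);
                        }
                        break;
                }

                progress.Value = i + 1;
                tbProgress.Text = $"{i + 1} / {model?.Data?.List!.Count}";
                // Wait a random time between works to look less like a robot.
                Random random = new Random();
                if (cbLongInterval.IsChecked == true)
                    await Task.Delay(random.Next(5000, 10000));
                else
                    await Task.Delay(random.Next(1000, 5000));
            }

            snackbarService?.Show("Notice", $"Download complete, {model?.Data?.List!.Count} items downloaded in total", ControlAppearance.Info, null, TimeSpan.FromSeconds(1));
        }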

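        private void CopyUrl(object sender, System.Windows.Input.MouseButtonEventArgs e)
        {
            if (string.IsNullOrEmpty(tbUid.Text))
            {
                // The appearance value is elided in the source; Info is an assumption.
                snackbarService?.Show("Notice", "Please enter the uid and cookie", ControlAppearance.Info, null, TimeSpan.FromSeconds(3));
                return;
            }
            // The host part of the URL is elided in the source.
            Clipboard.SetText($"/live_api/profile/public?count=9999&pcursor=&principalId={tbUid.Text}&hasMore=true");

            snackbarService?.Show("Notice", "Copied. Please paste it into your browser to open it", ControlAppearance.Info, null, TimeSpan.FromSeconds(3));
        }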

        private void Info_Click(object sender, RoutedEventArgs e)
        {
            flyout.IsOpen = true;
        }
    }
}
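The date handling in the loop above can be tried on its own. The sketch below uses the same `yyyy/MM/dd/HH` pattern as the article and produces the same `2024-01-02 13-00-00 ` style prefix; `PublishTime`, `TryParse`, and the sample URL are hypothetical, and the extra range check is an addition that the original loop does not have.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class PublishTime
{
    // Same shape as the article's pattern, with capture groups added.
    // ECMAScript mode restricts \d to ASCII digits so int.Parse cannot fail.
    private static readonly Regex DatePath =
        new Regex(@"(\d{4})/(\d{2})/(\d{2})/(\d{2})", RegexOptions.ECMAScript);

    public static bool TryParse(string url, out DateTime publishedAt, out string fileNamePrefix)
    {
        publishedAt = default;
        fileNamePrefix = "";

        Match match = DatePath.Match(url);
        if (!match.Success)
            return false;

        int year = int.Parse(match.Groups[1].Value, CultureInfo.InvariantCulture);
        int month = int.Parse(match.Groups[2].Value, CultureInfo.InvariantCulture);
        int day = int.Parse(match.Groups[3].Value, CultureInfo.InvariantCulture);
        int hour = int.Parse(match.Groups[4].Value, CultureInfo.InvariantCulture);

        // Reject digit runs that only look like a date (for example 2024/13/40/99).
        if (year < 1 || month < 1 || month > 12 || hour > 23
            || day < 1 || day > DateTime.DaysInMonth(year, month))
            return false;

        publishedAt = new DateTime(year, month, day, hour, 0, 0);
        // Minutes and seconds are not in the URL, so they are written as 00.
        fileNamePrefix = publishedAt.ToString("yyyy-MM-dd HH", CultureInfo.InvariantCulture) + "-00-00 ";
        return true;
    }
}
```

When `TryParse` returns false, the caller can fall back to a default date and an empty prefix, which is what the loop above does.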
  3. The Download class. After a file is downloaded, its date is changed to the publication date of the work, which makes sorting and data analysis easier.
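public static async Task Download(List<string> urls, DateTime dateTime, string downloadFolder, string fileNamePrefix)
{
    string file = string.Empty;
    try
    {
        var downloader = new DownloadService();
        foreach (var url in urls)
        {
            Uri uri = new Uri(url);
            // The file name expression is elided in the source; taking it from the URL path is an assumption.
            file = downloadFolder + "\\" + fileNamePrefix + Path.GetFileName(uri.AbsolutePath);
            if (!File.Exists(file))
                await downloader.DownloadFileTaskAsync(url, file);

            // Change the file timestamps to the publication time of the work
            File.SetCreationTime(file, dateTime);
            File.SetLastWriteTime(file, dateTime);
            File.SetLastAccessTime(file, dateTime);
        }
    }
    catch
    {
        // Record the failed file in _FailedFiles.txt inside the download folder.
        Debug.WriteLine(file);
        Trace.Listeners.Add(new TextWriterTraceListener(downloadFolder + "\\_FailedFiles.txt", "myListener"));
        Trace.TraceInformation(file);
        Trace.Flush();
    }
}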
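The timestamp step can be checked in isolation on a scratch file before running a real download. This is a small sketch using the standard `System.IO.File` methods; `FileTimes` and `Stamp` are hypothetical names, and the article's own Download class may differ in detail.

```csharp
using System;
using System.IO;

public static class FileTimes
{
    // Sets creation, last-write and last-access time of an existing file
    // to the publication time of the work.
    public static void Stamp(string path, DateTime publishedAt)
    {
        if (!File.Exists(path))
            throw new FileNotFoundException("File to stamp does not exist", path);

        File.SetCreationTime(path, publishedAt);
        File.SetLastWriteTime(path, publishedAt);
        File.SetLastAccessTime(path, publishedAt);
    }
}
```

After calling `FileTimes.Stamp(path, publishedAt)`, sorting the download folder by date in a file manager follows the publication order of the works instead of the download order.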
  4. Source code sharing
    The full code has been uploaded to Github/hupo376787/KuaishouDownloader. If you like it, please give it a Star. Thanks.

4. Download and use

Open /hupo376787/KuaishouDownloader/releases/tag/1.0, download the zip file, unzip it, and use it as described at the beginning.
image
image