Maple Dataset
The Maple Dataset, publicly released by the Northeast Forestry University Cybersecurity Lab (/lab/), is a dataset for intrusion detection evaluation. It aims to improve the performance and reliability of anomaly-based Intrusion Detection Systems (IDSs) and Intrusion Prevention Systems (IPSs). As cyberattacks grow more sophisticated, a reliable and up-to-date dataset is essential for testing and validating IDS and IPS solutions.
Attackers today often employ hybrid attacks, for example combining viruses, *s and phishing at the same time. Without high-quality datasets to test IDSs and IPSs against, it is difficult to ensure that they are effective against such complex attacks. Unknown threats such as emerging zero-day attacks likewise require up-to-date datasets to train and validate protection systems for timely detection and prevention.
The Maple Dataset is designed to provide up-to-date and diverse attack data to help researchers and developers better evaluate and improve their intrusion detection and prevention systems. We generate and capture malicious traffic against a large number of services, covering recent CVEs as well as the types of malicious attacks seen in the real world.
Dataset Official Website:/
Nature of the dataset: Free for public use in research and academic work; please cite our official website or papers.
List of datasets
DDoS: HTTP (Plain/gzip/random), TCP, UDP, ReCOIL, LOIC
DNS: DoH, DoQ, DoT (coming soon)
ICMP: Normal ICMP, Smuggled ICMP
MySQL: CVE-2012-2122
Nginx: CVE-2017-7529
OpenSSL: CVE-2022-0778, HeartBleed, Normal traffic
Windows OS: Windows 10 provision, Windows Update
VPN: Cisco AnyConnect, DNS Leak, * traffic (coming soon)
How to use
Using CSV files directly
The CSV files provided in the dataset already conform to the columns and metadata used by CIC-IDS.
Just change the name of the *.csv file that your Python code loads.
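As a minimal sketch of loading one of these CSV files, the snippet below counts flows per label using only the Python standard library. The function name `count_labels` and the default column name `Label` are assumptions for illustration; check the header of the file you downloaded and adjust accordingly.

```python
import csv
from collections import Counter


def count_labels(csv_path, label_column="Label"):
    """Count flows per label in a CICFlowMeter-style CSV file.

    `label_column` is an assumed column name; adjust it to match your file.
    """
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Some CSV exports pad header names with spaces; normalize them.
        reader.fieldnames = [name.strip() for name in (reader.fieldnames or [])]
        if label_column not in reader.fieldnames:
            raise KeyError(f"column {label_column!r} not found in {csv_path}")
        for row in reader:
            # Short rows yield None for missing cells; treat them as empty.
            counts[(row[label_column] or "").strip()] += 1
    return counts
```

Calling `count_labels` with the path of a downloaded CSV returns a `Counter` mapping each label to its number of flows, which is a quick way to see the class balance before training.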
Manual CSV generation from traffic files
Prepare the dataset traffic file (*.pcap) that you downloaded.
Open it using CICFlowMeter (/ahlashkari/CICFlowMeter).
Select Offline Mode to export to a CSV file.
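Before opening a capture in CICFlowMeter, it can help to confirm that the download is intact and really is a PCAP or PCAPNG file. The sketch below checks the first four bytes of the file against the well-known magic numbers of both formats. The function names are hypothetical helpers, not part of CICFlowMeter or of the dataset tooling.

```python
# Magic numbers as they appear on disk (first four bytes of the file).
PCAP_MAGICS = {
    bytes.fromhex("a1b2c3d4"),  # classic pcap, microsecond timestamps
    bytes.fromhex("d4c3b2a1"),  # same, written with the opposite byte order
    bytes.fromhex("a1b23c4d"),  # classic pcap, nanosecond timestamps
    bytes.fromhex("4d3cb2a1"),  # same, written with the opposite byte order
}
PCAPNG_MAGIC = bytes.fromhex("0a0d0d0a")  # pcapng Section Header Block type


def classify_magic(magic):
    """Return 'pcap', 'pcapng', or 'unknown' for the first four file bytes."""
    if magic == PCAPNG_MAGIC:
        return "pcapng"
    if magic in PCAP_MAGICS:
        return "pcap"
    return "unknown"


def detect_capture_format(path):
    """Read the first four bytes of `path` and classify the capture format."""
    with open(path, "rb") as handle:
        return classify_magic(handle.read(4))
```

A result of `unknown` usually means the download was truncated, is still compressed, or is not a capture file at all.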
Background of the study
Traditional datasets for evaluating attack traffic and exploitation tend to suffer from outdated content, insufficient traffic diversity, limited attack variety, and insufficient feature characterization. Importantly, with HTTPS/TLS encryption now commonplace, malicious traffic is wrapped in layers of encryption and often cannot be parsed by security devices.
The Maple Dataset therefore provides a comprehensive, modern dataset from which machine learning models can learn malicious traffic features for intrusion detection research.
Compatibility with code written for the CIC-IDS dataset
If your code or model was written or trained with the CIC-IDS dataset, you can switch to the Maple Dataset directly, because our format is compatible with CIC-IDS.
You can also use CICFlowMeter directly to generate CSV files for input into machine learning models.
No need to rewrite code or make other changes.
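If you want to confirm this for your own pipeline, a quick check is to compare the columns your code expects against the header of a CSV file from the dataset. The sketch below is one way to do that; `read_header` and `missing_columns` are hypothetical helper names, and the list of expected columns should come from your existing code rather than from this example.

```python
import csv


def read_header(csv_path):
    """Return the column names of a CSV file, stripped of padding spaces."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    return [name.strip() for name in header]


def missing_columns(csv_path, expected_columns):
    """List the expected columns that are absent from the CSV header."""
    present = set(read_header(csv_path))
    return [name for name in expected_columns if name not in present]
```

An empty list from `missing_columns` means every column your code relies on is present; any names it returns are the ones to map or rename before loading.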
Overview of dataset categories (with content)
Content: The dataset contains the latest common attacks, captured as traffic that resembles real-world network traffic (PCAP/PCAPNG format).
Flow Analysis: The results of network traffic analysis using CICFlowMeter, tagging flows based on timestamps, source and destination IP addresses, ports, protocols, and attack types, are stored in CSV files.
DDoS Attacks: The dataset includes DDoS attacks, which are common in real-world network traffic; randomized content makes this traffic more diverse. HTTP attacks use the most common methods: GET, POST, HEAD, and OPTIONS.
Per-service traffic captures and datasets: We provide datasets for each service (HTTP, HTTPS, SMTP, IMAP, POP3, FTP, SSH, RESTful API, gRPC, WASM).
Diverse traffic: Whether over ping or HTTP, DDoS comes in many forms, including TCP, UDP, and SYN attacks and ICMP smuggling, all of which are covered in our dataset.
N-Day Vulnerabilities: The dataset includes n-day vulnerabilities, such as the well-known HeartBleed vulnerability in OpenSSL, and we intend to include more CVEs in the future.
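To illustrate what tagging flows by timestamp, addresses, ports, and protocol can look like, here is a minimal sketch. It is not the labeling code used to build the dataset: the `AttackWindow` and `Flow` types, the `label_flow` function, and the `BENIGN` default label are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class AttackWindow:
    """A hypothetical record of when and where an attack was launched."""
    start: datetime
    end: datetime
    src_ip: str
    dst_ip: str
    protocol: int  # IP protocol number, e.g. 6 = TCP, 17 = UDP
    label: str
    dst_port: Optional[int] = None  # None matches any destination port


@dataclass(frozen=True)
class Flow:
    """The subset of flow fields needed for labeling."""
    timestamp: datetime
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


def label_flow(flow: Flow, windows: Sequence[AttackWindow],
               default: str = "BENIGN") -> str:
    """Return the label of the first window matching the flow, else `default`."""
    for window in windows:
        if (window.start <= flow.timestamp <= window.end
                and window.src_ip == flow.src_ip
                and window.dst_ip == flow.dst_ip
                and window.protocol == flow.protocol
                and window.dst_port in (None, flow.dst_port)):
            return window.label
    return default
```

Matching on a time window in addition to the address fields keeps benign traffic between the same hosts, recorded outside the attack period, from being mislabeled.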
More features coming soon
DPDK, PF_RING support
If you have any questions or suggestions, please give us feedback.
Data generation
Unlike completely random traffic, our traffic structure is orchestrated according to the behavioral patterns of real-world users, endpoints, and traffic. Packet construction is based on HTTP, HTTPS, and the SM3/SM4 algorithms. We also simulated SSH, RESTful API, gRPC, and WASM traffic; these modern protocols and their various implementations form the main part of this dataset.
Processing tools
In the process of creating the dataset, we used a number of tools that we developed ourselves.
They are open source and can be downloaded for free from GitHub.
Tutorials are available in the repository for most tools.
Contact Us
Please feel free to contact us if you have any questions or need assistance:
E-mail: maple@
GitHub:/maple-nefu
QQ group: 631300176
The official website of Northeast Forestry University Laboratory: /lab/