  • Stars: 198
  • Rank: 196,567 (Top 4 %)
  • Language: Python
  • License: MIT License
  • Created about 4 years ago
  • Updated almost 3 years ago

Repository Details

A malware dataset for security researchers and data scientists: a public dataset of Windows OS API calls, generated with Cuckoo Sandbox, for cyber security research.

Windows Malware Dataset with PE API Calls

This public malware dataset was generated with Cuckoo Sandbox from an analysis of Windows OS API calls. It is provided in CSV format for cyber security researchers working on malware analysis and machine learning applications.

Cite The DataSet
If you find this dataset useful, please cite it:

@article{10.7717/peerj-cs.346,
 title = {Data augmentation based malware detection using convolutional neural networks},
 author = {Catak, Ferhat Ozgur and Ahmed, Javed and Sahinbas, Kevser and Khand, Zahid Hussain},
 year = 2021,
 month = jan,
 keywords = {Convolutional neural networks, Cybersecurity, Image augmentation, Malware analysis},
 volume = 7,
 pages = {e346},
 journal = {PeerJ Computer Science},
 issn = {2376-5992},
 url = {https://doi.org/10.7717/peerj-cs.346},
 doi = {10.7717/peerj-cs.346}
}

Publications

The details of the Mal-API-2019 dataset are published in the following papers:

  • Yazı, AF., Çatak, FÖ., Gül, E., Classification of Metamorphic Malware with Deep Learning (LSTM), IEEE Signal Processing and Applications Conference, 2019.
  • Çatak, FÖ., Yazı, AF., A Benchmark API Call Dataset for Windows PE Malware Classification, arXiv:1905.01999, 2019.

Introduction

This study aims to provide data that helps close gaps in machine learning based malware research. Its specific objective is to build a benchmark dataset of Windows operating system API calls made by various malware. It is the first study to use metamorphic malware to build sequences of API calls. We hope this research contributes to a deeper understanding of how metamorphic malware changes its behavior (i.e. its API calls) by inserting meaningless opcodes using its own disassembler/assembler components.
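Sequence models such as the LSTM named in the publications above need integer inputs, so API call sequences are usually encoded first. The following is a minimal sketch, assuming each sample is a whitespace-separated string of API call names. That layout, the example traces, and the helper names `build_vocab` and `encode` are illustrative assumptions, not part of the dataset documentation.

```python
from typing import Dict, List


def build_vocab(sequences: List[List[str]]) -> Dict[str, int]:
    """Map each distinct API call name to an integer id; 0 is reserved for padding."""
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for call in seq:
            if call not in vocab:
                vocab[call] = len(vocab) + 1
    return vocab


def encode(seq: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Truncate to max_len, convert calls to ids, and right-pad with zeros."""
    ids = [vocab[call] for call in seq[:max_len]]
    return ids + [0] * (max_len - len(ids))


# Hypothetical traces; real samples are typically far longer.
raw_traces = [
    "LdrLoadDll LdrGetProcedureAddress NtCreateFile NtWriteFile NtClose",
    "LdrLoadDll NtCreateFile NtClose",
]

sequences = [trace.split() for trace in raw_traces]
vocab = build_vocab(sequences)
encoded = [encode(seq, vocab, max_len=6) for seq in sequences]

for row in encoded:
    print(row)
```

The vocabulary here is built from every trace it later encodes, so no lookup can fail. With a separate training and test split, an extra id for unseen calls would be needed.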

Malware Types and System Overview

In our research, we mapped the labels produced by each antivirus product onto 8 main malware families: Trojan, Backdoor, Downloader, Worms, Spyware, Adware, Dropper, Virus. Table 1 shows the number of samples per malware family in our dataset. As the table shows, the sample counts are close to one another for every family except Adware. Adware is underrepresented because we found few samples from that family.

Table 1: Number of samples per malware family.

| Malware Family | Samples | Description |
|----------------|---------|-------------|
| Spyware | 832 | Enables a user to obtain covert information about another's computer activities by transmitting data covertly from their hard drive. |
| Downloader | 1001 | Shares the primary functionality of downloading content. |
| Trojan | 1001 | Misleads users about its true intent. |
| Worms | 1001 | Spreads copies of itself from computer to computer. |
| Adware | 379 | Hides on your device and serves you advertisements. |
| Dropper | 891 | Surreptitiously carries viruses, backdoors and other malicious software so they can be executed on the compromised machine. |
| Virus | 1001 | Designed to spread from host to host and able to replicate itself. |
| Backdoor | 1001 | A technique in which a system security mechanism is bypassed undetectably to access a computer or its data. |
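The imbalance in the table can be quantified directly from the sample counts. The sketch below computes each family's share of the dataset and an inverse-frequency class weight. The weighting is a common heuristic for imbalanced classification and is shown here as an assumption, not as the method used by the dataset authors.

```python
# Sample counts copied from the table above.
family_counts = {
    "Spyware": 832,
    "Downloader": 1001,
    "Trojan": 1001,
    "Worms": 1001,
    "Adware": 379,
    "Dropper": 891,
    "Virus": 1001,
    "Backdoor": 1001,
}

total = sum(family_counts.values())
num_classes = len(family_counts)

# Fraction of the whole dataset taken up by each family.
shares = {family: n / total for family, n in family_counts.items()}

# Inverse-frequency weights: a perfectly balanced class would get 1.0.
class_weights = {
    family: total / (num_classes * n) for family, n in family_counts.items()
}

print(f"total samples: {total}")
for family, n in sorted(family_counts.items(), key=lambda kv: kv[1]):
    print(f"{family:<11}{n:>5}{shares[family]:>8.1%}{class_weights[family]:>7.2f}")
```

A dictionary like `class_weights` can be passed to most training APIs that accept per-class weights, once the family names are mapped to the label encoding in use.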

The figure shows the general flow used to generate the malware dataset. As it illustrates, we obtained the MD5 hash values of the malware samples we collected from GitHub. We looked up these hash values with the VirusTotal API and took the family of each sample from the reports of the 67 different antivirus engines in VirusTotal. We observed that these 67 engines do not always report the same family.
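The labeling flow described above can be sketched in two steps: hash a sample, then reduce the per-engine labels to one family. This sketch makes several assumptions. The majority vote, the keyword map, its precedence order, and the engine names and labels in `example_report` are all hypothetical, and no VirusTotal request is made. The text above does not state how disagreements between engines were resolved.

```python
import hashlib
from collections import Counter
from typing import Dict, Optional

# Keyword -> family, checked in order. Putting specific terms first means a
# label such as "Trojan-Downloader" maps to Downloader (an assumption).
KEYWORD_TO_FAMILY = (
    ("downloader", "Downloader"),
    ("dropper", "Dropper"),
    ("spy", "Spyware"),
    ("backdoor", "Backdoor"),
    ("adware", "Adware"),
    ("worm", "Worms"),
    ("virus", "Virus"),
    ("trojan", "Trojan"),
)


def md5_of_bytes(data: bytes) -> str:
    """Return the hexadecimal MD5 digest used as the lookup key for a sample."""
    return hashlib.md5(data).hexdigest()


def normalize_label(raw_label: str) -> Optional[str]:
    """Map a free-form antivirus label to one of the 8 families, or None."""
    lowered = raw_label.lower()
    for keyword, family in KEYWORD_TO_FAMILY:
        if keyword in lowered:
            return family
    return None


def majority_family(report: Dict[str, str]) -> Optional[str]:
    """Pick the family named by the most engines; None if no label matched."""
    votes = Counter(
        family
        for family in (normalize_label(label) for label in report.values())
        if family is not None
    )
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical report: engine name -> label for one sample.
example_report = {
    "EngineA": "Trojan-Downloader.Win32.Agent",
    "EngineB": "Downloader.Generic",
    "EngineC": "Trojan.Generic",
    "EngineD": "",  # engine reported nothing
}

print(md5_of_bytes(b"example sample bytes"))
print(majority_family(example_report))
```

When two families receive the same number of votes, `Counter.most_common` returns the one counted first, so a real pipeline would need an explicit tie-breaking rule.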

[Figure: general flow of the malware dataset generation]

Data Description
