Repository Details

Free Malware Training Datasets for Machine Learning

MalwareTrainingSets

Please check it out: https://marcoramilli.com/2016/12/16/malware-training-sets-a-machine-learning-dataset-for-everyone/

For an updated follow-up, see: https://marcoramilli.com/2019/05/14/malware-training-sets-followup/

Cite The DataSet
If you find these results useful, please cite them:

@misc{ MR,
   author = "Marco Ramilli",
   title = "Malware Training Sets: a machine learning dataset for everyone",
   year = "2016",
   url = "https://marcoramilli.com/2016/12/16/malware-training-sets-a-machine-learning-dataset-for-everyone/",
   note = "[Online; December 2016]"
 }

UPDATE: Many people asked me about the scripts I used to generate the MIST-modified JSON, so here they are (see the scripts section). You can use mist_json.py as a reporting module for CuckooSandbox, and fromMongoToARFF.py to generate ARFF files suitable for WEKA.
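
For readers unfamiliar with ARFF, here is a minimal sketch of what a WEKA-ready file looks like when built from per-sample feature counts. It is not the code of fromMongoToARFF.py: the function name to_arff, the feature names, and the class labels are hypothetical, and it assumes attribute names contain no spaces or commas (ARFF would otherwise require quoting).

```python
def to_arff(relation, rows, class_attr="label"):
    """Render rows (a list of dicts) as ARFF text.

    Every key except class_attr becomes a NUMERIC attribute; class_attr
    becomes a nominal attribute listing all observed class values.
    Assumption: names and values need no ARFF quoting.
    """
    # Collect a stable, sorted list of feature names across all rows.
    features = sorted({k for row in rows for k in row if k != class_attr})
    classes = sorted({row[class_attr] for row in rows})

    lines = ["@RELATION %s" % relation, ""]
    for name in features:
        lines.append("@ATTRIBUTE %s NUMERIC" % name)
    lines.append("@ATTRIBUTE %s {%s}" % (class_attr, ",".join(classes)))
    lines += ["", "@DATA"]

    # A feature missing from a sample is written as 0.
    for row in rows:
        values = [str(row.get(name, 0)) for name in features]
        lines.append(",".join(values + [row[class_attr]]))
    return "\n".join(lines) + "\n"


# Hypothetical samples: feature names and family labels are placeholders.
rows = [
    {"api_CreateFile": 3, "api_RegSetValue": 1, "label": "family_a"},
    {"api_CreateFile": 1, "label": "family_b"},
]
arff_text = to_arff("malware", rows)
print(arff_text)
```

The resulting text can be saved with a .arff extension and opened in WEKA. The class attribute is placed last, which is the position WEKA's Explorer selects as the class by default.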

If you create new datasets by running your local CuckooSandbox with the mist_json.py module and want to share them, please feel free to open pull requests!

If you want to know more about the workflow, please check this update: https://marcoramilli.com/2019/05/14/malware-training-sets-followup/