SecBERT
SecBERT is a BERT model trained on cyber security text; it has learned cyber security domain knowledge.
- SecBERT is trained on papers from a corpus of cyber security text.
- SecBERT has its own vocabulary (secvocab) that is built to best match the training corpus. We trained both SecBERT and SecRoBERTa versions.
Downloading Trained Models
SecBERT models are now installable directly within the Hugging Face framework:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# SecBERT
tokenizer = AutoTokenizer.from_pretrained("jackaduma/SecBERT")
model = AutoModelForMaskedLM.from_pretrained("jackaduma/SecBERT")

# SecRoBERTa
tokenizer = AutoTokenizer.from_pretrained("jackaduma/SecRoBERTa")
model = AutoModelForMaskedLM.from_pretrained("jackaduma/SecRoBERTa")
```
Pretrained-Weights
We release the PyTorch version of the trained models. The PyTorch version is created using the Hugging Face library, and this repo shows how to use it.
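If you want the weight files on disk (for example, for offline use), one option is to save the downloaded checkpoint locally; the target path below is just an example, not a path used by this repo.

```python
# Illustrative only: cache the PyTorch weights in a local directory.
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("jackaduma/SecBERT")
model = AutoModelForMaskedLM.from_pretrained("jackaduma/SecBERT")

# The target directory is an arbitrary example path.
tokenizer.save_pretrained("./secbert-weights")
model.save_pretrained("./secbert-weights")

# Later, load from the local directory instead of the Hub.
model = AutoModelForMaskedLM.from_pretrained("./secbert-weights")
```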
Using SecBERT in your own model
SecBERT models include all the necessary files to be plugged into your own model and are in the same format as BERT.
If you use PyTorch, refer to Hugging Face's repo, which provides detailed instructions on using BERT models.
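As an illustration (not code from this repo), the sketch below loads SecBERT with AutoModel and adds a hypothetical classification head on top of the [CLS] representation, the same way you would with any BERT checkpoint.

```python
# Sketch only: SecBERT as the encoder inside your own PyTorch model,
# with a hypothetical binary classification head.
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SecBERTClassifier(nn.Module):
    def __init__(self, num_labels: int = 2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("jackaduma/SecBERT")
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls_embedding = outputs.last_hidden_state[:, 0]  # [CLS] token representation
        return self.classifier(cls_embedding)

tokenizer = AutoTokenizer.from_pretrained("jackaduma/SecBERT")
model = SecBERTClassifier()
batch = tokenizer(["Suspicious PowerShell activity detected."], return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
```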
Fill Mask
We propose to build a language model that works on cyber security text; as a result, it can improve downstream tasks (NER, text classification, semantic understanding, Q&A) in the cyber security domain.
First, the commands below run the fill-mask pipeline for Google's BERT, AllenAI's SciBERT, and our SecBERT.
```bash
cd lm
python eval_fillmask_lm.py
```
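The actual comparison lives in lm/eval_fillmask_lm.py; the snippet below is only a hedged sketch of the idea, comparing top fill-mask predictions from BERT, SciBERT, and SecBERT on a cyber security sentence we made up for illustration.

```python
# Sketch of the comparison idea -- not the contents of eval_fillmask_lm.py.
from transformers import pipeline

MODELS = {
    "BERT": "bert-base-uncased",
    "SciBERT": "allenai/scibert_scivocab_uncased",
    "SecBERT": "jackaduma/SecBERT",
}

sentence = "The malware connected to its command and control [MASK]."

for name, checkpoint in MODELS.items():
    fill_mask = pipeline("fill-mask", model=checkpoint, tokenizer=checkpoint)
    top = fill_mask(sentence)[0]
    print(f"{name:8s} -> {top['token_str']} ({top['score']:.3f})")
```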
Downstream-tasks
TODO
Star-History
Donation
If this project helps you reduce development time, you can buy me a cup of coffee :)
AliPay(支付宝)
WechatPay(微信)
License
MIT © Kun