The collection of pre-trained, state-of-the-art AI models.
About ailia SDK
ailia SDK is a self-contained, cross-platform, high-speed inference SDK for AI. It provides a consistent C++ API on Windows, Mac, Linux, iOS, Android, Jetson and Raspberry Pi, and also supports Unity (C#), Python and JNI for efficient AI implementation. ailia SDK uses the GPU via Vulkan and Metal for accelerated computing.
How to use
ailia MODELS tutorial (Japanese version)
Supported models
273 models as of 2023/04/10
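Each table below lists a minimum SDK release in its "Supported Ailia Version" column, for example `1.2.10 and later`. The following is a minimal sketch of how such an entry could be checked against an installed version. The helper names `parse_version` and `is_supported` are hypothetical and are not part of ailia SDK; the sketch only assumes the `X.Y.Z and later` wording used in the tables.

```python
def parse_version(text):
    """Turn '1.2.10' into (1, 2, 10) so the comparison is numeric, not lexical."""
    return tuple(int(part) for part in text.strip().split("."))


def is_supported(installed, requirement):
    """Check an installed version against a table entry such as '1.2.10 and later'.

    Hypothetical helper: assumes the 'X.Y.Z and later' wording used in the tables.
    """
    minimum = requirement.replace("and later", "").strip()
    return parse_version(installed) >= parse_version(minimum)


print(is_supported("1.2.9", "1.2.10 and later"))   # False: 1.2.9 is older than 1.2.10
print(is_supported("1.2.13", "1.2.5 and later"))   # True
```

Comparing integer tuples rather than raw strings matters here, because a plain string comparison would rank `1.2.9` above `1.2.10`.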
Action recognition
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
mars | MARS: Motion-Augmented RGB Stream for Action Recognition | Pytorch | 1.2.4 and later | EN JP | |
st-gcn | ST-GCN | Pytorch | 1.2.5 and later | EN JP | |
ax_action_recognition | Realtime-Action-Recognition | Pytorch | 1.2.7 and later | ||
va-cnn | View Adaptive Neural Networks (VA) for Skeleton-based Human Action Recognition | Pytorch | 1.2.7 and later | ||
driver-action-recognition-adas | driver-action-recognition-adas-0002 | OpenVINO | 1.2.5 and later | ||
action_clip | ActionCLIP | Pytorch | 1.2.7 and later |
Anomaly detection
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
padim | PaDiM-Anomaly-Detection-Localization-master | Pytorch | 1.2.6 and later | EN JP | |
spade-pytorch | Sub-Image Anomaly Detection with Deep Pyramid Correspondences | Pytorch | 1.2.6 and later |
Audio processing
Model | Reference | Exported From | Supported Ailia Version | Blog |
---|---|---|---|---|
crnn_audio_classification | crnn-audio-classification | Pytorch | 1.2.5 and later | EN JP |
deepspeech2 | deepspeech.pytorch | Pytorch | 1.2.2 and later | EN JP |
pytorch-dc-tts | Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention | Pytorch | 1.2.6 and later | EN JP |
unet_source_separation | source_separation | Pytorch | 1.2.6 and later | EN JP |
transformer-cnn-emotion-recognition | Combining Spatial and Temporal Feature Representions of Speech Emotion by Parallelizing CNNs and Transformer-Encoders | Pytorch | 1.2.5 and later | |
auto_speech | AutoSpeech: Neural Architecture Search for Speaker Recognition | Pytorch | 1.2.5 and later | EN JP |
voicefilter | VoiceFilter | Pytorch | 1.2.7 and later | EN JP |
whisper | Whisper | Pytorch | 1.2.10 and later | JP |
clap | CLAP | Pytorch | 1.2.6 and later | |
wespeaker | WeSpeaker | Onnxruntime | 1.2.9 and later |
Background removal
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
U-2-Net | U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection | Pytorch | 1.2.2 and later | EN JP | |
u2net-portrait-matting | U^2-Net - Portrait matting | Pytorch | 1.2.7 and later | ||
u2net-human-seg | U^2-Net - human segmentation | Pytorch | 1.2.4 and later | ||
deep-image-matting | Deep Image Matting | Keras | 1.2.3 and later | EN JP | |
indexnet | Indices Matter: Learning to Index for Deep Image Matting | Pytorch | 1.2.7 and later | ||
modnet | MODNet: Trimap-Free Portrait Matting in Real Time | Pytorch | 1.2.7 and later | ||
background_matting_v2 | Real-Time High-Resolution Background Matting | Pytorch | 1.2.9 and later | ||
cascade_psp | CascadePSP | Pytorch | 1.2.9 and later | ||
rembg | Rembg | Pytorch | 1.2.4 and later | ||
dis_seg | Highly Accurate Dichotomous Image Segmentation | Pytorch | 1.2.10 and later | ||
gfm | Bridging Composite and Real: Towards End-to-end Deep Image Matting | Pytorch | 1.2.10 and later |
Crowd counting
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
crowdcount-cascaded-mtl | CNN-based Cascaded Multi-task Learning of High-level Prior and Density Estimation for Crowd Counting (Single Image Crowd Counting) | Pytorch | 1.2.1 and later | EN JP | |
c-3-framework | Crowd Counting Code Framework(C^3-Framework) | Pytorch | 1.2.5 and later |
Deep fashion
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
clothing-detection | Clothing-Detection | Pytorch | 1.2.1 and later | EN JP | |
mmfashion | MMFashion | Pytorch | 1.2.5 and later | EN JP | |
mmfashion_tryon | MMFashion virtual try-on | Pytorch | 1.2.8 and later | ||
mmfashion_retrieval | MMFashion In-Shop Clothes Retrieval | Pytorch | 1.2.5 and later | ||
fashionai-key-points-detection | A Pytorch Implementation of Cascaded Pyramid Network for FashionAI Key Points Detection | Pytorch | 1.2.5 and later | ||
person-attributes-recognition-crossroad | person-attributes-recognition-crossroad-0230 | Pytorch | 1.2.10 and later |
Depth estimation
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
monodepth2 | Monocular depth estimation from a single image | Pytorch | 1.2.2 and later | ||
midas | Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer | Pytorch | 1.2.4 and later | EN JP | |
fcrn-depthprediction | Deeper Depth Prediction with Fully Convolutional Residual Networks | TensorFlow | 1.2.6 and later | ||
fast-depth | ICRA 2019 "FastDepth: Fast Monocular Depth Estimation on Embedded Systems" | Pytorch | 1.2.5 and later | ||
lap-depth | LapDepth-release | Pytorch | 1.2.9 and later | ||
hitnet | ONNX-HITNET-Stereo-Depth-estimation | Pytorch | 1.2.9 and later | ||
crestereo | ONNX-CREStereo-Depth-Estimation | Pytorch | 1.2.13 and later | ||
mobilestereonet | MobileStereoNet | Pytorch | 1.2.13 and later |
Diffusion
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
latent-diffusion-txt2img | Latent Diffusion - txt2img | Pytorch | 1.2.10 and later | ||
latent-diffusion-inpainting | Latent Diffusion - inpainting | Pytorch | 1.2.10 and later | ||
latent-diffusion-superresolution | Latent Diffusion - Super-resolution | Pytorch | 1.2.10 and later | ||
stable-diffusion-txt2img | Stable Diffusion | Pytorch | 1.2.14 and later |
Face detection
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
yolov1-face | YOLO-Face-detection | Darknet | 1.1.0 and later | ||
yolov3-face | Face detection using keras-yolov3 | Keras | 1.2.1 and later | ||
blazeface | BlazeFace-PyTorch | Pytorch | 1.2.1 and later | EN JP | |
face-mask-detection | Face detection using keras-yolov3 | Keras | 1.2.1 and later | EN JP | |
dbface | DBFace : real-time, single-stage detector for face detection, with faster speed and higher accuracy | Pytorch | 1.2.2 and later | ||
retinaface | RetinaFace: Single-stage Dense Face Localisation in the Wild. | Pytorch | 1.2.5 and later | ||
anime-face-detector | Anime Face Detector | Pytorch | 1.2.6 and later | ||
face-detection-adas | face-detection-adas-0001 | OpenVINO | 1.2.5 and later | ||
mtcnn | mtcnn | Keras | 1.2.10 and later |
Face identification
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
vggface2 | VGGFace2 Dataset for Face Recognition | Caffe | 1.1.0 and later | ||
arcface | pytorch implement of arcface | Pytorch | 1.2.1 and later | EN JP | |
insightface | InsightFace: 2D and 3D Face Analysis Project | Pytorch | 1.2.5 and later | ||
cosface | Pytorch implementation of CosFace | Pytorch | 1.2.10 and later |
Face recognition
Frame Interpolation
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
flavr | FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation | Pytorch | 1.2.7 and later | EN JP | |
cain | Channel Attention Is All You Need for Video Frame Interpolation | Pytorch | 1.2.5 and later | ||
film | FILM: Frame Interpolation for Large Motion | Tensorflow | 1.2.10 and later |
Generative adversarial networks
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
pytorch-gan | Code repo for the Pytorch GAN Zoo project (used to train this model) | Pytorch | 1.2.4 and later | ||
council-gan | Council-GAN | Pytorch | 1.2.4 and later | ||
restyle-encoder | ReStyle | Pytorch | 1.2.9 and later | ||
sam | Age Transformation Using a Style-Based Regression Model | Pytorch | 1.2.9 and later | ||
gfpgan | GFP-GAN: Towards Real-World Blind Face Restoration with Generative Facial Prior | Pytorch | 1.2.10 and later | ||
sber-swap | SberSwap | Pytorch | 1.2.12 and later | ||
encoder4editing | Designing an Encoder for StyleGAN Image Manipulation | Pytorch | 1.2.10 and later |
Hand detection
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
yolov3-hand | Hand detection branch of Face detection using keras-yolov3 | Keras | 1.2.1 and later | ||
hand_detection_pytorch | hand-detection.PyTorch | Pytorch | 1.2.2 and later | ||
blazepalm | MediaPipePyTorch | Pytorch | 1.2.5 and later |
Hand recognition
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
blazehand | MediaPipePyTorch | Pytorch | 1.2.5 and later | EN JP | |
hand3d | ColorHandPose3D network | TensorFlow | 1.2.5 and later | ||
minimal-hand | Minimal Hand | TensorFlow | 1.2.8 and later | ||
v2v-posenet | V2V-PoseNet | Pytorch | 1.2.6 and later |
Image captioning
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
illustration2vec | Illustration2Vec | Caffe | 1.2.2 and later | ||
image_captioning_pytorch | Image Captioning pytorch | Pytorch | 1.2.5 and later | EN JP |
Image classification
Image inpainting
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
inpainting-with-partial-conv | pytorch-inpainting-with-partial-conv | PyTorch | 1.2.6 and later | EN JP | |
inpainting_gmcnn | Image Inpainting via Generative Multi-column Convolutional Neural Networks | TensorFlow | 1.2.6 and later | ||
3d-photo-inpainting | 3D Photography using Context-aware Layered Depth Inpainting | Pytorch | 1.2.7 and later | ||
deepfillv2 | Free-Form Image Inpainting with Gated Convolution | Pytorch | 1.2.9 and later |
Image manipulation
Image segmentation
Landmark classification
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
landmarks_classifier_asia | Landmarks classifier_asia_V1.1 | TensorFlow Hub | 1.2.4 and later | EN JP | |
places365 | Release of Places365-CNNs | Pytorch | 1.2.5 and later |
Line segment detection
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
mlsd | M-LSD: Towards Light-weight and Real-time Line Segment Detection | TensorFlow | 1.2.8 and later | EN JP | |
dexined | DexiNed: Dense Extreme Inception Network for Edge Detection | Pytorch | 1.2.5 and later |
Low Light Image Enhancement
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
agllnet | AGLLNet: Attention Guided Low-light Image Enhancement (IJCV 2021) | Pytorch | 1.2.9 and later | EN JP |
Natural language processing
Model | Reference | Exported From | Supported Ailia Version | Blog |
---|---|---|---|---|
bert | pytorch-pretrained-bert | Pytorch | 1.2.2 and later | EN JP |
bert_maskedlm | huggingface/transformers | Pytorch | 1.2.5 and later | |
bert_ner | huggingface/transformers | Pytorch | 1.2.5 and later | |
bert_question_answering | huggingface/transformers | Pytorch | 1.2.5 and later | |
bert_sentiment_analysis | huggingface/transformers | Pytorch | 1.2.5 and later | |
bert_zero_shot_classification | huggingface/transformers | Pytorch | 1.2.5 and later | |
bert_tweets_sentiment | huggingface/transformers | Pytorch | 1.2.5 and later | |
gpt2 | GPT-2 | Pytorch | 1.2.7 and later | |
rinna_gpt2 | japanese-pretrained-models | Pytorch | 1.2.7 and later | |
fugumt-en-ja | Fugu-Machine Translator | Pytorch | 1.2.9 and later | |
bert_sum_ext | BERTSUMEXT | Pytorch | 1.2.7 and later | |
sentence_transformers_japanese | sentence transformers | Pytorch | 1.2.7 and later | |
presumm | PreSumm | Pytorch | 1.2.8 and later | |
t5_base_japanese_title_generation | t5-japanese | Pytorch | 1.2.13 and later |
Neural Rendering
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
nerf | NeRF: Neural Radiance Fields | Tensorflow | 1.2.10 and later | EN JP |
NSFW detector
Model | Reference | Exported From | Supported Ailia Version | Blog |
---|---|---|---|---|
clip-based-nsfw-detector | CLIP-based-NSFW-Detector | Keras | 1.2.10 and later | JP |
Object detection
Object detection 3d
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
3d_bbox | 3D Bounding Box Estimation Using Deep Learning and Geometry | Pytorch | 1.2.6 and later | ||
3d-object-detection.pytorch | 3d-object-detection.pytorch | Pytorch | 1.2.8 and later | EN JP | |
mediapipe_objectron | MediaPipe Objectron | TensorFlow Lite | 1.2.5 and later | ||
egonet | EgoNet | Pytorch | 1.2.9 and later | ||
d4lcn | D4LCN | Pytorch | 1.2.9 and later |
Object tracking
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
deepsort | Deep Sort with PyTorch | Pytorch | 1.2.3 and later | EN JP | |
person_reid_baseline_pytorch | UTS-Person-reID-Practical | Pytorch | 1.2.6 and later | ||
abd_net | Attentive but Diverse Person Re-Identification | Pytorch | 1.2.7 and later | ||
siam-mot | SiamMOT | Pytorch | 1.2.9 and later | ||
bytetrack | ByteTrack | Pytorch | 1.2.5 and later | EN JP | |
qd-3dt | Monocular Quasi-Dense 3D Object Tracking | Pytorch | 1.2.11 and later |
Optical Flow Estimation
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
raft | RAFT: Recurrent All Pairs Field Transforms for Optical Flow | Pytorch | 1.2.6 and later | EN JP |
Point segmentation
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
pointnet_pytorch | PointNet.pytorch | Pytorch | 1.2.6 and later |
Pose estimation
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
openpose | Code repo for realtime multi-person pose estimation in CVPR'17 (Oral) | Caffe | 1.2.1 and later | ||
lightweight-human-pose-estimation | Fast and accurate human pose estimation in PyTorch. Contains implementation of "Real-time 2D Multi-Person Pose Estimation on CPU: Lightweight OpenPose" paper. | Pytorch | 1.2.1 and later | EN JP | |
pose_resnet | Simple Baselines for Human Pose Estimation and Tracking | Pytorch | 1.2.1 and later | EN JP | |
blazepose | MediaPipePyTorch | Pytorch | 1.2.5 and later | ||
efficientpose | Code repo for EfficientPose | TensorFlow | 1.2.6 and later | ||
movenet | Code repo for movenet | TensorFlow | 1.2.8 and later | EN JP | |
animalpose | MMPose - 2D animal pose estimation | Pytorch | 1.2.7 and later | EN JP | |
mediapipe_holistic | MediaPipe Holistic | TensorFlow | 1.2.9 and later | ||
ap-10k | AP-10K | Pytorch | 1.2.4 and later | ||
posenet | PoseNet Pytorch | Pytorch | 1.2.10 and later | ||
e2pose | E2Pose | Tensorflow | 1.2.5 and later |
Pose estimation 3d
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
lightweight-human-pose-estimation-3d | Real-time 3D multi-person pose estimation demo in PyTorch. OpenVINO backend can be used for fast inference on CPU. | Pytorch | 1.2.1 and later | ||
3d-pose-baseline | A simple baseline for 3d human pose estimation in tensorflow. Presented at ICCV 17. | TensorFlow | 1.2.3 and later | ||
pose-hg-3d | Towards 3D Human Pose Estimation in the Wild: a Weakly-supervised Approach | Pytorch | 1.2.6 and later | ||
blazepose-fullbody | MediaPipe | TensorFlow Lite | 1.2.5 and later | EN JP | |
3dmppe_posenet | PoseNet of "Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image" | Pytorch | 1.2.6 and later | ||
gast | A Graph Attention Spatio-temporal Convolutional Networks for 3D Human Pose Estimation in Video (GAST-Net) | Pytorch | 1.2.7 and later | EN JP | |
mediapipe_pose_world_landmarks | MediaPipe Pose real-world 3D coordinates | TensorFlow Lite | 1.2.10 and later |
Road detection
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
codes-for-lane-detection | Codes-for-Lane-Detection | Pytorch | 1.2.6 and later | EN JP | |
roneld | RONELD-Lane-Detection | Pytorch | 1.2.6 and later | ||
road-segmentation-adas | road-segmentation-adas-0001 | OpenVINO | 1.2.5 and later | ||
cdnet | CDNet | Pytorch | 1.2.5 and later | ||
lstr | LSTR | Pytorch | 1.2.8 and later | ||
ultra-fast-lane-detection | Ultra-Fast-Lane-Detection | Pytorch | 1.2.6 and later | ||
yolop | YOLOP | Pytorch | 1.2.6 and later | ||
hybridnets | HybridNets | Pytorch | 1.2.6 and later |
Rotation prediction
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
rotnet | CNNs for predicting the rotation angle of an image to correct its orientation | Keras | 1.2.1 and later |
Style transfer
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
adain | Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization | Pytorch | 1.2.1 and later | EN JP | |
psgan | PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer | Pytorch | 1.2.7 and later | ||
beauty_gan | BeautyGAN | Pytorch | 1.2.7 and later | ||
animeganv2 | PyTorch Implementation of AnimeGANv2 | Pytorch | 1.2.5 and later | ||
pix2pixHD | pix2pixHD: High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs | Pytorch | 1.2.6 and later |
Super resolution
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
srresnet | Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network | Pytorch | 1.2.0 and later | EN JP | |
edsr | Enhanced Deep Residual Networks for Single Image Super-Resolution | Pytorch | 1.2.6 and later | EN JP | |
han | Single Image Super-Resolution via a Holistic Attention Network | Pytorch | 1.2.6 and later | ||
real-esrgan | Real-ESRGAN | Pytorch | 1.2.9 and later | ||
rcan-it | Revisiting RCAN: Improved Training for Image Super-Resolution | Pytorch | 1.2.10 and later | ||
swinir | SwinIR: Image Restoration Using Swin Transformer | Pytorch | 1.2.12 and later |
Text detection
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
craft_pytorch | CRAFT: Character-Region Awareness For Text detection | Pytorch | 1.2.2 and later | ||
pixel_link | Pixel-Link | TensorFlow | 1.2.6 and later | ||
east | EAST: An Efficient and Accurate Scene Text Detector | TensorFlow | 1.2.6 and later |
Text recognition
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
etl | Japanese Character Classification | Keras | 1.1.0 and later | JP | |
deep-text-recognition-benchmark | deep-text-recognition-benchmark | Pytorch | 1.2.6 and later | ||
crnn.pytorch | Convolutional Recurrent Neural Network | Pytorch | 1.2.6 and later | ||
paddleocr | PaddleOCR : Awesome multilingual OCR toolkits based on PaddlePaddle | Pytorch | 1.2.6 and later | EN JP | |
easyocr | Ready-to-use OCR with 80+ supported languages | Pytorch | 1.2.6 and later | ||
ndlocr_text_recognition | NDL OCR | Pytorch | 1.2.5 and later |
Vehicle recognition
Model | Reference | Exported From | Supported Ailia Version | Blog | |
---|---|---|---|---|---|
vehicle-attributes-recognition-barrier | vehicle-attributes-recognition-barrier-0042 | OpenVINO | 1.2.5 and later | EN JP | |
vehicle-license-plate-detection-barrier | vehicle-license-plate-detection-barrier-0106 | OpenVINO | 1.2.5 and later |
Commercial model
Model | Reference | Exported From | Supported Ailia Version | Blog |
---|---|---|---|---|
acculus-pose | Acculus, Inc. | Caffe | 1.2.3 and later |