zhiqwang/sightseq

Stars
123
Rank 290,145 (Top 6 %)
Language
Python
License
MIT License
Created about 6 years ago
Updated about 5 years ago


Computer vision tools for fairseq, containing PyTorch implementations of text recognition and object detection

🔭sightseq

Now, let's go sightseeing around the deep learning world with multimodal vision and sequence-language models.

What's New:

July 30, 2019: Added Faster R-CNN models. I also renamed this repo from image-captioning to sightseq; this is the last time I rename this repo, I promise.
June 11, 2019: I rewrote the text recognition part based on fairseq. For a stable version, refer to the crnn branch, which provides pre-trained model checkpoints. The current branch is a work in progress. Suggestions and cooperation on the fairseq text recognition project are very welcome.

Features:

sightseq provides reference implementations of various deep learning tasks, including:

Text Recognition
- Shi et al. (2015), CRNN: An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
Object Detection
- New Ren et al. (2015), Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
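
CRNN, as described by Shi et al., is trained with a CTC objective, so the per-frame predictions have to be collapsed into a label sequence at inference time. The following is a minimal sketch of greedy (best-path) CTC decoding in plain Python. It is not taken from the sightseq code base: the function name `ctc_greedy_decode`, the choice of index 0 for the blank symbol, and the toy alphabet are all assumptions made for illustration.

```python
BLANK = 0  # assumption: index 0 is the CTC blank, labels 1..N map to alphabet[0..N-1]


def ctc_greedy_decode(frame_scores, alphabet):
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring label for every frame (the "best path").
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]

    decoded = []
    previous = BLANK
    for label in best_path:
        # Emit a character only when the label is not blank and differs
        # from the label of the previous frame.
        if label != BLANK and label != previous:
            decoded.append(alphabet[label - 1])
        previous = label
    return "".join(decoded)


def one_hot_scores(label, num_classes):
    """Hypothetical helper that builds a toy score vector favouring `label`."""
    scores = [0.1] * num_classes
    scores[label] = 0.9
    return scores


# Best path 1,1,0,1,2,2,0 over the alphabet "abc" (plus blank).
frames = [one_hot_scores(label, 4) for label in [1, 1, 0, 1, 2, 2, 0]]
print(ctc_greedy_decode(frames, "abc"))  # aab
```

The blank between the two runs of label 1 is what lets the decoder output a doubled character; without it, the repeated frames would be merged into a single "a".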

Additionally:

All features of fairseq
Flexible configuration of the convolutional and recurrent layers in CRNN
Positional Encoding of images
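
The README does not say how images are positionally encoded. One common approach, shown here only as an assumption and not as sightseq's actual method, is to extend the sinusoidal encoding of the Transformer to two dimensions by encoding the row in one half of the channels and the column in the other half. The names `sinusoid` and `image_positional_encoding` are hypothetical, and the sketch uses plain Python lists rather than tensors to stay dependency-free.

```python
import math


def sinusoid(position, dim):
    """1D sinusoidal encoding of a single position; `dim` must be even."""
    encoding = []
    for i in range(0, dim, 2):
        # Each sin/cos pair uses a wavelength that grows geometrically with i.
        angle = position / (10000 ** (i / dim))
        encoding.extend([math.sin(angle), math.cos(angle)])
    return encoding


def image_positional_encoding(height, width, dim):
    """Return a height x width grid of `dim`-sized vectors.

    Assumed layout: the first dim/2 channels encode the row index,
    the last dim/2 channels encode the column index.
    """
    if dim % 4 != 0:
        raise ValueError("dim must be divisible by 4")
    half = dim // 2
    return [
        [sinusoid(row, half) + sinusoid(col, half) for col in range(width)]
        for row in range(height)
    ]


grid = image_positional_encoding(height=2, width=3, dim=8)
print(len(grid), len(grid[0]), len(grid[0][0]))  # 2 3 8
```

In a real model, such a grid would typically be added to the convolutional feature map before the sequence layers, but where and whether sightseq does this is not stated in the README.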

General Requirements and Installation

PyTorch (there is a bug in nn.CTCLoss that is fixed in the nightly version)
Python version >= 3.5
Fairseq version >= 0.7.1
torchvision version >= 0.3.0
For training new models, you'll also need an NVIDIA GPU and NCCL
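
The minimum versions listed above can be checked before installation or training. The sketch below is one possible way to do so and is not part of sightseq: the helper names `parse_version`, `meets_minimum` and `check_environment` are hypothetical, the version parser only looks at the leading numeric components, and `importlib.metadata` requires Python 3.8 or newer, which is stricter than the Python 3.5 minimum stated in the list.

```python
import re
import sys
from importlib import metadata  # assumption: Python >= 3.8 is available

# Minimum versions copied from the requirements list above.
MINIMUMS = {"fairseq": "0.7.1", "torchvision": "0.3.0"}


def parse_version(text):
    """Return the leading numeric components, e.g. '0.7.1' -> (0, 7, 1)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at suffixes such as 'dev20190730'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, component by component."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment(minimums=MINIMUMS):
    """Map each requirement name to True or False."""
    report = {"python": sys.version_info >= (3, 5)}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            report[name] = False  # package is not installed
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(name, "OK" if ok else "missing or too old")
```

The check does not cover the PyTorch nightly requirement, the NVIDIA GPU, or NCCL, since the README gives no version numbers for those.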

Pre-trained models and examples

License

sightseq is MIT-licensed. The license applies to the pre-trained models as well.
