MLE-LLaMA
Multi-language Enhanced LLaMA
Visual-LLaMA
Open LLaMA Eyes to See the World
IEA
Image Editing Anything
DiS
Scalable Diffusion Models with State Space Backbone
Video-Stable-Diffusion
Generate consistent videos with stable diffusion models
Gradient-Free-Textual-Inversion
Gradient-Free Textual Inversion for Personalized Text-to-Image Generation
Stable-Edit
Text-based real image editing with stable diffusion models
Perceiver-Music-Generation
Music generation with the Perceiver-AR model
DeeCap
Dynamic Early Exit for Image Captioning
Vespa
Video Diffusion State Space Models
Visual-ChatGLM
Open ChatGLM Eyes to See the World
PNAIC
Partially Non-Autoregressive Image Captioning
AIO
All In One: General Multimodal Large Language Model
Future-Caption
Efficient modeling of future context for image captioning
Meta-Ensemble
Meta-Ensemble Parameter Learning
Image-Caption-Pytorch
PyTorch implementation of an image captioning baseline model
UAIC
Uncertainty-aware image caption generation
Latent-Dynamics
Exploring latent dynamics for visual storytelling
MaskGMT
Masked generative music transformer
Matrix-Analysis-and-Application
References and coding homework for the Matrix Analysis and Application course at UCAS
Cleaned-Webvid
A strategy for obtaining a clean WebVid-10M dataset
Diverse-Image-Caption
Promoting Coherence and Diversity in Image Captioning
Visual-MOSS
Makes the MOSS model understand visual information
ACSG
Actor-Critic Sequence Generation for Relative Difference Captioning
LQMA
Language Quantized Masked AutoEncoders
DSC
Descriptive synthetic captions in DALL-E 3
feizc
MAIC
Memory augmented image captioning
SAIC
Semi-Autoregressive Image Captioning
arXiv-MM
Multimodal dataset for arXiv
DiffuCap
Controllable Image Captioning with Diffusion Model
Union
Unifying Language-Image Pre-training via Single-Tower Transformer
AAT
Attention-Aligned Transformer for Image Captioning
CLIP-MAE
When CLIP meets MAE and beyond
Chinese-Image-Caption
A Chinese-language image captioner
ViD
Text-to-Image Diffusion Models as Refined Visual Learners
Meta-ViT
Meta-ensemble parameter learning for Vision Transformer
ClipCap
Incorporating CLIP features into Transformer-based image captioning
CLKA
Cross Lingual Knowledge Alignment for Stable Diffusion Models
Diffusion-Model
A tutorial on diffusion models for text-guided image generation
LLaMA-XL
LLaMA model Beyond Length Limitation
GameTag
Official implementation of the GameTag algorithm
MoE-MLLM
Mixture-of-Experts for Multimodal Large Language Models