Laion dreams
Overarching goal: Enable the open source community to openly build datasets, papers, models and tools in order to let AGI benefit humankind even faster.
Intro
Laion was initiated with the Laion5B project, which successfully produced a dataset of 5B (image, text) pairs by processing Common Crawl and filtering with CLIP. That method proved that it's cheap to collect large-scale datasets from the web using models like CLIP that give the similarity between items from two modalities.
Many models have been trained on laion400m, proving the value of this method. In particular, openclip reproduced the results of the original openai clip.
Let's extend that method to more modalities!
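The filtering idea described above can be sketched as follows. This is a minimal illustration, not the actual Laion pipeline: it assumes the image and text embeddings have already been computed by a CLIP-like model, and the function names and the 0.3 threshold are hypothetical choices made for the example.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two (n, d) embedding matrices."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)


def filter_pairs(image_emb: np.ndarray, text_emb: np.ndarray, threshold: float = 0.3):
    """Keep pairs whose image/text similarity reaches the threshold.

    The threshold is an assumed, illustrative value; a real pipeline
    would tune it for the model and the data.
    """
    sims = cosine_similarity(image_emb, text_emb)
    return sims >= threshold, sims


# Toy stand-ins for embeddings that a CLIP-like model would produce.
image_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
text_emb = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])

keep, sims = filter_pairs(image_emb, text_emb)
print(keep)  # boolean mask of the pairs that survive the filter
```

The same pattern would apply to any pair of modalities, as long as a model places both in a shared embedding space.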
Overall rationale
These are projects and directions that we would like to promote and help. As an organization, we do not claim ownership of them. The people who build the projects own them.
Directions
Methods
- Open source: releasing everything openly
- Code: on github with an open license
- Model: freely distributed models
- Dataset: freely distributed datasets
- Open development: development is done in public on github and discord; everyone is encouraged to participate, regardless of nationality, age, or diploma
Axes of work:
- Open tools
- Dataset collection
- Dataset preparation
- Distributed inference
- Distributed training
- Evaluation
- Datasets
- Open distribution
- Papers
- Models
- Open training
- Open distribution
Scientific domains
- All modalities dataset building
- Text image
- Text audio
- Text video
- Text 3d
- Contrastive and generative
- Contrastive
- Text image
- Text audio
- Text video
- Generative
- Text to image
- Image to text
- Contrastive
Projects
These projects are collaborations between many people. To find out who is involved, check the links or ask on discord. We are open to new collaborators!
Dataset
Name | Modality | Status | Notes |
Laion400m | image/text | Done | > 10 papers using it |
Laion5B | image/text | Done | Largest open text/image dataset |
Laion5B high-resolution | image/text | Done | Largest open high-resolution text/image dataset |
Laion5B balanced | image/text | Just started | Balanced LAION-5B dataset for more efficient training |
laion3d | 3d/image/text | Just started | Trying to expand the laion idea to 3d |
Audio dataset | text/audio | Started | Started to be used to train an audio clip |
Model
Name | Modality | Kind | Status | Notes |
Openclip B/16 | image/text | contrastive | released | Reproduced openai clip |
Dalle2 prior/decoder | image/text | generative | Just started | Trying to reproduce dalle2 |
Clipcap | image/text | generative | works | Generate text from embedding |
Audio clip | audio/text | contrastive | Training ongoing | |
Video clip | video/text | contrastive | Just started | |
Mclip vit-l/14 | image/text | contrastive | Just started | Aligning a text encoder to be in clip space. Collaboration with mclip author |
Super-resolution | image->image | generative | Just started | Using a high-resolution subset of LAION-5B for the training |
Medical CLIP | image/text | contrastive | Just started | Using CLIP to improve MRI -> image synthesis (see project outline). |
NSFW detection | image/text | contrastive | Done | Using CLIP to detect NSFW in images. |
Watermark detection | image/text | contrastive | Done | Using CLIP to detect watermarks in images. |
electric sheep | image/text/audio/video | contrastive/generative | Just started | Train contrastive and generative models on all modalities. |
Tools
Name | Modality | Status | Notes |
img2dataset | image/text | working | Used to download laion5B in a week, twice |
Clip retrieval | image/text | working | Used to compute 5B Vit-L/14 embeddings |
Crawlingathome-gpu-hcloud | image/text | done | Filtering common crawl using clip |
clip benchmark | image/text | wip | Evaluating clip performance easily |
Papers
Name | Modality | Status | Notes |
Laion400m | image/text | On arXiv | Cited many times |
laion5B | image/text | started | |