• Stars: 1,073
• Rank: 43,114 (Top 0.9%)
• Language: HTML
• License: MIT License
• Created: 7 months ago
• Updated: 4 months ago

Repository Details

DataComp for Language Models

More Repositories

1. open_clip (Python, 9,941 stars): An open source implementation of CLIP.
2. open_flamingo (Python, 3,716 stars): An open-source framework for training large multimodal models.
3. MINT-1T (749 stars): MINT-1T: A one trillion token multimodal interleaved dataset.
4. datacomp (Python, 628 stars): DataComp: In search of the next generation of multimodal datasets.
5. wise-ft (Python, 618 stars): Robust fine-tuning of zero-shot models.
6. open_lm (Python, 475 stars): A repository for research on medium sized language models.
7. model-soups (Python, 412 stars): Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time.
8. task_vectors (Python, 397 stars): Editing Models with Task Arithmetic.
9. open-diffusion (Python, 120 stars): Simple large-scale training of stable diffusion with multi-node support.
10. scaling (Jupyter Notebook, 90 stars): Language models scale reliably with over-training and on downstream tasks.
11. patching (Python, 87 stars): Patching open-vocabulary models by interpolating weights.
12. VisIT-Bench (Python, 46 stars)
13. imagenet-captions (45 stars): Release of ImageNet-Captions.
14. tableshift (Python, 38 stars): A benchmark for distribution shift in tabular data.
15. clip_quality_not_quantity (Python, 28 stars)
16. rtfm (Python, 20 stars): Research on Tabular Foundation Models.
17. dataset2metadata (Python, 19 stars)
18. spark-commoncrawl (Jupyter Notebook, 6 stars)
19. datacomp_site (HTML, 6 stars)
20. tabliblib (Python, 5 stars): A Python library for processing and filtering TabLib.
21. webdataset-resharder (Python, 4 stars): Efficiently process webdatasets.
22. imagenet-applications-transfer (Python, 2 stars)
23. au21 (Jupyter Notebook, 1 star)
24. advancedml-sp23 (CSS, 1 star)