bigscience-workshop/carbon-footprint

Stars: 10
Rank: 1,807,489 (top 36%)
Language: Jupyter Notebook
Created over 3 years ago
Updated about 2 years ago

A repository for `codecarbon` logs.

petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

promptsource

Toolkit for creating, sharing and using natural language prompts.

Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

bigscience

Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.

xmtf

Crosslingual Generalization through Multitask Finetuning

Jupyter Notebook

t-zero

Reproduce results and replicate training of T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)

biomedical

Tools for curating biomedical training data for large-scale language modeling

data-preparation

Code used for sourcing and cleaning the BigScience ROOTS corpus

Jupyter Notebook

lam

Libraries, Archives and Museums (LAM)

data_tooling

Tools for managing datasets for governance and training.

multilingual-modeling

BLOOM+1: Adapting BLOOM model to support a new unseen language

evaluation

Code and Data for Evaluation WG

data_sourcing

This directory gathers the tools developed by the Data Sourcing Working Group

metadata

Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining.

model_card

tokenization

bloom-dechonk

A repo for running model shrinking experiments

historical_texts

BigScience working group on language models for historical texts

Jupyter Notebook

catalogue_data

Scripts to prepare catalogue data

Jupyter Notebook

pii_processing

PII Processing code to detect and remediate PII in BigScience datasets. Reference implementation for the PII Hackathon

training_dynamics

bibliography

A list of BigScience publications

scaling-laws-tokenization

datasets_stats

Generate statistics over datasets used in the context of BigScience

evaluation-robustness-consistency

Tools for evaluating model robustness and consistency

interpretability-ideas

evaluation-results

Dump of results for bigscience.