SWE-agent
SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
tree-of-thought-llm
[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
SimCSE
[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
SWE-bench
[ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?
MeZO
[NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333
PURE
[NAACL 2021] A Frustratingly Easy Approach for Entity and Relation Extraction https://arxiv.org/abs/2010.12812
LM-BFF
[ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723
SimPO
SimPO: Simple Preference Optimization with a Reference-Free Reward
DensePhrases
[ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.org/abs/2012.12624
LLM-Shearing
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
ALCE
[EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. Paper: https://arxiv.org/abs/2305.14627
LESS
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
AutoCompressors
[EMNLP 2023] Adapting Language Models to Compress Long Contexts
WebShop
[NeurIPS 2022] WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
TRIME
[EMNLP 2022] Training Language Models with Memory Augmentation https://arxiv.org/abs/2205.12674
intercode
[NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898
CoFiPruning
[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
OptiPrompt
[NAACL 2021] Factual Probing Is [MASK]: Learning vs. Learning to Recall https://arxiv.org/abs/2104.05240
TransformerPrograms
[NeurIPS 2023] Learning Transformer Programs
EntityQuestions
EMNLP'2021: Simple Entity-centric Questions Challenge Dense Retrievers https://arxiv.org/abs/2109.08535
QuRating
[ICML 2024] Selecting High-Quality Data for Training Language Models
CEPE
[ACL 2024] Long-Context Language Modeling with Parallel Encodings
DinkyTrain
Princeton NLP's pre-training library based on fairseq with DeepSpeed kernel integration
LLMBar
[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
MQuAKE
[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
USACO
Can Language Models Solve Olympiad Programming?
ProLong
Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"
NLProofS
EMNLP 2022: Generating Natural Language Proofs with Verifier-Guided Search https://arxiv.org/abs/2205.12443
CharXiv
[NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
MADE
EMNLP 2021: Single-dataset Experts for Multi-dataset Question-Answering
LM-Kernel-FT
A Kernel-Based View of Language Model Fine-Tuning https://arxiv.org/abs/2210.05643
c-sts
[EMNLP 2023] C-STS: Conditional Semantic Textual Similarity
calm-textgame
[EMNLP 2020] Keep CALM and Explore: Language Models for Action Generation in Text-based Games
DataMUX
[NeurIPS 2022] DataMUX: Data Multiplexing for Neural Networks
ShortcutGrammar
EMNLP 2022: Finding Dataset Shortcuts with Grammar Induction https://arxiv.org/abs/2210.11560
LitSearch
A Retrieval Benchmark for Scientific Literature Search
Collie
[ICLR 2024] COLLIE: Systematic Construction of Constrained Text Generation Tasks
EvalConvQA
[ACL 2022] Ditch the Gold Standard: Re-evaluating Conversational Question Answering
HELMET
The HELMET Benchmark
MABEL
EMNLP 2022: "MABEL: Attenuating Gender Bias using Textual Entailment Data" https://arxiv.org/abs/2210.14975
LM-Science-Tutor
rationale-robustness
NAACL 2022: Can Rationalization Improve Robustness? https://arxiv.org/abs/2204.11790
PTP
Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073
corpus-poisoning
[EMNLP 2023] Poisoning Retrieval Corpora by Injecting Adversarial Passages https://arxiv.org/abs/2310.19156
InstructEval
[NAACL 2024 Findings] Evaluation suite for the systematic evaluation of instruction selection methods.
Edge-Pruning
Code and data for the paper "Finding Transformer Circuits with Edge Pruning".
WhatICLLearns
[ACL 2023 Findings] What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning
Cognac
Repo for paper: Controllable Text Generation with Language Constraints
ELIZA-Transformer
Representing Rule-based Chatbots with Transformers
semsup
Semantic Supervision: Enabling Generalization over Output Spaces
benign-data-breaks-safety
SRL-NLC
Safe Reinforcement Learning with Natural Language Constraints
datamux-pretraining
MUX-PLMs: Pretraining LMs with Data Multiplexing
XTX
[ICLR 2022 Spotlight] Multi-Stage Episodic Control for Strategic Exploration in Text Games
MultilingualAnalysis
Repository for the paper titled: "When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer"
dyck-transformer
[ACL 2021] Self-Attention Networks Can Process Bounded Hierarchical Languages
blindfold-textgame
[NAACL 2021] Reading and Acting while Blindfolded: The Need for Semantics in Text Game Agents
align-mlm
metric-wsd
NAACL'2021: Non-Parametric Few-Shot Learning for Word Sense Disambiguation
semsup-xc
SemSup-XC: Semantic Supervision for Extreme Classification
Heuristic-Core
[ACL 2024] The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models - https://arxiv.org/abs/2403.03942
CopyCat
NegotiationToM
Code release for Improving Dialog Systems for Negotiation with Personality Modeling.
CARETS
SPARTAN
SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers
il-scaling-in-games
Official code repo of "Scaling Laws for Imitation Learning in Single-Agent Games"
attribute-tagging
[LaReL 2022] Towards an Enhanced, Faithful, and Adaptable Web Interaction Environment
MoQA