  • Stars: 220
  • Rank: 180,422 (Top 4%)
  • License: MIT License
  • Created about 4 years ago
  • Updated over 1 year ago

Repository Details

📖 A curated list of LegalNLP resources from all around the web.

Legal Natural Language Processing

🗂 Datasets

Legal Judgement Prediction (LJP)

| Dataset | Links | Domain | Language | Size |
| --- | --- | --- | --- | --- |
| FSCS (Niklaus et al., 2021) | 📄 🤗 💻 | Swiss court judgments | 🇩🇪 🇫🇷 🇮🇹 | 85K cases w/ 2 outcomes |
| ECtHR (Chalkidis et al., 2021) | 📄 🤗 | EU court judgments | 🇬🇧 | 11K cases w/ 11 outcomes |
| ECHR (Aletras et al., 2019) | 📄 💾 | EU court judgments | 🇬🇧 | 11.5K cases w/ 11 outcomes |
| CAIL (Xiao et al., 2018) | 📄 💻 | Chinese court judgements | 🇨🇳 | 2.6M cases w/ 6 outcomes |

Legal Text Classification (LTC)

| Dataset | Links | Domain | Language | Size |
| --- | --- | --- | --- | --- |
| GLC (Papaloukas et al., 2021) | 📄 🤗 💻 | Greek legislation | 🇬🇷 | 47.5K laws w/ 2.7K labels |
| CUAD (Hendrycks et al., 2021) | 📄 🤗 💻 | Contracts | 🇬🇧 | 510 contracts w/ 41 classes |
| MultiEURLEX (Chalkidis et al., 2021) | 📄 🤗 💻 | EU legislation | 🇬🇧 🇩🇪 🇫🇷 🇮🇹 🇪🇸 (18+) | 65K laws w/ 4.5K labels |
| LEDGAR (Tuggener et al., 2020) | 📄 💾 | Contracts | 🇬🇧 | 60.5K contracts w/ 12.6K labels |
| Contract Discovery (Borchmann et al., 2020) | 📄 💻 | Contracts | 🇬🇧 | 2.6K clauses w/ 21 classes |
| EURLEX-57K (Chalkidis et al., 2019) | 📄 💾 | EU legislation | 🇬🇧 | 57K laws w/ 4.3K labels |
| Unfair-ToS (Lippi et al., 2018) | 📄 💾 | Contracts | 🇬🇧 | 9.4K sentences w/ 9 classes |
| Contract Elements (Chalkidis et al., 2017) | 📄 💾 | Contracts | 🇬🇧 | 2.4K contracts w/ 10 classes |
| OPP-115 (Wilson et al., 2016) | 📄 💾 | Privacy laws | 🇬🇧 | 115 policies w/ 23K labels |

Legal Information Retrieval (LIR)

| Dataset | Links | Domain | Language | Size |
| --- | --- | --- | --- | --- |
| BSARD (Louis et al., 2022) | 📄 🤗 💻 | Belgian legislation | 🇫🇷 | 1.1K questions w/ 22.6K candidate statutory articles |
| EU2UK (Chalkidis et al., 2021) | 📄 💾 | EU & UK legislation | 🇬🇧 | 2K query documents w/ 52.5K candidate documents |
| UK2EU (Chalkidis et al., 2021) | 📄 💾 | EU & UK legislation | 🇬🇧 | 2.1K query documents w/ 3.9K candidate documents |
| COLIEE-Case-Law-Retrieval (Rabelo et al., 2020) | 📄 💾 | Canadian precedents | 🇬🇧 | 650 query cases w/ 128K candidate cases |
| COLIEE-Statute-Law-Retrieval (Rabelo et al., 2020) | 📄 💾 | Japanese legislation | 🇬🇧 🇯🇵 | 808 questions w/ 768 candidate statutory articles |
| CAIL2019-SCM (Xiao et al., 2019) | 📄 💻 | Chinese court judgements | 🇨🇳 | 8.9K triplets of cases |

Legal Question Answering (LQA)

| Dataset | Links | Domain | Language | Size |
| --- | --- | --- | --- | --- |
| CaseHOLD (Zheng et al., 2021) | 📄 💻 | US case holdings | 🇬🇧 | 53.1K multiple-choice questions |
| JEC-QA (Zhong et al., 2019) | 📄 💾 | Chinese law | 🇨🇳 | 26.3K multiple-choice questions |
| CJRC (Duan et al., 2019) | 📄 💻 | Chinese court judgements | 🇨🇳 | 50K question-answers from 10K documents |
| PrivacyQA (Ravichander et al., 2019) | 📄 💻 | Privacy policies | 🇬🇧 | 1.7K question-answers from 35 documents |

Legal Textual Entailment (LTE)

| Dataset | Links | Domain | Language | Size |
| --- | --- | --- | --- | --- |
| COLIEE-Case-Law-Entailment (Rabelo et al., 2020) | 📄 💾 | Canadian precedents | 🇬🇧 | 425 cases w/ related case |
| COLIEE-Statute-Law-Entailment (Rabelo et al., 2020) | 📄 💾 | Japanese legislation | 🇬🇧 🇯🇵 | 808 questions w/ related statutory article |

Legal Text Summarization (LTS)

| Dataset | Links | Domain | Language | Size |
| --- | --- | --- | --- | --- |
| UK-Abs (Shukla et al., 2022) | 📄 💻 💾 | UK court cases | 🇬🇧 | 793 pairs of (case, abstractive summary) from the UK Supreme Court |
| IN-Abs (Shukla et al., 2022) | 📄 💻 💾 | Indian court cases | 🇬🇧 | 7.1K pairs of (case, abstractive summary) from the Indian Supreme Court |
| IN-Ext (Shukla et al., 2022) | 📄 💻 💾 | Indian court cases | 🇬🇧 | 50 pairs of (case, extractive summary) from the Indian Supreme Court |
| TOS;DR (Keymanesh et al., 2020) | 📄 💻 | Terms of service | 🇬🇧 | 1.6K pairs of (agreement text, summary) from data privacy policies |
| BillSum (Kornilova et al., 2019) | 📄 💻 💾 | US Congressional bills | 🇬🇧 | 22.2K pairs of (bill, summary) |
| TL;DRLegal (Manor et al., 2019) | 📄 💻 | Terms of service | 🇬🇧 | 84 pairs of (agreement text, summary) from software licenses |
| TOS;DR (Manor et al., 2019) | 📄 💻 | Terms of service | 🇬🇧 | 421 pairs of (agreement text, summary) from data privacy policies |
| BVA Cases (Zhong et al., 2019) | 📄 💻 | US court cases | 🇬🇧 | 92 pairs of (case, summary) from the US Board of Veterans' Appeals |
| LCR (Galgani et al., 2012) | 📄 💾 | Australian court cases | 🇬🇧 | 3.9K pairs of (case, catchphrases) |

Legal Language Modeling (LLM)

| Dataset | Links | Language | Size |
| --- | --- | --- | --- |
| Pile of Law (Henderson et al., 2022) | 📄 🤗 💻 | 🇬🇧 | ~256GB of legal and administrative text |

Benchmarks

| Dataset | Links | Language | Tasks |
| --- | --- | --- | --- |
| FairLex (Chalkidis et al., 2022) | 📄 🤗 💻 | 🇬🇧 🇩🇪 🇫🇷 🇮🇹 🇨🇳 | Classification (x1), legal judgement prediction (x3) |
| LexGLUE (Chalkidis et al., 2022) | 📄 🤗 💻 | 🇬🇧 | Classification (x6), multiple-choice QA (x1) |

🔥 Models

| Model | Links | Language | Size |
| --- | --- | --- | --- |
| Legal-HeBERT (Chriqui et al., 2022) | 📄 🤗 💻 | 🇮🇱 | 110M |
| PoL-BERT-Large (Henderson et al., 2022) | 📄 🤗 💻 | 🇬🇧 | 336M |
| Italian-LEGAL-BERT (Licari and Comande, 2022) | 📄 🤗 | 🇮🇹 | 110M |
| JuriBERT (Douka et al., 2021) | 📄 💾 | 🇫🇷 | {6M, 15M, 42M, 110M} |
| Custom-LEGAL-BERT (Zheng et al., 2021) | 📄 🤗 💻 | 🇬🇧 | 110M |
| LEGAL-BERT (Chalkidis et al., 2020) | 📄 🤗 | 🇬🇧 | {35M, 110M} |
| LEGAL-GPT-{1,2} (Borchmann et al., 2020) | 📄 💻 | 🇬🇧 | {117M, 1.5B} |

📚 Books

  • [2017] Artificial Intelligence and Legal Analytics: New Tools for Law Practice in the Digital Age, K. Ashley. [link]

📄 Surveys

  • [2020-05] How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence, H. Zhong et al. [pdf]
  • [2019-09] A Brief History of the Changing Roles of Case Prediction in AI and Law, K. Ashley. [pdf]
  • [2018-12] Deep learning in law: early adaptation and legal word embeddings trained on large corpora, I. Chalkidis et al. [pdf]

🎙 Talks

  • [2019-06] Law as Data: The Promise and Challenges of Natural Language Processing for Legal Research, A. Dyevre. [slides]
  • [2019-04] Artificial Intelligence and Law – An Overview and History, H. Surden. [video]

🗓 Conferences & Workshops

  • The Natural Legal Language Processing (NLLP) Workshop [website]
  • The International Conference on Artificial Intelligence and Law (ICAIL) [website]
  • The International Conference on Legal Knowledge and Information Systems (JURIX) [website]
  • The EXplainable AI in Law (XAILA) Workshop [website]
  • The International Workshop on Juris-informatics (JURISIN) [website]
  • The Competition on Legal Information Extraction/Entailment (COLIEE) [website]
  • The International Workshop on Legal Information Retrieval [website]
