CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.

Stanford CoreNLP

Stanford CoreNLP provides a set of natural language analysis tools written in Java. It can take raw human language text as input and give the base forms of words and their parts of speech; identify whether they are names of companies, people, etc.; normalize and interpret dates, times, and numeric quantities; mark up the structure of sentences in terms of syntactic phrases or dependencies; and indicate which noun phrases refer to the same entities. It was originally developed for English, but now also provides varying levels of support for (Modern Standard) Arabic, (mainland) Chinese, French, German, Hungarian, Italian, and Spanish.

Stanford CoreNLP is an integrated framework, which makes it very easy to apply a bunch of language analysis tools to a piece of text. Starting from plain text, you can run all the tools with just two lines of code. Its analyses provide the foundational building blocks for higher-level and domain-specific text understanding applications. Stanford CoreNLP is a set of stable and well-tested natural language processing tools, widely used by various groups in academia, industry, and government. The tools variously use rule-based, probabilistic machine learning, and deep learning components.

The Stanford CoreNLP code is written in Java and licensed under the GNU General Public License (v2 or later). Note that this is the full GPL, which allows many free uses, but not its use in proprietary software that you distribute to others.

Build Instructions

Several times a year we distribute a new version of the software, which corresponds to a stable commit.

Between releases, you can always use the latest, under-development version of our code.

Here are some instructions for using the latest code:

Provided build

Sometimes we will provide updated jars here which have the latest version of the code.

At present, the current released version of the code is our most recent released jar, though you can always build the very latest from GitHub HEAD yourself.

Build with Ant

  1. Make sure you have Ant installed; details here: http://ant.apache.org/
  2. Compile the code with this command: cd CoreNLP ; ant
  3. Then, starting again from the directory that contains CoreNLP, run this command to build a jar with the latest version of the code: cd CoreNLP/classes ; jar -cf ../stanford-corenlp.jar edu
  4. This will create a new jar called stanford-corenlp.jar in the CoreNLP folder, which contains the latest code.
  5. The dependencies that work with the latest code are in CoreNLP/lib and CoreNLP/liblocal, so make sure to include those in your CLASSPATH.
  6. When using the latest version of the code make sure to download the latest versions of the corenlp-models, english-models, and english-models-kbp and include them in your CLASSPATH. If you are processing languages other than English, make sure to download the latest version of the models jar for the language you are interested in.
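
Steps 4 to 6 amount to putting the freshly built jar, the jars in CoreNLP/lib and CoreNLP/liblocal, and the downloaded models jars on the CLASSPATH. The following is a minimal sketch of that, assuming a Unix-like shell where ':' separates classpath entries. The helper name build_classpath and the models directory in the usage comment are illustrative assumptions, not part of CoreNLP.

```shell
# Hypothetical helper (not shipped with CoreNLP): join every *.jar found
# directly inside the given directories into one ':'-separated string.
build_classpath() {
  cp=""
  for dir in "$@"; do
    for jar in "$dir"/*.jar; do
      # An unmatched glob stays literal; skip it so empty directories add nothing.
      [ -e "$jar" ] || continue
      cp="${cp:+$cp:}$jar"
    done
  done
  printf '%s\n' "$cp"
}

# Usage sketch. CoreNLP, CoreNLP/lib and CoreNLP/liblocal come from the steps
# above; /path/to/models is an assumed location for the downloaded models jars.
# export CLASSPATH="$(build_classpath CoreNLP CoreNLP/lib CoreNLP/liblocal /path/to/models)"
```

Only jars directly inside each listed directory are picked up, so keep the models jars together in one folder or pass each folder separately.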

Build with Maven

  1. Make sure you have Maven installed, details here: https://maven.apache.org/
  2. Run this command in the CoreNLP directory: mvn package. It should run the tests and build this jar file: CoreNLP/target/stanford-corenlp-4.5.4.jar
  3. When using the latest version of the code make sure to download the latest versions of the corenlp-models, english-extra-models, and english-kbp-models and include them in your CLASSPATH. If you are processing languages other than English, make sure to download the latest version of the models jar for the language you are interested in.
  4. If you want to use Stanford CoreNLP as part of a Maven project you need to install the models jars into your Maven repository. Below is a sample command for installing the Spanish models jar. For other languages just change the language name in the command. To install stanford-corenlp-models-current.jar you will need to set -Dclassifier=models. Here is the sample command for Spanish: mvn install:install-file -Dfile=/location/of/stanford-spanish-corenlp-models-current.jar -DgroupId=edu.stanford.nlp -DartifactId=stanford-corenlp -Dversion=4.5.4 -Dclassifier=models-spanish -Dpackaging=jar
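
The install command in step 4 differs between models jars only in the jar path, the version, and the classifier, so it can be wrapped in a small shell helper. The sketch below is illustrative only: the function name corenlp_install_cmd and its argument order are hypothetical, and it prints the command rather than running it so that it can be checked first.

```shell
# Hypothetical helper (not part of CoreNLP or Maven): print the
# mvn install:install-file command for one models jar.
# Arguments: <jar path> <version> <classifier>
# Note: paths containing spaces are not quoted in the printed command.
corenlp_install_cmd() {
  jar_path="$1"
  version="$2"
  classifier="$3"
  printf 'mvn install:install-file -Dfile=%s -DgroupId=edu.stanford.nlp -DartifactId=stanford-corenlp -Dversion=%s -Dclassifier=%s -Dpackaging=jar\n' \
    "$jar_path" "$version" "$classifier"
}

# Spanish models, matching the sample command in step 4:
corenlp_install_cmd /location/of/stanford-spanish-corenlp-models-current.jar 4.5.4 models-spanish

# Default models jar, which step 4 says needs the classifier "models":
corenlp_install_cmd /location/of/stanford-corenlp-models-current.jar 4.5.4 models
```

Once the printed command looks right, it can be copied into the terminal. The version passed in should match the CoreNLP version your project depends on.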

Models

The models jars that correspond to the latest code can be found in the table below.

Some of the larger (English) models, such as the shift-reduce parser and WikiDict, are not distributed with our default models jar. These require downloading the English (extra) and English (KBP) jars. Resources for other languages require the corresponding models jar.

The best way to get the models is to use git-lfs and clone them from Hugging Face Hub.

For instance, to get the French models, run the following commands:

# Make sure you have git-lfs installed
# (https://git-lfs.github.com/)
git lfs install

git clone https://huggingface.co/stanfordnlp/corenlp-french

The jars can be directly downloaded from the links below or the Hugging Face Hub page as well.

Language          Model Jar            Last Updated
Arabic            download (HF Hub)    4.5.6
Chinese           download (HF Hub)    4.5.6
English (extra)   download (HF Hub)    4.5.6
English (KBP)     download (HF Hub)    4.5.6
French            download (HF Hub)    4.5.6
German            download (HF Hub)    4.5.6
Hungarian         download (HF Hub)    4.5.6
Italian           download (HF Hub)    4.5.6
Spanish           download (HF Hub)    4.5.6

Thank you to Hugging Face for helping with our hosting!

Install with Gradle

If you are not familiar with Gradle, see the official site: https://gradle.org

Add the following to your build.gradle, using the coordinates published on Maven Central:

dependencies {
    implementation 'edu.stanford.nlp:stanford-corenlp:4.5.5'
}

If you want to analyze English, also add the following inside the dependencies block:

    implementation "edu.stanford.nlp:stanford-corenlp:4.5.5:models"
    implementation "edu.stanford.nlp:stanford-corenlp:4.5.5:models-english"
    implementation "edu.stanford.nlp:stanford-corenlp:4.5.5:models-english-kbp"

If you use another version, replace "4.5.5" with the version you use.

Useful resources

You can find releases of Stanford CoreNLP on Maven Central.

You can find more explanation and documentation on the Stanford CoreNLP homepage.

For information about making contributions to Stanford CoreNLP, see the file CONTRIBUTING.md.

Questions about CoreNLP can either be posted on StackOverflow with the tag stanford-nlp, or on the mailing lists.
