• Stars
    star
    208
  • Rank 189,015 (Top 4 %)
  • Language
    Java
  • License
    GNU General Publi...
  • Created over 9 years ago
  • Updated almost 3 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A large-scale statistical machine translation system written in Java.

Phrasal: A statistical machine translation system

Phrasal is licensed under the GPL (v3+). For details, please see the file LICENSE.txt in the root directory of this software package.

Copyright (c) 2007-2016 The Board of Trustees of The Leland Stanford Junior University. All Rights Reserved.

Installation

We use Gradle to build Phrasal. Gradle will install all dependencies. You need Gradle version 2.1+. If you are on OS X, the easiest way to get Gradle is to install Homebrew and then to type brew install gradle.

Linux / Mac OS X

These instructions assume you are using the bash shell, which is usually the default shell.

  1. Switch to the root of the Phrasal repository and execute: gradle installDist

  2. Set PHRASAL_HOME: export PHRASAL_HOME=`pwd`

  3. Set CLASSPATH: export CLASSPATH=$PHRASAL_HOME/build/install/phrasal/lib/*

  4. (Optional) Build Eclipse project files by executing: gradle eclipse.

  5. (Optional, requires g++, JDK) Build the KenLM loader: gradle compileKenLM.

  6. (Optional, requires g++, JDK, and Boost) Build the KenLM language model estimation tools: gradle compileKenLMtools.

Windows

Follow the Linux instructions above. Then be sure to execute gradle startupScripts to generate a .bat file.

Citation

If you use Phrasal for research, then please cite the following paper:

@inproceedings{Green2014,
 author = {Spence Green and Daniel Cer and Christopher D. Manning},
 title = {Phrasal: A Toolkit for New Directions in Statistical Machine Translation},
 booktitle = {In Proceddings of the Ninth Workshop on Statistical Machine Translation},
 year = {2014}
}

Documentation / User Guide

See the user guide for complete installation and configuration instructions. The guide also contains a tutorial for building an MT system from raw text.

Support

We have 3 mailing lists for Phrasal, all of which are shared with other Stanford JavaNLP tools (with the exclusion of the parser).

Each address is at @lists.stanford.edu:

java-nlp-user -- This is the best list to post to in order to ask questions, make announcements, or for discussion among JavaNLP users. You have to subscribe to be able to use it. Join the list via this webpage or by emailing [email protected]. (Leave the subject and message body empty.) You can also look at the list archives.

java-nlp-announce -- This list will be used only to announce new versions of Stanford JavaNLP tools. So it will be very low volume (expect 1-3 message a year). Join the list via via this webpage or by emailing [email protected]. (Leave the subject and message body empty.)

java-nlp-support -- This list goes only to the software maintainers. It's a good address for licensing questions, etc. For general use and support questions, please join and use java-nlp-user. You cannot join java-nlp-support, but you can mail questions to [email protected].

More Repositories

1

dspy

DSPy: The framework for programming—not prompting—foundation models
Python
18,220
star
2

CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
Java
9,678
star
3

stanza

Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
Python
7,278
star
4

GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
C
6,867
star
5

cs224n-winter17-notes

Course notes for CS224N Winter17
TeX
1,587
star
6

pyreft

ReFT: Representation Finetuning for Language Models
Python
1,137
star
7

treelstm

Tree-structured Long Short-Term Memory networks (http://arxiv.org/abs/1503.00075)
Lua
875
star
8

pyvene

Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions
Python
625
star
9

string2string

String-to-String Algorithms for Natural Language Processing
Jupyter Notebook
533
star
10

python-stanford-corenlp

Python interface to CoreNLP using a bidirectional server-client interface.
Python
516
star
11

mac-network

Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)
Python
494
star
12

spinn

SPINN (Stack-augmented Parser-Interpreter Neural Network): fast, batchable, context-aware TreeRNNs
Python
205
star
13

coqa-baselines

The baselines used in the CoQA paper
Python
176
star
14

cocoa

Framework for learning dialogue agents in a two-player game setting.
Python
158
star
15

stanza-old

Stanford NLP group's shared Python tools.
Python
138
star
16

chirpycardinal

Stanford's Alexa Prize socialbot
Python
131
star
17

stanfordnlp

[Deprecated] This library has been renamed to "Stanza". Latest development at: https://github.com/stanfordnlp/stanza
Python
114
star
18

wge

Workflow-Guided Exploration: sample-efficient RL agent for web tasks
Python
109
star
19

pdf-struct

Logical structure analysis for visually structured documents
Python
81
star
20

edu-convokit

Edu-ConvoKit: An Open-Source Framework for Education Conversation Data
Jupyter Notebook
75
star
21

cs224n-web

http://cs224n.stanford.edu
HTML
60
star
22

ColBERT-QA

Code for Relevance-guided Supervision for OpenQA with ColBERT (TACL'21)
40
star
23

stanza-train

Model training tutorials for the Stanza Python NLP Library
Python
37
star
24

phrasenode

Mapping natural language commands to web elements
Python
37
star
25

contract-nli-bert

A baseline system for ContractNLI (https://stanfordnlp.github.io/contract-nli/)
Python
29
star
26

color-describer

Code for Learning to Generate Compositional Color Descriptions
OpenEdge ABL
26
star
27

stanza-resources

23
star
28

python-corenlp-protobuf

Python bindings for Stanford CoreNLP's protobufs.
Python
20
star
29

miniwob-plusplus-demos

Demos for the MiniWoB++ benchmark
17
star
30

multi-distribution-retrieval

Code for our paper Resources and Evaluations for Multi-Distribution Dense Information Retrieval
Python
14
star
31

huggingface-models

Scripts for pushing models to huggingface repos
Python
11
star
32

nlp-meetup-demo

Java
8
star
33

sentiment-treebank

Updated version of SST
Python
8
star
34

en-worldwide-newswire

An English NER dataset built from foreign newswire
Python
7
star
35

plot-data

datasets for plotting
Jupyter Notebook
6
star
36

contract-nli

ContractNLI: A Dataset for Document-level Natural Language Inference for Contracts
HTML
4
star
37

plot-interface

Web interface for the plotting project
JavaScript
3
star
38

handparsed-treebank

Extra hand parsed data for training models
Perl
2
star
39

coqa

CoQA -- A Conversational Question Answering Challenge
Shell
2
star
40

pdf-struct-models

A repository for hosting models for https://github.com/stanfordnlp/pdf-struct
HTML
2
star
41

chirpy-parlai-blenderbot-fork

A fork of ParlAI supporting Chirpy Cardinal's custom neural generator
Python
2
star
42

wob-data

Data for QAWoB and FlightWoB web interaction benchmarks from the World of Bits paper (Shi et al., 2017).
Python
2
star
43

pdf-struct-dataset

Dataset for pdf-struct (https://github.com/stanfordnlp/pdf-struct)
HTML
1
star
44

nn-depparser

A re-implementation of nndep using PyTorch.
Python
1
star