cort
cort is a coreference resolution toolkit. It consists of two parts: the coreference resolution component implements a framework for coreference resolution based on latent variables, which allows you to rapidly devise approaches to coreference resolution, while the error analysis component provides extensive functionality for analyzing and visualizing errors made by coreference resolution systems.
If you have any questions or comments, drop me an e-mail at [email protected].
Branches/Forks
- The kbest branch contains code for k-best extraction of coreference information, as described in Ji et al. (2017).
- The v03 branch contains a version of cort with more models and a better train/dev/test workflow. For more details on the models, see Martschat (2017).
- Nafise Moosavi's fork of cort implements search space pruning on top of cort, as described in Moosavi and Strube (2016).
Documentation
Installation
cort is available on PyPI. You can install it via
pip install cort
Dependencies (automatically installed by pip) are nltk, numpy, matplotlib, mmh3, PyStanfordDependencies, cython, future, jpype and beautifulsoup. It ships with stanford_corenlp_pywrapper and the reference implementation of the CoNLL scorer.
cort is written for use on Linux with Python 3.3+. While cort also runs under Python 2.7, I strongly recommend running cort with Python 3, since the Python 3 version is much more efficient.
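After installing, it can be useful to confirm that the dependencies listed above are actually importable in the current environment. The following is a minimal sketch, not part of cort. The mapping from package names to import names is an assumption (for example, that beautifulsoup is imported as `bs4` and PyStanfordDependencies as `StanfordDependencies`), and the helper `find_missing` is hypothetical. The sketch itself needs Python 3.4 or newer because it relies on `importlib.util.find_spec`.

```python
import importlib.util
import sys

# Assumed import names: the README lists package names only, and some
# packages are imported under a different module name. Adjust as needed.
ASSUMED_IMPORT_NAMES = {
    "nltk": "nltk",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "mmh3": "mmh3",
    "PyStanfordDependencies": "StanfordDependencies",
    "cython": "Cython",
    "future": "future",
    "jpype": "jpype",
    "beautifulsoup": "bs4",
    "cort": "cort",
}


def find_missing(import_names):
    """Return the sorted package names whose module cannot be located."""
    missing = []
    for package, module in import_names.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None:
            missing.append(package)
    return sorted(missing)


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    missing = find_missing(ASSUMED_IMPORT_NAMES)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

A package reported as not importable may simply use a different import name than the one assumed here, so treat the output as a hint rather than a verdict.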
References
Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi and Noah A. Smith (2017). Dynamic Entity Representations in Neural Language Models. To appear in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), Copenhagen, Denmark, 7-11 September 2017.
Sebastian Martschat (2017). Structured Representations for Coreference Resolution. PhD thesis, Heidelberg University.
Nafise Sadat Moosavi and Michael Strube (2016). Search space pruning: A simple solution for better coreference resolvers. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, Cal., 12-17 June 2016, pages 1005-1011.
Sebastian Martschat and Michael Strube (2015). Latent Structures for Coreference Resolution. Transactions of the Association for Computational Linguistics, 3, pages 405-418.
Sebastian Martschat, Patrick Claus and Michael Strube (2015). Plug Latent Structures and Play Coreference Resolution. In Proceedings of ACL-IJCNLP 2015 System Demonstrations, Beijing, China, 26-31 July 2015, pages 61-66.
Sebastian Martschat, Thierry Göckel and Michael Strube (2015). Analyzing and Visualizing Coreference Resolution Errors. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, Denver, Colorado, USA, 31 May-5 June 2015, pages 6-10.
Sebastian Martschat and Michael Strube (2014). Recall Error Analysis for Coreference Resolution. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25-29 October 2014, pages 2070-2081.
Sebastian Martschat (2013). Multigraph Clustering for Unsupervised Coreference Resolution. In Proceedings of the Student Research Workshop at the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, 5-7 August 2013, pages 81-88.
If you use the error analysis component in your research, please cite the EMNLP'14 paper. If you use the coreference component in your research, please cite the TACL paper. If you use the multigraph system, please cite the ACL'13-SRW paper.
Changelog
Wednesday, 4 November 2015
Added support for numeric features. Because the feature representation changed, the models changed as well, so I have updated the downloadable models.
Friday, 9 October 2015
Now supports label-dependent cost functions.
Tuesday, 15 September 2015
Minor bugfixes.
Monday, 27 July 2015
Can now perform coreference resolution on raw text.
Tuesday, 21 July 2015
Updated to status of TACL paper.
Wednesday, 3 June 2015
Improvements to visualization (mention highlighting and scrolling).
Monday, 1 June 2015
Fixed a bug in mention highlighting for visualization.
Sunday, 31 May 2015
Updated to status of NAACL'15 demo paper.
Wednesday, 13 May 2015
Fixed another bug in the documentation regarding format of antecedent data.
Tuesday, 3 February 2015
Fixed a bug in the documentation: the part number in the antecedent file must be written with trailing 0s.
Thursday, 30 October 2014
Fixed data structure bug in documents.py. The results from the paper are not affected by this bug.
Wednesday, 22 October 2014
Initial release.