• Stars
    star
    103
  • Rank 333,046 (Top 7 %)
  • Language
    JavaScript
  • License
    GNU General Publi...
  • Created almost 11 years ago
  • Updated over 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich these documents with new annotations, a wide variety of linguistic annotation types is supported through the FoLiA paradigm.
Documentation Status http://applejack.science.ru.nl/lamabadge.php/flat Project Status: Active – The project has reached a stable, usable state and is being actively developed. Latest release in the Python Package Index

FLAT - FoLiA Linguistic Annotation Tool

FLAT is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. FLAT allows users to view annotated FoLiA documents and enrich these documents with new annotations, a wide variety of linguistic annotation types is supported through the FoLiA paradigm. It is a document-centric tool that fully preserves and visualises document structure.

FLAT is written in javascript with jquery on a backend of Python using the Django framework. The FoLiA Document Server (https://github.com/proycon/foliadocserve) , the back-end of the system, is written in Python with CherryPy and is used as a RESTful webservice.

FLAT is open source software developed at the Centre of Language and Speech Technology, Radboud University Nijmegen. It is licensed under the GNU Public License v3. The project is funded in the scope of the larger CLARIAH project.

Features

  • Web-based, multi-user environment
  • Server-side document storage, divided into 'namespaces', by default each user has his own namespace. Active documents are held in memory server-side. Read and write permissions to access namespaces are fully configurable.
  • Concurrency (multiple users may edit the same document similtaneously)
  • Full versioning control for documents (using git), allowing limitless undo operations. (in foliadocserve)
  • Full provenance information, including annotator names and timestamps, is stored in the FoLiA XML and can be displayed by the interface. The git log also contains verbose information on annotations.
  • Annotators can indicate their confidence for each annotation
  • Highly configurable interface; interface options may be disabled on a per-configuration basis. Multiple configurations can be deployed on a single installation
  • Displays and retains document structure (divisions, paragraphs, sentence, lists, etc)
  • Support for corrections, of text or linguistic annotations, and alternative annotations. Corrections are never mere substitutes, originals are always preserved!
  • Spelling corrections for runons, splits, insertions and deletions are supported.
  • Supports FoLiA Set Definitions to display label sets. Sets are not predefined in FoLiA and anybody can create their own.
  • Supports Token Annotation and Span Annotation
  • Supports complex span annotations such as dependency relations, syntactic units (constituents), predicates and semantic roles, sentiments, stratements/attribution, observations
  • Simple metadata editor for editing/adding arbitrary metadata to a document. Selected metadata fields can be shown in the document index as well.
  • User permission model featuring groups, group namespaces, and assignable permissions
  • File/document management functions (copying, moving, deleting)
  • Allows converter extensions to convert from other formats to FoLiA on upload
  • In-document search (CQL or FQL), advanced searches can be predefined by administrators
  • Morphosyntactic tree visualisation (constituency parses and morphemes)
  • Higher-order annotation: associate features, comments, descriptions with any linguistic annotation, also allow for arbritary relations between any type of annotation or structure

Architecture

The FLAT architecture consists of three layers:

At the far back-end is the FoLiA Document Server, which loads the requested FoLiA documents, and passes these as JSON and HTML to the middle layer, the FLAT server, this layer in turn passes edits formulated in FoLiA Query Language (FQL) to the Document Server. The user-interface is a modern web-application interacting with the FLAT server and translates interface actions to FQL.

FLAT distinguishes several modes, between which the user can switch to get another take of a document with different editing abilities. There are currently three modes available, and a fourth one under construction:

  • Viewer - Views a document and its annotations, no editing abilities
  • Editor - Main editing environment for linguistic annotation
  • Metadata editor - A simple editor for document-wide metadata
  • Structure Editor - Editing environment for document structure (rudimentary version, not done)

In the viewer and editor mode, users may set an annotation focus, i.e. an annotation type that they want to visualise or edit. All instances of that annotation type will be coloured in the interface and offer a global overview. Moreover, the editor dialog will automatically present fields for editing that type whenever a structure element (usually a word/token) is clicked.

Whereas the annotation focus is primary, users may select what other annotation types they want to view locally, i.e. in the annotation viewer that pops up whenever the user hovers over a structural element. Similarly, users may select what annotation types they want to see in editor, allowing the editing of multiple annotation types at once rather than just the annotation focus.

FoLiA distinguishes itself from some other annotation formats by explicitly providing support for corrections (on any annotation type, including text) and alternative annotations. In FLAT these are categorized as edit forms, as they are different forms of editing; the user can select which form he wants to use when performing an annotation.

The site administrator can configure in great detail what options the user has available and what defaults are selected (for annotation focus for instance), as a full unconstrained environment (which is the default) may quickly be daunting and confusing to the end-user. Different tasks ask for different configurations, so multiple configurations per site are possible; the user chooses the desired configuration, often tied to a particular annotation project, upon login.

Documentation

FLAT Documentation can be found in the form of the following three guides:

Moreover, a screencast video has been created to show FLAT in action and offer an impression of its features:

Screenshots

For some impressions of FLAT, take a look at the following screenshots:

The login screen:

FLAT screenshot

Document index, showing namespaces accessible to the user and the documents within.

FLAT screenshot

Hovering over words reveals annotations:

FLAT screenshot

A particular annotation focus can be set to highlight the most frequent classes in that set:

FLAT screenshot

FLAT screenshot

Editing a named entity in a set for which a set definition is available:

FLAT screenshot

Correcting a word in a spelling-annotation project:

FLAT screenshot

Proper right-to-left support for languages such as Arabic, Farsi and Hebrew.

FLAT screenshot (right to left)

Extensive history with limitless undo ability, git-based:

FLAT screenshot

Advanced search queries in CQL (Corpus Query Language) or FQL (FoLiA Query Language):

FLAT screenshot

FLAT screenshot

Tree visualisation of syntax and morphology:

FLAT screenshot

FLAT screenshot

More Repositories

1

pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language model. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Python
479
star
2

vocage

A minimalistic spaced-repetion vocabulary trainer (flashcards) for the terminal
Rust
142
star
3

clam

Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your command line application, its input, output and parameters, and CLAM wraps around your application to form a fully fledged RESTful webservice.
Python
129
star
4

colibri-core

Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller`` whi ch allows you to build, view, manipulate and query pattern models.
C++
123
star
5

LaMachine

LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilation/installation script
Shell
68
star
6

folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl, this contains higher-level tools that use the library as well as the full documentation, validation schemas, and set definitions
Python
60
star
7

python-frog

Python bindings to the dutch NLP tool Frog (pos tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser)
Cython
47
star
8

analiticcl

an approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction
Rust
30
star
9

python-ucto

This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first step in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is regular-expression based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).
Cython
29
star
10

codemetapy

A Python package for generating and working with codemeta
Python
24
star
11

gecco

Generic Environment for Context-Aware Correction of Orthography
Python
21
star
12

homeassistant-config

My elaborate home automation configuration + scripts
Python
21
star
13

dotfiles

My dotfiles
Shell
20
star
14

deepfrog

An NLP-suite powered by deep learning
Rust
19
star
15

hanzigrid

Hanzi grids for studying mandarin chinese (tool & output data)
HTML
18
star
16

foliapy

An extensive Python library for dealing with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic annotation finding application in Natural Language Processing (NLP). This library was formerly part of PyNLPl.
Python
18
star
17

procmapgen

A small toy project written in Rust: procedural generation of various kinds of grid-based maps.
Rust
16
star
18

python-timbl

python-timbl, originally developed by Sander Canisius, is a Python extension module wrapping the full TiMBL C++ programming interface. With this module, all functionality exposed through the C++ interface is also available to Python scripts. Being able to access the API from Python greatly facilitates prototyping TiMBL-based applications.
Python
16
star
19

spacy2folia

Use spaCy for NLP and output to the FoLiA XML format.
Python
12
star
20

foliatools

A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.
Python
10
star
21

pbmbmt

Phrase-based Memory-based Machine Translation
Python
10
star
22

unilangforum

UniLang Language Community - Forum
PHP
8
star
23

colibri

THIS PROJECT IS BEING RENDERED OBSOLETE BY NEWER VERSIONS colibri-core and colibri-mt !!
C++
7
star
24

valkuil-gecco

Nederlandse Spellingscontrole / Dutch spelling correction system - powered by Gecco
Python
7
star
25

nederlab-pipeline

Linguistic enrichment pipeline for historical dutch, as used in the Nederlab project
Groovy
7
star
26

anavec

Proof-of-concept spelling correction/normalisation system based on anagram vectors
Python
6
star
27

codemeta-harvester

Harvest and aggregate codemeta/schema.org software metadata from source repositories and service endpoints, automatically converting from known metadata schemes in the process
Shell
6
star
28

foliadocserve

FoLiA Document Server - HTTP webservice backend for serving and annotating FoLiA documents using the FoLiA Query Language (FQL). Used by FLAT.
Python
5
star
29

piereling

Piereling is a webservice and web-application to convert between a variety of document formats, mostly from and to FoLiA XML. It is intended for NLP pipelines.
Python
5
star
30

lingua-cli

Very small simple command-line interface for language detection using lingua-rs
Rust
5
star
31

colibri-mt

A Machine Translation framework that wraps around the Moses Decoder and enables k-NN classifier techniques to be used for modelling source-side-context
C++
5
star
32

semeval2014task5

This is the official repository for SemEval 2014 Task 5: L2 Translation Assistant. It contains the gold standard learner corpus, evaluation results and the Python program library needed for the task. It does not contain a full translation assistance system.
HTML
5
star
33

babelente

BabelEnte: Entity Extractor and Translator using BabelFy and Babelnet.org
Python
4
star
34

labirinto

A web front-end portal for a virtual laboratory of NLP tools
Vue
4
star
35

clamservices

A collection of CLAM webservices for various of our Natural Language Processing tools
Python
4
star
36

folia-rust

FoLiA library for rust (alpha)
Rust
4
star
37

codemeta-server

Server for codemeta, in memory triple store, SPARQL endpoint and simple web-based visualisation for end-user
Python
4
star
38

sesdiff

Generates a shortest edit script (Myers' diff algorithm) to indicate how to get from the strings in column A to the strings in column B. Also provides the edit distance (levenshtein).
Rust
4
star
39

alpino_clam_webservice

A CLAM-powered webservice for Alpino, a dependency parser for Dutch
Python
3
star
40

vocadata

Data for vocabulary learning
3
star
41

parseme-support

FoLiA & FLAT support for PARSEME
Python
3
star
42

spreek2schrijf

Scripts voor Spreek2Schrijf, een project met de Tweede Kamer
Python
3
star
43

svkbd

my fork of suckless' simple virtual keyboard: https://tools.suckless.org/x/svkbd/
C
3
star
44

sxmo-docs

my fork of https://git.sr.ht/~mil/sxmo-docs
Shell
2
star
45

aNtiLoPe

A collection of NLP pipelines powered by Nextflow
Groovy
2
star
46

sxmo-utils

my fork of https://git.sr.ht/~mil/sxmo-utils/
Shell
2
star
47

wrexp

Experiment Wrapper - A framework for launching and keeping track of experiments. Wrexp takes care of storing all stdout/stderr logs and mails you when experiments are completed.
JavaScript
2
star
48

wikiente

A named entity recogniser and linker based on DBPedia Spotlight, with support for the FoLiA format
Python
2
star
49

colibri-apps

Contains NLP applications using Colibri Core, suited for end-users. The applications are generally web-based.
OpenEdge ABL
2
star
50

wsd2

Python
2
star
51

colloquery

Web application for searching for phrases/collocations/synonyms in phrase translation tables
Python
2
star
52

lexmatch

Simple lexicon matcher against a text
Rust
2
star
53

colibri-utils

NLP utilities that rely on Colibri Core: currently only language identification
TeX
2
star
54

nlpsandbox

Natural Language Processing Sandbox - An experimental playground for all kinds of NLP tasks
Python
2
star
55

ssam

split sampler: split your data into multiple sets (e.g. train/test/development)
Rust
2
star
56

LaMachine-docker-test

Meta repository for docker testing of LaMachine on Travis-CI
1
star
57

dwm

my patched fork of dwm
C
1
star
58

unilang_ulr

Collection of open language resources from UniLang; containing mostly phrasebooks and stories
1
star
59

oersetter-models

Models for Oersetter, a Frisian<->Dutch Machine Translation system
1
star
60

chira

Chinese Reading Assistant, pop-up translations for Linux
Python
1
star
61

sxmo-svkbd

My fork of https://git.sr.ht/~mil/sxmo-svkbd
C
1
star
62

valkuil

Valkuil.net is een automatische spellingcorrector voor het Nederlands die zowel gewone typefouten als grammaticale fouten en verwarringen tussen bestaande woorden opspoort.
Lex
1
star
63

aur-packages

Arch User Repository packages I maintain
Shell
1
star
64

cwrap

Small C wrapper to turn a C function into a very simple webservice
C
1
star
65

campyon

Campyon is both a command-line tool as well as Python library for viewing and manipulating columned data files. It supports various filters, statistics, visualisations, and plotting.
Python
1
star
66

vocavue

A vocabulary trainer with a view
JavaScript
1
star
67

lst-chat

JavaScript
1
star
68

homepage

My website
TeX
1
star
69

hyphertool

Command-line tool for syllabification and hyphenisation for multiple languages
Rust
1
star
70

lamastats

Generates statistical reports on the usage of our software and webservices
Python
1
star
71

charfreq

Very simply command-line tool that counts (unicode) character frequency from standard input
Rust
1
star
72

colibrita

Colibrita is a proof-of-concept translation assistance system, translating L1 fragments in an L2 context, using machine learning and statistical machine translation techniques
Python
1
star