• Stars: 129
• Rank: 277,641 (Top 6 %)
• Language: Python
• License: GNU General Publi...
• Created about 14 years ago
• Updated 6 months ago

Repository Details

Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your command line application, its input, output and parameters, and CLAM wraps around your application to form a fully fledged RESTful webservice.

CLAM: Computational Linguistics Application Mediator

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

by Maarten van Gompel
Centre for Language and Speech Technology, Radboud University Nijmegen
& KNAW Humanities Cluster

Licensed under GPLv3

Website: https://proycon.github.io/clam
Source repository: https://github.com/proycon/clam/
Documentation: https://clam.readthedocs.io
Installation: pip install clam

CLAM allows you to quickly and transparently transform your Natural Language Processing application into a RESTful webservice with which both human end users and automated clients can interact. CLAM takes a description of your system and wraps itself around it, allowing end users or automated clients to upload input files to your application, start your application with parameters of their choice, and download and view the output once the application has completed.

CLAM is set up in a universal fashion, requiring minimal effort on the part of the service developer. Your actual NLP application is treated as a black box, of which only the parameters, input formats and output formats need to be described. Your application itself need not be network-aware in any way, nor aware of CLAM, and the handling and validation of input can be taken care of by CLAM.
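The black-box idea can be illustrated with a minimal sketch that uses only the Python standard library. It is not CLAM's configuration format or API: the `SPEC` dictionary, its keys and the `run_wrapped` function are hypothetical names invented for this illustration, and the wrapped "application" is a tiny word counter started as a separate process.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical, simplified description of a command-line tool.
# This is NOT CLAM's configuration format; it only illustrates describing
# a black-box program by its command and its input format.
SPEC = {
    # The "application": a separate process that counts words in a file.
    "command": [
        sys.executable,
        "-c",
        "import sys; print(len(open(sys.argv[1], encoding='utf-8').read().split()))",
    ],
    "input_extension": ".txt",
}


def run_wrapped(spec, input_text):
    """Store the uploaded input, run the black-box tool on it, return its output."""
    with tempfile.TemporaryDirectory() as workdir:
        input_path = Path(workdir) / ("input" + spec["input_extension"])
        input_path.write_text(input_text, encoding="utf-8")
        # The tool knows nothing about the wrapper; it only receives a file path.
        result = subprocess.run(
            spec["command"] + [str(input_path)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()
```

Calling `run_wrapped(SPEC, "one two three")` runs the word counter on a temporary file and returns what it printed. A real CLAM service adds what this sketch leaves out, such as parameter handling, input validation and the RESTful interface.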

CLAM is written entirely in Python, runs on UNIX-derived systems, and is available as open source under the GNU General Public License (v3). It is set up in a modular fashion and offers an API, which makes it easy to extend. CLAM communicates in a transparent XML format and uses XSL transformation to offer a modern, client-side generated web interface for human end users.

Documentation

Documentation is available at https://clam.readthedocs.io

Some screenshots of the web user interface can be found below:

[Screenshot: the CLAM project list]

[Screenshot: the CLAM project page during staging]

[Screenshot: the CLAM project page when done]

Installation

Installing from the Python Package Index with the package manager pip is the recommended way to install CLAM. It is also the easiest method, as pip automatically fetches and installs CLAM and all of its dependencies. We recommend using a virtual environment (virtualenv) if you want to install CLAM locally as a user; if you want to install it globally, prepend the following commands with sudo.

Pip is usually part of the python3-pip package (Debian/Ubuntu) or similar. Note that Python 2.7 is no longer supported; on older systems you might need to call pip3 instead of pip. To install CLAM:

$ pip install clam

If you have already downloaded CLAM manually (from GitHub), you can run the following from the source directory:

$ pip install .

If pip is not yet installed on your system, install it as follows.

On Debian-based Linux systems (including Ubuntu):

$ apt-get install python3-pip

On RPM-based Linux systems:

$ yum install python3-pip

Note that sudo/root access is needed to install globally. Ask your system administrator to install it if you do not own the system. Alternatively, you can install it locally in a Python virtual environment:

$ virtualenv --python=python3 env

Or:

$ python3 -m venv env

Then activate it as follows:

$ . env/bin/activate

(env)$ pip install clam
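To confirm that the virtual environment is actually active before installing, you can ask the Python interpreter itself. The following is a small sketch using only the standard library; the function name `in_virtualenv` is chosen for illustration.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv or virtualenv."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)
```

After activating `env`, `in_virtualenv()` should return True, and `sys.prefix` should point into the `env` directory.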

CLAM also has some optional dependencies. For MySQL support, install mysqlclient using pip. For FoLiA support, install FoLiA-Tools using pip.

Note: CLAM is designed for Linux-like systems. The client and data library work everywhere, but hosting webservices via clamservice may not work on Windows.

Running a test webservice

If you installed CLAM using the above method, you can launch a CLAM test webservice using the development server as follows:

$ clamservice -H localhost -p 8080 clam.config.textstats

Navigate your browser to http://localhost:8080 and verify that everything works.
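The same check can be done from a script instead of a browser. The sketch below uses only the standard library and merely tests whether something answers over HTTP at the given address; it does not interpret CLAM's response. The function name `probe` is an assumption made for this example.

```python
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Return the HTTP status code served at url, or None if nothing answers."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status (for example 401 or 404).
        return error.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, name resolution failure, and similar.
        return None
```

With the test webservice running as shown above, `probe("http://localhost:8080")` should return a status code; `None` means that nothing is listening on that address.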

Note: It is important to keep CLAM up to date, as fixes and improvements are released regularly. Update CLAM using:

$ pip install -U clam
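To see which version is installed before or after updating, you can query the package metadata from Python. This sketch relies on the standard `importlib.metadata` module (Python 3.8 or later) and assumes the distribution is registered under the name `clam`, as in the pip commands above; `installed_version` is a name chosen for illustration.

```python
from importlib import metadata


def installed_version(distribution="clam"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None
```

`installed_version()` returns a version string when CLAM is installed in the current environment, and `None` otherwise.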

More Repositories

1

pynlpl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Python
478
star
2

vocage

A minimalistic spaced-repetition vocabulary trainer (flashcards) for the terminal
Rust
142
star
3

colibri-core

Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
C++
123
star
4

flat

FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.github.io/folia), a rich XML-based format for linguistic annotation. Flat allows users to view annotated FoLiA documents and enrich these documents with new annotations; a wide variety of linguistic annotation types is supported through the FoLiA paradigm.
JavaScript
103
star
5

LaMachine

LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilation/installation script
Shell
68
star
6

folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for processing FoLiA is implemented as part of PyNLPl; this repository contains higher-level tools that use the library, as well as the full documentation, validation schemas, and set definitions.
Python
60
star
7

python-frog

Python bindings to the Dutch NLP tool Frog (PoS tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser)
Cython
47
star
8

analiticcl

an approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction
Rust
30
star
9

python-ucto

This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the Ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).
Cython
29
star
10

codemetapy

A Python package for generating and working with codemeta
Python
24
star
11

gecco

Generic Environment for Context-Aware Correction of Orthography
Python
21
star
12

homeassistant-config

My elaborate home automation configuration + scripts
Python
21
star
13

dotfiles

My dotfiles
Shell
20
star
14

deepfrog

An NLP-suite powered by deep learning
Rust
19
star
15

hanzigrid

Hanzi grids for studying Mandarin Chinese (tool & output data)
HTML
18
star
16

foliapy

An extensive Python library for dealing with FoLiA (Format for Linguistic Annotation) documents, a rich XML-based format for linguistic annotation finding application in Natural Language Processing (NLP). This library was formerly part of PyNLPl.
Python
18
star
17

procmapgen

A small toy project written in Rust: procedural generation of various kinds of grid-based maps.
Rust
16
star
18

python-timbl

python-timbl, originally developed by Sander Canisius, is a Python extension module wrapping the full TiMBL C++ programming interface. With this module, all functionality exposed through the C++ interface is also available to Python scripts. Being able to access the API from Python greatly facilitates prototyping TiMBL-based applications.
Python
16
star
19

spacy2folia

Use spaCy for NLP and output to the FoLiA XML format.
Python
12
star
20

foliatools

A number of command-line tools for working with FoLiA (Format for Linguistic Annotation). Includes validators, converters, visualisers, and more.
Python
10
star
21

pbmbmt

Phrase-based Memory-based Machine Translation
Python
10
star
22

unilangforum

UniLang Language Community - Forum
PHP
8
star
23

colibri

THIS PROJECT IS BEING RENDERED OBSOLETE BY NEWER VERSIONS colibri-core and colibri-mt !!
C++
7
star
24

valkuil-gecco

Nederlandse Spellingscontrole / Dutch spelling correction system - powered by Gecco
Python
7
star
25

nederlab-pipeline

Linguistic enrichment pipeline for historical Dutch, as used in the Nederlab project
Groovy
7
star
26

anavec

Proof-of-concept spelling correction/normalisation system based on anagram vectors
Python
6
star
27

codemeta-harvester

Harvest and aggregate codemeta/schema.org software metadata from source repositories and service endpoints, automatically converting from known metadata schemes in the process
Shell
6
star
28

semeval2014task5

This is the official repository for SemEval 2014 Task 5: L2 Translation Assistant. It contains the gold standard learner corpus, evaluation results and the Python program library needed for the task. It does not contain a full translation assistance system.
HTML
5
star
29

foliadocserve

FoLiA Document Server - HTTP webservice backend for serving and annotating FoLiA documents using the FoLiA Query Language (FQL). Used by FLAT.
Python
5
star
30

piereling

Piereling is a webservice and web-application to convert between a variety of document formats, mostly from and to FoLiA XML. It is intended for NLP pipelines.
Python
5
star
31

lingua-cli

A very small, simple command-line interface for language detection using lingua-rs
Rust
5
star
32

colibri-mt

A Machine Translation framework that wraps around the Moses Decoder and enables k-NN classifier techniques to be used for modelling source-side-context
C++
5
star
33

babelente

BabelEnte: Entity Extractor and Translator using BabelFy and Babelnet.org
Python
4
star
34

labirinto

A web front-end portal for a virtual laboratory of NLP tools
Vue
4
star
35

clamservices

A collection of CLAM webservices for several of our Natural Language Processing tools
Python
4
star
36

folia-rust

FoLiA library for Rust (alpha)
Rust
4
star
37

codemeta-server

Server for codemeta: an in-memory triple store, SPARQL endpoint and simple web-based visualisation for end users
Python
4
star
38

sesdiff

Generates a shortest edit script (Myers' diff algorithm) to indicate how to get from the strings in column A to the strings in column B. Also provides the edit distance (Levenshtein).
Rust
4
star
39

alpino_clam_webservice

A CLAM-powered webservice for Alpino, a dependency parser for Dutch
Python
3
star
40

vocadata

Data for vocabulary learning
3
star
41

parseme-support

FoLiA & FLAT support for PARSEME
Python
3
star
42

spreek2schrijf

Scripts for Spreek2Schrijf, a project with the Tweede Kamer (the Dutch House of Representatives)
Python
3
star
43

svkbd

my fork of suckless' simple virtual keyboard: https://tools.suckless.org/x/svkbd/
C
3
star
44

sxmo-docs

my fork of https://git.sr.ht/~mil/sxmo-docs
Shell
2
star
45

aNtiLoPe

A collection of NLP pipelines powered by Nextflow
Groovy
2
star
46

sxmo-utils

my fork of https://git.sr.ht/~mil/sxmo-utils/
Shell
2
star
47

wrexp

Experiment Wrapper - A framework for launching and keeping track of experiments. Wrexp takes care of storing all stdout/stderr logs and mails you when experiments are completed.
JavaScript
2
star
48

wikiente

A named entity recogniser and linker based on DBPedia Spotlight, with support for the FoLiA format
Python
2
star
49

colibri-apps

Contains NLP applications using Colibri Core, suited for end-users. The applications are generally web-based.
OpenEdge ABL
2
star
50

wsd2

Python
2
star
51

colloquery

Web application for searching for phrases/collocations/synonyms in phrase translation tables
Python
2
star
52

lexmatch

Simple lexicon matcher against a text
Rust
2
star
53

colibri-utils

NLP utilities that rely on Colibri Core: currently only language identification
TeX
2
star
54

nlpsandbox

Natural Language Processing Sandbox - An experimental playground for all kinds of NLP tasks
Python
2
star
55

ssam

split sampler: split your data into multiple sets (e.g. train/test/development)
Rust
2
star
56

LaMachine-docker-test

Meta repository for docker testing of LaMachine on Travis-CI
1
star
57

dwm

my patched fork of dwm
C
1
star
58

unilang_ulr

Collection of open language resources from UniLang; containing mostly phrasebooks and stories
1
star
59

oersetter-models

Models for Oersetter, a Frisian<->Dutch Machine Translation system
1
star
60

chira

Chinese Reading Assistant, pop-up translations for Linux
Python
1
star
61

valkuil

Valkuil.net is an automatic spelling corrector for Dutch that detects ordinary typos as well as grammatical errors and confusions between existing words.
Lex
1
star
62

sxmo-svkbd

My fork of https://git.sr.ht/~mil/sxmo-svkbd
C
1
star
63

aur-packages

Arch User Repository packages I maintain
Shell
1
star
64

cwrap

Small C wrapper to turn a C function into a very simple webservice
C
1
star
65

campyon

Campyon is both a command-line tool and a Python library for viewing and manipulating columned data files. It supports various filters, statistics, visualisations, and plotting.
Python
1
star
66

vocavue

A vocabulary trainer with a view
JavaScript
1
star
67

lst-chat

JavaScript
1
star
68

homepage

My website
TeX
1
star
69

hyphertool

Command-line tool for syllabification and hyphenisation for multiple languages
Rust
1
star
70

lamastats

Generates statistical reports on the usage of our software and webservices
Python
1
star
71

charfreq

Very simple command-line tool that counts (Unicode) character frequency from standard input
Rust
1
star
72

colibrita

Colibrita is a proof-of-concept translation assistance system, translating L1 fragments in an L2 context, using machine learning and statistical machine translation techniques
Python
1
star