ÚFAL (@ufal)

Top repositories

1

whisper_streaming

Whisper realtime streaming for long speech-to-text transcription and translation
Python
1,782
star
2

neuralmonkey

An open-source tool for sequence learning in NLP built on TensorFlow.
Python
410
star
3

udpipe

UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files
C++
344
star
4

acl2019_nested_ner

Source code for paper Neural Architectures for Nested NER through Linearization
Python
91
star
5

unilib

Embeddable C++17 Unicode library offering UTF encodings, general category info, simple and full casing, normalization forms, and combining marks stripping.
C++
73
star
6

morphodita

MorphoDiTa: Morphologic Dictionary and Tagger
C++
65
star
7

public-license-selector

Tool that will help you select the right open license for your data or software
CoffeeScript
52
star
8

perin

PERIN is a Permutation-Invariant Semantic Parser developed for MRP 2020
Python
44
star
9

nametag

NameTag: Named Entity Tagger
C++
38
star
10

mtmonkey

Distributed infrastructure for Machine Translation web services (using Moses, Python, JSON-RPC/web interface)
Python
33
star
11

treex

Treex NLP framework
Perl
33
star
12

npfl114

Materials for the Deep Learning -- ÚFAL course NPFL114
Python
29
star
13

npfl129

NPFL129 repository
Python
29
star
14

lindat-translation

Frontend of LINDAT translation service
Python
25
star
15

factgenie

Lightweight self-hosted span annotation tool
JavaScript
19
star
16

augpt

DSTC9 Submission
Python
18
star
17

korektor

Statistical spell- and (occasional) grammar-checker.
C++
17
star
18

npfl117

Deep Learning Seminar -- ÚFAL course NPFL117
17
star
19

multilexnorm2021

MultiLexNorm 2021 competition system from ÚFAL
Python
15
star
20

parsito

Parsito: Fast non-projective transition-based dependency parser
C++
14
star
21

npfl122

NPFL122 repository
Python
13
star
22

microrestd

MicroRestD is a small C++11 cross-platform REST server built on top of libmicrohttpd (http://www.gnu.org/software/libmicrohttpd/).
C++
13
star
23

low-resource-gec-wnut2019

Source code for paper Grammatical Error Correction in Low-Resource Scenarios (W-NUT 2019)
Python
11
star
24

correctable-lecture-translator

A system for live lecture translation (speech to text) where the audience can easily provide corrections.
Python
9
star
25

olimpic-icdar24

Practical End-to-End Optical Music Recognition for Pianoform Music
Python
9
star
26

pytreex

A minimal Python implementation of the Treex API
Python
8
star
27

linpipe

LinPipe: Multilingual Processing Tool
C
8
star
28

nlgi_eval

NLI evaluation for NLG
Python
8
star
29

chu_liu_edmonds

Chu-Liu-Edmonds maximum spanning tree algorithm from TurboParser for use within Python
C++
7
star
30

marian-tensorboard

A simple tool to parse Marian training logs and display them in TensorBoard
Python
7
star
31

sigmorphon2019

UFAL-Prague entry to the Sigmorphon 2019 Shared Task 2
Python
6
star
32

hamledt

Makefiles, scenarios and support scripts for the development of HamleDT within the Treex infrastructure
Makefile
6
star
33

wnut2021_character_transformations_gec

The code from the paper Character Transformations for Non-Autoregressive GEC Tagging
Python
6
star
34

lindat-repository-obsolete

LINDAT/CLARIN repository for linguistics (http://lindat.cz)
Java
6
star
35

charles-translator-web-frontend

Charles Translator: MT from Charles University
TypeScript
6
star
36

clarin-sp-aaggregator

PHP
5
star
37

mrpipe-conll2019

ÚFAL MRPipe submission to CoNLL 2019 shared task
Python
5
star
38

slimd

SliMD presentation system based on Markdown, HTML5 and JavaScript.
JavaScript
5
star
39

universal-segmentations

Build scripts for the UniSegments collection of morphologically segmented lexicons for many languages
Python
5
star
40

UFAL_poster

LaTeX repository for a poster design
TeX
4
star
41

bert-diacritics-restoration

Repository storing code and data for our paper "Diacritics Restoration using BERT with Analysis on Czech language".
Python
4
star
42

MLASK

EACL 2023 paper "MLASK: Multimodal Summarization of Video-based News Articles"
Python
4
star
43

evalatin2024-latinpipe

LatinPipe – the winning entry to parsing task of EvaLatin 2024
Python
4
star
44

optimal-reference-translations

Python
4
star
45

conll2017

CoNLL 2017 Shared Task Proposal: UD End-to-End parsing
Perl
3
star
46

wiki-error-corpus

Scripts for extracting errors from Wikipedia revisions
Python
3
star
47

weighteddist

A tiny toolkit for weighted word/character edit distance, including cost estimation.
C
3
star
48

rg

ÚFAL Reading Group
3
star
49

thesis_info

ÚFAL Thesis Information Repository
Python
3
star
50

perl-pmltq

Query engine and query language for trees in PML format
Perl
3
star
51

rh_nntagging

Reading Hackathon -- NN Tagging Project
Python
3
star
52

perl-pmltq-server

Refactored and simplified PMLTQ::CGI
Perl
3
star
53

pcedt2.0-coref

Coreference extension to Prague Czech-English Dependency Treebank 2.0
Makefile
3
star
54

kazitext

Python
3
star
55

corefud-scorer

Coreference and anaphora scorer for CorefUD data
Python
3
star
56

quickjudge

A handy tool for quick manual evaluation of line-oriented outputs, e.g. of machine translation.
Perl
3
star
57

teitok-tools

Conversion tools to and from the TEITOK TEI/XML format
Perl
2
star
58

conll2018

CoNLL 2018 UD Shared Task
Perl
2
star
59

charles-translator-android

Android app of LINDAT translation service
Kotlin
2
star
60

crac2023-corpipe

ÚFAL CorPipe: CRAC 2023 Winning System for Multilingual Coreference Resolution
Python
2
star
61

qtleap

QTLeap Pilot MT systems using TectoMT
Perl
2
star
62

PDT-C

Consolidated Czech PDT-style annotated corpus; consists of PDT, Czech part of PCEDT, PDTSC, PDT-Faust
2
star
63

lindat-corpora-conversions

LINDAT Corpora Conversions
Python
2
star
64

lindat-aai-attributes

Parse Shibboleth logs for important information about attributes from IdPs and other
XSLT
2
star
65

ufal-tools

Perl
2
star
66

deltacorpus

Delexicalized tagging and parsing.
Python
2
star
67

js-treex-view

JavaScript library for visualizing Treex files
JavaScript
2
star
68

phd-thesis-template

A template PhD thesis at UFAL
TeX
2
star
69

cpp_builtem

C++ Builtem is a cross-platform Makefile-based build system for C++11
Shell
2
star
70

ambiguity-grammaticality-complexity

Code for the paper Sentence Ambiguity, Grammaticality and Complexity Probes
Python
2
star
71

lindat-common

Common files and branding for Lindat projects
JavaScript
2
star
72

crac2022-corpipe

ÚFAL CorPipe: CRAC 2022 Winning System for Multilingual Coreference Resolution
Python
2
star
73

lindat_piwik_reports

Caching important counts from PIWIK periodically and creating customized reports for LINDAT/CLARIN
JavaScript
2
star
74

eyetracked-multi-modal-translation

EMMT (Eyetracked Multi-Modal Translation), a simultaneous eye-tracking, 4-electrode EEG and audio corpus for multi-modal reading and translation scenarios
2
star
75

uk-cs-data-scripts

Scripts for processing data for Czech-Ukrainian MT
Python
2
star
76

errant_czech

Python
2
star
77

UFAL_MT_service

Python
1
star
78

nametag3

NameTag3: Named Entity Tagger
Python
1
star
79

mrptask

Perl
1
star
80

lindat-aai-discovery

HTML
1
star
81

pyclarindspace

Python package using clarin-dspace API
Jupyter Notebook
1
star
82

ParCzech

ParCzech is a project on compiling Czech parliamentary data into annotated corpora.
GLSL
1
star
83

theaitrobot

THEaiTRE bot
Python
1
star
84

auto-hume

Semantic MT metric trained on HUME annotations
Python
1
star
85

npfl101

Repository of the seminar NPFL101 Competing in Machine Translation.
Shell
1
star
86

bilingual-abstracts-corpus

Bilingual corpus of scientific abstracts from ÚFAL Charles University publications.
Python
1
star
87

continuous-rating

PHP
1
star
88

tamiltb

Makefile
1
star
89

nmt-pe-effects-2021

Experiment relating NMT quality and post-editing efforts
Jupyter Notebook
1
star
90

MTEQA

Python
1
star
91

cpp_utils

UFAL C++ Utils
C++
1
star
92

europarlmin

Corpus of European Parliament debates organized as a corpus for meeting summarization, i.e. matching full transcripts and minutes from the sessions. Used in the shared task of AutoMin 2023.
1
star
93

pmltq-cgi

PMLTQ::CGI has been removed from PMLTQ module in order to decrease number of dependencies. It should be installed separately.
Perl
1
star
94

qsubmit

A wrapper over various grid submission scripts
Python
1
star
95

SynSemClassSearch

JavaScript
1
star
96

ker

Simple Czech and English keyword extractor
Python
1
star
97

npfl087

NPFL087 Statistical Machine Translation
Shell
1
star
98

diaser

Python
1
star
99

treex-web

Online interface for Treex
JavaScript
1
star
100

wembedding_service

TF2 service for word embeddings computation
Python
1
star