Text Mining Unit at BSC (@TeMU-BSC)

Top repositories

1

codiesp-evaluation-script

Evaluation library for CodiEsp Task
Python
7
star
2

PharmaCoNER-Tagger

PharmaCoNER Tagger is a Neural Named Entity Recognition program targeting domain adaptation, particularly in the case of Spanish medical texts. It is based on NeuroNER.
Python
6
star
3

TEMUNormalizer

Baseline term normalizer to find Snomed and CIE-10 codes in a list of terms
Python
5
star
4

AnonymizationPipeline

Anonymization pipeline for ingesting data from outside BSC that contains GDPR-protected data.
Python
4
star
5

corpus-cleaner-acl

Python
3
star
6

web-scrapping

Repository containing all web scraping scripts written by the text mining group
Python
3
star
7

iaa-computation

Compute inter-annotator agreement from BRAT files
Python
2
star
8

catalan_CC0_sentences

Collected CC0 sentences written in Catalan
2
star
9

iberifier

Jupyter Notebook
2
star
10

embeddings_v1.0

2
star
11

spanish-person-names-generator

Generator of Spanish names based on the lists of INE
Python
2
star
12

temu-webpage

Landing page of the Text Mining Unit at Barcelona Supercomputing Center.
TypeScript
2
star
13

demos

Web demos for some text mining projects
JavaScript
2
star
14

ASIT

Advanced Semantic Indexing Tool
TypeScript
2
star
15

TemuSTS

Program for analyzing similar sentences in two corpora.
Python
2
star
16

distemist_evaluation_library

Python
2
star
17

Biomedical_NER_models

1
star
18

mesinesp-workflow

Personalized Mailing and Evaluation engines for MESINESP task.
HTML
1
star
19

wmt2021-indoeuropean

Python
1
star
20

socialdisner_evaluation_script

Python
1
star
21

clinical-nested-ner

Python
1
star
22

indexer

DeCS Indexer frontend and backend for MESINESP task.
Python
1
star
23

seq-to-seq-catalan

Sequence-to-sequence language resources for Catalan, covering two tasks: summarization and machine translation.
Jupyter Notebook
1
star
24

cantemist-evaluation-library

Compute evaluation metrics for Cantemist submissions
Python
1
star
25

language-model-prepro

Preprocessing scripts for language models
Python
1
star
26

spactes

Apache SpaCTeS-cTAKES
Java
1
star
27

meddoprof-evaluation-library

Python
1
star
28

brat-merger

Python
1
star
29

compare-annotations

Qualitative analysis and quantification of variability in manual annotations
Python
1
star
30

detect-annotations

Detect missed annotations in BRAT files based on previous annotations (from other files).
Python
1
star
31

meddoprof-baseline

Baseline for the MEDDOPROF Shared Task on occupation detection and normalization.
Python
1
star
32

BioTextMiner

BioTextMiner is a web application developed by NLP4BIA that provides a user-friendly interface for managing biomedical corpora. With BioTextMiner, researchers can organize and curate large-scale biomedical text data in a centralized database.
Python
1
star
33

json-converters

Python
1
star
34

medprocner_evaluation_library

Evaluation library for the MedProcNER/ProcTEMIST shared task (https://temu.bsc.es/medprocner/)
Python
1
star