  • Stars: 112
  • Rank: 312,240 (Top 7 %)
  • Language: Python
  • License: Apache License 2.0
  • Created over 9 years ago
  • Updated almost 2 years ago

Repository Details

NLP framework in Python for entity recognition and relationship extraction

☝️ We moved

This library is no longer maintained and only occasionally receives bug fixes.

We moved the functionality to train NER & Relation models to the text annotation tool, tagtog:

tagtog, The Text Annotation Tool to Train AI




nalaf - (Na)tural (La)nguage (F)ramework

nalaf is an NLP framework written in Python. It aims to be a general-purpose, module-based, easy-to-use framework for common text mining tasks. It currently covers two tasks: named-entity recognition (NER) and relationship extraction. The modules for both tasks support training and annotating. Helper components are also provided, such as cross-validation training and reading and converting different corpus formats. NER is currently implemented with Conditional Random Fields (CRFs), and relationship extraction with Support Vector Machines (SVMs) using either linear or tree kernels.
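To illustrate how NER is commonly framed as a sequence-labeling problem for a CRF, here is a minimal sketch that converts character-level entity spans into per-token BIO labels. It is not nalaf's code or API: the function names, the whitespace tokenizer, and the "mutation" entity type are assumptions made for illustration only.

```python
import re
from typing import List, Tuple

# Hypothetical helpers for illustration; these are NOT part of nalaf.

def tokenize(text: str) -> List[Tuple[int, int, str]]:
    """Split on whitespace, keeping (start, end, token) character offsets."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", text)]


def bio_labels(text: str, entities: List[Tuple[int, int, str]]) -> List[str]:
    """Assign one BIO label per token.

    `entities` holds (start, end, type) character spans, end exclusive.
    A token fully inside a span gets "B-<type>" if it starts the span,
    "I-<type>" otherwise; every other token gets "O".
    """
    labels = []
    for start, end, _token in tokenize(text):
        label = "O"
        for e_start, e_end, e_type in entities:
            if start >= e_start and end <= e_end:
                label = ("B-" if start == e_start else "I-") + e_type
                break
        labels.append(label)
    return labels


text = "This is c.A1003G an example"
entities = [(8, 16, "mutation")]  # assumed span covering "c.A1003G"
print(bio_labels(text, entities))
```

A sequence model such as a CRF would then be trained to predict these labels from per-token features.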

Historically, the framework started from two joint theses at Rostlab at Technische Universität München with a focus on bioinformatics / BioNLP. Concretely, the first goal was to extract natural language (NL) mutation mentions. Soon after, another master's thesis used and generalized the framework to extract relationships of transcription factors (TFs) interacting with genes or gene products. The nalaf framework is planned to be used in other BioNLP tasks at Rostlab.

As a result of the original BioNLP focus, some parts of the code are tailored to the biomedical domain. However, efforts are underway to generalize all parts, and this process is almost done. Development is not active and code maintenance is not guaranteed.

Current maintainer: Juan Miguel Cejuela (@juanmirocks).

Pipeline diagram (an editable version is available on Lucidchart; requires login)

Install

Requires Python ^3.6
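In Poetry's caret notation, ^3.6 stands for ">=3.6, <4.0". As a rough sketch, the check below tests whether an interpreter version falls in that range before installing. The function name is hypothetical and is not part of nalaf.

```python
import sys

# Hypothetical helper for illustration; NOT part of nalaf.

def satisfies_caret_3_6(version_info=sys.version_info) -> bool:
    """Return True if the version is >=3.6 and <4.0 (caret constraint ^3.6)."""
    major, minor = version_info[0], version_info[1]
    return major == 3 and minor >= 6


if not satisfies_caret_3_6():
    print("This interpreter is outside the ^3.6 range:", sys.version.split()[0])
```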

From PyPI

pip3 install nalaf
python3 -m nalaf.download_data

From source

git clone https://github.com/Rostlab/nalaf.git
cd nalaf
poetry shell
poetry install  # or run `poetry update` if you really want to update the dependencies' versions
python3 -m nalaf.download_data

Developing

See wiki

Test

nosetests

Run Examples

Run example_annotate.py for a simple example of annotation with a pre-trained NER model for protein name extraction:

  • python3 example_annotate.py -p 15878741 12625412
  • python3 example_annotate.py -s "This is c.A1003G an example" # see issue #159
  • python3 example_annotate.py -d resources/example.txt # see issue #159

More Repositories

1

goPredSim

Python
38
star
2

ConSurf

Evolutionary conservation estimation of residues or nucleotides
C++
32
star
3

bindPredict

Prediction of binding residues for metal ions, nucleic acids, and small molecules.
Python
31
star
4

JS16_ProjectA

In this project we will lay the foundations for our system by integrating data from multiple sources into a central database. The database will serve the apps and the visualization tool that will be developed in other projects.
JavaScript
28
star
5

EAT

Embedding-based annotation transfer (EAT) uses Euclidean distance between vector representations (embeddings) of proteins to transfer annotations from a set of labeled lookup protein embeddings to a query protein embedding.
Python
22
star
6

DM_CS_WS_2016-17

Repo for general info of the course and communication
21
star
7

VESPA

VESPA is a simple, yet powerful Single Amino Acid Variant (SAV) effect predictor based on embeddings of the Protein Language Model ProtT5.
Python
17
star
8

ProNA2020

ProNA2020: System predicting protein-DNA, protein-RNA and protein-protein binding sites from sequence
Python
14
star
9

predictprotein-docker

Based off of the official Rostlab & PredictProtein website installation, as of 2020-09-07, the produced Docker image from this repository will result in a fully functioning predictprotein suite, including all of its required methods. Databases are not included.
Dockerfile
13
star
10

relna

Biomedical Relation Extraction for Transcription Factor and Gene / Gene Products (part of a Master Thesis at Rostlab, TUM)
HTML
12
star
11

JS16_ProjectF

In this project we will build a web portal for our GoT data analysis and visualization system. The website will integrate all the apps created in projects B-D with the help of the integration team assigned to Project E.
JavaScript
10
star
12

JS16_ProjectC_Group10

The known GoT world is vast and stretches over the three continents of Westeros, Essos and Sothoryos. Readers of the Ice and Fire books will get acquainted and transported from King's Landing to the borders of the Seven Kingdoms, and further on across the Narrow Sea. Over two thousand characters mentioned in the books have been associated with multiple landmarks in the GoT world. Your mission is to find character-place associations and put those associations on an interactive GoT map. Such a tool will help us figure out where Sandor “the Hound” Clegane went on his travels and how these travels coincide with those of Brienne of Tarth (hint: they never crossed paths in the books; however, they had a deadly duel during the show).
JavaScript
9
star
13

FunFamsClustering

Python
8
star
14

SNAP2

SNP effect predictor
Perl
7
star
15

nala

Text mining of natural language mutation mentions
HTML
6
star
16

LocText

Relation Extraction (RE) of: Proteins <--> Cell Compartments
HTML
5
star
17

JS18_ProjectA_Group2

In this project we created the framework that translates natural language to data visualization creation. This project encompasses loading and querying data and creating simple graphs.
TypeScript
5
star
18

LambdaPP

JavaScript
4
star
19

LocNuclei

Prediction of subnuclear locations
Python
3
star
20

PredictProtein

PredictProtein is an automatic service for protein database searches and the prediction of aspects of protein structure and function.
Perl
3
star
21

JS16_ProjectB_Group6

Game of Thrones characters are always in danger of being eliminated. The challenge in this assignment is to assess the risk of elimination for the characters that are still alive. The goal of this project is to rank characters by their Percentage Likelihood of Death (PLOD). You will assign a PLOD using machine learning approaches.
JavaScript
3
star
22

some-scripts

General-utility scripts that hopefully are useful for somebody
Python
2
star
23

PP2_CS_WS_2015-16

Communication and documentation for the class
2
star
24

LocTree3

Protein Subcellular Localization Sequence-Based Predictor
Roff
2
star
25

pssh-parser

A simple JS pssh parser
JavaScript
2
star
26

RostSpace

Python
2
star
27

JS16_ProjectD_Group5

Joffrey Baratheon is one of the most loathed characters in TV history. As a matter of fact people were celebrating his TV death on Twitter. We are interested to learn more on how people feel about different characters by analyzing tweets mentioning GoT characters. In this project you will be analyzing Twitter feeds across a timeline, you will look for the name of GoT characters in that feed and try to identify whether the tweet is positive or negative. You can then generate a metric that evaluates what is the accumulated sentiment expressed on Twitter for that given character at a given point in time, and what is the trend (positive, negative). It will be interesting to intersect the sentiments for characters following the airing of a certain episode (you can easily get the airing date for an episode from the database constructed in Project A).
JavaScript
2
star
28

someNA

Protein DNA/RNA binding predictor
Perl
1
star
29

MetaStudent

Sequence-based Protein GO / Functional Predictor
Python
1
star
30

MetaDisorder

Protein Sequence-Based Disorder Predictor
Perl
1
star
31

bindadjust

Python
1
star
32

smiles-cl

Python
1
star
33

TMvis

Combining AlphaFold 2 structures with predicted transmembrane proteins into interactive 3D visualizations of protein structures embedded into membranes.
Python
1
star
34

JS18_ProjectB_Group3

JavaScript
1
star
35

JS16_ProjectB_Group7

Game of Thrones characters are always in danger of being eliminated. The challenge in this assignment is to assess the risk of elimination for the characters that are still alive. The goal of this project is to rank characters by their Percentage Likelihood of Death (PLOD). You will assign a PLOD using machine learning approaches.
JavaScript
1
star