eXascale Infolab (@eXascaleInfolab)

Top repositories

1

PyExPool

Python Multi-Process Execution Pool: concurrent asynchronous execution pool with custom resource constraints (memory, timeouts, affinity, CPU cores and caching), load balancing and profiling capabilities of the external apps on NUMA architecture
Python
163
star
2

LFR-Benchmark_UndirWeightOvp

Extended version of the Lancichinetti-Fortunato-Radicchi Benchmark for Undirected Weighted Overlapping networks to evaluate clustering algorithms using generated ground-truth communities
C++
79
star
3

Flashback_code

Python
45
star
4

JUST

Python
45
star
5

LBSN2Vec

Code release for LBSN2Vec
C
44
star
6

HINGE_code

Python
38
star
7

bench-vldb20

C
35
star
8

NodeSketch

NodeSketch: Highly-Efficient Graph Embeddings via Recursive Sketching
Python
33
star
9

TRank

Ranking Entity Types using the Web of Data
Scala
30
star
10

RETA_code

Python
30
star
11

ActiveLink

Deep active learning framework for link prediction in knowledge graphs
Python
24
star
12

HistoSketch

Implementation of HistoSketch and D2HistoSketch in MATLAB
MATLAB
20
star
13

GenConvNMI

Generalized Conventional Mutual Information (GenConvMI) - NMI for overlapping (soft, fuzzy) clusters (communities), compatible with standard NMI, pure C++ version (single executable)
C++
20
star
14

clubmark

Clubmark: a Parallel Isolation Framework for Benchmarking and Profiling of Clustering (Community Detection) Algorithms Considering Overlaps (Covers)
Python
20
star
15

pytrec_eval

A library to evaluate TREC-like runs with TREC-like qrels. Implements similarity of rankings, t-test between runs, etc…
Python
19
star
16

PyCABeM

Python Benchmarking Framework for the Clustering Algorithms Evaluation: networks generation and shuffling; failover execution and resource consumption tracing (peak RAM RSS, CPU, ...); evaluation of Modularity, conductance, NMI and F1 Score for overlapping communities
Python
19
star
17

MARTA

Python
18
star
18

xmeasures

Extremely fast evaluation of the extrinsic clustering measures: various (mean) F1 measures and Omega Index (Fuzzy Adjusted Rand Index) for the multi-resolution clustering with overlaps/covers, standard NMI, clusters labeling
C++
18
star
19

TSM-Bench

Comprehensive Benchmark for Time Series Database Systems
Jupyter Notebook
15
star
20

fashion_nlp_v2

FashionBrain D2.1: Named Entity Recognition and Linking Methods
Python
11
star
21

fashionNLP

Python
6
star
22

orbits

C#
6
star
23

PyNetConvert

Network (Graph) Format Converter: RCG, Pajek, Metis, NSL (NCol, SNAP, ...), Matlab
Python
6
star
24

daoc

DAOC (Deterministic and Agglomerative Overlapping Clustering algorithm): Stable Clustering of Large Networks
C++
6
star
25

StaTIX

Statistical Type Inference (both fully automatic and semi supervised) for RDF datasets
Java
6
star
26

GraphEmbEval

Graph (network) embeddings evaluation framework via classification, Gram matrix construction for link prediction
Python
6
star
27

pSCAN

pSCAN: Fast and Exact Structural Graph Clustering (with overlaps)
C
5
star
28

TaxoComplete

This is the repository of TaxoComplete: Self-Supervised Taxonomy Completion Leveraging Position-Enhanced Semantic Matching
Python
5
star
29

sanaphor

Python
5
star
30

2018-Internship-TableDetection

This repository contains the pipeline for table detection/extraction from 'Bundesarchive' documents.
HTML
5
star
31

CORAD

CORAD: Correlation-Aware Compression of Massive Time Series using Sparse Dictionary Coding
Python
5
star
32

Wiki2Prop

The companion material for the Wiki2Prop Paper
Python
5
star
33

OpenCrowd

Python
4
star
34

WDCFramework

clone of https://www.assembla.com/spaces/commondata/subversion/source/HEAD/WDCFramework/trunk
Java
4
star
35

daor

DAOR Parameter-free Embedding Framework for Large Graphs (Networks)
C++
4
star
36

cardinal

Source Code and Companion Material of the Non-Parametric Class Completeness Estimators
Python
3
star
37

entity-disambiguation-data-ecir2013

3
star
38

2016-armatweet

NLP components of ArmaTweet devoted to converting tweets into quads of the form (`subject`, `predicate`, `object`, `location`) where `subject`, `object`, and `location` are DBpedia resources, and `predicate` is a WordNet synset.
Scala
3
star
39

axel

Project for exploratory search on scientific articles
Python
3
star
40

thesis_template

LaTeX template for XI BSc/MSc thesis
TeX
3
star
41

hirecs

High Resolution Hierarchical Clustering with Stable State
C++
3
star
42

seer

CSS
3
star
43

NetHash

NetHash algorithm from IJCAI 2018
C++
3
star
44

typhon

Deep Learning framework that trains a single model using multiple, heterogeneous datasets leveraging parallel transfer, strictly enforcing feature generalization and even preventing overfitting
Python
3
star
45

inFlux

Task Flow Control
JavaScript
2
star
46

wd-graph

A toolset to work with the Wikidata Graph
Python
2
star
47

WDCTools

Scala
2
star
48

timesvd_vc

Python
2
star
49

SNF_disambiguation

2
star
50

Event-Detection-Twitter

This is the repository for data related to our submission to TKDE titled "Event Detection on Microposts: a Comparison of Four Approaches".
2
star
51

vadetis

Jupyter Notebook
2
star
52

pgpr

Python
2
star
53

preposition-data-cikm2014

Datasets with preposition corrections for CIKM 2014 paper
2
star
54

resmerge

Resolution levels clustering merger with filtering and cluster deduplication. Flattens a hierarchy/list of multiple resolution levels (clusterings) into a single flat clustering (collection), synchronizing the node base and deduplicating.
C++
2
star
55

typhon_exp

Experiments for the paper: "Typhon: Parallel Transfer on Heterogeneous Datasets for Cancer Detection in Computer-Aided Diagnosis"
Python
1
star
56

cdrec

C++
1
star
57

ase-lab

Lab of Time Series Database Systems
Python
1
star
58

scala_utils

A few Scala utils...
Java
1
star
59

ReVival-Code

PHP
1
star
60

nif-entity-linking-webservice

JavaScript
1
star
61

interval_index

A full set of data structures and experimental data for the CINTIA paper
C++
1
star
62

CDTool

C++
1
star
63

bench-vldb20_full

C
1
star
64

oslom2

Sources of the OSLOM2 (v2.5) clustering algorithm with slightly extended I/O for the benchmarking under Clubmark
C++
1
star
65

tag-recommendation-data-iswc2012

Dataset for the "Tag recommendation" paper from ISWC 2012
1
star
66

scientific_NER_dataset

Judged dataset for NER in scientific documents
1
star
67

2019_kais-bench

AGS Script
1
star
68

BonusBar

BonusBar Django project. An HCI prototype for worker retention.
JavaScript
1
star
69

libMoji

The implementation of Moji Visualizations
JavaScript
1
star
70

WikidataSectionLinks

Python
1
star
71

TInfES

Type Inference Evaluation Scripts & Accessory Apps (used for the StaTIX benchmarking)
Python
1
star
72

JOINER_code

C++
1
star
73

sds2020_web_table_annotation

SDS2020 - Annotating Web Tables through Knowledge Bases: A Context-Based Approach
Python
1
star
74

Wikipedia30

A collection of 30 random Wikipedia pages manually annotated with entities.
1
star
75

HIT-Scheduler

Open-source HIT scheduling backend for Amazon Mechanical Turk.
JavaScript
1
star
76

SMA-17s_CommunityDetection

Community detection programming exercises for the SMA-17s course
Jupyter Notebook
1
star
77

ASE-lab-2023

Time Series Database System Lab 2023
1
star
78

CGGC

RG (Randomized Greedy clustering), CGGC_RG (Core Groups Graph ensemble Clustering) or CGGCi_RG (Core Groups Graph ensemble Clustering Iterative) algorithms
C++
1
star