  • Stars
    45
  • Rank 612,313 (Top 13 %)
  • Language
    Java
  • License
    Apache License 2.0
  • Created over 6 years ago
  • Updated about 1 year ago

Repository Details

Source code for several Metanome data profiling algorithms

More Repositories

1

Metanome

The source repository of the Metanome tool
Java
169
star
2

TimeEval-algorithms

Time series anomaly detection algorithm implementations for TimeEval (Docker-based)
Python
60
star
3

timeeval-evaluation-paper

Supporting material and website for the paper "Anomaly Detection in Time Series: A Comprehensive Evaluation"
44
star
4

TimeEval

Evaluation Tool for Anomaly Detection Algorithms on Time Series
Jupyter Notebook
38
star
5

gutentag

GutenTAG is an extensible tool to generate time series datasets with and without anomalies.
Python
37
star
6

snowman

Welcome to Snowman App – a Data Matching Benchmark Platform.
TypeScript
37
star
7

TimeEval-GUI

[Read-Only Mirror] Benchmarking Toolkit for Time Series Anomaly Detection Algorithms using TimeEval and GutenTAG
Python
14
star
8

inclusion-dependency-algorithms

This repository provides implementations of several well-known IND discovery algorithms
Java
12
star
9

akka-tutorial

Code for the Akka tutorial
Java
11
star
10

Quagga

An email segmentation system (reference implementation of ECIR 2018 paper)
Python
10
star
11

QuaggaLib

An Email Segmentation System
Python
9
star
12

DADS

Distributed detection of sequential anomalies in univariate time series
Java
9
star
13

Pollock

Pollock is a benchmark for data loading on character-delimited files.
Python
8
star
14

Mondrian

Code repository for Mondrian, a project for multiregion template recognition in spreadsheets.
Python
7
star
15

DECENT

Python
7
star
16

enno

Text Annotation tool that is hopefully less painful to use than others
JavaScript
6
star
17

ProLOD

ProLOD++ contains algorithms to perform data profiling on Linked Data.
Scala
5
star
18

DQ4AI

Experimental study of the effects of data quality dimensions on machine learning performance
Jupyter Notebook
4
star
19

ELEX

A graph exploration tool that makes it easier to understand graphs.
JavaScript
4
star
20

DataGossip

DataGossip is an extension for asynchronous distributed data parallel machine learning that improves the training on imbalanced partitions.
Jupyter Notebook
3
star
21

NumbER

Entity Resolution for Numerical Data
Python
3
star
22

ExtracTable

Extract tables from Plain-Text Files.
Jupyter Notebook
2
star
23

S2Gpp

Rust
2
star
24

GAP-Gender-Analysis-for-Publications

The code behind the platform csgender.org
Python
2
star
25

GenderAnalysis

An analysis of gender distribution in scientific publications
Python
2
star
26

wikipedia_cleanup

MP 21/22
Jupyter Notebook
2
star
27

Sawfish

sIND Discovery
Java
2
star
28

pyro

Pyro is an algorithm to detect approximate keys and functional dependencies in relational datasets.
Java
2
star
29

s2gpp_experiments

Experiments for Series2Graph++ Paper
Python
2
star
30

ner-text-quality-impact

Python
1
star
31

ChangeTimeSeriesClustering

Scala-Spark Framework to cluster changes in data
Scala
1
star
32

winepi_serial_length2

Jupyter Notebook
1
star
33

flink-kmeans

Scala
1
star
34

Metanome-Frontend

Frontend for the Metanome Project
CSS
1
star
35

SURAGH

The source repository of SURAGH.
Java
1
star
36

cmt_statistics_tool

CMTStat - a CMT Statistics Tool
Python
1
star
37

spark-tutorial

Code for the Spark tutorial
Scala
1
star
38

spark-kmeans

Scala
1
star
39

TAHARAT

Java
1
star
40

wikipediatablevandalism

A vandalism detection system for table edits on Wikipedia
Jupyter Notebook
1
star
41

art-ner-dataset

Data and code from the paper "Generation of Training Data for Named Entity Recognition of Artworks"
Python
1
star
42

DBS1-Exercise

Kotlin
1
star
43

Armadillo

Table Overlap Approximation and Datasets
Python
1
star