  • Stars: 1,087
  • Rank: 41,165 (Top 0.9%)
  • Language: Java
  • License: Apache License 2.0
  • Created over 9 years ago
  • Updated 6 months ago

Repository Details

Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.

Datumbox Machine Learning Framework

The Datumbox Machine Learning Framework is an open-source framework written in Java which allows the rapid development of Machine Learning and Statistical applications. The main focus of the framework is to include a large number of machine learning algorithms and statistical methods and to be able to handle large datasets.

Copyright & License

Copyright (C) 2013-2020 Vasilis Vryniotis.

The code is licensed under the Apache License, Version 2.0.

Installation & Versioning

The Datumbox Framework is available on the Maven Central Repository.

The latest stable version of the framework is 0.8.2 (Build 20200805). To use it, add the following snippet to the dependencies section of your pom.xml:

    <dependency>
        <groupId>com.datumbox</groupId>
        <artifactId>datumbox-framework-lib</artifactId>
        <version>0.8.2</version>
    </dependency>

The latest snapshot version of the framework is 0.8.3-SNAPSHOT (Build 20201014). To test it, add the snapshot repository to the repositories section of your pom.xml and update the dependency as follows:

    <repository>
       <id>sonatype-snapshots</id>
       <name>sonatype snapshots repo</name>
       <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    </repository>

    <dependency>
        <groupId>com.datumbox</groupId>
        <artifactId>datumbox-framework-lib</artifactId>
        <version>0.8.3-SNAPSHOT</version>
    </dependency>

Active development happens on the develop branch (the default GitHub branch), while the master branch contains the latest stable version of the framework. All stable releases are marked with tags.

The releases of the framework follow the Semantic Versioning approach. For detailed information about the various releases, check out the Changelog.
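
To illustrate what Semantic Versioning implies for the version strings above, here is a minimal plain-Java sketch of the precedence rules from the Semantic Versioning 2.0.0 specification. It is not part of Datumbox; the class name `SemVerSketch` and its methods are hypothetical, and the code assumes well-formed `MAJOR.MINOR.PATCH` strings with an optional pre-release suffix and optional build metadata.

```java
/** Minimal sketch of Semantic Versioning 2.0.0 precedence; hypothetical, not part of Datumbox. */
final class SemVerSketch {

    /** Returns a negative, zero, or positive value if a is lower than, equal to, or higher than b. */
    static int compare(String a, String b) {
        String[] pa = split(a);
        String[] pb = split(b);
        String[] coreA = pa[0].split("\\.");
        String[] coreB = pb[0].split("\\.");
        // Major, minor and patch are compared numerically, in that order.
        for (int i = 0; i < 3; i++) {
            int c = Integer.compare(Integer.parseInt(coreA[i]), Integer.parseInt(coreB[i]));
            if (c != 0) return c;
        }
        // Same core version: a release outranks any of its pre-releases.
        if (pa[1].isEmpty() && pb[1].isEmpty()) return 0;
        if (pa[1].isEmpty()) return 1;
        if (pb[1].isEmpty()) return -1;
        return comparePreRelease(pa[1], pb[1]);
    }

    /** Splits a version into {core, preRelease}; build metadata after '+' is ignored. */
    private static String[] split(String version) {
        int plus = version.indexOf('+');
        String v = plus >= 0 ? version.substring(0, plus) : version;
        int dash = v.indexOf('-');
        return dash < 0
                ? new String[] {v, ""}
                : new String[] {v.substring(0, dash), v.substring(dash + 1)};
    }

    /** Compares dot-separated pre-release identifiers from left to right. */
    private static int comparePreRelease(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.min(x.length, y.length); i++) {
            boolean numX = x[i].matches("\\d+");
            boolean numY = y[i].matches("\\d+");
            int c;
            if (numX && numY) {
                c = Long.compare(Long.parseLong(x[i]), Long.parseLong(y[i]));
            } else if (numX) {
                c = -1; // numeric identifiers rank below alphanumeric ones
            } else if (numY) {
                c = 1;
            } else {
                c = x[i].compareTo(y[i]); // ASCII order
            }
            if (c != 0) return c;
        }
        // All shared identifiers are equal: the longer identifier list ranks higher.
        return Integer.compare(x.length, y.length);
    }
}
```

Under these rules `SemVerSketch.compare("0.8.2", "0.8.3-SNAPSHOT")` is negative, and so is `SemVerSketch.compare("0.8.3-SNAPSHOT", "0.8.3")`: the snapshot sits between the current stable release and the eventual 0.8.3 release.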

Documentation and Code Examples

All the public methods and classes of the Framework are documented with Javadoc comments. Moreover, for every model there is a JUnit test which shows how to train and use it. Finally, for more examples on how to use the framework, check out the Code Examples or the official Blog.

Pre-trained Models

Datumbox comes with a large number of pre-trained models which allow you to perform Sentiment Analysis (Document & Twitter), Subjectivity Analysis, Topic Classification, Spam Detection, Adult Content Detection, Language Detection, Commercial Detection, Educational Detection and Gender Detection. To get the binary models check out the Datumbox Zoo.

Which methods/algorithms are supported?

The Framework currently supports performing multiple parametric and non-parametric statistical tests, calculating descriptive statistics on censored and uncensored data, performing ANOVA, Cluster Analysis, Dimension Reduction, Regression Analysis, Time Series Analysis, Sampling, and calculating probabilities from the most common discrete and continuous distributions. In addition, it provides several implemented algorithms including Max Entropy, Naive Bayes, SVM, Bootstrap Aggregating, Adaboost, Kmeans, Hierarchical Clustering, Dirichlet Process Mixture Models, Softmax Regression, Ordinal Regression, Linear Regression, Stepwise Regression, PCA and several other techniques that can be used for feature selection, ensemble learning, linear programming solving and recommender systems.
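
As a rough illustration of the simplest technique in the list above, the following plain-Java sketch fits a linear regression with a single predictor by ordinary least squares. It deliberately does not use the Datumbox API, so the class `SimpleLinearRegressionSketch` and its methods are hypothetical; for the framework's actual usage, see its Javadoc and JUnit tests.

```java
/** Plain-Java sketch of simple linear regression (one predictor); hypothetical, not the Datumbox API. */
final class SimpleLinearRegressionSketch {

    /** Returns {intercept, slope} of the least-squares line y = intercept + slope * x. */
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs of equal length");
        }
        int n = x.length;

        // Sample means of the predictor and the response.
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Sums of cross-products and squares around the means.
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }

        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] {intercept, slope};
    }

    /** Evaluates the fitted line at x. */
    static double predict(double[] coefficients, double x) {
        return coefficients[0] + coefficients[1] * x;
    }
}
```

For example, fitting the points (1, 3), (2, 5), (3, 7), (4, 9) gives an intercept of 1 and a slope of 2, so `predict` returns 11 for x = 5.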

Bug Reports

Although parts of the Framework have been used in commercial applications, not all classes are equally used or tested. The framework is currently in Alpha, so you should expect some changes to the public APIs in future versions. If you spot a bug, please submit it as an Issue on the official GitHub repository.

Contributing

The Framework can be improved in many ways, so any contribution is welcome. By far the most important missing feature is the ability to use the Framework from the command line or from other languages such as Python. Other important enhancements include improving the documentation, the test coverage and the examples, improving the architecture of the framework, and supporting more Machine Learning and Statistical models. If you make any useful changes to the code, please consider contributing them by sending a pull request.

Acknowledgements

Many thanks to Eleftherios Bampaletakis for his invaluable input on improving the architecture of the Framework. Also many thanks to ej-technologies GmbH for providing a license for their Java Profiler and to JetBrains for providing a license for their Java IDE.

More Repositories

1. NaiveBayesClassifier (Java, 94 stars)
   JAVA implementation of Multinomial Naive Bayes Text Classifier.

2. Game-2048-AI-Solver (Java, 68 stars)
   Implementation of Game 2048 with an Artificial Intelligence Solver written in JAVA.

3. twitter-sentiment-analysis (PHP, 52 stars)
   A tool which performs Sentiment Analysis on Twitter by using Datumbox API.

4. datumbox-framework-examples (Java, 41 stars)
   Code examples on how to use the Datumbox Machine Learning Framework.

5. facebook-sentiment-analysis (PHP, 23 stars)
   A tool which performs Sentiment Analysis on Facebook posts by using Datumbox API.

6. DPMM-Clustering (Java, 18 stars)
   Java implementation of Dirichlet Process Mixture Model.

7. datumbox-framework-zoo (14 stars)
   Pre-trained models for Datumbox Machine Learning Framework.

8. lpsolve (C++, 8 stars)
   Java wrapper for lpsolve library

9. DataEnvelopmentAnalysis (Java, 6 stars)
   JAVA implementation of Data Envelopment Analysis algorithm with an example of how to use it to calculate the Social Media Popularity of webpages.

10. Hodgkin-Huxley-Neuron (PHP, 4 stars)

11. wordpress-machine-learning-antispam (PHP, 3 stars)
    This Wordpress Plugin uses Machine Learning to detect spam and adult content comments and mark them as spam. Additionally it allows you to filter negative comments and keep them pending for approval.

12. dapi-model-versioning (Python, 2 stars)
    RFC for Model Versioning across all PyTorch Domain libraries

13. torchvision-models (2 stars)

14. ansible-jprofiler (1 star)
    Ansible role for jprofiler

15. LDA-KNN-Image-Classifier (1 star)
    Implementation of an Image Classifier that uses the Linear Discriminant Analysis and K-Nearest Neighbors algorithms.

16. datumbox-api-client-php (PHP, 1 star)
    Example of Datumbox API Client in PHP