This package is part of an expected set of packages implementing machine learning capabilities for Racket. The core of this package is the management of data sets, which are assumed to be used for training and testing machine learning capabilities. This package does not assume anything about such capabilities, and uses an expansive notion of machine learning that should cover statistical inference, tree and decision matrix models, as well as deep learning approaches.
This module deals with two opaque structure types, data-set and data-set-field. These are not available to clients directly, although certain accessors are exported by this module.
Conceptually, a data-set is a table of data. Its columns represent fields that are either features, which represent properties of an instance, or classifiers (labels), which are used to train and match instances.
See the rml-knn (not quite there yet) repository for an example capability built upon this package.
data - manages data sets: loads from CSV and JSON files, saves and loads snapshots, and manages partitions and statistics.
individual - manages individuals when classifying against a data set.
classify - describes a contract for classifier functions and a set of higher-order cross-classifiers over data sets.
results - provides a confusion matrix that records the results of classification as a mapping from true to predicted values.
not-implemented - really a convenience for raising fail:unsupported exceptions.
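To make the idea of a confusion matrix as a mapping from true to predicted values concrete, here is a minimal sketch in plain Racket. It does not use or describe the results module's actual implementation; the helper tally-results and the sample labels are hypothetical and exist only for illustration.

```racket
#lang racket/base

;; Hypothetical helper, NOT part of this package: tallies a list of
;; (true-label . predicted-label) pairs into a nested immutable hash of
;; the shape  true-label -> predicted-label -> count.
(define (tally-results pairs)
  (for/fold ([matrix (hash)])
            ([pair (in-list pairs)])
    (define true-label (car pair))
    (define predicted-label (cdr pair))
    (hash-update matrix
                 true-label
                 ;; Increment the count in this row, starting from 0.
                 (lambda (row) (hash-update row predicted-label add1 0))
                 ;; Start from an empty row if the true label is new.
                 (hash))))

;; Illustrative data only: two correct predictions and one mistake.
(define results
  (tally-results '(("setosa" . "setosa")
                   ("setosa" . "setosa")
                   ("versicolor" . "virginica"))))

;; How often was a true "setosa" predicted as "setosa"?
(displayln (hash-ref (hash-ref results "setosa") "setosa"))
```

Counts on the diagonal (true label equal to predicted label) are correct classifications; everything else is a misclassification.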
The following example loads a sample data set and displays some useful information before writing a snapshot to the current output port.
;; rml/results is assumed to provide the result-matrix functions used below.
(require rml/data rml/results)

;; Describe each CSV column as a feature or a classifier by index.
(define iris-data-set
  (load-data-set "iris_training_data2.csv"
                 'csv
                 (list
                  (make-feature "sepal-length" #:index 0)
                  (make-feature "sepal-width" #:index 1)
                  (make-feature "petal-length" #:index 2)
                  (make-feature "petal-width" #:index 3)
                  (make-classifier "classification" #:index 4))))

(displayln (classifier-product iris-data-set))
(newline)

(displayln (feature-statistics iris-data-set "sepal-width"))
(newline)

(write-snapshot iris-data-set (current-output-port))
(newline)

(for ([row (result-matrix-formatted (make-result-matrix iris-data-set))])
  (displayln row))
feature-statistics returns an instance of the statistics struct from math/statistics.
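As a rough illustration of what can be done with a statistics value, the sketch below builds one directly with math/statistics rather than through this package. The sample widths are made up for the example, and whether feature-statistics populates the struct in exactly this way is an assumption.

```racket
#lang racket/base
(require math/statistics)

;; Made-up sample values standing in for a "sepal-width" column.
(define widths '(3.5 3.0 3.2 3.1))

;; Fold all the values into a single statistics struct.
(define stats (update-statistics* empty-statistics widths))

;; Query summary values from the struct.
(displayln (statistics-min stats))
(displayln (statistics-max stats))
(displayln (statistics-mean stats))
(displayln (statistics-stddev stats))
```

The same accessor functions should apply to the value returned by feature-statistics, since it is described as an instance of the same struct.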