MLLabelUtils

Utility package for working with classification targets. As such, this package provides the necessary functionality for interpreting class-predictions, as well as converting classification targets from one encoding to another.


Introduction

In a classification setting, one usually treats the desired output variable (also called ground truths, or targets) as a discrete categorical variable. That is true even if the values themselves are of numerical type, which they often are for practical reasons.

In fact, it is a common requirement in Machine Learning experiments to encode the classification targets of a supervised dataset in a very specific way. There are multiple conventions, each with its own merits and reasons to exist. Some models, such as the probabilistic version of logistic regression, require the targets in the form of numbers in the set {1,0}. Margin-based classifiers, such as SVMs, on the other hand expect the targets to be in the set {1,−1}.
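The relationship between these two binary conventions is a simple affine map. As a hypothetical plain-Julia sketch (not part of this package's API):

```julia
# Hypothetical sketch in plain Julia: relating the {0,1} and {1,-1}
# conventions for the same binary targets.
y01 = [0, 1, 0, 1, 1]

# {0,1} -> {-1,1}: scale and shift elementwise
ym = 2 .* y01 .- 1

# {-1,1} -> {0,1}: invert the affine map
y01_back = (ym .+ 1) .÷ 2
```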

This package provides the functionality needed to deal with these different scenarios in an efficient, consistent, and convenient manner. In particular, this library is designed with package developers in mind who require their classification targets to be in a specific format. To that end, the core focus of this package is to provide all the tools needed to deal with classification targets of arbitrary format. This includes checking whether the targets are of a desired encoding, inferring which concrete encoding the targets are in and how many classes they represent, and converting from their native encoding to the desired one.

Example

The following code snippets show a simple "hello world" scenario of how this package can be used to work with classification targets.

using MLLabelUtils

We can automatically infer the encoding used by the targets with labelenc. This function looks at all elements and determines which specific encoding best describes the target array.

julia> true_targets = Int8[0, 1, 0, 1, 1];

julia> le = labelenc(true_targets)
# MLLabelUtils.LabelEnc.ZeroOne{Int8,Float64}(0.5)

To simply determine whether a specific encoding is appropriate, one can use the function islabelenc.

julia> islabelenc(true_targets, LabelEnc.ZeroOne)
# true

julia> islabelenc(true_targets, LabelEnc.MarginBased)
# false
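Conceptually, such a check just verifies that every element lies in the encoding's value set. A hypothetical from-scratch sketch of that idea (the helper names here are made up, not the package's API):

```julia
# Hypothetical sketch: an encoding fits when every target lies in its value set.
is_zero_one(targets)     = all(t -> t == 0 || t == 1,  targets)
is_margin_based(targets) = all(t -> t == 1 || t == -1, targets)

is_zero_one(Int8[0, 1, 0, 1, 1])      # true
is_margin_based(Int8[0, 1, 0, 1, 1])  # false
```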

Furthermore, we can compute a label map, which lists the indices of all elements that belong to each class. This information is useful for resampling strategies, such as stratified sampling.

julia> true_targets = [:yes,:no,:maybe,:yes];

julia> labelmap(true_targets)
# Dict{Symbol,Array{Int64,1}} with 3 entries:
#   :yes   => [1,4]
#   :maybe => [3]
#   :no    => [2]
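Internally, building such a map amounts to a single pass over the targets that collects the indices per class. A hypothetical from-scratch sketch (simple_labelmap is a made-up name, not the package's function):

```julia
# Hypothetical sketch of what a label map computes: class => indices.
function simple_labelmap(targets)
    lm = Dict{eltype(targets), Vector{Int}}()
    for (i, t) in enumerate(targets)
        push!(get!(lm, t, Int[]), i)
    end
    return lm
end

simple_labelmap([:yes, :no, :maybe, :yes])
# Dict(:yes => [1, 4], :no => [2], :maybe => [3])
```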

If need be, we can convert to other encodings. Note that unless explicitly specified, we try to preserve the eltype of the input. However, this behaviour only comes into play in the case of numbers.

julia> true_targets = Int8[0, 1, 0, 1, 1];

julia> convertlabel([:yes,:no], true_targets) # Equivalent to LabelEnc.NativeLabels([:yes,:no])
# 5-element Array{Symbol,1}:
#  :no
#  :yes
#  :no
#  :yes
#  :yes

julia> convertlabel(LabelEnc.MarginBased, true_targets) # Preserves eltype
# 5-element Array{Int8,1}:
#  -1
#   1
#  -1
#   1
#   1

julia> convertlabel(LabelEnc.MarginBased(Float32), true_targets) # Force new eltype
# 5-element Array{Float32,1}:
#  -1.0
#   1.0
#  -1.0
#   1.0
#   1.0

For encodings that can be multiclass, the number of classes can be inferred from the targets, or specified explicitly.

julia> convertlabel(LabelEnc.Indices{Int}, true_targets) # number of classes inferred
# 5-element Array{Int64,1}:
#  2
#  1
#  2
#  1
#  1

julia> convertlabel(LabelEnc.Indices(Int,2), true_targets)
# 5-element Array{Int64,1}:
#  2
#  1
#  2
#  1
#  1

julia> convertlabel(LabelEnc.OneOfK{Bool}, true_targets)
# 2×5 Array{Bool,2}:
#  false   true  false   true   true
#   true  false   true  false  false

Note that the OneOfK encoding is special in that, as a matrix-based encoding, it supports the obsdim parameter, which can be used to specify which dimension of the array denotes the observations.

julia> convertlabel(LabelEnc.OneOfK{Int}, true_targets, obsdim = 1)
# 5×2 Array{Int64,2}:
#  0  1
#  1  0
#  0  1
#  1  0
#  1  0
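Conceptually, a one-of-K encoding places a single indicator per observation. A hypothetical plain-Julia sketch of that construction (simple_onehot is a made-up name):

```julia
# Hypothetical sketch: encode index targets (1..K) as a K×N indicator
# matrix, one column per observation.
function simple_onehot(targets::AbstractVector{<:Integer}, K::Integer)
    M = zeros(Int, K, length(targets))   # classes × observations
    for (j, t) in enumerate(targets)
        M[t, j] = 1
    end
    return M
end

simple_onehot([2, 1, 2, 1, 1], 2)
# 2×5 matrix with exactly one 1 per column
```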

We also provide a OneVsRest encoding, which allows transforming a multiclass problem into a binary one.

julia> true_targets = [:yes,:no,:maybe,:yes];

julia> convertlabel(LabelEnc.OneVsRest(:yes), true_targets)
# 4-element Array{Symbol,1}:
#  :yes
#  :not_yes
#  :not_yes
#  :yes

julia> convertlabel(LabelEnc.TrueFalse, true_targets, LabelEnc.OneVsRest(:yes))
# 4-element Array{Bool,1}:
#   true
#  false
#  false
#   true
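Under the hood this is just a relabeling: the positive class keeps its label and everything else collapses into a rest class. A hypothetical one-line sketch (one_vs_rest is a made-up name, not the package's function):

```julia
# Hypothetical sketch: collapse all non-positive labels into a "rest" class.
one_vs_rest(targets, pos) = [t == pos ? t : Symbol("not_", pos) for t in targets]

one_vs_rest([:yes, :no, :maybe, :yes], :yes)
# [:yes, :not_yes, :not_yes, :yes]
```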

NativeLabels maps between data of an arbitrary type (e.g. Strings) and the other label types (normally LabelEnc.Indices{Int} for an integer index). When using it, you should always store the encoding in a variable and pass it as an argument to convertlabel; otherwise the encoding will be re-inferred for every call and will normally encode different inputs differently.

julia> enc = LabelEnc.NativeLabels(["copper", "tin", "gold"])
# MLLabelUtils.LabelEnc.NativeLabels{String,3}(...)

julia> convertlabel(LabelEnc.Indices, ["gold", "copper"], enc)
# 2-element Array{Int64,1}:
#  3
#  1
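The stability comes from deriving the label-to-index mapping once, from the stored label vector, rather than from whatever labels happen to appear in a given input. A hypothetical plain-Julia sketch of that idea:

```julia
# Hypothetical sketch: a fixed label order yields a stable label => index map.
labels = ["copper", "tin", "gold"]
label2ind = Dict(l => i for (i, l) in enumerate(labels))

[label2ind[l] for l in ["gold", "copper"]]
# [3, 1]
```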

Encodings such as ZeroOne, MarginBased, and OneOfK also provide a classify function.

ZeroOne has a threshold parameter which represents the decision boundary.

julia> classify(0.3, 0.5)
# 0.0

julia> classify(0.3, LabelEnc.ZeroOne) # equivalent to before
# 0.0

julia> classify(0.3, LabelEnc.ZeroOne(0.2)) # custom threshold
# 1.0

julia> classify(0.3, LabelEnc.ZeroOne(Int,0.2)) # custom type
# 1

julia> classify.([0.3,0.5], LabelEnc.ZeroOne(Int,0.4)) # broadcast support
# 2-element Array{Int64,1}:
#  0
#  1

MarginBased uses the sign to determine the class.

julia> classify(-5, LabelEnc.MarginBased)
# -1

julia> classify(0.2, LabelEnc.MarginBased)
# 1.0

julia> classify(-5, LabelEnc.MarginBased(Float64))
# -1.0

julia> classify.([-5,5], LabelEnc.MarginBased(Float64))
# 2-element Array{Float64,1}:
#  -1.0
#   1.0

OneOfK determines which index is the largest element.

julia> pred_output = [0.1 0.4 0.3 0.2; 0.8 0.3 0.6 0.2; 0.1 0.3 0.1 0.6]
# 3×4 Array{Float64,2}:
#  0.1  0.4  0.3  0.2
#  0.8  0.3  0.6  0.2
#  0.1  0.3  0.1  0.6

julia> classify(pred_output, LabelEnc.OneOfK)
# 4-element Array{Int64,1}:
#  2
#  1
#  2
#  3

julia> classify(pred_output', LabelEnc.OneOfK, obsdim = 1) # note the transpose
# 4-element Array{Int64,1}:
#  2
#  1
#  2
#  3

julia> classify([0.1,0.2,0.6,0.1], LabelEnc.OneOfK) # single observation
# 3
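The three decision rules above can be summarised as thresholding, taking the sign, and taking the index of the largest element. A hypothetical plain-Julia sketch (all three helper names are made up, not the package's classify methods):

```julia
# Hypothetical sketches of the three decision rules described above.
classify01(x, thresh = 0.5) = x >= thresh ? 1 : 0   # ZeroOne: threshold
classifymargin(x)           = x >= 0 ? 1 : -1       # MarginBased: sign
classifyonehot(v)           = argmax(v)             # OneOfK: largest index

classify01(0.3), classifymargin(-5), classifyonehot([0.1, 0.2, 0.6, 0.1])
# (0, -1, 3)
```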

Documentation

For a much more detailed treatment check out the latest documentation.

Additionally, you can make use of Julia's native docsystem. The following example shows how to get additional information on convertlabel within Julia's REPL:

?convertlabel

Installation

This package is registered in METADATA.jl and can be installed as usual. Just start up Julia and type the following code snippet into the REPL. It makes use of the native Julia package manager.

Pkg.add("MLLabelUtils")

Additionally, for example if you encounter any sudden issues, or in the case you would like to contribute to the package, you can manually choose to be on the latest (untagged) version.

Pkg.checkout("MLLabelUtils")

License

This code is free to use under the terms of the MIT license.
