  • Stars: 550
  • Rank: 80,860 (Top 2%)
  • Language: Python
  • License: Other
  • Created over 11 years ago
  • Updated 2 months ago

Repository Details

SciKit-Learn Laboratory (SKLL) makes it easy to run machine learning experiments.

SciKit-Learn Laboratory

This Python package provides command-line utilities to make it easier to run machine learning experiments with scikit-learn. A primary goal of the project is to let you run scikit-learn experiments without writing any code beyond what you used to generate or extract the features.

Installation

You can install using either pip or conda. See details here.

Requirements

Command-line Interface

The main utility we provide is run_experiment, which runs a series of learners on datasets specified in a configuration file like this one:

[General]
experiment_name = Titanic_Evaluate_Tuned
# valid tasks: cross_validate, evaluate, predict, train
task = evaluate

[Input]
# these directories could also be absolute paths
# (and must be if you're not running things in local mode)
train_directory = train
test_directory = dev
# Can specify multiple sets of feature files that are merged together automatically
featuresets = [["family.csv", "misc.csv", "socioeconomic.csv", "vitals.csv"]]
# List of scikit-learn learners to use
learners = ["RandomForestClassifier", "DecisionTreeClassifier", "SVC", "MultinomialNB"]
# Column in CSV containing labels to predict
label_col = Survived
# Column in CSV containing instance IDs (if any)
id_col = PassengerId

[Tuning]
# Should we tune parameters of all learners by searching provided parameter grids?
grid_search = true
# Function to maximize when performing grid search
objectives = ['accuracy']

[Output]
# Also compute the area under the ROC curve as an additional metric
metrics = ['roc_auc']
# The following can also be absolute paths
logs = output
results = output
predictions = output
probability = true
models = output

For more information about getting started with run_experiment, please check out our tutorial, or our config file specs.
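
To get a feel for what the configuration above describes, here is a minimal sketch that reads a trimmed copy of it with Python's standard library. It is not how SKLL itself parses configuration files; the `summarize_config` helper and the `n_combinations` key are hypothetical names used only for this illustration, and the list-valued fields are decoded with `ast.literal_eval` as an assumption that they are plain Python-style literals.

```python
import ast
import configparser

# A trimmed copy of the example configuration (comments and output paths omitted).
CONFIG_TEXT = """
[General]
experiment_name = Titanic_Evaluate_Tuned
task = evaluate

[Input]
train_directory = train
test_directory = dev
featuresets = [["family.csv", "misc.csv", "socioeconomic.csv", "vitals.csv"]]
learners = ["RandomForestClassifier", "DecisionTreeClassifier", "SVC", "MultinomialNB"]
label_col = Survived
id_col = PassengerId

[Tuning]
grid_search = true
objectives = ['accuracy']

[Output]
metrics = ['roc_auc']
"""


def summarize_config(text):
    """Hypothetical helper: pull out the fields that define the experiment."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # List-valued fields are written as literals, so decode them safely.
    featuresets = ast.literal_eval(parser["Input"]["featuresets"])
    learners = ast.literal_eval(parser["Input"]["learners"])
    return {
        "task": parser["General"]["task"],
        "featuresets": featuresets,
        "learners": learners,
        "grid_search": parser.getboolean("Tuning", "grid_search"),
        "objectives": ast.literal_eval(parser["Tuning"]["objectives"]),
        # One featureset crossed with four learners.
        "n_combinations": len(featuresets) * len(learners),
    }


summary = summarize_config(CONFIG_TEXT)
print(summary["task"], summary["n_combinations"])
```

The example lists a single featureset (made of four files) and four learners, so the sketch counts four featureset/learner combinations.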

You can also follow this interactive Jupyter tutorial.
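
The featuresets comment in the example configuration says that multiple feature files are merged together automatically. Conceptually this resembles a join on the instance ID column. The sketch below illustrates that idea with the standard library only; it is not SKLL's implementation, `merge_feature_files` is a hypothetical helper, the two tiny CSV snippets are made-up toy data, and the check that all files share the same IDs is an assumption of this sketch.

```python
import csv
import io


def merge_feature_files(files, id_col):
    """Hypothetical helper: join rows from several CSV sources on a shared ID column."""
    merged = {}
    expected_ids = None
    for handle in files:
        rows = {row[id_col]: row for row in csv.DictReader(handle)}
        if expected_ids is None:
            expected_ids = set(rows)
        elif set(rows) != expected_ids:
            # This sketch assumes every file describes the same instances.
            raise ValueError("feature files do not share the same IDs")
        for instance_id, row in rows.items():
            merged.setdefault(instance_id, {}).update(row)
    return merged


# Made-up toy data standing in for two feature files.
family = io.StringIO("PassengerId,SibSp,Parch\n1,1,0\n2,1,0\n")
vitals = io.StringIO("PassengerId,Sex,Age\n1,male,22\n2,female,38\n")

merged = merge_feature_files([family, vitals], id_col="PassengerId")
print(merged["1"])
```

Each instance ends up with the union of the columns from every file, which is the effect the configuration comment describes.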

We also provide utilities for:

Python API

If you just want to avoid writing a lot of boilerplate learning code, you can use our simple Python API, which also supports pandas DataFrames. The main way you'll want to use the API is through the Learner and Reader classes. For more details on our API, see the documentation.

While our API can be broadly useful, it should be noted that the command-line utilities are intended as the primary way of using SKLL. The API is just a nice side-effect of our developing the utilities.
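
As a rough sketch of the Learner and Reader workflow mentioned above, the function below reads a training file and a test file, trains one learner, and evaluates it. Treat it as an illustration under assumptions rather than a reference: `train_and_evaluate` is a hypothetical helper, the column names are borrowed from the Titanic configuration example, and the exact arguments accepted by `Reader.for_path`, `Learner.train`, and `Learner.evaluate` may differ between SKLL versions, so check the API documentation for the release you have installed.

```python
def train_and_evaluate(train_path, test_path, learner_name="RandomForestClassifier"):
    """Hypothetical helper: train one learner on train_path, evaluate on test_path."""
    # Imported lazily so this sketch can be loaded without SKLL installed.
    from skll.data import Reader
    from skll.learner import Learner

    # Column names are assumptions taken from the Titanic example configuration.
    reader_kwargs = {"label_col": "Survived", "id_col": "PassengerId"}
    train_fs = Reader.for_path(train_path, **reader_kwargs).read()
    test_fs = Reader.for_path(test_path, **reader_kwargs).read()

    learner = Learner(learner_name)
    # Grid search is turned off here to keep the sketch short.
    learner.train(train_fs, grid_search=False)
    return learner.evaluate(test_fs)
```

Calling it would look like `train_and_evaluate("train/vitals.csv", "dev/vitals.csv")`, where both paths are placeholders for your own feature files.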

A Note on Pronunciation

SciKit-Learn Laboratory (SKLL) is pronounced "skull": that's where the learning happens.

Talks

  • Simpler Machine Learning with SKLL 1.0, Dan Blanchard, PyData NYC 2014 (video | slides)
  • Simpler Machine Learning with SKLL, Dan Blanchard, PyData NYC 2013 (video | slides)

Citing

If you are using SKLL in your work, you can cite it as follows: "We used scikit-learn (Pedregosa et al., 2011) via the SKLL toolkit (https://github.com/EducationalTestingService/skll)."

Books

SKLL is featured in Data Science at the Command Line by Jeroen Janssens.

Changelog

See GitHub releases.

Contribute

Thank you for your interest in contributing to SKLL! See CONTRIBUTING.md for instructions on how to get started.
