  • Stars: 315
  • Rank: 132,951 (Top 3 %)
  • Language: Python
  • License: MIT License
  • Created almost 6 years ago
  • Updated about 1 month ago


Repository Details

Gokart addresses reproducibility, task dependencies, good-code constraints, and ease of use for machine learning pipelines.

gokart


Documentation for the latest release is hosted on readthedocs.

About gokart

Here are some good things about gokart.

  • The following metadata for each Task is stored separately in a pkl file with a hash value:
    • task output data
    • versions of all imported modules
    • task processing time
    • random seed used in the task
    • displayed log
    • all parameters set as class variables in the task
  • The pipeline is rerun automatically if the parameters of Tasks are changed.
  • GCS and S3 are supported as data stores for intermediate results of Tasks in the pipeline.
  • The above output is exchanged between tasks as intermediate files, which is memory-friendly.
  • pandas.DataFrame type and column checking is performed during I/O.
  • The directory structure of saved files is determined automatically from the structure of the script.
  • Seeds for numpy and random are fixed automatically.
  • You can write code that adheres to the SOLID principles as much as possible.
  • Tasks are locked via Redis even if they run in parallel.

All of the features above were built for constructing machine learning batches. Gokart provides an excellent environment for reproducibility and team development.
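The idea of storing each task's output in a pkl file keyed by a hash, and rerunning when parameters change, can be illustrated with a small standard-library sketch. This is not gokart's implementation: the helper names `param_hash`, `run_cached`, and `scale`, the SHA-256 digest, and the file naming scheme are all assumptions made for illustration only.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def param_hash(task_name: str, params: dict) -> str:
    """Return a stable hex digest for a task name plus its parameters.

    Hypothetical helper: the digest algorithm and payload layout are assumptions.
    """
    payload = json.dumps({"task": task_name, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_cached(workdir: Path, task_name: str, params: dict, compute):
    """Reuse the pickled result for this exact parameter set, or compute and store it.

    Returns a tuple (result, was_recomputed).
    """
    path = workdir / f"{task_name}_{param_hash(task_name, params)}.pkl"
    if path.exists():
        # Same task and same parameters: load the intermediate file instead of rerunning.
        with path.open("rb") as f:
            return pickle.load(f), False
    # New or changed parameters produce a new hash, so the task runs again.
    result = compute(**params)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result, True


def scale(values, factor):
    """Toy task body: multiply every value by a factor."""
    return [v * factor for v in values]


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    print(run_cached(workdir, "scale", {"values": [1, 2], "factor": 2}, scale))
    print(run_cached(workdir, "scale", {"values": [1, 2], "factor": 2}, scale))
    print(run_cached(workdir, "scale", {"values": [1, 2], "factor": 3}, scale))
```

In this sketch the second call reuses the stored file, while the third call, whose `factor` differs, is computed again and written to a second file. Because results pass between steps through files rather than in-memory objects, a downstream step only needs the path, which is the memory-friendly property described above.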

Here are some non-goals and downsides of gokart.

  • Batch execution in parallel is supported, but parallel and concurrent execution of tasks in memory is not.
  • Gokart focuses on reproducibility, so I/O and data storage capacity can become bottlenecks.
  • There is no support for task visualization.
  • Gokart is not an experiment management tool. Management of execution results is split out into a separate tool, Thunderbolt.
  • Gokart does not recommend writing pipelines in toml, yaml, json, or similar formats. It prefers that pipelines be written in Python.

Getting Started

Within the activated Python environment, use the following command to install gokart.

pip install gokart

Quickstart

A minimal gokart task looks something like this:

import gokart

class Example(gokart.TaskOnKart):
    def run(self):
        self.dump('Hello, world!')

task = Example()
output = gokart.build(task)
print(output)

gokart.build returns the value that the gokart.TaskOnKart task passed to dump. The example outputs the following.

Hello, world!

This is an introduction to only some of gokart's features. There are many more useful ones.

Please see the documentation.

Have a good gokart life.

Achievements

Gokart is a proven product.

Thanks

gokart is a wrapper for luigi. Thanks to luigi and dependent projects!

More Repositories

  1. octoparts: Octoparts, the backend services aggregator (Scala, 151 stars)
  2. pptx-template: Build PowerPoint presentation from template(pptx) and model(json) data like other template engines (Python, 91 stars)
  3. redshells: Machine learning tasks which are used with data pipeline library "luigi" and its wrapper "gokart". (Python, 43 stars)
  4. curly: A pretty simple HTTP client as handy as `curl` command (Java, 42 stars)
  5. thunderbolt: gokart file manager (Python, 24 stars)
  6. kannon: Kannon is a wrapper for the gokart library that allows gokart tasks to be easily executed in a distributed and parallel manner on multiple kubernetes jobs. (Python, 22 stars)
  7. m3-terraform-modules: Terraform boilerplate modules (HCL, 13 stars)
  8. scalaflavor4j: Scala flavored useful API in Java (Java, 11 stars)
  9. global-session-filter: Sessions across servers enabled by just applying a ServletFilter (Java, 10 stars)
  10. techbook-templete: m3 techbook templete (TeX, 9 stars)
  11. oss-mokumoku (9 stars)
  12. graphql-apollo-sample (JavaScript, 8 stars)
  13. active_record-postgresql_analyzer: Analyze the execution plan and write log when finding sequential scan. (Ruby, 8 stars)
  14. m3dev.github.com: GitHub Pages by M3, Inc. (7 stars)
  15. cookiecutter-gokart: cookiecutter for gokart (Python, 7 stars)
  16. broom: broom is a Kubernetes Custom Controller designed to gracefully handle OOM events in CronJobs by dynamically increasing memory limits (Go, 7 stars)
  17. play2-sentry: sentry logging library for play2 (Scala, 4 stars)
  18. multilane: Multi-lane Expressway to Aggregate Values in Parallel (Java, 4 stars)
  19. m3-tracing: Wrapper of Distributed Tracing SDK (Kotlin, 4 stars)
  20. method-cache-interceptor: Method Result Cache Interceptor for Java Applications (3 stars)
  21. openid4java-memcached-association-store: OpenID4Java MemcachedAssociationStore implementation (Java, 3 stars)
  22. graphql-spring-boot-kotlin-sample (Kotlin, 3 stars)
  23. m3commons-s2: DEPRECATED: Our memories with Seasar2, M3 Advent Calendar 2015 Day 2 (Java, 3 stars)
  24. promisedcache: A caching library with sff4s.Future (Scala, 2 stars)
  25. Zipkin-Interrogator: Helps you find critical bottlenecks in a sea of Zipkin traces and spans (Scala, 2 stars)
  26. memcached-client-facade: Pluggable Wrapper API for Memcached Clients (Java, 2 stars)
  27. fluent-plugin-http-list: A plugin to accept a JSON list of events via HTTP Post. (Ruby, 2 stars)
  28. octoparts-site: Octoparts site (HTML, 2 stars)
  29. spring-boot-mybatis-multiple-datasource (Kotlin, 1 star)
  30. gitlab-access-controlled-server: This is a web server for Gitlab pages or Gitlab review apps. (Python, 1 star)
  31. go-bullet (1 star)
  32. redmine_better_mail (HTML, 1 star)