• Stars: 656
• Rank: 68,675 (Top 2%)
• Language: Clojure
• License: Eclipse Public Li...
• Created almost 6 years ago
• Updated 2 months ago

Repository Details

A high-performance data processing system for Clojure

tech.ml.dataset

tech.ml.dataset (TMD) is a Clojure library for tabular data processing similar to Python's Pandas, or R's data.table. It supports pragmatic data-intensive work on the JVM by providing powerful abstractions that simplify implementing efficient solutions to real problems. Datasets shrink in memory through columnar storage and the use of primitive arrays, packed datetime types, and string tables.

Unlike their counterparts in Python or R, TMD datasets are functional: operations return new datasets rather than modifying existing ones in place, which makes them easier to reason about.
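As a rough illustration of the functional style, the sketch below filters a small dataset and then checks that the original is unchanged. The sample data and the names `people` and `over-40` are made up for this example, and it assumes the library is already on the classpath.

```clojure
(require '[tech.v3.dataset :as ds])

;; Hypothetical sample data: a dataset built from a sequence of maps.
(def people
  (ds/->dataset [{:name "Ada"   :age 36}
                 {:name "Grace" :age 45}
                 {:name "Alan"  :age 41}]))

;; filter-column returns a new dataset containing only the matching rows.
(def over-40 (ds/filter-column people :age #(> % 40)))

;; The original dataset still has all of its rows.
(ds/row-count people)   ; 3
(ds/row-count over-40)  ; 2
```

Because `people` is never modified, it can be shared freely between functions or threads without defensive copying.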

Installing

Installation instructions for your favorite build system (lein, deps.edn, etc.) can be found at Clojars, where the library is hosted: https://clojars.org/techascent/tech.ml.dataset

Verifying Installation

user> (require 'tech.v3.dataset)
nil
user> (->> (System/getProperties)
           (map (fn [[k v]] {:k k :v (apply str (take 40 (str v)))}))
           (tech.v3.dataset/->>dataset {:dataset-name "My Truncated System Properties"}))

My Truncated System Properties [53 2]:

|                         :k |                                       :v |
|----------------------------|------------------------------------------|
|                sun.desktop |                                    gnome |
|                awt.toolkit |                     sun.awt.X11.XToolkit |
| java.specification.version |                                       11 |
|            sun.cpu.isalist |                                          |
|           sun.jnu.encoding |                                    UTF-8 |
|            java.class.path | src:resources:target/classes:/home/harol |
|             java.vm.vendor |                                   Ubuntu |
|        sun.arch.data.model |                                       64 |
|            java.vendor.url |                      https://ubuntu.com/ |
|              user.timezone |                           America/Denver |
|                        ... |                                      ... |
|                    os.arch |                                    amd64 |
| java.vm.specification.name |       Java Virtual Machine Specification |
|        java.awt.printerjob |                   sun.print.PSPrinterJob |
|         sun.os.patch.level |                                  unknown |
|          java.library.path | /usr/java/packages/lib:/usr/lib/x86_64-l |
|               java.vm.info |                      mixed mode, sharing |
|                java.vendor |                                   Ubuntu |
|            java.vm.version |      11.0.17+8-post-Ubuntu-1ubuntu222.04 |
|    sun.io.unicode.encoding |                            UnicodeLittle |
|        apple.awt.UIElement |                                     true |
|         java.class.version |                                     55.0 |

📚 Documentation 📚

The best place to start is the "Getting Started" topic in the documentation: https://techascent.github.io/tech.ml.dataset/000-getting-started.html

The "Walkthrough" topic provides long-form examples of processing real data: https://techascent.github.io/tech.ml.dataset/100-walkthrough.html

The "Quick Reference" topic summarizes many of the most frequently used functions: https://techascent.github.io/tech.ml.dataset/200-quick-reference.html

The API docs document every available function: https://techascent.github.io/tech.ml.dataset/

The provided Java API (javadoc / with frames) and sample program (source) show how to use TMD from Java.

Questions / Community


Related Projects and Notes

License

Copyright © 2023 Compliments of TechAscent, LLC

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.

More Repositories

| #  | Repository                        | Description                                                                   | Language | Stars |
|----|-----------------------------------|-------------------------------------------------------------------------------|----------|-------|
| 1  | tech.ml                           | This library has been superseded by https://github.com/scicloj/scicloj.ml.    | Clojure  | 96    |
| 2  | tvm-clj                           | Clojure TVM bindings and exploration                                          | Clojure  | 95    |
| 3  | tech.datatype                     | Efficient numerics for the JVM                                                | Clojure  | 84    |
| 4  | tmducken                          | tech.ml.dataset integration with DuckDB                                       | Clojure  | 54    |
| 5  | tech.jna                          | Java Native Access bindings                                                   | Clojure  | 40    |
| 6  | tech.resource                     | RAII resource management system                                               | Clojure  | 31    |
| 7  | tech.viz                          | A Clojure library for visualizing data.                                       | Clojure  | 31    |
| 8  | p2p-chat                          | A Peer-to-Peer chat using re-frame, reagent and libp2p                        | Clojure  | 30    |
| 9  | tech.opencv                       | OpenCV bindings via the tech.datatype library and javacpp                     | Clojure  | 19    |
| 10 | tech.ml.dataset.sql               | SQL bindings for tech.ml.dataset                                              | Clojure  | 19    |
| 11 | tech.compute                      | Compute abstraction for Clojure                                               | Clojure  | 14    |
| 12 | tech.io                           | Generalized IO interface that uses URLs and makes doing rapid research easier | Clojure  | 14    |
| 13 | tech.parquet                      | Simple Parquet bindings for tech.ml.dataset.                                  | Clojure  | 10    |
| 14 | tech.queue                        | Simple queuing abstraction                                                    | Clojure  | 7     |
| 15 | tech.parallel                     |                                                                               | Clojure  | 5     |
| 16 | tech.javacpp-datatype             | javacpp bindings for the datatype library                                     | Clojure  | 3     |
| 17 | tech.python                       | Python support for the techascent ecosystem                                   | Clojure  | 3     |
| 18 | tech.io.aws                       | Bindings to S3 for the tech.io system.                                        | Java     | 2     |
| 19 | tech.mxnet                        | MXNet integration of the tech compute ecosystem                               | Clojure  | 2     |
| 20 | tech.neanderthal                  | Neanderthal bindings for the techascent ecosystem.                            | Clojure  | 2     |
| 21 | tech.netcdf                       | NetCDF bindings into the techascent ecosystem                                 | Clojure  | 2     |
| 22 | tech.hello                        | A "Hello, World!" webapp from the TechAscent ecosystem.                       | Shell    | 1     |
| 23 | tech.lentils                      | Support for Intel DAAL for the techascent ecosystem                           | Clojure  | 1     |
| 24 | java-data-science-getting-started | Java Data Science: Getting Started                                            | Java     | 1     |
| 25 | tech.config                       |                                                                               | Clojure  | 1     |