• Stars
    star
    869
  • Rank 50,336 (Top 2 %)
  • Language
    Scala
  • License
    Apache License 2.0
  • Created about 9 years ago
  • Updated 3 months ago

Repository Details

Expressive types for Spark.

Frameless

Frameless is a Scala library for working with Spark using more expressive types. It consists of the following modules:

  • frameless-dataset for a more strongly typed Dataset/DataFrame API
  • frameless-ml for a more strongly typed Spark ML API based on frameless-dataset
  • frameless-cats for using Spark's RDD API with cats

Note that Frameless is still getting off the ground, so breaking changes are likely for at least the next few versions.

The Frameless project and contributors support the Typelevel Code of Conduct and want all its associated channels (e.g. GitHub, Discord) to be a safe and friendly environment for contributing and learning.

Versions and dependencies

The compatible versions of Spark and cats are as follows:

Frameless  Spark                  Cats      Cats-Effect  Scala
0.14.1     3.4.0 / 3.3.0 / 3.2.2  2.x       3.x          2.12 / 2.13
0.14.0     3.3.0 / 3.2.2 / 3.1.3  2.x       3.x          2.12 / 2.13
0.13.0     3.3.0 / 3.2.2 / 3.1.3  2.x       3.x          2.12 / 2.13
0.12.0     3.2.1 / 3.1.3 / 3.0.3  2.x       3.x          2.12 / 2.13
0.11.1     3.2.0 / 3.1.2 / 3.0.1  2.x       2.x          2.12 / 2.13
0.11.0*    3.2.0 / 3.1.2 / 3.0.1  2.x       2.x          2.12 / 2.13
0.10.1     3.1.0                  2.x       2.x          2.12
0.9.0      3.0.0                  1.x       1.x          2.12
0.8.0      2.4.0                  1.x       1.x          2.11 / 2.12
0.7.0      2.3.1                  1.x       1.x          2.11
0.6.1      2.3.0                  1.x       0.8          2.11
0.5.2      2.2.1                  1.x       0.8          2.11
0.4.1      2.2.0                  1.x       0.8          2.11
0.4.0      2.2.0                  1.0.0-IF  0.4          2.11

* The artifacts published for 0.11.0 against Spark 3.1.2 and 3.0.1 are broken.

Starting with 0.11, Frameless artifacts are cross-published for several Spark versions:

  • By default, frameless artifacts depend on the most recent Spark version
  • Artifacts released for previous Spark version(s) carry the suffix -spark{major}{minor}

Examples of artifact names:

  • frameless-dataset (the latest Spark dependency)
  • frameless-dataset-spark33 (Spark 3.3.x dependency)
  • frameless-dataset-spark32 (Spark 3.2.x dependency)
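As a sketch of how the naming scheme above might be used, the following build.sbt fragment depends on the Spark 3.3.x build of frameless-dataset. The version 0.14.1 comes from the compatibility table; the choice of Spark 3.3.x is only an assumption for illustration.

```scala
// build.sbt (illustrative fragment, not a complete build)
val framelessVersion = "0.14.1" // from the compatibility table; adjust as needed

// The -spark33 suffix selects the artifact built against Spark 3.3.x.
// Dropping the suffix would select the build for the most recent supported Spark.
libraryDependencies += "org.typelevel" %% "frameless-dataset-spark33" % framelessVersion
```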

Versions 0.5.x and 0.6.x have identical features. The first is compatible with Spark 2.2.1 and the second with 2.3.0.

The only dependency of the frameless-dataset module is shapeless 2.3.2, so depending on frameless-dataset adds minimal overhead to your Spark application's jar. Only the frameless-cats module depends on cats and cats-effect, so if you prefer to work just with Datasets and not with RDDs, you may choose not to depend on frameless-cats.

Frameless intentionally does not have a compile dependency on Spark. This essentially allows you to use any version of Frameless with any version of Spark. The table above simply lists the Spark versions we officially compile and test Frameless against; other versions may work as well.
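Since Frameless does not bring in Spark as a compile dependency, an application would typically declare its own Spark dependency alongside it. A minimal build.sbt sketch, assuming Frameless 0.14.1 with Spark 3.4.0 (a combination listed in the table) and the spark-sql module:

```scala
// build.sbt (illustrative fragment)
val framelessVersion = "0.14.1" // assumption: taken from the compatibility table
val sparkVersion     = "3.4.0"  // assumption: the Spark version you intend to run against

libraryDependencies ++= Seq(
  // Unsuffixed artifact: built against the most recent supported Spark
  "org.typelevel"    %% "frameless-dataset" % framelessVersion,
  // Spark is declared by the application itself; cluster deployments
  // often mark this dependency as Provided instead
  "org.apache.spark" %% "spark-sql"         % sparkVersion
)
```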

Breaking changes in 0.9

  • Spark 3 introduces a new ExpressionEncoder approach; the schema for single-value DataFrames now uses the column name "value" instead of "_1".

Why?

Frameless introduces a new Spark API, called TypedDataset. The benefits of using TypedDataset compared to the standard Spark Dataset API are as follows:

  • Typesafe column referencing (e.g., no more runtime errors when accessing non-existing columns)
  • Customizable, typesafe encoders (e.g., if a type does not have an encoder, it should not compile)
  • Enhanced type signature for built-in functions (e.g., if you apply an arithmetic operation on a non-numeric column, you get a compilation error)
  • Typesafe casting and projections
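The points above can be illustrated with a small sketch. It assumes frameless-dataset and Spark are on the classpath and uses a local SparkSession; the Apartment case class, the object name, and the helper names are hypothetical and exist only for this illustration. On Scala 2.13 the symbol-literal syntax used for columns may produce deprecation warnings.

```scala
import frameless.TypedDataset
import frameless.syntax._
import org.apache.spark.sql.SparkSession

// Hypothetical record type, used only for this illustration
final case class Apartment(city: String, surface: Int, price: Double)

object TypedDatasetSketch {
  // Column references are checked against the fields of Apartment at
  // compile time: ds('citi) would be rejected by the compiler.
  def cities(ds: TypedDataset[Apartment]): TypedDataset[String] =
    ds.select(ds('city))

  // Arithmetic is only available on numeric columns; the Int column is
  // cast explicitly so that both operands are Double.
  def pricePerUnit(ds: TypedDataset[Apartment]): TypedDataset[Double] =
    ds.select(ds('price) / ds('surface).cast[Double])

  def main(args: Array[String]): Unit = {
    // TypedDataset.create needs an implicit SparkSession in scope
    implicit val spark: SparkSession = SparkSession.builder()
      .master("local[*]")
      .appName("frameless-sketch")
      .getOrCreate()

    val apartments = TypedDataset.create(Seq(
      Apartment("Paris", 50, 300000.0),
      Apartment("Lyon", 80, 200000.0)
    ))

    // Actions return a lazy job; run() executes it
    println(cities(apartments).collect().run())
    println(pricePerUnit(apartments).collect().run())

    spark.stop()
  }
}
```

The helpers take the dataset as a parameter so that each typed operation can be exercised on its own; in an application they would more likely be written inline.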

A detailed comparison of TypedDataset with Spark's Dataset API is available in the Frameless documentation.

Documentation

Quick Start

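Since the 0.9.x release, Frameless no longer targets Scala 2.11. Releases 0.9.x and 0.10.x are compiled only against Scala 2.12.x; as the version table above shows, 0.11.0 and later are also compiled against Scala 2.13.

To use Frameless in your project, add the following to your build.sbt file as needed: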

val framelessVersion = "<latest version>"

resolvers ++= Seq(
  // for snapshot artifacts only
  "s01-oss-sonatype" at "https://s01.oss.sonatype.org/content/repositories/snapshots"
)

libraryDependencies ++= List(
  "org.typelevel" %% "frameless-dataset" % framelessVersion,
  "org.typelevel" %% "frameless-ml"      % framelessVersion,
  "org.typelevel" %% "frameless-cats"    % framelessVersion
)

An easy way to bootstrap a Frameless sbt project:

  • if you have Giter8 installed then simply:
g8 imarios/frameless.g8
  • with sbt >= 0.13.13:
sbt new imarios/frameless.g8

Typing sbt console inside your project will bring up a shell with Frameless and all its dependencies loaded (including Spark).

Need help?

Feel free to message us on our Discord channel with any issues or questions.

Development

We require at least one sign-off (thumbs-up, +1, or similar) to merge pull requests. The current maintainers (people who can merge pull requests) are:

Testing

Frameless contains several property tests. To avoid OutOfMemoryErrors, we tune the default generator sizes. The following environment variables may be set to adjust the size of generated collections in the TypedDataset suite:

Environment variable      Default
FRAMELESS_GEN_MIN_SIZE    0
FRAMELESS_GEN_SIZE_RANGE  20
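How the test suite reads these variables is not shown here. The following is a minimal sketch of one way such settings could be resolved, with the defaults from the table above. The object and method names are hypothetical, and falling back to the default on an unparsable value is an assumption.

```scala
import scala.util.Try

// Hypothetical helper, not part of Frameless
object GenSizeSettings {
  // Defaults as listed in the table above
  val DefaultMinSize: Int   = 0
  val DefaultSizeRange: Int = 20

  // Reads an integer setting, falling back to the default when the
  // variable is unset or is not a valid integer (assumed behaviour)
  def intSetting(env: Map[String, String], name: String, default: Int): Int =
    env.get(name).flatMap(v => Try(v.trim.toInt).toOption).getOrElse(default)

  def minSize(env: Map[String, String] = sys.env): Int =
    intSetting(env, "FRAMELESS_GEN_MIN_SIZE", DefaultMinSize)

  def sizeRange(env: Map[String, String] = sys.env): Int =
    intSetting(env, "FRAMELESS_GEN_SIZE_RANGE", DefaultSizeRange)
}
```

Passing the environment as a Map keeps the lookup easy to test; in practice the variables would be exported in the shell before the tests are started.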

License

Code is provided under the Apache 2.0 license available at http://opensource.org/licenses/Apache-2.0, as well as in the LICENSE file. This is the same license that Spark uses.

More Repositories

1

cats

Lightweight, modular, and extensible library for functional programming.
Scala
5,120
star
2

fs2

Compositional, streaming I/O library for Scala
Scala
2,319
star
3

scalacheck

Property-based testing for Scala
Scala
1,908
star
4

cats-effect

The pure asynchronous runtime for Scala
Scala
1,817
star
5

spire

Powerful new number types and numeric abstractions for Scala.
Scala
1,753
star
6

skunk

A data access library for Scala + Postgres.
Scala
1,545
star
7

simulacrum

First class syntax support for type classes in Scala
Scala
936
star
8

squants

The Scala API for Quantities, Units of Measure and Dimensional Analysis
Scala
910
star
9

kind-projector

Compiler plugin for making type lambdas (type projections) easier to write
Scala
906
star
10

cats-collections

Data structures for pure functional programming in Scala
Scala
557
star
11

kittens

Automatic type class derivation for Cats
Scala
522
star
12

jawn

Jawn is for parsing jay-sawn (JSON)
Scala
431
star
13

log4cats

Logging Tools For Interaction with cats-effect
Scala
390
star
14

Laika

Site and E-book Generator and Customizable Text Markup Transformer for sbt, Scala and Scala.js
Scala
387
star
15

algebra

Experimental project to lay out basic algebra type classes
Scala
379
star
16

mouse

A small companion to cats
Scala
347
star
17

sbt-tpolecat

scalac options for the enlightened
Scala
328
star
18

discipline

Flexible law checking for Scala
Scala
322
star
19

natchez

functional tracing for cats
Scala
317
star
20

cats-mtl

cats transformer type classes.
Scala
304
star
21

cats-tagless

Library of utilities for tagless final encoded algebras
Scala
301
star
22

CT_from_Programmers.scala

Scala sample code for Bartosz Milewski's CT for Programmers
Scala
279
star
23

fs2-grpc

gRPC implementation for FS2/cats-effect
Scala
258
star
24

cats-parse

A parsing library for the cats ecosystem
Scala
224
star
25

machinist

Spire's macros for zero-cost operator enrichment
Scala
191
star
26

cats-effect-testing

Integration between cats-effect and test frameworks
Scala
184
star
27

paiges

an implementation of Wadler's a prettier printer
Scala
183
star
28

shapeless-3

Generic programming for Scala
Scala
168
star
29

grackle

Grackle: Functional GraphQL for the Typelevel stack
Scala
163
star
30

sbt-typelevel

Let sbt work for you.
Scala
151
star
31

feral

Feral cats are homeless, feral functions are serverless
Scala
144
star
32

munit-cats-effect

Integration library for MUnit & cats-effect
Scala
142
star
33

catbird

Birds and cats together
Scala
140
star
34

otel4s

An OpenTelemetry library for Scala based on Cats-Effect
Scala
138
star
35

fs2-chat

Sample project demonstrating use of fs2-io to build a chat client and server
Scala
123
star
36

spotted-leopards

Proof of concept for a cats-like library built using Dotty features
Scala
112
star
37

fabric

Object-Notation Abstraction for JSON, binary, HOCON, etc.
Scala
110
star
38

literally

Compile time validation of literal values built from strings
Scala
102
star
39

toolkit

Quickstart your next app with the Typelevel Toolkit!
Scala
92
star
40

cats-time

Cats Instances for Java Time
Scala
91
star
41

typelevel-nix

Development tools for Typelevel projects
Nix
87
star
42

vault

Type-safe, persistent storage for values of arbitrary types
Scala
81
star
43

shapeless-contrib

Interoperability libraries for Shapeless
Scala
79
star
44

cats-effect-cps

An incubator project for async/await syntax support for Cats Effect
Scala
78
star
45

scalacheck-effect

Effectful property testing built on ScalaCheck
Scala
76
star
46

coop

Cooperative multithreading as a pure monad transformer
Scala
68
star
47

claimant

Library to support automatic labeling of ScalaCheck properties.
Scala
68
star
48

typeclassic

Everything you need to make type classes first class.
Scala
61
star
49

scalaz-contrib

Interoperability libraries & additional data structures and instances for Scalaz
Scala
55
star
50

twiddles

Micro-library for building effectful protocols
Scala
55
star
51

monoids

Generic Monoids for Scala
Scala
51
star
52

fs2-netty

What it says on the tin!
Scala
47
star
53

sbt-catalysts

sbt utilities for open source projects
Scala
45
star
54

natchez-http4s

Glorious integration layer for Natchez and Http4s.
Scala
44
star
55

typelevel.github.com

Web site of typelevel.scala
HTML
38
star
56

jawn-fs2

Integration between jawn and fs2
Scala
36
star
57

keypool

A Keyed Pool Implementation for Scala
Scala
34
star
58

scalaz-specs2

Specs2 bindings for Scalaz
Scala
34
star
59

catalysts

Scala
34
star
60

case-insensitive

A case-insensitive string for Scala
Scala
34
star
61

simulacrum-scalafix

Simulacrum as Scalafix rules
Scala
33
star
62

scalaz-outlaws

outcasts no longer allowed in the ivory tower
Scala
28
star
63

scalac-options

A library for configuring scalac options
Scala
27
star
64

bobcats

Typelevel's very own CryptoKitties!
Scala
27
star
65

ce3.g8

Scala
24
star
66

scalaz-scalatest

Scalatest bindings for scalaz.
Scala
23
star
67

general

Repository for general Typelevel information, activity and issues
19
star
68

discipline-munit

MUnit binding for Typelevel Discipline
Scala
18
star
69

unique

Unique Functional Values for Scala
Scala
18
star
70

cats-testkit-scalatest

Cats Testkit for Scalatest
Scala
18
star
71

discipline-scalatest

ScalaTest binding for Discipline
Scala
17
star
72

typelevel-scalafix

Scalafix rules for Typelevel projects
Scala
17
star
73

semigroups

Scala
16
star
74

cats-effect-shell

Command line debugging console for Cats Effect
Scala
15
star
75

jdk-index

A Jabba compatible index of JDK versions
Scala
14
star
76

cats-uri

URI implementation based on cats-parse with cats instances
Scala
14
star
77

typelevel.g8

A typelevel.g8 based on sbt-typelevel
Scala
14
star
78

catapult

Scala
13
star
79

weaver-test

A test framework that runs everything in parallel.
Scala
11
star
80

discipline-specs2

Specs2 Integration for Discipline
Scala
8
star
81

governance

Typelevel governance
Scala
7
star
82

catz-cradle

Testbed for scala libraries and tools, based on examples from cats docs
Scala
7
star
83

spire-contrib

Interoperability libraries for spire
Shell
7
star
84

idna4s

Cross-platform Scala implementation of Internationalized Domain Names in Applications
Scala
6
star
85

scalac-compat

Lightweight tools for tackling Scalac version incompatibilities
Scala
6
star
86

steward

Runs Scala Steward for Typelevel projects
5
star
87

cats-effect-main

3
star
88

sacagawea

Common infrastructure for tracing functional effects
Scala
3
star
89

scalacheck-xml

Scalacheck instances for scala-xml
Scala
3
star
90

sorcery

WIP
2
star
91

scalacheck-web

ScalaCheck Web Site
Nix
2
star
92

sbt-catalysts.g8

Scala
2
star
93

feral.g8

Giter8 template for feral serverless
Scala
2
star
94

download-java

2
star
95

toolkit.g8

A Giter8 template for Typelevel Toolkit!
Scala
2
star
96

sbt-tls-crossproject

sbt-crossproject plugin for Typelevel Scala
Scala
1
star
97

catalysts-docker

Shell
1
star
98

await-cirrus

Depend on Cirrus CI from a GitHub Actions workflow
JavaScript
1
star
99

.github

a ✨special ✨ repository for project defaults and organization readme
1
star