• Stars
    star
    879
  • Rank 51,943 (Top 2 %)
  • Language
    Scala
  • License
    Apache License 2.0
  • Created over 9 years ago
  • Updated 20 days ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Expressive types for Spark.

Frameless

Workflow Badge Codecov Badge Discord Badge Maven Badge Snapshots Badge

Frameless is a Scala library for working with Spark using more expressive types. It consists of the following modules:

  • frameless-dataset for a more strongly typed Dataset/DataFrame API
  • frameless-ml for a more strongly typed Spark ML API based on frameless-dataset
  • frameless-cats for using Spark's RDD API with cats

Note that while Frameless is still getting off the ground, it is very possible that breaking changes will be made for at least the next few versions.

The Frameless project and contributors support the Typelevel Code of Conduct and want all its associated channels (e.g. GitHub, Discord) to be a safe and friendly environment for contributing and learning.

Versions and dependencies

The compatible versions of Spark and cats are as follows:

Frameless Spark Cats Cats-Effect Scala
0.14.1 3.4.0 / 3.3.0 / 3.2.2 2.x 3.x 2.12 / 2.13
0.14.0 3.3.0 / 3.2.2 / 3.1.3 2.x 3.x 2.12 / 2.13
0.13.0 3.3.0 / 3.2.2 / 3.1.3 2.x 3.x 2.12 / 2.13
0.12.0 3.2.1 / 3.1.3 / 3.0.3 2.x 3.x 2.12 / 2.13
0.11.1 3.2.0 / 3.1.2 / 3.0.1 2.x 2.x 2.12 / 2.13
0.11.0* 3.2.0 / 3.1.2 / 3.0.1 2.x 2.x 2.12 / 2.13
0.10.1 3.1.0 2.x 2.x 2.12
0.9.0 3.0.0 1.x 1.x 2.12
0.8.0 2.4.0 1.x 1.x 2.11 / 2.12
0.7.0 2.3.1 1.x 1.x 2.11
0.6.1 2.3.0 1.x 0.8 2.11
0.5.2 2.2.1 1.x 0.8 2.11
0.4.1 2.2.0 1.x 0.8 2.11
0.4.0 2.2.0 1.0.0-IF 0.4 2.11

* 0.11.0 has broken Spark 3.1.2 and 3.0.1 artifacts published.

Starting 0.11 we introduced Spark cross published artifacts:

  • By default, frameless artifacts depend on the most recent Spark version
  • Suffix -spark{major}{minor} is added to artifacts that are released for the previous Spark version(s)

Artifact names examples:

  • frameless-dataset (the latest Spark dependency)
  • frameless-dataset-spark33 (Spark 3.3.x dependency)
  • frameless-dataset-spark32 (Spark 3.2.x dependency)

Versions 0.5.x and 0.6.x have identical features. The first is compatible with Spark 2.2.1 and the second with 2.3.0.

The only dependency of the frameless-dataset module is on shapeless 2.3.2. Therefore, depending on frameless-dataset, has a minimal overhead on your Spark's application jar. Only the frameless-cats module depends on cats and cats-effect, so if you prefer to work just with Datasets and not with RDDs, you may choose not to depend on frameless-cats.

Frameless intentionally does not have a compile dependency on Spark. This essentially allows you to use any version of Frameless with any version of Spark. The aforementioned table simply provides the versions of Spark we officially compile and test Frameless with, but other versions may probably work as well.

Breaking changes in 0.9

  • Spark 3 introduces a new ExpressionEncoder approach, the schema for single value DataFrame's is now "value" not "_1".

Why?

Frameless introduces a new Spark API, called TypedDataset. The benefits of using TypedDataset compared to the standard Spark Dataset API are as follows:

  • Typesafe columns referencing (e.g., no more runtime errors when accessing non-existing columns)
  • Customizable, typesafe encoders (e.g., if a type does not have an encoder, it should not compile)
  • Enhanced type signature for built-in functions (e.g., if you apply an arithmetic operation on a non-numeric column, you get a compilation error)
  • Typesafe casting and projections

Click here for a detailed comparison of TypedDataset with Spark's Dataset API.

Documentation

Quick Start

Since the 0.9.x release, Frameless is compiled only against Scala 2.12.x.

To use Frameless in your project add the following in your build.sbt file as needed:

val framelessVersion = "<latest version>"

resolvers ++= Seq(
  // for snapshot artifacts only
  "s01-oss-sonatype" at "https://s01.oss.sonatype.org/content/repositories/snapshots"
)

libraryDependencies ++= List(
  "org.typelevel" %% "frameless-dataset" % framelessVersion,
  "org.typelevel" %% "frameless-ml"      % framelessVersion,
  "org.typelevel" %% "frameless-cats"    % framelessVersion
)

An easy way to bootstrap a Frameless sbt project:

  • if you have Giter8 installed then simply:
g8 imarios/frameless.g8
  • with sbt >= 0.13.13:
sbt new imarios/frameless.g8

Typing sbt console inside your project will bring up a shell with Frameless and all its dependencies loaded (including Spark).

Need help?

Feel free to messages us on our discord channel for any issues/questions.

Development

We require at least one sign-off (thumbs-up, +1, or similar) to merge pull requests. The current maintainers (people who can merge pull requests) are:

Testing

Frameless contains several property tests. To avoid OutOfMemoryErrors, we tune the default generator sizes. The following environment variables may be set to adjust the size of generated collections in the TypedDataSet suite:

Property Default
FRAMELESS_GEN_MIN_SIZE 0
FRAMELESS_GEN_SIZE_RANGE 20

License

Code is provided under the Apache 2.0 license available at http://opensource.org/licenses/Apache-2.0, as well as in the LICENSE file. This is the same license used as Spark.

More Repositories

1

cats

Lightweight, modular, and extensible library for functional programming.
Scala
5,182
star
2

fs2

Compositional, streaming I/O library for Scala
Scala
2,359
star
3

doobie

Functional JDBC layer for Scala.
Scala
2,161
star
4

scalacheck

Property-based testing for Scala
Scala
1,908
star
5

cats-effect

The pure asynchronous runtime for Scala
Scala
1,817
star
6

spire

Powerful new number types and numeric abstractions for Scala.
Scala
1,761
star
7

skunk

A data access library for Scala + Postgres.
Scala
1,579
star
8

simulacrum

First class syntax support for type classes in Scala
Scala
937
star
9

squants

The Scala API for Quantities, Units of Measure and Dimensional Analysis
Scala
922
star
10

kind-projector

Compiler plugin for making type lambdas (type projections) easier to write
Scala
915
star
11

cats-collections

Data structures for pure functional programming in Scala
Scala
557
star
12

kittens

Automatic type class derivation for Cats
Scala
531
star
13

jawn

Jawn is for parsing jay-sawn (JSON)
Scala
431
star
14

log4cats

Logging Tools For Interaction with cats-effect
Scala
400
star
15

Laika

Site and E-book Generator and Customizable Text Markup Transformer for sbt, Scala and Scala.js
Scala
387
star
16

algebra

Experimental project to lay out basic algebra type classes
Scala
378
star
17

mouse

A small companion to cats
Scala
365
star
18

sbt-tpolecat

scalac options for the enlightened
Scala
328
star
19

discipline

Flexible law checking for Scala
Scala
328
star
20

natchez

functional tracing for cats
Scala
324
star
21

cats-tagless

Library of utilities for tagless final encoded algebras
Scala
314
star
22

cats-mtl

cats transformer type classes.
Scala
308
star
23

CT_from_Programmers.scala

Scala sample code for Bartosz Milewski's CT for Programmers
Scala
279
star
24

fs2-grpc

gRPC implementation for FS2/cats-effect
Scala
270
star
25

cats-parse

A parsing library for the cats ecosystem
Scala
233
star
26

machinist

Spire's macros for zero-cost operator enrichment
Scala
191
star
27

cats-effect-testing

Integration between cats-effect and test frameworks
Scala
191
star
28

shapeless-3

Generic programming for Scala
Scala
185
star
29

paiges

an implementation of Wadler's a prettier printer
Scala
183
star
30

grackle

Grackle: Functional GraphQL for the Typelevel stack
Scala
176
star
31

sbt-typelevel

Let sbt work for you.
Scala
170
star
32

munit-cats-effect

Integration library for MUnit & cats-effect
Scala
149
star
33

feral

Feral cats are homeless, feral functions are serverless
Scala
144
star
34

catbird

Birds and cats together
Scala
139
star
35

otel4s

An OpenTelemetry library for Scala based on Cats-Effect
Scala
138
star
36

fs2-chat

Sample project demonstrating use of fs2-io to build a chat client and server
Scala
123
star
37

spotted-leopards

Proof of concept for a cats-like library built using Dotty features
Scala
112
star
38

fabric

Object-Notation Abstraction for JSON, binary, HOCON, etc.
Scala
110
star
39

literally

Compile time validation of literal values built from strings
Scala
106
star
40

toolkit

Quickstart your next app with the Typelevel Toolkit!
Scala
94
star
41

cats-time

Cats Instances for Java Time
Scala
91
star
42

typelevel-nix

Development tools for Typelevel projects
Nix
87
star
43

cats-effect-cps

An incubator project for async/await syntax support for Cats Effect
Scala
81
star
44

vault

Type-safe, persistent storage for values of arbitrary types
Scala
81
star
45

shapeless-contrib

Interoperability libraries for Shapeless
Scala
79
star
46

scalacheck-effect

Effectful property testing built on ScalaCheck
Scala
76
star
47

coop

Cooperative multithreading as a pure monad transformer
Scala
68
star
48

claimant

Library to support automatic labeling of ScalaCheck properties.
Scala
68
star
49

typeclassic

Everything you need to make type classes first class.
Scala
61
star
50

scalaz-contrib

Interoperability libraries & additional data structures and instances for Scalaz
Scala
55
star
51

twiddles

Micro-library for building effectful protocols
Scala
55
star
52

monoids

Generic Monoids for Scala
Scala
51
star
53

fs2-netty

What it says on the tin!
Scala
47
star
54

sbt-catalysts

sbt utilities for open source projects
Scala
45
star
55

natchez-http4s

Glorious integration layer for Natchez and Http4s.
Scala
44
star
56

typelevel.github.com

Web site of typelevel.scala
HTML
40
star
57

jawn-fs2

Integration between jawn and fs2
Scala
38
star
58

keypool

A Keyed Pool Implementation for Scala
Scala
34
star
59

scalaz-specs2

Specs2 bindings for Scalaz
Scala
34
star
60

catalysts

Scala
34
star
61

simulacrum-scalafix

Simulacrum as Scalafix rules
Scala
34
star
62

case-insensitive

A case-insensitive string for Scala
Scala
34
star
63

scalaz-outlaws

outcasts no longer allowed in the ivory tower
Scala
28
star
64

bobcats

Typelevel's very own CryptoKitties!
Scala
28
star
65

scalac-options

A library for configuring scalac options
Scala
27
star
66

weaver-test

A test framework that runs everything in parallel.
Scala
27
star
67

ce3.g8

Scala
24
star
68

scalaz-scalatest

Scalatest bindings for scalaz.
Scala
23
star
69

general

Repository for general Typelevel information, activity and issues
19
star
70

discipline-munit

MUnit binding for Typelevel Discipline
Scala
18
star
71

cats-testkit-scalatest

Cats Testkit for Scalatest
Scala
18
star
72

unique

Unique Functional Values for Scala
Scala
17
star
73

discipline-scalatest

ScalaTest binding for Discipline
Scala
17
star
74

typelevel-scalafix

Scalafix rules for Typelevel projects
Scala
17
star
75

semigroups

Scala
16
star
76

cats-effect-shell

Command line debugging console for Cats Effect
Scala
15
star
77

jdk-index

A Jabba compatible index of JDK versions
Scala
14
star
78

cats-uri

URI implementation based on cats-parse with cats instances
Scala
14
star
79

typelevel.g8

A typelevel.g8 based on sbt-typelevel
Scala
14
star
80

catapult

Scala
13
star
81

discipline-specs2

Specs2 Integration for Discipline
Scala
9
star
82

governance

Typelevel governance
Scala
7
star
83

catz-cradle

Testbed for scala libraries and tools, based on examples from cats docs
Scala
7
star
84

spire-contrib

Interoperability libraries for spire
Shell
7
star
85

idna4s

Cross-platform Scala implementation of Internationalized Domain Names in Applications
Scala
7
star
86

scalac-compat

Lightweight tools for tackling Scalac version incompatibilities
Scala
6
star
87

steward

Runs Scala Steward for Typelevel projects
5
star
88

cats-effect-main

3
star
89

sacagawea

Common infrastructure for tracing functional effects
Scala
3
star
90

scalacheck-xml

Scalacheck instances for scala-xml
Scala
3
star
91

sorcery

WIP
2
star
92

scalacheck-web

ScalaCheck Web Site
Nix
2
star
93

sbt-catalysts.g8

Scala
2
star
94

feral.g8

Giter8 template for feral serverless
Scala
2
star
95

download-java

2
star
96

toolkit.g8

A Giter8 template for Typelevel Toolkit!
Scala
2
star
97

sbt-tls-crossproject

sbt-crossproject plugin for Typelevel Scala
Scala
1
star
98

await-cirrus

Depend on Cirrus CI from a GitHub Actions workflow
JavaScript
1
star
99

catalysts-docker

Shell
1
star
100

.github

a โœจspecial โœจ repository for project defaults and organization readme
1
star