• Stars
    26
  • Rank 930,752 (Top 19 %)
  • Language
    Jupyter Notebook
  • License
    Apache License 2.0
  • Created almost 9 years ago
  • Updated over 6 years ago

Repository Details

A simple introduction to using Spark ML pipelines

More Repositories

1

spark-testing-base

Base classes to use when writing tests with Spark
Scala
1,513
star
2

learning-spark-examples

Examples for learning Spark
Java
333
star
3

elasticsearchspark

Elasticsearch on Spark
Scala
112
star
4

spark-validator

A library you can include in your Spark job to validate the counters and perform operations on success. The goal is Scala/Java/Python support.
Scala
106
star
5

spark-structured-streaming-ml

Structured Streaming Machine Learning example with Spark 2.0
Scala
92
star
6

sparkProjectTemplate.g8

Template for Spark Projects
Scala
88
star
7

spark-flowchart

Flowchart for debugging Spark applications
Shell
83
star
8

fastdataprocessingwithsparkexamples

Examples for Fast Data Processing with Spark
Scala
59
star
9

spark-upgrade

Magic to help Spark pipelines upgrade
Python
33
star
10

chef-cookbook-spark

A Chef cookbook for deploying Spark
Ruby
30
star
11

fastdataprocessingwithspark-sharkexamples

Example Shark project for Fast Data Processing with Spark
Scala
22
star
12

holdensmagicalunicorn

Perl
18
star
13

diversity-analytics

Analytics on Apache Projects for Diversity
Jupyter Notebook
18
star
14

intro-to-pyspark-demos

Examples from Holden's intro to PySpark workshop. This is an intro level workshop focused on using Spark with Python.
14
star
15

clothes-from-code

Auto-generate cool code-based clothing [WIP]
Python
12
star
16

remote-python-debugging-4-spark

Set up PDB on Spark
Jupyter Notebook
10
star
17

livestreaming-tools

Basic tools for livestreaming, tailored very much to Holden's use case.
Python
7
star
18

distributedcomputing4kids

Jupyter Notebook
6
star
19

kafka-streams-python-cthulhu

Proof-of-concept integration of Python into Kafka Streams. Built with Scala.
Python
5
star
20

stalin-hax

hax on top of stalin
C
4
star
21

spark-misc-utils

Misc Utils for Spark
Scala
4
star
22

wanderinghobos

Scheme
4
star
23

resume

LaTeX resume
TeX
4
star
24

spark-ml-example

Some examples using Spark's machine learning library.
Scala
3
star
25

web2.0collage

Scheme
3
star
26

github-rename-all-my-commits

Uses git filter-repo to rename all of your commits in all of your repos. Intended for removing deadnames; it will be funky with any forks you want to merge, though.
Shell
3
star
27

print-the-world

I (attempt to) print everything* from places
Python
2
star
28

beam-test-examples

[WIP] Examples for testing Apache Beam
Java
2
star
29

fnurbot

Scala
2
star
30

sparklingpinkpandas

Website for Sparkling Pink Pandas (queer, trans focused scooter club)
JavaScript
2
star
31

mydotfiles

My dotfiles. You probably don't care about this.
Shell
2
star
32

dnsrbl

A simple Haskell interface to asynchronously look up an IP/name against a bunch of DNS-based RBLs
2
star
33

colo-scripts

Shell
2
star
34

Costume-Code

Code for the Alice in Wonderland Costume
Java
1
star
35

commerce

Rails-based e-commerce platform
Ruby
1
star
36

talk-info

Info about my talks
1
star
37

datasciencecoursera

1
star
38

not-so-deep-spark

A not so deep version of deep-spark
Jupyter Notebook
1
star