• Stars
    star
    12
  • Rank 1,597,372 (Top 32 %)
  • Language
    Jupyter Notebook
  • License
    Creative Commons ...
  • Created about 2 years ago
  • Updated 6 months ago

Repository Details

Material for the course "Introduction to Apache Spark APIs for Data Processing" https://sparktraining.web.cern.ch/

More Repositories

1

dist-keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Python
624
star
2

spark-dashboard

Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an Apache Spark Performance Dashboard using containers technology.
Dockerfile
112
star
3

SparkPlugins

Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spark metrics system with user-provided monitoring probes.
Scala
83
star
4

hdfs-metadata

Tool for gathering block and replica metadata from HDFS. It also builds a heat map showing how replicas are distributed across disks and nodes.
Java
56
star
5

SparkDLTrigger

Code and links to the data for the article "Machine Learning Pipelines with Modern Big Data Tools for High Energy Physics"
Jupyter Notebook
29
star
6

grafana-mimir-cardinality-dashboards

Grafana Mimir dashboards used for cardinality exploration
26
star
7

Hadoop-Profiler

Hadoop Profiler, or hprofiler, is a tool for analyzing on- and off-CPU workloads in distributed computing environments.
Shell
24
star
8

sparkMeasure

This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics.
Scala
14
star
9

cern-sso-python

Python re-implementation of the cern-get-sso-cookie functionality
Python
11
star
10

flume-ng-audit-db

Apache Flume JDBC source, drop duplicated events interceptor, utility to infer Avro schema from table and much more!
Java
9
star
11

linux-firewall-tool

Linux iptables automation tool. It manages the firewall on CERN's DB servers.
Python
8
star
12

tf-spawner

TF-Spawner is an experimental tool for running TensorFlow distributed training on Kubernetes clusters.
Python
8
star
13

wls-cern-sso

Oracle Weblogic CERN SSO integration packages
Java
8
star
14

storage-api

Unified RESTful interface for managing CERN's data storage back-ends
Python
7
star
15

zkpolicy

Zookeeper Policy Audit Tool (aka zkPolicy) for checking and enforcing ACLs on ZNodes.
Java
7
star
16

SparkExecutorPlugins2.4

Spark Executor Plugins Examples for Spark 2.4
Java
6
star
17

hadoop-xrootd

Mirror of CERN db/hadoop-xrootd. Hadoop-XRootD Filesystem Connector
Java
6
star
18

hadoop-metrics-http-sink

Hadoop Metrics 2 plugin to push metrics to an HTTP endpoint (e.g. Elastic, Flume).
Java
5
star
19

dbod-core

DB On Demand management infrastructure core library
Perl
5
star
20

dbod-web

Future DB On Demand Web Interface implementation
TypeScript
5
star
21

dbod-api

DB On Demand API
Python
4
star
22

CERN-Hadoop-tutorials-ML-with-Apache-Spark

Tutorial materials for Analytics with Apache Spark and MLlib at CERN. https://indico.cern.ch/event/546003/
TeX
4
star
23

hloader

Python
3
star
24

dbod-infra

Perl
3
star
25

dbod-webapp

Java
3
star
26

netapp-api-python

A re-implementation of (parts of) NetApp's ZAPI in idiomatic Python using Requests
Python
3
star
27

wls-cli

Weblogic CLI tool
Python
2
star
28

elastalert

Python
2
star
29

nile-webapp

Nile Service Web Interface
TypeScript
2
star
30

tomcat-sso-integration-components

Set of valve classes that help CERN applications integrate with CERN Authentication
Java
1
star
31

oracle-weblogic-1221-domain-ords-autodeploy

Oracle WebLogic docker image with ORDS deployed
PLSQL
1
star
32

coding-standards

Miscellaneous coding standards or support files to use in IT-DB projects
Shell
1
star
33

hadoop-tutorials

Repository for Hadoop tutorials code and guides
1
star
34

NotebooksExamples

This repository contains Jupyter notebook examples, intended to be linked with the SWAN Gallery
Jupyter Notebook
1
star
35

cern-openlab-oracle-hackzurich-2018

CERN openlab Oracle HackZurich 2018 collaboration
1
star
36

2qpgconf2017

CERN DB On Demand 2Q PGConf 2017 slides
1
star