CERN Database Group (@cerndb)

Top repositories

1

dist-keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Python
624
star
2

spark-dashboard

Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an Apache Spark Performance Dashboard using container technology.
Dockerfile
87
star
3

SparkPlugins

Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are initialized. This also allows extending the Spark metrics system with user-provided monitoring probes.
Scala
77
star
4

hdfs-metadata

Tool for gathering block and replica metadata from HDFS. It also builds a heat map showing how replicas are distributed across disks and nodes.
Java
56
star
5

SparkDLTrigger

Code and links to the data for the article "Machine Learning Pipelines with Modern Big Data Tools for High Energy Physics"
Jupyter Notebook
29
star
6

Hadoop-Profiler

Hadoop Profiler, or hprofiler, is a tool for analyzing on- and off-CPU workloads in distributed computing environments.
Shell
24
star
7

grafana-mimir-cardinality-dashboards

Grafana Mimir dashboards used for cardinality exploration
17
star
8

sparkMeasure

This is a mirror of https://github.com/LucaCanali/sparkMeasure - sparkMeasure is a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics.
Scala
14
star
9

cern-sso-python

Python re-implementation of the cern-get-sso-cookie functionality
Python
11
star
10

SparkTraining

Material for the course "Introduction to Apache Spark APIs for Data Processing" https://sparktraining.web.cern.ch/
Jupyter Notebook
9
star
11

flume-ng-audit-db

Apache Flume JDBC source, an interceptor that drops duplicated events, a utility to infer an Avro schema from a table, and much more!
Java
9
star
12

linux-firewall-tool

Linux iptables automation tool. It manages the firewall on CERN's DB servers.
Python
8
star
13

tf-spawner

TF-Spawner is an experimental tool for running TensorFlow distributed training on Kubernetes clusters.
Python
8
star
14

wls-cern-sso

Oracle WebLogic CERN SSO integration packages
Java
8
star
15

storage-api

Unified RESTful interface for managing CERN's data storage back-ends
Python
7
star
16

zkpolicy

ZooKeeper Policy Audit Tool (aka zkPolicy) for checking and enforcing ACLs on ZNodes.
Java
7
star
17

SparkExecutorPlugins2.4

Spark Executor Plugins Examples for Spark 2.4
Java
6
star
18

hadoop-xrootd

Mirror of CERN db/hadoop-xrootd. Hadoop-XRootD Filesystem Connector
Java
6
star
19

hadoop-metrics-http-sink

Hadoop Metrics 2 plugin to push metrics to an HTTP endpoint (e.g. Elastic, Flume).
Java
5
star
20

dbod-core

DB On Demand management infrastructure core library
Perl
5
star
21

dbod-web

Future DB On Demand Web Interface implementation
TypeScript
5
star
22

dbod-api

DB On Demand API
Python
4
star
23

CERN-Hadoop-tutorials-ML-with-Apache-Spark

Tutorial materials for Analytics with Apache Spark and MLlib at CERN. https://indico.cern.ch/event/546003/
TeX
4
star
24

hloader

Python
3
star
25

dbod-infra

Perl
3
star
26

dbod-webapp

Java
3
star
27

netapp-api-python

A re-implementation of (parts of) NetApp's ZAPI in idiomatic Python using Requests
Python
3
star
28

wls-cli

WebLogic CLI tool
Python
2
star
29

elastalert

Python
2
star
30

nile-webapp

Nile Service Web Interface
TypeScript
2
star
31

tomcat-sso-integration-components

Set of valve classes that help CERN applications integrate with CERN Authentication
Java
1
star
32

oracle-weblogic-1221-domain-ords-autodeploy

Oracle WebLogic Docker image with ORDS deployed
PLSQL
1
star
33

coding-standards

Miscellaneous coding standards or support files to use in IT-DB projects
Shell
1
star
34

hadoop-tutorials

Repository for Hadoop tutorials code and guides
1
star
35

NotebooksExamples

This repository contains Jupyter notebook examples, intended to be linked with the SWAN Gallery
Jupyter Notebook
1
star
36

cern-openlab-oracle-hackzurich-2018

CERN openlab Oracle HackZurich 2018 collaboration
1
star
37

2qpgconf2017

CERN DB On Demand 2Q PGConf 2017 slides
1
star