• Stars
    1,323
  • Rank 32,424 (Top 0.7 %)
  • Language
    Java
  • License
    Apache License 2.0
  • Created almost 8 years ago
  • Updated 4 months ago

Repository Details

Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark

Dr. Elephant

Join the chat at https://gitter.im/linkedin/dr-elephant

Dr. Elephant is a performance monitoring and tuning tool for Hadoop and Spark. It automatically gathers all the metrics, runs analysis on them, and presents them in a simple way for easy consumption. Its goal is to improve developer productivity and increase cluster efficiency by making it easier to tune the jobs. It analyzes the Hadoop and Spark jobs using a set of pluggable, configurable, rule-based heuristics that provide insights on how a job performed, and then uses the results to make suggestions about how to tune the job to make it perform more efficiently.
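
The idea of pluggable, configurable, rule-based heuristics can be illustrated with a small sketch. This is not Dr. Elephant's actual API (the project itself is written in Java); the class names, metric keys, thresholds, and suggestion texts below are all hypothetical and chosen only to show how a rule can turn job metrics into a severity and a tuning suggestion.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Dict, List


class Severity(IntEnum):
    # Hypothetical severity scale for this sketch.
    NONE = 0
    MODERATE = 1
    SEVERE = 2


@dataclass(frozen=True)
class Finding:
    heuristic: str
    severity: Severity
    suggestion: str


# A heuristic is any callable mapping a job's metrics to a Finding,
# so new rules can be plugged in without changing the analyzer.
Heuristic = Callable[[Dict[str, float]], Finding]


def ratio_heuristic(name: str, numerator: str, denominator: str,
                    moderate: float, severe: float, suggestion: str) -> Heuristic:
    """Build a rule that compares numerator/denominator against two thresholds."""
    def check(metrics: Dict[str, float]) -> Finding:
        denom = metrics.get(denominator, 0.0)
        # Treat a missing or zero denominator as "nothing to report".
        ratio = metrics.get(numerator, 0.0) / denom if denom else 0.0
        if ratio >= severe:
            return Finding(name, Severity.SEVERE, suggestion)
        if ratio >= moderate:
            return Finding(name, Severity.MODERATE, suggestion)
        return Finding(name, Severity.NONE, "")
    return check


def analyze(metrics: Dict[str, float], heuristics: List[Heuristic]) -> List[Finding]:
    """Run every heuristic and return the findings, worst first."""
    findings = [h(metrics) for h in heuristics]
    return sorted(findings, key=lambda f: f.severity, reverse=True)


if __name__ == "__main__":
    # Hypothetical rules and metric names, configured through plain arguments.
    rules = [
        ratio_heuristic("disk_spill", "spilled_records", "output_records",
                        1.0, 2.0, "Consider a larger sort buffer."),
        ratio_heuristic("gc_time", "gc_time_ms", "cpu_time_ms",
                        0.1, 0.2, "Consider more task memory."),
    ]
    job = {"gc_time_ms": 300.0, "cpu_time_ms": 1000.0,
           "spilled_records": 500.0, "output_records": 1000.0}
    for finding in analyze(job, rules):
        print(finding.heuristic, finding.severity.name, finding.suggestion)
```

Because each rule is just a function built from thresholds, adding a heuristic or retuning one is a configuration change rather than a change to the analyzer loop, which is the property the description above emphasizes.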

Documentation

For more information on Dr. Elephant, check the wiki pages here.

For quick setup instructions: Click here

Developer guide: Click here

Administrator guide: Click here

User guide: Click here

Engineering Blog: Click here

Mailing List & GitHub Issues

Google Groups mailing list: Click here (Reached upper limit! Please create GitHub issues)

GitHub issues: Click here

Meetings

We have scheduled a weekly Dr. Elephant meeting for interested developers and users to discuss future plans for Dr. Elephant. Please click here for details.

How to Contribute?

Check this link.

License

Copyright 2016 LinkedIn Corp.

Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.

More Repositories

1. school-of-sre (HTML, 7,460 stars)
   At LinkedIn, we are using this curriculum for onboarding our entry-level talents into the SRE role.
2. css-blocks (TypeScript, 6,347 stars)
   High performance, maintainable stylesheets.
3. Burrow (Go, 3,572 stars)
   Kafka Consumer Lag Checking
4. databus (Java, 3,555 stars)
   Source-agnostic distributed change data capture system
5. qark (Python, 3,063 stars)
   Tool to look for several security related Android application vulnerabilities
6. dustjs (JavaScript, 2,915 stars)
   Asynchronous Javascript templating for the browser and server
7. cruise-control (Java, 2,459 stars)
   Cruise-control is the first of its kind to fully automate the dynamic workload rebalance and self-healing of a Kafka cluster. It provides great value to Kafka users by simplifying the operation of Kafka clusters.
8. rest.li (Java, 2,347 stars)
   Rest.li is a REST+JSON framework for building robust, scalable service architectures using dynamic discovery and simple asynchronous APIs.
9. kafka-monitor (Java, 1,977 stars)
   Xinfra Monitor monitors the availability of Kafka clusters by producing synthetic workloads using end-to-end pipelines to obtain derived vital statistics - E2E latency, service produce/consume availability, offsets commit availability & latency, message loss rate and more.
10. dexmaker (Java, 1,790 stars)
    A utility for doing compile or runtime code generation targeting Android's Dalvik VM
11. greykite (Python, 1,716 stars)
    A flexible, intuitive and fast forecasting library
12. ambry (Java, 1,699 stars)
    Distributed object store
13. shiv (Python, 1,639 stars)
    shiv is a command line utility for building fully self contained Python zipapps as outlined in PEP 441, but with all their dependencies included.
14. swift-style-guide (1,407 stars)
    LinkedIn's Official Swift Style Guide
15. detext (Python, 1,249 stars)
    DeText: A Deep Neural Text Understanding Framework for Ranking and Classification Tasks
16. parseq (Java, 1,154 stars)
    Asynchronous Java made easier
17. luminol (Python, 1,128 stars)
    Anomaly Detection and Correlation library
18. test-butler (Java, 1,038 stars)
    Reliable Android Testing, at your service
19. oncall (Python, 966 stars)
    Oncall is a calendar tool designed for scheduling and managing on-call shifts. It can be used as source of dynamic ownership info for paging systems like http://iris.claims.
20. PalDB (Java, 935 stars)
    An embeddable write-once key-value store written in Java
21. goavro (Go, 917 stars)
22. brooklin (Java, 869 stars)
    An extensible distributed system for reliable nearline data streaming at scale
23. photon-ml (Terra, 791 stars)
    A scalable machine learning library on Apache Spark
24. Hakawai (Objective-C, 783 stars)
    A powerful, extensible UITextView.
25. iris (Python, 759 stars)
    Iris is a highly configurable and flexible service for paging and messaging.
26. URL-Detector (Java, 755 stars)
    A Java library to detect and normalize URLs in text
27. eyeglass (TypeScript, 744 stars)
    NPM Modules for Sass
28. opticss (TypeScript, 715 stars)
    A CSS Optimizer
29. flashback (Java, 582 stars)
    mock the internet
30. kafka-tools (Python, 577 stars)
    A collection of tools for working with Apache Kafka.
31. pygradle (Java, 573 stars)
    Using Gradle to build Python projects
32. coral (Java, 568 stars)
    Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.
33. LayoutTest-iOS (Objective-C, 567 stars)
    Write unit tests which test the layout of a view in multiple configurations
34. FeatureFu (Java, 565 stars)
    Library and tools for advanced feature engineering
35. LiTr (Java, 559 stars)
    Lightweight hardware accelerated video/audio transcoder for Android.
36. FastTreeSHAP (Python, 471 stars)
    Fast SHAP value computation for interpreting tree-based models
37. venice (Java, 391 stars)
    Venice, Derived Data Platform for Planet-Scale Workloads.
38. Spyglass (Java, 379 stars)
    A library for mentions on Android
39. dagli (Java, 353 stars)
    Framework for defining machine learning models, including feature generation and transformations, as directed acyclic graphs (DAGs).
40. ml-ease (Java, 330 stars)
    ADMM based large scale logistic regression
41. cruise-control-ui (Vue, 310 stars)
    Cruise Control Frontend (CCFE): Single Page Web Application to Manage Large Scale of Kafka Clusters
42. transport (Java, 280 stars)
    A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Apache Hive, and Presto.
43. spark-tfrecord (Scala, 255 stars)
    Read and write Tensorflow TFRecord data from Apache Spark.
44. isolation-forest (Scala, 206 stars)
    A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.
45. LiFT (Scala, 166 stars)
    The LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness in large scale machine learning workflows.
46. shaky-android (Java, 155 stars)
    Shake to send feedback for Android.
47. pyexchange (Python, 152 stars)
    Python wrapper for Microsoft Exchange
48. asciietch (Python, 136 stars)
    A graphing library with the goal of making it simple to graph using ASCII characters.
49. python-avro-json-serializer (Python, 134 stars)
    Serializes data into a JSON format using AVRO schema.
50. gdmix (Python, 133 stars)
    A deep ranking personalization framework
51. dynamometer (Java, 130 stars)
    A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
52. li-apache-kafka-clients (Java, 127 stars)
    li-apache-kafka-clients is a wrapper library for the Apache Kafka vanilla clients. It provides additional features such as large message support and auditing to the Java producer and consumer in the open source Apache Kafka.
53. Avro2TF (Scala, 124 stars)
    Avro2TF is designed to fill the gap of making users' training data ready to be consumed by deep learning training frameworks.
54. linkedin-gradle-plugin-for-apache-hadoop (Groovy, 117 stars)
55. dex-test-parser (Kotlin, 102 stars)
    Find all test methods in an Android instrumentation APK
56. datahub-gma (Java, 101 stars)
    General Metadata Architecture
57. cassette (Objective-C, 95 stars)
    An efficient, file-based FIFO Queue for iOS and macOS.
58. spaniel (JavaScript, 92 stars)
    LinkedIn's JavaScript viewport tracking library and IntersectionObserver polyfill
59. sysops-api (Python, 72 stars)
    sysops-api is a framework designed to provide visibility from tens of thousands of machines in seconds.
60. migz (Java, 70 stars)
    Multithreaded, gzip-compatible compression and decompression, available as a platform-independent Java library and command-line utilities.
61. avro-util (Java, 68 stars)
    Collection of utilities to allow writing java code that operates across a wide range of avro versions.
62. Hoptimator (Java, 67 stars)
    Multi-hop declarative data pipelines
63. kube2hadoop (Java, 59 stars)
    Secure HDFS Access from Kubernetes
64. linkedin.github.com (JavaScript, 58 stars)
    Listing of all our public GitHub projects.
65. iceberg (Java, 57 stars)
    A temporary home for LinkedIn's changes to Apache Iceberg (incubating)
66. dynoyarn (Java, 56 stars)
    DynoYARN is a framework to run simulated YARN clusters and workloads for YARN scale testing.
67. Tachyon (Java, 51 stars)
    An Android library that provides a customizable calendar day view UI widget.
68. DuaLip (Scala, 47 stars)
    DuaLip: Dual Decomposition based Linear Program Solver
69. iris-relay (Python, 45 stars)
    Stateless reverse proxy for third-party service integration with Iris API.
70. concurrentli (Java, 43 stars)
    Classes for multithreading that expand on java.util.concurrent, adding convenience, efficiency and new tools to multithreaded Java programs
71. Cytodynamics (Java, 39 stars)
    Classloader isolation library.
72. instantsearch-tutorial (JavaScript, 39 stars)
    Sample code for building an end-to-end instant search solution
73. iris-mobile (TypeScript, 37 stars)
    A mobile interface for linkedin/iris, built for iOS and Android on the Ionic platform
74. lambda-learner (Python, 36 stars)
    Lambda Learner is a library for iterative incremental training of a class of supervised machine learning models.
75. tracked-queue (TypeScript, 35 stars)
    An autotracked implementation of a ring-buffer-backed double-ended queue
76. self-focused (JavaScript, 35 stars)
    Helps make a single page application more friendly to screen readers.
77. PASS-GNN (Python, 34 stars)
78. performance-quality-models (Jupyter Notebook, 31 stars)
    Personalizing Performance model repository
79. QueryAnalyzerAgent (Go, 30 stars)
    Analyze MySQL queries with negligible overhead
80. Iris-message-processor (Go, 24 stars)
    Iris-message-processor is a fully distributed Go application meant to replace the sender functionality of Iris and provide reliable, scalable, and extensible incident and out of band message processing and sending.
81. smart-arg (Python, 23 stars)
    Smart Arguments Suite (smart-arg) is a slim and handy python lib that helps one work safely and conveniently with command line arguments.
82. data-integration-library (Java, 22 stars)
    The Data Integration Library project provides a library of generic components based on a multi-stage architecture for data ingress and egress.
83. atscppapi (C++, 20 stars)
    This library provides wrappers around the existing Apache Traffic Server API which will vastly simplify the process of writing Apache Traffic Server plugins.
84. TE2Rules (Python, 19 stars)
    Python library to explain Tree Ensemble models (TE) like XGBoost, using a rule list.
85. icon-magic (TypeScript, 19 stars)
    Automated icon build system for iOS, Android and Web
86. high-school-trainee (Python, 18 stars)
    LinkedIn Women in Tech High School Trainee Program
87. play-parseq (Scala, 17 stars)
    Play-ParSeq is a Play module which seamlessly integrates ParSeq with Play Framework
88. linkedin-calcite (Java, 17 stars)
    LinkedIn's version of Apache Calcite
89. kafka-remote-storage-azure (Java, 13 stars)
90. forthic (Python, 12 stars)
91. play-restli (Java, 12 stars)
    A library that simplifies building restli services on top of the play server.
92. spark-inequality-impact (Scala, 10 stars)
93. AlerTiger (Jupyter Notebook, 9 stars)
94. gobblin-elr (Java, 4 stars)
    This is a read-only mirror of apache/gobblin
95. linkedin-gtm-community-template (Smarty, 4 stars)
96. WomenConnect (3 stars)
97. o19-bmc-firmware (C, 3 stars)
    OpenBMC is an open software framework to build a complete Linux image for a Board Management Controller (BMC)
98. audience-network-ios-sdk (JavaScript, 3 stars)
99. rest.li-test-suite (Java, 3 stars)
    A language-independent Rest.li test suite.
100. apk-bitminer (Python, 2 stars)