• Stars
    star
    166
  • Rank 227,748 (Top 5 %)
  • Language Makefile
  • License
    Apache License 2.0
  • Created over 8 years ago
  • Updated over 6 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Running YARN on Kubernetes with PetSet controller.

YARN on Kubernetes

Tags and Dockerfiles

Docker image for kube-yarn

Image for deploying the kube-yarn artifacts without any local dependencies.

Example usage:

Invoke the default make target to deploy the full stack using the kubeconfig from your current home directory.

docker run -it --rm -v ${HOME}/.kube/config:/root/.kube/config:ro danisla/kube-yarn:latest

Make sure to mount any additional volumes for files referenced within the kube config.

To remove the resources by invoking the clean target:

docker run -it --rm -v ${HOME}/.kube/config:/root/.kube/config:ro danisla/kube-yarn:latest clean

Architecture Diagram

Architecture

StatefulSet Overview

The hadoop components are boostrapped using files from a ConfigMap to provide the init script and config xml files. This allows users to fully customize their distribution for their use cases.

The ACP env var can also be set to a comma-separated list of URLs to be downloaded and added to the classpath at runtime.

Logs from the ${HADOOP_PREFIX}/logs directory are tailed so that they can be viewed by attaching to the container.

hdfs-nn - HDFS Name Node

The namenode daemon runs in this pod container.

Currently, only 1 namenode is supported (no HA w/Zookeeper or secondary namenode).

hdfs-dn - HDFS Data Node

The datanode daemon runs in this pod container.

There can be 1 or more of these, scaled by changing the number of replicas in the spec.

yarn-rm - YARN Resource Manager

The resource manager daemon runs in this pod container.

Currently, only 1 resource manager is supported (no HA w/Zookeeper).

The WebUI can be accessed using the service port 8088 or at localhost:8088 after running make pf.

yarn-nm - YARN Node Manager

The node manager daemon runs in this pod container.

There can be 1 or more of these, scaled by changing the number of replicas in the pod spec.

The amount of vcores and memory registered with the resource manager is reflected and infered from the pod spec resources using the Downward API. These are made available to the container via the env vars MY_CPU_LIMIT and MY_MEM_LIMIT respectively and then added to the yarn-site.xml at runtime using the bootstrap script.

zeppelin - Zeppelin Notebook

The Zeppelin notebook is run in this pod container and can be used to run Spark jobs or hadoop jobs using the %sh shell interpreter.

The K8S yarn cluster config is mounted from the same ConfigMap over the top of the Zeppelin image dir to give it access to the cluster without modifying the base image.

The Zeppelin web app can be accessed using the service port 8080 or at localhost:8081 after running make pf.

Running locally

This repo uses minikube to start a k8s cluster locally.

The Makefile contains targets for starting the cluster and helper targets for kubectl to apply the K8S manifests and interact with the pods.

When running locally with minikube, make sure your VM has enough resources, you should set the number of cpus to 8 and memory to 8192. If not set, the pods won't have enough resources to fully start and will be stuck in the Pending creation phase.

Starting minikube manually:

minikube start --cpus 8 --memory 8192

Or, start with the Makefile which will download minikube and start the cluster:

make minikube

Start the YARN cluster:

make

This will create all of the components for the cluster.

Run this to create port forwards to localhost:

make pf

You should now be able to access the following:

  • YARN WebUI: http://localhost:8088
  • Zeppelin: http://localhost:8081

Full stack test

Test hdfs, yarn and mapred using TestDFSIO, submitted from one of the node managers:

make test

Which runs this command on yarn-nm-0:

/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar TestDFSIO -write -nrFiles 5 -fileSize 128MB -resFile /tmp/TestDFSIOwrite.txt

Spark on YARN in Zeppelin

In your browser, go to Zeppelin at: http://localhost:8081

Create a new note and run this in a paragraph:

sc.parallelize(1 to 1000).count

Press shift-enter to execute the paragraph

The first command executed creates the spark job on yarn and will take a few seconds, then you should get the result 1000 when complete.

Make targets:

init

Create the namespace, configmaps service account and hosts-disco service.

create-apps

Creates hdfs, yarn and zeppelin apps.

pf

Creates local port forwards for yarn and zeppelin.

dfsreport

Gets the state of HDFS.

get-yarn-nodes

Lists the registered yarn nodes by executing this in the node manager pod: yarn node -list

shell-hdfs-nn-0

Drops into a shell on the namenode

shell-yarn-rm-0

Drops into a shell on the resource manager.

shell-zeppelin-0

Drops into a shell on the zeppelin container.

Shutting down

make clean

Shutdown the cluster

minikube stop

Or:

make stop-minikube

More Repositories

1

FreeFlow

A layout engine for Android that decouples layouts from the View containers that manage scrolling and view recycling. FreeFlow makes it really easy to create custom layouts and beautiful transition animations as data and layouts change
Java
2,397
star
2

rulio

Rulio
Go
336
star
3

gots

MPEG Transport Stream handling in Go
Go
306
star
4

sirius

A distributed system library for managing application reference data
Scala
298
star
5

cmb

This project is no longer actively supported. It is made available as read-only. A highly available, horizontally scalable queuing and notification service compatible with AWS SQS and SNS
Java
278
star
6

jrugged

A Java libary of robustness design patterns
Java
266
star
7

ip4s

Defines immutable, safe data structures for describing IP addresses, multicast joins, socket addresses and similar IP & network related data types
Scala
224
star
8

mamba

Mamba is a Swift iOS, tvOS and macOS framework to parse, validate and write HTTP Live Streaming (HLS) data.
Swift
178
star
9

k8sh

A simple, easily extensible shell for navigating your kubernetes clusters
Shell
155
star
10

patternlab-edition-node-webpack

The webpack wrapper around patternlab-node core, providing tasks to interact with the core library and move supporting frontend assets.
JavaScript
127
star
11

gaad

GAAD (Go Advanced Audio Decoder)
Go
126
star
12

sheens

Message Machines
Go
119
star
13

eel

A simple proxy service to forward JSON events and transform or filter them along the way.
Go
105
star
14

Speed-testJS

JavaScript
94
star
15

traffic_control

Traffic Control CDN
92
star
16

Surf-N-Perf

Micro-library for gathering web page performance data
JavaScript
90
star
17

pulsar-client-go

A Go client library for Apache Pulsar
Go
78
star
18

graphvinci

A better schema visualizer for GraphQL APIs
JavaScript
74
star
19

MirrorTool-for-Kafka-Connect

A Kafka Source connector for Kafka Connect
Java
72
star
20

php-legal-licenses

A utility to help generate a file containing information about dependencies including the full license text.
PHP
70
star
21

caption-inspector

Caption Inspector is a reference decoder for Closed Captions (CEA-608 and CEA-708).
C
69
star
22

money

Dapper Style Distributed Tracing Instrumentation Libraries
Scala
67
star
23

compass-csslint

Easily integrate CSS Lint into your projects that use the Compass CSS Framework
Ruby
65
star
24

snowdrift

App to perform testing and validation of firewall rules
Shell
63
star
25

dialyzex

A Mix task for type-checking your Elixir project with dialyzer
Elixir
61
star
26

ssh-to

Easily manage dozens or hundreds of machines via SSH
Shell
58
star
27

ansible-sdkman

An Ansible role that installs, configures, and manages SDKMAN
Python
58
star
28

Bynar

Server remediation as a service
Rust
57
star
29

xGitGuard

AI based Secrets Detection Python Framework
Python
54
star
30

Canticle

Go
50
star
31

scte35-js

A SCTE 35 Parser for JavaScript
HTML
47
star
32

go-leaderelection

GoLea is a Go library that provides the capability for a set of distributed processes to compete for leadership for a shared resource.
Go
45
star
33

Comcast.github.io-archive

The main Open Source portal for Comcast
HTML
38
star
34

scte35-go

Golang implementation of ANSI/SCTE-35
Go
36
star
35

sitemapper-for-js

Generate XML sitemaps for JS Websites (Supports Angular, React)
JavaScript
35
star
36

zucchini

Run your cucumber-jvm tests in parallel across all your devices
Java
34
star
37

Oscar

OSCAR - OpenSource Cablemodem file AssembleR - DOCSIS, PacketCable, DPoE Configuration Editor and API Framework
Java
33
star
38

hlsparserj

Java
32
star
39

weasel

Lightweight license checker.
Go
32
star
40

Infinite-File-Curtailer

Curtail is a utility program that reads stdin and writes to a file bound by size.
C
32
star
41

DahDit

Custom Views for drawing Dotted and Dashed Lines without jumping through hoops.
Kotlin
27
star
42

resourceprovider2

Resource Management API Generator for Android
Kotlin
22
star
43

python-batch-runner

A tiny framework for building batch applications as a collection of tasks in a workflow.
Python
22
star
44

Buildenv-Tool

Go
21
star
45

blueprint

Blueprint is a compact framework for constructing mvp architecture within a scrollable, multi-view-type list UI. It uses the Android RecyclerView library, and currently only supports LinearLayouts
Kotlin
21
star
46

comcast.github.io

The main Open Source portal for Comcast
JavaScript
20
star
47

ProjectGuardRail

AI/ML applications have unique security threats. Project GuardRail is a set of security and privacy requirements that AI/ML applications should meet during their design phase that serve as guardrails against these threats. These requirements help scope the threats such applications must be protected against.
20
star
48

resourceprovider-utils

Samples and Test Utilities Library for the ResourceProvider API Generator Gradle Plugin for Android.
Java
19
star
49

xCOMPASS

This repository hosts a persona based privacy threat modeling solution called Models of Applied Privacy or MAP.
17
star
50

compass-csscss

Easily integrate csscss into your projects that use the Compass CSS Framework
Ruby
16
star
51

go-edgegrid

A Golang Akamai API client and Akamai OPEN EdgeGrid Authentication scheme authentication handler.
Go
16
star
52

cea-extractor

Parsing and display logic for CEA-608 caption data in fragmented MP4 files.
TypeScript
16
star
53

all-digital

Comcast All Digital CSS
SCSS
15
star
54

littlesheens

A Sheens implementation that has a small footprint
C
15
star
55

rdk-on-raspberrypi

Documentation for running RDK profiles ( Video, broadband, Camera ) on Raspberrypi boards
15
star
56

terraform-provider-akamai

An Akamai GTM Terraform provider
Go
14
star
57

Xooie

Important note: this project is no longer actively maintained. Xfinity Xooie - Modular, Modifiable, Responsive, Accessible
JavaScript
14
star
58

Superior-Cache-ANalyzer

A tool for inspecting the contents of Apache Traffic Server caches
Python
14
star
59

ActorServiceRegistry

Scala
14
star
60

connvitals-monitor

A persistent monitor and aggregator of network statistics
Python
13
star
61

RestfulHttpsProxy

Go
13
star
62

Priority-Operation-Processing

A workflow orchestration system where the workflow is scheduled as a unit giving resource priority once selected. Priority queuing and customizable scheduling algorithms. Customer aware for multi-tenant. A JSON DAG based blueprint defines the execution flow. Executes operations within a workflow by spinning up on-demand Kubernetes Pods (dynamic resource allocation)
Java
13
star
63

connvitals

A network analysis and statistics aggregation tool
Python
13
star
64

xml-selector

A jQuery-like interface for working with XML using CSS-style selectors
JavaScript
12
star
65

prombox

Prombox creates a sandbox environment for editing and testing Prometheus configuration on the fly
JavaScript
12
star
66

svn-to-github

Comprehensive Tool for Converting SVN to Git in Bulk
Shell
12
star
67

watchmen-ping-nightmare

A plugin for watchmen that uses nightmare and an electron browser to monitor websites
JavaScript
12
star
68

tsb

A Transitive Source Builder for managing builds across multiple repositories
Go
11
star
69

scte224structs

Utilities for modeling SCTE-224 data in Go
Go
11
star
70

cf-recycle-plugin

Cloud Foundry cli plugin for rolling restart of application instances
Go
10
star
71

redirector

Java
10
star
72

discovery

The Comcast discovery services for the open source community
Java
10
star
73

fishymetrics

Redfish API Prometheus Exporter for monitoring large scale server deployments
Go
10
star
74

libcertifier

With small, space optimized, 90KB libCertifier(), IoT devices (cameras, toasters, sensors ….) can now request and receive unique, compact (650 bytes) digital certificates (x509 v3 compliant).
C
10
star
75

compare-ini-files

Compare an arbitrary number of .ini files based on logical sections and key/value pairs.
PHP
9
star
76

flume2storm

This project is no longer actively maintained. The Flume2Storm repository contains the source code, documentation (wiki) and issue tracking system
Java
9
star
77

cts-mpx

Ruby
8
star
78

ansible-role-pypi

An ansible role for configuring a pypi-server on a systemd centos system.
Python
8
star
79

akamai-slack-reporter

A Slack slash command integration for querying your team's Akamai configuration
JavaScript
8
star
80

rapid-ip-checker

A GPU accelerated tool to compare large lists of IPv4/IPv6 addresses.
Python
8
star
81

css_lint_ruby

Nicholas C. Zakas and Nicole Sullivan's CSS Lint made available as a Ruby gem.
Ruby
8
star
82

akamai-gtm

A CLI to Akamai GTM
Go
8
star
83

DNS-over-HTTPs-translator

Go
7
star
84

tlsrpt_processor

Simple python script to process TLSRPT reports
Python
7
star
85

libstorage

rust storage server helper utilities
Rust
7
star
86

EasyConnect

Reference Development Kit that will help partners and vendors implement EasyConnect in their devices.
Java
7
star
87

dmc-sim

ns3 simulator code for Distributed Monotonic Clocks
C++
7
star
88

jovo-plugin-resume

πŸ”ˆ A plugin for resuming conversations in the Open Source Voice Layer, Jovo (https://www.jovo.tech)
TypeScript
7
star
89

Porrtal

This project was created to help developers. You can use the platform to build web applications. The project supports both React and Angular development.
TypeScript
7
star
90

cf-zdd-plugin

Go
7
star
91

vesper_legacy

Secure Telephone Identity Management in Session Initiation Protocol
Go
7
star
92

SyntaViz

A visualization interface for analyzing a (very large) corpus of natural-language queries.
Python
7
star
93

hypergard

A JavaScript client for HAL APIs with support for forms
JavaScript
7
star
94

ctex

A Mix task and helpers for running common_test suites
Elixir
7
star
95

plax

A test automation engine for messaging systems
Go
7
star
96

Ravel

Go
7
star
97

pipeclamp

Java
6
star
98

rally-rest-toolkit

Client API library for interacting with the Rally REST API
Go
6
star
99

rdkcryptoapi

Contains Cryptographic APIs used in the RDK Software Stack
C
6
star
100

polaris

Vanilla Web Components for global Header, Footer & Toast notifications for XFINITY Websites
JavaScript
6
star