  • Stars: 248
  • Rank: 163,560 (Top 4%)
  • Language: Python
  • License: Apache License 2.0
  • Created: about 8 years ago
  • Updated: over 3 years ago


Repository Details

ansible-hortonworks

Ansible playbooks for deploying Hortonworks Data Platform and DataFlow using Ambari Blueprints

These Ansible playbooks will build a Hortonworks cluster (Hortonworks Data Platform and / or Hortonworks DataFlow) using Ambari Blueprints. For a full list of supported features, see the Features section below.

  • Tested with: HDP 3.0 -> 3.1, HDP 2.4 -> 2.6.5, HDP Search 3.0 -> 4.0, HDF 2.0 -> 3.4, Ambari 2.4 -> 2.7 (the versions must be matched as per the support matrix).

  • This includes building the Cloud infrastructure (optional) and taking care of the prerequisites.

  • The aim is to first build the nodes in a Cloud environment, prepare them (OS settings, database, KDC, etc) and then install Ambari and create the cluster using Ambari Blueprints.

  • It can use a static blueprint or a dynamically generated one based on the components from the Ansible variables file.

    • The dynamic blueprint gives the freedom to distribute components for a chosen topology but this topology must respect Ambari Blueprint restrictions (e.g. if a single NAMENODE is set, there must also be a SECONDARY_NAMENODE).
    • Another advantage of the dynamic blueprint is that it generates the correct blueprint when using HA services, external databases or Kerberos.
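For illustration, a dynamic blueprint definition might look like the sketch below. The host_group names and the exact variable layout are assumptions; verify them against the playbooks' example variable files:

```yaml
# Illustrative sketch only - field names and layout may differ
# between playbook versions; check the example variable files.
blueprint_dynamic:
  - host_group: "hdp-master"
    clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT']
    services:
      - ZOOKEEPER_SERVER
      - NAMENODE
      - SECONDARY_NAMENODE    # required when a single NAMENODE is set
      - RESOURCEMANAGER
  - host_group: "hdp-worker"
    clients: ['ZOOKEEPER_CLIENT', 'HDFS_CLIENT', 'YARN_CLIENT']
    services:
      - DATANODE
      - NODEMANAGER
```

Each host_group here must correspond to an Ansible inventory group of the same name, which is what allows the cluster_template.j2 mapping described under Concepts to work.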

DISCLAIMER

These Ansible playbooks offer a specialised way of deploying Ambari-managed Hortonworks clusters. To use these playbooks you'll need to have a good understanding of both Ansible and Ambari Blueprints.

This is not a Hortonworks product and these playbooks are not officially supported by Hortonworks.

For a fully Hortonworks-supported and user-friendly way of deploying Ambari-managed Hortonworks clusters, please check Cloudbreak first.

Installation Instructions

  • AWS: See INSTALL.md for AWS build instructions and cluster installation.
  • Azure: See INSTALL.md for Azure build instructions and cluster installation.
  • Google Compute Engine: See INSTALL.md for GCE build instructions and cluster installation.
  • OpenStack: See INSTALL.md for OpenStack build instructions and cluster installation.
  • Static inventory: See INSTALL.md for cluster installation on pre-built environments.

Requirements

  • Ansible 2.5+

  • Expects CentOS/RHEL, Ubuntu, Amazon Linux or SLES hosts

Concepts

The core concept of these playbooks is the host_groups field in the Ambari Blueprint. This is an essential piece of Ambari Blueprints that maps the topology components to the actual servers.

The host_groups field in the Ambari Blueprint logically groups the components, while the host_groups field in the Cluster Creation Template maps these logical groups to the actual servers that will run the components.

Therefore, these Ansible playbooks try to take advantage of Blueprint's host_groups and map the Ansible inventory groups to the host_groups using a Jinja2 template: cluster_template.j2.

  • If the blueprint is dynamic, these host_groups are defined in the variable file and they need to match the Ansible inventory groups that will run those components.
  • If the blueprint is static, these host_groups are defined in the blueprint itself and they need to match the Ansible inventory groups that will run those components.
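As an illustration of this mapping, a Cluster Creation Template generated from cluster_template.j2 might look like the following, where each host_group name must match both the Blueprint and an Ansible inventory group (all names and FQDNs below are hypothetical):

```json
{
  "blueprint": "my-cluster-blueprint",
  "default_password": "changeme",
  "host_groups": [
    { "name": "hdp-master",
      "hosts": [ { "fqdn": "master-01.example.com" } ] },
    { "name": "hdp-worker",
      "hosts": [ { "fqdn": "worker-01.example.com" },
                 { "fqdn": "worker-02.example.com" } ] }
  ]
}
```

The Blueprint itself only knows the logical group "hdp-master"; it is this template that pins the group to concrete servers.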

Cloud inventory

A special mention is needed when using a Cloud environment and / or a dynamic Ansible inventory.

In this case, building the Cloud environment is decoupled from building the Ambari cluster, so there needs to be a way to tie the two together: mapping the Cloud nodes to the Blueprint layout (e.g. on which Cloud node the NAMENODE should run).

This is done using a feature that exists in all (or most) Clouds: Tags. The Ansible dynamic inventory takes advantage of this Tag information and creates an Ansible inventory group for each Tag.

If these playbooks are also used to build the Cloud environment, the nodes need to be grouped together in the Cloud inventory variables file. This information is then used to set the Tags when building the nodes.

Then, using the Ansible dynamic inventory for the specific Cloud, the helper add_{{ cloud_name }}_nodes playbooks create the Ansible inventory groups that the rest of the playbooks expect.

  • A more elegant solution would have been to use Static Groups of Dynamic Groups as Ansible recommends. However, each Cloud's dynamic inventory has a different syntax for creating the groups (for example, AWS uses tag_Group_ while OpenStack uses meta-Group_), so the helper add_{{ cloud_name }}_nodes playbooks were the solution that makes this work for all Clouds.
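In other words, once the Cloud dynamic inventory and the add_{{ cloud_name }}_nodes helper have run, the effective inventory is equivalent to a static one along these lines (group and host names are hypothetical):

```ini
# Groups derived from the Cloud Tags (e.g. Tag "Group=hdp-master"),
# in the shape the rest of the playbooks expect.
[hdp-master]
master-01.example.com

[hdp-worker]
worker-01.example.com
worker-02.example.com
```

With a pre-built environment, writing such a static inventory file by hand achieves the same result without any Cloud Tags.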

Parts

Currently, these playbooks are divided into the following parts:

  1. (Optional) Build the Cloud nodes

    Run the build_cloud.sh script to build the Cloud nodes. Refer to the Cloud specific INSTALL guides for more information.

  2. Install the cluster

    Run the install_cluster.sh script that will install the HDP and / or HDF cluster using Blueprints while taking care of the necessary prerequisites.

...or, alternatively, run each step separately (also useful for replaying a specific part in case of failure):

  1. (Optional) Build the Cloud nodes

    Run the build_cloud.sh script to build the Cloud nodes. Refer to the Cloud specific INSTALL guides for more information.

  2. Prepare the Cloud nodes

    Run the prepare_nodes.sh script to prepare the nodes.

    This installs the required OS packages, applies the recommended OS settings and prepares the database and / or the local MIT-KDC.

  3. Install Ambari

    Run the install_ambari.sh script to install Ambari on the nodes.

    This adds the Ambari repo, installs the Ambari Agent and Server packages and configures the Ambari Server with the required Java and database options.

  4. Configure Ambari

    Run the configure_ambari.sh script to configure Ambari.

    This further configures Ambari with additional settings, changes the admin password and adds the repository information needed by the cluster build.

  5. Apply Blueprint

    Run the apply_blueprint.sh script to install HDP and / or HDF based on an Ambari Blueprint.

    This uploads the blueprint to Ambari and applies it. Ambari then creates and installs the cluster.

  6. Post Install

    Run the post_install.sh script to execute any actions after the cluster is built.
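The behaviour of the steps above is driven by the playbooks' Ansible variables. As a rough sketch, the kind of settings involved looks like this; variable names and accepted values are assumptions to verify against the variable files in your checkout:

```yaml
# Illustrative values only - confirm names and options
# against the playbooks' example variable files.
database: postgres         # e.g. embedded, postgres, mysql
security: mit-kdc          # e.g. none, mit-kdc
java: openjdk              # e.g. openjdk, oraclejdk
ambari_version: '2.7.3.0'  # must match the HDP/HDF support matrix
```

Because each step reads the same variables, re-running a single script after a failure picks up where the previous attempt left off.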

Features

Infrastructure support

  • Pre-built infrastructure (using a static inventory file)
  • OpenStack nodes
  • OpenStack Block Storage (Cinder)
  • AWS nodes (with root EBS only)
  • AWS Block Storage (additional EBS)
  • Azure nodes
  • Azure Block Storage (VHDs)
  • Google Compute Engine nodes (with root Persistent Disks only)
  • Google Compute Engine Block Storage (additional Persistent Disks)

OS support

  • CentOS/RHEL 6 support
  • CentOS/RHEL 7 support
  • Ubuntu 14 support
  • Ubuntu 16 support
  • Amazon Linux 2 AMI support (Ambari 2.7+)
  • SUSE Linux Enterprise Server 11 support
  • SUSE Linux Enterprise Server 12 support

Prerequisites done

  • Install and start NTP
  • Create /etc/hosts mappings
  • Set nofile and nproc limits
  • Set swappiness
  • Disable SELinux
  • Disable THP
  • Set Ambari repositories
  • Install OpenJDK or Oracle JDK
  • Install and prepare MySQL
  • Install and prepare PostgreSQL
  • Install and configure local MIT KDC
  • Partition and mount additional storage

Cluster build supported features

  • Install Ambari Agents and Server
  • Configure Ambari Server with OpenJDK or Oracle JDK
  • Configure Ambari Server with external database options
  • Configure Ambari Server with SSL
  • Configure custom Repositories and specific HDP/HDF versions
  • Configure Rack Awareness (static inventory)
  • Configure custom Paths (data / logs / metrics / tmp)
  • Build HDP clusters
  • Build HDF clusters
  • Build HDP clusters with HDF nodes
  • Build HDP clusters with HDP Search (Solr) addon
  • Build clusters with a specific JSON blueprint (static blueprint)
  • Build clusters with a generated JSON blueprint (dynamic blueprint based on Jinja2 template and variables)
  • Wait for the cluster to be built

Dynamic blueprint supported features

The components that will be installed are only those defined in the blueprint_dynamic variable.

  • Supported in this case means all prerequisites (databases, passwords, required configs) are taken care of and the component is deployed successfully on the chosen host_group.
  • HDP Services: HDFS, YARN + MapReduce2, Hive, HBase, Accumulo, Oozie, ZooKeeper, Storm, Atlas, Kafka, Knox, Log Search, Ranger, Ranger KMS, SmartSense, Spark2, Zeppelin, Druid, Superset
  • HDF Services: NiFi, NiFi Registry, Schema Registry, Streaming Analytics Manager, ZooKeeper, Storm, Kafka, Knox, Ranger, Log Search
  • HA Configuration: NameNode, ResourceManager, Hive, HBase, Ranger KMS, Druid
  • Secure clusters with MIT KDC (Ambari managed)
  • Secure clusters with Microsoft AD (Ambari managed)
  • Install Ranger and enable all plugins
  • Ranger KMS
  • Ranger AD integration
  • Hadoop SSL
  • Hadoop AD integration
  • NiFi SSL
  • NiFi AD integration
  • Basic memory settings tuning
  • Make use of additional storage for HDP workers
  • Make use of additional storage for master services
  • Configure additional storage for NiFi
