  • Stars: 522
  • Language: Python
  • License: Apache License 2.0
  • Created over 11 years ago
  • Updated almost 3 years ago


Minos is more than a Hadoop deployment system.

What is Minos

Minos is a distributed deployment and monitoring system. It was initially developed and used at Xiaomi to deploy and manage the company's Hadoop, HBase, and ZooKeeper clusters. Minos can easily be extended to support other systems, and the current release already supports HDFS, YARN, and Impala.

Components

The Minos system contains the following four components:

Client

This is the command-line client tool used to deploy and manage the processes of various systems. You can use it to perform deployment tasks such as installing, (re)starting, and stopping a service. The client currently supports ZooKeeper, HDFS, HBase, YARN, and Impala, and it can be extended to support other systems. See Using Client below to learn how to use it.

Owl

This is the dashboard system that displays the status of all processes, giving users an overview of all the clusters managed by Minos. It collects data from servers through the JMX interface. Its pages are organized by cluster, job, and task, matching the definitions in the cluster configuration. It also provides utilities such as a health alerter, an HDFS quota updater, and a quota reporter. See Installing Owl to learn how to install and use it.
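Owl's collector code is not shown in this README. The Python 3 sketch below illustrates what "collecting data through JMX" typically involves. It reads the JSON that a Hadoop daemon serves from its /jmx servlet, which is the same /jmx?qry=Hadoop:* path used as metric_url in the Owl configuration, and then filters beans by name. This is not Owl's implementation. The helper names and the host/port in the usage comment are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_jmx_beans(base_url, query="Hadoop:*", timeout=10):
    """Fetch the list of JMX beans a Hadoop daemon exposes at <base_url>/jmx?qry=<query>."""
    url = "%s/jmx?qry=%s" % (base_url.rstrip("/"), query)
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["beans"]


def beans_matching(beans, name_fragment):
    """Keep beans whose ObjectName contains name_fragment, e.g. 'name=FSNamesystem'."""
    return [bean for bean in beans if name_fragment in bean.get("name", "")]


# Hypothetical usage against a NameNode web UI (host and port are placeholders):
# beans = fetch_jmx_beans("http://namenode.example.com:50070")
# for bean in beans_matching(beans, "name=FSNamesystem"):
#     print(bean.get("CapacityTotal"))
```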

Supervisor

This is the process management and monitoring system. Supervisor is an open source project, a client/server system that allows its users to monitor and control a number of processes on a UNIX-like operating system.

Starting from supervisor-3.0b1, we extended Supervisor to support Minos. We implemented an RPC interface under the deployment directory so that our deploy client can invoke the services provided by supervisord.

When deploying a Hadoop cluster for the first time, you need to set up supervisord on every production machine. This only needs to be done once. You can refer to Installing Supervisor to learn how to install and use it.

Tank

Tank is a simple package management server for our deployment tool, implemented as a Django app. When setting up a cluster for the first time, set up a Tank server first. This also needs to be done only once. See Installing Tank to learn how to install and use it.

Setting Up Minos on CentOS/Ubuntu

Prerequisites

Install Python

Make sure Python 2.7 or later is installed (available from http://www.python.org).

Install JDK

Make sure that the Oracle Java Development Kit 6 (not OpenJDK) is installed from http://www.oracle.com/technetwork/java/javase/downloads/index.html, and that JAVA_HOME is set in your environment.
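Before building, you may want to confirm that JAVA_HOME is usable. The small Python 3 check below is a hypothetical helper, not part of Minos. It verifies that JAVA_HOME is set and that a bin/java executable exists under it. It does not distinguish Oracle's JDK from OpenJDK; `java -version` reports the vendor.

```python
import os


def java_home_problems(env=None):
    """Return a list of problems with JAVA_HOME; an empty list means it looks usable."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return ["JAVA_HOME is not set"]
    java_bin = os.path.join(java_home, "bin", "java")
    if not os.path.isfile(java_bin):
        return ["%s does not exist" % java_bin]
    return []


if __name__ == "__main__":
    for problem in java_home_problems():
        print("prerequisite check failed: " + problem)
```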

Building Minos

Clone the Minos repository

To use Minos, check out the code on your production machine:

git clone https://github.com/XiaoMi/minos.git

Build the virtual environment

All Minos components run in their own virtual environment, so build the virtual environment before using Minos:

cd minos
./build.sh build

Note: If you only use the Client component on the current machine, this step is all you need; see Using Client to learn how to deploy and manage a cluster. To use the current machine as a Tank server, see Installing Tank. Similarly, to use it as an Owl server or a Supervisor server, see Installing Owl and Installing Supervisor respectively.

Installing Tank

Start Tank

cd minos
./build.sh start tank --tank_ip ${your_local_ip} --tank_port ${port_tank_will_listen}

Note: If you do not specify --tank_ip and --tank_port, the Tank server listens on 0.0.0.0, port 8000.
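The Supervisor and Client steps later in this guide depend on reaching Tank. A quick way to confirm that Tank is listening is a plain TCP connection attempt. The Python 3 sketch below is a generic helper, not a Minos tool. It only shows that something accepts connections on the port, not that it is Tank. The 127.0.0.1:8000 address assumes the default described in the note above, checked from the Tank machine itself.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Default Tank address when --tank_ip/--tank_port are omitted; adjust as needed.
    print(can_connect("127.0.0.1", 8000))
```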

Stop Tank

./build.sh stop tank

Installing Supervisor

Prerequisites

Make sure you have installed Tank on one of the production machines.

Start Supervisor

cd minos
./build.sh start supervisor --tank_ip ${tank_server_ip} --tank_port ${tank_server_port}

When starting Supervisor for the first time, you must specify --tank_ip and --tank_port.

After starting supervisor on the destination machine, you can access the web interface of the supervisord. For example, if supervisord listens on port 9001, and the serving machine's IP address is 192.168.1.11, you can access the following URL to view the processes managed by supervisord:

http://192.168.1.11:9001/
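Besides the web page, stock supervisord also serves an XML-RPC API at /RPC2 on the same port. The Python 3 sketch below uses only the standard Supervisor method supervisor.getAllProcessInfo(); the Minos-specific RPC extensions mentioned earlier are not covered. The address is the example one above. If supervisord.conf.tmpl enables a username and password, you can embed them in the URL (http://user:password@host:9001/RPC2), and xmlrpc.client sends them as HTTP basic auth. The helper names are hypothetical.

```python
from xmlrpc.client import ServerProxy


def list_processes(url="http://192.168.1.11:9001/RPC2"):
    """Return supervisord's process info dicts via the standard XML-RPC API."""
    return ServerProxy(url).supervisor.getAllProcessInfo()


def summarize(process_infos):
    """Format process info dicts as 'group:name STATE' lines, sorted by group and name."""
    ordered = sorted(process_infos, key=lambda p: (p["group"], p["name"]))
    return ["%s:%s %s" % (p["group"], p["name"], p["statename"]) for p in ordered]


# Usage against a live supervisord (address is the example one above):
# for line in summarize(list_processes()):
#     print(line)
```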

Stop Supervisor

./build.sh stop supervisor

Monitor Processes

We use Superlance to monitor processes. Superlance is a package of plug-in utilities for monitoring and controlling processes that run under supervisor.

We integrated superlance-0.7 into our Supervisor system and use its crashmail tool to monitor all processes. When a process exits unexpectedly, crashmail sends an alert email to a configurable mailing list.

We configure crashmail as an auto-started process, so it starts working automatically when Supervisor starts. The following config example, taken from minos/build/template/supervisord.conf.tmpl, shows how to configure crashmail:

[eventlistener:crashmailbatch-monitor]
command=python superlance/crashmailbatch.py \
        --toEmail="<to_email_address>" \
        --fromEmail="<from_email_address>" \
        --password="123456" \
        --smtpHost="mail.example.com" \
        --tickEvent=TICK_5 \
        --interval=0.5
events=PROCESS_STATE,TICK_5
buffer_size=100
stdout_logfile=crashmailbatch.stdout
stderr_logfile=crashmailbatch.stderr
autostart=true

Note: Related settings, such as the server port and username, are set in minos/build/template/supervisord.conf.tmpl. Change them there if you don't want the default values.
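crashmail runs as a Supervisor event listener: supervisord exchanges events with it over stdin/stdout using Supervisor's event listener protocol. The Python 3 sketch below shows that protocol with a minimal listener that flags unexpected exits, meaning PROCESS_STATE_EXITED events with expected:0, which is the condition crashmail alerts on. It is not Superlance's code. It only logs to stderr instead of sending mail, and the function names are hypothetical. It would be registered through an [eventlistener:...] section like the one above, subscribed to PROCESS_STATE.

```python
import sys


def parse_tokens(line):
    """Turn 'key:value key:value ...' into a dict (used for headers and payloads)."""
    return dict(token.split(":", 1) for token in line.split() if ":" in token)


def is_unexpected_exit(headers, payload):
    """True for PROCESS_STATE_EXITED events whose payload says expected:0."""
    if headers.get("eventname") != "PROCESS_STATE_EXITED":
        return False
    return parse_tokens(payload).get("expected") == "0"


def main(stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr):
    while True:
        stdout.write("READY\n")  # tell supervisord we can accept an event
        stdout.flush()
        header_line = stdin.readline()
        if not header_line:  # supervisord closed our stdin
            break
        headers = parse_tokens(header_line)
        payload = stdin.read(int(headers["len"]))
        if is_unexpected_exit(headers, payload):
            name = parse_tokens(payload).get("processname")
            # crashmail would send an alert email at this point
            stderr.write("process %s exited unexpectedly\n" % name)
            stderr.flush()
        stdout.write("RESULT 2\nOK")  # acknowledge the event
        stdout.flush()

# When launched by supervisord as the listener command, call main().
```

Logging goes to stderr because stdout is reserved for the protocol messages that supervisord reads.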

Using Client

Prerequisites

Make sure you have installed Tank and Supervisor on your production machines.

A Simple Tutorial

This simple tutorial shows how to use the client. We will use Minos to deploy an HDFS service, which itself requires the deployment of a ZooKeeper service.

The following are some conventions we will use in this tutorial:

  • Cluster type: we define three types of clusters: tst for testing, prc for offline processing, and srv for online serving.
  • ZooKeeper cluster name: we define the ZooKeeper cluster name using the IDC short name and the cluster type. For example, dptst is used to name a testing cluster at IDC dp.
  • Other service cluster names: we name other service clusters using the corresponding ZooKeeper cluster name and the name of the business the service is intended to serve. For example, dptst-example is the name of a testing cluster used to run example tests.
  • Configuration file names: every service has a corresponding configuration file named ${service}-${cluster}.cfg. For example, the dptst ZooKeeper service's configuration file is zookeeper-dptst.cfg, and the dptst example HDFS service's configuration file is hdfs-dptst-example.cfg.
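The conventions above are mechanical enough to express as code. The Python sketch below uses hypothetical helpers, not part of the Minos client. It builds cluster and file names and recovers the ZooKeeper cluster a service cluster depends on. It assumes IDC short names never contain "-", so everything before the first "-" is the ZooKeeper cluster name.

```python
VALID_CLUSTER_TYPES = ("tst", "prc", "srv")  # testing, offline processing, online serving


def zookeeper_cluster_name(idc, cluster_type):
    """IDC short name + cluster type, e.g. ('dp', 'tst') -> 'dptst'."""
    if cluster_type not in VALID_CLUSTER_TYPES:
        raise ValueError("unknown cluster type: %r" % cluster_type)
    return idc + cluster_type


def service_cluster_name(zk_cluster, business):
    """ZooKeeper cluster name + business name, e.g. ('dptst', 'example') -> 'dptst-example'."""
    return "%s-%s" % (zk_cluster, business)


def config_file_name(service, cluster):
    """${service}-${cluster}.cfg, e.g. ('hdfs', 'dptst-example') -> 'hdfs-dptst-example.cfg'."""
    return "%s-%s.cfg" % (service, cluster)


def zookeeper_cluster_of(cluster):
    """The ZooKeeper cluster a service cluster depends on: the part before the first '-'."""
    return cluster.split("-", 1)[0]
```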

Configuring deploy.cfg

There is a configuration file named deploy.cfg under the root directory of minos. You should first edit this file to set up the deployment environment. Make sure that all service packages are prepared and configured in deploy.cfg.

Configuring ZooKeeper

As mentioned in the cluster naming conventions, we will set up a testing ZooKeeper cluster at the dp IDC, and the cluster's configuration file will be named zookeeper-dptst.cfg.

You can edit zookeeper-dptst.cfg under the config/conf/zookeeper directory to configure the cluster. The file is well commented and self-explanatory, so we will not go into detail here.

Setting up a ZooKeeper Cluster

To set up a ZooKeeper cluster, just do the following two steps:

  • Install a ZooKeeper package to the tank server:

      cd minos/client
      ./deploy install zookeeper dptst
    
  • Bootstrap the cluster. This is needed only once, when the cluster is set up for the first time:

      ./deploy bootstrap zookeeper dptst
    

Here are some handy ways to manage the cluster:

  • Show the status of the ZooKeeper service:

      ./deploy show zookeeper dptst
    
  • Start/Stop/Restart the ZooKeeper cluster:

      ./deploy stop zookeeper dptst
      ./deploy start zookeeper dptst
      ./deploy restart zookeeper dptst
    
  • Clean up the ZooKeeper cluster:

      ./deploy cleanup zookeeper dptst
    
  • Perform a rolling update of the ZooKeeper cluster:

      ./deploy rolling_update zookeeper dptst
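If you script these steps, a thin wrapper around the client keeps the argument order consistent. The Python 3 sketch below is hypothetical: run_deploy and first_time_setup are not Minos functions. It assumes the client lives in minos/client and that ./deploy exits with a non-zero status when a step fails, which this README does not state. It defaults to a dry run that only prints the commands.

```python
import subprocess


def deploy_argv(action, service, cluster, *options):
    """Build the argv for one client call, e.g. ./deploy install zookeeper dptst."""
    return ["./deploy", action, service, cluster] + list(options)


def run_deploy(action, service, cluster, *options, client_dir="minos/client", dry_run=True):
    """Run (or, with dry_run, just print) one ./deploy command; returns its exit status."""
    argv = deploy_argv(action, service, cluster, *options)
    if dry_run:
        print(" ".join(argv))
        return 0
    # Assumption: ./deploy exits non-zero when a step fails.
    return subprocess.call(argv, cwd=client_dir)


def first_time_setup(service, cluster, **kwargs):
    """Install the package to Tank, then bootstrap; stop at the first failing step."""
    for action in ("install", "bootstrap"):
        status = run_deploy(action, service, cluster, **kwargs)
        if status != 0:
            return status
    return 0


if __name__ == "__main__":
    first_time_setup("zookeeper", "dptst")  # dry run: prints the two commands
```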
    

Configuring HDFS

Now it is time to configure HDFS. Here we set up a testing HDFS cluster named dptst-example, whose configuration file is named hdfs-dptst-example.cfg, as explained in the naming conventions.

You can edit hdfs-dptst-example.cfg under the config/conf/hdfs directory to configure the cluster. The file is well commented and self-explanatory, so we will not go into detail here.

Setting Up an HDFS Cluster

Setting up and managing an HDFS cluster is similar to setting up and managing a ZooKeeper cluster. The only difference is the cluster name, dptst-example, which implies that the corresponding ZooKeeper cluster is dptst:

./deploy install hdfs dptst-example
./deploy bootstrap hdfs dptst-example
./deploy show hdfs dptst-example
./deploy stop hdfs dptst-example
./deploy start hdfs dptst-example
./deploy restart hdfs dptst-example
./deploy rolling_update hdfs dptst-example --job=datanode
./deploy cleanup hdfs dptst-example
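Because the cluster name encodes its ZooKeeper dependency, the first-time order can be derived from the name alone. The Python sketch below uses a hypothetical helper, not a client feature. It lists the install/bootstrap steps with the implied ZooKeeper cluster first. Bootstrap is needed only once, so drop the ZooKeeper steps if that cluster already exists.

```python
def bootstrap_plan(service, cluster):
    """List (action, service, cluster) steps for a first-time setup, putting the
    ZooKeeper cluster implied by the name (text before the first '-') first."""
    steps = []
    if service != "zookeeper":
        zk_cluster = cluster.split("-", 1)[0]
        steps.append(("install", "zookeeper", zk_cluster))
        steps.append(("bootstrap", "zookeeper", zk_cluster))
    steps.append(("install", service, cluster))
    steps.append(("bootstrap", service, cluster))
    return steps


if __name__ == "__main__":
    for action, svc, clu in bootstrap_plan("hdfs", "dptst-example"):
        print("./deploy %s %s %s" % (action, svc, clu))
```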

Shell

The client tool also supports a very handy command named shell. You can use it to manage files on HDFS, tables on HBase, jobs on YARN, and so on. Here are some examples of using the shell command to perform different HDFS operations:

./deploy shell hdfs dptst-example dfs -ls /
./deploy shell hdfs dptst-example dfs -mkdir /test
./deploy shell hdfs dptst-example dfs -rm -R /test
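To consume `dfs -ls` results from a script, you can parse Hadoop's standard listing format: permissions, replication, owner, group, size, date, time, path. The Python 3 sketch below assumes `./deploy shell` passes the HDFS output through unchanged, which this README does not confirm. It skips any line that does not look like a listing entry, and the helper names are hypothetical.

```python
import subprocess


def parse_ls(output):
    """Parse `dfs -ls` output into dicts, skipping 'Found N items' and other noise."""
    entries = []
    for line in output.splitlines():
        fields = line.split(None, 7)  # the path may contain spaces, so keep the remainder
        if len(fields) != 8 or not fields[4].isdigit():
            continue
        perms, _repl, owner, _group, size, _date, _time, path = fields
        entries.append({"path": path, "is_dir": perms.startswith("d"),
                        "owner": owner, "size": int(size)})
    return entries


def hdfs_ls(cluster, path, client_dir="minos/client"):
    """Run ./deploy shell hdfs <cluster> dfs -ls <path> and parse the listing."""
    out = subprocess.check_output(
        ["./deploy", "shell", "hdfs", cluster, "dfs", "-ls", path], cwd=client_dir)
    return parse_ls(out.decode("utf-8"))
```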

You can run ./deploy --help to see the detailed help messages.

Installing Owl

Owl must be installed on the same machine where you use the Client component, because both use the same set of cluster configuration files.

Prerequisites

Install Gnuplot

Gnuplot is required by OpenTSDB. Install it with the following command:

CentOS: sudo yum install gnuplot
Ubuntu: sudo apt-get install gnuplot

Install MySQL

Ubuntu:
sudo apt-get install mysql-server
sudo apt-get install mysql-client

CentOS:
sudo yum install mysql-server mysql mysql-devel

Configuration

Configure the clusters you want Owl to monitor in minos/config/owl/collector.cfg. The following example shows how to modify the configuration:

[collector]
# service name(space separated)
service = hdfs hbase

[hdfs]
# cluster name(space separated)
clusters=dptst-example
# job name(space separated)
jobs=journalnode namenode datanode
# url for collector, usually JMX url
metric_url=/jmx?qry=Hadoop:*

Note: Some other settings, such as the OpenTSDB port, are set in minos/build/minos_config.py. You can change the default ports to avoid port conflicts.
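The collector.cfg layout above is a standard INI file with space-separated lists, so it can be expanded into concrete collection targets with Python 3's configparser. The sketch below is hypothetical and is not Owl's loader. It skips services listed under [collector] that have no section of their own; the example lists hbase but shows only [hdfs].

```python
import configparser


def collection_targets(text):
    """Expand collector.cfg-style text into (service, cluster, job, metric_url) tuples."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    targets = []
    for service in parser.get("collector", "service").split():
        if not parser.has_section(service):
            continue  # listed under [collector] but not configured
        section = parser[service]
        for cluster in section["clusters"].split():
            for job in section["jobs"].split():
                targets.append((service, cluster, job, section["metric_url"]))
    return targets
```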

Start Owl

cd minos
./build.sh start owl --owl_ip ${your_local_ip} --owl_port ${port_owl_monitor_will_listen}

After starting Owl, you can access its web interface. For example, if Owl listens on port 8088 and the machine's IP address is 192.168.1.11, you can access the following URL to view the Owl web interface:

http://192.168.1.11:8088/

Stop Owl

./build.sh stop owl

FAQ

  1. When installing MySQL-python, you may get the error _mysql.c:44:23: error: my_config.h: No such file or directory (CentOS) or EnvironmentError: mysql_config not found (Ubuntu). mysql_config is part of the MySQL development package, so installing it allows MySQL-python to install:

     Ubuntu: sudo apt-get install libmysqlclient-dev
     CentOS: sudo yum install mysql-devel
    
  2. When installing Twisted, you may get the error CompressionError: bz2 module is not available, and the Python build output shows:

     Python build finished, but the necessary bits to build these modules were not found:
     _sqlite3           _tkinter           bsddb185
     bz2                dbm                dl

     In that case, install the bz2 and sqlite3 development libraries, for example:

     sudo apt-get install libbz2-dev
     sudo apt-get install libsqlite3-dev

  3. When setting up standalone HBase on Ubuntu, HBase may fail to start because of the /etc/hosts file. See http://hbase.apache.org/book/quickstart.html#ftn.d2907e114 to fix the problem.

  4. When using the Minos client to install a service package, if you get the error socket.error: [Errno 101] Network is unreachable, check the Tank server configuration in deploy.cfg; it may be missing.

Note: See Minos Wiki for more advanced features.
