This project has been deprecated. Please use Google Cloud Dataproc to create managed Apache Hadoop and Apache Spark instances on Google Compute Engine.
bdutil
bdutil is a command-line script used to manage Apache Hadoop and Apache Spark instances on Google Compute Engine. bdutil manages deployment, configuration, and shutdown of your Hadoop instances.
Requirements
bdutil depends on the Google Cloud SDK. bdutil is supported in any POSIX-compliant Bash v3 or greater shell.
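The two requirements above can be checked before running bdutil. The following is a minimal sketch, not part of bdutil itself: the function names are hypothetical, and it assumes the Google Cloud SDK is detectable through its `gcloud` command on your PATH.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; these helpers are not shipped with bdutil.

# Succeeds if the given Bash major version is 3 or greater.
bash_version_ok() {
  local major="$1"
  [ "$major" -ge 3 ]
}

# Succeeds if the named command can be found on PATH.
have_command() {
  command -v "$1" >/dev/null 2>&1
}

# Prints one line per unmet requirement; prints nothing if both are met.
# Assumption: the Google Cloud SDK is installed if `gcloud` is on PATH.
check_requirements() {
  bash_version_ok "${BASH_VERSINFO[0]}" || echo "Bash v3 or greater is required"
  have_command gcloud || echo "Google Cloud SDK (gcloud) not found on PATH"
}
```

Calling `check_requirements` after sourcing these functions prints any missing requirement and stays silent otherwise.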
Usage
See the QUICKSTART file in the docs directory to learn how to set up your Hadoop instances using bdutil.
- Install and configure the Google Cloud SDK if you have not already done so
- Clone this repository with
git clone https://github.com/GoogleCloudPlatform/bdutil.git
- Modify the following variables in the bdutil_env.sh file:
PROJECT
- Set to the project ID for all bdutil commands. The project value is resolved in the following order (where 1 overrides 2, and 2 overrides 3):
  1. -p flag value, or if not specified then
  2. PROJECT value in bdutil_env.sh, or if not specified then
  3. gcloud default project value
CONFIGBUCKET
- Set to a Google Cloud Storage bucket that your project has read/write access to.
- Run
bdutil --help
for a list of commands.
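The precedence for the project value described above can be illustrated with a short sketch. This is not bdutil's actual implementation: `resolve_project` is a hypothetical helper that takes the three candidate values as arguments and prints the first one that is set.

```shell
#!/usr/bin/env bash
# Illustrative only: mirrors the documented precedence, not bdutil's own code.

# resolve_project FLAG_VALUE ENV_VALUE GCLOUD_DEFAULT
# Prints the first non-empty argument, in precedence order:
#   1. the -p flag value
#   2. the PROJECT value from bdutil_env.sh
#   3. the gcloud default project
resolve_project() {
  local flag_value="$1" env_value="$2" gcloud_default="$3"
  if [ -n "$flag_value" ]; then
    echo "$flag_value"
  elif [ -n "$env_value" ]; then
    echo "$env_value"
  else
    echo "$gcloud_default"
  fi
}
```

For example, `resolve_project "" "env-proj" "default-proj"` prints `env-proj`, because no flag value was given and the PROJECT value takes priority over the gcloud default. The project names here are placeholders.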
The script implements the following commands, which are very similar:
bdutil create
creates and starts instances, but will not apply most configuration settings. You can call bdutil run_command_steps on instances afterward to apply configuration settings to them. Typically you wouldn't use this, but would use bdutil deploy instead.
bdutil deploy
creates and starts instances with all the configuration options specified in the command line and any included configuration scripts.
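The relationship between the two commands can be sketched as follows. This is a dry-run illustration under stated assumptions: `run_or_print`, `two_step`, and `one_step` are hypothetical helpers, `my-project` is a placeholder, and placing the `-p` flag before the command name is an assumption; run `bdutil --help` for the exact syntax. By default the sketch only prints the commands rather than executing them.

```shell
#!/usr/bin/env bash
# Sketch only: prints the bdutil invocations unless DRY_RUN is set to 0.

# Prints the command when DRY_RUN is 1 (the default); otherwise runs it.
run_or_print() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Two-step path: create the instances, then apply configuration to them.
two_step() {
  run_or_print bdutil -p "$1" create
  run_or_print bdutil -p "$1" run_command_steps
}

# One-step path: create and configure the instances in a single command.
one_step() {
  run_or_print bdutil -p "$1" deploy
}
```

Running `one_step my-project` prints `bdutil -p my-project deploy`, while `two_step my-project` prints the create and run_command_steps invocations in order.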
Components installed
The latest release of bdutil is 1.3.5. This bdutil release installs the following versions of open source components:
- Apache Hadoop - 1.2.1 (2.7.1 if you use the -e argument)
- Apache Spark - 1.5.0
- Apache Pig - 0.12
- Apache Hive - 1.2.1
Documentation
The following documentation is useful for bdutil.
- Quickstart - A guide on how to get started with bdutil quickly.
- Jobs - How to submit jobs (work) to a bdutil cluster.
- Monitoring - How to monitor a bdutil cluster.
- Shutdown - How to shut down a bdutil cluster.