  • Language: Java
  • License: Apache License 2.0

HDFS Shell UI (CLI tool)

HDFS Shell is an HDFS manipulation tool for working with the functions integrated in Hadoop DFS.

Purpose

There are three possible use cases:

  • Running an interactive UI shell, where the user types commands
  • Launching the shell with a specific HDFS command
  • Running in daemon mode, communicating over UNIX domain sockets

Why a UI shell?

Advantages of the UI over calling hdfs dfs directly:

  • hdfs dfs starts a new JVM for every command call, whereas HDFS Shell starts it only once, which gives a large speed-up when you work with HDFS frequently
  • Commands can be used in short form, e.g. both hdfs dfs -ls / and ls / will work
  • HDFS path completion using the TAB key
  • You can easily add any other HDFS manipulation function
  • Command history is persisted in a history log (~/.hdfs-shell/hdfs-shell.log)
  • Support for relative directories, plus the commands cd and pwd
  • Advanced commands such as su, groups and whoami
  • Customizable shell prompt
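
TAB completion of HDFS paths essentially splits the typed fragment into a directory part and a name prefix, lists that directory, and keeps the children whose names match. The sketch below is an illustration under assumptions, not HDFS Shell's actual completer (which builds on JLine through Spring Shell): the `lister` function is hypothetical and would, in a real shell, query HDFS (for example through Hadoop's `FileSystem.listStatus`).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

public class PathCompletionSketch {
    // lister: hypothetical lookup returning the child names of an absolute directory
    // (the directory is passed with a trailing '/').
    static List<String> complete(String cwd, String partial, Function<String, List<String>> lister) {
        int slash = partial.lastIndexOf('/');
        // Directory part as typed by the user, e.g. "/analytics/" or "" for a bare name.
        String dirPart = slash >= 0 ? partial.substring(0, slash + 1) : "";
        // Name prefix to match inside that directory, e.g. "r".
        String namePrefix = partial.substring(slash + 1);

        // Relative fragments are resolved against the current working directory.
        String dir = dirPart.startsWith("/")
                ? dirPart
                : (cwd.endsWith("/") ? cwd : cwd + "/") + dirPart;

        List<String> matches = new ArrayList<>();
        for (String child : lister.apply(dir)) {
            if (child.startsWith(namePrefix)) {
                // Keep the user's own directory prefix so the suggestion replaces what was typed.
                matches.add(dirPart + child);
            }
        }
        Collections.sort(matches);
        return matches;
    }
}
```

With a listing of `/` containing analytics, apps and tmp, typing `a` and pressing TAB would offer `analytics` and `apps`.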

Disadvantages of the UI compared to calling hdfs dfs directly:

  • Commands cannot be piped, e.g. calling ls /analytics | less is not possible at this time; you have to use HDFS Shell in daemon mode instead

Using HDFS Shell UI

Launching HDFS Shell UI

Requirements:

  • JDK 1.8
  • Works on both Windows and Linux, with Hadoop 2.6.0

Download

Configuring launch script(s) for your environment

HDFS Shell is a standard Java application. To launch it, you need to put two things on your classpath:

  1. All ./lib/*.jar files (the ./lib dependencies are included in the binary bundle, or they are located in the Gradle build/distributions/*.zip)
  2. The path to the directory with your Hadoop cluster config files (hdfs-site.xml, core-site.xml, etc.); without these files, HDFS Shell works in local filesystem mode
      • on Linux it's usually located in the /etc/hadoop/conf folder
      • on Windows it's usually located in the %HADOOP_HOME%\etc\hadoop\ folder

Note that paths inside the java -cp switch are separated by : on Linux and ; on Windows.
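
A minimal Java sketch of how the two classpath entries combine: `java.io.File.pathSeparator` is `:` on Linux and `;` on Windows, and a trailing `*` in a classpath entry is the java launcher's wildcard for all JAR files in that directory. The class name and the `lib` directory argument are illustrative assumptions; the main class that the bundled launch scripts start is intentionally not shown.

```java
import java.io.File;

public class ClasspathSketch {
    // Builds the value for `java -cp`: every JAR in libDir plus the Hadoop config directory,
    // joined with the platform's path separator (':' on Linux, ';' on Windows).
    static String buildClasspath(String libDir, String hadoopConfDir) {
        // "lib/*" (or "lib\*" on Windows) expands to all .jar files in lib.
        String jars = libDir + File.separator + "*";
        return String.join(File.pathSeparator, jars, hadoopConfDir);
    }

    public static void main(String[] args) {
        // Default config location taken from the Linux convention mentioned above.
        String confDir = args.length > 0 ? args[0] : "/etc/hadoop/conf";
        System.out.println(buildClasspath("lib", confDir));
    }
}
```

On Linux this prints `lib/*:/etc/hadoop/conf`; on Windows you would pass `%HADOOP_HOME%\etc\hadoop` and get a `;`-separated value.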

Pre-defined launch scripts are located in the zip file. You can modify them locally as needed.

  • For the CLI UI, run hdfs-shell.sh (without parameters); otherwise:
      • HDFS Shell can be launched directly with the command to execute; after completion, hdfs-shell exits
      • launch HDFS Shell with hdfs-shell.sh script <file_path> to execute commands from a file
      • launch HDFS Shell with hdfs-shell.sh xscript <file_path> to execute commands from a file but ignore command errors (skip errors)
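
The difference between script and xscript can be sketched as a loop over the file's lines. This is a hypothetical illustration, not HDFS Shell's code: it assumes that script stops at the first command with a non-zero result code while xscript carries on, and it assumes blank lines are skipped. The `executor` function stands in for the shell's real command dispatch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

public class ScriptRunnerSketch {
    // Runs commands in order. executor returns a result code (0 = success).
    // skipErrors = false mimics `script` (stop at the first failure, assumed);
    // skipErrors = true mimics `xscript` (ignore failures and continue).
    // Returns the commands that were actually executed.
    static List<String> run(List<String> commands, ToIntFunction<String> executor, boolean skipErrors) {
        List<String> executed = new ArrayList<>();
        for (String raw : commands) {
            String command = raw.trim();
            if (command.isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            executed.add(command);
            int resultCode = executor.applyAsInt(command);
            if (resultCode != 0 && !skipErrors) {
                break;
            }
        }
        return executed;
    }
}
```

In practice the command list would come from `java.nio.file.Files.readAllLines` on the `<file_path>` argument.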

Possible commands inside the shell

  • type help to get a list of all supported commands
  • clear or cls to clear the screen
  • exit, quit or just q to exit the shell
  • to call a system command, type ! <command>, e.g. ! echo hello calls the system command echo
  • type an (hdfs) command alone, without any parameters, to get its parameter description, e.g. just ls
  • script <file_path> to execute commands from a file
  • xscript <file_path> to execute commands from a file but ignore command errors (skip errors)
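
The aliases above (cls, quit, q), the ! prefix and the short command form (ls / instead of hdfs dfs -ls /) all amount to mapping several spellings onto one canonical command. The following is a hypothetical normalizer written only to illustrate that idea; HDFS Shell itself registers its commands through Spring Shell, and none of these method names come from the project.

```java
import java.util.HashMap;
import java.util.Map;

public class CommandAliasSketch {
    private static final String HDFS_DFS_PREFIX = "hdfs dfs -";
    private static final Map<String, String> ALIASES = new HashMap<>();

    static {
        // Aliases taken from the command list: cls -> clear, quit/q -> exit.
        ALIASES.put("cls", "clear");
        ALIASES.put("quit", "exit");
        ALIASES.put("q", "exit");
    }

    // True when the line should be handed to the operating system ("! <command>").
    static boolean isSystemCommand(String input) {
        return input.trim().startsWith("!");
    }

    // The part after "!" that would be passed to the operating system.
    static String systemCommandOf(String input) {
        return input.trim().substring(1).trim();
    }

    // Maps long forms and aliases onto one canonical command line.
    static String normalize(String input) {
        String line = input.trim();
        if (line.startsWith(HDFS_DFS_PREFIX)) {
            // "hdfs dfs -ls /" becomes "ls /"
            line = line.substring(HDFS_DFS_PREFIX.length());
        }
        String[] parts = line.split("\\s+", 2);
        String name = ALIASES.getOrDefault(parts[0], parts[0]);
        return parts.length > 1 ? name + " " + parts[1] : name;
    }
}
```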

Additional commands

For our purposes, we also integrated the following commands:

  • set showResultCodeON and set showResultCodeOFF - when enabled, the shell prints each command's result code after it completes
  • cd, pwd
  • su <username> - experimental - changes the current active user; it probably won't work on secured HDFS (Kerberos)
  • whoami - prints the effective username
  • groups <username1 <username2,...>> - e.g. groups hdfs prints the groups of the given users, the same functionality as hdfs groups my_user my_user2
  • edit 'my file' - see the config below

Edit Command

Since version 1.0.4, a simple edit command is available. It copies the selected file from HDFS to a local temporary directory and launches an editor. Once the editor saves the file and exits with result code 0, the file is uploaded back into HDFS, overwriting the target file. By default, the editor path is taken from the $EDITOR environment variable. If $EDITOR is not set, vim (Linux, Mac) or notepad.exe (Windows) is used.
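
The edit flow described above (download, run the editor, upload only on result code 0) and the editor fallback can be sketched as follows. `Transfer` and `EditorRunner` are hypothetical interfaces introduced for illustration, not HDFS Shell classes; a real transfer could be backed by Hadoop's `FileSystem.copyToLocalFile` and `copyFromLocalFile`. The default runner assumes `$EDITOR` holds a single executable path without extra arguments.

```java
import java.io.File;
import java.io.IOException;
import java.util.Locale;
import java.util.Map;

public class EditCommandSketch {
    // Hypothetical abstraction over HDFS copy operations.
    public interface Transfer {
        void download(String hdfsPath, File local) throws IOException;
        void upload(File local, String hdfsPath) throws IOException; // overwrites the target
    }

    // Hypothetical hook that runs the editor on a file and returns its exit code.
    public interface EditorRunner {
        int run(String editor, File file) throws IOException, InterruptedException;
    }

    // $EDITOR wins; otherwise notepad.exe on Windows and vim elsewhere (Linux, Mac).
    static String resolveEditor(Map<String, String> env, String osName) {
        String editor = env.get("EDITOR");
        if (editor != null && !editor.trim().isEmpty()) {
            return editor;
        }
        return osName.toLowerCase(Locale.ROOT).startsWith("windows") ? "notepad.exe" : "vim";
    }

    // Returns true if the edited file was uploaded back to HDFS.
    static boolean edit(String hdfsPath, Transfer transfer, EditorRunner runner, String editor)
            throws IOException, InterruptedException {
        File tmp = File.createTempFile("hdfs-shell-edit", ".tmp");
        try {
            transfer.download(hdfsPath, tmp);
            if (runner.run(editor, tmp) != 0) {
                return false; // non-zero result code: leave the HDFS file untouched
            }
            transfer.upload(tmp, hdfsPath);
            return true;
        } finally {
            tmp.delete(); // always clean up the local copy
        }
    }

    // Default runner: start the editor attached to this terminal and wait for it to exit.
    static int runEditor(String editor, File file) throws IOException, InterruptedException {
        return new ProcessBuilder(editor, file.getAbsolutePath()).inheritIO().start().waitFor();
    }
}
```

A call would then look like `edit(path, transfer, EditCommandSketch::runEditor, resolveEditor(System.getenv(), System.getProperty("os.name")))`.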

How to change the command (shell) prompt

HDFS Shell supports customized bash-like prompt settings! I implemented support for the switches listed in this table (colors included; \! and \# excluded). You can also use this online prompt generator to create the prompt value you want. To set up your favorite prompt, simply add export HDFS_SHELL_PROMPT="value" to your .bashrc (or set the environment variable on Windows), and that's it. Restart HDFS Shell to apply the change. The default value is currently \e[36m\u@\h \e[0;39m\e[33m\w\e[0;39m\e[36m\\$ \e[37;0;39m.
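
To show what a bash-like prompt value turns into, here is a minimal expansion sketch covering only \u (user), \h (host up to the first dot), \w (working directory), \$, \e (escape character, used for colors) and \\. It is an illustration under assumptions, not HDFS Shell's implementation: \w is printed without bash's ~ substitution, \$ becomes # when the user name is root (bash decides by UID), and unsupported switches are left verbatim.

```java
public class PromptSketch {
    static String expand(String template, String user, String host, String cwd) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            if (c != '\\' || i + 1 >= template.length()) {
                out.append(c); // ordinary character (or a trailing backslash)
                continue;
            }
            char next = template.charAt(++i);
            switch (next) {
                case 'u': out.append(user); break;                     // user name
                case 'h': out.append(host.split("\\.", 2)[0]); break;   // host up to the first '.'
                case 'w': out.append(cwd); break;                       // current working directory
                case '$': out.append("root".equals(user) ? '#' : '$'); break;
                case 'e': out.append('\u001B'); break;                  // ESC, starts color codes
                case '\\': out.append('\\'); break;                     // literal backslash
                default: out.append('\\').append(next);                 // unsupported: keep as-is
            }
        }
        return out.toString();
    }
}
```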

Running in daemon mode


  • run hdfs-shell-daemon.sh
  • then communicate with the daemon over UNIX domain sockets, e.g. echo ls / | nc -U /var/tmp/hdfs-shell.sock
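
The nc pipeline above can also be reproduced from Java. This sketch assumes the daemon reads one command line per connection and closes the connection after writing its output, which is what the echo | nc usage suggests but is not documented here. It also needs JDK 16 or newer, because `UnixDomainSocketAddress` and `SocketChannel.open(StandardProtocolFamily.UNIX)` do not exist in JDK 1.8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DaemonClientSketch {
    // Sends one command line and returns everything the daemon writes back
    // until it closes the connection (assumed protocol).
    static String send(Path socketPath, String command) {
        try (SocketChannel channel = SocketChannel.open(StandardProtocolFamily.UNIX)) {
            channel.connect(UnixDomainSocketAddress.of(socketPath));
            ByteBuffer request = ByteBuffer.wrap((command + "\n").getBytes(StandardCharsets.UTF_8));
            while (request.hasRemaining()) {
                channel.write(request);
            }
            channel.shutdownOutput(); // half-close: signal that the request is complete

            ByteArrayOutputStream response = new ByteArrayOutputStream();
            ByteBuffer buf = ByteBuffer.allocate(8192);
            while (channel.read(buf) != -1) {
                buf.flip();
                response.write(buf.array(), 0, buf.limit());
                buf.clear();
            }
            return response.toString(StandardCharsets.UTF_8.name());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Socket path taken from the nc example above.
        String command = args.length > 0 ? String.join(" ", args) : "ls /";
        System.out.print(send(Paths.get("/var/tmp/hdfs-shell.sock"), command));
    }
}
```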

Project programming info

The project uses Gradle 3.x to build. By default it uses Hadoop 2.6.0, but it has also been successfully tested with version 2.7.x. It's based on Spring Shell (which includes the JLine component). Using the Spring Shell mechanism, you can easily add your own commands to HDFS Shell (see com.avast.server.hdfsshell.commands.ContextCommands or com.avast.server.hdfsshell.commands.HadoopDfsCommands for more details).

All suggestions and merge requests are welcome.

Other tech info:

For development, add these JVM args in your IDE's launch configuration dialog: -Djline.WindowsTerminal.directConsole=false -Djline.terminal=jline.UnsupportedTerminal

Known limitations & problems

  • There is a problem with parsing commands that contain a file or directory name with a space, e.g. it's not possible to create the directory My dir using the command mkdir "My dir". This should probably be resolved by an upgrade to Spring Shell 2.
  • It's not possible to remove a top-level directory with a relative path (rm -R dir) while in the root (/) directory. You have to use an absolute path instead (rm -R /dir). This is caused by a bug in Hadoop; see HADOOP-15233 for more details. Removing a directory from any other cwd is not affected.
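
The workaround for the root-directory issue is simply to turn the relative argument into an absolute path before calling rm. A hypothetical helper for that (not part of HDFS Shell, and without any `.`/`..` normalization) could look like this:

```java
public class AbsolutePathSketch {
    // Resolves a cwd-relative HDFS path to an absolute one, e.g. ("/", "dir") -> "/dir".
    static String toAbsolute(String cwd, String path) {
        if (path.startsWith("/")) {
            return path; // already absolute
        }
        return (cwd.endsWith("/") ? cwd : cwd + "/") + path;
    }
}
```

From `/`, `toAbsolute("/", "dir")` yields `/dir`, i.e. the form `rm -R /dir` that avoids the bug.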

Contact

Author & Maintainer: Ladislav Vitasek - vitasek/@/avast.com

Help Us

  • If you like using HDFS Shell, please spread the word, e.g. write a blog post about it.
  • Do you like using it? Tell us!

Companies using HDFS Shell (that we know about)

  • Avast
  • Komercni banka
  • Ataccama Software
