  • Language
    Java
  • License
    Apache License 2.0
  • Created about 7 years ago
  • Updated 8 months ago

Repository Details

HDFS Shell is an HDFS manipulation tool for working with the functions integrated in Hadoop DFS

HDFS Shell UI (CLI tool)

Image of HDFS-Shell

Purpose

There are 3 possible use cases:

  • Running an interactive UI shell, with commands entered by the user
  • Launching the shell with a specific HDFS command
  • Running in daemon mode - communication using UNIX domain sockets

Why such UI shell?

Advantages of the UI over calling hdfs dfs directly:

  • hdfs dfs starts a new JVM for each command call, HDFS Shell starts one only once - a significant speedup when you work with HDFS frequently
  • Commands can be shortened - e.g. hdfs dfs -ls / and ls / both work
  • HDFS path completion using the TAB key
  • you can easily add any other HDFS manipulation function
  • command history is persisted in a history log (~/.hdfs-shell/hdfs-shell.log)
  • support for relative paths, plus the cd and pwd commands
  • advanced commands like su, groups, whoami
  • customizable shell prompt

Disadvantages of the UI compared with calling hdfs dfs directly:

  • commands cannot be piped, e.g. calling ls /analytics | less is not possible at this time; to pipe output you have to use HDFS Shell in daemon mode

Using HDFS Shell UI

Launching HDFS Shell UI

Requirements:

  • JDK 1.8
  • Works on both Windows and Linux with Hadoop 2.6.0

Download

Configuring launch script(s) for your environment

HDFS-Shell is a standard Java application. To launch it, you need 2 things on your classpath:

  1. All ./lib/*.jar files (the ./lib dependencies are included in the binary bundle, or can be found in the Gradle build/distributions/*.zip)
  2. The path to the directory with your Hadoop cluster config files (hdfs-site.xml, core-site.xml etc.) - without these files HDFS Shell will work in local filesystem mode
     • on Linux it's usually the /etc/hadoop/conf folder
     • on Windows it's usually the %HADOOP_HOME%\etc\hadoop\ folder

Note that paths inside java -cp switch are separated by : on Linux and ; on Windows.
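
As a rough illustration of the two classpath entries described above, the following sketch assembles a java -cp command line for Linux. The HDFS_SHELL_HOME variable, the install location, and the MAIN_CLASS placeholder are assumptions made for this example; the real main class name should be copied from the bundled launch script. The sketch only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: HDFS_SHELL_HOME and MAIN_CLASS are hypothetical names, not taken from the project.
HDFS_SHELL_HOME="${HDFS_SHELL_HOME:-$PWD}"                 # assumed: directory where the zip was unpacked
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"     # usual Linux location of hdfs-site.xml, core-site.xml
MAIN_CLASS="${MAIN_CLASS:-MAIN_CLASS_FROM_LAUNCH_SCRIPT}"  # placeholder; copy the real name from hdfs-shell.sh

# Entry 1: every jar in ./lib (the JVM itself expands the quoted "lib/*" wildcard).
# Entry 2: the Hadoop config directory. ':' is the Linux separator; Windows uses ';'.
CP="${HDFS_SHELL_HOME}/lib/*:${HADOOP_CONF_DIR}"

# Print the command instead of executing it, so the sketch is safe to run anywhere.
echo "java -cp \"${CP}\" ${MAIN_CLASS}"
```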

Pre-defined launch scripts are located in the zip file. You can modify them locally as needed.

  • for the interactive CLI UI, run hdfs-shell.sh without parameters; otherwise:
  • HDFS Shell can be launched directly with the command to execute - after the command completes, hdfs-shell will exit
  • launch HDFS Shell with hdfs-shell.sh script <file_path> to execute commands from a file
  • launch HDFS Shell with hdfs-shell.sh xscript <file_path> to execute commands from a file but ignore command errors (skip errors)
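
A small sketch of the script mode listed above. It assumes that a script file holds one shell command per line (the file format is not described here, so treat that as an assumption) and that hdfs-shell.sh sits in the current directory. The commands in the file are ones mentioned elsewhere in this document.

```shell
#!/bin/sh
# Assumption: a script file contains one HDFS Shell command per line.
SCRIPT_FILE="$(mktemp)"
cat > "$SCRIPT_FILE" <<'EOF'
pwd
whoami
ls /
EOF

# Assumption: the launch script is in the current directory.
if [ -x ./hdfs-shell.sh ]; then
  # Use 'xscript' instead of 'script' to ignore command errors.
  ./hdfs-shell.sh script "$SCRIPT_FILE"
else
  echo "hdfs-shell.sh not found here; would run: ./hdfs-shell.sh script $SCRIPT_FILE"
fi
```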

Possible commands inside shell

  • type help to get a list of all supported commands
  • clear or cls to clear the screen
  • exit or quit or just q to exit the shell
  • to call a system command, type ! <command>, e.g. ! echo hello will call the system command echo
  • type an (hdfs) command without any parameters to get a description of its parameters, e.g. just ls
  • script <file_path> to execute commands from a file
  • xscript <file_path> to execute commands from a file but ignore command errors (skip errors)

Additional commands

For our purposes we also integrated the following commands:

  • set showResultCodeON and set showResultCodeOFF - when enabled, the result code of each command is printed after it completes
  • cd, pwd
  • su <username> - experimental - changes the current active user - it probably won't work on secured HDFS (Kerberos)
  • whoami - prints the effective username
  • groups <username1 <username2,...>> - e.g. groups hdfs prints the groups of the given users, same as the hdfs groups my_user my_user2 functionality
  • edit 'my file' - see the config below

Edit Command

Since version 1.0.4, a simple edit command is available. The command downloads the selected file from HDFS to a local temporary directory and launches the editor. Once the editor saves the file (exiting with result code 0), the file is uploaded back into HDFS (the target file is overwritten). By default, the editor path is taken from the $EDITOR environment variable. If $EDITOR is not set, vim (Linux, Mac) or notepad.exe (Windows) is used.
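
The workflow described above can be approximated with the plain hdfs dfs client. This is not the tool's implementation, just a sketch of the same download, edit, upload steps. The function name edit_hdfs_file is invented here, and the sketch assumes the hdfs command is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper mimicking the described 'edit' flow with the stock hdfs client.
edit_hdfs_file() {
  ehf_remote="$1"
  ehf_tmpdir="$(mktemp -d)" || return 1
  ehf_local="${ehf_tmpdir}/$(basename "$ehf_remote")"

  # 1. Copy the file from HDFS into a local temporary directory.
  hdfs dfs -get "$ehf_remote" "$ehf_local" || { rm -rf "$ehf_tmpdir"; return 1; }

  # 2. Launch the editor; fall back to vim when $EDITOR is unset (Linux/Mac case).
  # 3. Upload only if the editor exits with result code 0; -f overwrites the target.
  if ${EDITOR:-vim} "$ehf_local"; then
    hdfs dfs -put -f "$ehf_local" "$ehf_remote"
    ehf_status=$?
  else
    echo "editor failed; $ehf_remote left unchanged" >&2
    ehf_status=1
  fi
  rm -rf "$ehf_tmpdir"
  return "$ehf_status"
}
```

For example, edit_hdfs_file /analytics/notes.txt (a made-up path) would open a local copy in the editor and write it back only after a clean exit.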

How to change command (shell) prompt

HDFS Shell supports a customizable bash-like prompt. I implemented support for the switches listed in this table (including colors, excluding \! and \#). You can also use this online prompt generator to create the prompt value you want. To set up your favorite prompt, simply add export HDFS_SHELL_PROMPT="value" to your .bashrc (or set the environment variable on Windows). Restart HDFS Shell to apply the change. The default value is currently set to \e[36m\u@\h \e[0;39m\e[33m\w\e[0;39m\e[36m\\$ \e[37;0;39m.
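
For instance, a plain prompt without colors could be set as follows. It only uses the \u, \h, \w and \$ switches that already appear in the default value above, and assumes they behave as they do in bash; the layout itself is just an illustration.

```shell
# In ~/.bashrc: user@host, working directory, then the prompt sign.
# Inside double quotes the shell reduces \\$ to \$, so the stored value is
# \u@\h \w\$ followed by one space.
export HDFS_SHELL_PROMPT="\u@\h \w\\$ "
```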

Running Daemon mode

Image of HDFS-Shell

  • run hdfs-shell-daemon.sh
  • then communicate with this daemon using UNIX domain sockets - e.g. echo ls / | nc -U /var/tmp/hdfs-shell.sock
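
The nc call above can be wrapped in a small helper so that daemon-mode output can be piped like any other command. The function name hdfs_cmd and the HDFS_SHELL_SOCK variable are invented for this sketch; the socket path is the one shown above, and an nc build that supports -U is assumed.

```shell
#!/bin/sh
# Hypothetical wrapper: send one command line to the running daemon and print its reply.
hdfs_cmd() {
  hc_sock="${HDFS_SHELL_SOCK:-/var/tmp/hdfs-shell.sock}"
  # Fail early if the daemon's socket is missing (daemon not started).
  if [ ! -S "$hc_sock" ]; then
    echo "no socket at $hc_sock; is hdfs-shell-daemon.sh running?" >&2
    return 1
  fi
  # Same as: echo ls / | nc -U /var/tmp/hdfs-shell.sock
  printf '%s\n' "$*" | nc -U "$hc_sock"
}
```

With the daemon running, hdfs_cmd ls /analytics | less would page the listing, which the interactive shell cannot do.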

Project programming info

The project is built with Gradle 3.x. By default it uses Hadoop 2.6.0, but it has also been successfully tested with version 2.7.x. It's based on Spring Shell (which includes the JLine component). Using the Spring Shell mechanism, you can easily add your own commands to HDFS Shell (see com.avast.server.hdfsshell.commands.ContextCommands or com.avast.server.hdfsshell.commands.HadoopDfsCommands for more details).

All suggestions and merge requests are welcome.

Other tech info:

For development, add these JVM args in your IDE launch configuration dialog: -Djline.WindowsTerminal.directConsole=false -Djline.terminal=jline.UnsupportedTerminal

Known limitations & problems

  • Commands containing a file or directory name with a space are not parsed correctly - e.g. it's not possible to create the directory My dir using the command mkdir "My dir". This should probably be resolved by an upgrade to Spring Shell 2.
  • It's not possible to remove a directory located in the root by its relative path (rm -R dir) when the current directory is the root (/). You have to use the absolute path instead (rm -R /dir). This is caused by a bug in Hadoop. See HADOOP-15233 for more details. Removing a directory from another cwd is not affected.

Contact

Author&Maintainer: Ladislav Vitasek - vitasek/@/avast.com

Help Us

  • If you like using HDFS Shell, please spread the word - e.g. write a blog post about it.
  • Do you like using it? Tell us!

Companies using HDFS Shell (that we know about)

  • Avast
  • Komercni banka
  • Ataccama Software
