
Distributed P2P Data-driven Workflow Framework




Introduction

Emissary is a P2P-based, data-driven workflow engine that runs in a heterogeneous, possibly widely dispersed, multi-tiered P2P network of compute resources. Workflow itineraries are not pre-planned as in conventional workflow engines; instead, they are discovered as more information is learned about the data. There is typically no user interaction in an Emissary workflow; rather, the data is processed in a goal-oriented fashion until it reaches a completion state.

Emissary is highly configurable, but this base implementation does almost nothing on its own. Users of this framework are expected to provide classes that extend emissary.place.ServiceProviderPlace to perform work on emissary.core.IBaseDataObject payloads.
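To make the extension point concrete, here is a deliberately simplified, self-contained model. It does not use the real emissary.place.ServiceProviderPlace or emissary.core.IBaseDataObject APIs; SimplePayload, SimplePlace, UpperCasePlace and the UPPER_CASE form are hypothetical names chosen for illustration. In this model a place reads the payload, does its work, and records a new current form that later routing decisions can use.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical stand-in for a payload; NOT the emissary.core.IBaseDataObject API. */
final class SimplePayload {
    private byte[] data;
    private String currentForm;

    SimplePayload(byte[] data, String currentForm) {
        this.data = data;
        this.currentForm = currentForm;
    }

    byte[] data() { return data; }
    void setData(byte[] data) { this.data = data; }
    String currentForm() { return currentForm; }
    void setCurrentForm(String form) { this.currentForm = form; }
}

/** Hypothetical contract for a unit of work; the real base class is emissary.place.ServiceProviderPlace. */
interface SimplePlace {
    void process(SimplePayload payload);
}

/** Illustrative place that upper-cases text, in the spirit of the sample toUpper place. */
final class UpperCasePlace implements SimplePlace {
    @Override
    public void process(SimplePayload payload) {
        String text = new String(payload.data(), StandardCharsets.UTF_8);
        payload.setData(text.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
        // In this model the new form is what the next routing decision looks at.
        // "UPPER_CASE" is a made-up form name.
        payload.setCurrentForm("UPPER_CASE");
    }
}
```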

A variety of things can be done, and the workflow is managed in stages, e.g. STUDY, ID, COORDINATE, TRANSFORM, ANALYZE, IO, REVIEW.

Two kinds of classes direct the workflow: emissary.core.MobileAgent and the classes derived from it, which manage the path of a set of related payload objects through the workflow, and emissary.directory.DirectoryPlace, which manages the available services and their cost and quality, and keeps the P2P network connected.
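The interplay between an agent and the directory can be sketched in plain Java. This is an assumption-laden model, not Emissary's implementation: DirectoryEntry, SketchDirectory and SketchAgent are hypothetical, and the ranking rule (lowest cost first, then highest quality) is chosen for illustration. It shows the property described above: the itinerary is not fixed up front; each hop is chosen by asking the directory what can handle the payload's current form in the current stage.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Workflow stages in the order listed above. */
enum Stage { STUDY, ID, COORDINATE, TRANSFORM, ANALYZE, IO, REVIEW }

/** Hypothetical directory entry: a place that handles one (stage, form) pair at a given cost and quality. */
final class DirectoryEntry {
    final Stage stage;
    final String form;
    final String place;
    final int cost;
    final int quality;
    final UnaryOperator<String> action; // maps the payload's current form to its next form

    DirectoryEntry(Stage stage, String form, String place, int cost, int quality, UnaryOperator<String> action) {
        this.stage = stage;
        this.form = form;
        this.place = place;
        this.cost = cost;
        this.quality = quality;
        this.action = action;
    }
}

/** Hypothetical stand-in for a directory; NOT the emissary.directory.DirectoryPlace API. */
final class SketchDirectory {
    private final List<DirectoryEntry> entries = new ArrayList<>();

    void register(DirectoryEntry entry) {
        entries.add(entry);
    }

    /** Cheapest unvisited entry for (stage, form); ties go to the higher quality. */
    Optional<DirectoryEntry> best(Stage stage, String form, Set<String> visited) {
        return entries.stream()
                .filter(e -> e.stage == stage && e.form.equals(form) && !visited.contains(e.place))
                .min(Comparator.<DirectoryEntry>comparingInt(e -> e.cost).thenComparingInt(e -> -e.quality));
    }
}

/** Hypothetical stand-in for a mobile agent: the itinerary is discovered one hop at a time. */
final class SketchAgent {
    static List<String> route(SketchDirectory directory, String initialForm) {
        List<String> itinerary = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        String form = initialForm;
        for (Stage stage : Stage.values()) {
            Optional<DirectoryEntry> next;
            // Stay in this stage while some unvisited place accepts the current form.
            while ((next = directory.best(stage, form, visited)).isPresent()) {
                DirectoryEntry hop = next.get();
                visited.add(hop.place);
                itinerary.add(hop.place);
                form = hop.action.apply(form);
            }
        }
        return itinerary;
    }
}
```

The visited set guarantees termination: each registered place is used at most once per payload in this model.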

Minimum Requirements

Getting Started

Read through the DEVELOPING.md guide for information on installing required components, pulling the source code, building and running Emissary.

Building

Run mvn clean package to compile, test, and package Emissary. A successful build ends with output similar to:

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  9.132 s
[INFO] Finished at: 2022-01-10T22:31:05Z
[INFO] ------------------------------------------------------------------------

Running

There is one bash script in Emissary that runs everything, located in the top-level Emissary directory. The script runs the emissary.Emissary class, which has several Picocli commands available to handle different functions.

No arguments

If the emissary script is run without any arguments, you will get a listing of all the configuration subcommands and a brief description.

./emissary

Help

Running ./emissary help gives you the same output as running with no arguments. If you want more detailed information on a command, add the command name after help. For example, to see all the arguments, with descriptions, for the what command, run:

./emissary help what

Common parameters

The remaining commands all accept a project base argument (-b or --projectBase), but its value must match PROJECT_BASE.

The config directory defaults to the config directory under the project base, but it can also be passed in with -c or --config. When running from the git checkout, you should use target as the projectBase. Feel free to modify the config files in target/config before you start.

Logging is handled by logback. You can point to a custom file with the --logbackConfig argument.

See help -c for each command to get more info.

What

This command uses the configured engines to identify the file. Emissary currently only comes with the SizeIdPlace, so the ID will be a size label such as TINY or SMALL. See that class for more info. The -i or --input argument is required, as is -b. Here is how to run the command:

./emissary what -i <path to some file>
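As a rough illustration of size-based identification, the sketch below maps a byte count to a label with a TreeMap floor lookup. Only TINY and SMALL come from the text above; the byte boundaries and the MEDIUM and LARGE labels are invented for this example, so check SizeIdPlace for the real values.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical size buckets; NOT the thresholds used by SizeIdPlace. */
final class SizeBuckets {
    // Lower bound (inclusive, in bytes) -> label. All numbers here are made up.
    private static final NavigableMap<Long, String> LOWER_BOUNDS = new TreeMap<>();

    static {
        LOWER_BOUNDS.put(0L, "TINY");
        LOWER_BOUNDS.put(1_024L, "SMALL");
        LOWER_BOUNDS.put(1_048_576L, "MEDIUM");
        LOWER_BOUNDS.put(104_857_600L, "LARGE");
    }

    static String label(long sizeInBytes) {
        if (sizeInBytes < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        // floorEntry returns the largest lower bound that is <= the size.
        Map.Entry<Long, String> bucket = LOWER_BOUNDS.floorEntry(sizeInBytes);
        return bucket.getValue();
    }
}
```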

Server (Standalone)

This command starts an Emissary server and initializes all the configured places, a pickup place, and drop off filters. It starts in standalone mode if -m or --mode is not specified. By default, the number of MobileAgents is calculated based on the specs of the machine. On modern computers, this can be high. You can control the number of agents with -a or --agents. Here is an example run:

./emissary server -a 2

Without further configuration, it will start on http://localhost:8001. If you browse to that URL, you will need to enter the username and password defined in target/config/jetty-users.properties, which are emissary and emissary123 by default.

The default PickUpPlace is configured to read files from target/data/InputData. If you copy files into that directory, you will see Emissary process them. Keep in mind, only toUpper and toLower are configured, so the output will not be too interesting.

Agents (Standalone)

The agents command shows the number of MobileAgents for the configured host and what those agents are doing. By default, the port is 9001, but you can use -p or --port to change that.
Assuming you are running on 8001 from the server command above, try:

./emissary agents -p 8001

Pool (Standalone)

Pool is a collapsed view of agents for a node. It, too, defaults to port 9001. To run it against the standalone server started above, run:

./emissary pool -p 8001

This command is more useful for a cluster, as it gives a more digestible view of every node.
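A collapsed view amounts to counting agents by what they are doing. The sketch below assumes a hypothetical input shape (agent name to current activity, such as Idle or a place name) rather than the real response format of the pool command; collapseCluster shows the same idea aggregated across several nodes, as the clustered pool command does.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical pool-style aggregation; input and output shapes are assumptions. */
final class PoolView {
    /** Collapses agent name -> current activity into activity -> number of agents. */
    static Map<String, Long> collapse(Map<String, String> agentActivity) {
        return agentActivity.values().stream()
                .collect(Collectors.groupingBy(activity -> activity, TreeMap::new, Collectors.counting()));
    }

    /** Same collapse, but over node -> (agent name -> activity) for a whole cluster. */
    static Map<String, Long> collapseCluster(Map<String, Map<String, String>> byNode) {
        return byNode.values().stream()
                .flatMap(agents -> agents.values().stream())
                .collect(Collectors.groupingBy(activity -> activity, TreeMap::new, Collectors.counting()));
    }
}
```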

Env

The env command requires a server to be running. It asks the server for some configuration values, like PROJECT_BASE and BIN_DIR. With no arguments, it dumps an unformatted JSON response.

./emissary env

You can also dump a response suitable for sourcing in bash:

./emissary env --bashable

Starting the Emissary server actually calls this endpoint and writes ${PROJECT_BASE}/env.sh with the configured variables. This is done so that shell scripts can source ${PROJECT_BASE}/env.sh and then have those variables available without having to configure them elsewhere.
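Producing bash-sourceable output mostly comes down to safe quoting. The sketch below is a guess at the general shape, not the exact format that env --bashable emits: it writes one export line per variable, sorted by name, and wraps each value in single quotes, escaping embedded single quotes as '\''. The Bashable name and any sample values are hypothetical.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Hypothetical renderer for "source"-able shell variables. */
final class Bashable {
    private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Renders name/value pairs as "export NAME='value'" lines, sorted by name. */
    static String render(Map<String, String> vars) {
        StringJoiner out = new StringJoiner("\n");
        for (Map.Entry<String, String> e : new TreeMap<>(vars).entrySet()) {
            if (!NAME.matcher(e.getKey()).matches()) {
                throw new IllegalArgumentException("not a valid shell variable name: " + e.getKey());
            }
            // Single quotes stop the shell from expanding $ or backticks; an embedded ' becomes '\''.
            String quoted = "'" + e.getValue().replace("'", "'\\''") + "'";
            out.add("export " + e.getKey() + "=" + quoted);
        }
        return out.toString();
    }
}
```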

Config

The config command allows you to see the effective configuration for a specified place/service/class. Since Emissary uses flavors, this command will show the resulting configuration of a class after all flavors have been applied. This command can be used to connect to a running Emissary node by specifying the -h for host (default is localhost) and -p for the port (default is 8001). To connect to a locally running Emissary on port 8001, any of the following commands will work:

./emissary config --place emissary.place.sample.ToLowerPlace
./emissary config --place emissary.place.sample.ToLowerPlace -h localhost -p 8001

Optionally, you can specify offline mode using --offline to use the configuration files specified in your local CONFIG_DIR:

./emissary config --place emissary.place.sample.ToLowerPlace --offline

In offline mode, you can provide flavors to see the differences in configurations:

./emissary config --place emissary.place.sample.ToLowerPlace --offline --flavor STANDALONE,TESTING
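One way to picture flavor resolution is as layered overrides: start from the base settings and apply each flavor in the order given. The sketch below is a simplified, override-only model with hypothetical names (FlavorMerge, parseFlavors, effective); Emissary's real merge rules, for example for keys that hold multiple values, may differ, so the output of the config command remains the authoritative view.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical, override-only model of applying configuration flavors. */
final class FlavorMerge {
    /** Splits a value such as "STANDALONE,TESTING" into an ordered list of flavors. */
    static List<String> parseFlavors(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    /** Starts from the base settings and lets each flavor's settings override earlier ones, in order. */
    static Map<String, String> effective(Map<String, String> base,
                                         Map<String, Map<String, String>> flavorLayers,
                                         List<String> flavors) {
        Map<String, String> result = new LinkedHashMap<>(base);
        for (String flavor : flavors) {
            Map<String, String> layer = flavorLayers.get(flavor);
            if (layer != null) {
                result.putAll(layer); // later flavors win on key collisions
            }
        }
        return result;
    }
}
```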

These are useful to see the effective configuration, but we can also run in a verbose mode to see all the configuration files along with the final output. This is controlled with the --detailed flag:

./emissary config --place emissary.place.sample.ToLowerPlace --detailed

or in offline mode:

./emissary config --place emissary.place.sample.ToLowerPlace --offline --detailed

Run

The run command simply executes the main method of the given class. For example:

./emissary run emissary.config.ConfigUtil  <path_to_some_cfg_file>

If you need to pass flags to the main method, use -- to stop parsing flags and simply pass them along.

./emissary run emissary.config.ExtractResource -- -o outputdir somefile
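The behavior of the run command can be approximated with standard Java reflection: load the class by name, find its public static main(String[]), and pass through whatever follows --. RunMain and EchoMain below are hypothetical; Emissary's own command is built on Picocli and may handle arguments differently.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;

/** Hypothetical approximation of "run <class> -- <args>" using reflection. */
final class RunMain {
    /** Returns everything after the first "--", or an empty array if there is none. */
    static String[] passThrough(String[] argv) {
        for (int i = 0; i < argv.length; i++) {
            if ("--".equals(argv[i])) {
                return Arrays.copyOfRange(argv, i + 1, argv.length);
            }
        }
        return new String[0];
    }

    /** Looks up className and calls its public static void main(String[]) with args. */
    static void run(String className, String[] args) {
        final Method main;
        try {
            main = Class.forName(className).getMethod("main", String[].class);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("no usable main method in " + className, e);
        }
        if (!Modifier.isStatic(main.getModifiers())) {
            throw new IllegalArgumentException(className + ".main is not static");
        }
        try {
            // The cast to Object passes the array as the single String[] parameter
            // instead of letting varargs spread it into separate arguments.
            main.invoke(null, (Object) args);
        } catch (InvocationTargetException e) {
            Throwable cause = e.getCause();
            if (cause instanceof RuntimeException) throw (RuntimeException) cause;
            if (cause instanceof Error) throw (Error) cause;
            throw new IllegalStateException(cause);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}

/** Hypothetical target class that just records the arguments it received. */
class EchoMain {
    static volatile String[] lastArgs;

    public static void main(String[] args) {
        lastArgs = args.clone();
    }
}
```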

Server (Cluster)

Emissary is fun in standalone mode, but running a cluster is more appropriate for real work. Running clustered is similar to running standalone, but you need to pass -m cluster to tell the node to connect to other nodes. In clustered mode Emissary also starts the PickUpClient instead of the PickUpPlace, so you will need to start a feeder.

Look at target/config/peers.cfg to see the rendezvous peers. In this case, there are three: the nodes running on ports 8001 and 9001 are plain Emissary nodes, and the node running on port 7001 is the feeder. So let's start up 8001 and 9001 in two different terminals:

./emissary server -a 2 -m cluster
./emissary server -a 2 -m cluster -p 9001

Because these nodes all know about ports 8001, 9001 and 7001, you will see connection errors in the logs as they keep trying to reach peers that are not running yet.

Note that in real-world deployments we don't run multiple Emissary processes on the same machine. You can configure the hostname with -h.

Feed (Cluster)

With nodes started on ports 8001 and 9001, we need to start the feeder. The feed command uses port 7001 by default, but we also need to set up a directory for the feeder to read from. Files dropped into that directory become available for worker nodes to take, and the work should be distributed among the cluster. Start the feeder with:

mkdir ~/Desktop/feed1
./emissary feed -i ~/Desktop/feed1/

You should be able to open http://localhost:8001, http://localhost:9001 and http://localhost:7001 in the browser and look at the configured places. Drop some files into ~/Desktop/feed1 and watch the two nodes process them. It may take a minute for processing to start.
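The distribution of work can be modeled in-process with a shared queue that several workers drain. This is only an analogy for the feeder and worker nodes described above (the real components run as separate processes and talk over the network); FeedSketch, the node-N names and any file names are hypothetical. Because each item is removed from the queue atomically, no file is handed to two workers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical in-process model of a feeder and worker nodes; not Emissary's feed protocol. */
final class FeedSketch {
    /** Hands every file to exactly one of numWorkers workers and returns worker name -> files it took. */
    static Map<String, List<String>> distribute(List<String> files, int numWorkers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(files);
        Map<String, List<String>> taken = new ConcurrentHashMap<>();
        ExecutorService workers = Executors.newFixedThreadPool(numWorkers);
        for (int i = 0; i < numWorkers; i++) {
            List<String> mine = Collections.synchronizedList(new ArrayList<>());
            taken.put("node-" + i, mine);
            workers.submit(() -> {
                String file;
                // poll() removes the head atomically, so two workers never take the same file.
                while ((file = queue.poll()) != null) {
                    mine.add(file); // a real worker would process the file here
                }
            });
        }
        workers.shutdown();
        try {
            if (!workers.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return taken;
    }
}
```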

Agents (Cluster)

Agents in clustered mode again shows details about the MobileAgents. It starts with the node you configure (localhost:9001 by default), then calls out to all the nodes it knows about and gets the same information from each. Run it with:

./emissary agents --cluster

Pool (Cluster)

Pool in clustered mode works the same way as pool in standalone mode. It starts at the configured node (localhost:9001 by default), then goes to all the nodes it knows about and aggregates a collapsed view of the cluster. Run it with:

./emissary pool --cluster

Topology (Cluster)

The topology command talks to the configured node (localhost:8001 by default) and then to every node it knows about. The response contains what all those nodes know about, so you can build up a network topology of your cluster. Run it with:

./emissary topology
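Building a topology from what every node knows about is a graph traversal. The sketch below does a breadth-first walk starting from one node; TopologySketch is a hypothetical name, and the peersOf function is a stand-in for querying a node about its known peers, not a real Emissary API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch of building a cluster topology by breadth-first search. */
final class TopologySketch {
    /** Returns node -> peers it reports, for every node reachable from start. */
    static Map<String, Set<String>> discover(String start, Function<String, Set<String>> peersOf) {
        Map<String, Set<String>> topology = new LinkedHashMap<>();
        Deque<String> toVisit = new ArrayDeque<>();
        toVisit.add(start);
        while (!toVisit.isEmpty()) {
            String node = toVisit.poll();
            if (topology.containsKey(node)) {
                continue; // already asked this node
            }
            Set<String> peers = peersOf.apply(node); // stand-in for querying the node
            topology.put(node, peers);
            for (String peer : peers) {
                if (!topology.containsKey(peer)) {
                    toVisit.add(peer);
                }
            }
        }
        return topology;
    }
}
```

A node that knows about the cluster but is unknown to it (such as localhost:5001 in the check below) is not discovered, which mirrors the idea that the view is limited to what the reachable nodes report.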

Running server with SSL

The keystore and keystore password are configured in the emissary.client.EmissaryClient-SSL.cfg file. A sample keystore is included and configured by default so you can test this functionality, but we do not recommend using it in production environments. To use your own keystore, change the configuration values in emissary.client.EmissaryClient-SSL.cfg.
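Before pointing the configuration at your own keystore, it can help to confirm that the file opens with the password you intend to use. The sketch below uses only the standard java.security.KeyStore API; the file path, the keystore type and the KeystoreCheck name are assumptions, and it does not read the Emissary configuration file itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper that opens a keystore and lists its aliases. */
final class KeystoreCheck {
    static List<String> aliases(Path keystore, String type, char[] password)
            throws IOException, GeneralSecurityException {
        KeyStore ks = KeyStore.getInstance(type); // e.g. "PKCS12" or "JKS"
        try (InputStream in = Files.newInputStream(keystore)) {
            // Typically throws IOException if the password is wrong or the file is not of this type.
            ks.load(in, password);
        }
        return Collections.list(ks.aliases());
    }
}
```

For PKCS12 and JKS keystores, a wrong password typically surfaces as an IOException from load, so a failure here is worth resolving before starting a server with --ssl.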

Standalone

./emissary server -p 8443 --ssl

Clustered

./emissary server -p 8443 --ssl --mode cluster
./emissary server -p 9443 --ssl --mode cluster
mkdir ~/Desktop/feed1
./emissary feed -p 7443 --ssl -i ~/Desktop/feed1/

Contact Us

General Questions

If you have any questions or concerns about this project, you can contact us at: [email protected]

Security Questions

For security questions and vulnerability reporting, please refer to SECURITY.md
