Spark job for dependency links

Jaeger Spark dependencies

This is a Spark job that collects spans from storage, analyzes links between services, and stores them for later presentation in the UI. Note that this job is needed for production deployments; the all-in-one distribution does not need it.

This job parses all traces on a given day, based on UTC. By default, it processes the current day, but other days can be explicitly specified.

This repository is based on zipkin-dependencies.

Quick-start

The Spark job can be run as a Docker container or as a Java executable:

Docker:

$ docker run --env STORAGE=cassandra --env CASSANDRA_CONTACT_POINTS=host1,host2 jaegertracing/spark-dependencies

Use --env JAVA_OPTS=-Djavax.net.ssl. to set the trust store and other Java properties.

As jar file:

STORAGE=cassandra java -jar jaeger-spark-dependencies.jar

Usage

By default, this job parses all traces since midnight UTC. You can parse traces for a different day by passing an argument in YYYY-mm-dd format, like 2016-07-16, or by setting the date via the DATE environment variable.

# e.g. run the job to process yesterday's traces on macOS
$ STORAGE=cassandra java -jar jaeger-spark-dependencies.jar `date -uv-1d +%F`
# or on Linux
$ STORAGE=cassandra java -jar jaeger-spark-dependencies.jar `date -u -d '1 day ago' +%F`
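
The two `date` invocations above differ between platforms. As a portable alternative, the same YYYY-mm-dd argument could be computed with a few lines of Python. This is only a sketch: the helper name `utc_day` is hypothetical and not part of the job.

```python
from datetime import datetime, timedelta, timezone


def utc_day(days_ago=0, now=None):
    """Return the UTC calendar day `days_ago` days before `now` as YYYY-mm-dd."""
    # Default to the current instant; convert any aware datetime to UTC first,
    # because the job defines "a day" in UTC.
    now = now or datetime.now(timezone.utc)
    day = now.astimezone(timezone.utc) - timedelta(days=days_ago)
    return day.strftime("%Y-%m-%d")


if __name__ == "__main__":
    # Prints yesterday's UTC date, suitable as the job's date argument.
    print(utc_day(1))
```

The printed value can then be passed as the job's argument or exported as DATE.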

Configuration

jaeger-spark-dependencies applies configuration parameters through environment variables.

The following variables are common to all storage layers:

* `SPARK_MASTER`: Spark master to submit the job to. Defaults to `local[*]`.
* `DATE`: Date in YYYY-mm-dd format. Denotes a day for which dependency links will be created.

Cassandra

Cassandra is used when STORAGE=cassandra.

* `CASSANDRA_KEYSPACE`: The keyspace to use. Defaults to "jaeger_v1_dc1".
* `CASSANDRA_CONTACT_POINTS`: Comma-separated list of hosts or IP addresses that are part of the Cassandra cluster. Defaults to localhost.
* `CASSANDRA_LOCAL_DC`: The local DC to connect to (other nodes will be ignored).
* `CASSANDRA_USERNAME` and `CASSANDRA_PASSWORD`: Cassandra authentication. The job throws an exception on startup if authentication fails.
* `CASSANDRA_USE_SSL`: Enables SSL when set to true. Requires `javax.net.ssl.trustStore` and `javax.net.ssl.trustStorePassword`. Defaults to false.
* `CASSANDRA_CLIENT_AUTH_ENABLED`: If set, enables client authentication on SSL connections. Requires `javax.net.ssl.keyStore` and `javax.net.ssl.keyStorePassword`. Defaults to false.

Example usage:

$ STORAGE=cassandra CASSANDRA_CONTACT_POINTS=localhost:9042 java -jar jaeger-spark-dependencies.jar

Elasticsearch

Elasticsearch is used when STORAGE=elasticsearch.

* `ES_NODES`: A comma-separated list of Elasticsearch hosts advertising HTTP. Defaults to
              localhost. Add a port section if not listening on port 9200. Only one of these hosts
              needs to be available to fetch the remaining nodes in the cluster. It is
              recommended to set this to all the master nodes of the cluster. Use URL format for
              SSL. For example, "https://yourhost:8888"
* `ES_NODES_WAN_ONLY`: Set to true to only use the values set in ES_NODES, for example if your
                       Elasticsearch cluster is in Docker. If you're using a cloud provider
                       such as AWS Elasticsearch, set this to true. Defaults to false.
* `ES_USERNAME` and `ES_PASSWORD`: Elasticsearch basic authentication. Use when X-Pack security
                                   (formerly Shield) is in place. By default no username or
                                   password is provided to elasticsearch.
* `ES_CLIENT_NODE_ONLY`: Set to true to disable Elasticsearch cluster nodes.discovery and enable nodes.client.only.
                         If your Elasticsearch cluster's data nodes only listen on a loopback IP, set this to true.
                         Defaults to false.
* `ES_INDEX_PREFIX`: Index prefix of Jaeger indices. Unset by default.
* `ES_INDEX_DATE_SEPARATOR`: Index date separator of Jaeger indices. The default value is `-`.
                             For example, `.` will find index "jaeger-span-2020.11.25".
* `ES_TIME_RANGE`: How far into the past the job should look for spans. The maximum and default is `24h`.
                   Any value accepted by [date-math](https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#date-math) can be used here, but the anchor is always `now`.
* `ES_USE_ALIASES`: Set to true to use index alias names to read from and write to.
                    Usually required when using rollover indices.
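
To illustrate how `ES_INDEX_DATE_SEPARATOR` changes the index name that is looked up, here is a small sketch in Python. It assumes only what the example above shows: a fixed "jaeger-span-" stem followed by year, month, and day joined by the separator. The function name `span_index_name` is hypothetical, and `ES_INDEX_PREFIX` is deliberately left out because the way it is joined to the name is not described here.

```python
from datetime import date


def span_index_name(day, separator="-"):
    """Build a daily span index name, e.g. jaeger-span-2020.11.25 for '.'."""
    # Zero-pad each date component, then join with the configured separator.
    parts = [f"{day.year:04d}", f"{day.month:02d}", f"{day.day:02d}"]
    return "jaeger-span-" + separator.join(parts)


if __name__ == "__main__":
    print(span_index_name(date(2020, 11, 25)))       # default separator
    print(span_index_name(date(2020, 11, 25), "."))  # ES_INDEX_DATE_SEPARATOR=.
```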

Example usage:

$ STORAGE=elasticsearch ES_NODES=http://localhost:9200 java -jar jaeger-spark-dependencies.jar

Design

At a high-level, this job does the following:

  • read lots of spans from a time period
  • group them by traceId
  • construct a graph using parent-child relationships expressed in span references
  • for each edge (parent span, child span) output (parent service, child service, count)
  • write the results to the database (e.g. dependencies_v2 table in Cassandra)
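
The steps above can be sketched in plain Python, without Spark or a storage backend. This is an illustration of the idea, not the job's actual implementation: the span dictionaries and their keys ("traceId", "spanId", "parentId", "service") are hypothetical, each span is assumed to have at most one parent reference, and every edge is counted, including edges within a single service.

```python
from collections import Counter, defaultdict


def dependency_links(spans):
    """Return sorted (parent service, child service, count) tuples."""
    # Step 1: group spans by trace so parent lookups stay within one trace.
    by_trace = defaultdict(list)
    for span in spans:
        by_trace[span["traceId"]].append(span)

    counts = Counter()
    for trace_spans in by_trace.values():
        # Step 2: index the trace's spans so a parent reference can be resolved.
        service_of = {s["spanId"]: s["service"] for s in trace_spans}
        for span in trace_spans:
            parent_service = service_of.get(span["parentId"])
            if parent_service is None:
                continue  # root span, or the parent is missing from this batch
            # Step 3: one edge per (parent span, child span) pair.
            counts[(parent_service, span["service"])] += 1

    return [(parent, child, n) for (parent, child), n in sorted(counts.items())]


if __name__ == "__main__":
    example = [
        {"traceId": "t1", "spanId": "a", "parentId": None, "service": "frontend"},
        {"traceId": "t1", "spanId": "b", "parentId": "a", "service": "api"},
        {"traceId": "t1", "spanId": "c", "parentId": "b", "service": "db"},
    ]
    for link in dependency_links(example):
        print(link)
```

In the real job the grouping and counting run as distributed Spark operations and the resulting tuples are written back to storage; the sketch only shows the shape of the computation.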

Building locally

To build the job locally and run tests:

./mvnw clean install # if this fails, set SPARK_LOCAL_IP=127.0.0.1
STORAGE=elasticsearch ES_NODES=http://localhost:9200 java -jar jaeger-spark-dependencies/target/jaeger-spark-dependencies-0.0.1-SNAPSHOT.jar
docker build -t jaegertracing/spark-dependencies:latest .

In tests, the version of the Jaeger images can be set via the environment variable JAEGER_VERSION or the system property jaeger.version. By default, tests use the latest images.

License

Apache 2.0 License.
