  • Language: Java
  • License: Apache License 2.0

Spark job that aggregates zipkin spans for use in the UI


zipkin-dependencies

Zipkin Dependencies collects spans from storage, analyzes links between services, and stores them for later presentation in the web UI (e.g. http://localhost:8080/dependency).

This process is implemented as an Apache Spark job. By default, the job parses all traces from the current day in UTC, so you should schedule it to run just before midnight UTC.
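If you trigger the job from a JVM-based scheduler (for example a `ScheduledExecutorService`) instead of cron, the delay until the next run slot can be computed as below. This is a minimal sketch: the `NightlyRunDelay` class and the 23:50 UTC run time are hypothetical choices, not part of zipkin-dependencies.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical helper: computes how long to wait until the next run slot
// shortly before midnight UTC. Not part of zipkin-dependencies.
final class NightlyRunDelay {
  // Assumed run time (UTC). Spans stored after this point are not included
  // in that day's run unless the job is rerun for that date.
  static final LocalTime RUN_AT = LocalTime.of(23, 50);

  /** Returns the time to wait from {@code now} until the next RUN_AT in UTC. */
  static Duration delayUntilNextRun(ZonedDateTime now) {
    ZonedDateTime utcNow = now.withZoneSameInstant(ZoneOffset.UTC);
    ZonedDateTime next = utcNow.with(RUN_AT);
    if (!next.isAfter(utcNow)) {
      next = next.plusDays(1); // today's slot has passed; use tomorrow's
    }
    return Duration.between(utcNow, next);
  }

  public static void main(String[] args) {
    Duration d = delayUntilNextRun(ZonedDateTime.now(ZoneOffset.UTC));
    System.out.println("Next run in " + d.toMinutes() + " minutes");
  }
}
```

The margin before midnight is a trade-off: a later slot covers more of the day, but leaves less time for the job to finish before the UTC date changes.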

All Zipkin Storage Components are supported, including Cassandra, MySQL and Elasticsearch.

Versions

  • STORAGE_TYPE=cassandra3 : requires Cassandra 3.11.3+; tested against the latest patch of 3.11
  • STORAGE_TYPE=mysql : requires MySQL 5.6+; tested against MySQL 5.6
  • STORAGE_TYPE=elasticsearch : requires Elasticsearch 5+; tested against the last minor release of 6.x and 7.x

Quick-start

Due to SPARK-26134, Zipkin Dependencies currently requires Java 1.8 or 9 to run.

The quickest way to get started is to fetch the latest released job as a self-contained jar. For example:

$ curl -sSL https://zipkin.io/quickstart.sh | bash -s io.zipkin.dependencies:zipkin-dependencies:LATEST zipkin-dependencies.jar
$ STORAGE_TYPE=cassandra3 java -jar zipkin-dependencies.jar

You can also start Zipkin Dependencies via Docker.

$ docker run --env STORAGE_TYPE=cassandra3 --env CASSANDRA_CONTACT_POINTS=host1,host2 openzipkin/zipkin-dependencies

Usage

By default, this job parses all traces since midnight UTC. To parse traces for a different day, pass the date as an argument in YYYY-MM-DD format, such as 2016-07-16.

# e.g. to process yesterday's traces on macOS
$ STORAGE_TYPE=cassandra3 java -jar zipkin-dependencies.jar `date -uv-1d +%F`
# or on Linux
$ STORAGE_TYPE=cassandra3 java -jar zipkin-dependencies.jar `date -u -d '1 day ago' +%F`
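When the job is launched from a JVM program rather than a shell, the same YYYY-MM-DD argument for yesterday (UTC) can be produced with `java.time`. A minimal sketch; the `DayArgument` class is hypothetical.

```java
import java.time.Clock;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that builds the day argument accepted by the job,
// equivalent to `date -u -d '1 day ago' +%F` on Linux.
final class DayArgument {
  /** Returns yesterday's date in UTC, formatted as YYYY-MM-DD. */
  static String yesterdayUtc(Clock clock) {
    // Force UTC so the result does not depend on the machine's time zone.
    LocalDate today = LocalDate.now(clock.withZone(ZoneOffset.UTC));
    return today.minusDays(1).format(DateTimeFormatter.ISO_LOCAL_DATE);
  }

  public static void main(String[] args) {
    // Pass this value as the program argument to zipkin-dependencies.jar.
    System.out.println(yesterdayUtc(Clock.systemUTC()));
  }
}
```

Taking a `Clock` parameter keeps the helper testable with a fixed instant.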

Environment Variables

zipkin-dependencies applies configuration parameters through environment variables.

The following variables are common to all storage layers:

* `SPARK_MASTER`: Spark master to submit the job to. Defaults to `local[*]`.
* `ZIPKIN_LOG_LEVEL`: Log level for Zipkin-related status. Defaults to INFO (use DEBUG for details).
* `SPARK_CONF`: Additional Spark configuration, as comma-separated properties in `key=value` form. For example, `spark.executor.heartbeatInterval=600000,spark.network.timeout=600000`.
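`SPARK_CONF` packs several Spark properties into one string. The sketch below shows one way such a value can be split into individual properties. It is an illustration only, not the job's actual parsing code, and it assumes that values contain no commas.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative parser for a SPARK_CONF-style value; not the project's code.
final class SparkConfParser {
  /** Splits "k1=v1,k2=v2" into an ordered map; blank entries are skipped. */
  static Map<String, String> parse(String sparkConf) {
    Map<String, String> result = new LinkedHashMap<>();
    if (sparkConf == null || sparkConf.trim().isEmpty()) return result;
    for (String entry : sparkConf.split(",")) {
      String trimmed = entry.trim();
      if (trimmed.isEmpty()) continue;
      int eq = trimmed.indexOf('=');
      if (eq <= 0) {
        throw new IllegalArgumentException("Expected key=value but got: " + trimmed);
      }
      // Split on the first '=' only, so values may themselves contain '='.
      result.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> conf = parse(System.getenv("SPARK_CONF"));
    conf.forEach((k, v) -> System.out.println(k + " -> " + v));
  }
}
```

Each resulting pair could then be applied to a Spark configuration, for example with `SparkConf.set(key, value)`.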

Cassandra

Cassandra is used when STORAGE_TYPE=cassandra or STORAGE_TYPE=cassandra3.

The following variables apply:

* `CASSANDRA_KEYSPACE`: The keyspace to use. Defaults to "zipkin".
* `CASSANDRA_CONTACT_POINTS`: Comma-separated list of hosts or IP addresses in the Cassandra cluster. Defaults to localhost.
* `CASSANDRA_LOCAL_DC`: The local DC to connect to (other nodes will be ignored).
* `CASSANDRA_USERNAME` and `CASSANDRA_PASSWORD`: Cassandra authentication. The job throws an exception on startup if authentication fails.
* `CASSANDRA_USE_SSL`: When true, requires `javax.net.ssl.trustStore` and `javax.net.ssl.trustStorePassword`. Defaults to false.
* `STRICT_TRACE_ID`: When false, dependency linking only looks at the lower 64 bits of a trace ID. Defaults to true.
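Zipkin trace IDs are 16 or 32 lowercase hex characters (64 or 128 bits). With `STRICT_TRACE_ID=false`, only the lower 64 bits (the rightmost 16 hex characters) identify a trace, so a 128-bit ID and its 64-bit truncation are treated as the same trace. A minimal sketch of that key selection; the `TraceIdKey` class is hypothetical.

```java
import java.util.Locale;

// Hypothetical helper showing which part of a trace ID groups spans into
// one trace depending on STRICT_TRACE_ID. Not the project's actual code.
final class TraceIdKey {
  /**
   * Returns the grouping key: the full ID when strict, otherwise only the
   * lower 64 bits (the rightmost 16 hex characters).
   */
  static String groupingKey(String traceId, boolean strictTraceId) {
    String id = traceId.toLowerCase(Locale.ROOT);
    if (id.length() != 16 && id.length() != 32) {
      throw new IllegalArgumentException("expected 16 or 32 hex chars: " + traceId);
    }
    if (strictTraceId || id.length() == 16) return id;
    return id.substring(16); // drop the high 64 bits
  }

  public static void main(String[] args) {
    String wide = "463ac35c9f6413ad48485a3953bb6124";
    String narrow = "48485a3953bb6124";
    // Strict: different traces. Non-strict: the same trace.
    System.out.println(groupingKey(wide, true).equals(groupingKey(narrow, true)));   // false
    System.out.println(groupingKey(wide, false).equals(groupingKey(narrow, false))); // true
  }
}
```

Non-strict mode helps when some instrumentation still reports 64-bit IDs for traces that others report as 128-bit.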

Example usage:

$ STORAGE_TYPE=cassandra3 CASSANDRA_USERNAME=user CASSANDRA_PASSWORD=pass java -jar zipkin-dependencies.jar

MySQL Storage

MySQL is used when STORAGE_TYPE=mysql. The schema is compatible with Zipkin's MySQL storage component.

* `MYSQL_DB`: The database to use. Defaults to "zipkin".
* `MYSQL_USER` and `MYSQL_PASS`: MySQL authentication. Both default to an empty string.
* `MYSQL_HOST`: Defaults to localhost.
* `MYSQL_TCP_PORT`: Defaults to 3306.
* `MYSQL_USE_SSL`: When true, requires `javax.net.ssl.trustStore` and `javax.net.ssl.trustStorePassword`. Defaults to false.
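The variables above describe a single MySQL endpoint. The sketch below maps them, with the listed defaults, to a MySQL Connector/J JDBC URL. The `MySqlUrl` class and the exact URL shape are assumptions for illustration, not how zipkin-dependencies builds its connection internally.

```java
import java.util.Map;

// Hypothetical mapping from the MYSQL_* variables to a Connector/J URL.
final class MySqlUrl {
  static String jdbcUrl(Map<String, String> env) {
    String host = env.getOrDefault("MYSQL_HOST", "localhost");
    String port = env.getOrDefault("MYSQL_TCP_PORT", "3306");
    String db = env.getOrDefault("MYSQL_DB", "zipkin");
    boolean useSsl = Boolean.parseBoolean(env.getOrDefault("MYSQL_USE_SSL", "false"));
    // useSSL is a Connector/J connection property. Credentials are kept out
    // of the URL and would be passed separately (MYSQL_USER / MYSQL_PASS).
    return "jdbc:mysql://" + host + ":" + port + "/" + db + "?useSSL=" + useSsl;
  }

  public static void main(String[] args) {
    System.out.println(jdbcUrl(System.getenv()));
  }
}
```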

Example usage:

$ STORAGE_TYPE=mysql MYSQL_USER=root java -jar zipkin-dependencies.jar

Elasticsearch Storage

Elasticsearch is used when STORAGE_TYPE=elasticsearch. The schema is compatible with Zipkin's Elasticsearch storage component.

* `ES_INDEX`: The index prefix to use when generating daily index names. Defaults to zipkin.
* `ES_DATE_SEPARATOR`: The separator used when generating dates in index names.
                       Defaults to '-', so the queried index looks like zipkin-yyyy-MM-dd.
                       It could, for example, be changed to '.' to give zipkin-yyyy.MM.dd.
* `ES_HOSTS`: A comma-separated list of Elasticsearch hosts advertising HTTP. Defaults to
              localhost. Add a port section if not listening on port 9200. Only one of these
              hosts needs to be available to fetch the remaining nodes in the cluster. It is
              recommended to set this to all the master nodes of the cluster. Use URL format
              for SSL, for example "https://yourhost:8888".
* `ES_NODES_WAN_ONLY`: Set to true to only use the values set in ES_HOSTS, for example if your
                       Elasticsearch cluster is in Docker. Defaults to false.
* `ES_USERNAME` and `ES_PASSWORD`: Elasticsearch basic authentication. Use when X-Pack security
                                   (formerly Shield) is in place. By default, no username or
                                   password is provided to Elasticsearch.
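The daily index naming described for `ES_INDEX` and `ES_DATE_SEPARATOR` can be sketched as follows. The `DailyIndexName` class is hypothetical and reproduces only the simplified `<prefix>-<yyyy><sep><MM><sep><dd>` pattern above; actual index names used by Zipkin may contain further components.

```java
import java.time.LocalDate;

// Hypothetical helper reproducing the simplified naming pattern:
// <ES_INDEX>-<yyyy><sep><MM><sep><dd>. Illustration only.
final class DailyIndexName {
  static String indexFor(String prefix, char dateSeparator, LocalDate day) {
    return String.format("%s-%04d%c%02d%c%02d",
        prefix, day.getYear(), dateSeparator, day.getMonthValue(),
        dateSeparator, day.getDayOfMonth());
  }

  public static void main(String[] args) {
    LocalDate day = LocalDate.of(2016, 7, 16);
    System.out.println(indexFor("zipkin", '-', day)); // zipkin-2016-07-16
    System.out.println(indexFor("zipkin", '.', day)); // zipkin-2016.07.16
  }
}
```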

Example usage:

$ STORAGE_TYPE=elasticsearch ES_HOSTS=host1,host2 java -jar zipkin-dependencies.jar
# To override the http port, add it to the host string
$ STORAGE_TYPE=elasticsearch ES_HOSTS=host1:9201 java -jar zipkin-dependencies.jar
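The host rules above (plain host names default to HTTP on port 9200; an explicit port or URL overrides that) can be illustrated with a small normalizer. The `EsHosts` class is hypothetical, not the project's code, and it assumes each entry is a valid URI host (for example, no underscores) and ignores any path.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical normalizer for an ES_HOSTS-style value: adds http:// when no
// scheme is given and port 9200 when no port is given.
final class EsHosts {
  static List<String> normalize(String esHosts) {
    List<String> urls = new ArrayList<>();
    for (String raw : esHosts.split(",")) {
      String host = raw.trim();
      if (host.isEmpty()) continue;
      URI uri = URI.create(host.contains("://") ? host : "http://" + host);
      int port = uri.getPort() == -1 ? 9200 : uri.getPort();
      urls.add(uri.getScheme() + "://" + uri.getHost() + ":" + port);
    }
    return urls;
  }

  public static void main(String[] args) {
    System.out.println(normalize("host1,host2"));           // [http://host1:9200, http://host2:9200]
    System.out.println(normalize("host1:9201"));            // [http://host1:9201]
    System.out.println(normalize("https://yourhost:8888")); // [https://yourhost:8888]
  }
}
```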

Custom certificates

When using an https endpoint in ES_HOSTS, you can use the following standard properties to customize the certificates used for the connection:

  • javax.net.ssl.keyStore
  • javax.net.ssl.keyStorePassword
  • javax.net.ssl.trustStore
  • javax.net.ssl.trustStorePassword
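These are standard JSSE system properties, read by the JVM itself. Typically the trustStore pair is what verifies the server's certificate, while the keyStore pair is only needed when the client must present its own certificate. The sketch below is a hypothetical pre-flight check (the `SslPropertiesCheck` class is not part of the project) that reports which of the four properties are unset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check listing which standard JSSE certificate
// properties are unset. It only reports; the JVM consumes the values.
final class SslPropertiesCheck {
  static final String[] KEYS = {
    "javax.net.ssl.keyStore",
    "javax.net.ssl.keyStorePassword",
    "javax.net.ssl.trustStore",
    "javax.net.ssl.trustStorePassword"
  };

  static List<String> missing(Properties props) {
    List<String> result = new ArrayList<>();
    for (String key : KEYS) {
      if (props.getProperty(key) == null) result.add(key);
    }
    return result;
  }

  public static void main(String[] args) {
    // Properties are set with -D flags, e.g. -Djavax.net.ssl.trustStore=...
    System.out.println("Unset: " + missing(System.getProperties()));
  }
}
```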

Building locally

To build the job from source and run it against a local Cassandra in Spark's standalone mode:

# Build the spark jobs
$ ./mvnw -T1C -q --batch-mode -DskipTests -Denforcer.fail=false package
$ STORAGE_TYPE=cassandra java -jar ./main/target/zipkin-dependencies*.jar

Running in a Spark cluster

The jar file produced by this build can also run against a Spark cluster directly. Before anything else, make sure you are running the same version of Spark that this project is built against.

You can use the following command to display what this project is built against:

$ SPARK_VERSION=$(./mvnw help:evaluate -Dexpression=spark.version -q -DforceStdout)
$ echo $SPARK_VERSION
2.4.0

Once you've verified your setup is on the correct version, set the SPARK_MASTER variable:

For example, if you are connecting to spark running on the same host:

$ STORAGE_TYPE=cassandra3 SPARK_MASTER=spark://$HOSTNAME:7077 java -jar zipkin-dependencies.jar

Note that the Zipkin team focuses on tracing, not Spark support. If you have Spark cluster related troubleshooting questions, please use their support tools.

Troubleshooting

When troubleshooting, always set ZIPKIN_LOG_LEVEL=DEBUG: this output is important when figuring out why a trace didn't result in a link.

If you set SPARK_MASTER to something other than local, remember that log output also ends up in the stderr of the workers.

By default, this job uses the value of the system property java.io.tmpdir as the location to store temporary data. If you get java.io.IOException: No space left on device while processing large sets of trace data, you can specify a different location with enough space available using -Djava.io.tmpdir=/other/location.
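Before a large run, it can help to check how much space the temporary directory actually has. A minimal sketch; the `TmpDirSpace` class is hypothetical and only reports on the directory named by `java.io.tmpdir` (including any `-Djava.io.tmpdir` override).

```java
import java.io.File;

// Hypothetical pre-flight check: reports usable space in the directory the
// JVM uses for temporary files.
final class TmpDirSpace {
  /** Returns usable bytes, or 0 if the path does not name a partition. */
  static long usableBytes(String dir) {
    return new File(dir).getUsableSpace();
  }

  public static void main(String[] args) {
    String tmp = System.getProperty("java.io.tmpdir");
    long mib = usableBytes(tmp) / (1024 * 1024);
    System.out.println(tmp + ": " + mib + " MiB usable");
  }
}
```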

Artifacts

All artifacts publish to the group ID "io.zipkin.dependencies". We use a common release version for all components.

Library Releases

Releases are at Maven Central.

Library Snapshots

Snapshots are uploaded to Sonatype after commits to master.

Docker Images

Released versions of zipkin-dependencies are published to Docker Hub as openzipkin/zipkin-dependencies and GitHub Container Registry as ghcr.io/openzipkin/zipkin-dependencies.

See docker for details.
