
Wingtips - Give Your Distributed Systems a Dapper Footprint


Wingtips is a distributed tracing solution for Java 7 and greater based on the Google Dapper paper.

There are a few modules associated with this project:

If you prefer hands-on exploration rather than readmes, the sample applications provide concrete examples of using Wingtips that are simple, compact, and straightforward.

Overview

Distributed tracing is a mechanism to track requests through a network of distributed systems in order to create transparency and shed light on the sometimes complex interactions and behavior of those systems. For example, in a cloud-based microservice architecture a single request can touch dozens or hundreds of servers as it spreads out in a tree-like fashion. Without distributed tracing, it would be very difficult to identify which part(s) of such a complex system were causing performance issues - either in general or for that request in particular.

Distributed tracing provides the capability for near-realtime monitoring or historical analysis of interactions between servers given the necessary tools to collect and interpret the traces. It allows you to easily see where applications are spending their time - e.g. is it mostly in-application work, or is it mostly waiting for downstream network calls?

Distributed tracing can also be used in some cases for error debugging and problem investigations with some caveats. See the relevant section for more information on this topic.

What is a Distributed Trace Made Of?

  • Every distributed trace contains a TraceID that represents the entire request across all servers/microservices it touches. These TraceIDs are generated as probabilistically unique 64-bit integers (longs). In practice these IDs are passed around as unsigned longs in lowercase hexadecimal string format, but conceptually they are just random 64-bit integers.
  • Each unit of work that you want to track in the distributed trace is defined as a Span. Spans are usually broken down into overall-request-time for a given request in a given server/microservice, and downstream-call-time for each downstream call made for that request from the server/microservice. Spans are identified by a SpanID, which is also a pseudorandomly generated 64-bit long.
  • Spans can have Parent Spans, which is how we build a tree of the request's behavior as it branches across server boundaries. The ParentSpanID is set to the parent Span's SpanID.
  • All Spans contain the TraceID of the overall trace they are attached to, whether or not they have Parent Spans.
  • Spans include a SpanName, which is a more human readable indication of what the span was, e.g. GET_/some/endpoint for the overall span for a REST request, or downstream-POST_https://otherservice.com/other/endpoint for the span performing a downstream call to another service.
  • Spans contain timing info - a start timestamp and a duration value.

See the Output and Logging section for an example of what a span looks like when it is logged.
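
As a concrete illustration of the ID format described above, here's a tiny sketch in plain Java (this is just the JDK, not Wingtips API) showing how a pseudorandom 64-bit long maps to the lowercase hexadecimal string form you'll see in logs and headers:

import java.util.concurrent.ThreadLocalRandom;

// Conceptually a trace or span ID is just a pseudorandom 64-bit integer...
long traceId = ThreadLocalRandom.current().nextLong();

// ...but on the wire and in the logs it is represented as an unsigned lowercase hex string.
// Long.toHexString treats the long's bits as unsigned, which matches that representation.
String traceIdAsHexString = Long.toHexString(traceId);

System.out.println(traceId);            // signed decimal form of the raw 64-bit value
System.out.println(traceIdAsHexString); // unsigned lowercase hex form used in headers and logs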

Quickstart and Usage

Generic Application Pseudo-Code

NOTE: The following pseudo-code only applies to thread-per-request frameworks and scenarios. For asynchronous non-blocking scenarios see this section.

import com.nike.wingtips.Span;
import com.nike.wingtips.Tracer;

// ======As early in the request cycle as possible======
try {
    // Determine if a parent span exists by inspecting the request (e.g. request headers)
    Span parentSpan = extractParentSpanFromRequest(request);
    
    // Start the overall request span (which becomes the "current span" for this thread unless/until a sub-span is created)
    if (parentSpan == null)
        Tracer.getInstance().startRequestWithRootSpan("newRequestSpanName");
    else
        Tracer.getInstance().startRequestWithChildSpan(parentSpan, "newRequestSpanName");
        
    // It's recommended that you include the trace ID of the overall request span in the response headers
    addTraceIdToResponseHeaders(response, Tracer.getInstance().getCurrentSpan());
        
    // Execute the normal request logic
    doRequestLogic();
}
finally {
    // ======As late in the request/response cycle as possible======
    Tracer.getInstance().completeRequestSpan(); // Completes the overall request span and logs it to SLF4J
}

Generic Application Pseudo-Code Explanation

In a typical usage scenario you'll want to call one of the Tracer.getInstance().startRequest...() methods as soon as possible when a request enters your application, and you'll want to call Tracer.getInstance().completeRequestSpan() as late as possible in the request/response cycle. In between these two calls the span that was started (the "overall-request-span") is considered the "current span" for this thread and can be retrieved if necessary by calling Tracer.getInstance().getCurrentSpan().

The extractParentSpanFromRequest() method is potentially different for different applications, however for HTTP-based frameworks the pattern is usually the same - look for and extract distributed-tracing-related information from the HTTP headers and use that information to create a parent span. There is a utility method that performs this work for you: HttpRequestTracingUtils.fromRequestWithHeaders(RequestWithHeaders, List<String>). You simply need to provide the HTTP request wrapped by an implementation of RequestWithHeaders and the list of user ID header keys for your application (if any) and it will do the rest using the standard distributed tracing header key constants found in TraceHeaders. See the javadocs for those classes and methods for more information and usage instructions.
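
As a rough sketch of what that might look like for a Servlet-based application (the package locations and the RequestWithHeaders methods shown here are assumptions for illustration - check the RequestWithHeaders and HttpRequestTracingUtils javadocs for the exact contracts):

import com.nike.wingtips.Span;
import com.nike.wingtips.http.HttpRequestTracingUtils;
import com.nike.wingtips.http.RequestWithHeaders;

import javax.servlet.http.HttpServletRequest;
import java.util.Collections;

// Hypothetical adapter that exposes the Servlet request's headers/attributes to Wingtips.
class ServletRequestWithHeaders implements RequestWithHeaders {
    private final HttpServletRequest request;

    ServletRequestWithHeaders(HttpServletRequest request) {
        this.request = request;
    }

    @Override
    public String getHeader(String headerName) {
        return request.getHeader(headerName);
    }

    @Override
    public Object getAttribute(String name) {
        return request.getAttribute(name);
    }
}

// Usage in your request handling code ("some-user-id-header" is just an example header key):
Span parentSpan = HttpRequestTracingUtils.fromRequestWithHeaders(
    new ServletRequestWithHeaders(request),
    Collections.singletonList("some-user-id-header")
);
// parentSpan will be null if the caller didn't send the tracing headers.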

NOTE: Given the thread-local nature of this library you'll want to make sure the span completion call is in a finally block or otherwise guaranteed to be called no matter what (even if the request fails with an error), to prevent problems when subsequent requests are processed on the same thread. The Tracer class does its best to recover from incorrect thread usage scenarios and log information about what happened, but the best solution is to prevent the problems from occurring in the first place. See the section below on try-with-resources for some tips on foolproof ways to safely complete your spans.

Is your application running in a Servlet-based framework?

If your application is running in a Servlet environment (e.g. Spring Boot w/ Spring MVC, raw Spring MVC, Jersey, raw Servlets, etc) then this entire lifecycle can be handled by a Servlet Filter. We've created one for you that's ready to drop in and go - see the wingtips-servlet-api Wingtips plugin module library for details. That plugin module is also a good resource to see how the code for a production-ready implementation of this library might look.

Simplified Span Management using Java try-with-resources Statements

Spans support Java try-with-resources statements to help guarantee proper usage in blocking/non-asynchronous scenarios (for asynchronous scenarios please refer to the asynchronous usage section of this readme). As previously mentioned, Spans that are not properly completed can lead to incorrect distributed tracing information showing up, and the try-with-resources statements guarantee that spans are completed appropriately. Here are some examples (note: there are some important tradeoffs you should consider before using this feature):

Overall request span using try-with-resources

try (Span requestSpan = Tracer.getInstance().startRequestWith*(...)) {
    // Traced blocking code for overall request (not asynchronous) goes here ...
}
// No finally block needed to properly complete the overall request span

Subspan using try-with-resources

try (Span subspan = Tracer.getInstance().startSubSpan(...)) {
    // Traced blocking code for subspan (not asynchronous) goes here ...
}
// No finally block needed to properly complete the subspan

Warning about error handling when using try-with-resources to autoclose spans

The try-with-resources feature for auto-closing spans described above can sound very tempting due to its convenience, but it comes with an important and easy-to-miss tradeoff: the span will be closed before any catch or finally blocks get a chance to execute. So if you need to catch any exceptions and log information about them (for example), then you do not want to use the try-with-resources shortcut, because that logging will not be tagged with the span info of the span it logically falls under. And if you try to retrieve Tracer.getInstance().getCurrentSpan() in a catch or finally block, you'll get the parent span if one exists, or null if there was no parent span. This can be confusing and seem counter-intuitive, but it's the way try-with-resources works and is the price we pay for (ab)using it for convenience in a use case it wasn't originally intended for (Spans are not "resources" in the traditional sense).

Because of these drawbacks, and because it's easy to forget about this caveat and add a catch block at some future date and not get the behavior you expect, it's not recommended that you use this feature as common practice - or if you do make sure you call it out with some in-line comments for the inevitable future when someone tries to add a catch block. Instead it's recommended that you complete the span in a finally block manually as described in the Generic Application Pseudo-Code section. It's a few extra lines of code, but it's simple and prevents confusing unexpected behavior.

Thanks to Adrian Cole for pointing out this danger.

Output and Logging

When Tracer completes a span it will log it to a SLF4J logger named VALID_WINGTIPS_SPANS so you can segregate your span information into a separate file if desired. The following is an example of log output for a valid span (in this case it is a root span because it does not have a parent span ID):

14:02:26.029 [main] INFO  VALID_WINGTIPS_SPANS - [DISTRIBUTED_TRACING] {"traceId":"776d455c76fded18","parentSpanId":"null","spanId":"030ab15d0bb503a8","spanName":"somespan","sampleable":"true","userId":"null","startTimeEpochMicros":"1445720545485958","durationNanos":"543516000"}

If an invalid span is detected due to incorrect usage of Tracer then the invalid span will be logged to a SLF4J logger named INVALID_WINGTIPS_SPANS. These specially-named loggers will not be used for any other purpose.

Automatically attaching trace information to all log messages

In addition to the logs Tracer outputs for completed spans, it puts the trace ID for the "current" span into the SLF4J MDC so that all your log messages can be tagged with the current span's trace ID. To utilize this you need to add %X{traceId} to your log pattern (NOTE: this only works with SLF4J frameworks that support MDC, e.g. Logback and Log4j). This causes all log messages, including ones that come from third party libraries that have no knowledge of distributed tracing, to be output with the current span's tracing information.

Here is an example Logback pattern utilizing the tracing MDC info:

traceId=%X{traceId} %date{HH:mm:ss.SSS} %-5level [%thread] %logger - %m%n

And here is what a log message output would look like when using this pattern:

traceId=520819c556734c0c 14:43:53.483 INFO  [main] com.foo.Bar - important log message

This is one of the primary features and benefits of Wingtips - if you utilize this MDC feature then you'll be able to trivially collect all log messages related to a specific request across all services it touched even if some of those messages came from third party libraries.

A Log4j pattern would look similar - in particular %X{traceId} to access the trace ID in the MDC is identical.

Note: You can adjust which span fields are included in the MDC behavior by calling Tracer.setSpanFieldsForLoggerMdc(...). It's recommended that you always include trace ID, but if you want to also include span ID, parent span ID, or even the full span JSON in the MDC (not recommended), you can.

Changing output format

See the "Changing serialized representation of Spans for the logs" section of this readme for information on how to change the serialization representation used when logging completed spans (i.e. if you want spans to be serialized to a key/value string rather than JSON).

Nested Sub-Spans and the Span Stack

The span information associated with a thread is modeled as a stack, so it's possible to have nested spans inside the overall "request span" described previously. These nested spans are referred to as "sub-spans". So while the overall request span tracks the work done for the overall request, you can use sub-spans to mark work that you want tracked separately from the overall request. You can start and complete sub-spans using the Tracer sub-span methods: Tracer.startSubSpan(String) and Tracer.completeSubSpan().

These nested sub-spans are pushed onto the span stack associated with the current thread and you can have them as deeply nested as you want, but just like with the overall request span you'll want to make sure the completion method is called in a finally block or otherwise guaranteed to be executed even if an error occurs.

Each call to Tracer.startSubSpan(String) causes the "current span" to become the new sub-span's parent, and causes the new sub-span to become the "current span" by pushing it onto the span stack. Each call to Tracer.completeSubSpan() does the reverse by popping the current span off the span stack and completing and logging it, thus causing its parent to become the current span again.

NOTE: Sub-spans must be perfectly nested on any given thread - you must complete a sub-span before its parent can be completed. This is not generally an issue in thread-per-request scenarios due to the synchronous serial nature of the thread-per-request environment. It can fall apart in more complex asynchronous scenarios where multiple threads are performing work for a request at the same time. In those situations you are in the territory described by the reactive/asynchronous nonblocking section and would need to use the methods and techniques outlined in that section to achieve the goal of logical sub-spans without the restriction that they be perfectly nested.

Using sub-spans to surround downstream calls

One common use case for sub-spans is to track downstream calls separately from the overall request (e.g. HTTP calls to another service, database calls, or any other call that crosses network or application boundaries). Start a sub-span immediately before a downstream call and complete it immediately after the downstream call returns. You can inspect the sub-span to see how long the downstream call took from this application's point of view. If you do this around all your downstream calls you can subtract the total time spent for all downstream calls from the time spent for the overall-request-span to determine time spent in this application vs. time spent waiting for downstream calls to finish. And if the downstream service also performs distributed tracing and has an overall-request-span for its service call then you can subtract the downstream service's request-span time from this application's sub-span time around the downstream call to determine how much time was lost to network lag or any other bottlenecks between the services.
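
Here's a minimal sketch of that pattern using the Tracer sub-span methods mentioned earlier (callDownstreamService() is just a placeholder for your actual downstream call, and depending on your Wingtips version startSubSpan may also require a Span.SpanPurpose argument - check the Tracer javadocs):

import com.nike.wingtips.Span;
import com.nike.wingtips.Tracer;

Span downstreamCallSpan = Tracer.getInstance().startSubSpan("downstream-POST_https://otherservice.com/other/endpoint");
try {
    // The downstream call (HTTP request, database query, etc.) goes here.
    callDownstreamService();
}
finally {
    // Completes and logs the sub-span, and makes its parent the "current span" again.
    Tracer.getInstance().completeSubSpan();
}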

Propagating Distributed Traces Across Network or Application Boundaries

You can use the sub-span ability to create parent-child span relationships within an application, but the real utility of a distributed tracing system doesn't show itself until the trace crosses a network or application boundary (e.g. downstream calls to other services as part of a request).

The pattern for passing traces across network or application boundaries is that the calling service includes its "current span" information when calling the downstream service. The downstream service uses that information to generate its overall request span with the caller's span info as its parent span. This causes the downstream service's request span to contain the same Trace ID and sets up the correct parent-child relationship between the spans.

Propagating tracing info for HTTP client requests

TL;DR

Wingtips uses the Zipkin/B3 specification for HTTP tracing propagation by setting specially-named request headers to the values of the caller's Span. There is a handy HttpRequestTracingUtils.propagateTracingHeaders(...) helper method that performs this work so that it conforms to the B3 spec - all you need to do is wrap your request or headers object in an implementation of the HttpObjectForPropagation interface and pass it to HttpRequestTracingUtils.propagateTracingHeaders(...) along with your current span. The helper method will set the B3 headers on your request to the appropriate values based on the Span you pass in.

Details

If you don't want to use the helper method described above, here are the technical details on how to propagate tracing info on an HTTP client request yourself:

For HTTP requests it is assumed that you will pass the caller's span information to the downstream system using the request headers defined in TraceHeaders. Most headers defined in that class should be included in the downstream request (see below), as well as any application specific user ID header if you want to take advantage of the optional user ID functionality of spans. Note that to be properly B3 compatible (and therefore compatible with Zipkin systems) you should send the X-B3-Sampled header value as "0" for false and "1" for true. All the other header values should be correct if you send the appropriate Span property as-is, e.g. send Span.getTraceId() for the X-B3-TraceId header as it should already be in the correct B3 format.

Propagation requirements:

  • Trace ID: [required]
  • Span ID: [required]
  • Sampleable: [optional but recommended]
  • Parent Span ID: [optional when non-null, don't propagate when null]
  • Span Name: [highly optional] - Propagating span name is optional, and you may wish to intentionally include or exclude it depending on whether you want downstream systems to have access to that info. When calling downstream services you control it may be good to include it for extra debugging info, and for downstream services outside your control you may wish to exclude it to prevent unintentional information leakage.
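
If you're setting these headers by hand, a sketch of the propagation might look like the following (the literal header names match the B3 headers described above and are also available as constants in TraceHeaders; setRequestHeader(...) is a placeholder for however your HTTP client sets a header, and the Span accessor names are based on the span fields described earlier in this readme):

import com.nike.wingtips.Span;
import com.nike.wingtips.Tracer;

Span currentSpan = Tracer.getInstance().getCurrentSpan();

setRequestHeader("X-B3-TraceId", currentSpan.getTraceId());                // required
setRequestHeader("X-B3-SpanId", currentSpan.getSpanId());                  // required
setRequestHeader("X-B3-Sampled", currentSpan.isSampleable() ? "1" : "0");  // optional but recommended
if (currentSpan.getParentSpanId() != null) {
    // Only propagate the parent span ID when it's non-null.
    setRequestHeader("X-B3-ParentSpanId", currentSpan.getParentSpanId());
}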

Tooling to help with tracing propagation

The following Wingtips modules have helpers to simplify tracing propagation when using their respective technologies:

Adjusting Behavior and Execution Options

Sampling

Although this library is efficient and does not take much time for any single request, you may find it causes an unacceptable performance hit on high-traffic, latency-sensitive applications that process thousands of requests per second (depending on the SLAs of the service), usually due to the I/O cost of writing the log messages for every request to disk in high throughput scenarios. Google found that they could achieve the main goals of distributed tracing without negatively affecting performance in these cases by implementing trace sampling - i.e. only processing a certain percentage of requests rather than all requests.

If you find yourself in this situation you can adjust the sampling rate by calling Tracer.getInstance().setRootSpanSamplingStrategy(RootSpanSamplingStrategy) and passing in a RootSpanSamplingStrategy that implements the sampling logic necessary for your use case. To achieve the maximum benefit you could implement an adaptive/dynamic sampling strategy that increases the sampling rate during low traffic periods and lessens the sampling rate during high traffic periods.
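
For example, a fixed-percentage strategy might look like the following sketch (the package location and the isNextRootSpanSampleable() method name are assumptions - check the RootSpanSamplingStrategy javadocs for the exact contract):

import com.nike.wingtips.Tracer;
import com.nike.wingtips.sampling.RootSpanSamplingStrategy;

import java.util.concurrent.ThreadLocalRandom;

// Samples roughly 1% of root spans. An adaptive strategy could adjust this rate based on current traffic.
public class OnePercentSamplingStrategy implements RootSpanSamplingStrategy {
    @Override
    public boolean isNextRootSpanSampleable() {
        return ThreadLocalRandom.current().nextDouble() < 0.01;
    }
}

// During application startup:
Tracer.getInstance().setRootSpanSamplingStrategy(new OnePercentSamplingStrategy());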

Many (most?) services will not notice or experience any performance hit for using this library to sample all requests (the default behavior), especially if you use asynchronous logging features with your SLF4J implementation. It's rare to find a service that needs to handle the combination of volume, throughput, and low-latency requirements of Google's services, therefore testing is recommended to verify that your service is suffering an unacceptable performance hit due to distributed tracing before adjusting sampling rates, and it's also recommended that you read the Google Dapper paper to understand the challenges Google faced and how they solved them with sampling.

Notification of span lifecycle events

You can be notified of span lifecycle events when spans are started, sampled, and completed (i.e. for metrics counting) by adding a listener via Tracer.addSpanLifecycleListener(SpanLifecycleListener).

NOTE: It's important that any SpanLifecycleListener you add is extremely lightweight or you risk having the distributed tracing system become a major bottleneck for high throughput services. If any expensive work needs to be done in a SpanLifecycleListener then it should be done asynchronously on a dedicated thread or threadpool separate from the application worker threads.
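
As a sketch, a lightweight metrics-counting listener might look like this (the listener method names are assumptions based on the started/sampled/completed events described above, and the package location is assumed - check the SpanLifecycleListener javadocs for the exact interface):

import com.nike.wingtips.Span;
import com.nike.wingtips.Tracer;
import com.nike.wingtips.lifecyclelistener.SpanLifecycleListener;

import java.util.concurrent.atomic.AtomicLong;

// Keeps the work done in the listener extremely lightweight - just in-memory counter bumps.
public class SpanCountingLifecycleListener implements SpanLifecycleListener {
    private final AtomicLong spansStarted = new AtomicLong();
    private final AtomicLong spansCompleted = new AtomicLong();

    @Override
    public void spanStarted(Span span) {
        spansStarted.incrementAndGet();
    }

    @Override
    public void spanSampled(Span span) {
        // No-op for this example.
    }

    @Override
    public void spanCompleted(Span span) {
        spansCompleted.incrementAndGet();
    }
}

// During application startup:
Tracer.getInstance().addSpanLifecycleListener(new SpanCountingLifecycleListener());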

Changing serialized representation of Spans for the logs

Normally when a span is completed it is serialized to JSON and output to the logs. If you want spans to be output with a different representation such as key/value string, you can call Tracer.setSpanLoggingRepresentation(SpanLoggingRepresentation), after which all subsequent spans that are logged will be serialized to the new representation.
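
For example (assuming the key/value option is exposed as an enum value along the lines of KEY_VALUE - check the SpanLoggingRepresentation javadocs for the exact values):

import com.nike.wingtips.Tracer;
import com.nike.wingtips.Tracer.SpanLoggingRepresentation;

// Switch completed-span logging from the default JSON representation to a key/value string.
Tracer.getInstance().setSpanLoggingRepresentation(SpanLoggingRepresentation.KEY_VALUE);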

Span Tags

Tags allow for key-value pairs to be associated with a span as metadata, often useful for filtering or grouping trace information, or integrating with a visualization/analytics system that expects certain tags. For example, say you've instrumented retry logic. It may be desirable to know how many retries were attempted. Or you may want to tag your spans based on an attribute of the request, like user type or an authenticated flag.

Tracer.getInstance().getCurrentSpan().putTag("UserType", user.getType());

Both keys and values are stored as strings. Calling Span.putTag(...) will replace any existing value for the key, or add the new key value pair if one with that key doesn't already exist.

NOTE: If you want to record the time when some event occurred, you should probably use a timestamped annotation instead of a tag. This can provide extra benefits, especially when using visualization or analytics systems that parse timestamped annotations and do interesting things with them.

HTTP Span Tag and Naming Strategies and Adapters

HTTP Span Tag and Naming Strategies are a set of classes that apply consistent tagging and span naming automatically to every span based on data extracted from the HTTP request and response. There are a few default HttpTagAndSpanNamingStrategy implementations you can take advantage of:

  • ZipkinHttpTagStrategy - Based on Zipkin's documentation about span data policy in HTTP instrumentation; it makes sure your spans are named and tagged such that a Zipkin server can work with them the way it expects. For any additional tags you wish to implement, you can reference Wingtips' KnownZipkinTags class to access the relevant constants without having to pull in Zipkin dependencies.
  • OpenTracingHttpTagStrategy - This is currently effectively a subset of the ZipkinHttpTagStrategy (same tag names and values, just fewer of them). It may diverge in the future. This uses constants from the OpenTracing Tags class. You can reference Wingtips' KnownOpenTracingTags class to access these constants without having to pull in OpenTracing dependencies.
  • NoOpHttpTagStrategy - Does nothing when called. Use this if you want to turn off all span tagging.

Note that the ZipkinHttpTagStrategy and OpenTracingHttpTagStrategy implementations instrument a subset of the total known tags - if there are other tags that you need, you are free to add them.

These tag and naming strategies are reusable for both server-side and client-side span tagging. They don't care about what the request and response objects are, but they do need information from the request and response. This gap is bridged with HttpTagAndSpanNamingAdapter, which is implemented for each individual request/response class type for a given HTTP server or client library. The tag and naming adapter is given to the tag and naming strategy in order to extract the necessary data from the request and response.

We currently have the following default adapters you can use:

Creating new HttpTagAndSpanNamingStrategy and/or HttpTagAndSpanNamingAdapter classes or extending existing ones is not difficult - see the javadocs on those classes.

The Wingtips instrumentation for various HTTP server and client frameworks/libraries will all default to using ZipkinHttpTagStrategy and the appropriate adapter for the framework/library. If you want something else, they all have ways to configure them to use different tag and naming strategies and/or custom adapters. See the javadocs on those classes for details on how to customize the strategy and/or adapter that gets used:

Default HTTP Tags

The default tag strategy is ZipkinHttpTagStrategy. Here's a quick rundown of the tags you get (more info on these tags can be found in KnownZipkinTags):

  • http.method - The HTTP request method used. Example value: GET
  • http.path - The HTTP path of the request (path only, not full URL). Example value: /some/path
  • http.url - The full URL of the request, including scheme, host, path, and query string. Example value: http://some.host/some/path?fooQueryParam=bar
  • http.route - The low-cardinality "template" version of the path, i.e. /some/path/{id} instead of /some/path/12345. This tag will only show up if the library or framework provides a mechanism to determine the path template. If available, it will also be used to help name the span. Example value: /some/path/{id}
  • http.status_code - The response HTTP status code. Example value: 200
  • error - Only exists if the request is considered an error. If an exception occurred then its message or classname will be used as the tag value. If no exception occurred then the request can still be considered an error if HttpTagAndSpanNamingAdapter.getErrorResponseTagValue(...) returns a non-empty value, and that value will be used as the tag value. Most HTTP client adapters consider 4xx or 5xx response codes to indicate an error, while server adapters usually only consider 5xx to be an error. In either case adapters often use the response HTTP status code as the error tag value. Example values: An error occurred while doing foo, FooException, or 500

Custom Timestamped Span Annotations

In addition to tags, Wingtips Spans support timestamped annotations. Timestamped annotations are arbitrary notes with a timestamp attached, usually for the purpose of recording when certain events occurred. These will be output along with the usual Wingtips Span data and tags.

The usual mechanism for recording a timestamped annotation is to make a Span.addTimestampedAnnotationForCurrentTime(String) method call to record an annotation for the current time when the event occurs, e.g.:

requestSpan.addTimestampedAnnotationForCurrentTime("retry.initiated");

The timestamps for annotations created in this fashion have microsecond precision relative to the Span's start timestamp (which usually only has millisecond precision due to JVM System clock limitations).

If you need to record an annotation for a different timestamp than when you're adding the annotation, you can create a TimestampedAnnotation manually, and then call Span.addTimestampedAnnotation(TimestampedAnnotation).

NOTE: The most important and common use case for timestamped events is knowing when a client sent a request vs when the server received it (and vice versa on the response). Although you could use the custom timestamped annotations feature described above to do this, it's better to surround a client request with a sub-span and make sure the called service creates an overall request span for itself as well. This technique is described in the "using sub-spans to surround downstream calls" section. Using subspans for request/response timing instead of custom annotations is more reliable (less chance of forgetting to add the important annotations), prevents polluting a server span with client span tags or annotations (and vice versa), prevents incompatibilities with various tracing visualization and analytics systems that might not use the annotation names you picked, and is conceptually more consistent (server calls and client calls should be separate spans).

Usage in Reactive Asynchronous Nonblocking Scenarios

Due to the thread-local nature of this library it is more effort to integrate with reactive (asynchronous non-blocking) frameworks like Netty or actor frameworks than with thread-per-request frameworks. But it is not terribly difficult and the benefit of having all your log messages automatically tagged with tracing information is worth the effort. The Tracer class provides the following methods to help integrate with reactive frameworks:

  • Tracer.registerWithThread(Deque)
  • Tracer.unregisterFromThread()
  • Tracer.getCurrentSpanStackCopy()
  • Tracer.getCurrentTracingStateCopy() (not strictly necessary, but helpful for convenience)

See the javadocs on those methods for more detailed usage information, but the general pattern would be to call registerWithThread(Deque) with the request's span stack whenever a thread starts to do some chunk of work for that request, and call unregisterFromThread() when that chunk of work is done and the thread is about to be freed up to work on a different request. The span stack would need to follow the request no matter what thread was processing it, but assuming you can solve that problem in a reactive framework then the general pattern works well.
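
A bare-bones sketch of that pattern, using only the Tracer methods listed above (doChunkOfWorkForRequest() is a placeholder; in practice you'd likely also want to propagate the SLF4J MDC contents, which is what the wingtips-java8 helpers described below handle for you):

import com.nike.wingtips.Span;
import com.nike.wingtips.Tracer;

import java.util.Deque;

// On the thread that currently owns the request's tracing state:
Deque<Span> spanStackForRequest = Tracer.getInstance().getCurrentSpanStackCopy();

// Later, on whatever thread picks up the next chunk of work for that request:
Tracer.getInstance().registerWithThread(spanStackForRequest);
try {
    // Log messages emitted here will be tagged with the request's tracing info.
    doChunkOfWorkForRequest();
}
finally {
    // Free this thread up so it can safely process work for other requests.
    Tracer.getInstance().unregisterFromThread();
}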

NOTE: The wingtips-java8 module contains numerous helpers to make dealing with async scenarios easy. See that module's readme and the javadocs for AsyncWingtipsHelper for full details, however here's some code examples for a few common use cases:

  • An example of making the current thread's tracing and MDC info hop to a thread executed by an Executor:
import static com.nike.wingtips.util.asynchelperwrapper.RunnableWithTracing.withTracing;

// ...

// Just an example - please use an appropriate Executor for your use case.
Executor executor = Executors.newSingleThreadExecutor(); 

executor.execute(withTracing(() -> {
    // Code that needs tracing/MDC wrapping goes here
}));
  • Or use ExecutorServiceWithTracing so you don't forget to wrap your Runnables or Callables (WARNING: be careful if you have to spin off work that shouldn't automatically inherit the calling thread's tracing state, e.g. long-lived background threads - in those cases you should not use an ExecutorServiceWithTracing to spin off that work):
import static com.nike.wingtips.util.asynchelperwrapper.ExecutorServiceWithTracing.withTracing;

// ...

// Just an example - please use an appropriate Executor for your use case.
Executor executor = withTracing(Executors.newSingleThreadExecutor());

executor.execute(() -> {
    // Code that needs tracing/MDC wrapping goes here
});
  • A similar example using CompletableFuture:
import static com.nike.wingtips.util.asynchelperwrapper.SupplierWithTracing.withTracing;

// ...

CompletableFuture.supplyAsync(withTracing(() -> {
    // Supplier code that needs tracing/MDC wrapping goes here.
    return foo;
}));
  • There's a ScheduledExecutorServiceWithTracing that extends ExecutorServiceWithTracing and implements ScheduledExecutorService, for when you need a scheduler that supports automatic Wingtips tracing state propagation.
import static com.nike.wingtips.util.asynchelperwrapper.ScheduledExecutorServiceWithTracing.withTracing;

// ...

// Just an example - please use an appropriate ScheduledExecutorService for your use case.
ScheduledExecutorService scheduler = withTracing(Executors.newSingleThreadScheduledExecutor());

scheduler.schedule(() -> {
    // Code that needs tracing/MDC wrapping goes here
}, 42, TimeUnit.SECONDS);
  • This example shows how you might accomplish tasks in an environment where the tracing information is attached to some request context, and you need to temporarily attach the tracing info in order to do something (e.g. log some messages with tracing info automatically added using MDC):
import static com.nike.wingtips.util.AsyncWingtipsHelperStatic.runnableWithTracing;

// ...

TracingState tracingInfo = requestContext.getTracingInfo();
runnableWithTracing(
    () -> {
        // Code that needs tracing/MDC wrapping goes here
    },
    tracingInfo
).run();
  • If you have a third party library that hands back a CompletableFuture when performing work (like a database call) but doesn't provide hooks for distributed tracing, then you can surround that CompletableFuture with a child span using the wrapCompletableFutureWithSpan helper method (see the javadocs on wrapCompletableFutureWithSpan and OperationWrapperOptions for full details):
import static com.nike.wingtips.util.AsyncWingtipsHelperStatic.wrapCompletableFutureWithSpan;

// ...

OperationWrapperOptions<DbRecord> options = OperationWrapperOptions
    .<DbRecord>newBuilder("mydb-get-foo", SpanPurpose.CLIENT)
    .withSpanTagger((span, payload) -> {
        span.putTag("db.instance", "myapp.foodb");
        span.putTag("db.type", "FooDB");
    })
    .build();   

CompletableFuture<DbRecord> recordFuture = wrapCompletableFutureWithSpan(
    options,
    // fooDbClient.queryFoo() returns the CompletableFuture<DbRecord> we want wrapped in a span.
    //      We simply turn it into a Supplier so that wrapCompletableFutureWithSpan() can kick it off inside
    //      after starting the span wrapper.
    () -> fooDbClient.queryFoo() 
);
  • If you want to use the link and unlink methods manually to wrap some chunk of code, the general procedure looks like this:
import static com.nike.wingtips.util.AsyncWingtipsHelperStatic.linkTracingToCurrentThread;
import static com.nike.wingtips.util.AsyncWingtipsHelperStatic.unlinkTracingFromCurrentThread;

// ...

TracingState originalThreadInfo = null;
try {
    originalThreadInfo = linkTracingToCurrentThread(...);
    // Code that needs tracing/MDC wrapping goes here
}
finally {
    unlinkTracingFromCurrentThread(originalThreadInfo);
}

ALSO NOTE: wingtips-core does contain a small subset of the async helper functionality described above for the bits that are Java 7 compatible, such as Runnable, Callable, and ExecutorService. See AsyncWingtipsHelperJava7 if you're in a Java 7 environment and cannot upgrade to Java 8. If you're in Java 8, please use AsyncWingtipsHelper or AsyncWingtipsHelperStatic rather than AsyncWingtipsHelperJava7.

Using Distributed Tracing to Help with Debugging Issues/Errors/Problems

If an application is set up to fully utilize the functionality of Wingtips then all log messages will include the TraceID for the distributed trace associated with that request. The TraceID should also be returned as a response header, so if you get the TraceID for a given request you can go log diving to find all log messages associated with that request and potentially discover where things went wrong. There are some potential drawbacks to using distributed tracing as a debugging tool:

  • Not all requests are guaranteed to be traced. Depending on a service's throughput, SLA requirements, etc, it may need to implement trace sampling where only a percentage of requests are traced. Distributed tracing was primarily designed as a monitoring tool so sampling tradeoffs may need to be made to keep the overhead to an acceptable level. Google went as low as 0.01% sampling for some of their most high traffic and latency-sensitive services.
  • Even if a given request is sampled it is not guaranteed that the logs will contain any messages that are helpful to the investigation.
  • The trace IDs are pseudorandomly generated 64-bit long numbers, so they are probabilistically unique but not guaranteed to be 100% unique. The likelihood of a collision goes up quickly the more traces you're considering (see the Birthday Paradox/Problem), so you have to be careful to limit your searches to a reasonable amount of time, and keep in mind that while the chance of a collision for a specific ID is very low, it is technically possible.

That said, it can be extremely helpful in many cases for debugging or error investigation and is a benefit that should not be overlooked.

Integrating With Other Distributed Tracing Tools

The logging behavior of Wingtips is somewhat useful without any additions - you can search or parse the distributed tracing output logs manually for any number of purposes directly on the server where the logs are output. There are limits to this approach however, especially as the number of servers in your ecosystem increases, and there are other distributed tracing tools out there for aggregating, searching, and visualizing distributed tracing information or logs in general that you might want to take advantage of. You should also be careful of doing too much on a production server - generally you want to do your searching and analysis offline on a different server both for convenience and to prevent disrupting production systems.

The typical way this goal is accomplished is to have a separate process on the server monitor the distributed tracing logs and pipe the information to an outside aggregator or collector for asynchronous/offline processing. You can use a general-purpose log aggregator that parses the application and tracing logs from all your services and exposes them via a search interface, or you can use a distributed-tracing-specific tool like Zipkin, which is purpose-built for working with distributed trace spans, or any number of other possibilities.

Wingtips now contains some plug-and-play Zipkin support that makes sending spans to Zipkin servers easy. The wingtips-zipkin2 submodule's readme contains full details, but here's a quick example showing how you would configure Wingtips to send spans to a Zipkin server listening at http://localhost:9411:

Tracer.getInstance().addSpanLifecycleListener(
    new WingtipsToZipkinLifecycleListener("some-service-name", 
                                          "http://localhost:9411")
);

Just execute that line as early in your application startup procedure as possible (ideally before any requests hit the service that would generate spans) and you'll see the Wingtips spans show up in the Zipkin UI.

Sample Applications

The following sample applications show how Wingtips can be used in various frameworks and use cases. The VerifySampleEndpointsComponentTest component tests in the sample apps exercise important parts of Wingtips functionality - you can learn a lot by running those component tests, seeing what the sample apps return and the log messages they output, and exploring the associated endpoints in the sample apps to see how it all fits together.

See the sample app readmes for further information on building and running the sample apps as well as things to try:

License

Wingtips is released under the Apache License, Version 2.0

star