JTune - a high precision Java CMS optimizer
NOTE
Version 4.0 removes Python 2.0 compat and is now Py3+ only. There are no feature additions. If you're running under a Python 2 environment, you must run a version of JTune prior to v4.
Overview
JTune is a tool that will help you tune and troubleshoot a running JVM (Java 6 - Java 8) without restarting. It currently doesn't work with the G1 garbage collector, and will error out if this is detected. Tuning is based on two metrics: the aggregate time spent doing GCs, and the standard deviation of the GCs. Upon invocation, JTune captures the output of jstat for the given pid as well as the GC log data during the sample period.
Installation
The easiest way to install JTune is via pip
. A pip install
will fetch the latest version of JTune from the PyPI repo.
pip install jtune
The latest development branch can be installed via setuptools, by performing the following.
git clone https://github.com/linkedin/JTune.git
cd JTune
python setup.py install
If you wish to run JTune as a one time action, all JTune logic is present in a single file (jtune.py). Simply copying jtune.py to your target host will work as well.
It is recommended that you install JTune into a Python virtualenv
.
Options Help
In normal use, run JTune with the -p <pid> parameter, and it will run indefinitely. When you are ready, hit CTRL-C, and detailed information about its findings will be printed, along with recommendations for improvement (if any).
There are additional options that you can take advantage of. See below output:
$ jtune -h
usage: jtune [-h] [-o OPTIMIZE] [-P] [-s FGC_STOP_COUNT]
[-y YGC_STOP_COUNT] [-c STOP_COUNT] [-n] (-r [FILE] | -p PID)
Run jstat w/ analytics
optional arguments:
-h, --help show this help message and exit
-o OPTIMIZE, --optimize OPTIMIZE
Optimize for latency or throughput (range 0-11, 0 =
ygc @ 180/min, 11 = ygc @ 1/min). Floats allowed.
-P, --no-paste Don't save the screen output to the paste service
-s FGC_STOP_COUNT, --fgc-stop-count FGC_STOP_COUNT
How many full gcs should happen before I stop (very
important for analytics)
-y YGC_STOP_COUNT, --ygc-stop-count YGC_STOP_COUNT
How many young gcs should happen before I stop
-c STOP_COUNT, --stop-count STOP_COUNT
How many iterations of jstat to run before stopping
-n, --no-jstat-output
Do not show jstat output - only print summary
-p PID, --pid PID Which java PID should I attach to
- You can also have it stop after X number of YGCs, FGCs, or jstat iterations (-y, -s, -c respectively). If you want it to make tuning suggestions, you'll want to let it run for at least 3 FGCs (-s <#>) before exiting.
- There may be cases where you want jtune to optimize for a given number of CMS GCs, you can do this with the '-o #' parameter. Right now you can specify a range between 0 and 11 which corresponds to the 180 CMS/min to 1 CMS/min respectively. In most cases you can leave it as default. The way this parameter is used will likely change.
Command Output
Here's an example of a JTune run for a test instance (broken up into chunks)
- JTune is running against PID 25815 for 40 iterations, exit, and report its findings:
$ jtune -c 40 -p 25815
#####
# Start Time: 2015-03-23 12:31:45.079102 GMT
# Host: fake-host.linkedin.com
#####
EC EP EU S0C/S1C S0U S1U OC OP OU MC MU YGC YGCD FGC FGCD
~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~ ~~~ ~~~~
1.1G 96.0% 1.1G 117.3M 76.5M 0K 13.6G 50.6% 6.9G 90M 88.2M 71876 - 138 -
1.1G 28.0% 329M 117.3M 0K 9.5M 13.6G 50.7% 6.9G 90M 88.2M 71877 +1 138 -
1.1G 51.5% 604.7M 117.3M 0K 9.5M 13.6G 50.7% 6.9G 90M 88.2M 71877 - 138 -
1.1G 55.1% 646.4M 117.3M 0K 9.5M 13.6G 50.7% 6.9G 90M 88.2M 71877 - 138 -
1.1G 74.5% 874.6M 117.3M 0K 9.5M 13.6G 50.7% 6.9G 90M 88.2M 71877 - 138 -
1.1G 88.1% 1G 117.3M 0K 9.5M 13.6G 50.7% 6.9G 90M 88.2M 71877 - 138 -
1.1G 92.5% 1.1G 117.3M 0K 9.5M 13.6G 50.7% 6.9G 90M 88.2M 71877 - 138 -
1.1G 15.6% 182.8M 117.3M 17M 0K 13.6G 50.7% 6.9G 90M 88.2M 71878 +1 138 -
1.1G 50.3% 589.7M 117.3M 17M 0K 13.6G 50.7% 6.9G 90M 88.2M 71878 - 138 -
1.1G 60.8% 713.8M 117.3M 17M 0K 13.6G 50.7% 6.9G 90M 88.2M 71878 - 138 -
1.1G 65.4% 767.2M 117.3M 17M 0K 13.6G 50.7% 6.9G 90M 88.2M 71878 - 138 -
1.1G 66.0% 774.7M 117.3M 17M 0K 13.6G 50.7% 6.9G 90M 88.2M 71878 - 138 -
1.1G 78.6% 922.5M 117.3M 17M 0K 13.6G 50.7% 6.9G 90M 88.2M 71878 - 138 -
1.1G 5.1% 59.7M 117.3M 0K 14.4M 13.6G 50.7% 6.9G 90M 88.2M 71879 +1 138 -
1.1G 28.5% 335M 117.3M 0K 14.4M 13.6G 50.7% 6.9G 90M 88.2M 71879 - 138 -
1.1G 63.3% 742.9M 117.3M 0K 14.4M 13.6G 50.7% 6.9G 90M 88.2M 71879 - 138 -
1.1G 67.5% 791.6M 117.3M 0K 14.4M 13.6G 50.7% 6.9G 90M 88.2M 71879 - 138 -
1.1G 74.2% 870.4M 117.3M 0K 14.4M 13.6G 50.7% 6.9G 90M 88.2M 71879 - 138 -
1.1G 3.2% 38.1M 117.3M 48.6M 0K 13.6G 50.7% 6.9G 90M 88.2M 71880 +1 138 -
1.1G 8.9% 104.2M 117.3M 48.6M 0K 13.6G 50.7% 6.9G 90M 88.2M 71880 - 138 -
1.1G 27.9% 327.2M 117.3M 48.6M 0K 13.6G 50.7% 6.9G 90M 88.2M 71880 - 138 -
1.1G 29.5% 346.1M 117.3M 48.6M 0K 13.6G 50.7% 6.9G 90M 88.2M 71880 - 138 -
1.1G 35.2% 413.6M 117.3M 48.6M 0K 13.6G 50.7% 6.9G 90M 88.2M 71880 - 138 -
1.1G 53.5% 628.1M 117.3M 48.6M 0K 13.6G 50.7% 6.9G 90M 88.2M 71880 - 138 -
1.1G 1.0% 12.1M 117.3M 0K 47.3M 13.6G 50.7% 6.9G 90M 88.2M 71881 +1 138 -
1.1G 19.2% 225.7M 117.3M 0K 47.3M 13.6G 50.7% 6.9G 90M 88.2M 71881 - 138 -
1.1G 72.6% 852.3M 117.3M 0K 47.3M 13.6G 50.7% 6.9G 90M 88.2M 71881 - 138 -
1.1G 79.5% 933.1M 117.3M 0K 47.3M 13.6G 50.7% 6.9G 90M 88.2M 71881 - 138 -
1.1G 5.6% 66M 117.3M 53.3M 0K 13.6G 50.7% 6.9G 90M 88.2M 71882 +1 138 -
1.1G 36.2% 424.9M 117.3M 53.3M 0K 13.6G 50.7% 6.9G 90M 88.2M 71882 - 138 -
1.1G 47.5% 557.2M 117.3M 53.3M 0K 13.6G 50.7% 6.9G 90M 88.2M 71882 - 138 -
1.1G 57.0% 669.3M 117.3M 53.3M 0K 13.6G 50.7% 6.9G 90M 88.2M 71882 - 138 -
1.1G 66.9% 785.5M 117.3M 53.3M 0K 13.6G 50.7% 6.9G 90M 88.2M 71882 - 138 -
1.1G 87.1% 1022.3M 117.3M 53.3M 0K 13.6G 50.7% 6.9G 90M 88.2M 71882 - 138 -
1.1G 99.8% 1.1G 117.3M 53.3M 0K 13.6G 50.7% 6.9G 90M 88.2M 71882 - 138 -
1.1G 36.7% 430.6M 117.3M 0K 78M 13.6G 50.7% 6.9G 90M 88.2M 71883 +1 138 -
1.1G 78.1% 915.9M 117.3M 0K 78M 13.6G 50.7% 6.9G 90M 88.2M 71883 - 138 -
1.1G 19.7% 231.4M 117.3M 117.3M 0K 13.6G 50.9% 6.9G 90M 88.2M 71884 +1 138 -
1.1G 68.5% 804.3M 117.3M 117.3M 0K 13.6G 50.9% 6.9G 90M 88.2M 71884 - 138 -
1.1G 87.2% 1022.7M 117.3M 117.3M 0K 13.6G 50.9% 6.9G 90M 88.2M 71884 - 138 -
* Reading gc.log file... done. Scanned 45 lines in 0.0001 seconds.
* Reading the public access log file... done. Scanned 169 lines in 0.0014 seconds.
- When it exits, this first section gives useful meta information about the process, information about the requests that are coming into it, GC allocation/promotion rates, and survivor death rates.
Meta:
-----
Sample Time: 40 seconds
System Uptime: 1046d18h
CPU Uptime: 25122d21h
Proc Uptime: 6d23h
Proc Usertime: 3d15h (0.01%)
Proc Systime: 8h28m (0.00%)
Proc RSS: 36.95G
Proc VSize: 55.46G
Proc # Threads: 771
YG Allocation Rates*:
---------------------
per sec (min/mean/max): 177.37M/s 214.35M/s 311.97M/s
per day (min/mean/max): 14.61T/d 17.66T/d 25.71T/d
OG Promotion Rates:
-------------------
per sec (min/mean/max): 326.80K/s 6.66M/s 27.12M/s
per hr (min/mean/max): 1.12G/h 23.43G/h 95.33G/h
Survivor Death Rates:
---------------------
Lengths (min/mean/max): 1/1.8/2
Death Rate Breakdown:
Age 1: 4.6% / 63.9% / 95.2% / 36.1% (min/mean/max/cuml alive %)
Age 2: 0.0% / 21.3% / 79.2% / 28.4% (min/mean/max/cuml alive %)
GC Information:
---------------
YGC/FGC Count: 8/0 (Rate: 12.00/min, 0.00/min)
GC Load (since JVM start): 0.39%
Sample Period GC Load: 0.51%
CMS Sweep Times: 0.000s / 0.000s / 0.000s / 0.00 (min/mean/max/stdev)
YGC Times: 17ms / 26ms / 51ms / 11.04 (min/mean/max/stdev)
FGC Times: 0ms / 0ms / 0ms / 0.00 (min/mean/max/stdev)
Agg. YGC Time: 205ms
Agg. FGC Time: 0ms
Est. Time Between FGCs (min/mean/max): 12h8m 34m53s 8m34s
Est. OG Size for 1 FGC/hr (min/mean/max): 1.12G 23.43G 95.33G
Overall JVM Efficiency Score*: 99.488%
Current JVM Configuration:
--------------------------
NewSize: 1.38G
OldSize: 13.62G
SurvivorRatio: 10
MinHeapFreeRatio: 40
MaxMetaspaceSize: 16E
MaxHeapFreeRatio: 70
MaxHeapSize: 15G
MetaspaceSize: 20.80M
NewRatio: 2
- This section provides what analysis it was able to do. For it to do a very accurate/detailed analysis, you need to let it run long enough to capture at least 3 FGCs. It will warn you if it doesn't have enough data, and will not do analysis in a specific area if there is insufficient data. Here you can see that there wasn't enough FGCs.
Recommendation Summary:
-----------------------
Warning: There were only 8 YGC entries to do the analysis on. It's better to
have > 10 to get more realistic results.
* Error: Your survivor age is too short, your last age of 2 has 63.89% of it's
objects still alive. Unset or increase the MaxTenuringThreshold to mitigate this
problem.
---
* The allocation rate is the increase in usage before a GC done. Growth rate
is the increase in usage after a GC is done.
* The JVM efficiency score is a convenient way to quantify how efficient the
JVM is. The most efficient JVM is 100% (pretty much impossible to obtain).
* There were no full GCs during this sample period. This reporting will
be less useful/accurate as a result.
* A copy of the critical data used to generate this report is stored
in /tmp/jtune_data-{user}.bin.bz2. Please copy this to your homedir if you
want to save/analyze this further.
License
This application is distributed under the terms of the Apache Software License version 2.0. See COPYING file for more details.
Authors
- Eric Bullen <[email protected]> (Sr. Site Reliability Engineer at LinkedIn)
- Jesse Ward <[email protected]> (Staff Site Reliability Engineer at LinkedIn)
FAQ
Q: Do I have to run jtune as root?
A: You should run it as the user of the Java process you want to analyze (or root).
Q: What versions of Java does this support?
A: JTune works with Java versions 6-8.
Q: What JVM options should I have turned on to properly use this tool?
A: You should have the following enabled: -Xloggc, -XX:+PrintTenuringDistribution,
-XX:+PrintGCDetails, and -XX:+PrintGCDateStamps
Q: Can it tune the G1 GC?
A: Not at this time. G1 is quite a bit harder to tweak, but it's work in progress.