Senpai

Senpai is an automated memory sizing tool for container applications.

Background

Determining the exact amount of memory required by an application (the workingset size) is a difficult, error-prone task.

Libraries and code pages used during startup are loaded into memory and never touched again. On top of that, the Linux filesystem cache doesn't evict cold data until that memory is needed for new data. As a result, allocated memory is not a good proxy for required memory. This makes it difficult to provision memory correctly and maintain adequate safety margins: too little, and applications experience thrashing or out-of-memory kills during load peaks; too much, and costly hardware resources are wasted.

Senpai is a userspace tool that determines the actual memory requirement of containerized applications.

Using Linux psi metrics and cgroup2 memory limits, senpai applies just enough memory pressure on a container to page out the cold and unused memory pages that aren't necessary for nominal workload performance. It dynamically adapts to load peaks and troughs, and so provides a workingset profile of an application over time.
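
The feedback idea described above can be sketched in a few lines. This is a minimal illustration, not senpai's actual control algorithm: `parse_psi`, `next_limit`, and the `target_us`, `step`, and `floor` parameters are hypothetical names chosen for this example. It assumes the cgroup2 `memory.pressure` file uses the standard PSI line format (`some` and `full` lines with `avg10`, `avg60`, `avg300`, and `total` fields).

```python
def parse_psi(text):
    """Parse PSI file content into {"some": {...}, "full": {...}}."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {key: float(value)
                        for key, value in (f.split("=") for f in fields)}
    return result


def next_limit(limit, stall_delta_us, target_us, step=0.01, floor=100 << 20):
    """One feedback step (hypothetical policy, not senpai's).

    Relax the limit when the stall time accumulated since the last
    sample exceeds the target; otherwise tighten it, down to `floor`.
    """
    if stall_delta_us > target_us:
        return int(limit * (1 + step))
    return max(floor, int(limit * (1 - step)))


sample = (
    "some avg10=0.18 avg60=0.05 avg300=0.01 total=117719536\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
)
psi = parse_psi(sample)
print(psi["some"]["total"])
```

On a live system, the text would be read from the container's `memory.pressure` file, the stall delta would be the difference between two consecutive `total` readings, and the resulting limit would be written to `memory.high`.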

This information helps system operators eliminate waste, shore up for contingencies, optimize task placement in compute grids, and plan long-term capacity/hardware requirements.

Examples

An example kernel compile job has a peak memory consumption of 800M:

$ time make -j4 -s
real    3m58.050s
user    13m33.735s
sys     1m30.130s

$ sort -n memory.current-nolimit.log | tail -n 1
803934208

However, when a memory limit of 600M is applied, the job finishes in the same amount of time - with 25% less available memory:

# echo 600M > memory.high

$ time make -j4 -s
real    4m0.654s
user    13m28.493s
sys     1m31.509s

$ sort -n memory.current-600M.log | tail -n 1
629116928

Clearly, the full 800M isn't required. But 600M still leaves an unknown amount of slack. Even a 400M limit doesn't materially affect runtime:

# echo 400M > memory.high

$ time make -j4 -s
real    4m3.186s
user    13m20.452s
sys     1m31.085s

$ sort -n memory.current-400M.log | tail -n 1
419368960

At 300M, on the other hand, the workload struggles to make forward progress and finish within a reasonable amount of time:

# echo 300M > memory.high

$ time make -j4 -s
^C
real    9m9.974s
user    10m59.315s
sys     1m16.576s

Finding the exact cutoff where job performance begins to plummet is a tedious trial-and-error process. It also only works when the job does a fixed amount of work every time it runs, as in this example. That isn't true for many datacenter services, which run indefinitely and process highly variable user input.
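
The peak figures in the manual runs are read from logs of `memory.current` samples; the text does not show how those logs were produced. The following is a minimal sketch of one way to collect and summarize them. The helper names (`parse_size`, `sample_memory_current`, `peak`, `headroom`) are hypothetical, and `parse_size` assumes the `K`/`M`/`G` suffixes are powers of 1024, which is consistent with the logged peaks sitting just below each limit.

```python
import time
from pathlib import Path

# Assumption: size suffixes are binary (1M = 1024 * 1024 bytes).
SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a size string such as '600M' to bytes."""
    text = text.strip()
    suffix = text[-1].upper()
    if suffix in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[suffix]
    return int(text)


def sample_memory_current(cgroup_dir, log_path, interval=1.0, samples=10):
    """Append periodic memory.current readings to a log, one per line."""
    current = Path(cgroup_dir) / "memory.current"
    with open(log_path, "a") as log:
        for _ in range(samples):
            log.write(current.read_text().strip() + "\n")
            log.flush()
            time.sleep(interval)


def peak(log_lines):
    """Largest sample in the log; equivalent to `sort -n | tail -n 1`."""
    return max(int(line) for line in log_lines if line.strip())


def headroom(limit, log_lines):
    """Bytes between the configured limit and the observed peak."""
    return parse_size(limit) - peak(log_lines)


# Peak from the 600M run above: the job ran right up against its limit.
print(headroom("600M", ["629116928"]))
```

A small headroom only shows that the job filled whatever it was given. It does not say how much of that memory was needed, which is why each limit has to be judged by its effect on runtime.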

Senpai determines the memory requirement of an application while the application is running:

# senpai .
2019-08-19 14:26:05 Configuration:
2019-08-19 14:26:05   cgpath = /sys/fs/cgroup/kernelbuild
2019-08-19 14:26:05   min_size = 104857600
2019-08-19 14:26:05   max_size = 107374182400
2019-08-19 14:26:05   interval = 5
2019-08-19 14:26:05   pressure = 1000
2019-08-19 14:26:05   max_probe = 0.01
2019-08-19 14:26:05   max_backoff = 0.1
2019-08-19 14:26:05   log_probe = 1000
2019-08-19 14:26:05   log_backoff = 10
2019-08-19 14:26:05 Resetting limit to memory.current.
2019-08-19 14:26:06 limit=100.00M pressure=0.000000 time_to_probe= 6 total=117669927 delta=0 integral=0
2019-08-19 14:26:07 limit=100.00M pressure=0.000000 time_to_probe= 5 total=117669927 delta=0 integral=0
2019-08-19 14:26:08 limit=100.00M pressure=0.000000 time_to_probe= 4 total=117669927 delta=0 integral=0

$ time make -j4 -s

2019-08-19 14:26:09 limit=100.00M pressure=0.000000 time_to_probe= 3 total=117678359 delta=8432 integral=8432
2019-08-19 14:26:09   backoff: 0.09259305978684715
2019-08-19 14:26:10 limit=109.26M pressure=0.180000 time_to_probe= 5 total=117719536 delta=41177 integral=41177
2019-08-19 14:26:10   backoff: 0.1
2019-08-19 14:26:11 limit=120.18M pressure=0.180000 time_to_probe= 5 total=117768197 delta=48661 integral=48661

...

2019-08-19 14:26:43 limit=340.48M pressure=0.160000 time_to_probe= 5 total=118045638 delta=202 integral=202
2019-08-19 14:26:44 limit=340.48M pressure=0.130000 time_to_probe= 4 total=118045638 delta=0 integral=202
2019-08-19 14:26:45 limit=340.48M pressure=0.130000 time_to_probe= 3 total=118045638 delta=0 integral=202
2019-08-19 14:26:46 limit=340.48M pressure=0.110000 time_to_probe= 2 total=118045638 delta=0 integral=202
2019-08-19 14:26:47 limit=340.48M pressure=0.110000 time_to_probe= 1 total=118045690 delta=52 integral=254
2019-08-19 14:26:48 limit=340.48M pressure=0.090000 time_to_probe= 0 total=118045690 delta=0 integral=254
2019-08-19 14:26:48   probe: -0.001983887611266873
2019-08-19 14:26:49 limit=339.80M pressure=0.090000 time_to_probe= 5 total=118045690 delta=0 integral=0

...

real    4m9.420s
user    13m21.723s
sys     1m33.037s

$ sort -n memory.current-senpai.log | tail -n 1
347762688
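
The numbers in this log can be checked with a little arithmetic. The `backoff` and `probe` values appear to act as relative adjustments to the limit; that reading is inferred from the logged values, not taken from senpai's source, and `apply_adjustment` is a hypothetical helper. The second half compares the senpai run with the unrestricted run from the first example.

```python
def apply_adjustment(limit_mib, adjustment):
    """Scale a limit by a relative adjustment (positive grows, negative shrinks)."""
    return limit_mib * (1 + adjustment)


# Values copied from the log excerpt above.
after_backoff = apply_adjustment(100.00, 0.09259305978684715)
after_probe = apply_adjustment(340.48, -0.001983887611266873)
print(f"{after_backoff:.2f}M")  # 109.26M, matching the next logged limit
print(f"{after_probe:.2f}M")    # 339.80M, matching the next logged limit

# Peak memory.current and wall-clock time: unrestricted run vs. run under senpai.
peak_nolimit, peak_senpai = 803934208, 347762688
secs_nolimit = 3 * 60 + 58.050
secs_senpai = 4 * 60 + 9.420
memory_saved = 1 - peak_senpai / peak_nolimit
slowdown = secs_senpai / secs_nolimit - 1
print(f"memory saved: {memory_saved:.0%}, slowdown: {slowdown:.0%}")
```

In this run, the peak footprint under senpai is roughly 57% lower than in the unrestricted run, for about 5% more wall-clock time.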

Requirements

Linux v4.20 or up with CONFIG_PSI=y
python3
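
A quick way to check these requirements on a host is sketched below. The function names are hypothetical. The check relies on `/proc/pressure/memory` being present when PSI is built into the kernel and enabled at boot.

```python
import platform
import re
from pathlib import Path


def kernel_at_least(release, minimum=(4, 20)):
    """True if a release string such as '5.15.0-91-generic' is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        return False
    return (int(match.group(1)), int(match.group(2))) >= minimum


def psi_available(proc_root="/proc"):
    """True if the kernel exposes memory pressure information."""
    return (Path(proc_root) / "pressure" / "memory").exists()


if __name__ == "__main__":
    print("kernel ok:", kernel_at_least(platform.release()))
    print("psi ok:", psi_available())
```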

License

senpai is GPL v2.0 licensed, as found in the LICENSE file.
