


zfs-inplace-rebalancing

Simple bash script to rebalance pool data between all mirrors when adding vdevs to a pool.


How it works

This script recursively traverses all the files in a given directory. Each file is copied with a .balance suffix, retaining all file attributes. The original is then deleted and the copy is renamed back to the name of the original file. When copying a file ZFS will spread the data blocks across all vdevs, effectively distributing/rebalancing the data of the original file (more or less) evenly. This allows the pool data to be rebalanced without the need for a separate backup pool/drive.

The way ZFS distributes writes is not trivial, which makes it hard to predict how effective the redistribution will be.

Note that this process is not entirely "in-place", since a file has to be fully copied before the original is deleted. The term is used to make it clear that no additional pool (and therefore hardware) is necessary to use this script. However, this also means that you have to have enough space to create a copy of the biggest file in your target directory for it to work.
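The space requirement above can be checked before starting. The following is a minimal sketch, not part of the script: the function names largest_file_bytes and free_bytes are hypothetical, and it assumes GNU find (-printf) and GNU df (--output). On ZFS the free space reported by df is only an estimate, so treat the result as a rough guide.

```shell
# Hypothetical helper: print the size in bytes of the largest regular file
# below a directory, or 0 if the directory contains no regular files.
largest_file_bytes() {
    local largest
    largest=$(find "$1" -type f -printf '%s\n' | sort -n | tail -n 1)
    echo "${largest:-0}"
}

# Hypothetical helper: print the free space (in bytes) of the filesystem
# that holds the given path, as reported by GNU df.
free_bytes() {
    df --output=avail -B1 "$1" | tail -n 1 | tr -d ' '
}

# Usage (the path is the placeholder used elsewhere in this README):
#   target=/pool/path/to/rebalance
#   if [ "$(largest_file_bytes "$target")" -ge "$(free_bytes "$target")" ]; then
#       echo "not enough free space to copy the largest file" >&2
#   fi
```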

At no point in time are both the original file and its copy deleted. To make sure file attributes, permissions and file content are maintained when copying the original file, all attributes and the file checksum are compared before removing the original file (unless disabled using --checksum false).
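For a single file, the copy, verify, delete and rename sequence can be sketched roughly as follows. This is a simplified illustration and not the script's actual code: the function name rebalance_file is hypothetical, only the MD5 checksum is compared (the attribute comparison is left out), and GNU coreutils (cp -a, md5sum) are assumed.

```shell
# Hypothetical helper: rebalance one file by copying it, verifying the copy
# and then swapping the copy into the place of the original.
rebalance_file() {
    local original="$1"
    local copy="${original}.balance"

    # Copy with attributes preserved; the copy is written as new blocks.
    cp -a -- "$original" "$copy" || return 1

    # Compare checksums before touching the original.
    local sum_original sum_copy
    sum_original=$(md5sum < "$original" | cut -d ' ' -f 1)
    sum_copy=$(md5sum < "$copy" | cut -d ' ' -f 1)
    if [ "$sum_original" != "$sum_copy" ]; then
        echo "checksum mismatch, keeping original: $original" >&2
        rm -f -- "$copy"
        return 1
    fi

    # Only now remove the original and rename the copy back.
    rm -- "$original" && mv -- "$copy" "$original"
}
```

Because the original is removed only after the copy has been verified, an interruption at any step leaves at least one complete version of the file on disk.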

Since file attributes are fully retained, it is not possible to verify if an individual file has been rebalanced. However, this script keeps track of rebalanced files by maintaining a "database" file in its working directory called rebalance_db.txt (if not disabled using --passes 0). This file contains two lines of text for each processed file:

  • One line for the file path
  • and the next line for the current count of rebalance passes
/my/example/pool/file1.mkv
1
/my/example/pool/file2.mkv
1
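Given this two-line layout, the pass count of a file can be looked up with a few lines of awk. The sketch below is only an illustration of the format: the function name get_pass_count is hypothetical, the database path is passed in explicitly, and paths containing newlines are not handled.

```shell
# Hypothetical helper: print the recorded pass count for a file path,
# or 0 if the path is not listed in the database file.
get_pass_count() {
    local db="$1" path="$2"
    [ -f "$db" ] || { echo 0; return; }
    # Odd lines hold paths, the following even line holds the count.
    # The path is passed via the environment so awk does not interpret
    # backslashes in it.
    TARGET="$path" awk '
        NR % 2 == 1 && $0 == ENVIRON["TARGET"] { found = 1; next }
        found { print; printed = 1; exit }
        END { if (!printed) print 0 }
    ' "$db"
}

# Usage:
#   get_pass_count ./rebalance_db.txt /my/example/pool/file1.mkv
```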

Prerequisites

Balance Status

To check the current balance of a pool use:

> zpool list -v

NAME                                              SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
bpool                                            1.88G   113M  1.76G        -         -     2%     5%  1.00x    ONLINE  -
  mirror                                         1.88G   113M  1.76G        -         -     2%  5.88%      -    ONLINE  
    ata-Samsung_SSD_860_EVO_500GB_J0NBL-part2        -      -      -        -         -      -      -      -    ONLINE  
    ata-Samsung_SSD_860_EVO_500GB_S4XB-part2         -      -      -        -         -      -      -      -    ONLINE  
rpool                                             460G  3.66G   456G        -         -     0%     0%  1.00x    ONLINE  -
  mirror                                          460G  3.66G   456G        -         -     0%  0.79%      -    ONLINE  
    ata-Samsung_SSD_860_EVO_500GB_S4BB-part3         -      -      -        -         -      -      -      -    ONLINE  
    ata-Samsung_SSD_860_EVO_500GB_S4XB-part3         -      -      -        -         -      -      -      -    ONLINE  
vol1                                             9.06T  3.77T  5.29T        -         -    13%    41%  1.00x    ONLINE  -
  mirror                                         3.62T  1.93T  1.70T        -         -    25%  53.1%      -    ONLINE  
    ata-WDC_WD40EFRX-68N32N0_WD-WCC                  -      -      -        -         -      -      -      -    ONLINE  
    ata-ST4000VN008-2DR166_ZM4-part2                 -      -      -        -         -      -      -      -    ONLINE  
  mirror                                         3.62T  1.84T  1.78T        -         -     8%  50.9%      -    ONLINE  
    ata-ST4000VN008-2DR166_ZM4-part2                 -      -      -        -         -      -      -      -    ONLINE  
    ata-WDC_WD40EFRX-68N32N0_WD-WCC-part2            -      -      -        -         -      -      -      -    ONLINE  
  mirror                                         1.81T   484K  1.81T        -         -     0%  0.00%      -    ONLINE  
    ata-WDC_WD20EARX-00PASB0_WD-WMA-part2            -      -      -        -         -      -      -      -    ONLINE  
    ata-ST2000DM001-1CH164_Z1E-part2                 -      -      -        -         -      -      -      -    ONLINE  

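and look at the difference in the CAP value (the share of SIZE that is allocated, i.e. ALLOC relative to SIZE) between vdevs. In the output above, the first two mirrors of vol1 are about half full, while the third mirror is at 0.00%.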
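To compare the vdevs at a glance, the CAP column can be extracted from the listing. This is a rough sketch that assumes the layout shown above (a header line starting with NAME, pool lines without indentation, top-level vdevs indented by two spaces); the function name vdev_capacity is hypothetical, and the output format of zpool list may differ between ZFS versions.

```shell
# Hypothetical helper: read `zpool list -v` output on stdin and print
# "<pool> <vdev> <CAP>" for every top-level vdev.
vdev_capacity() {
    awk '
        # Header line: remember which column is CAP.
        $1 == "NAME" { for (i = 1; i <= NF; i++) if ($i == "CAP") cap = i; next }
        # Unindented line: pool summary, remember the pool name.
        /^[^ ]/ { pool = $1; next }
        # Exactly two leading spaces: top-level vdev of the current pool.
        /^  [^ ]/ && cap { print pool, $1, $cap }
    '
}

# Usage:
#   zpool list -v | vdev_capacity
```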

No Deduplication

Due to the working principle of this script, which essentially creates a duplicate file on purpose, deduplication will most definitely prevent it from working as intended. If you use deduplication you probably have to resort to a more expensive rebalancing method that involves additional drives.

Data selection (cold data)

Due to the working principle of this script, it is crucial that you only run it on data that is not actively accessed, since the original file will be deleted.

Snapshots

If you take a snapshot of the data you want to balance before starting the rebalancing script, keep in mind that ZFS now has to keep track of all of the data in the target directory twice: once in the snapshot you made, and once for the new copy. This means that you will effectively use double the file size of all files within the target directory. Therefore it is a good idea to process the pool data in batches and remove old snapshots along the way, since you will probably hit the capacity limits of your pool at some point during the rebalancing process.

Installation

Since this is a simple bash script, there is no package. Simply download the script and make it executable:

curl -O https://raw.githubusercontent.com/markusressel/zfs-inplace-rebalancing/master/zfs-inplace-rebalancing.sh
chmod +x ./zfs-inplace-rebalancing.sh

Dependencies:

  • bc - used for percentage calculation (on Arch Linux: pacman -S bc)

Usage

ALWAYS HAVE A BACKUP OF YOUR DATA!

You can print a help message by running the script without any parameters:

./zfs-inplace-rebalancing.sh

Parameters

Name, description and default of each parameter:

  • -c, --checksum (default: true): Whether to compare attributes and content of the copied file using an MD5 checksum. Technically this is a redundant check and consumes a lot of resources, so think twice.
  • -p, --passes (default: 1): The maximum number of rebalance passes per file. Setting this to infinity by using a value <= 0 might improve performance when rebalancing a lot of small files.
  • --skip-hardlinks (default: false): Skip rebalancing hardlinked files, since it will only create duplicate data.
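To see which files in a directory have more than one hard link before deciding on --skip-hardlinks, find can list them. A small sketch; the function name list_hardlinked_files is hypothetical, and this is not necessarily how the script itself detects hardlinks.

```shell
# Hypothetical helper: list regular files below a directory that have
# more than one hard link.
list_hardlinked_files() {
    find "$1" -type f -links +1
}

# Usage:
#   list_hardlinked_files /pool/path/to/rebalance
```

Copying one of several links creates a new, independent file, so the other links keep pointing at the old data. This is why rebalancing such files duplicates data.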

Example

Make sure to run this script as a user that has read/write permission to all of the files in the target directory. The easiest way to achieve this is by running the script as root.

sudo su
./zfs-inplace-rebalancing.sh --checksum true --passes 1 /pool/path/to/rebalance

To keep track of the balancing progress, you can open another terminal and run:

watch zpool list -v

Log to File

To write the output to a file, simply redirect stdout and stderr to a file (or separate files). Since this redirects all output, you will have to follow the contents of the log files to get realtime info:

# one shell window:
tail -F ./stdout.log
# another shell window:
./zfs-inplace-rebalancing.sh /pool/path/to/rebalance >> ./stdout.log 2>> ./stderr.log

Things to consider

Although this script does have a progress output (files as well as percentage), it might be a good idea to try a small subfolder first, or to process your pool folder layout in manually selected batches. This can also limit the damage done if anything bad happens.

When aborting the script midway through, be sure to check the last lines of its output. When cancelling before or during the renaming process a ".balance" file might be left and you have to rename (or delete) it manually.
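Leftover copies can be located with find. The sketch below only reports them and changes nothing; the function name report_leftover_copies is hypothetical, and files that legitimately end in .balance would be reported as well.

```shell
# Hypothetical helper: list leftover "*.balance" files below a directory
# and report whether the corresponding original still exists.
report_leftover_copies() {
    find "$1" -type f -name '*.balance' | while IFS= read -r copy; do
        original="${copy%.balance}"
        if [ -e "$original" ]; then
            echo "original present: $copy"
        else
            echo "original missing: $copy"
        fi
    done
}

# Usage:
#   report_leftover_copies /pool/path/to/rebalance
```

If the original is still present, the copy may be incomplete and is presumably the one to delete. If the original is missing, the copy is the only remaining version and should be renamed back rather than deleted.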

Although the --passes parameter can be used to limit the maximum number of rebalance passes per file, it is only meant to speed up aborted runs. Individual files will not be processed multiple times automatically. To reach multiple passes you have to run the script on the same target directory multiple times.

Dockerfile

To increase portability, this script can also be run using Docker:

sudo docker run --rm -it -v /your/data:/data ghcr.io/markusressel/zfs-inplace-rebalancing:latest ./data

Contributing

GitHub is for social coding: if you want to write code, I encourage contributions through pull requests from forks of this repository. Create GitHub tickets for bugs and new features and comment on the ones that you are interested in.

Attributions

This script was inspired by zfs-balancer.

Disclaimer

This software is provided "as is" and "as available", without any warranty.
ALWAYS HAVE A BACKUP OF YOUR DATA!
