• Stars
    star
    122
  • Rank 292,031 (Top 6 %)
  • Language
    Python
  • License
    Other
  • Created over 14 years ago
  • Updated over 14 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A Python FUSE file system that features transparent deduplication and compression which make it ideal for archiving backups.

DedupFS: A deduplicating FUSE file system written in Python

The Python program dedupfs.py implements a file system in user space using FUSE. It's called DedupFS because the file system's primary feature is data deduplication, which enables it to store virtually unlimited copies of files because unchanged data is only stored once. In addition to deduplication the file system also supports transparent compression using the compression methods lzo, zlib and bz2. These properties make the file system ideal for backups: I'm currently storing 250 GB worth of backups using only 8 GB of disk space.

Several aspects of the design of DedupFS were inspired by Venti (ignoring the distributed aspect, for now…) and ZFS, though I've never personally used either. The ArchiveFS and lessfs projects share similar goals but have very different implementations.

Usage

The following shell commands show how to install and use the DedupFS file system on Ubuntu (where it was developed):

$ sudo apt-get install python-fuse
$ git clone git://github.com/xolox/dedupfs.git
$ mkdir mount_point
$ python dedupfs/dedupfs.py mount_point
# Now copy some files to mount_point/ and observe that the size of the two
# databases doesn't grow much when you copy duplicate files again :-)
# The two databases are by default stored in the following locations:
#  - ~/.dedupfs-metastore.sqlite3 contains the tree and meta data
#  - ~/.dedupfs-datastore.db contains the (compressed) data blocks

Status

Development on DedupFS began as a proof of concept to find out how much disk space the author could free by employing deduplication to store his daily backups. Since then it's become more or less usable as a way to archive old backups, i.e. for secondary storage deduplication. It's not recommended to use the file system for primary storage though, simply because the file system is too slow. I also wouldn't recommend depending on DedupFS just yet, at least until a proper set of automated tests has been written and successfully run to prove the correctness of the code (the tests are being worked on).

The file system initially stored everything in a single SQLite database, but it turned out that after the database grew beyond 8 GB the write speed would drop from 8-12 MB/s to 2-3 MB/s. Therefor the file system now stores its data blocks in a separate database, which is a persistent key/value store managed by a dbm implementation like gdbm or Berkeley DB.

Limitations

In the current implementation a file's content needs to fit in a cStringIO instance, which limits the maximum file size to your free RAM. Initially I implemented it this way because I was focusing on backups of web/mail servers, which don't contain files larger than 250 MB. Then I started copying virtual disk images and my file system blew up :-(. I know how to fix this but haven't implemented the change yet.

Dependencies

DedupFS was developed using Python 2.6, though it might also work on earlier versions. It definitely doesn't work with Python 3 yet though. It requires the Python FUSE binding in addition to several Python standard libraries like anydbm, sqlite3, hashlib and cStringIO.

Contact

If you have questions, bug reports, suggestions, etc. the author can be contacted at [email protected]. The latest version of DedupFS is available at http://peterodding.com/code/dedupfs/ and http://github.com/xolox/dedupfs.

License

This software is licensed under the MIT license.
© 2010 Peter Odding <[email protected]>.

More Repositories

1

vim-notes

Easy note taking in Vim
Vim Script
1,585
star
2

vim-easytags

Automated tag file generation and syntax highlighting of tags in Vim
Vim Script
1,018
star
3

vim-session

Extended session management for Vim (:mksession on steroids)
Vim Script
961
star
4

python-coloredlogs

Colored terminal output for Python's logging module
Python
517
star
5

vim-misc

Miscellaneous auto-load Vim scripts
Vim Script
363
star
6

python-humanfriendly

Human friendly input/output for text interfaces using Python
Python
302
star
7

vim-lua-ftplugin

Lua file type plug-in for the Vim text editor
Vim Script
185
star
8

vim-shell

Improved integration between Vim and its environment (fullscreen, open URL, background command execution)
Vim Script
170
star
9

python-rotate-backups

Simple command line interface for backup rotation
Python
160
star
10

vim-colorscheme-switcher

Makes it easy to quickly switch between color schemes in Vim
Vim Script
114
star
11

python-executor

Programmer friendly subprocess wrapper
Python
98
star
12

vim-lua-inspect

Semantic highlighting for Lua in Vim
Lua
94
star
13

vim-reload

Automatic reloading of Vim scripts ((file-type) plug-ins, auto-load/syntax/indent scripts, color schemes)
Vim Script
79
star
14

lua-lxsh

Lexing & Syntax Highlighting in Lua (using LPeg)
Lua
70
star
15

lua-apr

Apache Portable Runtime binding for Lua
C
57
star
16

python-rsync-system-backup

Linux system backups powered by rsync
Python
48
star
17

vim-tools

Python scripts that make it easier (for me) to publish Vim plug-ins
Python
42
star
18

python-negotiator

Scriptable KVM/QEMU guest agent implemented in Python
Python
41
star
19

python-apt-mirror-updater

Automated, robust apt-get mirror selection for Debian and Ubuntu
Python
41
star
20

python-deb-pkg-tools

Debian packaging tools
Python
40
star
21

python-verboselogs

Verbose logging for Python's logging module
Python
33
star
22

vim-pyref

A plug-in for the Vim text editor that provides context-sensitive documentation for Python source code.
Vim Script
31
star
23

python-capturer

Easily capture stdout/stderr of the current process and subprocesses
Python
29
star
24

python-proc

Simple interface to Linux process information
Python
22
star
25

python-chat-archive

Easy to use offline chat archive
Python
18
star
26

python-redock

Human friendly wrapper around Docker
Python
16
star
27

sync-dotfiles

Quickly push your dotfiles from your workstation to your servers.
16
star
28

python-auto-adjust-display-brightness

Automatically adjust Linux display brightness
Python
15
star
29

vim-publish

A Vim plug-in that helps you publish hyperlinked, syntax highlighted source code
Vim Script
14
star
30

python-property-manager

Useful property variants for Python programming
Python
13
star
31

python-qpass

Frontend for pass (the standard unix password manager)
Python
13
star
32

python-naturalsort

Simple natural order sorting API for Python that just works
Python
12
star
33

python-vcs-repo-mgr

Version control repository manager
Python
12
star
34

mopidy-simple-webclient

Simple and minimalistic Mopidy HTTP client, touch friendly, works in most (mobile) web browsers
JavaScript
12
star
35

mpd-myfm

A client for Music Player Daemon that fills your playlist based on similar artists from Last.fm
Python
10
star
36

python-preview-markup

Live preview Markdown and reStructuredText files as HTML in a web browser
Python
8
star
37

lua-buildbot

A build bot for popular Lua projects (Lua 5.1, LuaJIT 1 & LuaJIT 2)
Lua
8
star
38

python-debuntu-tools

Debian and Ubuntu system administration tools
Python
7
star
39

python-linux-utils

Linux system administration tools for Python
Python
7
star
40

python-apache-manager

Monitor and control Apache web server workers from Python
Python
6
star
41

python-npm-accel

Accelerator for npm, the Node.js package manager
Python
6
star
42

python-update-dotdee

Generic modular configuration file manager
Python
6
star
43

python-crypto-drive-manager

Unlock all your encrypted drives with one pass phrase
Python
5
star
44

vim-tlv-mode

Transaction-Level Verilog support for Vim
Vim Script
5
star
45

python-pdiffcopy

Fast large file synchronization inspired by rsync
Python
5
star
46

python-gentag

Simple and powerful tagging for Python objects
Python
4
star
47

python-dwim

Location aware application launcher
Python
3
star