650 stars · Rank 68,969 (Top 2 %) · Go · License: Other · Created over 3 years ago · Updated 15 days ago


Finally, a good FUSE FS implementation over S3

GeeseFS is a high-performance, POSIX-ish S3 (Yandex, Amazon) file system written in Go

Overview

GeeseFS allows you to mount an S3 bucket as a file system.

FUSE file systems based on S3 typically have performance problems, especially with small files and metadata operations.

GeeseFS attempts to solve these problems by using aggressive parallelism and asynchrony.

Also check out our CSI S3 driver (GeeseFS-based): https://github.com/yandex-cloud/csi-s3

POSIX Compatibility Matrix

                    GeeseFS  rclone  Goofys  S3FS  gcsfuse
Read after write       +       +       -      +      +
Partial writes         +       +       -      +      +
Truncate               +       -       -      +      +
fallocate              +       -       -      -      -
chmod/chown            Y       -       -      +      -
fsync                  +       -       -      +      +
Symlinks               Y       -       -      +      +
Socket files           Y       -       -      +      -
Device files           Y       -       -      -      -
Custom mtime           Y       +       -      +      +
xattr                  +       -       +      +      -
Directory renames      +       +       *      +      +
readdir & changes      +       +       -      +      +

Y Only works correctly with Yandex S3.

* Directory renames in Goofys are only allowed for directories with no more than 1000 entries, and the limit is hardcoded.

List of non-POSIX behaviors/limitations for GeeseFS:

  • File mode/owner/group, symbolic links, custom mtimes and special files (block/character devices, named pipes, UNIX sockets) are supported, but they are restored correctly only with Yandex S3. Standard S3 doesn't return user metadata in listings, so reading this metadata would require an additional HEAD request for every listed file, which would make listings far too slow.
  • Special file support is enabled by default for Yandex S3 (disable with --no-specials) and disabled for others.
  • File mode/owner/group are disabled by default even for Yandex S3 (enable with --enable-perms). When disabled, global permissions can be set with --(dir|file)-mode and --(uid|gid) options.
  • Custom modification times are also disabled by default even for Yandex S3 (enable with --enable-mtime). When disabled:
    • ctime, atime and mtime are always the same
    • file modification time can't be set by user (for example with cp --preserve, rsync -a or utimes(2))
  • Does not support hard links
  • Does not support locking
  • Does not support "invisible" deleted files. If an app keeps an opened file descriptor after deleting the file it will get ENOENT errors from FS operations
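The last limitation is easiest to see by contrast with a local filesystem: POSIX keeps an unlinked-but-open file readable until the last descriptor is closed, while on a GeeseFS mount the same read would fail with ENOENT. A minimal demonstration of the local behaviour (run it on a regular filesystem, not on a GeeseFS mount):

```python
import os, tempfile

# Create a file, open it, then unlink it while a descriptor is still open
path = os.path.join(tempfile.mkdtemp(), "victim")
with open(path, "w") as f:
    f.write("still here")

fd = os.open(path, os.O_RDONLY)
os.unlink(path)          # the name is gone from the directory

# A local POSIX filesystem still serves the open descriptor;
# on a GeeseFS mount this read would fail with ENOENT instead
data = os.read(fd, 100)
os.close(fd)
print(data.decode())     # "still here" on a local filesystem
```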

In addition to the items above:

  • Default file size limit is 1.03 TB, achieved by splitting the file into 1000× 5 MB parts, 1000× 25 MB parts and 8000× 125 MB parts. You can change part sizes, but AWS's own limit is 5 TB anyway.
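The 1.03 TB figure follows directly from the default part schedule; a quick check of the arithmetic (part counts and sizes are those stated above; 10000 is the S3 multipart part-count limit):

```python
# Default GeeseFS part schedule from the text above:
# 1000 x 5 MB, then 1000 x 25 MB, then 8000 x 125 MB
schedule = [(1000, 5), (1000, 25), (8000, 125)]

parts = sum(count for count, _ in schedule)
max_mb = sum(count * size_mb for count, size_mb in schedule)

print(parts)    # 10000 -- exactly the S3 multipart part-count limit
print(max_mb)   # 1030000 MB, i.e. ~1.03 TB
```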

Stability

GeeseFS is stable enough to pass most applicable xfstests, including the dirstress/fsstress stress tests (generic/007, generic/011, generic/013).

See also Common Issues.

Performance Features

                             GeeseFS  rclone  Goofys  S3FS  gcsfuse
Parallel readahead              +       -       +      +      -
Parallel multipart uploads      +       -       +      +      -
No readahead on random read     +       -       +      -      +
Server-side copy on append      +       -       -      *      +
Server-side copy on update      +       -       -      *      -
xattrs without extra RTT        +*      -       -      -      +
Dir preload on file lookup      +       -       -      -      -
Fast recursive listings         +       -       *      -      +
Asynchronous write              +       +       -      -      -
Asynchronous delete             +       -       -      -      -
Asynchronous rename             +       -       -      -      -
Disk cache for reads            +       *       -      +      +
Disk cache for writes           +       *       -      +      -

* Recursive listing optimisation in Goofys is buggy and may skip files under certain conditions

* S3FS uses server-side copy, but it still downloads the whole file to update it. And it's buggy too :-)

* rclone mount has VFS cache, but it can only cache whole files. And it's also buggy - it often hangs on write.

* xattrs without extra RTT only work with Yandex S3 (--list-type=ext-v1).

Installation

  • Pre-built binaries:
    • Linux amd64. You may also need to install FUSE utils (fuse3 or fuse RPM/Debian package) first.
    • Mac amd64, arm64. You also need osxfuse/macfuse for GeeseFS to work.
    • Windows x64. You also need to install WinFSP first.
  • Or build from source with Go 1.13 or later:
$ git clone https://github.com/yandex-cloud/geesefs
$ cd geesefs
$ go build

Usage

$ cat ~/.aws/credentials
[default]
aws_access_key_id = AKID1234567890
aws_secret_access_key = MY-SECRET-KEY
$ $GOPATH/bin/geesefs <bucket> <mountpoint>
$ $GOPATH/bin/geesefs [--endpoint https://...] <bucket:prefix> <mountpoint> # if you only want to mount objects under a prefix

You can also supply credentials via the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.

To mount an S3 bucket on startup, make sure the credentials are configured for root and add this to /etc/fstab:

bucket    /mnt/mountpoint    fuse.geesefs    _netdev,allow_other,--file-mode=0666,--dir-mode=0777    0    0

You can also use a different path to the credentials file by adding ,--shared-config=/path/to/credentials.
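Putting both together, a complete /etc/fstab entry with a non-default credentials path might look like this (the /root/.geesefs-credentials path is just an illustration):

```
bucket    /mnt/mountpoint    fuse.geesefs    _netdev,allow_other,--file-mode=0666,--dir-mode=0777,--shared-config=/root/.geesefs-credentials    0    0
```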

See also: Instruction for Azure Blob Storage.

Windows

Everything works the same after installing WinFSP and GeeseFS, except that GeeseFS can't daemonize on Windows, so you have to create a system service manually if you want to hide the console window.

You can put credentials in C:\Users\<USERNAME>\.aws\credentials, put them in any other file and point to it with --shared-config file.txt, or use environment variables:

set AWS_ACCESS_KEY_ID=...
set AWS_SECRET_ACCESS_KEY=...

And then start GeeseFS with geesefs <bucket> <mountpoint>, where <mountpoint> is either a drive (like K:) or a non-existing directory. Example:

geesefs-win-x64.exe testbucket K:

Benchmarks

See bench/README.md.

Configuration

There's a lot of tuning you can do. Consult geesefs -h to view the list of options.

Common Issues

Memory Limit

The default internal cache memory limit in GeeseFS (--memory-limit) is 1 GB. GeeseFS uses this cache for read buffers when it needs to load data from the server. At the same time, the default "large" readahead setting is 100 MB, which is optimal for linear read performance.

However, this means that more than 10 processes reading large files at the same time may exceed the memory limit by requesting more than 1000 MB of buffers, in which case GeeseFS returns ENOMEM errors to some of them.

You can overcome this problem either by raising --memory-limit (for example, to 4 GB) or by lowering --read-ahead-large (for example, to 20 MB).
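The arithmetic behind the ENOMEM scenario is straightforward; a quick sketch using the default values stated above:

```python
memory_limit_mb = 1000     # default --memory-limit (1 GB)
readahead_large_mb = 100   # default "large" readahead

# Each process reading a large file linearly can pin one full readahead window
def buffers_needed(readers: int) -> int:
    return readers * readahead_large_mb

print(buffers_needed(10))  # 1000 MB: right at the default limit
print(buffers_needed(11))  # 1100 MB: over the limit, some readers get ENOMEM

# Lowering the readahead (e.g. to 20 MB) restores headroom for many readers
print(11 * 20)             # 220 MB
```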

Maximizing Throughput

If you have a lot of free network bandwidth and you want to achieve more MB/s of linear write speed, make sure you're writing into multiple files (not just 1) and start geesefs with the following options:

geesefs --no-checksum --memory-limit 4000 \
    --max-flushers 32 --max-parallel-parts 32 --part-sizes 25

This increases parallelism at the cost of reducing the maximum file size to 250 GB (10000 parts × 25 MB) and increasing memory usage. With enough network bandwidth you'll be able to reach ~1.6 GB/s write speed. For example, with fio:

fio -name=test -ioengine=libaio -direct=1 -bs=4M -iodepth=1 -fallocate=none \
    -numjobs=8 -group_reporting -rw=write -size=10G
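The trade-off behind --part-sizes is easy to quantify: with a single fixed part size, the maximum file size is the part size times the 10000-part S3 multipart limit. A quick sketch:

```python
PART_LIMIT = 10000  # S3 allows at most 10000 parts in one multipart upload

def max_file_gb(part_size_mb: int) -> float:
    return PART_LIMIT * part_size_mb / 1000

print(max_file_gb(25))   # 250.0 GB with --part-sizes 25
print(max_file_gb(5))    # 50.0 GB if every part were only 5 MB
```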

Concurrent Updates

GeeseFS doesn't support concurrent updates of the same file from multiple hosts. If you need them, you must guarantee that one host calls fsync() on the modified file and then waits for at least --stat-cache-ttl (1 minute by default) before allowing other hosts to start updating the file. Alternatively, you can forcibly refresh the file/directory cache with setfattr -n .invalidate <filename>, which makes GeeseFS recheck the file/directory state on the server. If you don't do either, you may encounter lost updates (conflicts), which are reported in the log as:

main.WARNING File xxx/yyy is deleted or resized remotely, discarding local changes

Asynchronous Write Errors

GeeseFS buffers updates in memory (or in the disk cache, if enabled) and flushes them asynchronously, so writers don't get an error from an unsuccessful write. When an error occurs, GeeseFS keeps the modifications in the cache and retries flushing them to the server later. GeeseFS keeps retrying forever, until success or until you stop the GeeseFS mount process. If there is too much changed data and the memory limit is reached during a write, the write request hangs until some data is flushed to the server and memory can be freed.

fsync

If you want to make sure that your changes are actually persisted to the server you have to call fsync on a file or directory. Calling fsync on a directory makes GeeseFS flush all changes inside it. It's stricter than Linux and POSIX behaviour where fsync-ing a directory only flushes directory entries (i.e., renamed files) in it.

If a server or network error occurs during fsync, the caller receives an error code.

Example of calling fsync. Note that both directories and files should be opened as files:

#!/usr/bin/python

import sys, os

# Open the path read-only (works for files and directories alike) and fsync it
fd = os.open(sys.argv[1], os.O_RDONLY)
os.fsync(fd)
os.close(fd)

The command-line sync utility and the syncfs syscall don't work with GeeseFS because they aren't wired up in FUSE at all.

Troubleshooting

If you experience any problems with GeeseFS - if it crashes, hangs or does something else nasty:

  • Update to the latest version if you haven't already done it
  • Check your system log (syslog/journalctl) and dmesg for error messages from GeeseFS
  • Try to start GeeseFS in debug mode: --debug_s3 --debug_fuse --log-file /path/to/log.txt, reproduce the problem and send it to us via Issues or any other means.
  • If you experience crashes, you can also collect a core dump and send it to us:
    • Run ulimit -c unlimited
    • Set desired core dump path with sudo sysctl -w kernel.core_pattern=/tmp/core-%e.%p.%h.%t
    • Start geesefs with GOTRACEBACK=crash environment variable

License

Licensed under the Apache License, Version 2.0

See LICENSE and AUTHORS

Compatibility with S3

GeeseFS works with:

  • Yandex Object Storage (default)
  • Amazon S3
  • Ceph (and also Ceph-based Digital Ocean Spaces, DreamObjects, gridscale etc)
  • Minio
  • OpenStack Swift
  • Azure Blob Storage (even though it's not S3)
  • Backblaze B2
  • Selectel S3

It should also work with any other S3 that implements multipart uploads and multipart server-side copy (UploadPartCopy).

Services known to be broken:

  • CloudFlare R2. It has a throttling issue: instead of returning the HTTP 429 status code, it returns 403 Forbidden if you exceed 5 requests per second.

Important note: you should mount GeeseFS with the --list-type 2 or --list-type 1 option if you use it with a non-Yandex S3.

The following backends are inherited from Goofys code and still exist, but are broken:

  • Google Cloud Storage
  • Azure Data Lake Gen1
  • Azure Data Lake Gen2
