DeGit is a decentralized hosting platform for Git projects.
You can join by starting a node and pointing your browser at 127.0.0.1.
Then, just work with it as if it were GitHub.
There is no central point of failure,
since the network of DeGit nodes is run by anonymous volunteers.
To start, simply run the following (it uses your .ssh/id_rsa for authentication):
$ gem install degit
$ degit run
In a few seconds you can open https://localhost:8080 and enjoy the system, which is very similar to GitHub.
You can, of course, use a local Git repo, which is synchronized on the fly with other DeGit nodes.
Motivation and Related Works
We are not the first to think about a decentralized solution for hosting and managing Git repositories. A few similar products were created before (if you know of any others, please submit a pull request):
- GitChain (abandoned in 2014)
- GitTorrent (abandoned in 2015)
- Drepo (abandoned in 2018)
- Radicle (read this)
- git-issue
- git-ssb
- git-dit
- pando
- mango (abandoned in 2016, watch this)
- ZeroNet (not exactly Git, but relevant)
Even though GitHub, GitLab, BitBucket, Phabricator, SourceForge, CodeCommit, and Gitee are great platforms, they have three critical drawbacks:
- They are not 100% reliable,
- They ban users for almost no reason, and
- They are under the influence of their local governments.
It seems that the need for a decentralized solution is obvious. We believe that the community would enjoy having a platform with the following features:
- Pull requests;
- Issues and milestones;
- Stars and followers;
- GitHub-like web user interface;
- Entirely free for everybody;
- Not owned by anyone;
- Moderated by the board of deputies.
DeGit doesn't support private repositories, only public ones.
How to Start?
If you want to use DeGit to host your repositories, just like you use GitHub, read the instructions above: they are very simple. If you want to run a node and contribute your storage and computational resources to the DeGit network, here is how:
First, install Ruby 2.6+ and Docker (we recommend using Ubuntu 18.04).
Then, make a directory where Git repositories will be maintained. By default, it's /var/degit.
Next, run this (make sure SSHD is not running on the server, or it will conflict with the container over port 22):
$ docker run --rm --publish 22:22 --volume /var/degit:/home/git cqfn/degit
The container will start and you will be able to manage it via the degit command line tool. For example, to limit the number of repositories it hosts to 100, just run:
$ degit config max.repositories 100
The degit command line tool just makes changes to the files located in /var/degit, which are respected by the scripts running inside the Docker container.
How It Works?
The following principles are behind the architecture of DeGit:
- .degit directory in master branch is used to keep meta information
- Ownership of a repo is defined by public RSA keys in .degit
- Issues, PRs, comments, stars, etc. are regular files in .degit
- Issues, PRs, and comments have hash codes instead of sequential IDs
- Each node decides for itself which repositories to host
- Give-and-take principle is in place: "The more you host for me, the more I host for you"
- Conflicts are resolved through proof-of-availability (PoA) consensus
- Neighbours-discovery protocol is similar to the one used in Zold
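The idea of hash codes instead of sequential IDs can be sketched as follows. This is only an illustration: the choice of SHA-256, the fields that go into the hash, and the .degit/issues location are all assumptions, not something DeGit defines. One plausible benefit of such IDs is that two nodes can create issues independently without agreeing on a shared counter.

```ruby
require 'digest/sha2'
require 'json'

# Hypothetical: derive an issue ID from its content. The field set and
# the SHA-256 algorithm are assumptions made for this sketch.
def issue_id(author_key, title, body, created_at)
  payload = JSON.generate([author_key, title, body, created_at])
  Digest::SHA256.hexdigest(payload)
end

# Hypothetical location of the issue file inside the repository.
def issue_path(id)
  File.join('.degit', 'issues', id)
end

id = issue_id('ssh-rsa AAAA...', 'Build fails', 'Steps to reproduce...', '2020-01-01T00:00:00Z')
puts issue_path(id)
```

The same input always yields the same ID, so a node can recompute the hash to verify that an issue file was not renamed or altered.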
Architecture
There are a few components in the system:
The Dashboard is a web server with a GitHub-like interface that lets users manage issues, pull requests, milestones, and so on.
The Locator is the dispatcher of requests through the network of DeGit nodes. When a user accesses a server that doesn't have the requested repository, the Locator opens a tunnel to another server and redirects the request there.
The Authenticator is responsible for validating permissions and may rely on external services, like LDAP (in case of an enterprise deployment).
The Propagator makes sure that changes pushed to the server are sent to other servers right after they are accepted.
Data Flow Explained
"Availability" is a non-negative integer assigned by a node to each of its neighbours. The number goes up on every successful interaction with the neighbour. The number goes double-down on each network failure or any other non-logical error.
Here is how the data is propagated when you interact with Git on your laptop (the same happens automatically behind the scenes if you use the UI in the browser):
- You git commit your changes to your branches
- You do git push to your localhost
- On success, a built-in post-commit hook proceeds:
  - It git fetch from the first neighbour with the highest availability
  - It git merge if possible and all commits are signed correctly
  - It git push back to the neighbour
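The availability score and the hook's fetch/merge/push cycle can be sketched roughly as below. Everything here is an assumption made for illustration: the class and method names, the reading of "double-down" as a penalty twice the size of the reward, and the use of neighbour names as Git remote names. The commands are only built, not executed. Note also that git merge --verify-signatures checks the tip commit of the merged branch, which is weaker than checking every commit.

```ruby
# Hypothetical sketch; names and constants are assumptions,
# not DeGit's actual code.
class Availability
  REWARD = 1   # assumed increment per successful interaction
  PENALTY = 2  # assumed reading of "double-down": twice the reward

  def initialize
    @scores = Hash.new(0) # neighbours start at zero
  end

  def success(neighbour)
    @scores[neighbour] += REWARD
  end

  def failure(neighbour)
    # The score is a non-negative integer, so clamp it at zero.
    @scores[neighbour] = [@scores[neighbour] - PENALTY, 0].max
  end

  def score(neighbour)
    @scores[neighbour]
  end

  # The neighbour with the highest score, or nil if none is known.
  def best
    @scores.max_by { |_, s| s }&.first
  end
end

# Builds (but does not run) the commands of the fetch/merge/push cycle.
def sync_commands(availability, branch)
  remote = availability.best
  return [] if remote.nil?

  [
    ['git', 'fetch', remote, branch],
    ['git', 'merge', '--verify-signatures', 'FETCH_HEAD'],
    ['git', 'push', remote, branch]
  ]
end

scores = Availability.new
3.times { scores.success('node-a') }
scores.success('node-b')
scores.failure('node-a') # 3 - 2 = 1
scores.failure('node-b') # 1 - 2, clamped to 0
sync_commands(scores, 'master').each { |cmd| puts cmd.join(' ') }
```

Returning the commands as arrays keeps the sketch free of side effects; a real hook would run them and report each outcome back through success or failure.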
It is highly recommended to avoid pushing to the same branch from several nodes, since it may lead to an inability to merge and to abandoned (or lost) branches.
Authorization and Authentication
A repository has a list of files in the .degit/permissions directory. Each file starts with a public RSA key and lists user IDs, permissions, etc.
On each git push event, the post-commit hook goes through the list of added commits and verifies permissions of each user. If any rule from .degit/permissions is not respected, the entire push operation is rejected.
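The all-or-nothing check described above might look like the sketch below. The data model is invented for the example: the real format of the files in .degit/permissions is not specified here, so permissions are modelled as a plain hash from a signer's key to the branches it may write to. In stock Git, a server-side check that can reject a push runs in a pre-receive or update hook; the code only models the decision itself.

```ruby
# Hypothetical commit record: who signed it and which branch it targets.
Commit = Struct.new(:sha, :signer, :branch)

# Returns the commits whose signer lacks write access to the branch.
# `permissions` is an assumed in-memory form of .degit/permissions:
# signer key => list of writable branches.
def violations(commits, permissions)
  commits.reject do |commit|
    permissions.fetch(commit.signer, []).include?(commit.branch)
  end
end

# The push is accepted only when every commit passes.
def push_allowed?(commits, permissions)
  violations(commits, permissions).empty?
end

permissions = { 'alice-key' => ['master'], 'bob-key' => ['feature'] }
push = [
  Commit.new('a1', 'alice-key', 'master'),
  Commit.new('b2', 'bob-key', 'master')
]
puts push_allowed?(push, permissions) # false: bob-key may not write to master
```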
It is recommended to have at least two users with write access to the master branch, in order to avoid losing access to the repo when a private RSA key is lost.
Incentives
Unlike Blockchain, DeGit does not require full duplication of the entire database on every node. Instead, if a few nodes have the data of a repository, this may be enough for the majority of cases. Thus, each node tries to host a limited number of repositories, according to the disk space available.
Also, each node maintains a list of repositories seen along with the addresses of their hosting nodes. When a git fetch arrives for a repository that the node doesn't have, it returns an error and a suggested list of nodes to ask for this repo.
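A minimal sketch of this lookup, assuming the list of seen repositories is a simple map from repository name to node addresses. The class name, the result shape, and the addresses are all hypothetical.

```ruby
# Hypothetical index of what a node hosts and what it has seen elsewhere.
class NodeIndex
  def initialize(hosted:, seen:)
    @hosted = hosted # Array of repository names stored locally
    @seen = seen     # Hash: repository name => Array of node addresses
  end

  # Serves the repository if it is hosted here; otherwise reports an
  # error together with the nodes believed to host it.
  def fetch(repo)
    return { status: :ok, repo: repo } if @hosted.include?(repo)

    { status: :not_found, suggestions: @seen.fetch(repo, []) }
  end
end

index = NodeIndex.new(
  hosted: ['alice/tools'],
  seen: { 'bob/site' => ['10.0.0.5', '10.0.0.9'] }
)
p index.fetch('alice/tools')
p index.fetch('bob/site')
```

A client receiving the suggestions can retry against them in order, which keeps each node's storage small while the repository stays reachable.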
Thus, no monetary incentives are provided to node owners, but they are also not expected to contribute large computational or storage resources to the system (as happens in Bitcoin, for example).
Anti-Spam
Vandalism is possible through new issues and comments: they may be submitted in large amounts. To fight this, users may use scripts (a concept very close to smart contracts) that are executed on each post-commit hook to check the validity of data submitted to .degit.
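One possible shape for such a script is a per-author limit on the number of files added under .degit in a single push. This is only a guess at what a validation script could do: the limit, the input format, and the function name are assumptions.

```ruby
# Hypothetical default: at most this many new .degit files per author per push.
MAX_NEW_FILES = 10

# `added_files` is an assumed list of { path:, author: } hashes describing
# the files a push adds. Files outside .degit are ignored.
def spam?(added_files, limit: MAX_NEW_FILES)
  per_author = Hash.new(0)
  added_files.each do |file|
    next unless file[:path].start_with?('.degit/')

    per_author[file[:author]] += 1
  end
  per_author.values.any? { |count| count > limit }
end

flood = Array.new(11) { |i| { path: ".degit/issues/#{i}", author: 'mallory' } }
puts spam?(flood) # true: 11 new files exceed the limit of 10
```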
Moderation
To be continued...
DeGit for Enterprise
The out-of-the-box version of DeGit doesn't support private repositories. Here is how it may be modified to be hosted inside a company, to support in-house user authentication and restrict access to certain repositories (this is just an example):
Each Web Node runs a Dashboard, which gets access information from LDAP through the AM (Authentication Module). The AM has all the information about all enterprise users and enables an additional layer of access granting on top of RSA keys.
The DB (Database) contains the entire map of all servers running Git in the enterprise and makes it easier for each node to detect the right location of a repository and redirect requests. Being a point of centralization, it also enables synchronization between nodes via locking: only one Git node may work with a branch in a repository at a time, while all others wait for the lock to be released.
A set of Nginx servers may act as a load balancer. A set of HAProxy servers may work as a load balancer for SSH traffic.
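The branch locking mentioned above can be illustrated with an in-memory stand-in for the DB. In a real deployment the lock table would live in the shared database; the class and method names below are hypothetical.

```ruby
# Hypothetical lock table: at most one node owns a (repo, branch) pair.
class BranchLocks
  def initialize
    @guard = Mutex.new # protects @owners against concurrent access
    @owners = {}       # [repo, branch] => node currently holding the lock
  end

  # Returns true if `node` now holds the lock, false if another node has it.
  def acquire(repo, branch, node)
    @guard.synchronize do
      key = [repo, branch]
      return false if @owners.key?(key) && @owners[key] != node

      @owners[key] = node
      true
    end
  end

  # Only the current owner may release the lock.
  def release(repo, branch, node)
    @guard.synchronize do
      key = [repo, branch]
      return false unless @owners[key] == node

      @owners.delete(key)
      true
    end
  end
end

locks = BranchLocks.new
puts locks.acquire('team/app', 'master', 'git-1') # true
puts locks.acquire('team/app', 'master', 'git-2') # false, git-1 holds it
locks.release('team/app', 'master', 'git-1')
puts locks.acquire('team/app', 'master', 'git-2') # true
```

Here a node that fails to acquire the lock gets false back and is expected to retry later, which is one simple way to model waiting.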
How to contribute
Read these guidelines. Make sure your build is green before you submit your pull request. You will need to have Ruby 2.6+ and Bundler installed. Then:
$ bundle update
$ bundle exec rake
If it's clean and you don't see any error messages, submit your pull request.