• Stars
    star
    991
  • Rank 46,212 (Top 1.0 %)
  • Language
  • Created almost 6 years ago
  • Updated 9 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Monorepo vs. polyrepo: architecture for source code management (SCM) version control systems (VCS)

Monorepo vs. polyrepo

Objective

Monorepo means using one repository that contains many projects, and polyrepo means using a repository per project. This page discusses the similarities and differences, and has advice and opinions on both.

Contents:

See:

Opinion posts:

Hacker News dicussions:

Credits:

  • Opinions and comments on this page are thanks to many people on various discussion websites, such as Hacker News, and lightly edited for clarity.

  • If you're the author of an opinion here, and would like to attribute it, or explain more, please let us know and we'll give you commit access.

Introduction

What is monorepo?

Monorepo is a nickname that means "using one repository for the source code management version control system".

  • A monorepo architecture means using one repository, rather than multiple repositories.

  • For example, a monorepo can use one repo that contains a directory for a web app project, a directory for a mobile app project, and a directory for a server app project.

  • Monorepo is also known as one-repo or uni-repo.

What is polyrepo?

Polyrepo is a nickname that means "using multiple repostories for the source code management version control system".

  • A polyrepo architecture means using multiple repositories, rather than one repository.

  • For example, a polyrepo can use a repo for a web app project, a repo for a mobile app project, and a repo for a server app project.

  • Polyrepo is also known as many-repo or multi-repo.

Comparisons

Key similarities

Key similarities between monorepo and polyrepo:

  • Both architectures ultimately track the same source code files, and do it by using source code management (SCM) version control systems (VCS) such as git or mercurial.

  • Both architectures are proven successful for projects of all sizes.

  • Both architectures are straightforward to implement using any typical SCM VCS, up to a scaling limit.

Key differences

Key differences between monorepo and polyrepo, summarized from many proponents, and intending to highlight typical differences between a typical monorepo and a typical polyrepo.

Monorepo Polyrepo
Contents Typically a repo contains multiple projects, programming languages, packaging processes, etc. Typically a repo contains one project, programming language, packaging process, etc.
Projects Manages projects in one repository, together, holistically. Manages projects in multiple repositories, separately, independently.
Workflows Enables workflows in all projects simultaneously, all within the monorepo. Enables workflows in each project one at a time, each in its own repo.
Changes Ensures changes affect all the projects, can be tracked together, tested together, and released together. Ensures changes affect only one project, can be tracked separately, tested separately, and released separately.
Collaboration Encourages collaboration and code sharing within an organization. Encourages collaboration and code sharing across organizations.
Testing Good white box testing because all projects are testable together, and verifiable holistically. Good black box testing because each project is testable separately, and verifiable independently.
Releases Coordinated releases are inherent, yet must use a polygot of tooling. Coordinated releases must be programmed, yet can use vanilla tooling.
State The current state of everything is one commit in one repo. The current state of everything is a commit per repo.
Coupling Tight coupling of projects. No coupling of projects.
Thinking Encourages thinking about conjoins among projects. Encourages thinking about contracts between projects.
Access

Access control defaults to all projects.

Some teams use tools for finer-grained access control.

Gitlab and GitHub offer ownership control where you can say who owns what directories for things like approving merge requests that affect those directories(using CODEOWNERS). Google Piper has finer-grained access control. Phabricator offers herald rules that stop a merge from happening if a file has changed in a specific subdirrectory. Some teams use service owners, so when a change spans multiple services, they are all added automatically as blocking reviewers.

Access control defaults to per project.

Some teams use tools for broader-grained access control.

GitHub offers teams where you can say one team owns many projects and for things like approving requests that affect multiple repos.

Scaling Scaling needs specialized tooling. It is currently not practical to use vanilla git with very large repos, or very large files, without any extra tooling. For monorepo scaling, teams invest in writing custom tooling and providing custom training. Scaling needs specialized coordination. It is currently not practical to use vanilla git with many projects across many repos, where a team wants to coordinate code changes, testing, packaging, and releasing. For polyrepo scaling, teams invest in writing coordination scripts and careful cross-version compatibility.
Tooling Google wrote the tool “bazel”, which tracks internal dependencies by using directed acyclic graphs. Lyft wrote the tool “refactorator”, which automates making changes in multiple repos, including opening PRs, tracking status, etc.

Tooling

Bazel

Bazel is a fast, scalable, multi-language and extensible build system. Bazel only rebuilds what is necessary. With advanced local and distributed caching, optimized dependency analysis and parallel execution, you get fast and incremental builds.

Bazel requires you to explicitly declare your dependencies for each 'target' you want to build. These dependencies can be within the same Bazel workspace, or imported at build time via say git - there's no need to have all the files directly in your repo.

The nice thing is you can declare the commit id or file hash for the dependency you're importing to make sure you're getting what you expect, and keep Bazel's reproducibility properties.

moon

moon is a multi-language task runner and monorepo management tool. Like Bazel, it only rebuilds what is necessary, with support for local and remote caching, dependency analysis, parallel execution, incremental builds, and even a robust language toolchain.

moon is not a build system, but is a powerful task runner and organization tool, so if you're looking for a tool that sits somewhere between Bazel (full commitment) and Make scripts (no commitment), moon is a great choice.

Lerna

Lerna is a tool that optimizes the workflow around managing multi-package repositories with git and npm. It's primarily for Node.js based repositories.

OctoLinker

Octolinker really helps when browsing a polyrepo on Github. You can just click the import [project] name and it will switch to the repo.

Nx

Nx is a build system with monorepo support and powerful integrations.

Monorepo scaling

Monorepo scaling problem

Monorepo scaling becomes a problem when a typical developer can't work well with the code by using typical tools such as vanilla git.

  • Monorepo scaling eventually becomes impractical in terms of space: when a monorepo grows to have more data than fits on a developer's laptop, then the developer cannot fetch the monorepo, and it may be impractical to obtain to more storage space.

  • Monorepo scaling eventually becomes impractical in terms of time: when a monorepo grows, then a complete file transfer takes more time, and in practice, there are other operations that also take more time, such as git pruning, git repacking.

  • A monorepo may grow so large contain so many projects that it takes too much mental effort to work across projects, such as for searching, editing, and isolating changes.

Monorepo scaling mitigations

Monorepo scaling can be improved by:

  • Some type of virtual file system (VFS) that allows a portion of the code to be present locally. This might be accomplished via a proprietary VCS like Perforce which natively operates this way, or via Google’s “G3” internal tooling, or via Microsoft’s GVFS.

  • Sophisticated source code indexing/searching/discovery capabilities as a service. This is because a typical developer is not going to have all the source code locallly, in a searchable state, using vanilla tooling.

Monorepo scaling metrics

Monorepo scaling seems to become an issue, in practice, at approximately these kinds of metrics:

  • 10-100 developers writing code full time.

  • 10-100 projects in progress at the same time.

  • 10-100 packaging processes during the same time period, such as a daily release.

  • 1K-10K versioned dependencies, such as Node modules, Python packages, Ruby gems, etc.

  • 1M-10M lines of code

Proponents of monorepo

If components need to release together, then use a monorepo

If you think components might need to release together then they should go in the same repo, because you can in fact pretty easily manage projects with different release schedules from the same repo if you really need to.

On the other hand if you've got a whole bunch of components in different repos which need to release together it suddenly becomes a real pain.

If components need to share common code, then use a monorepo

If you have components that will never need to release together, then of course you can stick them in different repositories-- but if you do this and you want to share common code among the repositories, then you will need to manage that code with some sort of robust versioning system, and robust versioning systems are hard. Only do something like that when the value is high enough to justify the overhead. If you're in a startup, chances are very good that the value is not high enough.

I’ve found monorepos to be extremely valuable in an less-mature, high-churn codebase

Need to change a function signature or interface? Cool, global find & replace.

At some point a monorepo outgrows its usefulness. The sheer amount of files in something that’s 10K+ LOC (not that large, I know) warrants breaking apart the codebase into packages.

Still, I almost err on the side of monorepos because of the convenience that editors like vscode offer: autocomplete, auto-updating imports, etc.

A common mission

I find it helpful to think of a company as a group of people engaged in a common mission. The company pursues its mission through multiple subprojects, and every decision taken and every code change introduced is a step towards its primary goal. The code base is a chunk of the company's institutional knowledge about its overarching goal and means to that end.

Looking at it from this perspective, a monorepo can be seen as the most natural expression of the fact that all team members are engaged in a single, if multi-faceted, enterprise.

Proponents of polyrepo

If tech's biggest names use a monorepo, should we do the same?

Some of tech’s biggest names use a monorepo, including Google, Facebook, Twitter, and others. Surely if these companies all use a monorepo, the benefits must be tremendous, and we should all do the same, right? Wrong!

Why? Because, at scale, a monorepo must solve every problem that a polyrepo must solve, with the downside of encouraging tight coupling, and the additional herculean effort of tackling VCS scalability.

Thus, in the medium to long term, a monorepo provides zero organizational benefits, while inevitably leaving some of an organization’s best engineers with a wicked case of PTSD (manifested via drooling and incoherent mumbling about git performance internals).

Coupling between unrelated projects

I worry about the monorepo coupling between unrelated products. While I admit part of this probably comes from my more libertarian world view but I have seen something as basic as a server upgrade schedule that is tailored for one product severely hurt the development of another product, to the point of almost halting development for months. I can't imagine needing a new feature or a big fix from a dependency but to be stuck because the whole company isn't ready to upgrade.

I've read of at least one less serious case of this from google with JUnit: "In 2007, Google tried to upgrade their JUnit from 3.8.x to 4.x and struggled as there was a subtle backward incompatibility in a small percentage of their usages of it. The change-set became very large, and struggled to keep up with the rate developers were adding tests."

Visible organization

I argue that a visible organization of a codebase into repositories makes it easier to reuse code in the same way that interface/implementation splits do: it makes it clearer which parts felt domain-specific and which felt like reusable libraries.

Being able to represent "not directly involved, but versioned together" and "separate enough to be versioned separately" is a very valuable distinction to have in your toolbox.

Once your team is large enough that your developers are not all attending the same standup, then you should be working in multiple repositories. You need to have a release cycle with semVer etc. so that developers who aren't in close communication with you can understand the impact of changes to your code area. Since tags are repository-global, the repository should be the unit of versioning/releasing.

Opinions about splitting

Splitting one repo is easier than combining multiple repos

You can split big repositories into smaller ones quite easily (in Git anyway). If you only need to do this once, then subtree will do the job, even retaining all your history if you want. As another way to split, you can duplicate the repo and pull trees out of each dupe in normal commits.

But combining small repositories together into a bigger repo is a lot harder.

So start out with a monorepo.

Only split a monorepo into multiple smaller repositories when you're clear that it really makes sense.

Splitting may be too fine

My problem with polyrepo is that often organizations end up splitting things too finely, and now I'm unable to make a single commit to introduce a feature because my changes have to live across several repositories.

This makes code review more annoying because you have to tab back and forth to see all the context.

This makes it worse to make changes to fundamental (internal) libraries used by every project. It's too much hassle to track down all the uses of a particular function, so I end up putting that change elsewhere, which means someone else will do it a little different in their corner of the world, which utterly confuses the first person who's unlucky enough to work in both code bases (at the same time, or after moving teams).

Opinions about balances

It's a social problem in how you manage boundaries

Among many of the major monorepos, boundaries still exist, they just become far more opaque because no one has to track them. You find the weird gatekeepers in the dark that spring out only when you get late in your code review process because you touched "their" file and they got an automated notice from a hidden rules engine in your CI process you didn't even realize existed.

In the polyrepo case those boundaries have to be made explicit (otherwise no one gets anything done) and those owners should be easily visible. You may not like the friction they sometimes bring to the table, but at least it won't be a surprise.

Challenges of monorepo and polyrepo

My last 2 jobs have been working on developer productivity for 100+ developer organizations. One organization uses monorepo, one organization uses polyrepo.

Neither really seems to result in less work, or a better experience. But I've found that your choice just dictates what type of problems you have to solve.

Monorepo involves mostly challenges around scaling the organization in a single repo.

Polyrepo involves mostly challenges with coordination.

On-premise applications or desktop applications

If you're creating on-premise applications or desktop applications (things requiring long lived release branches), the discussion is totally different.

I find that shipped software actually operates better in a normal, branched monorepo. You just branch the whole thing. The alternative is several repos and using a package manager, which minimizes merging in many branches, as you can just point the package manager at the updated module version, but brings its own hassles.

I've worked on projects where there were 6-7 major branches active at the same time and several smaller ones, besides the master branch. Then you'd have to merge everywhere applicable, etc. This is a totally different approach from the Google monorepo approach of "master only", basically. And probably one of the main reasons why Golang is having a ton of difficulties in the outside world by not having a proper versioning story.

Comment: Once youre shipping software off prem you need to patch it between major and minor releases. Typically one way to do that is to branch when you do a release to a branch namded for the release. Say 1.2. Then when issues pop up you fix it in the branch then see if it applies to the trunk or other branches after that.

Opinions about alternatives

Could you get the best of both worlds by having a monorepo of submodules?

Code would live in separate repos, but references would be declared in the monorepo. Checkins and rollbacks to the monorepo would trigger CI.

Answer: There's not much good to either world. You need fairly extensive tooling to make working with a repo of submodules comfortable at any scale. At large scale, that tooling can be simpler than the equivalent monorepo tooling, assuming that your individual repos remain "small" but also appropriately granular (not a given--organizing is hard, especially if you leave it to individual project teams). However, in the process of getting there, a monorepo requires no particular bespoke tooling at small or even medium scale (it's just "a repo"), and the performance intolerability pretty much scales smoothly from there. And those can be treated as technical problems if you don't want to approach social problems.

Answer: We actually did this. When I started at Uber ATG one of our devs made a submodule called uber_monorepo that was linked from the root of our git repo. In our repo's .buckconfig file we had access to everything that the mobile developers at Uber had access to by prefixing our targets with //uber_monorepo/. We did however run into the standard dependency resolution issue when you have any loosely coupled dependency. Updating our submodule usually required a 1-2 day effort because we were out of sync for a month or two.

Hybrid of "many repos"

We consider our org to be "many repos". We have several thousands. However, hundreds of them contain 5, 10, or 20+ packages/projects/services. It's funny because we'll talk about creating "monorepos" (plural) for certain part of our product, and it confuses people.

There's a few thousand libraries, those obviously refer to each other.. Some have readme files and that's enough, some have full documentation "books", some have comments in the code and that's enough.

We don't mandate a company wide development process, so each team and groups can choose their own process and how they track their stuff.

We do have automation and tooling to keep track of things though.

Prediction of a new type of VCS

What we all really want is a VCS where repos can be combined and separated easily, or where one repo can gain the benefits of a monorepo without the drawbacks of one.

Prediction: just as DVCS killed off pre-DVCS practically overnight, the thing that will quickly kill off DVCS is a new type of VCS where you can trivially combine/separate repos and sections of repos as needed. You can assign, at the repo level, sub-repos to include in this one, get an atomic commit hash for the state of the whole thing, and where my VCS client doesn't need to actually download every linked repo, but where tools are available to act like I have.

In a sense, we already have all of these features, in folders. You can combine and separate them, you can make a local folder mimic a folder on a remote system, and access its content without needing to download it all ahead of time. They just don't have any VCS features baked in. We've got {filesystems, network filesystems, and VCS}, and each of the three has some features the others would like.

More Repositories

1

architecture-decision-record

Architecture decision record (ADR) examples for software planning, IT leadership, and template documentation
11,989
star
2

queueing-theory

Queueing theory: an introduction for software development
2,078
star
3

git-commit-message

Git commit message: how to write a great git commit message and commit template for version control
915
star
4

ways-of-working

Ways of Working (WoW) with team principles, values, tenets, ground rules, aspirations, norms, working agreements, shared expectations, and group understandings
640
star
5

objectives-and-key-results

Objectives and Key Results (OKR) examples for goals, tasks, plans, projects, and strategy.
344
star
6

pitch-deck

Pitch deck advice for startup founders who want to raise venture capital investment
285
star
7

demo-rust-axum

Demo of Rust and axum web framework with Tokio, Tower, Hyper, Serde
Rust
278
star
8

github-special-files-and-paths

GitHub special files and paths, such as README, LICENSE, .github, docs, dependabot, workflows.
192
star
9

stable-diffusion-image-prompt-gallery

Stable Diffusion: image prompt gallery of examples for various prompts
Shell
178
star
10

maturity-models

Maturity models for IT, Agile, DevOps, TOGAF, Six Sigma, P3M3, etc.
172
star
11

plantuml-examples

PlantUML eaxmples for UML, ERD, wireframes, mind maps, JSON, YAML, WBS, ASCII art, Gantt charts, C4 models, and more
168
star
12

git-commit-template

Git commit template for better commit messages
163
star
13

stable-diffusion-macos-install-help

Stable Diffusion: macOS install help with homebrew, python, anaconda, dream, etc.
130
star
14

key-performance-indicator

Key performance indicator (KPI) examples for metrics, measurements, objectives and key results (OKRs)
129
star
15

brewfile

Brewfile
Ruby
111
star
16

crucial-conversations

Crucial conversations: lessons from the worldwide bestseller book
95
star
17

decision-record

Decision record: how to initiate and complete decisions for teams, organizations, and systems
92
star
18

statement-of-work

Statement Of Work (SOW) example
67
star
19

demo-swift-excel-xlsx-reader-writer

Demo Swift Excel Xlsx Reader Writer
Swift
60
star
20

inclusive-language

Inclusive language
57
star
21

strategic-balanced-scorecard

Strategic Balanced Scorecard: planning business by using OKRs, KPIs, and initiatives
49
star
22

issues

Issues: feature requests, bug reports, customer complaints, security alerts, etc.
44
star
23

oblique-strategies

Oblique Strategies: ideas for creative lateral thinking
42
star
24

care-plan

Care plan: a free open source care plan template for caregivers to help with medical, financial, government, legal, and practical care.
41
star
25

startup-superset

Startup superset: summaries of key concepts, ideas, insights
40
star
26

functional-specifications-template

Functional specifications template
40
star
27

always-improving

Book summaries by "alwaysimproving" for business, productivity, life skills, etc.
35
star
28

milestones

Milestones ideas and examples for project management
32
star
29

software-development-methodologies

Software development methodologies: summaries of agile, scrum, DAD, SAFe, etc.
31
star
30

social-network-plan

Social network plan - goals, ideas, step, and tasks for a new site
30
star
31

company-culture

Company culture ideas from Amazon, Netflix, Harvard, Ultimate, etc.
28
star
32

versioning

Versioning: what it is, how to do it, comments and discussion
28
star
33

leadership

Leadership and management ideas
27
star
34

spade-decision-framework

SPADE decision framework: Setting, People, Alternatives, Decide, Explain
27
star
35

system-quality-attributes

Cross-Functional Requirements a.k.a. Quality Attributes
26
star
36

ooda-loop

OODA loop: notes on John Boyd, strategy, tactics, planning, and paradigms
25
star
37

powerful-questions

Powerful questions - catalyzing insight, innovation, action
25
star
38

business-model-canvas

Business model canvas for value propositions, customer relationships, partner collaborations, etc.
25
star
39

thought-leadership-writing

Thought leadership writing tips for content creators, bloggers, authors, and editors
24
star
40

awesome-developing

Awesome developing: ideas for how to create better software code and collaboration
24
star
41

value-stream-mapping

Value Stream Mapping (VSM) tutorial (work in progress)
24
star
42

big-five-personality-traits

Big Five personality traits: domains, aspects, facets
22
star
43

source-code-management

Source code management → notes and ideas → mono-repos, trunk-based-development, etc.
21
star
44

key-risk-indicator

Key risk indicator (KRI) for risk management and business strategy
20
star
45

wordbooks

Demo wordbooks for business, projects, industries, software, consulting, and more
Lua
20
star
46

code-of-conduct-guidelines

Code of Conduct Guidelines
Shell
19
star
47

git-branch-name

Git branch name ideas, naming conventions, and how to use git branch edit description
19
star
48

git-workflow-help

Git flow help: research on Git flow, GitHub flow, GitLab flow, etc.
18
star
49

demo-tailwind-css

Demo Tailwind CSS along with Gulp and PostCSS
JavaScript
18
star
50

functional-specifications-tutorial

Functional specifications tutorial
17
star
51

sha256-sentence

SHA256 sentence: discover a SHA256 checksum that matches a sentence's description of hex digit words.
Rust
17
star
52

team-focus

TEAM FOCUS concepts by Paul N. Friga in McKinsey Engagement.
17
star
53

stakeholder-analysis

Stakeholder analysis for business project management
HTML
17
star
54

smart-criteria

SMART criteria for goals, objectives, plans, etc.
16
star
55

vision_mission_statements

Vision statements and mission statements by many companies and organizations
16
star
56

interviewing

Interviewing ideas for hiring managers, job seekers, and recruiting candidates
16
star
57

goals-ideas-steps-tasks

Goals, Ideas, Steps, Tasks: GIST Planning
14
star
58

icebreaker-questions

Icebreaker questions to help people, groups, teams, meetings, and such
14
star
59

coordinated-disclosure

Coordinated disclosure for security discoveries, bug reports, etc.
13
star
60

outputs-vs-outcomes

Outputs vs. outcomes: what's the different and why does it matter?
12
star
61

discovery-assessment

Discovery assessment for project management
12
star
62

critical-success-factor

Critical Success Factor (CSF) tutorial
12
star
63

software-operations-items

Software operations items
12
star
64

demo-rust-cargo-tdd

Demo of Rust and Cargo for TDD (test driven development)
Rust
11
star
65

agile-assessment

Agile assessment exercise ideas
11
star
66

metrics

Metrics
10
star
67

responsibility-assignment-matrix

Responsibility assignment matrix (RAM) a.k.a. linear responsibility chart (LRC)
10
star
68

social-value-orientation

Social value orientation (SVO) notes for pro-social pro-self concepts
10
star
69

demo-elixir-phoenix

Demonstration of Elixir language and Phoenix framework
Elixir
10
star
70

pgp-gpg-help

Pretty Good Privacy (PGP) GNU Privacy Guard (GPG) help for encryption
9
star
71

causal-analysis-based-on-system-theory

Causal Analysis based on System Theory (CAST)
9
star
72

demo-job-descriptions

Demo job descriptions
9
star
73

demo-terraform-aws

Demo of Terraform by Hasicorp for AWS
HCL
9
star
74

lean-business-lists

Lean business lists
9
star
75

issue-postmortem-template

Issue postmortem template for incident response documentation
8
star
76

net-promoter-score

Net Promoter Score (NPS) introduction and recommendations
8
star
77

initiatives-and-experiments

Initiatives and experiments
8
star
78

demo-rust-cursive

Demo Rust Cursive crate for terminal user interface (TUI)
Rust
8
star
79

demo-swift-rest

Demo Swift REST
HTML
8
star
80

git-hooks

Git hooks
Shell
7
star
81

adkar-change-management-model

ADKAR change management model: awareness, desire, knowledge, ability, reinforcement
7
star
82

demo-devops

Demo devops
7
star
83

demo-optaplanner

Demo OptaPlanner constraint solver
Java
6
star
84

intent-plan

Intent plan for mision, state, sequence, decisions, antigoals, constraints, expressives
6
star
85

demo-rust-rocket

Demo of Rust Rocket web application framework
Rust
6
star
86

task-life-cycle

Task life cycle (TLC)
5
star
87

feedback-request-template

Feedback request template
5
star
88

first-aid-kit

First Aid Kit
5
star
89

inspiring-people

Inspiring people: one hundred living people with short bios thanks to Wikipedia
5
star
90

principles

Principles: summaries of ethical prinicples, leadership principles, teamwork principles, ui/ux design principles, software programming principles, etc.
5
star
91

aberystwyth-wales-book

Aberystwth Wales Book - Photos of the Town
Shell
4
star
92

safety-philosophy

Safety philosopy: example principles for an organization and management
4
star
93

demo-aws-ses-smtp

Demo of Amazon Web Services (AWS) Simple Email Service (SES) Simple Mail Transfer Protocol (SMTP)
4
star
94

demo-gulp

Demo gulpfile.js using Gulp, PostCSS, Tailwind CSS, Pino, unfold, and more
JavaScript
4
star
95

git-troubleshooting

Git troubleshooting
4
star
96

demo-svelte-hello-world

Demo Svelte JavaScript framework "hello world" app
JavaScript
4
star
97

demo-ansible

Demo Ansible configuration tool for install, update, users, groups, etc.
4
star
98

quad-chart

Quad chart: a rapid project planning summary guide
4
star
99

adventure-gear

Adventure gear for traveling, backpacking, camping, and more
4
star
100

open-source-project-flavors

Open source project flavors: nicknames for various kinds of project types and categories
4
star