Shepherd
Shepherd is a utility for applying code changes across many repositories.
- Powerful: You can write migration scripts using your favorite Unix commands, tools like jscodeshift, or scripts in your preferred programming language.
- Easy: With just a few commands, you can check out dozens of repositories, apply changes, commit those changes, and open pull requests with detailed messages.
- Flexible: Ships with support for Git/GitHub, but can easily be extended to work with other version control products like Bitbucket, GitLab, or SVN.
For more high-level context, this blog post covers the basics.
Getting started
Install the Shepherd CLI:
npm install -g @nerdwallet/shepherd
Shepherd will now be available as the shepherd command in your shell:
shepherd --help
Usage: shepherd [options] [command]
...
Take a look at the tutorial for a detailed walkthrough of what Shepherd does and how it works, or read on for a briefer, higher-level overview.
Motivation for using Shepherd
Moving away from monorepos and monolithic applications has generally been a good thing for developers because it allows them to move quickly and independently of each other. However, it's easy to run into problems, especially if your code relies on shared libraries. Specifically, changing shared code and then rolling that change out to every consumer becomes difficult:
- The person updating that library must communicate the change to consumers of the library
- The consumer must understand the change and how they have to update their own code
- The consumer must make the necessary changes in their own code
- The consumer must test, merge, and deploy those changes
Shepherd aims to help shift responsibility for the first three steps to the person actually making the change to the library. Since they have the best understanding of their change, they can write a code migration to automate that change and then use Shepherd to automate the process of applying that change to all relevant repos. Then the owners of the affected repos (who have the best understanding of their own code) can review and merge the changes. This process is especially efficient for teams who rely on continuous integration: automated tests can help repository owners have confidence that the code changes are working as expected.
Writing migrations
A migration is declaratively specified with a shepherd.yml file called a spec. Here's an example of a migration spec that renames .eslintrc to .eslintrc.json in all NerdWallet repositories that have been modified in 2018:
id: 2018.07.16-eslintrc-json
title: Rename all .eslintrc files to .eslintrc.json
adapter:
  type: github
  search_type: code
  search_query: org:NerdWallet path:/ filename:.eslintrc
hooks:
  should_migrate:
    - ls .eslintrc # Check that this file actually exists in the repo
    - git log -1 --format=%cd | grep 2018 --silent # Only migrate things that have seen commits in 2018
  post_checkout: npm install
  apply: mv .eslintrc .eslintrc.json
  pr_message: echo 'Hey! This PR renames `.eslintrc` to `.eslintrc.json`'
Let's go through this line-by-line:
- id specifies a unique identifier for this migration. It will be used as a branch name for this migration, and will be used internally by Shepherd to track state about the migration.
- title specifies a human-readable title for the migration that will be used as the commit message.
- adapter specifies what version control adapter should be used for performing operations on repos, as well as extra options for that adapter. Currently Shepherd only has a GitHub adapter, but you could create a Bitbucket or GitLab adapter if you don't use GitHub. Note that search_query is specific to the GitHub adapter: it uses GitHub's code search qualifiers to identify repositories that are candidates for a migration. If a repository contains a file matching the search, it will be considered a candidate for this migration. As an alternative to search_query, the GitHub adapter can be configured with org: YOURGITHUBORGANIZATION. When using org, every repo in the organization that is visible will be considered as a candidate for this migration.
  - search_type (optional) specifies the search type, either 'code' or 'repositories'. If 'repositories' is specified, Shepherd performs a GitHub repository search. Defaults to code search if not specified.
The options under hooks specify the meat of a migration. They tell Shepherd how to determine if a repo should be migrated, how to actually perform the migration, how to generate a pull request message for each repository, and more. Each hook consists of one or more standard executables that Shepherd will execute in sequence.
- should_migrate is a sequence of commands to execute to determine if a repo actually requires a migration. If any of them exit with a non-zero value, that signifies to Shepherd that the repo should not be migrated. For instance, the second step in the above should_migrate hook would fail if the repo was last modified in 2017, since grep would exit with a non-zero value.
- post_checkout is a sequence of commands to be executed once a repo has been checked out and passed any should_migrate checks. This is a convenient place to do anything that will only need to be done once per repo, such as installing any dependencies.
- apply is a sequence of commands that will actually execute the migration. This example is very simple: we're just using mv to rename a file. However, this hook could contain arbitrarily many, potentially complex commands, depending on the requirements of your particular migration.
- pr_message is a sequence of commands that will be used to generate a pull request message for a repository. In the simplest case, this can just be a static message, but you could also programmatically generate a message that calls out particular things that might need human attention. Anything written to stdout will be used for the message. If multiple commands are specified, the output from each one will be concatenated together.
should_migrate and post_checkout are optional; apply and pr_message are required.
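When a should_migrate check grows beyond a one-liner, it can live in a small script whose exit status carries the decision. The following is a minimal sketch, not part of Shepherd: the function name needs_eslintrc_rename and the extra guard against an existing .eslintrc.json are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical should_migrate helper (the name and the second check are
# illustrative assumptions, not Shepherd features).
# Returns 0 when the repo needs the rename, non-zero when it should be skipped.
needs_eslintrc_rename() {
  # Default to the current directory when no path is given.
  repo_dir="${1:-.}"

  # No legacy config file: nothing to migrate.
  [ -f "$repo_dir/.eslintrc" ] || return 1

  # A .eslintrc.json already exists: renaming would overwrite it, so skip.
  [ ! -e "$repo_dir/.eslintrc.json" ] || return 1

  return 0
}
```

A script wrapping this function could call it with no arguments, since hooks run with the target repository as the working directory. A non-zero return then tells Shepherd to skip the repo.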
Each of these commands will be executed with the working directory set to the target repository. Shepherd exposes some context to each command via specific environment variables. Some additional environment variables are exposed when using the git or github adapters.
- SHEPHERD_REPO_DIR is the absolute path to the repository being operated on. This will be the working directory when commands are executed.
- SHEPHERD_DATA_DIR is the absolute path to a special directory that can be used to persist state between steps. This would be useful if, for instance, a jscodeshift codemod in your apply hook generates a list of files that need human attention and you want to use that list in your pr_message hook.
- SHEPHERD_BASE_BRANCH is the name of the branch Shepherd will set up a pull request against. This will often, but not always, be master. Only available for apply and later steps.
- SHEPHERD_MIGRATION_DIR is the absolute path to the directory containing your migration's shepherd.yml file. This is useful if you want to include a script with your migration spec and need to reference that command in a hook. For instance, if I have a script pr.sh that will generate a PR message, my pr_message hook might look something like this:
  pr_message: $SHEPHERD_MIGRATION_DIR/pr.sh
- SHEPHERD_GIT_REVISION (git and github adapters) is the current revision of the repository being operated on.
- SHEPHERD_GITHUB_REPO_OWNER (github adapter) is the owner of the repository being operated on. For example, if operating on the repository https://github.com/NerdWalletOSS/shepherd, this would be NerdWalletOSS.
- SHEPHERD_GITHUB_REPO_NAME (github adapter) is the name of the repository being operated on. For example, if operating on the repository https://github.com/NerdWalletOSS/shepherd, this would be shepherd.
Commands follow standard Unix conventions: an exit code of 0 indicates that a command succeeded; a non-zero exit code indicates failure.
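To make the environment variables concrete, here is a rough sketch of how an apply step and a pr_message step might share state through SHEPHERD_DATA_DIR. The file name needs_attention.txt and both function names are hypothetical conventions chosen for this example; Shepherd itself only provides the directory and the variables.

```shell
#!/bin/sh
# Sketch only: needs_attention.txt, flag_for_attention and
# generate_pr_message are illustrative names, not part of Shepherd.

# Called from an apply step: record a file that needs a human look.
flag_for_attention() {
  # One path per line; the data directory persists between steps.
  printf '%s\n' "$1" >> "$SHEPHERD_DATA_DIR/needs_attention.txt"
}

# Called from a pr_message step: everything printed to stdout becomes
# the pull request message.
generate_pr_message() {
  attention_file="$SHEPHERD_DATA_DIR/needs_attention.txt"

  echo "This PR renames \`.eslintrc\` to \`.eslintrc.json\` in $SHEPHERD_GITHUB_REPO_OWNER/$SHEPHERD_GITHUB_REPO_NAME."

  # Only add the extra section when the apply step recorded something.
  if [ -s "$attention_file" ]; then
    echo
    echo "These files may need a manual look:"
    # The second condition keeps a final line that lacks a trailing newline.
    while IFS= read -r flagged_file || [ -n "$flagged_file" ]; do
      echo "- $flagged_file"
    done < "$attention_file"
  fi
}
```

A script containing these functions could sit next to shepherd.yml and be referenced from hooks through $SHEPHERD_MIGRATION_DIR, in the same way as the pr.sh example above.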
Usage
Shepherd is run as follows:
shepherd <command> <migration> [options]
<migration> is the path to your migration directory containing a shepherd.yml file.
There are a number of commands that must be run to execute a migration:
- checkout: Determines which repositories are candidates for migration and clones or updates the repositories on your machine. Clones are "shallow", containing no git history. Uses should_migrate to decide if a repository should be kept after it's checked out.
- apply: Performs the migration using the apply hook discussed above.
- commit: Makes a commit with any changes that were made during the apply step, including adding newly created files. The migration's title will be prepended with [shepherd] and used as the commit message.
- push: Pushes all commits to their respective repositories.
- pr-preview: Prints the pull request message that would be used for each repository without actually creating a PR; uses the pr_message hook.
- pr: Creates a PR for each repo with the message generated from the pr_message hook.
- version: Prints the Shepherd version.
By default, checkout will use the adapter to figure out which repositories to check out, and the remaining commands will operate on all checked-out repos. To check out only a specific repo or to operate on only a subset of the checked-out repos, you can use the --repos flag, which specifies a comma-separated list of repos:
shepherd checkout path/to/migration --repos facebook/react,google/protobuf
Run shepherd --help to see all available commands and descriptions for each one.
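The commands above are typically run in order, so they can be chained in a small wrapper. This is a sketch under assumptions: run_migration is a hypothetical helper, and SHEPHERD_BIN is an illustrative override invented here (it is not a Shepherd setting) so the sequence can be dry-run with another executable. The wrapper stops at pr-preview so that a person can read the generated messages before running shepherd pr by hand.

```shell
#!/bin/sh
# Hypothetical wrapper around the documented command sequence.
# SHEPHERD_BIN is an assumption of this sketch, defaulting to "shepherd".
run_migration() {
  migration_dir="$1"   # path to the directory containing shepherd.yml
  repos="${2:-}"       # optional comma-separated list for --repos

  for step in checkout apply commit push pr-preview; do
    if [ -n "$repos" ]; then
      "${SHEPHERD_BIN:-shepherd}" "$step" "$migration_dir" --repos "$repos" || return 1
    else
      "${SHEPHERD_BIN:-shepherd}" "$step" "$migration_dir" || return 1
    fi
  done
}
```

For example, run_migration path/to/migration facebook/react would run each step against that one repository and return early if any step fails.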
Developing
Run yarn to install dependencies.
Shepherd is written in TypeScript, which requires compilation to JavaScript. When developing Shepherd, it's recommended to run yarn build:watch in a separate terminal. This will incrementally compile the source code as you edit it. You can then invoke the Shepherd CLI by referencing the path to the compiled cli.js file:
cd ../my-other-project
../shepherd/lib/cli.js checkout path/to/migration
Shepherd currently has minimal test coverage, but we're aiming to improve that with each new PR. Tests are written with Jest and should be placed in a *.test.ts file alongside the file under test. To run the test suite, run yarn test.
We use ESLint to ensure a consistent coding style and to help prevent certain classes of problems. Run yarn lint to run the linter, and yarn fix-lint to automatically fix applicable problems.