{"id":346449,"date":"2021-08-17T07:53:41","date_gmt":"2021-08-17T14:53:41","guid":{"rendered":"https:\/\/css-tricks.com\/?p=346449"},"modified":"2021-08-17T07:53:44","modified_gmt":"2021-08-17T14:53:44","slug":"from-a-single-repo-to-multi-repos-to-monorepo-to-multi-monorepo","status":"publish","type":"post","link":"https:\/\/css-tricks.com\/from-a-single-repo-to-multi-repos-to-monorepo-to-multi-monorepo\/","title":{"rendered":"From a Single Repo, to Multi-Repos, to Monorepo, to Multi-Monorepo"},"content":{"rendered":"\n

I’ve been working on the same project for several years. Its initial version was a huge monolithic app containing thousands of files. It was poorly architected and non-reusable, but it was hosted in a single repo, which made it easy to work with. Later, I “fixed” the mess in the project by splitting the codebase into autonomous packages, hosting each of them on its own repo, and managing them with Composer. The codebase became properly architected and reusable, but being split across multiple repos made it a lot more difficult to work with.

As the code was re-architected time and again, its hosting also had to adapt, going from the initial single repo, to multiple repos, to a monorepo, to what may be called a “multi-monorepo.”

Let me take you on the journey of how this took place, explaining why and when I felt I had to switch to a new approach. The journey consists of four stages (so far!) so let’s break it down like that.

<h3>Stage 1: Single repo</h3>

The project is <code>leoloso/PoP</code> and it’s been through several hosting schemes, following how its code was re-architected at different times.

It was born as this WordPress site, comprising a theme and several plugins. All of the code was hosted together in the same repo.

Some time later, I needed another site with similar features, so I went the quick and easy way: I duplicated the theme and added its own custom plugins, all in the same repo. I got the new site running in no time.

I did the same for another site, and then another one, and another one. Eventually the repo was hosting some 10 sites, comprising thousands of files.

\"\"
A single repository hosting all our code.<\/figcaption><\/figure>\n\n\n

<h4>Issues with the single repo</h4>

While this setup made it easy to spin up new sites, it didn’t scale well at all. The big issue was that a single change involved searching for and replacing the same string across all 10 sites. That was completely unmanageable. Let’s just say that copy/paste/search/replace became a routine thing for me.

So it was time to start coding PHP the right way.

<h3>Stage 2: Multirepo</h3>

Fast forward a couple of years. I completely split the application into PHP packages, managed via Composer and dependency injection.
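To give a feel for that structure, here is a minimal sketch of the pattern. These are not the project’s actual classes or interfaces, just an illustration of constructor-based dependency injection between Composer packages:

<pre><code><?php

// A contract defined in one package (e.g. a "hooks" package)
interface HooksAPIInterface
{
    public function addFilter(string $name, callable $callback): void;
}

// An implementation provided by a separate, CMS-specific package
class WordPressHooksAPI implements HooksAPIInterface
{
    public function addFilter(string $name, callable $callback): void
    {
        // A real adapter would delegate to WordPress' add_filter()
    }
}

// A service receives the contract via its constructor,
// so it never knows (or cares) which CMS is underneath
class PostTitleService
{
    public function __construct(private HooksAPIInterface $hooksAPI)
    {
    }

    public function register(): void
    {
        $this->hooksAPI->addFilter('post:title', fn (string $title): string => strtoupper($title));
    }
}

(new PostTitleService(new WordPressHooksAPI()))->register();
</code></pre>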

Composer uses Packagist as its main PHP package repository. In order to publish a package, Packagist requires a <code>composer.json</code> file placed at the root of the package’s repo. That means we cannot host multiple PHP packages, each with its own <code>composer.json</code>, on the same repo.
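As a reference point, the <code>composer.json</code> at the root of such a package’s repo might look roughly like this. The package names come from the project, but the remaining fields (license, PHP version, namespace) are illustrative rather than the actual values:

<pre><code>{
  "name": "getpop/engine",
  "description": "Engine for PoP",
  "license": "MIT",
  "require": {
    "php": ">=7.4",
    "getpop/component-model": "^1.0"
  },
  "autoload": {
    "psr-4": {
      "PoP\\Engine\\": "src"
    }
  }
}
</code></pre>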

As a consequence, I had to switch from hosting all of the code in the single <code>leoloso/PoP</code> repo to using multiple repos, with one repo per PHP package. To help manage them, I created the organization “PoP” on GitHub and hosted all the repos there, including <code>getpop/root</code>, <code>getpop/component-model</code>, <code>getpop/engine</code>, and many others.

\"\"
In the multirepo, each package is hosted on its own repo.<\/figcaption><\/figure>\n\n\n

<h4>Issues with the multirepo</h4>

Handling a multirepo can be easy when you have a handful of PHP packages. But in my case, the codebase comprised over 200 PHP packages. Managing them was no fun.

The reason the project was split into so many packages is that I also decoupled the code from WordPress (so that it could be used with other CMSs too), which requires every package to be very granular, dealing with a single goal.

Now, 200 packages is not ordinary. But even if a project comprises only 10 packages, it can be difficult to manage across 10 repositories. That’s because every package must be versioned, and every version of a package depends on some version of another package. When creating pull requests, we need to configure the <code>composer.json</code> file in every package to use the corresponding development branch of its dependencies. It’s cumbersome and bureaucratic.

In my case, I ended up not using feature branches at all and simply pointed every package to the <code>dev-master</code> version of its dependencies (i.e., I was not versioning packages). I wouldn’t be surprised to learn that this is a fairly common practice.
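To illustrate, the <code>require</code> section of a package’s <code>composer.json</code> then points every dependency at <code>dev-master</code> (with feature branches, each entry would instead reference something like <code>dev-my-feature-branch</code>). The package names come from the project; the snippet itself is just a sketch:

<pre><code>{
  "require": {
    "getpop/root": "dev-master",
    "getpop/component-model": "dev-master",
    "getpop/hooks": "dev-master"
  }
}
</code></pre>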

There are tools to help manage multiple repos, like meta. It creates a project composed of multiple repos, and running <code>git commit -m "some message"</code> on the project executes a <code>git commit -m "some message"</code> command on every repo, keeping them in sync with each other.
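For context, meta reads a <code>.meta</code> file that maps project folders to their repos. If I recall its format correctly, it looks something like this (the folder names and URLs here are placeholders):

<pre><code>{
  "projects": {
    "root": "git@github.com:getpop/root.git",
    "component-model": "git@github.com:getpop/component-model.git",
    "engine": "git@github.com:getpop/engine.git"
  }
}
</code></pre>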

However, meta will not help manage the versioning of each dependency in its <code>composer.json</code> file. Even though it helps alleviate the pain, it is not a definitive solution.

So, it was time to bring all packages to the same repo.

<h3>Stage 3: Monorepo</h3>

The monorepo is a single repo that hosts the code for multiple projects. Since it hosts different packages together, we can version-control them together too. This way, all packages can be published with the same version, and linked across dependencies. This makes pull requests very simple.

\"\"
The monorepo hosts multiple packages.<\/figcaption><\/figure>\n\n\n\n

As I mentioned earlier, we are not able to publish PHP packages to Packagist if they are hosted on the same repo. But we can overcome this constraint by decoupling the development and distribution of the code: we use the monorepo to host and edit the source code, and multiple repos (one repo per package) to publish the packages to Packagist for distribution and consumption.

\"\"
The monorepo hosts the source code, multiple repos distribute it.<\/figcaption><\/figure>\n\n\n

<h4>Switching to the Monorepo</h4>

Switching to the monorepo approach involved the following steps:

First, I created the folder structure in <code>leoloso/PoP</code> to host the multiple projects. I decided to use a two-level hierarchy, first under <code>layers/</code> to indicate the broader project, and then under <code>packages/</code>, <code>plugins/</code>, <code>clients/</code> and whatnot to indicate the category.

\"Showing
The monorepo layers indicate the broader project.<\/figcaption><\/figure>\n\n\n\n

Then, I copied all the source code from all the repos (<code>getpop/engine</code>, <code>getpop/component-model</code>, etc.) to the corresponding location for that package in the monorepo (i.e. <code>layers/Engine/packages/engine</code>, <code>layers/Engine/packages/component-model</code>, etc.).
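For illustration, the resulting layout looks roughly like this, showing only a few of the packages and plugins mentioned in this article:

<pre><code>leoloso/PoP
└── layers/
    ├── Engine/
    │   └── packages/
    │       ├── component-model/
    │       └── engine/
    └── GraphQLAPIForWP/
        └── plugins/
            ├── extension-demo/
            └── graphql-api-for-wp/
</code></pre>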

I didn’t need to keep the Git history of the packages, so I just copied the files with Finder. Otherwise, we can use <code>hraban/tomono</code> or <code>shopsys/monorepo-tools</code> to port repos into the monorepo while preserving their Git history and commit hashes.

Next, I updated the description of all the downstream repos to start with <code>[READ ONLY]</code>, such as this one.

\"Showing
The downstream repo’s “READ ONLY” is located in the repo description.<\/figcaption><\/figure>\n\n\n\n

I executed this task in bulk via GitHub’s GraphQL API. I first obtained all of the descriptions from all of the repos with this query:

<pre><code>{
  repositoryOwner(login: "getpop") {
    repositories(first: 100) {
      nodes {
        id
        name
        description
      }
    }
  }
}
</code></pre>

…which returned a list like this:

{\n  \"data\": {\n    \"repositoryOwner\": {\n      \"repositories\": {\n        \"nodes\": [\n          {\n            \"id\": \"MDEwOlJlcG9zaXRvcnkxODQ2OTYyODc=\",\n            \"name\": \"hooks\",\n            \"description\": \"Contracts to implement hooks (filters and actions) for PoP\"\n          },\n          {\n            \"id\": \"MDEwOlJlcG9zaXRvcnkxODU1NTQ4MDE=\",\n            \"name\": \"root\",\n            \"description\": \"Declaration of dependencies shared by all PoP components\"\n          },\n          {\n            \"id\": \"MDEwOlJlcG9zaXRvcnkxODYyMjczNTk=\",\n            \"name\": \"engine\",\n            \"description\": \"Engine for PoP\"\n          }\n        ]\n      }\n    }\n  }\n}<\/code><\/pre>\n\n\n\n

From there, I copied all the descriptions, added <code>[READ ONLY]</code> to them, and for every repo generated a new query executing the <code>updateRepository</code> GraphQL mutation:

<pre><code>mutation {
  updateRepository(
    input: {
      repositoryId: "MDEwOlJlcG9zaXRvcnkxODYyMjczNTk="
      description: "[READ ONLY] Engine for PoP"
    }
  ) {
    repository {
      description
    }
  }
}
</code></pre>

Finally, I introduced tooling to help “split the monorepo.” Using a monorepo relies on synchronizing the code between the upstream monorepo and the downstream repos, triggered whenever a pull request is merged. This action is called “splitting the monorepo.” Splitting the monorepo can be achieved with a <code>git subtree split</code> command but, because I’m lazy, I’d rather use a tool.
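For the record, doing that split by hand for a single package would look something like this (the branch name and target repo here are just for illustration):

<pre><code># Extract the package's history from the monorepo into its own branch...
git subtree split --prefix=layers/Engine/packages/engine --branch engine-only

# ...and push that branch to the package's downstream (read-only) repo
git push git@github.com:getpop/engine.git engine-only:master
</code></pre>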

I chose Monorepo builder, which is written in PHP. I like this tool because I can customize it with my own functionality. Other popular tools are the Git Subtree Splitter (written in Go) and Git Subsplit (a bash script).

<h4>What I like about the Monorepo</h4>

I feel at home with the monorepo. The speed of development has improved because dealing with 200 packages feels pretty much like dealing with just one. The boost is most evident when refactoring the codebase, i.e. when executing updates across many packages.

The monorepo also allows me to release multiple WordPress plugins at once. All I need to do is provide a configuration to GitHub Actions via PHP code (when using the Monorepo builder) instead of hard-coding it in YAML.

To generate a WordPress plugin for distribution, I had created a <code>generate_plugins.yml</code> workflow that triggers when creating a release. With the monorepo, I have adapted it to generate not just one, but multiple plugins, configured via PHP through a custom command in <code>plugin-config-entries-json</code>, and invoked like this in GitHub Actions:

<pre><code>- id: output_data
  run: |
    echo "::set-output name=plugin_config_entries::$(vendor/bin/monorepo-builder plugin-config-entries-json)"
</code></pre>
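For completeness, here is a sketch of how such a job output can feed a matrix in a later job via <code>fromJson</code>. The job names and wiring are illustrative, not the actual workflow:

<pre><code>jobs:
  provide_data:
    runs-on: ubuntu-latest
    outputs:
      plugin_config_entries: ${{ steps.output_data.outputs.plugin_config_entries }}
    steps:
      - uses: actions/checkout@v2
      - run: composer install --no-progress
      - id: output_data
        run: |
          echo "::set-output name=plugin_config_entries::$(vendor/bin/monorepo-builder plugin-config-entries-json)"

  generate_plugins:
    needs: provide_data
    runs-on: ubuntu-latest
    strategy:
      matrix:
        pluginConfig: ${{ fromJson(needs.provide_data.outputs.plugin_config_entries) }}
    steps:
      - run: echo "Generating ${{ matrix.pluginConfig.zip_file }} from ${{ matrix.pluginConfig.path }}"
</code></pre>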

This way, I can generate my GraphQL API plugin and other plugins hosted in the monorepo all at once. The configuration defined via PHP is this one:

<pre><code>class PluginDataSource
{
  public function getPluginConfigEntries(): array
  {
    return [
      // GraphQL API for WordPress
      [
        'path' => 'layers/GraphQLAPIForWP/plugins/graphql-api-for-wp',
        'zip_file' => 'graphql-api.zip',
        'main_file' => 'graphql-api.php',
        'dist_repo_organization' => 'GraphQLAPI',
        'dist_repo_name' => 'graphql-api-for-wp-dist',
      ],
      // GraphQL API - Extension Demo
      [
        'path' => 'layers/GraphQLAPIForWP/plugins/extension-demo',
        'zip_file' => 'graphql-api-extension-demo.zip',
        'main_file' => 'graphql-api-extension-demo.php',
        'dist_repo_organization' => 'GraphQLAPI',
        'dist_repo_name' => 'extension-demo-dist',
      ],
    ];
  }
}
</code></pre>

When creating a release, the plugins are generated via GitHub Actions.

\"Dark
This figure shows plugins generated when a release is created.<\/figcaption><\/figure>\n\n\n\n

If, in the future, I add the code for yet another plugin to the repo, it will also be generated without any trouble. Investing some time and energy producing this setup now will definitely save plenty of time and energy in the future.

<h4>Issues with the Monorepo</h4>

I believe the monorepo is particularly useful when all the packages are coded in the same programming language, tightly coupled, and rely on the same tooling. If instead we have multiple projects based on different programming languages (such as JavaScript and PHP), composed of unrelated parts (such as the main website code and a subdomain that handles newsletter subscriptions), or using different tooling (such as PHPUnit and Jest), then I don’t believe the monorepo provides much of an advantage.

That said, there are downsides to the monorepo: