Why Bother?

The obvious question is: why? Why bother with the hassle of migrating our code away from GitHub and maintaining the instances that provide the service on our own?

First of all, GitHub Enterprise offers us a more secure way to store the sensitive parts of our codebase by bringing the repositories inside our VPC. Furthermore, because the instances hosting the code and the ones using it are now much closer together, code provisioning can be done much faster.

Secondly, it is better to have some control over your downtime than to be at the mercy of GitHub (or any other service for that matter), where an outage can lead to canceled deployments or angry developers who cannot pull their code. As frustrating as downtime is, it’s not a matter of ‘if’ it will happen, but rather ‘when’ it will happen.

We chose GitHub Enterprise over other repository hosting providers, like GitLab, for the reasons above and because our developers were already familiar with the interface and features and had embraced the GitHub flow. GitHub Enterprise is also easy to update and has a better API.


Will it stand up in our Production Environment?

Before we could actually start using GitHub Enterprise in production, we needed to see if it could support our blue/green deployment system (more details on this can be read here). This meant that it had to withstand hundreds of instances pulling their code from the repository simultaneously.

To test this, we used two r3.xlarge memory-optimized instances, offering 32GB of memory and 4 vCPUs, in a replication setup. This is what GitHub recommends for a seat range of 500 to 3000 people.

The tested version was GitHub Enterprise 2.1.4.

Creating and managing new instances just for this test would have been overkill, so we dropped that idea from the start. Instead, we used our 130 existing production instances to test the clone/pull/push operations. They were already there, and there were about as many of them as we will use when we deploy the new setup. MCollective was used to orchestrate the entire fleet of instances.
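For readers who want to try a much smaller version of this without a fleet, the fan-out can be approximated from a single machine. The sketch below is not the MCollective setup described above: it swaps in a Python thread pool, and the function names (`timed_clone`, `run_parallel`, `clone_storm`) are made up for illustration. It only assumes that the `git` command is installed.

```python
import subprocess
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def timed_clone(repo_url, dest_dir):
    """Clone repo_url into dest_dir and return (exit_code, seconds)."""
    start = time.monotonic()
    result = subprocess.run(
        ["git", "clone", "--quiet", repo_url, dest_dir],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode, time.monotonic() - start


def run_parallel(task, args_list, max_workers):
    """Run task(*args) for every args tuple at once; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task, *args) for args in args_list]
        return [future.result() for future in futures]


def clone_storm(repo_url, clients):
    """Simulate `clients` simultaneous clones from one machine (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as workdir:
        jobs = [(repo_url, f"{workdir}/clone-{i}") for i in range(clients)]
        return run_parallel(timed_clone, jobs, max_workers=clients)
```

Calling `clone_storm(url, 130)` with the URL of a test repository returns one `(exit_code, seconds)` pair per simulated client. A single machine shares one network link and one disk, so the numbers are not comparable to a real fleet; it is only a quick way to see how the server reacts to concurrent clones.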

To monitor the performance of our GitHub Enterprise (GE) instances, we used the built-in monitoring tool that comes with GE 2.1.4, which covers the most important aspects in scope: disk usage, CPU load, memory usage and response time.

Test 1: Clone a Repo with Many Files

The first test (time 9:59 – 10:01 in Picture 1) consisted of cloning a repository of 23,779 files, about 100MB in total. It took about 2 minutes for all the instances to fetch the data, and the CPU of our GE instances reached an average load of about 15.

Picture 1: [ 9:59 – 10:01 ] → Clone a repo with many files

Test 2: Clone a Repo with Big Files

In our second test (time 10:21 – 10:31 in Picture 2), we took the same approach, but this time we cloned a repo containing big files: three big files were added, filled with 100MB of random data (from /dev/urandom). It’s worth mentioning that random data works against Git's storage optimizations (compression and delta encoding between similar objects), which normally keep the transferred data as small as possible. Because this was just a clone, it did not matter that much.
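To make the point about random data concrete, here is a minimal sketch of how such test files could be generated and why they are a worst case. It uses Python's `os.urandom` in place of reading /dev/urandom directly, and `zlib` as a stand-in for the compression Git applies to its objects. The helper names are hypothetical and not part of our test tooling.

```python
import os
import zlib


def write_random_file(path, size_bytes, chunk_bytes=1024 * 1024):
    """Write size_bytes of random data to path, one chunk at a time."""
    remaining = size_bytes
    with open(path, "wb") as handle:
        while remaining > 0:
            chunk = os.urandom(min(chunk_bytes, remaining))
            handle.write(chunk)
            remaining -= len(chunk)
    return path


def compression_ratio(data):
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


if __name__ == "__main__":
    sample = 64 * 1024
    # Random bytes barely compress at all; repetitive bytes compress very well.
    print("random data:    ", round(compression_ratio(os.urandom(sample)), 3))
    print("repetitive data:", round(compression_ratio(b"a" * sample), 3))
```

Random bytes give a ratio of roughly 1, meaning nothing is saved, while the repetitive sample shrinks to a tiny fraction of its size. This is why the big-file repo moves close to its full size over the wire on every clone.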

Picture 2: [ 10:21 – 10:31 ] → Clone a repo with big files

This test put a bit more strain on the servers. It took about 10 minutes to fetch all the data, and the CPU reached an average load of about 8. During the test, the UI also became laggy.

Test 3: Multiple Pushes to Individual Branches

Another test case that got our attention is when multiple developers try to push code to the same repository. Write operations are generally more intensive and need more resources, so putting this situation to the test would further prove the reliability of the system.

The test (time 12:00 – 12:04 in Picture 3) consisted of a push operation by each of our ‘worker’ instances to its own branch. Each operation pushed a commit that added a 100MB file (containing random data) to the repository.
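As a rough illustration of what each worker does in this test, the sketch below builds the sequence of Git commands for a single worker pushing to its own branch. The branch naming scheme (`loadtest/<worker id>`), the file name and the function names are assumptions made for the example; they are not the exact commands we ran through MCollective.

```python
import subprocess


def push_commands(worker_id, file_name, remote="origin"):
    """Build the git commands one worker runs to push a file to its own branch."""
    branch = f"loadtest/{worker_id}"  # hypothetical naming scheme
    return [
        ["git", "checkout", "-b", branch],
        ["git", "add", file_name],
        ["git", "commit", "--quiet", "-m", f"Load test payload from {worker_id}"],
        ["git", "push", remote, branch],
    ]


def run_push(repo_dir, worker_id, file_name):
    """Run the commands inside an existing clone; raises if any step fails."""
    for command in push_commands(worker_id, file_name):
        subprocess.run(command, cwd=repo_dir, check=True)
```

Because every worker writes to a different branch, the pushes never conflict with each other, so the test measures how the server copes with concurrent writes rather than how Git resolves rejected pushes.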

Picture 3: [ 12:00 – 12:04 ] → Multiple pushes to individual branches

This test did not take long, but it did “a lot of damage”, both CPU and memory wise. It took about 4 minutes to push all the data, and the CPU reached an average load of about 90 (with a peak of 133).

After the operations were over, memory usage was still high and the CPU was still under load. It looks like GitHub Enterprise does a lot of caching to keep its files ‘warm’ for further pull operations and to handle subsequent requests swiftly.
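To put the load figures from the three tests in perspective, it helps to divide them by the number of vCPUs. The sketch below assumes that the "load" shown by the monitoring tool is the usual Unix load average (the number of processes running or waiting to run), which is our reading of the graphs rather than something the tool documents here. A value above 1 per vCPU means work is queuing up.

```python
VCPUS = 4  # r3.xlarge, as described in the test setup

# Load figures reported in the three tests above.
REPORTED_LOAD = {
    "Test 1, clone many files (average)": 15,
    "Test 2, clone big files (average)": 8,
    "Test 3, pushes (average)": 90,
    "Test 3, pushes (peak)": 133,
}


def load_per_vcpu(load, vcpus=VCPUS):
    """Normalize a load average by the number of vCPUs."""
    return load / vcpus


if __name__ == "__main__":
    for label, load in REPORTED_LOAD.items():
        print(f"{label}: {load_per_vcpu(load):.2f} per vCPU")
```

By this measure the push test, at about 22.5 per vCPU on average, was roughly six times heavier than the many-files clone at 3.75.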

GitHub Enterprise has Quick and Helpful Support

Another great aspect of GitHub is the support team, which helped us with the deployment of our instances, during the testing phases and when we migrated our data from public GitHub to private GitHub Enterprise. Their responses were quick, helpful and to the point.

Conclusions

GitHub Enterprise proved to be a reliable system that can withstand multiple clone, pull and push operations from hundreds of clients, as the tests have shown. Updates come in pretty often to keep the service optimized and running smoothly, and the support team can quickly intervene whenever necessary.

There are some aspects that I’d like to see improved, such as:

  • The failover system, which currently has to be triggered manually by switching the DNS entry to the secondary instance when the primary fails
  • Load balancing of requests between instances that are in replication
  • Blocking of incoming connections when there are too many requests coming in

Although we have moved to private repository hosting, this does not mean that we are going to neglect our public projects. We plan to further increase the number of our open source projects, add better documentation and guidelines to existing ones and make them contribution-ready. This will happen on our current github.com/hootsuite account, so make sure you keep an eye on it.

Thanks

Thanks to the whole Operations Team for making this happen, and to both Noel Pullen and Mark Eijsermans for helping me publish my first post.

About the Author

Marius Cotofana has been living the Hootsuite Life as a Junior DevOps Engineer for more than a year now. He is part of the Analytics Ops team in Bucharest, but loves to learn from all the teams around the office. Marius promotes open-source software, but a no-source lifestyle. Follow him on Twitter @cmarius02.