
Continuous Integration Notes


Continuous Integration (CI)

Continuous Integration (and the associated DevOps strategies) is a very important part of the modern development cycle, especially for cross-platform applications, where changes made for one operating system can break the application on the others.

The goal of CI is to ensure that, no matter how many developers we have, every individual commit and pull request is compiled and tested across all major platforms, so that each commit integrates smoothly into the main codebase without introducing breaking changes.

CI services help automate this process by spinning up VMs or build machines that pull the commit from GitHub, prepare dependencies, compile the application, and run tests. This is all set up from configuration files or custom pipelines.

For CaPTk, we want our CI to do the following:

  • Automate compilations for Windows, Linux, and OSX
  • Run tests with varying degrees of comprehensiveness

Why Travis CI?

After some deliberation, we have decided on Travis CI as our current CI mechanism (as of 3/8/2019). Travis CI has:

  • Excellent integration with GitHub via webhooks
  • Ability to test PRs against our CI suite
  • Easy and simple configuration of build matrices and build scripts
  • Support for Linux, OSX, and Windows (beta)
  • No cost for open source projects

Options considered before Travis

GitLab CI Runner

Before Travis CI, CaPTk was hosted on GitLab, which gave us access to the GitLab CI runners. The GitLab CI runners had one notable issue:

  • Runners needed to be hosted on Windows and OSX machines to actually compile on those platforms. While we could have set that up, the solution would only have worked on GitLab. Since you're reading this on GitHub, you can see that in retrospect it would have been a great waste of man-hours. Either way, portability would have been an issue if the hosting solution changed.

Other than this, the GitLab CI runner was fairly comparable to Travis CI, as both are configured with YAML. Personally, though, I find Travis CI's configuration much more straightforward, since build steps can be organized very meticulously if one so chooses. In addition, Travis CI offers CD (Continuous Deployment) services and hooks into Google Cloud Platform and AWS for distribution/deployment, which the GitLab CI runner lacked.

Azure Pipelines

Azure Pipelines was avoided mostly because of its configuration model. Travis CI lets us defer completely to our own scripts for the build, whereas Azure requires constructing elaborate pipelines to perform CI tasks. Given the current nature of our CI, with steps that move files around to avoid certain CMake blocks and other generally hack-ish solutions, it would be quite a task to turn those into effective pipelines.

Azure does have some benefits, such as "unlimited" build times and great support for Windows (it's a Microsoft product, after all), but Travis's ease of use won out.

I've seen how ITK does their CI, and it appears that Azure supports Travis CI-style configuration with bash scripts, so I think Azure is ultimately quite feasible, as long as the current CI scripts can be ported over completely.

The Current Setup

So what solution is CaPTk using right now, and how is it configured?

Currently (3/8/2019), CaPTk uses Travis CI and custom build scripts for all 3 major platforms. .travis.yml stores the configuration for the build matrix (which sets up builds for Trusty Ubuntu Linux, Xenial Ubuntu Linux, OSX, and Windows), the system dependencies and packages that need to be pulled in through Homebrew or aptitude, and the logic to defer the build process to OS-specific scripts found within the .travis directory of CaPTk.

The scripts inside that folder follow a simple naming scheme: the name of the OS followed by the type of build. For example, linux-generic is a generic Linux build that a normal user might perform. Currently, the only non-generic script is linux-installer, which downloads the latest CaPTk version and tries to perform an install from the package.
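As a rough illustration of that layout, a trimmed-down .travis.yml might look something like the sketch below. The matrix entries, package names, and the non-Linux script names are illustrative assumptions, not copied verbatim from the real file.

```yaml
# Illustrative sketch only; the real .travis.yml differs in its details.
language: cpp

matrix:
  include:
    - os: linux
      dist: trusty
    - os: linux
      dist: xenial
    - os: osx
    - os: windows

addons:
  apt:
    packages:
      - cmake              # example dependency; the real package list differs
      - qt5-default        # example; actual Qt handling is more involved

script:
  # Defer the real work to the OS-specific scripts in .travis/
  - if [ "$TRAVIS_OS_NAME" = "linux" ]; then bash .travis/linux-generic.sh; fi
  - if [ "$TRAVIS_OS_NAME" = "osx" ]; then bash .travis/osx-generic.sh; fi          # assumed name
  - if [ "$TRAVIS_OS_NAME" = "windows" ]; then bash .travis/windows-generic.sh; fi  # assumed name
```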

Foreseeable and Current Issues with Travis CI

Current Issues

While the scripts themselves are simple in theory, Travis makes some things difficult:

  • Internet speeds are variable, meaning Travis may take over 10 minutes to download CaPTk and time itself out (if Travis goes 10 minutes without output, it just assumes something went wrong with your build and cuts it off, which is a very dumb practice in my opinion).
  • Travis does not allow connections over FTP, meaning most download mechanisms used in CMake need a custom workaround that uses wget with an infinite timeout to attempt to download the files (see the sketch after this list).
  • Travis has a two-hour build limit (formerly 50(!!!) minutes; we asked for an extended build time).
  • Caching refuses to work properly on Linux, meaning we really milk that two-hour limit, with most Linux builds finishing at the 1:58 mark.
  • Windows builds just won't work; they always fail at the Qt location step, which they absolutely shouldn't.
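To make the FTP and no-output issues above concrete, the kind of workaround in play looks roughly like the shell snippet below: wget with its timeout disabled and unlimited retries, plus a heartbeat so the 10-minute no-output watchdog doesn't kill the job. This is a sketch with placeholder names, not the exact code in our .travis scripts.

```bash
#!/bin/bash
# Sketch only; URLs, file names, and helper names are placeholders.

# Download via wget instead of CMake's built-in mechanism (which may try FTP),
# with the timeout disabled (0) and unlimited retries.
fetch() {
  local url="$1" out="$2"
  wget --timeout=0 --tries=0 --output-document="$out" "$url"
}

# Print a heartbeat while a long, quiet command runs so Travis's 10-minute
# no-output watchdog doesn't cut the build off.
with_heartbeat() {
  "$@" &
  local pid=$!
  while kill -0 "$pid" 2>/dev/null; do
    echo "still working..."
    sleep 60
  done
  wait "$pid"
}

# Example usage (placeholder URL):
# with_heartbeat fetch "https://example.com/some-dependency.tar.gz" dependency.tar.gz
```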

Foreseeable Issues

I believe that as CaPTk's testing suite expands, Travis CI will not be equipped to handle the CI tasks CaPTk will require. Here is why:

  • Travis CI seems to be intended for simple, small projects. CaPTk is neither simple, nor small.
  • Linux builds already push 2 hours, leaving us with a 2-minute window for testing. There is absolutely, positively, no way that every application, potential future application, test, and potential test can run and finish in that time.
  • How will sample data be uploaded to Travis to test against? Downloading CaPTk itself almost times out Travis; medical images will absolutely time it out.
  • Uploading data means there is the potential to accidentally upload PHI, or images we don't want publicly available or visible to Travis.
  • The Training Module: are we really going to test machine learning on medical images on a platform we can barely upload to in under two hours, while still leaving time for every other application? Remember that on Linux those 2 hours become 2 minutes to run all the tests in.

As CI progresses, any hosted CI service will eventually become completely unusable under the demands of a project that needs to meaningfully test its applications against large data sets that are both storage-intensive and private health information. This is the major point of failure for the CaPTk CI: free services that we don't host will simply never be up to the task, and will always be incomplete, missing implementation, or lacking meaningful results.

Does this mean CI is a failure and worth abandoning? Absolutely not. The value of CI is monumental for CaPTk, where Windows devs outnumber OSX/Linux devs. To properly ensure that all platforms receive equal care, CI places responsibility for the other platforms directly onto developers, and this alone is worth keeping.

So let's talk about what we can do about it.

Alternatives: Good, Bad, and the Ugly

AppVeyor, CircleCI

AppVeyor and CircleCI have the exact same issues Travis has.

AppVeyor

  • 10 minute no-output timeout
  • Build time limit of 60 minutes

Source

Circle CI

  • Monthly build limit of 1,000 minutes at the free tier. CaPTk pessimistically takes 120 of those per individual Linux build, meaning at most we get 8.3 Linux builds, or perhaps more accurately, 8 commits from a Linux machine per month.
  • You get one container for free, meaning we would need to pay for all the other OSes we need to test.

Source

Concluding Thoughts

These solutions are really only good for wasting time running into the same issues we already have, so let's avoid them. These CI services are not cut out for a project of CaPTk's caliber, which is definitely the flaw in choosing Travis; the simplicity of these systems really only suits projects that are equally simple.

Azure Pipelines

Azure has unlimited build times and great support for Windows, which are both very good things that Travis CI lacks. Azure allows for 10 concurrent jobs at the free tier, which I don't think will become an issue unless we discover 7 new operating systems that we need to support.

My primary concern with Azure is the Linux and OSX side of things, because representing the current processes for both of those platforms as pipelines rather than bash scripts sounds like an incredibly time-intensive task as the various quirks of Azure are uncovered. Unless Azure has a native way of running a bash script to do a build, there will be a significant time investment; fortunately, it appears that it does.
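For reference, a minimal azure-pipelines.yml that simply hands control to our existing scripts could look roughly like the sketch below; the vmImage names and the OSX script name are assumptions on my part, not a tested configuration.

```yaml
# Sketch only; image names and script paths are assumptions.
jobs:
  - job: Linux
    pool:
      vmImage: 'ubuntu-16.04'
    steps:
      - bash: ./.travis/linux-generic.sh          # reuse the existing build script
        displayName: 'Linux build via existing script'

  - job: OSX
    pool:
      vmImage: 'macOS-10.13'
    steps:
      - bash: ./.travis/osx-generic.sh            # assumed name, mirroring the Linux script
        displayName: 'OSX build via existing script'
```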

Azure also allows us to have a self-hosted job provided the team is small (under 5 people), which could prove useful, but then we would be constrained by an 1,800-minute monthly limit, which makes this option fairly useless for the same reasons as CircleCI.

The future of CaPTk CI inherently involves self-hosted CI jobs that run massive test suites against huge data sets, and a single such job (with a monthly limit, mind you) is not enough.

Overall, it may be worth the investment for the unlimited build times, but it still does not address CaPTk's need to test against large data that we can't simply push up to the cloud.

Source

Enterprise Travis on the Cluster

Enterprise Travis is effectively the same as normal Travis, except that it can be run on our own infrastructure. This would allow us both to keep most of our existing configuration and GitHub integrations and to run those massive testing suites.

The obvious issues are setting up and maintaining Travis CI on the cluster, and pricing, which Travis makes VERY opaque. It seems like the only way to get pricing is to get in touch with Travis CI directly, and if it has 'enterprise' in the name, it's probably going to be pricey.

Source

Cluster Only

I'm suggesting this one more as a mental exercise than as a serious alternative, because having our own CI solution on the cluster means that we need to develop that infrastructure in-house, along with all of the GitHub webhooks and CI runners. This option would take the most time and probably require the most maintenance, but in exchange it would give us the most control possible over our CI.

The biggest downside is that CI could then only be run on Linux, unless we build out a whole Docker containerization setup, which only adds to the already complex task of this solution.

Combination of Web & Cluster

So, I think the best solution right now is to combine a web CI service with the cluster: use the web CI service mostly as a checker to catch compile-time issues across multiple platforms (either Azure or Travis CI can be used for this), and give the cluster some mechanism to kick off large, sweeping tests of the entire CaPTk codebase (a rough sketch follows below). While the tests themselves would be Linux-only, having them would be monumental in detecting issues, since most devs are using Windows and there may be result differences across platforms.
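As a sketch of what the cluster side could look like (the paths, the data-directory variable, and the assumption that the test suite is driven by CTest are all illustrative, not an existing setup), a scheduled nightly job could clone the repo, build it, and run the full test suite against data that never leaves the cluster:

```bash
#!/bin/bash
# Hypothetical nightly cluster job; paths and variables are placeholders.
set -e

WORKDIR=/cluster/scratch/captk-nightly
export CAPTK_TEST_DATA=/cluster/data/captk-test-data   # hypothetical variable; large/private data stays on the cluster

rm -rf "$WORKDIR" && mkdir -p "$WORKDIR" && cd "$WORKDIR"
git clone https://github.com/CBICA/CaPTk.git source
mkdir build && cd build

cmake ../source -DCMAKE_BUILD_TYPE=Release
make -j"$(nproc)"

# Run the full test suite here, free of the 2-hour and 10-minute limits
# (assuming the tests are exposed through CTest).
ctest --output-on-failure
```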

The issue with this solution is that CI and testing are no longer in the same location, which I think is a fine trade-off compared to having some CI and no testing. Note that Azure might make life difficult for OSX and Linux, as OSX seems to be supported only in the context of Xcode, but Travis definitely makes doing any degree of testing hard on Linux because caching doesn't work, so pick your poison.