For many organisations, moving their applications to the cloud is becoming increasingly advantageous, with benefits such as lower operating costs, increased agility, and increased stability. However, there are also significant challenges and constraints that need to be overcome.
To help illustrate some of the challenges and opportunities that the cloud can bring, this post follows the steps we took with one of our customers to build a hybrid cloud infrastructure that followed agile and DevOps practices, whilst still meeting the customer’s security constraints.
Our customer has a new private-cloud facility that allows projects to build and configure infrastructure, but only through a change-controlled process. This infrastructure is well secured, and both process and technical controls limit its access to the public Internet. It is a good environment for production deployment, but less so for development, as the lack of Internet access makes it hard to get hold of the libraries and tools required to build an application. If each required library had to be obtained by filling in forms to request that someone download it and make it available in an internal mirror, this would severely hinder a project’s ability to be agile.
The public Internet provides a great environment for development, with all the required libraries and tools readily to hand. However, with the security of customer data as their main priority, our customer needed an alternative solution that provided an extra level of security. As they had already invested in their private cloud infrastructure, we decided that a hybrid-cloud approach would be the answer. This would mean that development and initial testing would be done in a public cloud environment (AWS) before releases were shipped to the private cloud, where they proceed through several environments before finally being deployed to production.
The overall process we came up with was quite straightforward. Application code is compiled and packaged into a Docker image, together with the runtime environment that it needs. These images are then tagged with version numbers and pushed into a Nexus repository. From there they can be deployed onto EC2 servers in our AWS development and integration environment, and they are also available via a Nexus proxy in the private cloud environments. This provides several security benefits, including the need for only one firewall rule (from the Nexus proxy in the private cloud to the Nexus repository in AWS), rather than multiple rules to allow traffic to all the sites required to build the code (e.g. Maven Central).
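As a rough illustration of the tag-and-push step, the sketch below builds a versioned image reference and the two Docker CLI commands that would build and push it. The registry address, image name and version are hypothetical placeholders, not values from the project, and in practice this step ran inside the Jenkins build rather than as a standalone script.

```python
from typing import List


def image_reference(registry: str, name: str, version: str) -> str:
    """Build a versioned image reference of the form registry/name:version."""
    return f"{registry}/{name}:{version}"


def build_and_push_commands(
    registry: str, name: str, version: str, context_dir: str = "."
) -> List[List[str]]:
    """Return the docker CLI commands that build and push one tagged image.

    The commands are returned rather than executed so that the caller
    decides how (and whether) to run them.
    """
    ref = image_reference(registry, name, version)
    return [
        ["docker", "build", "-t", ref, context_dir],
        ["docker", "push", ref],
    ]


if __name__ == "__main__":
    # Hypothetical registry host, image name and version.
    for cmd in build_and_push_commands("nexus.example.com:8083", "orders-service", "1.4.2"):
        print(" ".join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` on a machine where Docker is installed and logged in to the registry.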
Both c2b2 and our customer were very keen that the development project would follow DevOps practices, including continuous build and integration and rapid deployment. To help achieve this we used Jenkins to automatically build code after a check-in was detected, and Arquillian to automatically execute unit and integration tests. We made particular use of Arquillian Cube, which allowed us to create a Docker infrastructure consisting of a number of containers to support our integration testing, and to turn containers on and off to simulate failures as part of that testing.
To achieve rapid deployment, we felt it was important to simplify the deployment process as much as possible, adopting an "infrastructure as code" approach. The tool we selected for this was Ansible, as it is simple to use, yet also provides out-of-the-box modules for everything that we wanted to do (especially working with Docker containers and repositories). Because we treat Docker containers as disposable, we handle upgrades, releases, patches and configuration changes in the same way - we create new Docker containers that contain the required changes and then deploy them to the environment using our Ansible scripts.
As part of the build, these scripts also perform several health-check tasks - checking that firewall ports are open, and diagnosing and alerting us to any environmental issues. The scripts then perform either a "sequential" build (building one server at a time) or build a whole environment in parallel, depending on what is required. Any resources required by the scripts are downloaded from our AWS Nexus server via the Nexus proxy in the private cloud, and have their checksums validated before use.
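The real implementation of these steps lives in Ansible, but the idea behind the port health check and the choice between sequential and parallel builds can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names and the `deploy` callback are hypothetical and are not taken from the project's scripts.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def roll_out(
    hosts: Iterable[str],
    deploy: Callable[[str], None],
    sequential: bool = True,
) -> List[str]:
    """Run deploy(host) for every host, one at a time or all at once.

    In sequential mode the first failure stops the rollout, leaving the
    remaining servers untouched. In parallel mode every host is attempted
    and the first exception (if any) is re-raised afterwards.
    """
    host_list = list(hosts)
    if sequential:
        for host in host_list:
            deploy(host)
    else:
        with ThreadPoolExecutor(max_workers=max(len(host_list), 1)) as pool:
            # list() consumes the results so worker exceptions surface here.
            list(pool.map(deploy, host_list))
    return host_list
```

A sequential rollout suits environments that must stay available during a release, whereas a parallel one is quicker when the whole environment can be rebuilt at once.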
Integrity of code
This brings us on to one of the key customer concerns that we needed to address - ensuring the integrity of deployed code. This means ensuring that the code deployed in the production environments has not been tampered with between being built on our AWS servers and being deployed in the private cloud.
We employed a series of techniques to address this concern. Firstly, we used AWS security rules to lock down access to the Jenkins build server and Nexus repository, only allowing access from specific whitelisted IP addresses. Secondly, we used TLS between the Nexus proxy in the private cloud and the Nexus repository in AWS, with a self-signed certificate configured as the only certificate the proxy trusts. Together with a restrictive outgoing firewall rule for the Nexus proxy, this ensures that the proxy will only connect to our AWS Nexus repository, preventing "man-in-the-middle" attacks that could alter the binaries. Just to be extra sure, we validate the checksum of the Docker image in our Ansible scripts to ensure that the downloaded image is the one we intended.
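Two of these techniques, trusting a single certificate and validating a checksum, can be illustrated with the Python standard library. This is a sketch of the general idea rather than the project's configuration: the proxy's trust store was configured in Nexus and the checksum check ran in Ansible, and the function names, file paths and the choice of SHA-256 here are assumptions.

```python
import hashlib
import hmac
import ssl


def pinned_tls_context(ca_file: str) -> ssl.SSLContext:
    """Create a client TLS context that trusts only the certificate in ca_file.

    PROTOCOL_TLS_CLIENT enables certificate and hostname verification by
    default, and no system CA certificates are loaded, so a server presenting
    any other certificate is rejected.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.load_verify_locations(cafile=ca_file)
    return context


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 digest with the value recorded at build time."""
    expected = expected_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

The context returned by `pinned_tls_context` can be passed to `urllib.request.urlopen(url, context=...)`, and a deployment would stop if `checksum_matches` returned `False` for a downloaded artefact.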
One of the other challenges we faced was monitoring: because we used Docker containers for all our service components, we had large numbers of small containers deployed in each of our environments. We needed to obtain both runtime metrics (such as memory usage) and performance metrics (such as number of requests being processed and response time) from each of our containers, while also aggregating them up to the service and host server levels. We looked at a few frameworks for this before settling on an approach where the metric data is written to log files (in a directory mounted from the host server) and forwarded, using Filebeat, to an ELK (Elasticsearch, Logstash, Kibana) stack server (itself a Docker container). We can then use Logstash to extract the metrics from the log messages and graph them in Kibana.
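To make the extract-and-aggregate step concrete, here is a small Python stand-in for the kind of processing Logstash performed. It is only an illustration: the `key=value` log format and the field names (`host`, `service`, `container`, `response_ms`, `mem_mb`) are hypothetical, not the format the project used.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

# Matches key=value pairs such as "service=orders" or "mem_mb=200".
_KEY_VALUE = re.compile(r"(\w+)=(\S+)")
_REQUIRED = {"host", "service", "container", "response_ms", "mem_mb"}


def parse_metric_line(line: str) -> Dict[str, str]:
    """Extract all key=value fields from one log line."""
    return dict(_KEY_VALUE.findall(line))


def aggregate(lines: Iterable[str]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (mean response time per service, total memory per host).

    Memory uses the most recent reading per container, summed per host.
    Lines missing any required field are skipped.
    """
    response_times = defaultdict(list)
    latest_memory = {}
    for line in lines:
        fields = parse_metric_line(line)
        if not _REQUIRED <= fields.keys():
            continue
        response_times[fields["service"]].append(float(fields["response_ms"]))
        latest_memory[(fields["host"], fields["container"])] = float(fields["mem_mb"])

    mean_response = {service: mean(values) for service, values in response_times.items()}
    memory_by_host = defaultdict(float)
    for (host, _container), memory in latest_memory.items():
        memory_by_host[host] += memory
    return mean_response, dict(memory_by_host)
```

Keeping the container, service and host names on every log line is what makes it possible to roll the same data up to either level.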
Overall, the approaches that we chose have proved to be very powerful, allowing us to meet our customer’s strict security guidelines whilst still following agile and DevOps practices. However, as always, there is scope for further improvement, such as automating more of the testing that occurs in the test environment; currently our test automation extends only to testing performed in AWS using stubbed integrations.