With a Multi-cloud Infrastructure, Control is Key

By Andrew Marshall, Cedexis Director of Product Marketing

Ask any developer or DevOps manager about their first experiences with the public cloud and it’s likely they’ll happily share some memories of quickly provisioning a few compute instances for a small project or new app. For (seemingly) a few pennies, you could take advantage of the full suite of public cloud services, as well as the scalability, elasticity, security and pay-as-you-go pricing model. All of this made it easy for teams to get started with the cloud, saving on both IT budget and infrastructure setup time. Public cloud providers AWS, Azure, Google Cloud, Rackspace and others made it easy to innovate.

Fast forward several years and the early promise of the cloud is still relevant: services have expanded, costs have (in some cases) been reduced, and DevOps teams have adapted by spinning up compute instances whenever they’re needed. But for many companies, the realities of their hybrid-IT infrastructure necessitate support for more than one public cloud provider. To make this work, IT Ops needs a control layer that sits on top of their infrastructure and can deliver applications to customers over any architecture, including multi-cloud. This is true no matter why a team needs to support multi-cloud environments.

Prepare for the Worst

As any IT Ops manager, or anyone who has lost access to their web app, knows, outages and cloud service degradation happen. Modern Ops teams need a plan in place for when they do. Many companies use multiple public cloud providers to keep their application available to customers worldwide, even during an outage. Manually re-routing traffic to a second public cloud in the event of an outage is cumbersome, to say the least. Adding an app delivery control plane on top of your infrastructure lets you deliver applications over multiple clouds seamlessly and automatically, factoring in real-time availability and performance.
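
As a rough illustration of that kind of automatic re-routing, the Python sketch below picks a cloud from availability and latency readings. The provider labels, the 90 percent availability floor and the `choose_cloud` helper are hypothetical and are not taken from any Cedexis API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CloudHealth:
    name: str            # hypothetical provider label
    availability: float  # share of recent probes that succeeded, 0.0 to 1.0
    latency_ms: float    # recent median response time


def choose_cloud(readings: list[CloudHealth],
                 min_availability: float = 0.90) -> Optional[str]:
    """Return the lowest-latency cloud that meets the availability floor."""
    healthy = [r for r in readings if r.availability >= min_availability]
    if not healthy:
        return None  # nothing is healthy; the caller decides what to do
    return min(healthy, key=lambda r: r.latency_ms).name


readings = [
    CloudHealth("cloud-a", availability=0.42, latency_ms=80.0),  # degraded
    CloudHealth("cloud-b", availability=0.99, latency_ms=120.0),
]
print(choose_cloud(readings))  # cloud-b
```

Here the faster cloud is skipped because it is failing most probes, so traffic moves to the slower but available one without anyone re-routing by hand.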

Support Cloud-driven Innovation

Ops teams often support many different agile development teams, sometimes spread across multiple countries or inherited through acquisitions. When this is the case, it’s likely that the various teams are using many architectures on more than one public cloud. Asking some dev teams to switch cloud vendors is not very DevOps-y. A better option is to control app delivery automation with a cloud-agnostic control plane that sits on top of any cloud, data center or CDN architecture. This allows dev teams to work in their preferred cloud environment, without worrying about delivery.

Avoid Cloud Vendor Lock-in

Public cloud vendors such as Amazon Web Services or Microsoft Azure aren’t just infrastructure-as-a-service (IaaS) vendors; they sell (or resell) products and services that could very well compete with your company’s offering. In the beginning, using a few cloud instances didn’t seem like such a big deal. But now that you’re in full production and depend on one of these cloud providers for your mission-critical app, this no longer feels like a great strategy. Adding a second public cloud to your infrastructure lessens your dependence on a single cloud vendor with whom you may be in “coopetition.”

Multiple-vendor sourcing is a proven business strategy in many other areas of IT, giving you more options during price and SLA negotiations. The same is true for IaaS. Cloud services change often, as new services are added or removed, and price structures change. Taking control over these changes in public cloud service offerings, pricing models and SLAs is another powerful motivator for Ops teams to move to a multi-cloud architecture. An application delivery automation platform that can ingest and act on cloud service pricing data is essential.

Apps (and How They’re Delivered) Have Changed

Monolithic apps are out. Modern, distributed apps that are powered by microservices are in. Similarly, older application delivery controllers (ADCs) were built for a static infrastructure world, before the cloud (and SaaS) were commonly used by businesses. Using an ADC for application delivery requires a significant upfront capital expense, limits your ability to rapidly scale and hinders the flexibility to support dynamic (i.e. cloud) infrastructure. Using ADCs for multiple cloud environments compounds these issues exponentially. A software-defined application delivery control layer eliminates the need for older ADC technology and scales directly with your business and infrastructure.

Regain Control

Full support for multi-cloud in production may sound daunting. After all, Ops teams already have plenty to worry about daily. Adding a second cloud vendor requires a significant ramp-up period to get ready for production-level delivery, along with new protocols, alerts, competencies and other things to think about. You can’t be knee-deep in the details of each cloud and still manage infrastructure. Application delivery over multiple clouds adds complexity, but much less so if you use a SaaS-based application delivery platform. With multi-cloud infrastructure, control is key.

Learn more about our solutions for multi-cloud architectures and discover our application delivery platform.

You can also download our latest ebook, “Hybrid Cloud, the New Normal,” for free.

Cloud-First + DevOps-First = 81% Competitive Advantage

We recently ran across a fascinating article by Jason Bloomberg, a recognized expert on agile digital transformation, that examines the interplay between Cloud-First and DevOps-First models. That article led us, in turn, to an infographic centered on some remarkable findings from a CA Technologies survey of 900-plus IT pros from around the world. The survey set out to explore the synergies between Cloud and DevOps, specifically with regard to software delivery. You can probably guess why we snapped to attention.

The study found that 20 percent of the organizations represented identified themselves as being strongly committed to both Cloud and DevOps, and their software delivery outperformed other categories (Cloud only, DevOps only, and slow adopters) by 81 percent. This group earned the label “Delivery Disruptors” for their outlying success at maximizing agility and velocity on software projects. On factors of predictability, quality, user experience, and cost control, the Disruptor organizations soared above those employing traditional methods, as well as Cloud-only and DevOps-only methods, by large percentages. For example, Delivery Disruptors were 117 percent better at cost control than Slow Movers, and 75 percent better in this category than the DevOps-only companies.

These findings, among others, got us thinking about the advantages such Delivery Disruptors can gain from adding Cedexis solutions into their powerful mix. Say, for example, you have agile dev teams working on new products and apps and you want to shorten the execution time for new cloud projects. To let your developers focus on writing code, you need an app delivery control layer that supports multiple teams and architectures. With the Cedexis application delivery platform, you can support agile processes, deliver frequent releases, control cloud and CDN costs, guarantee uptime and performance, and optimize hybrid infrastructure. Your teams get to work their way, in their specific environment, without worrying about delivery issues looming around every corner.

Application development is constantly changing thanks to advances like containerization and microservice architecture, not to mention escalating consumer demand for seamless functionality and instant rich media content. And in a hybrid, multi-cloud era, infrastructure is so complex and abstracted that delivery intelligence has to be embedded in the application (our Architect, Josh Gray, has more to say about delivery-as-code).

To ensure that an app performs as designed, and end users have a high quality experience, agile teams need to automate and optimize with software-defined delivery. Agile teams can achieve new levels of delivery disruption by bringing together global and local traffic management data (for instance, RUM, synthetic monitoring results, and local load balancer health), application performance management, and cost considerations to ensure the optimal path through datacenters, clouds, and CDNs.
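
One way to picture how those inputs combine is the Python sketch below. It drops any delivery path that fails a health signal, then ranks the rest by real-user latency plus a cost penalty. The field names, the sample numbers and the `ms_per_dollar` trade-off are assumptions made for illustration, not the platform's actual scoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str              # hypothetical platform label
    rum_latency_ms: float  # median latency from real user monitoring
    synthetic_ok: bool     # latest synthetic check passed
    lb_healthy: bool       # local load balancer reports healthy
    cost_per_gb: float     # delivery cost in dollars per GB


def rank(candidates: list[Candidate], ms_per_dollar: float = 1000.0) -> list[str]:
    """Order usable candidates by latency plus a cost penalty (lower is better)."""
    usable = [c for c in candidates if c.synthetic_ok and c.lb_healthy]

    def score(c: Candidate) -> float:
        # Convert cost into "milliseconds" so both factors share one scale.
        return c.rum_latency_ms + ms_per_dollar * c.cost_per_gb

    return [c.name for c in sorted(usable, key=score)]


candidates = [
    Candidate("cdn-1", 60.0, True, True, 0.08),
    Candidate("cloud-1", 90.0, True, True, 0.02),
    Candidate("dc-1", 40.0, True, False, 0.01),  # load balancer unhealthy
]
print(rank(candidates))  # ['cloud-1', 'cdn-1']
```

The fastest option is excluded because its load balancer is unhealthy, and the cheaper cloud outranks the faster CDN once cost is weighed in. Changing `ms_per_dollar` is where a business rule about cost versus speed would be expressed.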

Imagine the agility and speed a team can contribute to digital transformation initiatives with fully automated app delivery based on business rules, actionable data from real user and synthetic testing, and self-healing network optimizations. Incorporating these capabilities with a maturing Cloud-first and DevOps-first approach will likely put the top performers so far ahead of the rest of the pack that they’ll barely be on the same racetrack.

Shellshocked! Big problem. Cedexis swift response.

For 22 years Shellshock lay dormant. This bug exposed the ability to “take control of hundreds of millions of machines around the world, potentially including Macintosh computers and smartphones that use the Android operating system”*.

From the Cedexis viewpoint, our DevOps team sprang into action. In short order:

  • The Shellshock vulnerability in ‘bash’ was disclosed publicly on Sept 24th.
  • Ubuntu, our operating system provider, issued a security update for ‘bash’.
  • All of our systems were patched and updated by 1pm today (Sept 25th).

The inventor of Bash, Brian J. Fox, joked in an interview Thursday that his first reaction to the Shellshock discovery was, “Aha, my plan worked.”

Indeed.

Rapid Deployment That Won’t Delay Lunch

Part of our disaster mitigation plan has included developing a rapid server deployment process. In addition to saving us a lot of time when launching a new server, rapid deployment ensures that existing servers stay up-to-date with the latest configuration changes and software releases. It also means that when we need to expand our services, populate a new data center or CDN, or recover from a crash, we can be on top of things quickly. Our servers are located in over 40 data centers and cloud providers worldwide and collect over a billion-and-a-half measurements every day. These measurements are crucial to the decision-making power of Openmix.

Deploying on Ubuntu with Puppet

Our deployment plan has several steps. First, we begin with the latest Ubuntu Long Term Support (LTS) release available. Each of our service providers already supports Ubuntu LTS, typically by providing a pre-built virtual machine image. These images vary in configuration from provider to provider, so we have built specialized bootstrap shell scripts that deploy our own very basic configuration. The scripts are customized by both cloud provider and geographic location of the server. Much of our configuration is based around the hostname of the new server, which the bootstrap script sets based on the location and service provider of the new host.
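
To show how a hostname can carry that information, here is a small Python sketch. The `provider-location-index` naming scheme and both helper functions are hypothetical. The post does not describe the real convention, and the actual bootstrap scripts are shell scripts.

```python
import re

# Hypothetical scheme: <provider>-<location>-<two-digit index>
HOSTNAME_PATTERN = re.compile(
    r"(?P<provider>[a-z0-9]+)-(?P<location>[a-z0-9]+)-(?P<index>\d{2})"
)


def build_hostname(provider: str, location: str, index: int) -> str:
    """Compose a hostname from provider, location and a per-site index."""
    name = f"{provider.lower()}-{location.lower()}-{index:02d}"
    if HOSTNAME_PATTERN.fullmatch(name) is None:
        raise ValueError(f"cannot build a valid hostname from {name!r}")
    return name


def parse_hostname(hostname: str) -> dict:
    """Recover the provider, location and index encoded in a hostname."""
    match = HOSTNAME_PATTERN.fullmatch(hostname)
    if match is None:
        raise ValueError(f"unrecognized hostname: {hostname!r}")
    return {
        "provider": match["provider"],
        "location": match["location"],
        "index": int(match["index"]),
    }


host = build_hostname("AcmeCloud", "FRA", 3)
print(host)                  # acmecloud-fra-03
print(parse_hostname(host))  # {'provider': 'acmecloud', 'location': 'fra', 'index': 3}
```

Because the name can be parsed back into its parts, later configuration steps can branch on provider or location without a separate inventory lookup.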

The bootstrap process also installs and configures Puppet Enterprise from Puppet Labs. This tool provides a configuration management system that allows us to specify the state we want each server to be in. Everything from specific applications to firewall configurations can be defined in Puppet modules, which are gathered into site manifests. The granularity is incredible: even the permissions and ownership of individual files can be specified.
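
The idea of declaring a desired state, rather than scripting a sequence of changes, can be sketched in a few lines of Python. This is not Puppet code and not how Puppet is implemented. The `ensure_mode` helper is a hypothetical stand-in that handles only permission bits on a POSIX system and leaves ownership aside.

```python
import os
import stat
import tempfile


def ensure_mode(path: str, desired_mode: int) -> bool:
    """Bring a file's permission bits to desired_mode.

    Returns True if a change was made, False if the file already complied.
    """
    current = stat.S_IMODE(os.stat(path).st_mode)
    if current == desired_mode:
        return False  # already in the desired state, nothing to do
    os.chmod(path, desired_mode)
    return True


# Demonstrate on a throwaway file so nothing real is touched.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    path = handle.name
os.chmod(path, 0o644)

first = ensure_mode(path, 0o600)   # drift detected and corrected
second = ensure_mode(path, 0o600)  # second run finds nothing to change
os.remove(path)
print(first, second)
```

Running the same declaration twice changes the file once and then does nothing. That idempotence is what makes it safe to apply the same configuration across many servers repeatedly.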

Using Puppet, we are then able to make sure that the new server is working the way we want it to. We know, even before deploying any Cedexis software, that the server is in a known state and running smoothly. We’re also able to keep an eye on hundreds of servers across dozens of data centers at once using Puppet, which makes it an invaluable tool. We love using Puppet, and if you’d like more information about how and why we selected it, you should read our Puppet Labs case study.

Installing the Cedexis Tools

The next step is to use Puppet to add the Cedexis apt repositories to the new server’s configuration. We prepare our software releases as Debian packages, which gives us all of the power of dpkg, the Debian package management system. This allows us to specify dependencies, track software versions, and automatically deploy new packages and revisions. Once the server is aimed at our apt repositories, installing the Cedexis tools is quick and painless.

The new server is just about ready. Because Openmix makes decisions using only very recent measurements, only a small amount of data needs to be transferred to the new server before it can begin making intelligent decisions and joining the rest of our workforce.
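
A minimal Python sketch of deciding from recent data only is shown below. The 60-second window, the sample values and the `RecentMeasurements` class are assumptions for illustration and do not describe Openmix internals. Samples are assumed to arrive in time order.

```python
from collections import deque
from typing import Optional


class RecentMeasurements:
    """Keep latency samples and decide using only those inside a time window."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._samples = deque()  # (timestamp, platform, latency_ms), oldest first

    def add(self, timestamp: float, platform: str, latency_ms: float) -> None:
        self._samples.append((timestamp, platform, latency_ms))

    def best_platform(self, now: float) -> Optional[str]:
        """Return the platform with the lowest mean latency in the window."""
        cutoff = now - self.window_seconds
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()  # discard stale samples
        totals = {}
        for _, platform, latency in self._samples:
            total, count = totals.get(platform, (0.0, 0))
            totals[platform] = (total + latency, count + 1)
        if not totals:
            return None
        return min(totals, key=lambda p: totals[p][0] / totals[p][1])


window = RecentMeasurements(window_seconds=60)
window.add(0, "a", 20.0)     # old sample, will fall outside the window
window.add(100, "a", 200.0)
window.add(110, "b", 120.0)
print(window.best_platform(now=120))  # b
```

The early fast reading for platform "a" is ignored because it is stale. Since old data never influences the answer, a fresh server in this sketch needs only the last window of samples to agree with its peers.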

We monitor with both Puppet and our own monitoring software. If problems develop, we notice them quickly. From beginning to end, the process of deploying a new server takes about an hour, and the tools we have chosen give us the confidence that each new server is playing nicely with the Cedexis network and doing its job appropriately.

Life at Cedexis: LessOps, not NoOps

As a fast-paced company with modern needs, Cedexis has developed an interesting balance of work between operations and development. Last year, there was a lot of talk about the concept of “NoOps,” which indicated (depending on your perspective) either an elimination of the traditional operations role or a description of what must be a well-oiled and well-working operations team. In either case, the main message seemed to be that operations should focus on automation as much as possible, thus reducing the need for actual man-hours with eyes on the network.

A crucial part of the Internet discussion involved an article by Operations Engineer John Allspaw in which he criticized the concept of NoOps.

“I do find it disturbing,” said Ramin Khatibi in support of Allspaw, “that enough people have experienced Ops performed so poorly that faced with a working version they assume it must be something else.”

While I also don’t think that NoOps is a realistic approach, I’m not going to mount an argument against its existence. Instead, Cedexis uses its small technical team to the best effect, in a system we’ve been calling “LessOps.”

What is LessOps?

We wanted our LessOps approach to build more collaboration between the operations and development teams, including cross-training. We also hoped that integration between the two groups would lead to a stronger, more robust product. However, it would mean some changes that might not be that popular, such as adding developers to the on-call rotation.

When we first put our developers on call, the operations team was hesitant and skeptical. After all, not many developers have what we might call the operations mindset. Developers tend to be focused on individual problems or components of a system, while operations staff take a more holistic approach out of necessity. We were unsure how it would turn out. However, after a few months under this new system, we’ve enjoyed great benefits.

That Time We Had Zero Alerts

The biggest and most measurable benefit is that after a few weeks with the mixed on-call rotation, for the first time we were experiencing periods with zero alerts and errors on our system health monitor. This happened as a result of several factors.

First, developers were put in charge of both designing and responding to monitors for components they’d written. Error messages written by developers tend to be written for people dealing with their own running code, and can be difficult for others to interpret efficiently. These messages, suddenly monitored by engineers, pointed out places in the system where unneeded monitoring and reporting were happening. A number of minor, easy-to-fix issues that operations had not paid sufficient attention to before were also quickly discovered and addressed.

This also led to another change that produced a more robust system. Developers began writing the monitors for the components they had written. Operations staff know how difficult it is to monitor a system they don’t understand. With ops and developers working together, however, monitors could be written that give both teams meaningful information and make responses to alerts more effective by directing them to the proper domain experts.

These monitors have also become part of our software development lifecycle. In production, they operate as live unit tests, and as the system is developed over time, they provide a set of legacy regression tests. These keep us alerted if any changes in the system have produced unpredictable behavior in components we may not actively monitor or develop anymore.
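
The Python sketch below illustrates monitors that behave like live tests and route failures to whoever owns the component. The `Monitor` structure, the component names and the owner labels are hypothetical, and our own monitoring software is not shown here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Monitor:
    component: str
    owner: str                 # who should receive the alert
    check: Callable[[], bool]  # returns True when the component is healthy


def run_monitors(monitors: list[Monitor]) -> dict[str, list[str]]:
    """Run every check and group failing components by owner."""
    alerts: dict[str, list[str]] = {}
    for monitor in monitors:
        try:
            healthy = monitor.check()
        except Exception:
            healthy = False  # a check that crashes counts as a failure
        if not healthy:
            alerts.setdefault(monitor.owner, []).append(monitor.component)
    return alerts


def broken_check() -> bool:
    raise RuntimeError("probe could not connect")


monitors = [
    Monitor("ingest", "dev-alice", lambda: True),
    Monitor("reports", "dev-bob", lambda: False),
    Monitor("dns", "ops", broken_check),
]
print(run_monitors(monitors))  # {'dev-bob': ['reports'], 'ops': ['dns']}
```

Each check reads like a unit test that runs against production, and keeping an owner next to the check is what lets an alert go straight to the person who understands the component.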

A Culture of Documentation

The cross-training between operations and development also produces much stronger documentation. We have learned, as have many teams, that good documentation needs to be a core element of the team’s culture. Everyone on the team has realized the benefit of stable, consistent documentation that is kept up to date.

Overall, we have found that the LessOps approach, with operations and engineering working closely with each other, has produced more automation, fewer system faults, and more free time for our ops staff to play LAN games. Plus, the teams work extremely well together, with architecture and design decisions made more efficiently as a group, and problems being solved faster with domain experts quickly on hand.