99.999 Is Not Enough: An OpenCloud Approach to Delivering Application Uptime and Performance

Executive Summary

The pressure to keep vital applications online and performing well is extreme. The stakes are high; application downtime means loss of revenue, and application slowdown means loss of customers.

At the same time, it is hard to achieve end-to-end visibility of production environments because they span data centers, vendors, and even internal IT teams. Sometimes the only group that can help troubleshoot an application is its development team, which diverts scarce developer time from development work.

As a result, IT departments remain mired in the present, tied to keeping the application up and running, and expected to avoid problems from the past. Looking strategically toward the future is a luxury many can’t afford, despite the constant demands on IT for the newest and latest.

In this white paper CITO Research examines how Rackspace® Critical Application Services can help clients achieve end-to-end visibility of their application environments, maintain high performance, and help prevent applications from crashing, all at a reasonable monthly cost. With a 100% production platform uptime guarantee, experienced webscale engineers on staff offering Fanatical Support®, and enterprise-class monitoring tools from CA Technologies offering end-to-end application and infrastructure visibility, Rackspace Critical Application Services is a credible and compelling alternative for achieving top application performance.


One of the major challenges that IT departments face today is identifying and mitigating application performance problems. When asked about the most critical issues facing IT in 2013, Sven Hammar, CEO of Apica, responded this way: “As mobile Internet usage steadily increases to exceed desktop usage, and as web applications become more complex, application monitoring and more specifically the ability to detect and locate the origin of performance problems will be a challenge for IT organizations. 2013 will be the year when performance monitoring will shift focus from just ‘Is it up or is it down?’ to providing agile support for performance status and optimization of applications both in the cloud and on local enterprise networks.”1

Why is this challenge so great? Simply because the consequences of not addressing it are greater than ever. If an hour of application downtime results in significant loss of revenue ($500,000 to $1 million or more) or collateral damage to the brand, there can be no question that application performance places significant stress on the IT department.
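To make these stakes concrete, the arithmetic can be sketched as follows. This is a minimal illustration that converts an availability percentage into expected downtime per year and prices it at the hourly loss figures quoted above. The function names and the 8,760-hour year are assumptions made for the example, not figures from Rackspace or CITO Research.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year of 8,760 hours


def annual_downtime_hours(availability_pct: float) -> float:
    """Expected hours of downtime per year at the given availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def annual_downtime_cost(availability_pct: float, cost_per_hour: float) -> float:
    """Expected yearly loss if every hour of downtime costs cost_per_hour."""
    return annual_downtime_hours(availability_pct) * cost_per_hour


if __name__ == "__main__":
    # Compare three common availability targets at the low and high hourly loss.
    for pct in (99.9, 99.99, 99.999):
        minutes = annual_downtime_hours(pct) * 60
        low = annual_downtime_cost(pct, 500_000)
        high = annual_downtime_cost(pct, 1_000_000)
        print(f"{pct}% uptime: {minutes:.1f} min/year, ${low:,.0f} to ${high:,.0f}")
```

Under these assumptions, even 99.999% availability leaves a little over five minutes of downtime per year, while 99.9% leaves almost nine hours.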

Such high stakes keep IT tied up in the present, trying to keep things running, and in the past, ensuring that last year’s infamous outage doesn’t happen again, or doesn’t happen to our company.

The strain of trying to keep everything running in the face of dynamic demand indicates a need for end-to-end visibility across environments. Visibility will enable you to identify issues before they become problems, and will also allow you to anticipate demand spikes and add capacity where needed.

CITO Research has determined that IT departments can reap large efficiency gains from working with a qualified managed operations provider. In this paper, CITO Research examines Rackspace Critical Application Services, which offers:

  • 100% production platform uptime guarantee
  • Experienced webscale engineers
  • Application performance monitoring for online applications that meet certain criteria

This may prove to be a cost-effective solution for enterprise IT departments with one or more critical applications.

The Strain on Applications (and Enterprise IT)

Enterprise IT departments are already stressed, and this sometimes shows up in avoidable human errors. Through 2015, 80% of outages impacting mission-critical services will be caused by people and process issues—not the failure of the core technology—and more than 50% of those people/process issues will be change/configuration/release integration and hand-offs.2 The developers on critical applications too often get involved in maintaining or patching the infrastructure, resulting in less time for developing the app that is driving revenue and a greater diversion of scarce resources.

Furthermore, IT departments must stay abreast of a barrage of new developments in technology while maintaining and supporting an existing complex IT landscape. Often no single person has an end-to-end view of the application across environments. It spans vendors, it spans infrastructures (cloud and on-premise), and it spans specialized IT teams. Many IT departments are split by expertise—here a network administrator, there a DBA, there a storage expert—and each has a tool optimized for his individual area. Maintaining a critical application often requires all of these areas to coordinate with vendors and with each other.

When there are problems and outside vendors are brought into the mix, finger pointing results. The end-to-end view is so massive, interdependent, and dynamic that it is hard to maintain a single source of information about it.

To top it off, rogue managers often do an end-run around IT and use credit cards to fund new cloud initiatives, adding further complexity to the environment and reducing visibility.

Finally, ensuring application performance and uptime translates into a heavy personal load on IT staff who must be available to provide support after hours and on holidays.

Rackspace Critical Application Services: A Cost-effective Alternative

A performance monitoring solution for an in-house production environment that runs 24x7x365 can easily cost $500,000 up-front, before anyone works an hour. A consulting firm or systems integrator could create an application hosting and monitoring environment for a consulting fee of several million dollars over an engagement of six months or more, plus licensing costs of around $250,000.

To provide a cost-effective alternative, Rackspace worked with CA Technologies to offer Rackspace Critical Application Services. IT operations improve when Rackspace webscale engineers augment in-house IT staff, freeing them to perform tasks that contribute to innovation and top-line growth.

Critical Application Services provides a 100% production platform uptime guarantee on approved high-availability environments, something very few IT departments, let alone hosting providers, can offer. (The production platform uptime guarantee supplements Rackspace’s traditional 100% network uptime guarantee.) Using monitoring tools from CA Technologies, Rackspace’s experienced webscale engineers monitor and manage all layers of the stack, including applications, operating systems, servers, and networking. This experience is enveloped within Fanatical Support: employees devote themselves to ensuring an excellent customer experience, are intimately conversant with their customers’ environments, and directly answer phone calls from clients.

An Extension of the IT Team

Rackspace Critical Application Services saves time, money, and personnel effort for IT departments striving to support high-performance applications. IT departments are viewed as cost centers, yet are always being asked to do more. Saddled with maintaining, configuring, and patching existing IT assets, IT is also being asked to help the business launch tomorrow’s cloud, social, and mobile apps, often with yesterday’s budget and staff. When experienced staff leave, they aren’t always replaced, and their experience and knowledge walk out with them. Fewer resources mean heavier workloads, less time, and lower levels of staff experience. Under pressure to do more with less, IT struggles with managing multiple vendor environments, including a near-constant barrage of patches, upgrades, and requests to provision resources, while juggling business requests to launch new applications and services.

For example, the CA Technologies suite of applications helps Rackspace identify any application component that causes trouble, even one that is not owned by Rackspace or the customer. That means IT spends less time putting out fires and more time working on innovations that drive the business forward. Rather than license CA Technologies or a similar platform and attempt to deploy their applications on their own hardware, with all the maintenance and support time that implies, enterprise customers can simply deploy the support, managed operations environment, and monitoring service as a complete package, letting Rackspace become an extension of the IT team: the part devoted to “keeping the lights on” and reducing IT stress. When there’s a problem with your high-end production environment at 2AM, Rackspace can take the call instead of someone on your staff.

Ray Velez, Global CTO of Razorfish, said, “Rackspace isn’t just another one of our vendors; they are partners and an extension of our team. They treat our clients’ needs as a top priority, and move at an agency’s pace. The countless hours they spend helping us architect the right solutions, and their desire to help us work more efficiently is just a small part of their first-class Fanatical Support.”

Anatomy of a Guarantee

Rackspace Critical Application Services offers a dedicated team of webscale engineers, application and infrastructure performance monitoring, and ongoing optimization and guidance on deployments. The team’s approach to application and infrastructure monitoring is threefold:

  • Infrastructure Discovery - The team and its technology automatically discover infrastructure and map the relationships between infrastructure and the applications it serves. This step is key to providing end-to-end support. Armed with a complete and up-to-date picture of the infrastructure, Rackspace can advise developers about dependencies when new components are deployed.
  • Proactive Performance Monitoring and Root Cause Analysis - Using CA Technologies’ industry-leading root-cause analysis models, Rackspace webscale engineers can correlate thresholds and alarms and predict performance issues in advance. The team can pinpoint potential issues anywhere in the stack, from the transaction layer and end-user experience down to network switches. If they don’t meet their first goal, which is to isolate and fix problems before the customer picks up the phone, the team then helps IT prioritize, triage, respond to, and remediate the issue. Critical Application Services gives IT 360-degree visibility across the entire application and hardware stack.
  • Traffic Analysis and Predictive Capacity Planning - The team works to understand network and application behavior, including application and resource consumption rates, allowing them to predict performance bottlenecks before they happen. Rackspace also works with procurement to provide hard data about where resources will be needed given application trending, data that can also show where resources can be redeployed or decommissioned if the application is overprovisioned.
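The idea behind predictive capacity planning can be illustrated with a minimal sketch. It fits a straight line to daily utilization samples and estimates how many days remain before the trend reaches a capacity threshold. The linear model, the function names, and the sample numbers are all assumptions chosen for illustration; this is not the method used by Rackspace or the CA Technologies suite, which the paper does not describe in detail.

```python
from typing import Optional, Sequence, Tuple


def fit_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for samples taken at x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_threshold(daily_utilization: Sequence[float],
                         threshold: float) -> Optional[float]:
    """Days after the last sample until the fitted trend reaches the threshold.

    Returns 0.0 if the trend is already at or above the threshold, and None
    if the trend is flat or falling and therefore never reaches it.
    """
    slope, intercept = fit_line(daily_utilization)
    current = slope * (len(daily_utilization) - 1) + intercept
    if current >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - current) / slope


if __name__ == "__main__":
    # Hypothetical disk utilization (percent) over five consecutive days.
    samples = [50, 52, 54, 56, 58]
    print(days_until_threshold(samples, threshold=80))
```

A falling trend returns None here, which loosely corresponds to the overprovisioned case described above, where resources might be redeployed or decommissioned.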

The Rackspace Critical Application Services team has been building high-availability Managed Operations solutions with CA Technologies for several years and has developed high-performance reference architectures that can support a wide array of applications and infrastructures, both dedicated and hybrid, on-premise and in the cloud. Rackspace Critical Application Services also includes support for particularly challenging platforms that require multiple instances or extensive configuration. Figure 1 shows how Rackspace Critical Application Services provides end-to-end visibility.

CA Technologies’ Capabilities

While it’s certainly possible to assemble a toolset from scratch, doing so is not likely to be cheaper or faster than obtaining one through Critical Application Services. Very few enterprises have CA Technologies’ ability to support both network-aware application management and application-aware network management, which together offer a complete understanding of how application demands affect the performance of the network and how different network configurations affect applications.

CA Technologies is deployed in ten of the largest banks in the US, with good reason. The company is known for its patented, model-based root cause analysis library, with more than 10,000 pre-defined issue models. These models are built into each component of the CA Technologies suite, and Rackspace’s webscale engineers are trained to use them. This can substantially reduce mean time to repair—whatever the problem is, Rackspace webscale engineers have probably seen it before.

Rackspace chose the CA Technologies suite for two reasons:

  • It provides capabilities for monitoring across data centers, so applications that cross environments can be effectively managed, essentially offering a single pane view into performance.
  • It provides comprehensive capabilities to monitor everything that the Rackspace webscale engineers need to see in order to support the complete application stack.

Because the CA Technologies suite is comprehensive, it is also complex and difficult to implement on your own, requiring multiple admins in addition to software licensing costs and ongoing maintenance. Use of the CA Technologies suite is built into the Critical Application Services offer, which costs roughly the equivalent of one full-time employee per year.3

The following components of the CA Technologies suite are built into the Rackspace Critical Application Services offer:

  • Wiley is an application performance management (APM) component that measures transaction performance. Wiley understands all of the implications of a transaction in a given application from the end user to the data center. It can dissect, analyze, and determine the root cause of transaction failures as well as detect potential failures and predict performance degradation.
  • eHealth is a network performance monitoring (NPM) solution that tracks bandwidth utilization, processor utilization, and errors in hardware at the infrastructure level. eHealth detects and predicts problems with routers, switches, servers, and load balancers and communicates issues that are likely to impede application performance. Heuristic analysis of each part of the infrastructure at time intervals allows the team to observe how an infrastructure failure may affect one or more applications, enabling proactive resource allocations that can head off crises.
  • Spectrum is an event aggregation and correlation agent. Spectrum models the entire application stack and uses the model to identify events and calculate their impact. It collects native hardware and OS alerts as well as signals from Wiley and eHealth and isolates the root cause of issues that would take hours to diagnose by other means.
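The general idea of correlating alerts against a model of the stack can be sketched in a few lines. In this illustration, each component lists the components it depends on, and an alarming component is treated as a probable root cause only if nothing it depends on is also alarming. The component names, the dependency map, and the rule itself are hypothetical and assume an acyclic model; they are not the algorithm or data model used by Spectrum.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical dependency model: component -> components it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "checkout-app": ["app-server", "database"],
    "app-server": ["switch-1"],
    "database": ["switch-1"],
    "switch-1": [],
}


def probable_root_causes(alarming: Iterable[str],
                         depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return alarming components that have no alarming dependency.

    Dependencies are followed transitively, so an alert on an application is
    suppressed when any component underneath it is also alerting.
    """
    alarming_set = set(alarming)

    def has_alarming_dependency(component: str) -> bool:
        seen: Set[str] = set()
        stack = list(depends_on.get(component, []))
        while stack:
            dep = stack.pop()
            if dep in seen:
                continue
            seen.add(dep)
            if dep in alarming_set:
                return True
            stack.extend(depends_on.get(dep, []))
        return False

    return {c for c in alarming_set if not has_alarming_dependency(c)}


if __name__ == "__main__":
    alerts = ["checkout-app", "app-server", "database", "switch-1"]
    print(probable_root_causes(alerts, DEPENDS_ON))
```

In this example, four simultaneous alerts collapse to a single candidate, the shared network switch, which is the kind of reduction that shortens diagnosis time.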

Been There, Done That

The Rackspace Critical Application Services team devotes itself to fully understanding customers’ performance requirements, application environment needs, and dependencies, and builds out an appropriate plan to monitor and optimize performance efficiently. As a result, IT professionals can focus on their customers, not their infrastructure.

The Rackspace team’s experience, not only with the CA Technologies platform but with all platforms used in the stack, means enterprise IT can be relieved of staying on top of the latest solutions, patches, and upgrades. Rackspace’s experienced team and proven technologies have been validated by the industry. With more than 190,000 customers and $1 billion in annual revenue, Rackspace ranks as the leader in the Gartner Magic Quadrant for managed hosting providers for the sixth year in a row. Serving that many customers, the Rackspace webscale engineers have seen almost any kind of problem you can imagine and know how to resolve it.

As a proactive complement to IT departments, Rackspace Critical Application Services can offer the agility required of “startup” projects, with the visibility and control offered by CA Technologies, predicting peak loads as much as three weeks in advance.

With enterprise-class CA Technologies application and network performance monitoring, customers can take advantage of this combination of support, expertise, and tools, maintaining critical applications for less than the cost of acquiring the application and network performance monitoring tools on their own. Instead of having a shared IT service environment, the business gets a dedicated team focused on key production environments. Instead of finger pointing and delays in resolving a complex, multi-platform, multi-vendor issue, there is one number to call, even at 2AM, and there is always someone at Rackspace who knows the details of the specific deployment on the other end of the line.

In addition to saving money, IT saves time. Critical Application Services has the visibility to precisely pinpoint problems, so that the remediation, whether a different SQL statement or replacement of a lagging server, can be handled as quickly as possible. Instead of trying to take on everything with limited resources, IT can refocus high-value personnel on key projects that drive the business forward and avoid having developers troubleshoot infrastructure.

The fact is that for the most part, IT is constantly forced to stay in the present, keeping everything running. With expertise from Rackspace Critical Application Services, IT can turn to the future and work on strategic efforts and innovative new projects. (Several Critical Application Services clients have found that they now have time to build the mobile applications everyone has been asking for.) IT can focus on analysis and providing recommendations to the business, rather than simply monitoring and collecting data. They get access to experienced webscale engineers who speak their language and have expertise in the tools they use. In essence, outsourcing performance monitoring of critical applications allows IT professionals to sleep at night—for many, it could be the first good night’s sleep in a long while.


Business users, partners, and consumers expect instant gratification from their online applications, and the pressure is on IT to provide it. Charged with cutting costs, adding new skills, testing and configuring new products, and monitoring and maintaining the technologies they already have, IT departments are stressed.

CITO Research has determined that Rackspace Critical Application Services offers a credible and compelling solution by transferring the most laborious and complex tasks of IT to a monthly service. Rackspace Critical Application Services combines a high level of support and expertise, including a 100% production platform uptime guarantee, with experienced webscale engineers and best-in-class monitoring technology that is normally out of reach for all but the largest enterprises. With solutions such as this in place, IT departments can return to designing and delivering innovative projects that create top-line value for the business, and can look to the future by planning strategic initiatives. As Vashdev Vangani, Department Manager of Information Technology at Mazda North American Operations, said, “The Critical Application Service team is a huge value-add to my business. Before, we had to react; now, we can proactively scale for future demand.”

Aided by the expertise of two Gartner Magic Quadrant leaders, IT organizations can get affordable help in keeping critical revenue-producing applications online and running at peak performance.


1 http://www.business2community.com/tech-gadgets/the-future-of-enterprise-it-30-executives-share-their-2013-predictions-0365559#ogMQ2uUKTzscK0j4.99
2 Ronni J. Colville and George Spafford, “Configuration Management for Virtual and Cloud Infrastructures,” Gartner, http://www.rbiassets.com/getfile.ashx/42112626510
3 Ted Chamberlin and Lydia Leong, “Magic Quadrant for Managed Hosting,” Gartner, March 5, 2012, http://www.gartner.com/technology/reprints.do?id=1-19K9FPH&ct=120305&st=sb
