My previous post, NFV Business Case Digest, shares a variety of sources commenting on what it takes to deliver the business case for NFV (Network Functions Virtualization). I am now focusing on findings recently presented by Alcatel-Lucent’s CloudBand Consulting Practice and Ecosystem Program.
This 30-minute video features Joaquin De La Vega Gonzalez-Sicilia and Valery Noto’s introduction and recap, which can be viewed on the corporate YouTube channel. Here is the link to the project’s whitepaper: “Business Case for Moving DNS to the Cloud.”
I checked out the “transcript” feature on YouTube, but quite a bit of it appears to be garbled. So I’m volunteering a transcript (below), though it is not 100% verbatim, and I’m adding a bit of color commentary here and there. The charts shown here also come from a more recent version of the presentation. I will be referring to this post in my upcoming presentation at Software Telco Congress on Tuesday.
Minute 01:53 – This study was performed in cooperation with a tier 1 service provider planning to replace its existing DNS (Domain Name System) service infrastructure as it reaches end of life. The case study set out to address whether migrating to an NFV model would improve TCO (Total Cost of Ownership) compared to simply replacing that infrastructure, which the report’s key findings confirm. The existing DNS deployment comprises a total of 104 servers. The team defined two options:
- [A] Replacing the existing infrastructure with new COTS (Commercial off the Shelf) x86 servers and keeping the current PMO (Present Mode of Operations).
- [B] Implementing the NFV model with 11 small CloudBand Nodes instead (132 servers), and CBMS (CloudBand Management System) with a new FMO (Future Mode of Operations).
This analysis looks at OPEX (Operating Expenditures) processes such as capacity growth, software upgrading and healing. The initial focus was to understand how these processes were carried out under the current PMO. New efficiencies under the FMO leveraging CloudBand impact items such as ‘lead time’ and ‘manpower’, given the introduction of task automation under option [B]. OPEX’s infrastructure-related costs include floor space, power, cooling and maintenance. CAPEX (Capital Expenditures) focuses on procuring hardware infrastructure.
The main finding is that even a simple application such as DNS would benefit enormously from running on an NFV platform: processes are greatly simplified with leaner and automated operations, OPEX and CAPEX are reduced significantly, all contributing to dramatic TCO (Total Cost of Ownership) savings coupled with unparalleled operational and business agility.
I would like to add that ‘agility’ mitigates risks and opportunity costs. Conventional architectures force network operators to make decisions among apparently mutually exclusive opportunities. This is due to systems that are complex and costly to operate and scale. Basically, this means forgoing market opportunities due to structural shortcomings rather than by making strategic decisions, which raises opportunity costs and undermines competitiveness.
Additionally, the fact that a network operator can get the NFV journey started with just one application, DNS in this case, helps with delivering an early success story. Network operators of a variety of sizes can reap many of the benefits enabled by working with software-defined environments, as already proven by enterprise cloud computing.
Minute 05:20 – Let’s start by highlighting $6.8 to $9.8 million in savings when migrating to option [B] vs. option [A].
Under option [A] a network operator’s investment in infrastructure is typically spent on dedicated systems only running DNS. However, with option [B]’s NFV model different network functions, e.g. DNS, AAA (Authentication, Authorization and Accounting), PCRF (Policy and Charging Rules Function) just to name a few, can share infrastructure and tap into a pool of resources and spare capacity.
A multi-tenant application environment makes the most of available resources and minimizes costly duplication. Therefore, option [B]’s best scenario is one where the underlying infrastructure costs are allocated across a given set of applications based on what each one consumes. This is a utility business model. The cost related to the capacity (compute, storage, networking, etc.) used by DNS delivers the $9.8 million in savings shown above. From a DNS standpoint, everything else is idle capacity cost that should be considered separately under the NFV model. That is capacity that the service provider would manage in order to achieve high server utilization levels and maximize ROA (Return on Assets).
Nonetheless, saving $6.8 million is impressive enough. This figure assumes virtualization and automation, though the servers would only run DNS. That is the ‘no-multi-tenancy’ sub-option.
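To make the difference between the two sub-options more tangible, here is a minimal sketch of the utility-style allocation described above. The function name `allocate_costs`, the capacity units and all figures are hypothetical and chosen only for illustration; they are not taken from the whitepaper.

```python
def allocate_costs(total_cost, total_capacity, usage):
    """Split a shared infrastructure cost in proportion to consumed capacity.

    usage maps an application name to the capacity units it consumes.
    Capacity that no application consumes is reported under "idle".
    """
    used = sum(usage.values())
    if used > total_capacity:
        raise ValueError("usage exceeds total capacity")
    shares = {app: units * total_cost / total_capacity
              for app, units in usage.items()}
    shares["idle"] = (total_capacity - used) * total_cost / total_capacity
    return shares


# Hypothetical figures: a platform costing 10.0 (million) with 100 capacity
# units, of which DNS consumes 60.
shares = allocate_costs(total_cost=10.0, total_capacity=100, usage={"DNS": 60})

# No multi-tenancy: DNS carries its own share plus all idle capacity.
dedicated_dns_cost = shares["DNS"] + shares["idle"]
# Multi-tenancy: DNS is only charged for what it consumes.
shared_dns_cost = shares["DNS"]

print(dedicated_dns_cost, shared_dns_cost, shares["idle"])
```

In this toy case the gap between the dedicated and the shared figure is exactly the idle capacity cost, which is the same relationship the study draws between its two sub-options.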
Minute 08:07 – The team analyzed 6 different categories: 5 are related to OPEX, as shown above, and 1 is CAPEX, focusing on hardware infrastructure.
This chart’s rows outline the FMO’s two sub-options:
- option [B1] – upper row – which involves no-multi-tenancy as the system is dedicated to running DNS only
- option [B2] – bottom row – factors in multi-tenancy, where the infrastructure and platform are shared by several applications in addition to DNS.
Reading the figures:
- option [A] – PMO’s cost exceeds $14 million and is incurred with neither virtualization nor cloud technologies; the same number is shown in both rows.
- option [B1] – in the no-multi-tenancy-scenario all of the costs are allocated to the DNS deployment: this amounts to $7.7 million, a 49% reduction.
- option [B2] – when the business case accounts for what DNS is planned to consume, the cost comes down to $4.5 million, an even more dramatic 70% reduction.
- The difference between [B1] and [B2] is the cost of idle capacity from the DNS business case’s viewpoint.
- Capacity growth process: a 48% reduction, with no difference between [B1] and [B2]. The phases currently followed (PMO) by the service provider when scaling are: planning, ordering, and all of the cabling, installation and configuration tasks performed on site. We need to single out year 1 in this 5-year planning period because new infrastructure is being deployed. Options [A] and [B] show similar lead times and costs in that first year. However, under option [B] we leverage year one’s infrastructure to scale in subsequent years with VMs (Virtual Machines). Option [B]’s NFV scaling makes a big difference in years 2+ when compared to option [A]’s deployments, which are based on just adding more hardware. FMO delivers up to a 98% reduction in lead time by downsizing tasks such as site surveys, which are reduced to verifying capacity using management tools already in place. There is no longer a need for installing new physical infrastructure. In contrast, adding more hardware typically involves a lengthy supply chain process. Networking-wise, there is a need for checking the availability of IP addresses, but deploying applications is done automatically by means of recipes. That only takes minutes, as opposed to weeks and even months for conventional deployments.
- Software upgrading process: (see next chart).
- Healing process: an 86% reduction, with the stages being: identifying the failure, triggering and executing the solution, and conducting root cause analysis to prevent future issues. CloudBand automatically identifies and troubleshoots by spinning up a VM that takes over from the one that’s either down or malfunctioning. Automated healing involves no manpower, human error or process latency. The operations team gets involved in post-mortem root cause analysis, and introducing FMO implies a learning curve.
- Floor space, power and cooling: a 58-89% reduction; load balancer virtualization eliminates half of the space required by DNS. Power consumption is also positively impacted by addressing growth with no additional servers. Calculations behind cooling factor in a 1:1 correlation to power consumption: the same savings assumed for power were applied to cooling.
- Software licenses and maintenance: a 23-47% reduction, mostly thanks to lower hardware maintenance; the best results are achieved with multi-tenancy.
- Infrastructure: a 56-88% reduction, where the most significant advantages are a lower footprint, higher utilization levels and ROA over the TCO’s five-year period.
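For readers who want to reproduce this kind of arithmetic with their own data, the percentage reductions and the 1:1 power-to-cooling assumption listed above can be sketched as follows. The helper names and every input value are hypothetical placeholders; the study’s underlying figures are not reproduced here.

```python
def reduction_pct(before, after):
    """Percentage reduction when a cost or lead time drops from before to after."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (before - after) / before


def cooling_after(power_before, power_after, cooling_before):
    """Scale cooling cost by the same ratio as power (the 1:1 assumption)."""
    if power_before <= 0:
        raise ValueError("power_before must be positive")
    return cooling_before * power_after / power_before


# Hypothetical lead times in days for a capacity expansion.
lead_time_cut = reduction_pct(before=100, after=2)

# Hypothetical yearly costs: power drops from 200 to 80, cooling starts at 150.
new_cooling = cooling_after(power_before=200, power_after=80, cooling_before=150)
power_cut = reduction_pct(200, 80)
cooling_cut = reduction_pct(150, new_cooling)

print(lead_time_cut, power_cut, cooling_cut)
```

Under the 1:1 assumption the cooling reduction always equals the power reduction, whatever the starting cooling cost is.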
Minute 20:11 – As an example, the software upgrading process entails: planning, obtaining the software, testing, installation and configuration with a 77% reduction in lead time.
Under PMO, testing, installation and configuration can be complex and tedious. Today, this work is usually done during maintenance windows opened at night. Lead time grows with expanding footprints, as more servers have to be installed to meet growing traffic needs.
Under FMO there are no major changes with regard to planning and procurement, but testing no longer requires setting up new labs. This can be done with sandboxes instead. Installation and configuration deliver the bigger changes: generic recipes that automate the deployment are provided by the application (DNS) vendor, which might require some customization work (a one-time effort); then all servers can be automatically upgraded at once by pushing that recipe.
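As a rough illustration of why the installation step scales so differently, here is a toy model of the two approaches. It only covers installation, not the whole upgrade lead time, and the batch size of 8 servers per nightly window is a made-up assumption; only the 104-server count comes from the study.

```python
import math


def pmo_upgrade_windows(servers, servers_per_window):
    """Nightly maintenance windows needed when servers are upgraded in batches."""
    if servers_per_window <= 0:
        raise ValueError("servers_per_window must be positive")
    return math.ceil(servers / servers_per_window)


def fmo_upgrade_pushes(servers):
    """Recipe pushes needed when every server is upgraded at once."""
    return 1 if servers > 0 else 0


# 104 servers as in the existing deployment; 8 per window is hypothetical.
windows_today = pmo_upgrade_windows(servers=104, servers_per_window=8)
# The same process after the footprint doubles.
windows_after_growth = pmo_upgrade_windows(servers=208, servers_per_window=8)
pushes = fmo_upgrade_pushes(servers=208)

print(windows_today, windows_after_growth, pushes)
```

The batch-based count grows with the footprint, while the recipe push stays constant, which is the effect the presenters attribute to automation.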
Last but not least, the whitepaper also includes a sensitivity analysis on page 16 which points to 35% in total savings when looking at a conservative scenario agreed upon with the service provider collaborating in this analysis.
Picture credits: charts courtesy of Alcatel-Lucent.