Tagged: Management and Orchestration

Lean NFV Ops: scaling basics.


“Just-in-Time means making only what is needed, when it is needed, and in the amount needed […] it is necessary to create a detailed production plan […] to eliminate waste, inconsistencies, and unreasonable requirements, resulting in improved productivity.” Just in Time, Philosophy of Complete Elimination of Waste, by Toyota.

“Each unpredictable feature demanded by customers is considered an opportunity […] this requires rapid adjustment of production capability. Dynamic and flexible network utilizations in functional modules can maximize the strength of each resource and the overall risk and costs are reduced.” Flexible Manufacturing System for Mass Customization Manufacturing, by Guixiu Qiao, Roberto Lu and Charles McLean.

“Providing capacity in a more expedient fashion allows us to deploy a functioning and consumable business service more quickly […] at the core of our self-service functionality is a hosting automation […] On-demand self-service is a critical aspect of our cloud environment; however, without underlying business logic, controls, and transparency, an unconstrained on-demand enterprise private cloud will quickly exceed its capacity by doling out allocations beyond its supply.” Implementing On-Demand Services, by Intel.

“Elasticity is commonly understood as the ability of a system to automatically provision and deprovision computing resources on demand as workloads change […] in a way that the end-user does not experience any performance variability.” Elasticity in Cloud Computing: What It Is, and What It Is Not, by Nikolas Roman Herbst, Samuel Kounev and Ralf Reussner.


image

These past few months I’ve followed a few discussions on virtualization and scalability.

There is such a thing as becoming a victim of success when pent-up demand strikes and a business fails to scale accordingly.

Capacity management has typically prompted over-engineering decisions and long lead times, taking a year or more in the telecoms industry. This can result in concerns about delayed breakeven points, underutilized resources and limited offerings due to the higher cost of oversubscribing.

Lean means staying nimble at any size, streamlining and keeping lead times as short as possible by design. Effective and efficient capacity management relies on understanding economies of scale and scope. Scale relates to reaching larger volumes that trigger more efficient utilization levels and, therefore, lower and more competitive average costs.
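
As a back-of-the-envelope illustration of economies of scale, here is a minimal Python sketch. The function name and all figures are assumptions picked only to show the shape of the curve: a fixed cost gets spread over more units as volume grows, so the average cost per unit drops.

```python
def average_cost(fixed_cost, variable_cost_per_unit, units):
    """Average cost per unit: the fixed cost is spread over more units as scale grows."""
    return fixed_cost / units + variable_cost_per_unit

# Hypothetical figures, not taken from any real business case.
for units in (1_000, 10_000, 100_000):
    print(units, average_cost(100_000, 5, units))
```

The curve flattens quickly: each additional step in volume buys a smaller reduction in average cost.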

Scope means taking advantage of synergies and common infrastructure and platforms to deliver a variety of services, application multi-tenancy being an example in NFV’s (Network Functions Virtualization’s) context.

Active portfolio management follows: complementary application lifecycles can share resources and raise overall utilization levels in the process. Moreover, some applications can be deconstructed and modularized so that specific subsets become standalone services available to (or reused by) other applications. These can be decoupled to join a common pool and scale independently.

In some discussions we refer to growth models where “scale” follows a “vertical” approach while “scope” adds breadth with new functions and is, therefore, a horizontal expansion model. This breakdown allows for plotting and segmenting growth/de-growth scenarios in a simple matrix. I am experimenting with new ways of helping visualize these concepts. This is work in progress and the final result will look different from the early drafts posted here, though I think they can be used for the time being.


One other thought… elasticity relates to following demand curves: supply meets demand by dynamically adapting capacity. This entails provisioning, deprovisioning and a virtuous circle by means of gracefully tearing down resources, which are freed up and exposed for other applications to leverage. Elastic computing seems to make us think of unlimited just-in-time capacity, but there are upper and lower boundaries involving diminishing returns. It just so happens that virtualization has pushed the envelope by considerably widening and shifting these constraints.
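
To make those boundaries concrete, here is a minimal Python sketch of capacity following a demand curve while staying clamped between a floor and a ceiling. The function name, the per-instance capacity and the limits are illustrative assumptions, not figures from a real deployment.

```python
import math

def instances_for_demand(demand, per_instance_capacity, min_instances, max_instances):
    """Return how many instances to run for a given demand level.

    All parameters are hypothetical: demand and per_instance_capacity share
    the same unit (for example, sessions), and the min/max limits stand in
    for the lower and upper boundaries of an elastic pool.
    """
    needed = math.ceil(demand / per_instance_capacity)
    # Clamp: never drop below the floor, never exceed the available supply.
    return max(min_instances, min(needed, max_instances))

# A made-up demand curve over six intervals.
demand_curve = [50, 400, 1200, 2600, 900, 0]
plan = [instances_for_demand(d, 250, 1, 8) for d in demand_curve]
print(plan)  # [1, 2, 5, 8, 4, 1]
```

Note how the fourth interval asks for more than the pool can supply and is capped at the ceiling, while the last one falls back to the floor rather than to zero.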

It is worth reflecting on Gordon Moore’s law in this context: many incremental and disruptive innovations yield exponential performance improvements in today’s cloud age. That can be coupled with NFV’s (Network Functions Virtualization’s) shift from lengthy lead times, cumbersome operations and costly dedicated hardware to automated systems working with a wide supply of more affordable COTS (Commercial Off-The-Shelf) hardware and open source solutions.


image

Let’s now focus on the notion of service decomposition and how that impacts scaling.

This exercise often starts with deconstructing monolithic systems typically relying on vertically integrated architectures, then looking at the actual services involved, dependencies, flows… and figuring out what is best to keep integrated vs. modularized, centralized vs. distributed.

This also entails looking for opportunities to streamline development time, such as code reuse and processes worth exposing by means of APIs (Application Programming Interfaces). Note that many applications do not need to duplicate assets and can become distributed systems consuming resources and processes running elsewhere.

In this section’s graphic, the application is a VNF (Virtual Network Function) which has been decomposed and right-sized to run in three VMs (Virtual Machines) of different sizes instead of procuring a single physical server for just this application.

Lighter gray blocks at the back end present a pool of services available to that and other applications. As an example, when decoupling an application’s logic from the app’s data we get to leverage DaaS (Database as a Service) as one of the shared services.

These are the “scaling” terms provided by ETSI (European Telecommunications Standards Institute) NFV reference documents:

  • Scaling up: extending a resource (compute, memory, storage) to a given VM.
  • Scaling down: decreasing resource allocation.
  • Scaling out: creating a new instance, adding VMs.
  • Scaling in: removing VMs.

Circling back with service decomposition: there are scaling scenarios where there is no need to go through the trouble of scaling out an entire application, but just a specific service at stake, such as one of the VMs or the database in the previous example.
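
The four terms, and the idea of scaling just one decomposed service, can be sketched in a few lines of Python. This is a toy model under my own assumptions: the `Component` class, the component names and the vCPU figures are hypothetical and are not ETSI data types.

```python
from dataclasses import dataclass

@dataclass
class Component:
    """One decomposed service of a VNF (hypothetical model)."""
    name: str
    vms: int           # number of VM instances
    vcpus_per_vm: int  # resources allocated to each VM

    def scale_up(self, vcpus=1):
        # Scaling up: extend the resources given to each VM.
        self.vcpus_per_vm += vcpus

    def scale_down(self, vcpus=1):
        # Scaling down: decrease the allocation, keeping at least one vCPU.
        self.vcpus_per_vm = max(1, self.vcpus_per_vm - vcpus)

    def scale_out(self, count=1):
        # Scaling out: add VM instances.
        self.vms += count

    def scale_in(self, count=1):
        # Scaling in: remove VM instances, keeping at least one.
        self.vms = max(1, self.vms - count)

# A VNF decomposed into three right-sized components.
vnf = {
    "frontend": Component("frontend", vms=1, vcpus_per_vm=2),
    "logic": Component("logic", vms=1, vcpus_per_vm=4),
    "database": Component("database", vms=1, vcpus_per_vm=8),
}

# Only the components under pressure are scaled; the rest stay untouched.
vnf["frontend"].scale_out(2)
vnf["database"].scale_up(4)
print({name: (c.vms, c.vcpus_per_vm) for name, c in vnf.items()})
```

After those two calls the front end runs on three VMs and the database VM has twelve vCPUs, while the logic component is exactly as it was.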

In some other scenarios scaling can prompt application updates and/or upgrades to enable new functionality. Suitable “upgrade windows” can be hard to find when multiple services are in demand and expected to remain always on. A stateless architecture means that the session’s state is kept outside of the application, with the shared database in this example. Traffic can then be redirected to an application’s mated pair: a second instance kept on active standby until the maintenance event.

This also means going beyond 1+1 models where everything is duplicated (mated pair concept) for failover’s sake. There often are more efficient n+k systems in HA (High Availability) environments. Note that, paradoxically enough, rolling out upgrades happens to be a primary source of maintenance issues thereafter, adding to the need for sustaining service continuity at all times coupled with zero touch and zero downtime.
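
A quick Python sketch shows why n+k can be more efficient than 1+1. The peak load, per-instance capacity and number of spares below are made-up values for illustration only.

```python
import math

def instances_one_plus_one(active_needed):
    # 1+1: every active instance has its own dedicated standby.
    return 2 * active_needed

def instances_n_plus_k(active_needed, spares):
    # n+k: the active instances share a common pool of k spares.
    return active_needed + spares

peak_load = 9000     # hypothetical requests per second at peak
per_instance = 1000  # hypothetical capacity of one instance
n = math.ceil(peak_load / per_instance)

print(instances_one_plus_one(n))  # 18
print(instances_n_plus_k(n, 2))   # 11
```

The trade-off is that n+k with two spares only covers two simultaneous failures, so the value of k has to follow from the availability target rather than from cost alone.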


Zero touch is delivered by automation, which relies on continuous system monitoring, engineering triggers and preceding work with recipes, templates and/or playbooks (these are alternative terms based on different technologies) detailing what needs to happen to execute a lifecycle event. Scaling is the subject of this post; onboarding, backup, healing and termination are other lifecycle events, to name a few more.

Programmability drives flexible automation, which is data driven and based on analytics. Predictive analytics goes a step further to project and address trends so that actions can be taken in advance. In our Lean NFV Ops demonstration we purposely stimulate network traffic with a load generator to exemplify this. We run scenarios illustrating both (a) fully automated scaling and (b) autonomation by switching to manual controls that put the operations team in charge at every step.
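
The difference between the two modes can be sketched as follows. This is not the code behind our demonstration: the thresholds, function names and the approval callback are assumptions made up for this example.

```python
def decide(cpu_utilization, high=0.80, low=0.30):
    """Map a monitored metric to a lifecycle action (thresholds are hypothetical)."""
    if cpu_utilization > high:
        return "scale_out"
    if cpu_utilization < low:
        return "scale_in"
    return "hold"

def run_step(cpu_utilization, automated, approve=None):
    """Return the action to execute, or "hold" if an operator declines it.

    automated=True executes the decision directly (fully automated mode).
    automated=False asks the approve callback first (manual controls).
    """
    action = decide(cpu_utilization)
    if action == "hold" or automated:
        return action
    return action if approve(action) else "hold"

samples = [0.45, 0.86, 0.91, 0.20]
auto = [run_step(s, automated=True) for s in samples]
# A hypothetical operator who only approves scale-out requests.
manual = [run_step(s, automated=False, approve=lambda a: a == "scale_out")
          for s in samples]
print(auto)    # ['hold', 'scale_out', 'scale_out', 'scale_in']
print(manual)  # ['hold', 'scale_out', 'scale_out', 'hold']
```

The trigger logic is identical in both modes; what changes is who has the final say before the action runs.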

Autonomic computing is powered by machine learning. Research on NFV autonomics points to the ability to self-configure, especially under unplanned conditions. Looking into automation and distribution modes helps define maturity levels for NFV, that being a topic for another article.


image

Let’s zoom out to discuss scaling in the context of the platform.

ETSI NFV defines MANO as the Management and Orchestration system. “Managing” refers to addressing the application’s lifecycle needs, scaling being one of them. The notion of “orchestrating” focuses on the underlying resources to be consumed.

The MANO layer is thought of as NFV’s Innovation Platform, which I show in purple: the thickness of that layer conveys the degree to which an application uses more (right) or less (left) of MANO’s capabilities. This is an application multi-tenant environment where VNF1 shows a monolithic app example in contrast to VNFn, which is meant to take full advantage of MANO’s automation.

This cross-section shows a horizontal architecture, as the platform supports multiple applications as well as back end systems. Horizontal and vertical solutions scale differently. A common platform presents à la carte features and starts small, growing and scaling to enable homogeneous end to end management across the applications, while the monolithic approach moves forward with siloed operations on an application by application basis.

One more example: growing by adding interdependent services is a discouraging endeavor when reconfiguring multiple functions becomes overwhelming. SFC (Service Function Chaining) comes to the rescue in a virtual environment by providing network programmability and dynamic automation to create networks connecting new services. NFV’s scaling needs make a good case for SDN (Software Defined Networking), the technology behind SFC.


image

Now moving to what’s under the hood.

NFVI stands for Network Functions Virtualization Infrastructure. Most typically, what we can see and touch is a data center environment providing resources consumed by the applications such as compute, memory, storage and networking to begin with.

The visual in this section shows a conceptual server farm right under the platform. Blue nodes on the left and brown ones on the right are physically placed at different geographic locations, yet form part of the same NFVI orchestrated by MANO. The gray one is being added: scaling out of the existing infrastructure. The green node lies outside and can be leveraged when bursting:

  • Scaling out: adding more servers (gray cube).
  • Scaling up: leveraging clusters and/or distributed computing to share the load (blue and brown cubes).
  • Bursting: tapping into third party infrastructure to address capacity spikes (green cube).

Note that, in this context, scaling up can also mean upgrading servers to handle larger workloads. This can also be about using an existing chassis while replacing a server with a new node featuring more processing, data acceleration, lower energy needs, etc.
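
As a rough sketch of how these options might be weighed against each other, consider the following Python function. The decision order, the function name and the capacity units are my own assumptions for illustration; a real orchestrator would apply far richer placement rules.

```python
def capacity_source(workload, local_free, can_add_server, server_capacity):
    """Pick where extra capacity comes from (all inputs in the same hypothetical unit, e.g. vCPUs)."""
    if workload <= local_free:
        # The existing, geographically distributed nodes can share the load.
        return "existing pool"
    if can_add_server and workload <= local_free + server_capacity:
        # Adding one more server (the gray cube) is enough.
        return "scale out"
    # Otherwise tap into third party infrastructure (the green cube).
    return "burst"

print(capacity_source(10, 16, True, 32))   # existing pool
print(capacity_source(40, 16, True, 32))   # scale out
print(capacity_source(40, 16, False, 32))  # burst
```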


Early on we talked about COTS being easier to scale out when compared to proprietary dedicated hardware. This has partly to do with standardization, centralized management and consolidation, the existing supply chain for x86 systems and node automation.

We can also factor in consumption-based models where a given application’s business case is not impacted by up-front CAPEX (Capital Expenditures). Instead, the application’s business case accounts for resource usage levels which, once again, benefit from economies of scale and scope. The notion of elasticity makes infrastructure planning transparent to the application.

Capacity and performance management skills remain of the essence: the move to applications based on stateless architectures means that scaling distributed applications places a greater emphasis on API behavior by addressing capacity and speed in terms of RPS (Requests Per Second). And, nonetheless, the telecommunications industry is known to require high capacity, low latency SFC, which is driving data plane acceleration solutions.
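
Sizing in terms of RPS can be as simple as the Python sketch below. The rated throughput per instance and the headroom figure are hypothetical values, not benchmarks.

```python
import math

def instances_for_rps(target_rps, rps_per_instance, headroom=0.25):
    """Instances needed to serve target_rps while keeping spare headroom.

    With headroom=0.25 each instance is planned at no more than 75% of its
    rated throughput. All figures are assumptions for illustration.
    """
    usable = rps_per_instance * (1 - headroom)
    return math.ceil(target_rps / usable)

print(instances_for_rps(12000, 800))  # 20
```

Planning with headroom costs a few extra instances, but it leaves room for traffic spikes while new capacity is still being provisioned.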


image

We can now zoom out.

Scaling is not a new thing or need. Conventional architectures can scale; they just don’t do it fast or effectively enough in a cost-effective fashion. Taking months and years to get the job done risks missing markets and taxing resources which would have been needed to create innovative services.

Admittedly, one of the objectives behind writing this was wrestling with jargon by outlining “scaling” terms in context, whether related to application, platform or infrastructure. Hopefully, that goal was accomplished. Otherwise, please let me know.

One other thought… NFV is a change agent. Hence, cool technical wizardry alone does not suffice. We are discussing emerging technologies causing interest in connecting dots across behavioral economics (and not just the business case) and organizational cultures and decision making in the telecoms sector. Understanding the human factor matters.

As usual, I will be glad to continue the conversation by exchanging emails, over LinkedIn or in person if you happen to be around at IDF15, the Intel Developer Forum, in San Francisco’s Moscone Center on August 18-20.

See the Cloud at BBWF 2014.


image

Alcatel-Lucent Stand B10


“The real benefit to virtualization is that we no longer have to think about networks in the “fixed” sense. Virtualization is more about being able to dynamically change and scale the network based on the ebbs and flows of traffic on the network. So instead of experiencing a lack of capacity during specific times of the day in certain locations, you can ramp up the capacity when and where you need it to meet demand and quickly scale it back when it’s not needed. The building blocks for this architecture are IP and Cloud technology — both of which can be delivered “virtually” in a datacenter.”

“If there is anything that everyone can agree about, it is that Cloud is here to stay. [Network operators] want to accelerate investment at both the heart of the network IP to Ultra-Broadband Access, either fixed or mobile […] we just announced at this show a new wireline offering called micro-nodes. These are similar to mobile ‘small cells’ in that they bring fixed ultra-broadband closer to users to massively add capacity where it is needed. That is the truly surprising aspect of this industry. Innovation is happening everywhere: in the applications, in the cloud and certainly still in the network whether it is virtualized or not.”

Interview with Michel Combes, CEO, Alcatel-Lucent.


image

http://alcatel-lucent.cvent.com/BBWF2014


We are gearing up for BBWF. Alcatel-Lucent’s booth will feature seven simultaneous demonstrations. I will be there at Area 3 Demo 1 to discuss VoLTE (Voice over Long Term Evolution) in the context of NFV (Network Functions Virtualization) and the Carrier Cloud.

Long story short, our industry is fulfilling expectations on the so-called “the network is the platform” paradigm. This is now possible by adopting cloud computing technologies and opening what effectively become shared resource pools consumed on demand via APIs (Application Programming Interfaces).

This model advances by taking down silos as vendors such as us (Alcatel-Lucent) and our partners can now rapidly deploy applications leveraging the same infrastructure and platforms.

Not amused yet? Compare that to lengthier and far more complex conventional deployments where more expensive, tightly integrated software stacks and black boxes have come to depend on dedicated hardware and fragmented management, not to mention issues arising from redundant functions, overhead, mind-boggling troubleshooting and poorly utilized systems negating ROA (Return on Assets).


Here is a quick teaser: these are some of the concepts I will address at BBWF.

  • SOFTWARE CENTRIC – having decoupled control and data planes, and application logic from service data, we are now immersed in a more dynamic and agile software defined service environment.
  • OPEN & EXTENSIBLE – the network becomes an abstraction that is transparent to the developer, enabling the application to consume resources and leverage processes via API.
  • MULTI-TENANT – the underlying fabric and resulting cloud communications platform allow for application multi-tenancy and do so under a multi-vendor, best-of-breed model.
  • AUTOMATED – a catalog presents applications which can be automatically onboarded via templates / recipes, then provisioned and deployed in no time.
  • SCALABLE – applications, whether VNFs (Virtual Network Functions) or end user facing online services, benefit from dynamic service chaining, meeting demand curves by growing and degrowing as needed.
  • PROGRAMMABLE – programmability and context awareness drive the orchestration of events and resources involved in end to end lifecycle management, which then become subject to automation and iterative improvements.
  • PLACEMENT – the efficient placement of loads entails analytics and algorithms understanding when it is best to centralize and consolidate or to leverage distributed systems, all based on capacity management rules and SLAs (Service Level Agreements).
  • ECONOMICS – feature and performance parity are achieved under leaner and cost efficient operations, a smaller physical footprint, and significantly lower opportunity costs and, therefore, risks.

As Michel pointed out in the above interview, the telecommunications industry is no longer at a crossroads pondering what the future might entail. After having been involved in cloud projects for the past five years, it is clear to me that we are facing a point of no return, which is forcing a path forward. The industry’s cloud journey has already begun.

This is not about “drinking the Kool-Aid” and marketing spin. Most would agree that legacy systems that served the industry well can now lead to network bloatware and a patchwork, unable to gracefully scale as demand shifts. Many would also nod their heads when acknowledging that the above concepts are not really new to the telecommunications industry: they have all been broadly discussed in the past twenty years or so. But, this time around, current attempts benefit from dramatically different economics, a new ecosystem, lessons learned and proven IT models that are leading the way.

Given the fact that we are discussing emerging technologies and disruptive rather than incremental innovation, this is not without the usual challenges involving next generation systems. So, in our discussion we will also look at maturity levels, technology readiness, pivoting in the midst of market changes, talent, organizational behaviors and lessons learned, and I will be happy to discuss NFV economics.


image


AREA 3 DEMO 1 – VoLTE Innovation and Growth – What Services to Launch.

“Alcatel-Lucent’s market-leading Cloud Communications Platform combines proven, robust VoLTE performance together with open innovation based on the emergence of new technologies such as WebRTC and IMS APIs. Complementing the solution, Alcatel-Lucent also brings a dynamic ecosystem of developers who use our Web Developer Portal and Sandbox to develop and test new compelling applications and services.”

“For this Broadband World Forum edition we have especially selected some of those apps for you. Come see Alcatel-Lucent’s “VoLTE Innovation and Growth” demonstration and embark on a new communications journey of competitive, innovative apps that enrich the subscribers’ broadband experience (fixed & mobile), thereby helping you to deliver new services to the market faster and make the shift to today’s data-dominant revenues. Through this demonstration you will see with our Cloud Communications Platform how you can:

  • Use Network APIs to enrich VoLTE service with added features
  • Use WebRTC client APIs to extend VoLTE service to the web (second screen such as a tablet, PC, TV, etc.)
  • Mix Client and Network-side APIs to integrate Communications into Business Process (Communications as a Feature) and deliver an integrated CRM experience across Fixed and Mobile (Salesforce.com integration).”

image


AREA 3 DEMO 2 – Dynamic Services with NFV and SDN.

“To gain competitive advantage, service providers are continually looking for creative services and enhanced service options. NFV provides the opportunity to dynamically deploy new functionality without causing service disruption or CPE change.  The demo will show how services can be rapidly composed, deployed and modified (service chaining) and how they are automatically scaled up to meet surges in demand. It will demonstrate how the network adapts automatically in tune with the changing service requests highlighting the value of integrated NFV and SDN.”


See you in Amsterdam : )

image