“In this panel our guests will discuss key issues facing orchestration providers, including the standardization efforts for descriptors and information models, intelligent workload placement options with Intel architecture, and operator priorities for NFV-O and MANO in 2016” – Intel Network Builders Summit. San Francisco, August 17, 1:30 pm.
From left to right: Jose de Francisco, Ted East, Bob Haddlenton, Valerie Noto. Alcatel-Lucent’s Lean NFV Ops Demo Station at IDF15.
The Lean NFV Ops Roadshow is back after a short summer break. Our team would like to thank Intel for inviting us to join Network Builders Summit, Partner Summit, and the prominent Developer Forum (IDF15) this past week in San Francisco. The above pictures were taken at Alcatel-Lucent’s station, which featured a live interactive demonstration. See my earlier post “Gearing for IDF15” for more information on that.
Additionally, you can see a new element: a Galileo powered retro Whack-a-Mole machine, WAM for short : )
Long story short, IDF15 attendees were able to have fun by testing our Lean NFV Ops system when playing this game, which is designed to create catastrophic failures.
On a serious note, the higher the game’s score, the more critical the issues experienced. This exemplified zero touch healing and recovery executed in real time to sustain a High Availability (HA) environment. In these pictures you can see Intel’s Diane Bryant, SVP/GM Data Center Group, and Sandra Rivera, VP/GM Network Platforms Group, putting our demo to the test.
I would also like to thank Renu Navale who moderated “The State of Management & Orchestration”, a panel discussion where I joined Ciena’s Recep Ozdag, Overture’s Scott Vandiver and Rift.io’s Tony Schoener in a fairly engaging discussion.
We discussed opportunities and challenges involving ETSI NFV’s Management & Orchestration (MANO) architecture, standardization efforts across the board, information models and enhanced platform awareness (EPA) among other topics.
There were 50+ people attending this session. Those providing feedback on this session mentioned that topics and viewpoints expressed there were followed with interest. My understanding is that the video will be available in a couple of weeks.
Regarding questions on the underlying fundamentals behind Lean NFV Ops: this link provides a brief paper on the topic just to begin with.
Generally speaking, “Lean” practices couple (a) effective quality management with (b) efficient end-to-end systems. That summarizes Lean’s paradigm across any industry we happen to look into, whether we are talking about manufacturing, services or software for that matter.
Our conversation on Lean NFV Ops has been well received since we first presented it at Mobile World Congress earlier in the year. Proven interest has to do with addressing quality by discussing Reliability, Availability and Serviceability (RAS) and Quality of Experience (QoE) in the telecommunications context. These are critical success factors in an industry with a carrier grade tradition, which is now facing conflicting views when confronted with compelling cloud economics.
I created the above chart on Lean NFV Ops to clearly show that we are optimizing for both Service Level (SLA) and Return on Asset (ROA) since High Availability (HA) and High Utilization levels shape dynamic network behaviors. Simply put, Lean NFV Ops is about effectively delivering timely “service(s) of value” by operating at “resource efficient levels”. Making a return on the Communication Service Provider (CSP) investments (ROI) in NFV depends on achieving that. Otherwise, virtualization critics will keep reasonably doubting the need for any changes.
Equally important, sustaining a competitive advantage means that early implementations can be based on nimble architectures, which remain streamlined (and, therefore, stay lean vs. becoming bloated and over-engineered) as they rapidly change scale and/or scope. With that in mind, embracing Lean NFV Ops’ agile “continuous improvement” and making room for incremental innovation happen to be of the essence.
One more thought, Lean NFV Ops is not about a mechanical one to one implementation of known Lean principles. This conversation is centered on understanding the nature of the telecommunications sector so that we can come down to defining “how-to” guiding principles.
Lean \ˈlēn\ adjective: athletic, strong and healthy, absence of excess.
Circling back to IDF15, I am happy to share that we had good meetings with a number of partners throughout the event. Everyone’s feedback on Lean NFV Ops was very encouraging, with most openly saying they were impressed by our end-to-end solution approach:
- Operations Support System (OSS) – Motive Dynamic Operations
- Software Defined Networking (SDN) – Nuage Networks
- Sophisticated Virtual Network Functions (VNFs) – Rapport
- Appliances deployed as VNFs – CloudBand Ecosystem
- All running on CloudBand 3.0, the platform discussed in this interview.
Tieto, a CloudBand Ecosystem Member, was also present at IDF’s Network Builders zone. Tieto’s team had onboarded a VNF on our platform and Daniel Nilson was kind enough to share his latest paper.
I would like to take this chance to give Ted East and our Cloud Innovation Center (CIC) team “kudos” to acknowledge the hard work going into all the timely system upgrades, which continue to make the Lean NFV Ops live demonstration an invaluable asset (WAM’s latest retrofit included : ) Last but not least, thanks to Andreas, Asaf, Debbie, Erez, Guy, Ken, Phil and Val for all the coaching and guidance when preparing for IDF.
Click above to access this album on Flickr.
“Intel® Network Builders is an ecosystem of independent software vendors (ISVs), operating system vendors (OSVs), original equipment manufacturers (OEMs), telecom equipment manufacturers (TEMs), system integrators and carriers, coming together to accelerate the adoption of network functions virtualization (NFV)- and software-defined networking (SDN)- based solutions in Telco networks, public, private enterprise and hybrid clouds.” – About Intel Network Builders.
“We see IDF15 as a partnership. Intel and Developers/Makers/Technologists. We’ll share our vision and technology leadership […] Join us on August 18-20, San Francisco, Moscone Center.” – Intel Developer Forum.
Glad to share that our team is returning to IDF. We had a terrific experience last year and are looking forward to IDF15. This event is quickly approaching: just 11 days away at the time of writing this.
IDF14 was kind to us. In addition to opportunities to meet with customers and partners, as well as discussing the latest on Network Functions Virtualization (NFV), Alcatel-Lucent was featured as “Best in Show” jointly with Microsoft and Lenovo. Moreover, our CloudBand platform was the recipient of the Software and Services Award.
The above short video introduces the demonstration system that we are deploying this year. By the way, TelecomTV displayed this clip in a Proof of Concept (PoC) section created for NFV demonstrations from a variety of vendors.
However, please note that our demonstration is not a PoC. The Lean NFV Ops system features commercially available solutions from Alcatel-Lucent and our partners, all running on CloudBand 3.0. Clicking on the right picture will take you to Intel’s page on our platform.
CloudBand is comprised of two distinctive solutions: Nodes that can be easily deployed as part of the carrier’s Network Functions Virtualization Infrastructure (NFVI) and the prominent Management System which delivers the Management and Orchestration (MANO) platform.
By the way, NFVI and MANO are terms outlined in the NFV reference architecture provided by the working group focusing on this topic at the European Telecommunications Standards Institute (ETSI).
CloudBand’s node automation software runs on Commercial-Off-The-Shelf (COTS) hardware; these are x86 systems. Intel’s CPUs power Alcatel-Lucent’s Cloud Innovation Center’s (CIC) showcase.
At IDF15 we will also discuss boosting packet processing in the context of the Data Plane Development Kit (DPDK), engineered to enable 10x performance. In the meantime, Alan’s article provides quick insights on what this means to Virtual Network Functions (VNF) such as the Evolved Packet Core (vEPC). See reference links below.
This is relevant because the Lean NFV Ops demo deploys a fully virtualized and completely functional Voice over Long Term Evolution (VoLTE) system from the ground up. This needs the vEPC and the IP Multimedia Subsystem (IMS) working together.
Long story short, we’ll be making live 4G video calls onsite with this system, which we showed at IDF14 already. This time around our team will also conduct a number of sophisticated lifecycle operations involving maintenance events with full service continuity: zero downtime, all transparent to the end user mobile broadband experience.
Basically, you will see not last year’s PoC but a live demonstration system with real solutions in action. We are now operating in an end-to-end environment tested in real time by a variety of rainy day scenarios. Additionally, we will cover high availability (HA), smart placement, dynamic scaling and root cause analysis (RCA) among other key topics. Last but not least, we’ll share Bell Labs’ research findings on automation and NFV economics. There is even more new stuff…
Better yet, instead of just making a VoLTE call with 4G phones as shown in the above video, we will be using Web Real Time Communications (WebRTC) as part of the experience. This means using ubiquitous web browsers on any kind of mobile device and/or conventional desktop.
My understanding is that RealSense comes from Intel Perceptual Computing looking into immersive communications and gesture based user interfaces. By using Personify’s application our demo captures, cuts out and projects the end user’s face and body, which is then seen as a video overlay. This means that we can use any background of our choice.
This can be experienced as a form of Augmented Reality (AR) where a person, who is at his/her home, is seen by the other user as if he/she was in a museum room: moving around, stopping next to pictures of interest and having a real time video conversation. This happens in the context of a video call where both end users are comfortably talking from home.
By the way, we’ll be bringing another interactive gadget with us which works with Intel’s Galileo board. But you will have to come to IDF to play with that one. Just ask for the “Whack-a-Mole” : )
We will be glad to meet at IDF. Feel free to stop by our booth and/or to schedule a meeting:
EVENT: Intel Developer Forum 2015.
VENUE: Moscone Center in San Francisco. August 18-20.
BOOTH: Network Builders Community #173.
I will be speaking at:
EVENT: Intel Network Builders Summit.
VENUE: The Westin Saint Francis. August 17, 1:30 pm. Room Elizabeth.
PANEL: “The State of Management & Orchestration (MANO)”
I also plan to attend the following two IDF Mega Sessions:
“5G: Innovation from Client to Cloud” with Sandra Rivera and Aicha Evans.
“Making The Future… with You” presented by Genevieve Bell.
See you there.
“This interactive demonstration shows the positive impact of agile service launch subject to Reliability, Availability, Serviceability (RAS) scenarios. It features an application centered system involving sophisticated Virtual Network Functions (VNF) and integrates Operations Support System (OSS), NFV’s Management and Orchestration (MANO) as well as Software Defined Networking (SDN) under a modular and scalable approach.”
“In addition to Alcatel-Lucent’s portfolio, which is represented by Motive Dynamic Operations (MDO), CloudBand Management Platform (CBMS) and Cloud Node, Nuage Networks, Virtual Evolved Packet Core (vEPC), Virtual IP Multimedia Subsystem (vIMS) our conversation illustrates Ecosystem examples involving third party partners, findings from Bell Labs Research and presents opportunities for following up with hands-on activities at the Cloud Innovation Center (CIC).”
00:00 – Hi, my name is Jose. We are going to discuss operations in the context of NFV, Network Functions Virtualization. We will do that for the purpose of delivering service agility because launching new applications in the marketplace should be as easy as getting them deployed with just one click.
00:30 – This is a real environment, this is not a proof of concept. These are products that are either available today or in production in 2015. Namely Motive Dynamic Operations (MDO), the OSS, Nuage Networks’ SDN (Software Defined Networking) framework, the CloudBand platform, which manages the lifecycle of the VNFs (Virtual Network Functions) as well as orchestrating the underlying cloud infrastructure. Last but not least, we will also discuss findings from Bell Labs’ research. To complete the environment that we are operating with today, you will see a fully virtualized RAN (Radio Access Network) as well as the mobile core with the vEPC (virtual Evolved Packet Core) and vIMS (virtual IP Multimedia Subsystem), all working together to deliver this VoLTE (Voice over Long Term Evolution) live video session.
01:20 – We are going to follow two basic principles in this demonstration. Principle number one: these are very sophisticated systems and we are bringing them together, therefore, there is no denying that we need to abstract out complexity to deliver simplicity; that way we can manage operations. Principle number two: no matter what we do in the background operationally speaking, the user experience, the video in this case, should continue to play completely unscratched. At the end of this demonstration we will review these two principles to check how we did.
01:50 – Deploying any application should be as easy as… and here is the virtualization catalog that we use in our labs at the Cloud Innovation Center, it should be as easy as selecting what I need and launching the application to the NFV Operations Center. The heavy lifting is actually performed by CloudBand, the MANO (Management and Orchestration) platform. It understands the application requirements, the lifecycle, and will make sure that things talk to the right components to spin up virtual machines and onboard the service.
02:20 – Moreover, now we need for traffic to flow through this new application, this new service. I am now talking to Nuage Networks’ SDN (Software Defined Networking) framework to get that going in a split second. So, I am now working on SFC (Service Function Chaining). And there you are.
02:45 – Now, let’s continue to test more things in the marketplace in real time. I am now delivering yet another application: a content filtering service. Maybe I should also deploy a WebRTC (Web Real Time Communications) server. And here it is. By the way, all the virtual machines in green color are carrying load this minute, the virtual machines shown in blue are on standby. These others are mated pairs for reliability so that we can work in HA, this is a High Availability environment. Moreover, virtual machines laid horizontally are services and products from third party partners also onboarded on the CloudBand platform.
03:25 – As you see, we need to do some more service chaining, and we are now working again with Nuage Networks’ SDN. I am going to do the chaining for this one application. Note that this is fully programmable, everything is fully automated.
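As a side note on the chaining steps in the transcript: service function chaining conceptually means steering traffic through an ordered list of functions. The toy model below is only a sketch of that idea. It is not the Nuage Networks API, and every name in it (`build_chain`, `content_filter`, `tag_video`, the packet fields) is a hypothetical chosen for illustration.

```python
def build_chain(*functions):
    """Compose service functions into one ordered chain (illustrative only)."""
    def chain(packet):
        for fn in functions:
            packet = fn(packet)
            if packet is None:  # a function in the chain may drop the packet
                return None
        return packet
    return chain


def content_filter(packet):
    # Hypothetical rule: drop packets flagged as blocked.
    return None if packet.get("blocked") else packet


def tag_video(packet):
    # Hypothetical classifier: mark the packet as video traffic.
    return {**packet, "class": "video"}


# Traffic is steered through the filter first, then the classifier.
chain = build_chain(content_filter, tag_video)
```

Adding a newly deployed service to the path then amounts to rebuilding the chain with one more function in the desired position.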
03:40 – Let’s discuss what happens when a network operator becomes victim of success. That would be a situation where this video service becomes very popular because it works well. There is [unplanned] pent up demand with more subscribers using the service. Therefore traffic grows. Let’s simulate that kind of situation. These are load generators which I am going to work with to conduct a stress test. As you can see, traffic is ramping up already. The question now is, will we have enough capacity available to meet new demand? Things are not looking that good… but as we detect this trend thanks to Bell Labs analytics, the platform starts spinning up new virtual machines and onboarding necessary services so that we can get some relief. [As a result] now we are working with new subscribers without a glitch.
04:40 – The opposite is also true. Let’s say that there is no longer that much demand for this one service. There aren’t so many subscribers. Traffic is no longer flowing through our system at the same scale. Let’s simulate that. Traffic is going down this minute. The very same way we were scaling and creating more capacity before, we are now going to take down all of those added systems so that we can make the underlying resources available for the next batch of successful applications to utilize. As you see, the ones in red are continuously being monitored so that we can clean up and, once again, gracefully terminate those services.
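The scale-out and scale-in behavior described in the last two steps can be approximated with a simple threshold rule. This is a minimal sketch under stated assumptions: the function name, the utilization thresholds, the minimum pool size and the notion of a fixed per-VM capacity are all hypothetical, and the post does not describe the actual logic used by CloudBand or Bell Labs analytics.

```python
import math


def desired_vm_count(load, capacity_per_vm, current_vms,
                     high=0.8, low=0.3, min_vms=2):
    """Return how many VMs a threshold-based scaler would ask for.

    All parameters are hypothetical. `load` and `capacity_per_vm` use the
    same unit (for example, sessions); `high` and `low` are utilization bounds.
    """
    utilization = load / (capacity_per_vm * current_vms)
    target = math.ceil(load / (capacity_per_vm * high))
    if utilization > high:
        # Scale out: add enough VMs to bring utilization back under `high`.
        return max(current_vms, target)
    if utilization < low:
        # Scale in: release VMs, but never drop below the minimum pool.
        return max(min_vms, target)
    # Within bounds: leave the pool as it is.
    return current_vms
```

For instance, with 100 sessions per VM and 4 VMs running, a load of 900 sessions would call for 12 VMs, while a load of 100 sessions on those 12 VMs would shrink the pool back to the minimum of 2.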
05:20 – We can do all of these things because we are working in a data center environment. These are CloudBand’s Cloud Nodes. This is COTS (Commercial Off The Shelf) infrastructure, these are not dedicated servers. Therefore, we can continue to spin up new virtual machines and onboard applications. We can continue to reuse these resources [compute, memory, storage, networking] at very high utilization levels over and over.
05:50 – If you are successful, in addition to experiencing demand and coping with capacity… at some point you will be facing updates, upgrades… maintenance events. Let’s simulate that too. This is a RAS (Reliability, Availability, Serviceability) test. We could start by opening a maintenance window, the more applications we have, the harder it is to find those at the right time without disrupting the video experience, the user experience. We could trigger a network failure instead, some issue that impacts QoS (Quality of Service) or, perhaps, a cloud failure that could involve a corrupted virtual machine. Let’s cause that last one.
06:30 – The machine that has been compromised has been flagged [in red]. The load has already been placed on the mated pair. There was [service continuity] no disruption of any kind as far as the user experience is concerned. We have been able to do that thanks to smart placement combined with a distributed architecture. The data center that you see on the left, DC number one, is based at a central location where we have consolidated assets for the purpose of delivering cost efficiencies. [On the right] data center number two is at a distributed location closer to the network’s edge for performance sake instead.
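The mated pair behavior in this step can be pictured as an active/standby pair where the standby is promoted once the active virtual machine is flagged. The sketch below is only an illustration of that pattern; the `MatedPair` class, its fields and the VM names are hypothetical and do not represent any CloudBand interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatedPair:
    """Hypothetical model of an active/standby VM pair."""
    active: str
    standby: Optional[str]

    @property
    def highly_available(self) -> bool:
        # HA holds only while a standby exists to take over.
        return self.standby is not None

    def fail_over(self) -> str:
        """Promote the standby after the active VM is flagged as failed."""
        if self.standby is None:
            raise RuntimeError("no standby available")
        # Swap roles: the old standby carries the load, no standby remains.
        failed, self.active, self.standby = self.active, self.standby, None
        return failed


pair = MatedPair(active="vm-dc1", standby="vm-dc2")
failed_vm = pair.fail_over()  # load now runs on "vm-dc2"
```

Note that after the promotion the pair has no standby left, so a new one has to be placed somewhere before the service is protected again.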
07:10 – Everything that we have been discussing up to this point is available from Alcatel-Lucent’s portfolio in 2015. In the next few minutes, I will share with you research findings from Bell Labs projects. These relate to analytics for smart load placement and autonomics, that is machine learning for NFV.
07:30 – You were able to notice that as I moved the load to the other data center, the service was not disrupted but I lost HA (High Availability) [by operating in a simplex environment instead]. Now I need to look for the best placement for the new mated pair that will become my new backup should something happen to the virtual machine that’s carrying the load right now. The question is: where should I do that?
07:55 – Bell Labs’ recommendations engine is checking cloud requirements and conditions, it couples that with equivalent network requirements and conditions, it understands what any given application needs in the lifecycle. It reads the contract because it does not make sense for me to deploy something in a more expensive environment, which would defeat my business case and cloud economics. By the same token, I cannot deploy the load in an inferior environment, which would not meet the SLA (Service Level Agreement). Additional policies: these could be engineering events or any other kind of rules. This could be weather conditions because I wouldn’t like to move the load to a data center that is going to be compromised by terrible weather for that matter.
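The reasoning in this step (do not overpay, do not miss the SLA, honor additional policies) can be read as a filter followed by a ranking. The sketch below is an assumption-laden illustration, not the Bell Labs recommendations engine: the `Site` fields, the use of latency as a stand-in for the SLA, and the cheapest-first ordering are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical candidate data center; fields are illustrative only."""
    name: str
    cost_per_hour: float
    latency_ms: float
    excluded_by_policy: bool = False  # e.g. severe weather, engineering event


def recommend(sites, max_cost_per_hour, max_latency_ms):
    """Return feasible sites, cheapest first (ties broken by latency)."""
    feasible = [
        s for s in sites
        if not s.excluded_by_policy
        and s.cost_per_hour <= max_cost_per_hour  # protects the business case
        and s.latency_ms <= max_latency_ms        # stands in for the SLA
    ]
    return sorted(feasible, key=lambda s: (s.cost_per_hour, s.latency_ms))


candidates = [
    Site("dc-a", cost_per_hour=1.0, latency_ms=20.0),
    Site("dc-b", cost_per_hour=1.2, latency_ms=10.0),
    Site("dc-c", cost_per_hour=0.5, latency_ms=15.0, excluded_by_policy=True),
    Site("dc-d", cost_per_hour=5.0, latency_ms=5.0),   # too expensive
    Site("dc-e", cost_per_hour=0.4, latency_ms=90.0),  # would miss the SLA
]
options = recommend(candidates, max_cost_per_hour=2.0, max_latency_ms=30.0)
```

In this model the first element of `options` plays the role of the initial recommendation, and asking for another option simply means looking at the next element in the list.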
08:45 – If I like this recommendation which prompts me to move the load from “cloud one” to the “Barcelona data center” I could just click “accept” and move forward. What if there was a better option? I am going to ask the recommendations engine to present another option. In this other case it says that I should be moving the load to a different data center closer to my next destination, so that the service is provided closer to my location.
09:10 – In any case, at any given point of time, I should be able to do RCA (Root Cause Analysis). For that purpose we get to display fine grained, correlated analytics. We built a dynamic dashboard that we can always check to assess the current situation and do troubleshooting accordingly. The various metrics come from, and are fed by, the different solutions that you see represented in the smaller screens on each side of the NFV Ops Center. If this was a false alarm I would then click on “stand down” and nothing would be executed. The reality is that false alarms can happen. If I need to buy more time to get more data, to do further analysis, I would then click on “standby” instead.
10:10 – There is research on autonomics as I was sharing before. This means that the recommendations engine, time after time, learns from these behaviors and it becomes more predictive and, eventually, it gives you even better custom recommendations further optimizing system performance as well as any other kind of efficiencies.
10:30 – I am going to accept the recommendation that works best for me, which is the first one. In the background, what you would see are the very same things that we saw early on: virtual machines being spun up, applications being onboarded, networks being created… with all of that happening literally in just minutes. This is very different from PMO (Present Mode of Operations) where it takes filling out forms, scheduling meetings, talking to a lot of people. Then it takes maybe hours, if not days, perhaps, weeks before we get anything done. Here things are programmable, fully automated, and things happen in real time as you can see by means of this demonstration.
11:10 – We have also brought to you a single pane of glass to abstract out complexity. When drilling down, it pays to go to the UI (User Interfaces) of the specific solutions. This [single pane of glass] is not an Alcatel-Lucent product, this is just illustrating a requirement from many of our customers who are asking for the APIs (Application Programming Interfaces) from these various solutions to build their own dashboards and their own screens.
11:30 – Well, this completes the demonstration. As I was saying early on: 100% real, this is no PoC (Proof of Concept). All of the products, with the exception of the Bell Labs research which we just discussed, are currently available or in production in 2015, this year. Thank you.