Chapter 2. Control Plane Virtualization

The key factors driving the Juniper QFX5100 are the advent of virtualization and cloud computing; however, there are many facets to virtualization. One is decoupling the service from the physical hardware. When this is combined with orchestration and automation, the service is said to be agile: it can be provisioned quickly, even within seconds. Another aspect is scale in the number of instances of the service: because it becomes so easy to provision a service, the total number of instances quickly increases.

Compute virtualization is such a simple concept, yet it yields massive benefit to both the end user and operator. The next logical step is to apply the benefits of compute virtualization to the control plane of the network. After all, the control board is nothing but an x86 processor, memory, and storage.

The immediate benefit of virtualizing the control board might not be so obvious. Generally, operators like to toy around and create a virtual machine (VM) running Linux so that they're able to execute operational scripts and troubleshoot. However, there is a much more exciting use case for virtualization of the control board. Traditionally, only chassis-based networking equipment was able to support two routing engines. The benefit of two routing engines is that they increase the high availability of the chassis and allow the operator to upgrade the control plane software in real time without traffic loss. This feature is commonly referred to as In-Service Software Upgrade (ISSU). One of the key requirements of ISSU is to have two routing engines that are synchronized using the Nonstop Routing (NSR), Nonstop Bridging (NSB), and Graceful Routing Engine Switchover (GRES) features. Fixed networking equipment such as a top-of-rack (ToR) switch generally has only a single routing engine and does not support ISSU, due to the lack of a second routing engine. Taking advantage of virtualization allows a ToR switch to have two virtualized routing engines, which makes features such as ISSU possible. The Juniper QFX5100 family takes virtualization to heart: it uses Linux with the kernel-based virtual machine (KVM) as the host operating system and places Junos, the network operating system, inside of a VM. When an operator wants to perform a real-time software upgrade, the Juniper QFX5100 switch provisions a second routing engine, synchronizes the data, and performs the ISSU without dropping traffic.

Another great benefit of compute virtualization inside of a switch is that you can create user-defined VMs and run your own applications and programs on the switch. Use cases include Network Functions Virtualization (NFV), network management, and statistical reporting.

Architecture

Recall that the Juniper QFX5100 series is split into two major components (see Figure 2-1): the control board and switch board. The control board is the foundation for the control plane, whereas the switch board is the foundation for the data plane.


Figure 2-1. QFX5100 architecture

Focusing on the control board components, the blocks shaded in gray represent all of the roles in that architecture that are responsible for virtualizing the control plane. The control board runs commodity hardware that's easily compatible with common hypervisors. The processor is a 1.5 GHz dual-core Intel Sandy Bridge CPU, and there is 8 GB of memory and 32 GB of solid-state disk (SSD) storage. The Juniper QFX5100 boots directly into CentOS Linux instead of Junos; this provides the platform with several advantages. The first advantage is the ability to virtualize the underlying hardware by using Linux KVM and QEMU; the second is the ability to host operational daemons and Application Programming Interfaces (APIs) directly on the host operating system.

To make management of the hypervisor easier, the virtualization library libvirt is used to provision and manage the VMs. libvirt provides a normalized management framework across a set of hypervisors. The ability to use a common framework to control a hypervisor provides more flexibility in the future if any of the underlying components happen to change.
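Because libvirt abstracts the hypervisor behind a connection URI, the same client tooling works no matter which backend sits underneath. Here's a minimal sketch using the virsh client; the qemu:///system URI is what a local KVM host uses, and the remote URI is purely illustrative:

-sh-4.1# virsh -c qemu:///system list --all

-sh-4.1# virsh -c qemu+ssh://root@remote-host/system list --all

The first command talks to the local QEMU/KVM instance; the second would manage a remote QEMU host over SSH using exactly the same syntax.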

Host Operating System

As mentioned in the previous section, the Juniper QFX5100 boots directly into Linux, specifically CentOS. This provides the operating system and virtualization foundation for Junos and all other network-related functionality.

Let’s log in to the host operating system and do some exploring:

dhanks@qfx5100> request app-engine host-shell

Last login: Sun Nov 17 14:30:47 from 192.168.1.2

--- Host 13.2I20131114_1603_vsdk_build_30 built 2013-11-14 16:03:50 UTC

Now, let’s take a peek at the PCI bus and see what’s installed on the host operating system:

-sh-4.1# lspci

00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)

00:01.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)

00:01.1 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)

00:01.2 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)

00:06.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port (rev 09)

00:1c.0 PCI bridge: Intel Corporation DH89xxCC PCI Express Root Port #1 (rev 08)

00:1c.1 PCI bridge: Intel Corporation DH89xxCC PCI Express Root Port #2 (rev 08)

00:1c.2 PCI bridge: Intel Corporation DH89xxCC PCI Express Root Port #3 (rev 08)

00:1c.3 PCI bridge: Intel Corporation DH89xxCC PCI Express Root Port #4 (rev 08)

00:1d.0 USB controller: Intel Corporation DH89xxCC USB2 Enhanced Host Controller #1 (rev 08)

00:1f.0 ISA bridge: Intel Corporation DH89xxCC LPC Controller (rev 08)

00:1f.2 SATA controller: Intel Corporation DH89xxCC 4 Port SATA AHCI Controller (rev 08)

00:1f.3 SMBus: Intel Corporation DH89xxCC SMBus Controller (rev 08)

00:1f.7 System peripheral: Intel Corporation DH89xxCC Watchdog Timer (rev 08)

01:00.0 Co-processor: Intel Corporation Device 0434 (rev 21)

01:00.1 Ethernet controller: Intel Corporation DH8900CC Series Gigabit Network (rev 21)

01:00.2 Ethernet controller: Intel Corporation DH8900CC Series Gigabit Network (rev 21)

01:00.3 Ethernet controller: Intel Corporation DH8900CC Series Gigabit Network (rev 21)

01:00.4 Ethernet controller: Intel Corporation DH8900CC Series Gigabit Network (rev 21)

07:00.0 Unassigned class [ff00]: Juniper Networks Device 0062 (rev 01)

08:00.0 Unassigned class [ff00]: Juniper Networks Device 0063 (rev 01)

09:00.0 Ethernet controller: Broadcom Corporation Device b854 (rev 02)

Pretty vanilla so far: a handful of PCI Express root ports, a USB controller, a SATA controller, and some network interface controllers (NICs). But the two Juniper Networks devices are interesting; what are they? These are the FPGA controllers that are responsible for the chassis fan, sensors, and other environmental functions.

The final device is the Broadcom 56850 chipset. The way a network operating system controls the Packet Forwarding Engine (PFE) is simply through a PCI interface by using a Software Development Kit (SDK).

Let’s take a closer look at the CPU:

-sh-4.1# cat /proc/cpuinfo

processor : 0

vendor_id : GenuineIntel

cpu family : 6

model : 42

model name : Intel(R) Pentium(R) CPU @ 1.50GHz

stepping : 7

cpu MHz : 1500.069

cache size : 3072 KB

physical id : 0

siblings : 4

core id : 0

cpu cores : 2

apicid : 0

initial apicid : 0

fpu : yes

fpu_exception : yes

cpuid level : 13

wp : yes

flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat

pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm

constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni

pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1

sse4_2 x2apic popcnt aes xsave avx lahf_lm arat epb xsaveopt pln pts dts tpr_shadow

vnmi flexpriority ept vpid

bogomips : 3000.13

clflush size : 64

cache_alignment : 64

address sizes : 36 bits physical, 48 bits virtual

power management:

processor : 1

vendor_id : GenuineIntel

cpu family : 6

model : 42

model name : Intel(R) Pentium(R) CPU @ 1.50GHz

stepping : 7

cpu MHz : 1500.069

cache size : 3072 KB

physical id : 0

siblings : 4

core id : 0

cpu cores : 2

apicid : 1

initial apicid : 1

fpu : yes

fpu_exception : yes

cpuid level : 13

wp : yes

flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat

pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm

constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni

pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1

sse4_2 x2apic popcnt aes xsave avx lahf_lm arat epb xsaveopt pln pts dts tpr_shadow

vnmi flexpriority ept vpid

bogomips : 3000.13

clflush size : 64

cache_alignment : 64

address sizes : 36 bits physical, 48 bits virtual

power management:

processor : 2

vendor_id : GenuineIntel

cpu family : 6

model : 42

model name : Intel(R) Pentium(R) CPU @ 1.50GHz

stepping : 7

cpu MHz : 1500.069

cache size : 3072 KB

physical id : 0

siblings : 4

core id : 1

cpu cores : 2

apicid : 2

initial apicid : 2

fpu : yes

fpu_exception : yes

cpuid level : 13

wp : yes

flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat

pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm

constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni

pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1

sse4_2 x2apic popcnt aes xsave avx lahf_lm arat epb xsaveopt pln pts dts tpr_shadow

vnmi flexpriority ept vpid

bogomips : 3000.13

clflush size : 64

cache_alignment : 64

address sizes : 36 bits physical, 48 bits virtual

power management:

processor : 3

vendor_id : GenuineIntel

cpu family : 6

model : 42

model name : Intel(R) Pentium(R) CPU @ 1.50GHz

stepping : 7

cpu MHz : 1500.069

cache size : 3072 KB

physical id : 0

siblings : 4

core id : 1

cpu cores : 2

apicid : 3

initial apicid : 3

fpu : yes

fpu_exception : yes

cpuid level : 13

wp : yes

flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat

pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm

constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni

pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1

sse4_2 x2apic popcnt aes xsave avx lahf_lm arat epb xsaveopt pln pts dts tpr_shadow

vnmi flexpriority ept vpid

bogomips : 3000.13

clflush size : 64

cache_alignment : 64

address sizes : 36 bits physical, 48 bits virtual

power management:

Although lspci labels the PCI functions with a generic "Xeon E3-1200/2nd Generation Core Processor Family" string, /proc/cpuinfo shows what's actually there: a single-socket, 1.5 GHz dual-core Sandy Bridge CPU that presents four logical processors through Hyper-Threading (cpu cores : 2, siblings : 4). There's plenty of power to operate multiple VMs and the network operating system.
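If you'd rather not scan four blocks of cpuinfo output, a couple of quick greps confirm the topology; this is standard Linux, nothing QFX5100-specific:

-sh-4.1# grep -c '^processor' /proc/cpuinfo

4

-sh-4.1# grep -m1 'cpu cores' /proc/cpuinfo

cpu cores : 2

Four logical processors on two physical cores: Hyper-Threading is enabled.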

Now, let’s move on to the memory:

-sh-4.1# free

total used free shared buffers cached

Mem: 7529184 3135536 4393648 0 158820 746800

-/+ buffers/cache: 2229916 5299268

Swap:

After some of the memory has been reserved by the hardware and the kernel, you can see that the host is left with roughly 7.2 GiB (7,529,184 kB) in total.
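If you'd like that conversion made explicit, a one-line awk against /proc/meminfo does the math (1 GiB = 1,048,576 kB):

-sh-4.1# awk '/MemTotal/ {printf "%.1f GiB\n", $2/1048576}' /proc/meminfo

7.2 GiB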

Next, let’s see how many disks there are and how they’re partitioned:

-sh-4.1# fdisk -l

Disk /dev/sdb: 16.0 GB, 16013852672 bytes

255 heads, 63 sectors/track, 1946 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x000dea11

Device Boot Start End Blocks Id System

/dev/sdb1 * 1 125 1000000 83 Linux

Partition 1 does not end on cylinder boundary.

/dev/sdb2 125 1857 13914062+ 83 Linux

Disk /dev/sda: 16.0 GB, 16013852672 bytes

255 heads, 63 sectors/track, 1946 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x000d8b25

Device Boot Start End Blocks Id System

/dev/sda1 * 1 125 1000000 83 Linux

Partition 1 does not end on cylinder boundary.

/dev/sda2 125 1857 13914062+ 83 Linux

Disk /dev/mapper/vg0_vjunos-lv_junos_recovery: 4294 MB, 4294967296 bytes

255 heads, 63 sectors/track, 522 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x00000000

Disk /dev/mapper/vg0_vjunos-lv_var: 11.3 GB, 11307843584 bytes

255 heads, 63 sectors/track, 1374 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x00000000

Disk /dev/mapper/vg0_vjunos-lv_junos: 12.9 GB, 12884901888 bytes

255 heads, 63 sectors/track, 1566 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x00000000

The host system has two SSD storage devices, each with 16 GB of capacity. From the partition layout illustrated in Figure 2-2, you can see that we're running the Linux Logical Volume Manager (LVM).


Figure 2-2. Linux LVM and storage design

There are two 16 GB SSDs, which are part of the Linux LVM. The primary volume group is vg0_vjunos. This volume group has three volumes that are used by Junos:

§ lv_junos_recovery

§ lv_var

§ lv_junos
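If you want to poke at this layout yourself from the host shell, the standard LVM reporting commands show the same structure. The output below is a sketch reconstructed from the fdisk and volume data shown in this chapter, rather than a verbatim capture:

-sh-4.1# pvs

PV VG Fmt Attr PSize PFree

/dev/sda2 vg0_vjunos lvm2 a-- 13.27g 0

/dev/sdb2 vg0_vjunos lvm2 a-- 13.27g 0

-sh-4.1# lvs

LV VG Attr LSize

lv_junos vg0_vjunos -wi-ao-- 12.00g

lv_junos_recovery vg0_vjunos -wi-ao-- 4.00g

lv_var vg0_vjunos -wi-ao-- 10.53g

The two physical volumes, one partition from each SSD, add up to the 26.53 GB volume group that vgs reports later in this chapter.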

Linux KVM

When the Juniper QFX5100 boots up, the host operating system is Linux. All of the control plane operations happen within the network operating system, Junos. The Juniper QFX5100 takes advantage of compute virtualization in the host operating system by using Linux KVM. A VM is created specifically for Junos. Given that KVM can create multiple VMs, the Juniper QFX5100 series has the ability to perform ISSU and support third-party VMs that can host additional services such as network management and monitoring.
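Before digging into the VMs themselves, it's worth confirming that hardware-assisted virtualization is actually in play. A quick sketch using standard Linux commands: if the kvm and kvm_intel modules are loaded and the /dev/kvm character device exists, the CPU's VT-x extensions are being used rather than pure emulation:

-sh-4.1# lsmod | grep kvm

-sh-4.1# ls -l /dev/kvm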

virsh

The Juniper QFX5100 uses the libvirt library as well as the virsh management user interface to interact with Linux KVM. If you're familiar with libvirt, walking around the virtualization capabilities of the Juniper QFX5100 will come as second nature. If you aren't, let's use virsh to explore and see what's happening under the hood.

The first thing we need to do is drop into the host shell from the Junos CLI:

dhanks@qfx5100> request app-engine host-shell

Last login: Sun Nov 17 14:30:47 from 192.168.1.2

--- Host 13.2I20131114_1603_vsdk_build_30 built 2013-11-14 16:03:50 UTC

Now, let’s take a look at the VMs installed in the Linux KVM:

-sh-4.1# virsh list --all

Id Name State

----------------------------------------------------

1 vjunos0 running

By default there's a single VM running the Junos network operating system. The VM's name is vjunos0, its ID is 1, and we can see that its state is running.

Hmm. Are you curious as to what version of the libvirt library and QEMU the Juniper QFX5100 is using? Let's find out (note that virsh prints the library name as "libvir" for historical reasons):

-sh-4.1# virsh version

Compiled against library: libvir 0.9.10

Using library: libvir 0.9.10

Using API: QEMU 0.9.10

Running hypervisor: QEMU 0.12.1

At this point, let’s take a look at the overall host memory and CPU statistics:

-sh-4.1# virsh nodememstats

total : 7269088 kB

free : 4147596 kB

buffers: 264772 kB

cached : 761476 kB

-sh-4.1#

-sh-4.1# virsh nodecpustats

user: 305995340000000

system: 145678380000000

idle: 11460475070000000

iowait: 1075190000000

Now that we’re familiar with what the host system is capable of, software versions, and of course how many VMs are configured, let’s examine the Junos VM:

-sh-4.1# virsh dominfo vjunos0

Id: 1

Name: vjunos0

UUID: 100e7ead-ae00-0140-0000-564a554e4f53

OS Type: hvm

State: running

CPU(s): 1

CPU time: 445895.2s

Max memory: 2000896 kB

Used memory: 2000896 kB

Persistent: no

Autostart: disable

Managed save: no

Each VM has a unique identifier (UUID) that can be used to refer to the VM. One of the more interesting attributes is the OS Type, which is set to hvm; this stands for Hardware Virtual Machine and indicates a fully virtualized guest that runs unmodified on top of the hypervisor. Because Junos is based on FreeBSD and heavily modified to support network control plane functions, it's difficult to say that it's pure FreeBSD. Instead, the alternative is to use the vendor-neutral OS Type of hvm, which simply means that it's an x86-based operating system running on virtualized hardware.
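The dominfo output is just a summary; the full domain definition lives in libvirt XML, and the OS Type appears in its <os> element. Here's a sketch of the relevant fragment as reported by virsh dumpxml (the arch attribute and surrounding elements are illustrative; the rest of the XML is omitted):

-sh-4.1# virsh dumpxml vjunos0

<domain type='kvm' id='1'>

<name>vjunos0</name>

<uuid>100e7ead-ae00-0140-0000-564a554e4f53</uuid>

...

<os>

<type arch='x86_64'>hvm</type>

</os>

...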

Let’s focus on the memory and network settings for vjunos0:

-sh-4.1# virsh dommemstat vjunos0

rss 1895128

-sh-4.1# virsh domiflist vjunos0

Interface Type Source Model MAC

-------------------------------------------------------

vnet0 bridge virbr0 e1000 52:54:00:bf:d1:6c

vnet1 bridge ctrlbr0 e1000 52:54:00:e7:b6:cd

In the 13.2X53D20 version of Junos, there are two bridges installed for the VMs within KVM. The vnet0/virbr0 interface is used across all of the VMs to communicate with the outside world through their management interfaces. The other interface, vnet1/ctrlbr0, is used exclusively for ISSU. During an ISSU, there are two copies of Junos running; all control plane communication between the VMs is performed over this special bridge so that other control plane functions such as Secure Shell (SSH), Open Shortest Path First (OSPF), and Border Gateway Protocol (BGP) aren't impacted while the kernel state is synchronized between the master and backup Junos VMs.
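You can see the same wiring directly from the host with the standard Linux bridge tools; this is the raw data that the show app-engine bridge command (covered later in this chapter) wraps:

-sh-4.1# brctl show

bridge name bridge id STP enabled interfaces

ctrlbr0 8000.fe5400e7b6cd no vnet1

virbr0 8000.100e7eadae03 yes virbr0-nic

vnet0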

Another interesting place to look for more information is in the /proc filesystem. We can take a look at the process ID (PID) of vjunos0 and examine the task status:

-sh-4.1# cat /var/run/libvirt/qemu/vjunos0.pid

2972

-sh-4.1# cat /proc/2972/task/*/status

Name: qemu-kvm

State: S (sleeping)

Tgid: 2972

Pid: 2972

PPid: 1

TracerPid: 0

Uid: 0 0 0 0

Gid: 0 0 0 0

Utrace: 0

FDSize: 256

Groups:

VmPeak: 2475100 kB

VmSize: 2276920 kB

VmLck: 0 kB

VmHWM: 1895132 kB

VmRSS: 1895128 kB

VmData: 2139812 kB

VmStk: 88 kB

VmExe: 2532 kB

VmLib: 16144 kB

VmPTE: 4284 kB

VmSwap: 0 kB

Threads: 2

SigQ: 1/55666

SigPnd: 0000000000000000

ShdPnd: 0000000000000000

SigBlk: 0000000010002840

SigIgn: 0000000000001000

SigCgt: 0000002180006043

CapInh: 0000000000000000

CapPrm: fffffffc00000000

CapEff: fffffffc00000000

CapBnd: fffffffc00000000

Cpus_allowed: 04

Cpus_allowed_list: 2

Mems_allowed:

00000000,00000000,00000000,00000000,00000000,00000000,00000000,

00000000,00000000,00000000,00000000,00000000,00000000,00000000,

00000000,00000001

Mems_allowed_list: 0

voluntary_ctxt_switches: 5825006750

nonvoluntary_ctxt_switches: 46300

Name: qemu-kvm

State: S (sleeping)

Tgid: 2972

Pid: 2975

PPid: 1

TracerPid: 0

Uid: 0 0 0 0

Gid: 0 0 0 0

Utrace: 0

FDSize: 256

Groups:

VmPeak: 2475100 kB

VmSize: 2276920 kB

VmLck: 0 kB

VmHWM: 1895132 kB

VmRSS: 1895128 kB

VmData: 2139812 kB

VmStk: 88 kB

VmExe: 2532 kB

VmLib: 16144 kB

VmPTE: 4284 kB

VmSwap: 0 kB

Threads: 2

SigQ: 1/55666

SigPnd: 0000000000000000

ShdPnd: 0000000000000000

SigBlk: ffffffde7ffbfebf

SigIgn: 0000000000001000

SigCgt: 0000002180006043

CapInh: 0000000000000000

CapPrm: fffffffc00000000

CapEff: fffffffc00000000

CapBnd: fffffffc00000000

Cpus_allowed: 04

Cpus_allowed_list: 2

Mems_allowed:

00000000,00000000,00000000,00000000,00000000,00000000,00000000,

00000000,00000000,00000000,00000000,00000000,00000000,00000000,

00000000,00000001

Mems_allowed_list: 0

voluntary_ctxt_switches: 5526311517

nonvoluntary_ctxt_switches: 586609665

One of the more interesting things to notice is Cpus_allowed_list, which is set to a value of 2. By default, Juniper pins the third CPU directly to the vjunos0 VM; this guarantees that tasks outside the scope of the control plane don't negatively impact Junos. The value is 2 rather than 3 because CPU numbering starts at 0. We can verify this again with another virsh command:

-sh-4.1# virsh vcpuinfo vjunos0

VCPU: 0

CPU: 2

State: running

CPU time: 311544.1s

CPU Affinity: --y-

We can see that the CPU affinity is set to y on the third CPU, which verifies what we see in the /proc file system.
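As a third cross-check, you can query the affinity of the qemu-kvm process directly from Linux using taskset and the PID we found earlier:

-sh-4.1# taskset -pc 2972

pid 2972's current affinity list: 2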

App Engine

If you’re interested in learning more about the VMs but don’t feel like dropping to the host shell and using virsh commands, there is an alternative called the Junos App Engine, which is accessible within the Junos CLI.

To view the App Engine settings, use the show app-engine command. There are several different views that are available, as listed in Table 2-1.

View             Description

ARP              View all of the ARP entries of the VMs connected to all of the bridge domains

Bridge           View all of the configured Linux bridge tables

Information      Get information about the compute cluster, such as model, kernel version, and management IP addresses

Netstat          A simple wrapper around the Linux netstat -rn command

Resource usage   Show the CPU, memory, disk, and storage usage statistics in an easy-to-read format

Table 2-1. Junos App Engine views

Let’s explore some of the most common Junos App Engine commands and examine the output:

dhanks@QFX5100> show app-engine arp

Compute cluster: default-cluster

Compute node: default-node

Arp

===

Address HWtype HWaddress Flags Mask Iface

192.168.1.2 ether 10:0e:7e:ad:af:30 C virbr0

This is just a simple summary show command that aggregates the management IP, MAC, and the bridge table to which it’s bound.
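Under the hood, this is essentially the host's ARP table filtered down to the VM-facing bridges; you can see the raw version from the host shell with the standard Linux arp command, and the entry matches the one above:

-sh-4.1# arp -n

Address HWtype HWaddress Flags Mask Iface

192.168.1.2 ether 10:0e:7e:ad:af:30 C virbr0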

Let’s take a look at the bridge tables:

dhanks@QFX5100> show app-engine bridge

Compute cluster: default-cluster

Compute node: default-node

Bridge Table

============

bridge name bridge id STP enabled interfaces

ctrlbr0 8000.fe5400e7b6cd no vnet1

virbr0 8000.100e7eadae03 yes virbr0-nic

vnet0

Just another nice wrapper for the Linux brctl command. Recall that vnet0 is for the regular control plane side of Junos, whereas vnet1 is reserved for inter-routing engine traffic during an ISSU:

dhanks@QFX5100> show app-engine resource-usage

Compute cluster: default-cluster

Compute node: default-node

CPU Usage

=========

15:48:46 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle

15:48:46 all 0.30 0.00 1.22 0.01 0.00 0.00 0.00 2.27 96.20

15:48:46 0 0.08 0.00 0.08 0.03 0.00 0.00 0.00 0.00 99.81

15:48:46 1 0.08 0.00 0.11 0.00 0.00 0.00 0.00 0.00 99.81

15:48:46 2 1.03 0.00 4.75 0.01 0.00 0.00 0.00 9.18 85.03

15:48:46 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00

Memory Usage

============

total used free shared buffers cached

Mem: 7098 3047 4051 0 258 743

Swap: 0 0 0

Disk Usage

==========

Filesystem Size Used Avail Use% Mounted on

tmpfs 3.5G 4.0K 3.5G 1% /dev/shm

/dev/mapper/vg0_vjunos-lv_var 11G 198M 9.7G 2% /var

/dev/mapper/vg0_vjunos-lv_junos 12G 2.2G 9.1G 20% /junos

/dev/mapper/vg0_vjunos-lv_junos_recovery 4.0G 976M 2.8G 26% /recovery

/dev/sda1 962M 312M 602M 35% /boot

Storage Information

===================

VG #PV #LV #SN Attr VSize VFree

vg0_vjunos 2 3 0 wz--n- 26.53g 0

show app-engine resource-usage is a nice aggregated command showing the utilization of the CPU, memory, disk, and storage information; it’s a very easy way to get a bird’s-eye view of the health of the App Engine.

ISSU

Ever since the original M Series routers, one of the great Junos features has been its ability to support ISSU. With ISSU, the network operating system can upgrade its software without having to shut down the router and impact production traffic. One of the key requirements for ISSU is that there are two routing engines. During an ISSU, the two routing engines need to synchronize kernel and control plane state with each other; the idea is that one routing engine is upgraded while the other handles the control plane.

Although Juniper QFX5100 switches don't physically have two routing engines, they are able to meet the same functional requirements thanks to the power of virtualization. During an ISSU, the Juniper QFX5100 creates a second VM running Junos to satisfy all of the synchronization requirements, as illustrated in Figure 2-3.

Each Junos VM has three management interfaces. Two of those interfaces, em0 and em1, are used for management and map to the external interfaces C0 and C1, respectively. The third management interface, em2, is used exclusively for communication between the two Junos VMs. For example, control plane protocols such as NSR, NSB, and GRES are required in order for a successful ISSU to complete; these protocols would communicate across the isolated em2 interface as well as an isolated ctrlbr0 bridge table in the Linux host.


Figure 2-3. The QFX5100 Linux KVM and management architecture

The backup Junos VM is only created and running during an ISSU. At a high level, Junos goes through the following steps during an ISSU:

§ The backup Junos VM is created and started.

§ The backup Junos VM is upgraded to the software version specified in the ISSU command.

§ The PFE goes into an ISSU-prepared state in which data is copied from the PFE to RAM.

§ The PFE connects to the recently upgraded backup Junos VM, which now becomes the master routing engine.

§ The PFE performs a warm reboot.

§ The new master Junos VM installs the PFE state from RAM back into the PFE.

§ The other Junos VM is shut down.

§ Junos has been upgraded and the PFE has performed a warm reboot.

Let’s see an ISSU in action:

dhanks@QFX5100> request system software in-service-upgrade flex-13.2X51-D20.2-domestic-signed.tgz

warning: Do NOT use /user during ISSU. Changes to /user during ISSU may get lost!

ISSU: Validating Image

error: 'Non Stop Routing' not configured

error: aborting ISSU

error: ISSU Aborted!

ISSU: IDLE

Ah, bummer! What happened here? There are some requirements for the control plane that must be enabled before a successful ISSU can be achieved:

§ NSR

§ NSB

§ GRES

§ Commit Synchronization

Let’s configure these quickly and try an ISSU once again.

{master:0}[edit]

dhanks@QFX5100# set chassis redundancy graceful-switchover

{master:0}[edit]

dhanks@QFX5100# set routing-options nonstop-routing

{master:0}[edit]

dhanks@QFX5100# set protocols layer2-control nonstop-bridging

{master:0}[edit]

dhanks@QFX5100# set system commit synchronize

{master:0}[edit]

dhanks@QFX5100# commit and-quit

configuration check succeeds

commit complete

Exiting configuration mode

OK, now that all of the software features required for ISSU are configured and committed, let’s try the ISSU one more time:

dhanks@QFX5100> request system software in-service-upgrade flex-13.2X51-D20.2-domestic-signed.tgz

warning: Do NOT use /user during ISSU. Changes to /user during ISSU may get lost!

ISSU: Validating Image

ISSU: Preparing Backup RE

Prepare for ISSU

ISSU: Backup RE Prepare Done

Extracting jinstall-qfx-5-flex-13.2X51-D20.2-domestic ...

Install jinstall-qfx-5-flex-13.2X51-D20.2-domestic completed

Spawning the backup RE

Spawn backup RE, index 1 successful

GRES in progress

GRES done in 0 seconds

Waiting for backup RE switchover ready

GRES operational

Copying home directories

Copying home directories successful

Initiating Chassis In-Service-Upgrade

Chassis ISSU Started

ISSU: Preparing Daemons

ISSU: Daemons Ready for ISSU

ISSU: Starting Upgrade for FRUs

ISSU: Preparing for Switchover

ISSU: Ready for Switchover

Checking In-Service-Upgrade status

Item Status Reason

FPC 0 Online

Send ISSU done to chassisd on backup RE

Chassis ISSU Completed

ISSU: IDLE

Initiate em0 device handoff

pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway

pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway

pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway

pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway

em0: bus=0, device=3, func=0, Ethernet address 10:0e:7e:b2:2d:78

hub 1-1:1.0: over-current change on port 1

hub 1-1:1.0: over-current change on port 3

hub 1-1:1.0: over-current change on port 5

QFX5100 (ttyd0)

login:

Excellent! The ISSU has completed successfully and no traffic was impacted during the software upgrade of Junos.
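It's worth confirming that the switch is now running the new code. A quick check; the output here is elided to the relevant line, and the exact format varies by release:

dhanks@QFX5100> show version

...

Junos: 13.2X51-D20.2

...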

One of the advantages of the Broadcom warm reboot feature is that no firmware needs to be upgraded in the PFE. This effectively makes ISSU a control plane–only problem, which is much easier to solve. When you need to synchronize both the PFE firmware and the control plane software, there are more moving parts, and the problem is more difficult to solve. Juniper MX Series by Douglas Richard Hanks, Jr. and Harry Reynolds (O'Reilly) thoroughly explains the benefits and drawbacks of ISSU on a platform that upgrades the PFE firmware in addition to the control plane. The end result is that a control plane–only ISSU is more stable and finishes much faster than on a platform such as the Juniper MX. However, the obvious drawback is that no new PFE features can be delivered as part of a control plane–only ISSU, which is where the Juniper MX would win.

Summary

This chapter walked you through the design of the control plane and how the Juniper QFX5100 is really just a server that thinks it's a switch. The Juniper QFX5100 has a powerful Intel CPU, standard memory, and SSD storage. Perhaps the biggest surprise is that the switch boots directly into Linux and uses KVM to virtualize Junos, the network operating system. Because Junos runs in a VM, the Juniper QFX5100 can support carrier-class features such as ISSU, NSR, and NSB.