Juniper QFX5100 Series (2015)
Chapter 2. Control Plane Virtualization
The key factors driving the Juniper QFX5100 are the advent of virtualization and cloud computing; however, there are many facets to virtualization. One is decoupling the service from the physical hardware: when this is combined with orchestration and automation, the service is said to be agile; it can be provisioned quickly, even within seconds. Another facet is scale in the number of instances of a service: because it becomes so easy to provision a service, the total number of instances quickly increases.
Compute virtualization is such a simple concept, yet it yields massive benefit to both the end user and operator. The next logical step is to apply the benefits of compute virtualization to the control plane of the network. After all, the control board is nothing but an x86 processor, memory, and storage.
The immediate benefit of virtualizing the control board might not be so obvious. Generally, operators like to toy around and create a virtual machine (VM) running Linux so that they’re able to execute operational scripts and troubleshoot. However, there is a much more exciting use case for virtualization of the control board. Traditionally, only chassis-based networking equipment was able to support two routing engines. Two routing engines increase the high availability of the chassis and allow the operator to upgrade the control plane software in real time without traffic loss, a feature commonly referred to as In-Service Software Upgrade (ISSU). One of the key requirements of ISSU is to have two routing engines that are synchronized using Nonstop Routing (NSR), Nonstop Bridging (NSB), and Graceful Routing Engine Switchover (GRES). Fixed networking equipment such as a top-of-rack (ToR) switch generally has only a single routing engine and therefore does not support ISSU. Taking advantage of virtualization allows a ToR switch to have two virtualized routing engines, making features such as ISSU possible. The Juniper QFX5100 family takes virtualization to heart: it uses the Linux Kernel-based Virtual Machine (KVM) hypervisor on the host operating system and places Junos, the network operating system, inside of a VM. When an operator wants to perform a real-time software upgrade, the Juniper QFX5100 switch provisions a second routing engine, synchronizes the data, and performs the ISSU without dropping traffic.
Another great benefit of compute virtualization inside of a switch is that you can create user-defined VMs and run your own applications and programs on the switch. Use cases include Network Functions Virtualization (NFV), network management, and statistical reporting.
Architecture
Recall that the Juniper QFX5100 series is split into two major components (see Figure 2-1): the control board and switch board. The control board is the foundation for the control plane, whereas the switch board is the foundation for the data plane.
Figure 2-1. QFX5100 architecture
Focusing on the control board components, the blocks shaded in gray represent all of the roles in that architecture that are responsible for virtualizing the control plane. The control board runs commodity hardware that’s easily compatible with common hypervisors. The processor is an Intel 1.5 GHz dual-core Sandy Bridge CPU, and there is 8 GB of memory and a 32 GB solid-state disk (SSD). The Juniper QFX5100 boots directly into CentOS Linux instead of Junos; this provides the platform with several advantages. The first is the ability to virtualize the underlying hardware by using Linux KVM and QEMU; the second is the ability to host operational daemons and Application Programming Interfaces (APIs) directly on the host operating system.
To make the management of the hypervisor easier, the virtualization library libvirt is used to provision and manage the VMs. libvirt provides a normalized management framework across a set of hypervisors. The ability to use a common framework to control a hypervisor provides more flexibility in the future if any of the underlying components happen to change.
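For readers unfamiliar with libvirt, each VM is defined by a domain XML document that the library provisions and manages. The sketch below is a minimal, hypothetical domain definition (the element values are illustrative, loosely modeled on the Junos VM examined later in this chapter) inspected with Python’s standard XML parser:

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical libvirt domain definition. The name, memory
# size, and bridge names are illustrative, not dumped from a real switch.
domain_xml = """
<domain type='kvm'>
  <name>vjunos0</name>
  <memory unit='KiB'>2000896</memory>
  <vcpu>1</vcpu>
  <devices>
    <interface type='bridge'>
      <source bridge='virbr0'/>
      <model type='e1000'/>
    </interface>
    <interface type='bridge'>
      <source bridge='ctrlbr0'/>
      <model type='e1000'/>
    </interface>
  </devices>
</domain>
"""

root = ET.fromstring(domain_xml)
name = root.findtext('name')
memory_kib = int(root.findtext('memory'))
# Collect the bridge each virtual interface attaches to.
bridges = [i.find('source').get('bridge') for i in root.iter('interface')]

print(name, memory_kib, bridges)
```

On the switch itself, `virsh dumpxml` against a domain would emit the real document in this same format.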
Host Operating System
As mentioned in the previous section, the Juniper QFX5100 boots directly into Linux, specifically CentOS. This provides the operating system and virtualization foundation for Junos and all other network-related functionality.
Let’s log in to the host operating system and do some exploring:
dhanks@qfx5100> request app-engine host-shell
Last login: Sun Nov 17 14:30:47 from 192.168.1.2
--- Host 13.2I20131114_1603_vsdk_build_30 built 2013-11-14 16:03:50 UTC
Now, let’s take a peek at the PCI bus and see what’s installed on the host operating system:
-sh-4.1# lspci
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller
(rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI
Express Root Port (rev 09)
00:01.1 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI
Express Root Port (rev 09)
00:01.2 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI
Express Root Port (rev 09)
00:06.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core Processor Family PCI
Express Root Port (rev 09)
00:1c.0 PCI bridge: Intel Corporation DH89xxCC PCI Express Root Port #1 (rev 08)
00:1c.1 PCI bridge: Intel Corporation DH89xxCC PCI Express Root Port #2 (rev 08)
00:1c.2 PCI bridge: Intel Corporation DH89xxCC PCI Express Root Port #3 (rev 08)
00:1c.3 PCI bridge: Intel Corporation DH89xxCC PCI Express Root Port #4 (rev 08)
00:1d.0 USB controller: Intel Corporation DH89xxCC USB2 Enhanced Host Controller #1 (rev 08)
00:1f.0 ISA bridge: Intel Corporation DH89xxCC LPC Controller (rev 08)
00:1f.2 SATA controller: Intel Corporation DH89xxCC 4 Port SATA AHCI Controller (rev 08)
00:1f.3 SMBus: Intel Corporation DH89xxCC SMBus Controller (rev 08)
00:1f.7 System peripheral: Intel Corporation DH89xxCC Watchdog Timer (rev 08)
01:00.0 Co-processor: Intel Corporation Device 0434 (rev 21)
01:00.1 Ethernet controller: Intel Corporation DH8900CC Series Gigabit Network (rev 21)
01:00.2 Ethernet controller: Intel Corporation DH8900CC Series Gigabit Network (rev 21)
01:00.3 Ethernet controller: Intel Corporation DH8900CC Series Gigabit Network (rev 21)
01:00.4 Ethernet controller: Intel Corporation DH8900CC Series Gigabit Network (rev 21)
07:00.0 Unassigned class [ff00]: Juniper Networks Device 0062 (rev 01)
08:00.0 Unassigned class [ff00]: Juniper Networks Device 0063 (rev 01)
09:00.0 Ethernet controller: Broadcom Corporation Device b854 (rev 02)
Pretty vanilla so far: a host bridge, a set of PCI Express root ports, a USB controller, a SATA controller, and some network interface controllers (NICs). But the two Juniper Networks devices are interesting; what are they? These are the FPGA controllers that are responsible for the chassis fan, sensors, and other environmental functions.
The final device is the Broadcom 56850 chipset. The way a network operating system controls the Packet Forwarding Engine (PFE) is simply through a PCI interface by using a Software Development Kit (SDK).
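If you want to pull the interesting entries out of a listing like this programmatically, a few lines of script will do. Here’s a sketch in Python (the helper function is my own, operating on an abbreviated copy of the lspci text above):

```python
# Abbreviated from the full lspci listing above.
lspci_output = """\
00:1d.0 USB controller: Intel Corporation DH89xxCC USB2 Enhanced Host Controller #1 (rev 08)
07:00.0 Unassigned class [ff00]: Juniper Networks Device 0062 (rev 01)
08:00.0 Unassigned class [ff00]: Juniper Networks Device 0063 (rev 01)
09:00.0 Ethernet controller: Broadcom Corporation Device b854 (rev 02)
"""

def devices_by_vendor(text, vendor):
    """Return (bus address, description) pairs whose description mentions vendor."""
    hits = []
    for line in text.splitlines():
        addr, _, desc = line.partition(' ')
        if vendor in desc:
            hits.append((addr, desc))
    return hits

fpgas = devices_by_vendor(lspci_output, 'Juniper Networks')  # the two FPGAs
pfe = devices_by_vendor(lspci_output, 'Broadcom')            # the switching ASIC

print(fpgas, pfe)
```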
Let’s take a closer look at the CPU:
-sh-4.1# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Pentium(R) CPU @ 1.50GHz
stepping : 7
cpu MHz : 1500.069
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm
constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni
pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1
sse4_2 x2apic popcnt aes xsave avx lahf_lm arat epb xsaveopt pln pts dts tpr_shadow
vnmi flexpriority ept vpid
bogomips : 3000.13
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Pentium(R) CPU @ 1.50GHz
stepping : 7
cpu MHz : 1500.069
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm
constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni
pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1
sse4_2 x2apic popcnt aes xsave avx lahf_lm arat epb xsaveopt pln pts dts tpr_shadow
vnmi flexpriority ept vpid
bogomips : 3000.13
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Pentium(R) CPU @ 1.50GHz
stepping : 7
cpu MHz : 1500.069
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm
constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni
pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1
sse4_2 x2apic popcnt aes xsave avx lahf_lm arat epb xsaveopt pln pts dts tpr_shadow
vnmi flexpriority ept vpid
bogomips : 3000.13
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Pentium(R) CPU @ 1.50GHz
stepping : 7
cpu MHz : 1500.069
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm
constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni
pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1
sse4_2 x2apic popcnt aes xsave avx lahf_lm arat epb xsaveopt pln pts dts tpr_shadow
vnmi flexpriority ept vpid
bogomips : 3000.13
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
Although the model name reads Pentium, the lspci output identifies the Xeon E3-1200 family platform; either way, it’s a single socket with two physical cores and Hyper-Threading, presenting four logical processors (note cpu cores: 2 and siblings: 4). There’s plenty of power to operate multiple VMs and the network operating system.
Now, let’s move on to the memory:
-sh-4.1# free
total used free shared buffers cached
Mem: 7529184 3135536 4393648 0 158820 746800
-/+ buffers/cache: 2229916 5299268
Swap:
After some of the memory has been reserved by the hardware and the kernel, you can see that we have about 7.2 GiB in total.
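The arithmetic behind these numbers is easy to verify. Here’s a quick sketch in Python using the values from the free output above:

```python
# Values from the `free` output above, in kB.
total_kb = 7529184
used_kb = 3135536
buffers_kb = 158820
cached_kb = 746800

# Total memory visible to the kernel, expressed in GiB.
total_gib = total_kb / (1024 * 1024)

# "used" minus buffers and cache is the memory actually claimed by
# processes; it should match the -/+ buffers/cache line.
real_used_kb = used_kb - buffers_kb - cached_kb

print(round(total_gib, 2), real_used_kb)
```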
Next, let’s see how many disks there are and how they’re partitioned:
-sh-4.1# fdisk -l
Disk /dev/sdb: 16.0 GB, 16013852672 bytes
255 heads, 63 sectors/track, 1946 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000dea11
Device Boot Start End Blocks Id System
/dev/sdb1 * 1 125 1000000 83 Linux
Partition 1 does not end on cylinder boundary.
/dev/sdb2 125 1857 13914062+ 83 Linux
Disk /dev/sda: 16.0 GB, 16013852672 bytes
255 heads, 63 sectors/track, 1946 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000d8b25
Device Boot Start End Blocks Id System
/dev/sda1 * 1 125 1000000 83 Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2 125 1857 13914062+ 83 Linux
Disk /dev/mapper/vg0_vjunos-lv_junos_recovery: 4294 MB, 4294967296 bytes
255 heads, 63 sectors/track, 522 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/mapper/vg0_vjunos-lv_var: 11.3 GB, 11307843584 bytes
255 heads, 63 sectors/track, 1374 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/mapper/vg0_vjunos-lv_junos: 12.9 GB, 12884901888 bytes
255 heads, 63 sectors/track, 1566 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
The host system has two SSD storage devices, each with 16 GB of capacity. From the partition layout illustrated in Figure 2-2, you can see that we’re running the Linux Volume Manager (LVM).
Figure 2-2. Linux LVM and storage design
There are two 16 GB SSDs, which are part of the Linux LVM. The primary volume group is vg0_vjunos. This volume group has three volumes that are used by Junos:
§ lv_junos_recovery
§ lv_var
§ lv_junos
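As a quick sanity check, these three logical volumes account for the entire volume group: their byte counts from the fdisk output sum to the 26.53 GiB that the storage view reports for vg0_vjunos later in this chapter.

```python
# Byte counts taken from the fdisk -l output above.
lv_junos_recovery = 4294967296   # 4 GiB recovery volume
lv_var = 11307843584             # /var volume
lv_junos = 12884901888           # Junos volume

total_bytes = lv_junos_recovery + lv_var + lv_junos
total_gib = total_bytes / 1024**3

print(round(total_gib, 2))  # → 26.53
```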
Linux KVM
When the Juniper QFX5100 boots up, the host operating system is Linux. All of the control plane operations happen within the network operating system, Junos. The Juniper QFX5100 takes advantage of compute virtualization in the host operating system by using Linux KVM. A VM is created specifically for Junos. Given that KVM can create multiple VMs, the Juniper QFX5100 series has the ability to perform ISSU and support third-party VMs that can host additional services such as network management and monitoring.
virsh
The Juniper QFX5100 uses the libvirt library as well as the virsh management user interface to interact with Linux KVM. If you’re familiar with libvirt, walking around the virtualization capabilities of the Juniper QFX5100 will come as second nature. If you aren’t familiar with libvirt, let’s use virsh to explore and see what’s happening under the hood.
The first thing we need to do is drop into the host shell from the Junos CLI:
dhanks@qfx5100> request app-engine host-shell
Last login: Sun Nov 17 14:30:47 from 192.168.1.2
--- Host 13.2I20131114_1603_vsdk_build_30 built 2013-11-14 16:03:50 UTC
Now, let’s take a look at the VMs installed in the Linux KVM:
-sh-4.1# virsh list --all
Id Name State
----------------------------------------------------
1 vjunos0 running
By default there’s a single VM running the Junos network operating system. The VM’s name is vjunos0 with an ID of 1, and we can see that its state is running.
Hmm. Are you curious as to what version of the libvirt library and QEMU the Juniper QFX5100 is using? Let’s find out:
-sh-4.1# virsh version
Compiled against library: libvir 0.9.10
Using library: libvir 0.9.10
Using API: QEMU 0.9.10
Running hypervisor: QEMU 0.12.1
At this point, let’s take a look at the overall host memory and CPU statistics:
-sh-4.1# virsh nodememstats
total : 7269088 kB
free : 4147596 kB
buffers: 264772 kB
cached : 761476 kB
-sh-4.1#
-sh-4.1# virsh nodecpustats
user: 305995340000000
system: 145678380000000
idle: 11460475070000000
iowait: 1075190000000
Now that we’re familiar with the host system’s capabilities, its software versions, and how many VMs are configured, let’s examine the Junos VM:
-sh-4.1# virsh dominfo vjunos0
Id: 1
Name: vjunos0
UUID: 100e7ead-ae00-0140-0000-564a554e4f53
OS Type: hvm
State: running
CPU(s): 1
CPU time: 445895.2s
Max memory: 2000896 kB
Used memory: 2000896 kB
Persistent: no
Autostart: disable
Managed save: no
Each VM has a unique identifier (UUID) that can be used to refer to the VM. One of the more interesting attributes is the OS Type, which is set to hvm; this stands for Hardware Virtual Machine and indicates a fully virtualized guest running with hardware-assisted virtualization. Because Junos is based on FreeBSD but heavily modified to support network control plane functions, it’s difficult to say that it’s pure FreeBSD. Instead, the vendor-neutral OS type of hvm is used, which basically means that it’s an x86-based operating system.
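Incidentally, the UUID isn’t entirely random: the last six octets of 100e7ead-ae00-0140-0000-564a554e4f53 are printable ASCII. A quick decode shows what’s embedded there:

```python
uuid = '100e7ead-ae00-0140-0000-564a554e4f53'

# Take the last field of the UUID and interpret it as raw bytes.
tail = bytes.fromhex(uuid.split('-')[-1])

print(tail.decode('ascii'))  # → VJUNOS
```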
Let’s focus on the memory and network settings for vjunos0:
-sh-4.1# virsh dommemstat vjunos0
rss 1895128
-sh-4.1# virsh domiflist vjunos0
Interface Type Source Model MAC
-------------------------------------------------------
vnet0 bridge virbr0 e1000 52:54:00:bf:d1:6c
vnet1 bridge ctrlbr0 e1000 52:54:00:e7:b6:cd
In the 13.2X53D20 version of Junos, there are two bridges installed for the VMs within KVM. The vnet0/virbr0 interface is used across all of the VMs to communicate with the outside world through their management interfaces. The other interface, vnet1/ctrlbr0, is used exclusively for ISSU. During an ISSU, there are two copies of Junos running; all control plane communication between the VMs is performed over this special bridge so that other control plane functions such as Secure Shell (SSH), Open Shortest Path First (OSPF), and Border Gateway Protocol (BGP) aren’t impacted while the kernel state is synchronized between the master and backup Junos VMs.
Another interesting place to look for more information is in the /proc filesystem. We can take a look at the process ID (PID) of vjunos0 and examine the task status:
-sh-4.1# cat /var/run/libvirt/qemu/vjunos0.pid
2972
-sh-4.1# cat /proc/2972/task/*/status
Name: qemu-kvm
State: S (sleeping)
Tgid: 2972
Pid: 2972
PPid: 1
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
Utrace: 0
FDSize: 256
Groups:
VmPeak: 2475100 kB
VmSize: 2276920 kB
VmLck: 0 kB
VmHWM: 1895132 kB
VmRSS: 1895128 kB
VmData: 2139812 kB
VmStk: 88 kB
VmExe: 2532 kB
VmLib: 16144 kB
VmPTE: 4284 kB
VmSwap: 0 kB
Threads: 2
SigQ: 1/55666
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000010002840
SigIgn: 0000000000001000
SigCgt: 0000002180006043
CapInh: 0000000000000000
CapPrm: fffffffc00000000
CapEff: fffffffc00000000
CapBnd: fffffffc00000000
Cpus_allowed: 04
Cpus_allowed_list: 2
Mems_allowed:
00000000,00000000,00000000,00000000,00000000,00000000,00000000,
00000000,00000000,00000000,00000000,00000000,00000000,00000000,
00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 5825006750
nonvoluntary_ctxt_switches: 46300
Name: qemu-kvm
State: S (sleeping)
Tgid: 2972
Pid: 2975
PPid: 1
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
Utrace: 0
FDSize: 256
Groups:
VmPeak: 2475100 kB
VmSize: 2276920 kB
VmLck: 0 kB
VmHWM: 1895132 kB
VmRSS: 1895128 kB
VmData: 2139812 kB
VmStk: 88 kB
VmExe: 2532 kB
VmLib: 16144 kB
VmPTE: 4284 kB
VmSwap: 0 kB
Threads: 2
SigQ: 1/55666
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: ffffffde7ffbfebf
SigIgn: 0000000000001000
SigCgt: 0000002180006043
CapInh: 0000000000000000
CapPrm: fffffffc00000000
CapEff: fffffffc00000000
CapBnd: fffffffc00000000
Cpus_allowed: 04
Cpus_allowed_list: 2
Mems_allowed:
00000000,00000000,00000000,00000000,00000000,00000000,00000000,
00000000,00000000,00000000,00000000,00000000,00000000,00000000,
00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 5526311517
nonvoluntary_ctxt_switches: 586609665
One of the more interesting things to notice is the Cpus_allowed_list, which is set to a value of 2. By default, Juniper pins the vjunos0 VM to the third CPU; this guarantees that tasks outside the scope of the control plane don’t negatively impact Junos. The value is 2 rather than 3 because CPU numbering starts at 0. We can verify this again with another virsh command:
-sh-4.1# virsh vcpuinfo vjunos0
VCPU: 0
CPU: 2
State: running
CPU time: 311544.1s
CPU Affinity: --y-
We can see that the CPU affinity is set to y on the third CPU, which verifies what we see in the /proc file system.
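If you’re curious how Cpus_allowed (04) becomes Cpus_allowed_list (2), the value is a hexadecimal bitmask with one bit per logical CPU. Here’s a small decoding sketch (the helper function is my own, not part of any Juniper tooling):

```python
def cpus_allowed_list(mask_hex):
    """Expand a /proc Cpus_allowed hex mask into a list of CPU numbers."""
    mask = int(mask_hex.replace(',', ''), 16)  # commas separate 32-bit words
    cpus = []
    n = 0
    while mask:
        if mask & 1:          # bit n set means CPU n is allowed
            cpus.append(n)
        mask >>= 1
        n += 1
    return cpus

print(cpus_allowed_list('04'))  # the mask seen for vjunos0 → [2]
```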
App Engine
If you’re interested in learning more about the VMs but don’t feel like dropping to the host shell and using virsh commands, there is an alternative called the Junos App Engine, which is accessible within the Junos CLI.
To view the App Engine settings, use the show app-engine command. There are several different views that are available, as listed in Table 2-1.
View            Description
ARP             View all of the ARP entries of the VMs connected into all the bridge domains
Bridge          View all of the configured Linux bridge tables
Information     Get information about the compute cluster, such as model, kernel version, and management IP addresses
Netstat         A simple wrapper around the Linux netstat -rn command
Resource usage  Show the CPU, memory, disk, and storage usage statistics in an easy-to-read format
Table 2-1. Junos App Engine views
Let’s explore some of the most common Junos App Engine commands and examine the output:
dhanks@QFX5100> show app-engine arp
Compute cluster: default-cluster
Compute node: default-node
Arp
===
Address HWtype HWaddress Flags Mask Iface
192.168.1.2 ether 10:0e:7e:ad:af:30 C virbr0
This is just a simple summary show command that aggregates the management IP, MAC, and the bridge table to which it’s bound.
Let’s take a look at the bridge tables:
dhanks@QFX5100> show app-engine bridge
Compute cluster: default-cluster
Compute node: default-node
Bridge Table
============
bridge name bridge id STP enabled interfaces
ctrlbr0 8000.fe5400e7b6cd no vnet1
virbr0 8000.100e7eadae03 yes virbr0-nic
vnet0
Just another nice wrapper for the Linux brctl command. Recall that vnet0 is for the regular control plane side of Junos, whereas vnet1 is reserved for inter-routing engine traffic during an ISSU:
dhanks@QFX5100> show app-engine resource-usage
Compute cluster: default-cluster
Compute node: default-node
CPU Usage
=========
15:48:46 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
15:48:46 all 0.30 0.00 1.22 0.01 0.00 0.00 0.00 2.27 96.20
15:48:46 0 0.08 0.00 0.08 0.03 0.00 0.00 0.00 0.00 99.81
15:48:46 1 0.08 0.00 0.11 0.00 0.00 0.00 0.00 0.00 99.81
15:48:46 2 1.03 0.00 4.75 0.01 0.00 0.00 0.00 9.18 85.03
15:48:46 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
Memory Usage
============
total used free shared buffers cached
Mem: 7098 3047 4051 0 258 743
Swap: 0 0 0
Disk Usage
==========
Filesystem Size Used Avail Use% Mounted on
tmpfs 3.5G 4.0K 3.5G 1% /dev/shm
/dev/mapper/vg0_vjunos-lv_var
11G 198M 9.7G 2% /var
/dev/mapper/vg0_vjun
os-lv_junos
12G 2.2G 9.1G 20% /junos
/dev/mapper/vg0_vjunos-lv_junos_recovery
4.0G 976M 2.8G 26% /recovery
/dev/sda1 962M 312M 602M 35% /boot
Storage Information
===================
VG #PV #LV #SN Attr VSize VFree
vg0_vjunos 2 3 0 wz--n- 26.53g 0
show app-engine resource-usage is a nice aggregated command showing the utilization of the CPU, memory, disk, and storage information; it’s a very easy way to get a bird’s-eye view of the health of the App Engine.
ISSU
Ever since the original M Series routers, one of the great Junos features has been its ability to support ISSU. With ISSU, the network operating system can be upgraded without shutting down the router and impacting production traffic. One of the key requirements for ISSU is that there are two routing engines. During an ISSU, the two routing engines need to synchronize kernel and control plane state with each other; one routing engine is upgraded while the other handles the control plane.
Although Juniper QFX5100 switches don’t physically have two routing engines, they are able to carry out the same functional requirements thanks to the power of virtualization. The Juniper QFX5100 series is able to create a second VM running Junos during an ISSU to meet all of the synchronization requirements, as is illustrated in Figure 2-3.
Each Junos VM has three management interfaces. Two of those interfaces, em0 and em1, are used for management and map to the external interfaces C0 and C1, respectively. The third management interface, em2, is used exclusively for communication between the two Junos VMs. For example, control plane protocols such as NSR, NSB, and GRES are required in order for a successful ISSU to complete; these protocols would communicate across the isolated em2 interface as well as an isolated ctrlbr0 bridge table in the Linux host.
Figure 2-3. The QFX5100 Linux KVM and management architecture
The backup Junos VM is only created and running during an ISSU. At a high level, Junos goes through the following steps during an ISSU:
§ The backup Junos VM is created and started.
§ The backup Junos VM is upgraded to the software version specified in the ISSU command.
§ The PFE goes into an ISSU-prepared state in which data is copied from the PFE to RAM.
§ The PFE connects to the recently upgraded backup Junos VM, which now becomes the master routing engine.
§ The PFE performs a warm reboot.
§ The new master Junos VM installs the PFE state from RAM back into the PFE.
§ The other Junos VM is shut down.
§ Junos has been upgraded and the PFE has performed a warm reboot.
Let’s see an ISSU in action:
dhanks@QFX5100> request system software in-service-upgrade flex-13.2X51-D20.2-
domestic-signed.tgz
warning: Do NOT use /user during ISSU. Changes to /user during ISSU may get lost!
ISSU: Validating Image
error: 'Non Stop Routing' not configured
error: aborting ISSU
error: ISSU Aborted!
ISSU: IDLE
Ah, bummer! What happened here? There are some requirements for the control plane that must be enabled before a successful ISSU can be achieved:
§ NSR
§ NSB
§ GRES
§ Commit Synchronization
Let’s configure these quickly and try an ISSU once again.
{master:0}[edit]
dhanks@QFX5100# set chassis redundancy graceful-switchover
{master:0}[edit]
dhanks@QFX5100# set routing-options nonstop-routing
{master:0}[edit]
dhanks@QFX5100# set protocols layer2-control nonstop-bridging
{master:0}[edit]
dhanks@QFX5100# set system commit synchronize
{master:0}[edit]
dhanks@QFX5100# commit and-quit
configuration check succeeds
commit complete
Exiting configuration mode
OK, now that all of the software features required for ISSU are configured and committed, let’s try the ISSU one more time:
dhanks@QFX5100> request system software in-service-upgrade flex-13.2X51-D20.2-
domestic-signed.tgz
warning: Do NOT use /user during ISSU. Changes to /user during ISSU may get lost!
ISSU: Validating Image
ISSU: Preparing Backup RE
Prepare for ISSU
ISSU: Backup RE Prepare Done
Extracting jinstall-qfx-5-flex-13.2X51-D20.2-domestic ...
Install jinstall-qfx-5-flex-13.2X51-D20.2-domestic completed
Spawning the backup RE
Spawn backup RE, index 1 successful
GRES in progress
GRES done in 0 seconds
Waiting for backup RE switchover ready
GRES operational
Copying home directories
Copying home directories successful
Initiating Chassis In-Service-Upgrade
Chassis ISSU Started
ISSU: Preparing Daemons
ISSU: Daemons Ready for ISSU
ISSU: Starting Upgrade for FRUs
ISSU: Preparing for Switchover
ISSU: Ready for Switchover
Checking In-Service-Upgrade status
Item Status Reason
FPC 0 Online
Send ISSU done to chassisd on backup RE
Chassis ISSU Completed
ISSU: IDLE
Initiate em0 device handoff
pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway
pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway
pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway
pci-stub 0000:01:00.1: transaction is not cleared; proceeding with reset anyway
em0: bus=0, device=3, func=0, Ethernet address 10:0e:7e:b2:2d:78
hub 1-1:1.0: over-current change on port 1
hub 1-1:1.0: over-current change on port 3
hub 1-1:1.0: over-current change on port 5
QFX5100 (ttyd0)
login:
Excellent! The ISSU has completed successfully and no traffic was impacted during the software upgrade of Junos.
One of the advantages of the Broadcom warm reboot feature is that no firmware is installed in the PFE. This effectively makes the ISSU problem a control plane–only problem, which is very easy to solve. When you need to synchronize both the PFE firmware and control plane firmware, there are more moving parts, and the problem is more difficult to solve. Juniper MX Series by Douglas Richard Hanks, Jr. and Harry Reynolds (O’Reilly) thoroughly explains all of the benefits and drawbacks of ISSU in such a platform that upgrades both the control plane firmware in addition to the PFE firmware. The end result is that a control plane–only ISSU is more stable and finishes much faster when compared to a platform such as the Juniper MX. However, the obvious drawback is that no new PFE features can be used as part of a control plane–only ISSU, which is where the Juniper MX would win.
Summary
This chapter walked you through the design of the control plane and how the Juniper QFX5100 is really just a server that thinks it’s a switch. The Juniper QFX5100 has a powerful Intel CPU, standard memory, and SSD storage. What may be surprising is that the switch boots directly into Linux and uses KVM to virtualize Junos, the network operating system. Because Junos runs in a VM, the Juniper QFX5100 can support carrier-class features such as ISSU, NSR, and NSB.