Juniper QFX5100 Series (2015)
Chapter 3. Performance and Scaling
One of the more challenging tasks of a network architect is to ensure that a design put forth meets the end-to-end solution requirements. The first step is identifying all of the roles in an architecture; this could be as simple as defining the edge, core, aggregation, and access tiers in the network. Each role has a specific set of responsibilities in terms of functionality and requirements. To map a product to a role in an architecture, the product must meet or exceed the requirements and functionality required by each role for which it’s being considered. Thus, building an end-to-end solution architecture is a bit like a long chain: it’s only as strong as the weakest link.
The most common method for ascertaining a product's capabilities, performance, and scale is through datasheets or the vendor's account team. However, the best method is actually testing by going through a proof of concept or certification cycle. This requires that you build out all of the roles and products in the architecture and measure the end-to-end results; this method quickly flushes out any issues before moving into procurement and production.
This chapter will walk through all of the performance and scaling considerations required to successfully map a product into a specific role in an end-to-end architecture. Attributes such as MAC addresses, host entries, and IPv4 prefixes will be clearly spelled out. Armed with this data, you will be able to easily map Juniper QFX5100 series switches into many different roles in your existing network.
Design Considerations
Before jumping head first into performance and scaling requirements, a good network architect needs to make a list of design considerations. Each one places an additional tax on the network that is outside the scope of traditional performance and scaling requirements.
Overlay Architecture
One of the first design questions that you need to consider when planning a next-generation network is whether you need to centrally orchestrate all resources in the data center so that applications can be deployed within seconds. The follow-up question is whether you currently virtualize your data center compute and storage with hypervisors and cloud management platforms. If the answer to both questions is yes, you must consider an overlay architecture for the data center network.
Given that compute and storage have already been virtualized, the next step is to virtualize the network. By using an overlay architecture in the network, you can decouple the network from the physical hardware, which is one of the primary tenets of virtualization. Decoupling the network from the physical hardware allows the network to be programmatically provisioned within seconds. As of this writing, two great examples of products that support overlay architectures are Juniper Contrail and VMware NSX.
Moving to a new network architecture places a different "network tax" on the data center. Traditionally, when servers and virtual machines (VMs) are connected to a network, they each consume a MAC address and host route entry in the network. In an overlay architecture, however, only the virtual tunnel end points (VTEPs) consume a MAC address and host route entry in the network. All VM traffic is now encapsulated between VTEPs, and the MAC address and host route of each VM aren't visible to the underlying networking equipment. The MAC address and host route scale has thus moved from the physical network hardware to the hypervisor.
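To put this shift in perspective, here's a minimal back-of-the-envelope sketch in Python. The server and VM counts are hypothetical, chosen only to illustrate how the fabric's MAC and host-route load collapses to the number of VTEPs:

# Hypothetical data center: 500 hypervisors, each hosting 40 VMs.
hypervisors = 500
vms_per_hypervisor = 40

# Traditional design: every VM consumes a MAC address and host route
# entry in the physical network.
traditional_entries = hypervisors * vms_per_hypervisor  # 20,000

# Overlay design: only the VTEP of each hypervisor is visible
# to the underlying network.
overlay_entries = hypervisors  # 500

print(f"Traditional fabric entries: {traditional_entries}")
print(f"Overlay fabric entries:     {overlay_entries}")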
Bare-metal servers
It’s rare to find a data center that has virtualized 100 percent of its compute resources. There’s always a subset of servers that you cannot virtualize due to performance, compliance, or any number of other reasons. This raises an interesting question: if 80 percent of the servers in the data center are virtualized and take advantage of an overlay architecture, how do you provide connectivity to the other 20 percent of physical servers?
Overlay architectures support several mechanisms to provide connectivity to physical servers. The most common option is to embed a VTEP into the physical access switch, as demonstrated in Figure 3-1.
Figure 3-1. Virtual to physical data flow in an overlay architecture
In Figure 3-1, each server on the left and right of the IP Fabric has been virtualized with a hypervisor. Each hypervisor has a VTEP within it that handles the encapsulation of data plane traffic between VMs. Each VTEP also handles MAC address learning, provisioning of new virtual networks, and other configuration changes. The server on top of the IP Fabric is a simple physical server but doesn’t have any VTEP capabilities of its own. For the physical server to participate in the overlay architecture, it needs something to encapsulate the data plane traffic and perform MAC address learning. Being able to handle the VTEP role inside of an access switch simplifies the overlay architecture. Now, each access switch that has physical servers connected to it can simply perform the overlay encapsulation and control plane on behalf of the physical server. From the point of view of the physical server, it simply sends traffic into the network without having to worry about anything else.
The Juniper QFX5100 series supports full overlay integration with both Juniper Contrail and VMware NSX in the data plane and control plane. The use case isn't limited to bare-metal servers, either; another is injecting physical network services, such as load balancers or firewalls, into an overlay architecture.
Juniper Architectures versus Open Architectures
The other common design decision is to weigh the benefits of Juniper architectures against open architectures. The benefit of a Juniper architecture is that it has been designed specifically to enable turnkey functionality; the downside is that it requires a certain set of products to operate. On the other side are open architectures. The benefit of an open architecture is that it can be supported across multiple vendors; the downside is that you might lose some capabilities that are only available in the Juniper architectures.
Generally, it boils down to the size of the network. If you know that your network will never grow past a certain size and you're procuring all of the hardware up front, the benefits of a Juniper architecture might simply outweigh those of an open architecture, because there isn't a need to support multiple vendors. Another scenario is that your network is large enough that you can't build it all at once and you want a pay-as-you-grow option over the next five years. A logical option would be to implement open architectures so that you aren't limited in your options as you build out the network. Yet another approach is a hybrid: build out the network in points of delivery (PODs), where each POD can take advantage of Juniper architectures or not.
Each business and network is going to have any number of external forces that weigh on the decision between Juniper architectures and open architectures, and more often than not, these decisions change over time. Unless you know 100 percent of these nuances up front, it's important to select a networking platform that offers both.
The Juniper QFX5100 series offers the best of both worlds. It supports open architectures equally as well as Juniper architectures, as is summarized here:
Juniper Architectures
The Juniper QFX5100 family is able to participate in a Juniper QFabric architecture as a node. You can also use them to build a Virtual Chassis Fabric (VCF) or a traditional Virtual Chassis. In summary, these Juniper architectures give you the ability to build a plug-and-play Ethernet fabric with a single point of management and support converged storage.
Open Architectures
Juniper QFX5100 switches support Multi-Chassis Link Aggregation (MC-LAG) so that downstream devices can simply use IEEE 802.1AX/LACP to connect and transport data. The Juniper QFX5100 series also supports a wide range of open protocols, such as Border Gateway Protocol (BGP), Open Shortest Path First (OSPF), Intermediate System to Intermediate System (IS-IS), and a suite of Multiprotocol Label Switching (MPLS) technologies.
The Juniper QFX5100 makes a great choice no matter where you place it in your network. You could choose to deploy an open architecture today, and change to a Juniper architecture in the future. One of the best tools in creating a winning strategy is to keep the number of options high.
Over-subscription
There are several different types of chipsets in the Broadcom Trident II family. Each chipset has different performance and over-subscription values. Table 3-1 lists them for you.
| Broadcom chipset | I/O bandwidth | Core bandwidth | Over-subscription ratio |
| --- | --- | --- | --- |
| Trident II: option 1 | 1,280 Gbps | 960 Gbps | 4:3 |
| Trident II: option 2 | 1,280 Gbps | 720 Gbps | 16:9 |
| Trident II: option 3 | 960 Gbps | 960 Gbps | 1:1 |
| Trident II: option 4 | 720 Gbps | 720 Gbps | 1:1 |

Table 3-1. Broadcom Trident II family bandwidth and over-subscription options
All of the Juniper QFX5100 platforms have been designed around Broadcom Trident II option 1, which is the BCM56850 chipset. Out of all of the options available, this chipset represents the most I/O and core bandwidth available. To fully understand the implications of the 4:3 over-subscription, let’s take a closer look at the chipset’s architecture.
Architecture
The BCM56850 is divided into four groups (see Figure 3-2). Each group supports 25% of the available core bandwidth, which in the case of the BCM56850 is 960 Gbps; thus, each group supports 240 Gbps in the core. Each group also has a set of eight cores that are responsible for processing traffic. Each core can handle 40 Gbps of traffic, and because each group has eight cores, the total amount of I/O bandwidth each group can support is 320 Gbps.
Figure 3-2. Block diagram of the BCM56850 chipset
In summary, each group supports 240 Gbps of core bandwidth and 320 Gbps of I/O bandwidth via the eight cores. Simplifying the ratio 320:240 results in the 4:3 over-subscription, as stipulated earlier in Table 3-1.
Figure 3-3. Flow visualization of I/O and core bandwidth
The net result of over-subscribing the core bandwidth relative to the I/O bandwidth is that packets of certain sizes will be dropped when all of the ports in the switch are running at line rate. Details of the effects of over-subscription are discussed in “Performance” later in this chapter.
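The arithmetic behind the 4:3 figure is easy to verify. Here's a small Python sketch that derives the simplified ratio from the group numbers just described:

from math import gcd

cores_per_group = 8
gbps_per_core = 40
io_per_group = cores_per_group * gbps_per_core  # 320 Gbps of I/O per group
core_per_group = 960 // 4                       # 240 Gbps of core bandwidth per group

divisor = gcd(io_per_group, core_per_group)
print(f"{io_per_group // divisor}:{core_per_group // divisor}")  # prints 4:3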
QFX5100-24Q System Modes
As a result of the over-subscription and port channelization features of the BCM56850 chipset, the data center operator is afforded more flexibility in the deployment of the switch. The Juniper QFX5100-24Q is the most flexible platform in the Juniper QFX5100 series, and it supports several system modes in which the switch can operate. Each mode is designed specifically to enable certain capabilities over the others. Understanding what each mode enables is critical because it will be another design consideration in the overall architecture of your network.
WARNING
Any renumbering of interfaces requires a warm Broadcom chipset reboot. For example, changing from one mode to another will cause a small interruption in data plane traffic as the Broadcom chipset performs a warm reboot to reconfigure the number of ports. The only exception is the Flexible QIC mode. Depending on which QIC you use, the number of ports can vary; however, as long as you stay in Flexible QIC mode, no Broadcom chipset reboot is required.
Fully subscribed mode
The fully subscribed mode is the default mode for the Juniper QFX5100-24Q. Because the Juniper QFX5100-24Q has a native bandwidth capacity of 960 Gbps (24 ports of 40 Gbps) without any modules installed, it's able to provide full line-rate performance for all packet sizes without drops. In this default mode, you cannot use any of the QIC modules; however, you can channelize each of the native 40GbE ports into 4 10GbE interfaces. The port configurations can be summarized as follows:
24 40GbE
In the default configuration, you can use all of the 40GbE interfaces on the Juniper QFX5100-24Q.
96 10GbE
By taking advantage of port channelizing, each of the 40GbE interfaces can be broken out into 4 10GbE interfaces.
In summary, the default mode only supports the 24 40GbE interfaces on the Juniper QFX5100-24Q; you cannot use the two QIC modules.
104-port mode
One of the limitations of the BCM56850 chipset is that the total port count cannot exceed 104. For a scenario in which you require 104 10GbE interfaces, the Juniper QFX5100-24Q can be put into the 104-port system mode. You must channelize each of the native 24 40GbE interfaces. In addition, this mode requires that a single 4 40GbE QIC be installed in slot 1 with its first two ports channelized; the remaining two ports are unused. In this configuration, the native 24 40GbE interfaces are combined with the first 2 40GbE interfaces of the QIC, creating a total of 26 40GbE interfaces, each of which must be channelized, yielding 104 10GbE interfaces. Because the I/O bandwidth is now 1,040 Gbps, the total I/O-to-core bandwidth over-subscription is 13:12. For certain packet sizes, there will be 20 to 30 percent traffic loss, assuming all 104 ports are operating at line rate. Details of the effects of over-subscription are discussed in “Performance”.
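The same arithmetic yields the 13:12 figure; here is a quick sketch using the numbers just described:

from math import gcd

io_bandwidth = 104 * 10  # 104 10GbE interfaces = 1,040 Gbps of I/O
core_bandwidth = 960     # BCM56850 core bandwidth in Gbps

divisor = gcd(io_bandwidth, core_bandwidth)
print(f"{io_bandwidth // divisor}:{core_bandwidth // divisor}")  # prints 13:12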
QIC mode
The QIC mode is similar to the 104-port mode, except both QIC slots can be used and there’s no requirement to channelize the 40GbE interfaces. However, there are two restrictions:
§ The 8 10GbE QIC isn’t supported in the QIC mode.
§ You cannot channelize the 4 40GbE QIC, only the native 24 40GbE interfaces.
Considering these restrictions, there are two major port configurations:
32 40GbE
All of the native 24 40GbE interfaces are combined with two 4 40GbE QIC modules for a total of 32 40GbE interfaces on the switch.
96 10GbE and 8 40GbE
All of the native 24 40GbE interfaces are channelized into 96 10GbE ports, and the two 4 40GbE QICs provide the 8 40GbE interfaces; this is a sneaky port configuration because it stays within the BCM56850 chipset requirement not to exceed 104 total ports.
In summary, the QIC mode turns the Juniper QFX5100-24Q into a 1RU QFX5100-96S or supports 32 40GbE interfaces. Because the I/O bandwidth exceeds the core bandwidth, this system mode is subject to packet loss for certain packet sizes, assuming that all ports are operating at line rate.
Flexible QIC mode
If all of the other system modes weren’t enough for you, the Juniper QFX5100-24Q offers yet one final mode: flexible QIC mode. This mode makes it possible for you to use any type of QIC in the Juniper QFX5100-24Q. There are two restrictions of which you need to be mindful:
§ You cannot channelize any of the QICs.
§ You cannot channelize ports et-0/0/0 through et-0/0/3 on the Juniper QFX5100-24Q itself, but you can channelize ports et-0/0/4 through et-0/0/23.
Such restrictions create some interesting port configurations, which are presented in Table 3-2.
| Native ports | QIC 0 | QIC 1 | Max 40GbE | Max 10GbE |
| --- | --- | --- | --- | --- |
| 24 40GbE | 4 40GbE | 4 40GbE | 32 40GbE | 80 10GbE |
| 24 40GbE | 8 10GbE | 4 40GbE | 28 40GbE | 88 10GbE |
| 24 40GbE | 8 10GbE | 8 10GbE | 24 40GbE | 96 10GbE |

Table 3-2. QFX5100-24Q flexible QIC mode port configuration options
In summary, with the flexible QIC mode, you can support all of the different types of QIC modules, which most commonly will be deployed as the 32 40GbE configuration when building a spine-and-leaf or Clos IP fabric. Although the overall number of ports can change depending on which QIC you use, it doesn’t require a warm reboot as long as you stay in the flexible QIC mode.
Review
The Juniper QFX5100-24Q offers a lot of options with respect to port configurations. The general rule of thumb is that the overall number of ports must not exceed 104. There are a total of four system modes, and each is unique in the way the switch operates. Table 3-3 summarizes the four system modes and their attributes.
| Mode | I/O-to-core bandwidth ratio | QIC 0 | QIC 1 | Max 40GbE | Max 10GbE | Channelize native ports? | Channelize QICs? |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Fully subscribed | 1:1 | No | No | 24 40GbE | 96 10GbE | Yes | No |
| 104-port | 13:12 | Channelize first 2 40GbE | No | None | 104 10GbE | Yes | Channelize first 2 40GbE |
| QIC | 4:3 | 4 40GbE | 4 40GbE | 32 40GbE | 96 10GbE | Yes | No |
| Flexible | 4:3 | 4 40GbE | 4 40GbE | 32 40GbE | 80 10GbE | Yes | No |
| Flexible | 5:4 | 8 10GbE | 4 40GbE | 28 40GbE | 88 10GbE | Yes | No |
| Flexible | 7:6 | 8 10GbE | 8 10GbE | 24 40GbE | 96 10GbE | Yes | No |

Table 3-3. The Juniper QFX5100-24Q system modes and attributes
It’s important to consider what role within the architecture the Juniper QFX5100-24Q fits. Depending on the system mode, it can fit into any number of possibilities. For example, in QIC mode, the Juniper QFX5100-24Q supports 32 40GbE interfaces, which makes a lot of sense in the core and aggregation tiers of a network. On the other hand, running the Juniper QFX5100-24Q in 104-port mode offers 104 10GbE interfaces in a 1RU form factor, which makes a lot of sense in the access tier of the network. The Juniper QFX5100 series has been designed from the ground up to give you more options.
Performance
With the critical design considerations out of the way, it’s now time to focus on the performance characteristics of the Juniper QFX5100 series. Previously in this chapter, we explored the BCM56850 chipset and how the I/O and core bandwidth work together in a balancing act of port density versus performance. Performance can be portrayed through two major measurements: throughput and latency. Let’s examine each of them.
Throughput
The throughput of Juniper QFX5100 switches varies depending on the system mode in which the device is operating. The fully subscribed (default) mode has an over-subscription of 1:1 and doesn't suffer any traffic loss when all of the ports are operating at line rate. All of the other modes have some level of I/O-to-core bandwidth over-subscription (refer to Table 3-3).
The key questions are the following:
§ What conditions cause over-subscription?
§ What packet sizes are affected?
§ How much traffic is dropped?
For the switch to be over-subscribed, it must be processing more traffic than the 960 Gbps of core bandwidth can handle. The best way to answer the remaining questions is with the graph shown in Figure 3-4.
Figure 3-4. 1,280 Gbps throughput versus packet size
There’s a lot happening in the graph in Figure 3-4. It can be summarized as follows:
§ Packet sizes 64B through 86B vary in performance from 78 to 99 percent.
§ Packet sizes 87B through 144B offer line-rate performance.
§ Packet sizes 145B through 193B vary in performance from 77 to 99 percent.
§ Packet sizes 194B through 12,288B offer line-rate performance.
In summary, only packet sizes from 64B through 86B and from 145B through 193B suffer varying traffic loss of roughly 1 to 20 percent when there is congestion on the switch. Another way to view it: of all possible packet sizes from 64B to 12,288B, fewer than 1 percent are subject to traffic loss; even if you want to be pedantic and count only packet sizes up to 1,514B, that figure is still only about 5 percent.
The reason the chipset is able to forward some packet sizes at line rate and not others comes down to the processing frequency each size requires: packet sizes from 64B to 86B and from 145B to 193B require a higher line-rate frequency to process than other sizes, and are therefore subject to a varying amount of traffic loss while the switch is congested.
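As a quick reference, here's a minimal Python sketch that flags whether a given packet size falls into one of the lossy ranges; the ranges come straight from the figures above, and the function itself is purely illustrative:

# Packet-size ranges (bytes) subject to loss during congestion in
# oversubscribed system modes, per the BCM56850 behavior above.
LOSSY_RANGES = [(64, 86), (145, 193)]

def line_rate_under_congestion(packet_size: int) -> bool:
    """Return True if this packet size still forwards at line rate
    when every port on the switch is saturated."""
    return not any(low <= packet_size <= high for low, high in LOSSY_RANGES)

for size in (64, 87, 150, 194, 1514):
    print(size, line_rate_under_congestion(size))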
WARNING
Keep in mind that traffic loss is only experienced in system modes other than fully subscribed/default.
Latency
Latency is the measurement of time between when a packet enters the switch on an ingress port and when it leaves the switch on an egress port, as illustrated in Figure 3-5.
Figure 3-5. End-to-end switch latency
With modern hardware such as the Juniper QFX5100 series, the amount of latency continues to decrease. In the vast majority of use cases, latency isn't a major concern; however, there is a subsegment of the financial services and high-performance computing markets that specializes in low latency.
Cut-through and store-and-forward
There are two modes that greatly affect the switch's overall latency: cut-through and store-and-forward. Each mode is purposely designed to excel in specific use cases.
Cut-Through
A switch that operates in a cut-through mode will begin to transmit the packet on the egress port at the same time it is receiving it on the ingress port. The benefit here is a reduction in overall latency within the switch because there’s no delay in transmitting the packet to its destination. The drawback is that cut-through mode has no way of discarding a corrupt packet, because the majority of the packet will already be transmitted on the egress port before the FCS is received on the ingress port. In larger networks or with multicast, cut-through mode can cause a lot of unnecessary processing in upstream devices when replicating corrupt packets.
Store-and-Forward
The default setting for the Juniper QFX5100 family is store-and-forward; this mode is how most switches have operated for a long time. The ingress packet must be fully received before the switch will transmit the packet on the egress port. The advantage is that the switch can perform error checks on the packet and discard it if it’s corrupt. The drawback is that store-and-forward requires a buffer within the switch to store the packet while it’s being received; this increases the cost and overall latency.
Unless you’re building a financial trading platform or high-performance computing environment, the default mode of store-and-forward will generally meet and exceed all of your latency requirements.
Conditions for cut-through
By default, the Juniper QFX5100 family operates in store-and-forward mode. To enable cut-through mode, you must issue and commit the following command:
[edit]
dhanks@QFX5100# set forwarding-options cut-through
Don’t be fooled: this command is just the first step in enabling cut-through mode. There are many conditions that a packet must meet in order to be eligible for cut-through; otherwise, it defaults back to store-and-forward. This decision is made on a per-packet basis, although cut-through is a system-wide setting. The first requirement is that only matching ingress and egress interface speeds are eligible for cut-through mode, as presented in Table 3-4.
| Ingress port | Egress port | Cut-through (CT) system mode | Store-and-forward (SF) system mode |
| --- | --- | --- | --- |
| 10GbE | 10GbE | CT | SF |
| 40GbE | 40GbE | CT | SF |
| 10GbE | 40GbE | SF | SF |
| 40GbE | 10GbE | SF | SF |
| 1GbE | 1GbE | CT | SF |
| 1GbE | 10GbE | SF | SF |
| 10GbE | 1GbE | SF | SF |

Table 3-4. Forwarding modes based on port speed and system mode
For example, if the Juniper QFX5100 switch were configured to be in cut-through mode, but a packet arrived on a 40GbE ingress interface and was transmitted on a 10GbE egress interface, that packet would not be eligible for cut-through mode and would default back to store-and-forward.
If the packet meets the conditions specified in Table 3-4, it is subject to the following additional conditions before being forwarded via cut-through (pulled together in a short sketch after this list):
§ The packet must not be destined to the routing engine.
§ The egress port must have an empty queue with no packets waiting to be transmitted.
§ The egress port must not have any shapers or rate limiting applied.
§ The ingress port must be in-profile if it’s subject to rate limiting.
§ For multicast packets, each egress port must meet all conditions. If one egress port out of the set doesn’t meet the conditions, all multicast packets will be transmitted via store-and-forward; the chipset doesn’t support partial cut-through packets.
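Pulling these conditions together, the per-packet decision can be modeled as follows. This is an illustrative Python sketch of the logic described above, not the chipset's actual implementation:

from dataclasses import dataclass

@dataclass
class Port:
    speed_gbps: int           # 1, 10, or 40
    queue_empty: bool = True  # no packets waiting to be transmitted
    shaped: bool = False      # shaper or rate limit applied on egress
    in_profile: bool = True   # ingress within its rate limit, if any

def eligible_for_cut_through(cut_through_enabled: bool,
                             to_routing_engine: bool,
                             ingress: Port,
                             egress_ports: list) -> bool:
    """Per-packet cut-through eligibility, per the conditions above."""
    if not cut_through_enabled or to_routing_engine:
        return False
    if not ingress.in_profile:
        return False
    # For multicast, every egress port must qualify (no partial
    # cut-through); a unicast packet is simply a list of one port.
    for egress in egress_ports:
        if egress.speed_gbps != ingress.speed_gbps:
            return False
        if not egress.queue_empty or egress.shaped:
            return False
    return True

print(eligible_for_cut_through(True, False, Port(40), [Port(40)]))  # True
print(eligible_for_cut_through(True, False, Port(40), [Port(10)]))  # False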
To further understand the benefits of improved latency of cut-through mode, let’s compare it directly to store-and-forward with different sized packets up to 1,514 bytes, as illustrated in Figure 3-6.
The cut-through latency increases slowly from 64 bytes up to about 600 bytes and then remains steady at about 0.73 µs. Store-and-forward latency, on the other hand, is fairly linear from 64 bytes all the way to 1,514 bytes. In summary, both cut-through and store-and-forward have less than 1 µs of latency when the packet is smaller than 1,514 bytes.
Let’s take a look at what happens when you enable jumbo frames. Figure 3-7 starts in the same place at 64 bytes but goes all the way up to 9,216 bytes.
Figure 3-6. Approximate latency for the BCM56850 chipset using 40GbE with frames up to 1,514 bytes
In summary, store-and-forward latency continues to grow fairly linearly from 64 bytes to 9,216 bytes, whereas cut-through flattens out at approximately 0.73 µs from 600 bytes to 9,216 bytes. Store-and-forward follows a linear progression simply because its latency is a function of packet size: the larger the packet, the longer it takes to buffer it before it's allowed to be transmitted. Cut-through stays flat because the switch begins transmitting the packet as soon as it's received; thus, packet size is never a factor in the overall latency.
WARNING
These graphs represent approximate latency on the BCM56850 chipset using 40GbE interfaces. Actual values will vary based on firmware, port speed, and other factors. If latency is critical to your environment, you need to evaluate the latency in your lab under controlled conditions.
Figure 3-7. Approximate latency for the BCM56850 chipset using 40GbE with jumbo frames
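The linear behavior of store-and-forward follows from serialization delay: the switch must clock in the entire packet at the port rate before transmitting it. A rough Python model is sketched below; the 0.73 µs cut-through plateau is taken from the graphs, while the fixed pipeline overhead is an assumed placeholder, not a published figure:

def store_and_forward_latency_us(packet_bytes: int,
                                 port_gbps: int = 40,
                                 pipeline_overhead_us: float = 0.4) -> float:
    """Serialization delay to buffer the full packet, plus an assumed
    fixed pipeline overhead (placeholder value)."""
    bits_per_us = port_gbps * 1000  # 40 Gbps = 40,000 bits per microsecond
    return packet_bytes * 8 / bits_per_us + pipeline_overhead_us

CUT_THROUGH_US = 0.73  # approximate plateau from Figures 3-6 and 3-7

for size in (64, 1514, 9216):
    sf = store_and_forward_latency_us(size)
    print(f"{size:>5}B  store-and-forward ~{sf:.2f} us  cut-through ~{CUT_THROUGH_US} us")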
Scale
Scale can be expressed many different ways. The most common methods are the configuration maximums of the control plane and data plane. It’s also common to peg the scaling maximums to the OSI model, for example Layer 2 versus Layer 3. The Juniper QFX5100 series is unique in the sense that you can adjust the balance of Layer 2 versus Layer 3 data plane scale. Let’s dive into the details.
Unified Forwarding Table
The Juniper QFX5100 series has the unique ability to use a customized forwarding table. The forwarding table is broken into three major tables:
MAC Address Table
In a Layer 2 environment, the switch learns new MAC addresses and stores them in the MAC address table.
Layer 3 Host Table
In a Layer 2 and Layer 3 environment, the switch will also learn which IP addresses are mapped to which MAC addresses; these key-value pairs are stored in the Layer 3 host table.
Longest Prefix Match (LPM) Table
In a Layer 3 environment, the switch will have a routing table, and the most specific route will have an entry in the forwarding table to associate a prefix/netmask to a next-hop; this is stored in the LPM table. The one caveat is that all IPv4 /32 prefixes and IPv6 /128 prefixes are stored in the Layer 3 host table.
Traditionally, these tables have been statically defined by the vendor and support only a fixed number of entries, which ultimately limits the roles in an architecture into which a traditional switch can fit.
The Unified Forwarding Table (UFT) in the Juniper QFX5100 family allows you to dynamically move around forwarding table resources so that you can tailor the switch to your network. In summary, the UFT offers five preconfigured profiles from heavy Layer 2 to heavy Layer 3 allocations, as shown in Table 3-5.
| Profile | MAC addresses | L3 hosts | LPM |
| --- | --- | --- | --- |
| l2-profile-one | 288,000 | 16,000 | 16,000 |
| l2-profile-two | 224,000 | 56,000 | 16,000 |
| l2-profile-three | 160,000 | 88,000 | 16,000 |
| l3-profile | 96,000 | 120,000 | 16,000 |
| lpm-profile | 32,000 | 16,000 | 128,000 |

Table 3-5. The Juniper QFX5100 UFT profiles
The UFT is a very powerful tool that completely changes the personality of the switch, allowing it to move freely throughout the network architecture. The profiles follow a linear progression toward a larger Layer 3 host table, as depicted in Figure 3-8.
Using a MAC-heavy profile makes it possible for Juniper QFX5100 switches to handle a lot of Layer 2 traffic, such as a traditional virtualization environment with servers hosting a large number of VMs. The lpm-profile gives you the ability to operate Juniper QFX5100 devices in the core of a network architecture or use them as a building block in a large Clos IP fabric; this is because an IP fabric by nature has a larger routing table than MAC address table.
Figure 3-8. Juniper QFX5100 series UFT
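Because each profile trades MAC scale against host and LPM scale, picking one is a simple capacity check. Here's a minimal Python sketch using the numbers from Table 3-5; the scale requirements in the example are hypothetical:

# (MAC addresses, L3 hosts, LPM entries) per UFT profile, from Table 3-5.
UFT_PROFILES = {
    "l2-profile-one":   (288_000,  16_000,  16_000),
    "l2-profile-two":   (224_000,  56_000,  16_000),
    "l2-profile-three": (160_000,  88_000,  16_000),
    "l3-profile":       ( 96_000, 120_000,  16_000),
    "lpm-profile":      ( 32_000,  16_000, 128_000),
}

def profiles_that_fit(macs: int, hosts: int, lpm: int) -> list:
    """Return every UFT profile whose tables cover the required scale."""
    return [name for name, (m, h, l) in UFT_PROFILES.items()
            if m >= macs and h >= hosts and l >= lpm]

# Example: a leaf switch with heavy virtualization and a small routing table.
print(profiles_that_fit(macs=200_000, hosts=20_000, lpm=8_000))  # ['l2-profile-two']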
To check the current forwarding mode of the Juniper QFX5100 switch, use the show chassis forwarding-options command:
dhanks@qfx5100> show chassis forwarding-options
--------------------------------------------------------------------------
Current UFT Configuration:
l2-profile-three
You can see from the preceding output that this particular Juniper QFX5100 switch is currently in l2-profile-three mode, which gives the forwarding table 160K MAC addresses, 88K L3 hosts, and 16K LPM entries. The forwarding table can be changed by using the following command:
[edit]
dhanks@qfx5100# set chassis forwarding-options ?
Possible completions:
+ apply-groups Groups from which to inherit configuration data
+ apply-groups-except Don't inherit configuration data from these groups
l2-profile-one MAC: 288K L3-host: 16K LPM: 16K. This will restart PFE
l2-profile-three MAC: 160K L3-host: 144K LPM: 16K. This will restart PFE
l2-profile-two MAC: 224K L3-host: 80K LPM: 16K. This will restart PFE
l3-profile MAC: 96K L3-host: 208K LPM: 16K. This will restart PFE
lpm-profile MAC: 32K L3-host: 16K LPM: 128K. This will restart PFE
WARNING
Be mindful that when you change the UFT profile and commit, the BCM56850 chipset will need to perform a warm reboot, and there will be temporary traffic loss.
Hashing
The Juniper QFX5100 uses a sophisticated hashing algorithm called RTAG7 to determine the next-hop interface for Equal-Cost Multipath (ECMP) routing and Link Aggregation (LAG). The following packet fields are used when determining the next-hop interface:
§ Source MAC address
§ Destination MAC address
§ Ethernet type
§ VLAN ID
§ Source IP address
§ Destination IP address
§ IPv4 protocol or IPv6 next header
§ Layer 4 source port
§ Layer 4 destination port
§ MPLS label
There are also two additional fields that are used to calculate the hash that are internal to the system:
§ Source device ID
§ Source port ID
The following types of protocols are supported for ECMP on the Juniper QFX5100 as of Junos 13.2X51-D20.2:
§ IPv4
§ IPv6
§ MPLS
§ MAC-in-MAC
Note that additional protocols can be supported with a new software release; please check the release notes for Junos going forward.
NOTE
The hash algorithms for ECMP and LAG use the same packet fields as those just listed, but the internal hash index is calculated differently for each. This method avoids traffic polarization when a LAG bundle is part of an ECMP next-hop.
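To make the mechanics concrete, here's an illustrative Python sketch of hashing a flow's header fields to select one of several equal-cost next-hops. It is not RTAG7 itself (a SHA-256 digest stands in for the hardware hash), and the interface names are hypothetical:

import hashlib

def pick_next_hop(fields: dict, next_hops: list) -> str:
    """Hash the flow's header fields and map the digest onto a next-hop.
    Illustrative only -- the BCM56850 uses RTAG7, not SHA-256."""
    key = "|".join(f"{k}={v}" for k, v in sorted(fields.items()))
    digest = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return next_hops[digest % len(next_hops)]

flow = {
    "src_ip": "10.0.1.5", "dst_ip": "10.0.2.9",
    "protocol": 6, "l4_src": 49152, "l4_dst": 443,
}
print(pick_next_hop(flow, ["et-0/0/0", "et-0/0/1", "et-0/0/2", "et-0/0/3"]))

Because the fields of a given flow are constant, the same flow always hashes to the same next-hop; change any field, and the flow may land elsewhere.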
Resilient Hashing
One of the challenges in the data center when building IP fabrics with stateful devices, such as firewalls, is minimizing the number of next-hop changes during link failures. For example, the Juniper QFX5100 performs standard RTAG7 hashing on all ingress flows and selects an egress next-hop as dictated by the hashing algorithm. If a firewall were to fail, the standard RTAG7 hash result would change, and both new and existing flows would be assigned new egress next-hops. The end result is that existing flows would be hashed to a new firewall. Because the new firewall doesn't have a session entry for the rerouted flow, it would simply discard the traffic, as shown in Figure 3-9.
Figure 3-9. Resilient hashing overview
The Juniper QFX5100 supports a new type of hashing called resilient hashing that minimizes the number of next-hop changes during link failures. If a firewall were to fail, the Juniper QFX5100 would keep the existing flows mapped to their existing egress next-hops. The end result is that when a firewall fails, all of the other flows continue to flow through their existing firewalls without impact.
The Juniper QFX5100 series supports resilient hashing on LAG interfaces as well. In summary, resilient hashing supports both Layer 3 ECMP and LAG.
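The difference between the two behaviors can be modeled with a fixed-size indirection table: a plain hash-modulo-N scheme remaps nearly every flow when N changes, whereas resilient hashing reassigns only the buckets that pointed at the failed member. The Python sketch below is a simplified model of that idea, not the chipset's actual table layout; the bucket count and member names are hypothetical:

BUCKETS = 64  # fixed-size indirection table (size is illustrative)

def build_table(members: list) -> list:
    """Spread next-hop members round-robin across the fixed buckets."""
    return [members[i % len(members)] for i in range(BUCKETS)]

def repair_table(table: list, failed: str, members: list) -> list:
    """Resilient repair: only buckets that pointed at the failed member
    are reassigned; every other flow keeps its existing next-hop."""
    survivors = [m for m in members if m != failed]
    return [survivors[i % len(survivors)] if m == failed else m
            for i, m in enumerate(table)]

members = ["fw0", "fw1", "fw2", "fw3"]
table = build_table(members)
repaired = repair_table(table, "fw1", members)
moved = sum(1 for before, after in zip(table, repaired) if before != after)
print(f"buckets remapped: {moved}/{BUCKETS}")  # only fw1's 16 buckets move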
Layer 2 LAG
To enable resilient hashing for Layer 2 LAG members, use the following command (replace ae0 with the intended interface name for your environment):
# set interfaces ae0 aggregated-ether-options resilient-hash
Layer 3 ECMP
To enable resilient hashing for Layer 3 ECMP, use the following command:
# set forwarding-options enhanced-hash-key ecmp-resilient-hash
Configuration Maximums
The Juniper QFX5100 has a set of configuration maximums that you need to be aware of as you design your network. The Juniper QFX5100 should work just fine in the majority of use cases, but there could be instances for which you might need more scale. Use Table 3-6 as a reference.
| Key | Value |
| --- | --- |
| MAC addresses | 288 K (UFT l2-profile-one) |
| ARP entries | 48 K |
| Jumbo frame size | 9,216 bytes |
| IPv4 unicast routes | 128 K prefixes, 208 K host routes |
| IPv4 multicast routes | 104 K |
| IPv6 unicast routes | 64 K |
| IPv6 multicast routes | 52 K |
| VLAN IDs | 4,094 |
| FCoE VLANs | 4,094 |
| Link aggregation groups | 128 |
| Members per LAG | 32 |
| Firewall filters | 4 K |
| ECMP | 64 |
| MSTP instances | 64 |
| VSTP instances | 253 |
| Mirroring destination ports | 4 |
| Mirroring sessions | 4 |
| Mirroring destination VLANs | 4 |

Table 3-6. QFX5100 family configuration maximums
There are some configuration maximums, such as the UFT and MAC addresses, that are pinned to the BCM56850 chipset and can never be increased. However, there are other configuration maximums, such as ECMP, link aggregation groups, and STP instances, that can increase over time with Junos software updates.
Summary
This chapter covered many of the design considerations that you must take into account before looking at the scale of each role in the architecture. These considerations include compute virtualization in the data center and overlay architectures. Moving to an overlay architecture in the data center changes many of the traditional scaling requirements with which you are familiar.
The Juniper QFX5100-24Q has four different system modes to handle over-subscription to provide a customized personality depending on the use case. The system modes are:
§ Fully subscribed mode (default)
§ 104-port mode
§ QIC mode
§ Flexible QIC mode
Each of the system modes impacts how the I/O and core bandwidth are handled, ultimately changing the throughput characteristics of the switch.
The Juniper QFX5100 series also has a next-generation UFT with which you can choose one of five preconfigured profiles, from Layer 2 heavy to Layer 3 heavy; this gives you the freedom to place the Juniper QFX5100 switch anywhere in your network and fine-tune its logical scale to match its role in the network.
Many factors impact the latency of a network switch. The Juniper QFX5100 family offers two forwarding modes: cut-through and store-and-forward. Cut-through gives you the lowest possible latency at the expense of forwarding corrupt frames. Store-and-forward has slightly higher latency, but completely buffers the packet and is able to discard corrupt packets.
In summary, the Juniper QFX5100 family gives you the power of options. When trying to solve complicated problems, the easiest method is to break them down into simple building blocks. The more options that are available to you, the greater your chances of executing a successful data center strategy and architecture. Let's review the options the Juniper QFX5100 series affords you, as considered in this chapter:
§ Traditional IP network versus overlay architecture
§ VMware NSX versus Juniper Contrail
§ Four system modes to fine-tune the over-subscription in the data plane
§ Five profiles to fine-tune the logical scaling in the data plane
§ Cut-through mode versus store-and-forward mode
Juniper QFX5100 switches are very exciting and, as of this writing, represent Juniper's best switches ever created. As you work your way through this book, think about all of the different places in your network where the Juniper QFX5100 series could be used to make it better.
Chapter Review Questions
1. Which overlay control plane protocols does the Juniper QFX5100 family support?
A. Open vSwitch Database
B. Device Management Interface
C. All of the above
D. None of the above
2. How does the Juniper QFX5100 series support bare-metal servers in an overlay architecture?
A. Forward all traffic from the bare-metal server to the SDN controller
B. Forward all traffic from the bare-metal server to the closest hypervisor VTEP
C. Handle all encapsulation and forwarding in the switch's hardware
D. Implement a VTEP inside of the switch with a control plane protocol
3. What's the core bandwidth of the BCM56850 chipset?
A. 1,280 Gbps
B. 960 Gbps
C. 720 Gbps
D. 480 Gbps
4. How many system modes does the Juniper QFX5100-24Q have?
A. 2
B. 3
C. 4
D. 5
5. What's the I/O bandwidth to core bandwidth ratio of the Juniper QFX5100-24Q when using 32 40GbE interfaces?
A. 1:1
B. 13:12
C. 4:3
D. 5:4
6. How many preconfigured profiles are in the Juniper QFX5100 UFT?
A. 1
B. 3
C. 5
D. 7
7. What's the maximum number of MAC addresses in a Juniper QFX5100 switch?
A. 128K
B. 224K
C. 256K
D. 288K
8. What's the maximum size of an Ethernet frame in the Juniper QFX5100 series?
A. 2,048 bytes
B. 4,000 bytes
C. 8,192 bytes
D. 9,216 bytes
Chapter Review Answers
1. Answer: C.
Juniper QFX5100 series switches support both OVSDB and DMI control plane protocols.
2. Answer: C and D.
Trick question. The Juniper QFX5100 family handles the data plane encapsulation in hardware and creates a VTEP inside of the switch for MAC address learning and service provisioning.
3. Answer: B.
Juniper QFX5100 switches use the BCM56850 chipset, which has a core bandwidth of 960 Gbps and I/O bandwidth of 1,280 Gbps.
4. Answer: C.
The Juniper QFX5100-24Q has four system modes: (1) fully subscribed, (2) 104 port, (3) QIC, and (4) flexible QIC.
5. Answer: C.
32 40GbE interfaces require 1,280 Gbps of I/O bandwidth, which creates a 4:3 ratio of I/O bandwidth to core bandwidth.
6. Answer: C.
Juniper QFX5100 switches support five UFT profiles: (1) l2-profile-one, (2) l2-profile-two, (3) l2-profile-three, (4) l3-profile, and (5) lpm-profile.
7. Answer: D.
The Juniper QFX5100 family supports up to 288K MAC addresses in UFT l2-profile-one.
8. Answer: D.
The Juniper QFX5100 series supports jumbo frames up to 9,216 bytes.