Performance and Scaling - Juniper QFX5100 Series (2015)

Chapter 3. Performance and Scaling

One of the more challenging tasks of a network architect is to ensure that a design put forth meets the end-to-end solution requirements. The first step is identifying all of the roles in an architecture; this could be as simple as defining the edge, core, aggregation, and access tiers in the network. Each role has a specific set of responsibilities in terms of functionality and requirements. To map a product to a role in an architecture, the product must meet or exceed the requirements and functionality required by each role for which it’s being considered. Thus, building an end-to-end solution architecture is a bit like a long chain: it’s only as strong as the weakest link.

The most common methods for ascertaining product capabilities, performance, and scale are datasheets and the vendor's account team. However, the best method is actual testing through a proof of concept or certification cycle. This requires that you build out all of the roles and products in the architecture and measure the end-to-end results; this method quickly flushes out any issues before you move into procurement and production.

This chapter will walk through all of the performance and scaling considerations required to successfully map a product into a specific role in an end-to-end architecture. Attributes such as MAC address table size, host entries, and IPv4 prefix capacity will be clearly spelled out. Armed with this data, you will be able to easily map Juniper QFX5100 series switches into many different roles in your existing network.

Design Considerations

Before any good network architect jumps headfirst into performance and scaling requirements, he will need to make a list of design considerations. Each one places an additional tax on the network that is outside the scope of traditional performance and scaling requirements.

Overlay Architecture

One of the first design questions that you need to consider when planning a next-generation network is whether you need to centrally orchestrate all resources in the data center so that applications can be deployed within seconds. The follow-up question is whether you currently virtualize your data center compute and storage with hypervisors and cloud management platforms. If the answer to both questions is yes, you must consider an overlay architecture for the data center network.

Given that compute and storage has already been virtualized, the next step is to virtualize the network. By using an overlay architecture in the network, you can decouple physical hardware from the network, which is one of the primary tenets of virtualization. Decoupling the network from the physical hardware allows the network to be programmatically provisioned within seconds. As of this writing, two great examples of products that support overlay architectures are Juniper Contrail and VMware NSX.

Moving to a new network architecture places a different “network tax” on the data center. Traditionally, when servers and virtual machines (VMs) are connected to a network, they each consume a MAC address and host route entry in the network. However, in an overlay architecture, only the virtual tunnel end points (VTEPs) consume a MAC address and host route entry in the network. All VM traffic is now encapsulated between VTEPs, and the MAC address and host route of each VM aren’t visible to the underlying networking equipment. The MAC address and host route scale has thus been moved from the physical network hardware to the hypervisor.
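To put the shift in perspective, here is a back-of-the-envelope sketch; the pod sizes are hypothetical, chosen only to illustrate the scale difference:

```python
# Hypothetical pod: 500 hypervisor hosts, 40 VMs each. All figures illustrative.

def entries_without_overlay(servers, vms_per_server):
    # Without an overlay, every VM's MAC address and host route
    # is learned by the physical network.
    return servers * vms_per_server

def entries_with_overlay(servers):
    # With an overlay, only the VTEP (one per hypervisor) is visible;
    # VM traffic is encapsulated between VTEPs.
    return servers

print(entries_without_overlay(500, 40))  # prints 20000
print(entries_with_overlay(500))         # prints 500
```

The 40x reduction in table entries is why overlay architectures move the MAC address and host route scaling problem from the switch hardware to the hypervisor.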

Bare-metal servers

It’s rare to find a data center that has virtualized 100 percent of its compute resources. There’s always a subset of servers that you cannot virtualize due to performance, compliance, or any number of other reasons. This raises an interesting question: if 80 percent of the servers in the data center are virtualized and take advantage of an overlay architecture, how do you provide connectivity to the other 20 percent of physical servers?

Overlay architectures support several mechanisms to provide connectivity to physical servers. The most common option is to embed a VTEP into the physical access switch, as demonstrated in Figure 3-1.

Figure 3-1. Virtual to physical data flow in an overlay architecture

In Figure 3-1, each server on the left and right of the IP Fabric has been virtualized with a hypervisor. Each hypervisor has a VTEP within it that handles the encapsulation of data plane traffic between VMs. Each VTEP also handles MAC address learning, provisioning of new virtual networks, and other configuration changes. The server on top of the IP Fabric is a simple physical server but doesn’t have any VTEP capabilities of its own. For the physical server to participate in the overlay architecture, it needs something to encapsulate the data plane traffic and perform MAC address learning. Being able to handle the VTEP role inside of an access switch simplifies the overlay architecture. Now, each access switch that has physical servers connected to it can simply perform the overlay encapsulation and control plane on behalf of the physical server. From the point of view of the physical server, it simply sends traffic into the network without having to worry about anything else.

The Juniper QFX5100 series supports full overlay integration for both Juniper Contrail and VMware NSX in the data plane and control plane. However, the use case isn’t limited to only bare-metal servers; another use case would be to inject physical network services such as load balancing or firewalls into an overlay architecture.

Juniper Architectures versus Open Architectures

The other common design option is to weigh the benefits of Juniper architectures against open architectures. The benefit of a Juniper architecture is that it has been designed specifically to enable turnkey functionality; the downside is that it requires a certain set of products to operate. On the other side are open architectures. The benefit of an open architecture is that it can be supported across multiple vendors; the downside is that you might lose some capabilities that are available only in the Juniper architectures.

Generally, it boils down to the size of the network. If you know that your network will never grow past a certain size and you’re procuring all of the hardware up front, using a Juniper architecture might simply outweigh all of the benefits of an open architecture, because there isn’t a need to support multiple vendors. Another scenario is that your network is large enough that you can’t build it all at once and want a pay-as-you-grow option over the next five years. A logical option would be to implement open architectures so that as you build out your network, you aren’t limited in the number of options going forward. Another option would be to take a hybrid approach and build out the network in points of delivery (POD). Each POD could have the option to take advantage of proprietary architectures or not.

Each business and network is going to have any number of external forces that weigh on the decision to go with Juniper architectures and open architectures, and more often than not, these decisions change over time. Unless you know 100 percent of these nuances up front, it’s important to select a networking platform that offers both Juniper architectures and open architectures.

The Juniper QFX5100 series offers the best of both worlds. It supports open architectures equally as well as Juniper architectures, as is summarized here:

Juniper Architectures

The Juniper QFX5100 family is able to participate in a Juniper QFabric architecture as a node. You can also use them to build a Virtual Chassis Fabric (VCF) or a traditional Virtual Chassis. In summary, these Juniper architectures give you the ability to build a plug-and-play Ethernet fabric with a single point of management and support converged storage.

Open Architectures

Juniper QFX5100 switches support Multi-Chassis Link Aggregation (MC-LAG) so that downstream devices can simply use IEEE 802.1AX/LACP to connect and transport data. The Juniper QFX5100 series also supports a wide range of open protocols, such as Border Gateway Protocol (BGP), Open Shortest Path First (OSPF), Intermediate System to Intermediate System (IS-IS), and a suite of Multiprotocol Label Switching (MPLS) technologies.

The Juniper QFX5100 makes a great choice no matter where you place it in your network. You could choose to deploy an open architecture today, and change to a Juniper architecture in the future. One of the best tools in creating a winning strategy is to keep the number of options high.

Over-subscription

There are several different types of chipsets in the Broadcom Trident II family. Each chipset has different performance and over-subscription values. Table 3-1 lists them for you.

Broadcom chipset     | I/O bandwidth | Core bandwidth | Over-subscription ratio
Trident II: option 1 | 1,280 Gbps    | 960 Gbps       | 4:3
Trident II: option 2 | 1,280 Gbps    | 720 Gbps       | 16:9
Trident II: option 3 | 960 Gbps      | 960 Gbps       | 1:1
Trident II: option 4 | 720 Gbps      | 720 Gbps       | 1:1

Table 3-1. Broadcom Trident II family bandwidth and over-subscription options

All of the Juniper QFX5100 platforms have been designed around Broadcom Trident II option 1, which is the BCM56850 chipset. Out of all of the options available, this chipset represents the most I/O and core bandwidth available. To fully understand the implications of the 4:3 over-subscription, let’s take a closer look at the chipset’s architecture.

Architecture

The BCM56850 is divided into four groups (see Figure 3-2). Each group supports 25 percent of the available core bandwidth, which in the case of the BCM56850 is 960 Gbps; thus, each group supports 240 Gbps in the core. Each group also has a set of eight cores that are responsible for processing traffic. Each core can handle 40 Gbps of traffic, and because each group has eight cores, the total amount of I/O bandwidth each group can support is 320 Gbps.

Figure 3-2. Block diagram of the BCM56850 chipset

In summary, each group supports 240 Gbps of core bandwidth and 320 Gbps of I/O bandwidth via the eight cores. Simplifying the ratio 320:240 results in the 4:3 over-subscription, as stipulated earlier in Table 3-1.
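The ratio simplification can be verified with a few lines of Python; the bandwidth figures come from Table 3-1 and the group description above:

```python
from math import gcd

def oversubscription(io_gbps, core_gbps):
    """Reduce an I/O-to-core bandwidth ratio to lowest terms."""
    g = gcd(io_gbps, core_gbps)
    return f"{io_gbps // g}:{core_gbps // g}"

# Per group: eight 40 Gbps cores of I/O versus 240 Gbps of core bandwidth
print(oversubscription(8 * 40, 240))    # prints 4:3
# Whole chip, Trident II option 1: 1,280 Gbps I/O versus 960 Gbps core
print(oversubscription(1280, 960))      # prints 4:3
# 104-port mode (discussed later): 1,040 Gbps I/O versus 960 Gbps core
print(oversubscription(104 * 10, 960))  # prints 13:12
```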

Figure 3-3. Flow visualization of I/O and core bandwidth

The final result of the I/O-to-core bandwidth over-subscription is that packets of certain sizes will be dropped when all of the ports in the switch are running at line rate. Details of the effects of over-subscription are discussed in “Performance” later in the chapter.

QFX5100-24Q System Modes

As a result of the over-subscription and port channelization features of the BCM56850 chipset, the data center operator is afforded more flexibility in the deployment of the switch. The Juniper QFX5100-24Q is the most flexible platform in the Juniper QFX5100 series, and it supports several system modes in which the switch can operate. Each mode is designed specifically to enable certain capabilities over the others. Understanding what each mode enables is critical because it will be another design consideration in the overall architecture of your network.

WARNING

Any renumbering of interfaces requires a warm Broadcom chipset reboot. For example, changing from one mode to another will cause a small interruption in data plane traffic as the Broadcom chipset performs a warm reboot to reconfigure the number of ports. The only exception is the Flexible QIC mode. Depending on which QIC you use, the number of ports can vary; however, as long as you stay in Flexible QIC mode, no Broadcom chipset reboot is required.

Fully subscribed mode

The fully subscribed mode is the default mode for the Juniper QFX5100-24Q. Because the Juniper QFX5100-24Q has a native bandwidth capacity of 960 Gbps (24 ports of 40 Gbps) without any modules installed, it’s able to provide full line-rate performance for all packet sizes without drops. In this default mode, you cannot use any of the QIC modules; however, you can channelize all of the native 40GbE ports into 4 10GbE interfaces. The port configurations can be summarized as follows:

24 40GbE

In the default configuration, you can use all of the 40GbE interfaces on the Juniper QFX5100-24Q.

96 10GbE

By taking advantage of port channelizing, each of the 40GbE interfaces can be broken out into 4 10GbE interfaces.

In summary, the default mode only supports the 24 40GbE interfaces on the Juniper QFX5100-24Q; you cannot use the two QIC modules.

104-port mode

One of the limitations of the BCM56850 chipset is that the total port count cannot exceed 104. For a scenario in which you require 104 10GbE interfaces, the Juniper QFX5100-24Q can be put into the 104-port system mode. You must channelize each of the native 24 40GbE interfaces. In addition, this mode requires that a single 4 40GbE QIC be installed in slot 1 with its first two ports channelized; the remaining two ports are unused. In this configuration, the native 24 40GbE interfaces are combined with the first 2 40GbE interfaces of the QIC in slot 1, creating a total of 26 40GbE interfaces. Each of the 26 40GbE interfaces must be channelized, yielding 104 10GbE interfaces. Because the I/O bandwidth is now 1,040 Gbps, the total I/O-to-core bandwidth over-subscription is 13:12. For certain packet sizes, there will be 20 to 30 percent traffic loss, assuming that all 104 ports are operating at line rate. Details of the effects of over-subscription are discussed in “Performance”.

QIC mode

The QIC mode is similar to the 104-port mode, except both QIC slots can be used and there’s no requirement to channelize the 40GbE interfaces. However, there are two restrictions:

§ The 8 10GbE QIC isn’t supported in the QIC mode.

§ You cannot channelize the 4 40GbE QIC, only the native 24 40GbE interfaces.

Considering these restrictions, there are two major port configurations:

32 40GbE

All of the native 24 40GbE interfaces are combined with two 4 40GbE QIC modules for a total of 32 40GbE interfaces on the switch.

96 10GbE and 8 40GbE

All of the native 24 40GbE interfaces are channelized into 96 10GbE ports, and the two 4 40GbE QICs provide the 8 40GbE interfaces; this is a sneaky port configuration because it stays within the BCM56850 chipset requirement to not exceed 104 total ports.

In summary, the QIC mode turns the Juniper QFX5100-24Q into a 1RU QFX5100-96S or supports 32 40GbE interfaces. Because the I/O bandwidth exceeds the core bandwidth, this system mode is subject to packet loss for certain packet sizes, assuming that all ports are operating at line rate.

Flexible QIC mode

If all of the other system modes weren’t enough for you, the Juniper QFX5100-24Q offers yet one final mode: flexible QIC mode. This mode makes it possible for you to use any type of QIC in the Juniper QFX5100-24Q. There are two restrictions of which you need to be mindful:

§ You cannot channelize any of the QICs.

§ You cannot channelize ports et-0/0/0 through et-0/0/3 on the Juniper QFX5100-24Q itself, but you can channelize ports et-0/0/4 through et-0/0/23.

Such restrictions create some interesting port configurations, which are presented in Table 3-2.

Native ports | QIC 0   | QIC 1   | Max 40GbE | Max 10GbE
24 40GbE     | 4 40GbE | 4 40GbE | 32 40GbE  | 80 10GbE + 12 40GbE
24 40GbE     | 8 10GbE | 4 40GbE | 28 40GbE  | 88 10GbE + 8 40GbE
24 40GbE     | 8 10GbE | 8 10GbE | 24 40GbE  | 96 10GbE + 4 40GbE

Table 3-2. QFX5100-24Q flexible QIC mode port configuration options

In summary, with the flexible QIC mode, you can support all of the different types of QIC modules, which most commonly will be deployed as the 32 40GbE configuration when building a spine-and-leaf or Clos IP fabric. Although the overall number of ports can change depending on which QIC you use, it doesn’t require a warm reboot as long as you stay in the flexible QIC mode.

Review

The Juniper QFX5100-24Q offers a lot of options with respect to port configurations. The general rule of thumb is that the overall number of ports must not exceed 104. There are a total of four system modes, and each is unique in the way the switch operates. Table 3-3 summarizes the four system modes and their attributes.

Mode             | I/O-to-core bandwidth ratio | QIC 0                    | QIC 1   | Max 40GbE | Max 10GbE           | Channelize native ports? | Channelize QICs?
Fully subscribed | 1:1                         | No                       | No      | 24 40GbE  | 96 10GbE            | Yes                      | No
104-port         | 13:12                       | Channelize first 2 40GbE | No      | None      | 104 10GbE           | Yes                      | Channelize first 2 40GbE
QIC              | 4:3                         | 4 40GbE                  | 4 40GbE | 32 40GbE  | 96 10GbE + 8 40GbE  | Yes                      | No
Flexible         | 4:3                         | 4 40GbE                  | 4 40GbE | 32 40GbE  | 80 10GbE + 12 40GbE | Yes                      | No
Flexible         | 4:3                         | 8 10GbE                  | 4 40GbE | 28 40GbE  | 88 10GbE + 8 40GbE  | Yes                      | No
Flexible         | 7:6                         | 8 10GbE                  | 8 10GbE | 24 40GbE  | 96 10GbE + 4 40GbE  | Yes                      | No

Table 3-3. The Juniper QFX5100-24Q system modes and attributes

It’s important to consider what role within the architecture the Juniper QFX5100-24Q fits. Depending on the system mode, it can fit into any number of possibilities. For example, in QIC mode, the Juniper QFX5100-24Q supports 32 40GbE interfaces, which makes a lot of sense in the core and aggregation of a network. On the other hand, running the Juniper QFX5100-24Q in 104-port mode offers 104 10GbE interfaces in a 1RU form factor, which makes a lot of sense in the access tier of the network. The Juniper QFX5100 series has been designed from the ground up to give you more options.

Performance

With the critical design considerations out of the way, it’s now time to focus on the performance characteristics of the Juniper QFX5100 series. Previously in this chapter, we explored the BCM56850 chipset and how the I/O and core bandwidth work together in a balancing act of port density versus performance. Performance can be portrayed through two major measurements: throughput and latency. Let’s examine each of them.

Throughput

The throughput of Juniper QFX5100 switches will vary depending on the system mode in which the device is operating. The fully subscribed (default) mode has an over-subscription of 1:1 and doesn’t have any loss in traffic when all of the ports are operating at line rate. All of the other modes will have some level of I/O and core bandwidth over-subscription (refer to Table 3-3).

The key questions are the following:

§ What conditions cause over-subscription?

§ What packet sizes are affected?

§ How much traffic is dropped?

To over-subscribe the switch, it must be currently processing more traffic than the core bandwidth can handle, which is 960 Gbps. The best way to answer the rest of the questions is with the graph shown in Figure 3-4.

Figure 3-4. 1,280 Gbps throughput versus packet size

There’s a lot happening in the graph in Figure 3-4. It can be summarized as the following:

§ Packet sizes 64B through 86B vary in performance from 78 to 99 percent of line rate.

§ Packet sizes 87B through 144B offer line-rate performance.

§ Packet sizes 145B through 193B vary in performance from 77 to 99 percent of line rate.

§ Packet sizes 194B through 12,288B offer line-rate performance.

In summary, only packet sizes from 64B through 86B and from 145B through 193B see varying traffic loss when there is congestion on the switch. Another way to view it: out of the possible packet sizes between 64B and 12,288B, only about 0.6 percent suffer traffic loss. If you want to be pedantic and assume only the packet sizes up to 1,514B, only about 5 percent suffer traffic loss.
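The arithmetic behind those percentages can be checked directly; this sketch simply counts the affected ranges listed above:

```python
# Count the packet sizes subject to loss: 64-86B and 145-193B inclusive.
affected = set(range(64, 87)) | set(range(145, 194))
all_sizes = range(64, 12289)  # every size from 64B to the 12,288B maximum
standard = range(64, 1515)    # every size up to a standard 1,514B frame

print(len(affected))                             # prints 72
print(round(len(affected) / len(all_sizes), 4))  # prints 0.0059 (about 0.6%)
print(round(len(affected) / len(standard), 4))   # prints 0.0496 (about 5%)
```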

The reason the chipset can forward some packet sizes at line rate and not others comes down to the stepping in clock frequency required to process different sizes. Packet sizes ranging from 64B to 86B and from 145B to 193B require a higher frequency to process than other sizes, and are therefore subject to a varying amount of traffic loss during switch congestion.

WARNING

Keep in mind that traffic loss is only experienced in system modes other than fully subscribed/default.

Latency

Latency is the measurement of time between when a packet enters the switch on an ingress port and when it leaves the switch on an egress port, as illustrated in Figure 3-5.

Figure 3-5. End-to-end switch latency

With modern hardware such as the Juniper QFX5100 series, the amount of latency continues to decrease. In the vast majority of use cases, latency isn’t a major concern; however, there exists a subsegment in the financial-services markets and high-performance computing that specialize in low latency.

Cut-through and store-and-forward

There are two modes that greatly affect the switch’s overall latency: cut-through and store-and-forward. Each mode is purposely designed to excel in specific use cases.

Cut-Through

A switch that operates in a cut-through mode will begin to transmit the packet on the egress port at the same time it is receiving it on the ingress port. The benefit here is a reduction in overall latency within the switch because there’s no delay in transmitting the packet to its destination. The drawback is that cut-through mode has no way of discarding a corrupt packet, because the majority of the packet will already be transmitted on the egress port before the FCS is received on the ingress port. In larger networks or with multicast, cut-through mode can cause a lot of unnecessary processing in upstream devices when replicating corrupt packets.

Store-and-Forward

The default setting for the Juniper QFX5100 family is store-and-forward; this mode is how most switches have operated for a long time. The ingress packet must be fully received before the switch will transmit the packet on the egress port. The advantage is that the switch can perform error checks on the packet and discard it if it’s corrupt. The drawback is that store-and-forward requires a buffer within the switch to store the packet while it’s being received; this increases the cost and overall latency.

Unless you’re building a financial trading platform or high-performance computing environment, the default mode of store-and-forward will generally meet and exceed all of your latency requirements.

Conditions for cut-through

By default, the Juniper QFX5100 family operates in store-and-forward mode. To enable cut-through mode, you must issue and commit the following command:

[edit]

dhanks@QFX5100# set forwarding-options cut-through

Don’t be fooled: this command is just the first step to enabling cut-through mode. There are many conditions that a packet must meet in order to be eligible for cut-through; otherwise, it defaults back to store-and-forward. This decision is made on a per-packet basis, although cut-through is a system-wide setting. The first requirement is that the ingress and egress interface speeds must match, as presented in Table 3-4.

Ingress port | Egress port | Cut-through (CT) system mode | Store-and-forward (SF) system mode
10GbE        | 10GbE       | CT                           | SF
40GbE        | 40GbE       | CT                           | SF
10GbE        | 40GbE       | SF                           | SF
40GbE        | 10GbE       | SF                           | SF
1GbE         | 1GbE        | CT                           | SF
1GbE         | 10GbE       | SF                           | SF
10GbE        | 1GbE        | SF                           | SF

Table 3-4. Forwarding modes based on port speed and system mode

For example, if the Juniper QFX5100 switch were configured to be in cut-through mode, but a packet arrived on a 40GbE ingress interface and was transmitted on a 10GbE egress interface, that packet would not be eligible for cut-through mode and would default back to store-and-forward.

If the packet meets the conditions specified in Table 3-4, it will be subject to additional conditions before being forwarded via cut-through.

§ The packet must not be destined to the routing engine.

§ The egress port must have an empty queue with no packets waiting to be transmitted.

§ The egress port must not have any shapers or rate limiting applied.

§ The ingress port must be in-profile if it’s subject to rate limiting.

§ For multicast packets, each egress port must meet all conditions. If one egress port out of the set doesn’t meet the conditions, all multicast packets will be transmitted via store-and-forward; the chipset doesn’t support partial cut-through packets.
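The decision process above can be sketched as a simple per-packet function. This is illustrative only; the parameter names are invented for the example and do not correspond to Junos or Broadcom SDK interfaces:

```python
# Per-packet cut-through eligibility check, following Table 3-4 and the
# conditions listed above. All field names are illustrative.

def forwarding_mode(cut_through_enabled, ingress_speed_gbps, egress_speed_gbps,
                    to_routing_engine=False, egress_queue_empty=True,
                    egress_shaped=False, ingress_in_profile=True):
    if not cut_through_enabled:
        return "store-and-forward"  # system-wide default
    if ingress_speed_gbps != egress_speed_gbps:
        return "store-and-forward"  # speeds must match (Table 3-4)
    if to_routing_engine:
        return "store-and-forward"  # host-bound traffic is excluded
    if not egress_queue_empty or egress_shaped or not ingress_in_profile:
        return "store-and-forward"  # congestion, shaping, or rate limiting
    return "cut-through"

print(forwarding_mode(True, 40, 40))  # prints cut-through
print(forwarding_mode(True, 40, 10))  # prints store-and-forward
```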

To further understand the benefits of improved latency of cut-through mode, let’s compare it directly to store-and-forward with different sized packets up to 1,514 bytes, as illustrated in Figure 3-6.

The cut-through latency increases slowly from 64 bytes up to about 600 bytes and then remains steady at about 0.73 µs. On the other hand, store-and-forward latency is fairly linear from 64 bytes all the way to 1,514 bytes. In summary, both cut-through and store-and-forward exhibit less than 1 µs of latency for packets up to 1,514 bytes.

Let’s take a look at what happens when you enable jumbo frames. Figure 3-7 starts in the same place at 64 bytes but goes all the way up to 9,216 bytes.

Figure 3-6. Approximate latency for the BCM56850 chipset using 40GbE with frames up to 1,514 bytes

In summary, store-and-forward latency continues to grow fairly linearly from 64 bytes to 9,216 bytes; however, cut-through flattens out at approximately 0.73 µs from 600 bytes to 9,216 bytes. Store-and-forward follows a linear progression simply because its latency is a function of packet size: the larger the packet, the longer it takes to receive and buffer it before it can be transmitted. Cut-through stays flat because it begins transmitting the packet as soon as it’s received; thus, the packet size is never a factor in the overall latency.
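A first-order model captures both curves. This is not measured data: the 0.73 µs cut-through floor is read off the curves described above, and the fixed pipeline delay is an assumed constant chosen only to make the shape clear:

```python
# Illustrative latency model: store-and-forward latency grows with frame size
# because the whole frame must be buffered; cut-through is roughly flat.
# The pipeline delay (0.6 us) and cut-through floor (0.73 us) are assumptions.

def sf_latency_us(frame_bytes, link_gbps=40, pipeline_us=0.6):
    serialization_us = frame_bytes * 8 / (link_gbps * 1000)  # Gbps -> bits/us
    return pipeline_us + serialization_us

def ct_latency_us(frame_bytes):
    return 0.73  # approximately flat regardless of frame size

print(round(sf_latency_us(1514), 2))  # prints 0.9
print(round(sf_latency_us(9216), 2))  # prints 2.44
print(ct_latency_us(9216))            # prints 0.73
```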

WARNING

These graphs represent approximate latency on the BCM56850 chipset using 40GbE interfaces. Actual values will vary based on firmware, port speed, and other factors. If latency is critical to your environment, you need to evaluate the latency in your lab under controlled conditions.

Figure 3-7. Approximate latency for the BCM56850 chipset using 40GbE with jumbo frames

Scale

Scale can be expressed many different ways. The most common methods are the configuration maximums of the control plane and data plane. It’s also common to peg the scaling maximums to the OSI model, for example Layer 2 versus Layer 3. The Juniper QFX5100 series is unique in the sense that you can adjust the balance of Layer 2 versus Layer 3 data plane scale. Let’s dive into the details.

Unified Forwarding Table

The Juniper QFX5100 series has the unique ability to use a customized forwarding table. The forwarding table is broken into three major tables:

MAC Address Table

In a Layer 2 environment, the switch will learn new MAC addresses and it stores them in the MAC address table.

Layer 3 Host Table

In a Layer 2 and Layer 3 environment, the switch will also learn which IP addresses are mapped to which MAC addresses; these key-value pairs are stored in the Layer 3 host table.

Longest Prefix Match (LPM) Table

In a Layer 3 environment, the switch will have a routing table, and the most specific route will have an entry in the forwarding table to associate a prefix/netmask to a next-hop; this is stored in the LPM table. The one caveat is that all IPv4 /32 prefixes and IPv6 /128 prefixes are stored in the Layer 3 host table.

Traditionally, these tables have been statically defined by the vendor and support only a fixed number of entries, which ultimately limits the roles in an architecture that a traditional switch can fill.

The Unified Forwarding Table (UFT) in the Juniper QFX5100 family allows you to dynamically move around forwarding table resources so that you can tailor the switch to your network. In summary, the UFT offers five preconfigured profiles from heavy Layer 2 to heavy Layer 3 allocations, as shown in Table 3-5.

Profile          | MAC addresses | L3 hosts | LPM
l2-profile-one   | 288,000       | 16,000   | 16,000
l2-profile-two   | 224,000       | 56,000   | 16,000
l2-profile-three | 160,000       | 88,000   | 16,000
l3-profile       | 96,000        | 120,000  | 16,000
lpm-profile      | 32,000        | 16,000   | 128,000

Table 3-5. The Juniper QFX5100 UFT profiles

The UFT is a very powerful tool that completely changes the personality of the switch, allowing it to move freely throughout the network architecture. The profiles progress linearly toward a larger Layer 3 host table, as depicted in Figure 3-8.

Using a MAC-heavy profile makes it possible for Juniper QFX5100 switches to handle a lot of Layer 2 traffic, such as a traditional virtualization environment with servers hosting a large number of VMs. The last profile gives you the ability to operate Juniper QFX5100 devices in the core of a network architecture or use them as a building block in a large Clos IP fabric; this is because an IP fabric by nature will have a larger routing table than MAC address table.
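Table 3-5 lends itself to a small capacity-planning helper. The pick_profile function below is not a Junos tool, just a sketch that returns the first profile (in Table 3-5 order) satisfying a set of scale requirements:

```python
# UFT profile limits from Table 3-5.
UFT_PROFILES = {
    "l2-profile-one":   {"mac": 288_000, "l3_hosts": 16_000,  "lpm": 16_000},
    "l2-profile-two":   {"mac": 224_000, "l3_hosts": 56_000,  "lpm": 16_000},
    "l2-profile-three": {"mac": 160_000, "l3_hosts": 88_000,  "lpm": 16_000},
    "l3-profile":       {"mac": 96_000,  "l3_hosts": 120_000, "lpm": 16_000},
    "lpm-profile":      {"mac": 32_000,  "l3_hosts": 16_000,  "lpm": 128_000},
}

def pick_profile(mac, l3_hosts, lpm):
    """Return the first profile that fits the requirements, or None."""
    for name, limits in UFT_PROFILES.items():
        if (mac <= limits["mac"] and l3_hosts <= limits["l3_hosts"]
                and lpm <= limits["lpm"]):
            return name
    return None  # no single profile fits

# A MAC-heavy leaf hosting many VMs
print(pick_profile(mac=200_000, l3_hosts=40_000, lpm=1_000))  # prints l2-profile-two
# A route-heavy spine in a Clos IP fabric
print(pick_profile(mac=10_000, l3_hosts=5_000, lpm=100_000))  # prints lpm-profile
```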

Figure 3-8. Juniper QFX5100 series UFT

To check the current forwarding mode of the Juniper QFX5100 switch, use the show chassis forwarding-options command:

dhanks@qfx5100> show chassis forwarding-options

--------------------------------------------------------------------------

Current UFT Configuration:

l2-profile-three

You can see from the preceding output that this particular Juniper QFX5100 switch is currently in l2-profile-three mode, which gives the forwarding table 160K MAC addresses, 88K L3 hosts, and 16K LPM entries. The forwarding table can be changed by using the following command:

[edit]

dhanks@qfx5100# set chassis forwarding-options ?

Possible completions:

+ apply-groups Groups from which to inherit configuration data

+ apply-groups-except Don't inherit configuration data from these groups

l2-profile-one MAC: 288K L3-host: 16K LPM: 16K. This will restart PFE

l2-profile-three MAC: 160K L3-host: 144K LPM: 16K. This will restart PFE

l2-profile-two MAC: 224K L3-host: 80K LPM: 16K. This will restart PFE

l3-profile MAC: 96K L3-host: 208K LPM: 16K. This will restart PFE

lpm-profile MAC: 32K L3-host: 16K LPM: 128K. This will restart PFE

WARNING

Be mindful that when you change the UFT profile and commit, the BCM56850 chipset will need to perform a warm reboot, and there will be temporary traffic loss.

Hashing

The Juniper QFX5100 uses a sophisticated hashing algorithm called RTAG7 to determine the next-hop interface for Equal-Cost Multipath (ECMP) routing and Link Aggregation (LAG). The following packet fields are taken into account when determining the next-hop interface:

§ Source MAC address

§ Destination MAC address

§ Ethernet type

§ VLAN ID

§ Source IP address

§ Destination IP address

§ IPv4 protocol or IPv6 next header

§ Layer 4 source port

§ Layer 4 destination port

§ MPLS label

There are also two additional fields that are used to calculate the hash that are internal to the system:

§ Source device ID

§ Source port ID
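The effect of hashing over these fields can be illustrated with a stand-in hash. RTAG7 itself is internal to the Broadcom chipset, so the sketch below uses zlib.crc32 purely to demonstrate the key property: the same flow always selects the same next-hop, while different flows spread across the members:

```python
import zlib

def pick_next_hop(flow_fields, next_hops):
    # Hash the concatenated flow fields and map the result onto a member.
    # zlib.crc32 is a stand-in for the chipset's RTAG7 hash.
    key = "|".join(str(f) for f in flow_fields).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]

paths = ["et-0/0/0", "et-0/0/1", "et-0/0/2", "et-0/0/3"]  # example ECMP set
flow = ("00:11:22:33:44:55", "66:77:88:99:aa:bb",  # src/dst MAC
        "10.0.0.1", "10.0.0.2",                    # src/dst IP
        6, 49152, 443)                             # protocol, L4 ports

# The same flow always hashes to the same member of the ECMP set.
assert pick_next_hop(flow, paths) == pick_next_hop(flow, paths)
```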

The following types of protocols are supported for ECMP on the Juniper QFX5100 as of Junos 13.2X51-D20.2:

§ IPv4

§ IPv6

§ MPLS

§ MAC-in-MAC

Note that additional protocols can be supported with a new software release; please check the release notes for Junos going forward.

NOTE

The hash algorithms for ECMP and LAG use the same packet fields as those just listed, but the internal hash index is calculated differently for each. This method avoids traffic polarization when a LAG bundle is part of an ECMP next-hop.

Resilient Hashing

One of the challenges in the data center when building IP fabrics with stateful devices such as firewalls is minimizing the number of next-hop changes during link failures. For example, the Juniper QFX5100 will perform standard RTAG7 hashing on all ingress flows and select an egress next-hop as dictated by the hashing algorithm. If a firewall were to fail, the standard RTAG7 hash results on the QFX5100 switch would change, and both new and existing flows would be assigned new egress next-hops. The end result is that existing flows would be hashed to a new firewall. Because the new firewall doesn’t have a session entry for the rerouted flow, it would simply discard the traffic, as shown in Figure 3-9.


Figure 3-9. Resilient hashing overview
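The failure mode above can be demonstrated with a short Python sketch. This is a toy model, not the switch's actual implementation: CRC32 stands in for the hash, and the firewall names and flow strings are illustrative. With plain hash-mod-N selection, removing one member changes N, so flows that never touched the failed firewall can still be remapped.

```python
import zlib

def modulo_member(flow: str, members: list) -> str:
    # Plain hash-mod-N selection, a stand-in for standard hashing.
    return members[zlib.crc32(flow.encode()) % len(members)]

firewalls = ["fw1", "fw2", "fw3", "fw4"]
flows = [f"10.0.0.{i}->192.168.0.1:443" for i in range(100)]
before = {f: modulo_member(f, firewalls) for f in flows}

# fw3 fails: N drops from 4 to 3 and the modulo shifts, so even flows
# that were never on fw3 can land on a different firewall.
survivors = ["fw1", "fw2", "fw4"]
after = {f: modulo_member(f, survivors) for f in flows}
moved = [f for f in flows if before[f] != "fw3" and before[f] != after[f]]
# 'moved' is non-empty: those flows now hit a firewall that has no
# session state for them, and the firewall discards the traffic.
```

In practice roughly two-thirds of the surviving flows get remapped by a naive modulo, which is exactly the behavior resilient hashing is designed to prevent.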

The Juniper QFX5100 supports a new type of hashing called resilient hashing that minimizes the number of next-hop changes during link failures. If a firewall were to fail, the Juniper QFX5100 would keep the existing flows mapped to their existing egress next-hops. The end result is that when a firewall fails, all of the other flows continue to flow through their existing firewalls without impact.

The Juniper QFX5100 series also supports resilient hashing on LAG interfaces. In summary, resilient hashing supports both Layer 3 ECMP and LAG.
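The idea behind resilient hashing can be sketched with a bucket table, again as a toy Python model rather than the switch's actual hardware implementation (CRC32 and the member names are illustrative). Flows hash into a fixed number of buckets, and each bucket points at a member; when a member fails, only that member's buckets are reassigned, so every other flow keeps its existing next-hop.

```python
import zlib

class ResilientHashTable:
    """Toy sketch of resilient hashing: the bucket count stays fixed,
    so a member failure disturbs only the failed member's buckets."""

    def __init__(self, members, num_buckets=64):
        # Spread the members round-robin across a fixed bucket table.
        self.buckets = [members[i % len(members)] for i in range(num_buckets)]

    def lookup(self, flow: str) -> str:
        # The modulo base never changes, so a flow's bucket is stable.
        return self.buckets[zlib.crc32(flow.encode()) % len(self.buckets)]

    def fail_member(self, failed: str) -> None:
        survivors = sorted(set(self.buckets) - {failed})
        for i, m in enumerate(self.buckets):
            if m == failed:
                # Reassign only the failed member's buckets.
                self.buckets[i] = survivors[i % len(survivors)]

table = ResilientHashTable(["fw1", "fw2", "fw3", "fw4"])
flows = [f"flow-{i}" for i in range(100)]
before = {f: table.lookup(f) for f in flows}
table.fail_member("fw3")
after = {f: table.lookup(f) for f in flows}
# Only flows that were mapped to fw3 move; all others are undisturbed.
moved = [f for f in flows if before[f] != after[f]]
```

Contrast this with the naive modulo approach: here the stateful firewalls that stayed up continue to see exactly the flows they already hold sessions for.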

Layer 2 LAG

To enable resilient hashing for Layer 2 LAG members, use the following command (replace ae0 with the intended interface name for your environment):

# set interfaces ae0 aggregated-ether-options resilient-hash

Layer 3 ECMP

To enable resilient hashing for Layer 3 ECMP, use the following command:

# set forwarding-options enhanced-hash-key ecmp-resilient-hash

Configuration Maximums

The Juniper QFX5100 has a set of configuration maximums that you need to be aware of as you design your network. The Juniper QFX5100 should work just fine in the majority of use cases, but there could be instances for which you might need more scale. Use Table 3-6 as a reference.

Key                            Value
MAC addresses                  288 K (UFT l2-profile-one)
ARP entries                    48 K
Jumbo frame size               9,216 bytes
IPv4 unicast routes            128 K prefixes, 208 K host routes
IPv4 multicast routes          104 K
IPv6 unicast routes            64 K
IPv6 multicast routes          52 K
VLAN IDs                       4,094
FCoE VLANs                     4,094
Link aggregation groups        128
Members per LAG                32
Firewall filters               4 K
ECMP                           64
MSTP instances                 64
VSTP instances                 253
Mirroring destination ports    4
Mirroring sessions             4
Mirroring destination VLANs    4

Table 3-6. QFX5100 family configuration maximums

Some configuration maximums, such as the UFT and MAC address limits, are pinned to the BCM56850 chipset and can never be increased. However, other configuration maximums, such as ECMP, link aggregation groups, and STP instances, can be increased over time with Junos software updates.

Summary

This chapter covered many of the design considerations that you must take into account before looking at the scale of each role in the architecture. These design considerations include compute virtualization in the data center and an overlay architecture. Moving to an overlay architecture in the data center changes many of the traditional scaling requirements with which you are familiar.

The Juniper QFX5100-24Q has four different system modes for handling over-subscription, providing a customized personality depending on the use case. The system modes are:

§ Fully subscribed mode (default)

§ 104-port mode

§ QIC mode

§ Flexible QIC mode

Each of the system modes impacts how the I/O and core bandwidth are handled, ultimately changing the throughput characteristics of the switch.

The Juniper QFX5100 chipset also has a next-generation UFT with which you can choose one of five preconfigured profiles, from Layer 2 heavy to Layer 3 heavy; this gives you the freedom to place the Juniper QFX5100 switch anywhere in your network and fine-tune the logical scale to match its role in the network.

Many factors impact the latency of a network switch. The Juniper QFX5100 family offers two forwarding modes: cut-through and store-and-forward. Cut-through gives you the lowest possible latency at the expense of forwarding corrupt frames. Store-and-forward has slightly higher latency, but completely buffers the packet and is able to discard corrupt packets.

In summary, the Juniper QFX5100 family gives you the power of options. When trying to solve complicated problems, the easiest method is to break them down into simple building blocks. The more options that are available to you, the greater your chances are of executing a successful data center strategy and architecture. Let’s review the options that the Juniper QFX5100 series afforded you in this chapter:

§ Traditional IP network versus overlay architecture

§ VMware NSX versus Juniper Contrail

§ Four system modes to fine-tune the over-subscription in the data plane

§ Five profiles to fine-tune the logical scaling in the data plane

§ Cut-through mode versus store-and-forward mode

Juniper QFX5100 switches are very exciting, and, as of this writing, represent Juniper’s best switches ever created. As you work your way through this book, think about all of the different places in your network where the Juniper QFX5100 series of switches could be used to make it better.

Chapter Review Questions

1. Which overlay control plane protocols does the Juniper QFX5100 family support?

1. Open vSwitch Database

2. Device Management Interface

3. All of the above

4. None of the above

2. How does the Juniper QFX5100 series support bare-metal servers in an overlay architecture?

1. Forward all traffic from the bare-metal server to the SDN controller

2. Forward all traffic from the bare-metal server to the closest hypervisor VTEP

3. Handle all encapsulation and forwarding in the switch’s hardware

4. Implement a VTEP inside of the switch with a control plane protocol

3. What’s the core bandwidth of the BCM56850 chipset?

1. 1,280 Gbps

2. 960 Gbps

3. 720 Gbps

4. 480 Gbps

4. How many system modes does the Juniper QFX5100-24Q have?

1. 2

2. 3

3. 4

4. 5

5. What’s the I/O bandwidth to core bandwidth ratio of the Juniper QFX5100-24Q when using 32 40GbE interfaces?

1. 1:1

2. 13:12

3. 4:3

4. 5:4

6. How many preconfigured profiles are in the Juniper QFX5100 UFT?

1. 1

2. 3

3. 5

4. 7

7. What’s the maximum number of MAC addresses in a Juniper QFX5100 switch?

1. 128K

2. 224K

3. 256K

4. 288K

8. What’s the maximum size of an Ethernet frame in the Juniper QFX5100 series?

1. 2,048 bytes

2. 4,000 bytes

3. 8,192 bytes

4. 9,216 bytes

Chapter Review Answers

1. Answer: C.

Juniper QFX5100 series switches support both OVSDB and DMI control plane protocols.

2. Answer: C and D.

Trick question. The Juniper QFX5100 family handles the data plane encapsulation in hardware and creates a VTEP inside of the switch for MAC address learning and service provisioning.

3. Answer: B.

Juniper QFX5100 switches use the BCM56850 chipset, which has a core bandwidth of 960 Gbps and I/O bandwidth of 1,280 Gbps.

4. Answer: C.

The Juniper QFX5100-24Q has four system modes: (1) fully subscribed, (2) 104 port, (3) QIC, and (4) flexible QIC.

5. Answer: C.

Thirty-two 40GbE interfaces require 1,280 Gbps of I/O bandwidth, which creates a 4:3 ratio of I/O bandwidth to core bandwidth.

6. Answer: C.

Juniper QFX5100 switches support five UFT profiles: (1) l2-profile-one, (2) l2-profile-two, (3) l2-profile-three, (4) l3-profile, and (5) lpm-profile.

7. Answer: D.

The Juniper QFX5100 family supports up to 288K MAC addresses in UFT l2-profile-one.

8. Answer: D.

The Juniper QFX5100 series supports jumbo frames up to 9,216 bytes.