Layer 2 network technology is extending beyond its traditional local area implementation and finding wider acceptance in providers’ metropolitan area networks and large-scale cloud data center networks. This is mainly due to its plug-and-play capability and native mobility support. Much effort has gone into increasing the bisection bandwidth of a layer 2 network, which has been constrained by the spanning tree protocol that a layer 2 network uses to prevent loops. The recent trend is to incorporate layer 3’s routing approach into a layer 2 network so that multiple paths can be used for forwarding traffic between any source-destination (S-D) node pair. ECMP (Equal Cost Multi-Path) is one such example. However, ECMP may still be limited in generating multiple paths due to its shortest path (lowest cost) requirement. In this paper, we consider a non-shortest-path routing approach, called EPMP (Equal Preference Multi-Path), that can generate more paths than ECMP. EPMP is based on an ordered semi-group algebra. In EPMP routing, paths that differ in traditionally defined costs, such as hops or bandwidth, can be made equally preferred and thus become candidate paths. We found that, in comparison with ECMP, EPMP routing not only generates more paths and provides higher bisection bandwidth, but also allows bottleneck links in a hierarchical network to be identified when different traffic patterns are applied. EPMP is also versatile in that it can use various path preference calculations to control the number and the length of paths, making it flexible (like policy-based routing) yet objective (like shortest path first routing) in calculating preferred paths.
A layer 2 network is, by definition, a network whose protocol data units (or frames) can be transported from source to destination by a data-link layer protocol (the second layer in the Open System Interconnection (OSI) layering model), without the need for a network layer protocol (the third layer in the OSI model). Ethernet has been the most popular layer 2 network technology, found not only in LANs (Local Area Networks), like campus networks and office networks, but also as the interconnection technology for metropolitan area networks. Ethernet’s popularity comes from its “plug and play” capability, which requires a minimal amount of configuration. In addition, the relatively simple transport functions provided by Ethernet have made it a commoditized, low-cost technology and the switching product of choice. However, as Ethernet keeps growing, with more end hosts from different organizations/tenants in a network, it becomes necessary to improve its scalability and traffic isolation capabilities.
IEEE 802.1Q [
Although these standards have increased Ethernet scalability, they still rely on the traditional network wide controlled flooding of frames for host discovery. The Spanning Tree Protocol (STP) and its subsequent variants (e.g., RSTP, MSTP) have long been used in preventing flooding loops in Ethernet networks. However, STP has also been known for several disadvantages. For example, it disables some links for preventing loops, thus does not allow source-destination traffic to take multiple paths; with the tree structure, its paths are inefficient for paths that do not end in the root node; and it converges slowly to a new solution if a node or link fails. Such characteristics are rather undesirable for provider networks or data center networks, where both ample bisection bandwidth and fast recovery are critical in their operations.
The routing operation on which layer 3 datagram forwarding relies then comes to the rescue. Layer 3 routers run routing protocols and exchange topological information with neighbour routers. A routing protocol contains a routing algorithm, which calculates a suitable path based on the collected topological information and stores the result in the routing table (or forwarding table). A layer 3 datagram is delivered to its destination based on the routing table. The well-known Dijkstra shortest path first (SPF) algorithm in the link state routing protocols (OSPF [
In this paper, we concentrate on multipath routing in layer 2 networks. SPB, TRILL, and most existing data center network routing schemes assume ECMP for multipath frame/packet forwarding. However, ECMP routing only allows the use of paths with the same cost, and this cost is usually expressed by a metric value (e.g., hops). In many network topologies, SPF-based ECMP does not necessarily provide enough path diversity. Non-SPF routing can open up a new horizon where bisection bandwidth can be increased at a smaller cost and alternative topology designs can be explored for different applications or traffic patterns. In addition, given a set of multipaths to use, how these paths are allocated to different clients/services is still an open problem. In SPB, there is a set of ECT-algorithms [
Non-SPF path discovery and the selection of feasible paths in a layer 2 network are the main subjects of this paper. Our study starts with an ordered semi-group algebra along with simple destination-based hop-by-hop forwarding. In a non-SPF routing algorithm, paths are selected based on an attribute value that grades their preference. Paths with equal preference values might differ in the number of hops, bandwidth, etc., thus resulting in non-equal-cost multipaths. Path values are calculated from link attributes that can be seen as policies characterizing the forwarding behaviour. Amaral et al. proposed a multipath policy based routing in [
Non-SPF policy based routing has long been used in inter-AS routing protocols (BGP). Its concept or model was not used in calculating multipaths until the past two or three years. There is still work to be done on the policy based multipath routing model, such as the trade-offs between the flexibility of the model, the number of paths that can be used simultaneously, and the network restrictions that must be applied. This paper investigates a non-SPF routing algorithm that brings together the flexibility of policy based routing and the objectiveness of shortest path first routing.
The rest of this paper is organized as follows. In Section 2, the routing algebra fundamentals and link/path attribute assignment are introduced. Based on the preference algebra, Section 3 defines the equal preference multi-path (EPMP) algorithm. Two different flavors of EPMP, EPMP-NH and EPMP-ES, are explained. We use two hierarchical network topologies to evaluate the EPMP routing algorithms in comparison with the ECMP algorithm in Section 5. Path numbers and bisection bandwidth are the main performance measures. Finally, the conclusions are given in Section 13.
Routing in a computer network to most people means finding a shortest path. Indeed, shortest path routing is used in many routing protocols, e.g., OSPF, RIP, and IS-IS. The term shortest path means a path with the lowest cost, where cost is often a measurable numerical metric such as hop count, path bandwidth, path reliability, latency, and other metrics. Solutions for the shortest path problem are well known [
The Internet is a huge interconnection of autonomous systems (AS) or domains, each one independently administered. Shortest path routing is run within each domain, which is so-called intra-domain routing. For inter-domain routing, each AS might have its own view of which path should be used, and the path is selected based on policies. Unlike a simple numerical metric, these policies reflect a wider set of characteristics with semantically rich concepts, defining the nature of the paths and their relative preference. Given these characteristics, policy routing provides great flexibility in defining route preferences [
Since a policy characteristic is not necessarily a numerical value, the term best path is used instead of shortest path. With best path, the preference of a path depends on characteristics such as relationships with other nodes, which define a hierarchical order amongst the paths. However, current policy routing models cannot take full advantage of the multiplicity of connections to a given destination and are single path in nature [
With policy routing two paths that are very different according to traditional numeric metrics can have the same policy characteristics and therefore have the same preference and are considered equally good. Multipaths become more available for policy based routing. Single-path routing protocols have a critical interval after a failure until the algorithm converges to a new solution. On the contrary, in the multipath case in the event of a failure, equally good alternative paths might still be available, therefore reducing the importance of the re-convergence process. Having multiple paths also means that traffic engineering can be achieved by carefully-designed distribution of traffic among those paths, instead of having to play with network metrics to obtain the desired result via routing state manipulations [
Let a network be represented by a directed graph
When preference is used as the path attribute, the determination of path attributes can be modelled by a preference algebra. In the preference algebra, a set
As a contrast, the shortest path algorithm can be modelled by (a) setting
Consider metropolitan area provider networks or data center networks. These networks typically have a hierarchical structure. Metropolitan area networks have at least two levels: an outer/lower level of access switches, to which end hosts are connected, and an inner/upper level where core switches form the backbone network. For data center networks, the fat-tree network has at least three levels (or stages): the lowest is the edge level, the middle is the aggregation level, and the upper is the core level.
Given such hierarchical networks, it is natural to define three types of attributes in the set
• D: for links/edges in the downward direction of the hierarchy,
• U: for links/edges in the upward direction of the hierarchy,
• S: for links/edges that connect switches at the same level of the hierarchy.
Two additional attributes are needed to describe trivial paths and invalid/nonexistent paths.
• 1: for trivial paths,
• 0: for invalid/nonexistent paths.
Therefore, we have
As to the path attribute resulting from the
meaning downward path is preferred over same level path, and same level path is preferred over upward path.
With the above rules, there however still exists a looping scenario. If the two opposite links connecting two same-level nodes are both labelled S as shown in
One solution is to distinguish the two opposite same-level links as R and L, with R being preferred over L. The revised routing algebra is now:
with the composition operation shown in
| | 1 | D | S | U | 0 |
|---|---|---|---|---|---|
| 1 | 1 | D | S | U | 0 |
| D | D | D | 0 | 0 | 0 |
| S | S | S | S | 0 | 0 |
| U | U | U | U | U | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 |
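As an illustration of how the D/S/U composition table can be applied, the following minimal Python sketch labels links from node levels and folds a path into a single attribute. It reads the table with the row as the first (prepended) link and the column as the remaining sub-path, and it ranks 1 as most preferred and 0 as least preferred; both are assumptions of this sketch. The names `link_label`, `path_attribute`, and the small topology are hypothetical.

```python
# Attributes of the D/S/U algebra, listed from most to least preferred.
# Placing "1" first and "0" last is an assumption of this sketch.
ATTRS = ["1", "D", "S", "U", "0"]
RANK = {a: i for i, a in enumerate(ATTRS)}

# COMPOSE[first][rest], transcribed row by row from the composition table.
_ROWS = {
    "1": "1DSU0",
    "D": "DD000",
    "S": "SSS00",
    "U": "UUUU0",
    "0": "00000",
}
COMPOSE = {r: dict(zip(ATTRS, cells)) for r, cells in _ROWS.items()}


def link_label(level, u, v):
    """Label the directed link u -> v from the hierarchy levels of its ends."""
    if level[v] < level[u]:
        return "D"
    if level[v] > level[u]:
        return "U"
    return "S"


def path_attribute(level, path):
    """Fold the links of `path` from the destination end back to the source."""
    attr = "1"  # trivial path at the destination
    for u, v in reversed(list(zip(path, path[1:]))):
        attr = COMPOSE[link_label(level, u, v)][attr]
    return attr


# Hypothetical three-level hierarchy: edge = 0, aggregation = 1, core = 2.
level = {"e1": 0, "e2": 0, "a1": 1, "a2": 1, "c1": 2}
up_down = path_attribute(level, ["e1", "a1", "c1", "a2", "e2"])  # links U, U, D, D
valley = path_attribute(level, ["a1", "e1", "a2"])               # links D, U
```

Under these assumptions the up-then-down path evaluates to U, while the down-then-up path evaluates to 0 and is therefore rejected.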
When opposite-direction links are distinguished with R and L, loops can be avoided. With the new labelling as shown in
There is another way of preventing loops from forming on same-level links. A same-level link attribute S1 (same-level once) can be defined so that such a link cannot co-exist with any other same-level link (including R and L) on the same path. Attribute S1 can also be used to prevent a same-level sub-path from growing too long. We will see its use later.
Equal preference multi-path calculation based on the above preference algebra can then be executed as follows [
where
| | 1 | D | R | L | U | 0 |
|---|---|---|---|---|---|---|
| 1 | 1 | D | R | L | U | 0 |
| D | D | D | 0 | 0 | 0 | 0 |
| R | R | R | R | 0 | 0 | 0 |
| L | L | L | L | 0 | 0 | |
| U | U | U | U | U | U | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 |
where
until it converges, we obtain attributes of all preferred paths from any source node to any destination node. During the process of matrix multiplication, it is also necessary to build the next-hop matrix that records all next-hop nodes to be used in constructing the preferred path. The routing can then be implemented as hop-by-hop routing by each node looking up its next-hop matrix for forwarding a packet (or frame). We call this method EPMP-NH (EPMP Next Hop).
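The iteration and the next-hop matrix can be sketched in Python as follows. This is a minimal illustration rather than the exact procedure of the paper: it assumes the product of two attributes is the composition operation, the sum is the choice of the more preferred attribute, and the next-hop sets are read off once the attribute matrix has converged. It uses the D/S/U algebra (with 1 ranked best and 0 worst, an assumption) on a hypothetical two-level topology built by the helper `leaf_spine`.

```python
ATTRS = ["1", "D", "S", "U", "0"]            # most to least preferred (assumed)
RANK = {a: i for i, a in enumerate(ATTRS)}
_ROWS = {"1": "1DSU0", "D": "DD000", "S": "SSS00", "U": "UUUU0", "0": "00000"}
COMPOSE = {r: dict(zip(ATTRS, cells)) for r, cells in _ROWS.items()}


def epmp_nh(A):
    """Iterate P <- A (x) P until it stops changing, then derive next hops.

    A[i][j] is the attribute of link i -> j ("1" on the diagonal, "0" if absent).
    """
    n = len(A)
    P = [row[:] for row in A]
    while True:
        Q = [[min((COMPOSE[A[i][k]][P[k][j]] for k in range(n)),
                  key=RANK.__getitem__)
              for j in range(n)] for i in range(n)]
        if Q == P:
            break
        P = Q
    # Next hops: neighbours k whose composed attribute equals the best one.
    NH = [[[k for k in range(n)
            if i != j and k != i and P[i][j] != "0"
            and COMPOSE[A[i][k]][P[k][j]] == P[i][j]]
           for j in range(n)] for i in range(n)]
    return P, NH


def leaf_spine(spines, edges):
    """Hypothetical two-level topology: every edge switch links to every spine."""
    n = spines + edges
    A = [["1" if i == j else "0" for j in range(n)] for i in range(n)]
    for s in range(spines):
        for e in range(spines, n):
            A[s][e] = "D"   # spine -> edge goes down
            A[e][s] = "U"   # edge -> spine goes up
    return A


# Nodes 0 and 1 are spines; nodes 2, 3 and 4 are edge switches.
P, NH = epmp_nh(leaf_spine(2, 3))
```

In this topology each edge switch reaches every other edge switch with attribute U through either spine, while the two spines have no valid path to each other because a down link followed by an up link composes to 0.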
As an example, consider a 5-node topology as shown in
The matrix multiplication converges at
We make two observations. First, EPMP-NH finds a total of 32 paths while ECMP finds a total of 24 paths. Among them, 21 paths are found by both. Second, there are three valid paths that EPMP-NH fails to find: path 3-0-1-2, path 3-2-1-0, and path 4-0-1-2.
To explain why these three paths are not found by EPMP-NH, we take path 3-0-1-2 as an example. From the next-hop matrix in (2), we know that for node 3 to reach node 2, node 3 can take node 0, 1, or 2 as the next hop. For node 0 to reach node 2, node 0 can only take node 2 as the next hop. Therefore the only path from node 3 to node 2 via node 0 is path 3-0-2, which has a path attribute of U. Path 3-0-1-2 is equally good because it also has a path attribute of U. It was not found because path 0-2 (attribute R) is preferred over path 0-1-2 (attribute U) during the first matrix multiplication. This example shows that EPMP-NH may not find all valid paths because certain information is overridden during the path composition process.
It is possible to find all valid EPMP paths by using a brute-force approach. By starting from the destination node and branching out backward towards the source nodes via adjacent links, it allows for all possible paths to be examined. In appending an adjacent link to a sub-path using the algebra such as defined in
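A minimal Python sketch of such a backward exhaustive search is given below. It is an illustration under stated assumptions, not the authors' implementation: it uses the D/S/U algebra with 1 ranked best and 0 worst, treats the valid paths of an S-D pair as the loop-free paths carrying the most preferred attribute found for that pair, and stops extending a sub-path as soon as its attribute becomes 0. The function name `epmp_es` and the five-switch topology are hypothetical.

```python
ATTRS = ["1", "D", "S", "U", "0"]            # most to least preferred (assumed)
RANK = {a: i for i, a in enumerate(ATTRS)}
_ROWS = {"1": "1DSU0", "D": "DD000", "S": "SSS00", "U": "UUUU0", "0": "00000"}
COMPOSE = {r: dict(zip(ATTRS, cells)) for r, cells in _ROWS.items()}


def epmp_es(links, dst):
    """Enumerate loop-free paths into `dst`, branching backward from it.

    links maps a directed link (u, v) to its attribute.
    Returns {src: (best_attribute, sorted list of paths with that attribute)}.
    """
    preds = {}
    for (u, v), label in links.items():
        preds.setdefault(v, []).append((u, label))

    found = {}
    stack = [((dst,), "1")]                  # trivial path at the destination
    while stack:
        path, attr = stack.pop()
        for u, label in preds.get(path[0], []):
            if u in path:                    # keep paths loop-free
                continue
            new_attr = COMPOSE[label][attr]  # prepend link u -> path[0]
            if new_attr == "0":              # invalid, and stays invalid if extended
                continue
            new_path = (u,) + path
            found.setdefault(u, []).append((new_attr, new_path))
            stack.append((new_path, new_attr))

    result = {}
    for src, candidates in found.items():
        top = min((a for a, _ in candidates), key=RANK.__getitem__)
        result[src] = (top, sorted(p for a, p in candidates if a == top))
    return result


# Hypothetical topology: spines s0, s1 joined by same-level links; edges e0..e2.
links = {("s0", "s1"): "S", ("s1", "s0"): "S"}
for e in ("e0", "e1", "e2"):
    for s in ("s0", "s1"):
        links[(e, s)] = "U"
        links[(s, e)] = "D"

paths_to_e0 = epmp_es(links, "e0")
```

In this example the search returns four equally preferred paths from e1 to e0, two of which cross the same-level link between the spines and are therefore longer than the two shortest paths.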
Note that with EPMP-ES, next-hop routing is not applicable. Instead, path routing should be used. SPB allows for path routing by utilizing the PATH_ID to B-VID mapping. The PATH_ID is a concatenation of IDs of switches along the path. An ECT-algorithm [
In contrast to single best path (e.g., SPF) routing, where the forwarding decision is unambiguous, multipath routing must decide which path is chosen for which packets/frames. To avoid out-of-order packet handling at the receiving end, it is common practice to assign all packets of the same flow to the same path. However, there is still the issue of assigning paths to different flows. For ECMP, with which hop-by-hop packet forwarding is the norm, [
For EPMP-ES, since it is to be used in path based routing, the routing decision is made at the source switch. The methods for next-hop forwarding (Round-Robin, Modulo-N Hash) can still be adopted for load balancing. However, different flows generate different traffic volumes, and different paths may have overlapping links. Round-Robin or Modulo-N Hash cannot guarantee that traffic will be evenly distributed over all paths/links. Instead, if the source switch has some information about the current loading of candidate paths, it can choose the least loaded path for a newly arrived flow.
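The least-loaded choice at the source switch can be sketched as follows. This is a minimal illustration that assumes the load of a path is the load of its most heavily used link; the text does not fix a particular path-load metric. The names `path_load`, `pick_least_loaded`, and `assign_flow`, and the flow rates, are hypothetical.

```python
def path_load(path, link_load):
    """Load of a path, taken here as the load of its busiest link (assumption)."""
    return max(link_load.get(link, 0) for link in zip(path, path[1:]))


def pick_least_loaded(paths, link_load):
    """Return the candidate path with the smallest load; ties go to the first."""
    return min(paths, key=lambda p: path_load(p, link_load))


def assign_flow(path, link_load, rate):
    """Account for a newly placed flow on every link of its path."""
    for link in zip(path, path[1:]):
        link_load[link] = link_load.get(link, 0) + rate


# Two hypothetical candidate paths from e1 to e0 and three flows (rates in percent
# of link capacity).
candidates = [("e1", "s0", "e0"), ("e1", "s1", "e0")]
link_load = {}
placements = []
for rate in (50, 30, 20):
    chosen = pick_least_loaded(candidates, link_load)
    assign_flow(chosen, link_load, rate)
    placements.append(chosen)
```

With these rates the first flow takes the path via s0 and the next two flows take the path via s1, leaving both paths at 50 percent, whereas Round-Robin would have left them at 70 and 30 percent.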
There are several studies on how link loading can be measured. For example, each switch continuously accumulates the byte count of packets that it has forwarded as a basis for calculating link loading [
We consider a distributed architecture where no centralized controller is available. When a switch in the core network detects that one of its links is congested (say, near 90 percent utilization), it broadcasts the link congestion information to all other switches in the network. All switches that receive this congestion notification can then invalidate the paths that contain the congested link. When the link becomes uncongested (say, below 80 percent utilization), another notification message is broadcast so that previously affected paths can be restored as valid.
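The two-threshold behaviour described above can be sketched in Python as follows. This is an illustrative sketch only: the class `CongestionMonitor`, the function `valid_paths`, and the message strings are hypothetical, and the 90/80 percent thresholds are the example values from the text rather than prescribed settings.

```python
class CongestionMonitor:
    """Tracks link state with separate set (high) and clear (low) thresholds."""

    def __init__(self, high=0.90, low=0.80):
        self.high = high
        self.low = low
        self.congested = set()

    def update(self, link, utilization):
        """Return the notification to broadcast, or None if the state is unchanged."""
        if link not in self.congested and utilization >= self.high:
            self.congested.add(link)
            return "congested"
        if link in self.congested and utilization < self.low:
            self.congested.discard(link)
            return "cleared"
        return None


def valid_paths(paths, congested):
    """Keep only the paths that avoid every congested link."""
    return [p for p in paths
            if not any(link in congested for link in zip(p, p[1:]))]


monitor = CongestionMonitor()
link = ("s0", "e0")
events = [monitor.update(link, u) for u in (0.50, 0.92, 0.85)]
candidates = [("e1", "s0", "e0"), ("e1", "s1", "e0")]
usable_while_congested = valid_paths(candidates, monitor.congested)
events.append(monitor.update(link, 0.79))
usable_after_clear = valid_paths(candidates, monitor.congested)
```

Keeping the clear threshold below the set threshold prevents a link whose utilization hovers around 90 percent from triggering a broadcast on every measurement.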
The congestion notification can be signalled via IS-IS extension [
We next use two topology examples to evaluate the Equal Preference Multi-Path routing as described in the previous section, in particular in comparison with ECMP.
The first network topology example is a frequently referenced constellation topology in Layer 2 network discussions. As shown in
The second network topology example is meant to simulate a small-scale data center network topology with symmetric structure. As shown in
For the constellation topology, EPMP-NH finds 2592 paths while ECMP finds 1760 paths. There are, however, a total of 3840 valid paths available for EPMP, all of which are found by EPMP-ES. Out of the 240 source-destination (S-D) pairs, 64 pairs have only a single path with ECMP, while all S-D pairs have at least 2 paths with EPMP-NH and EPMP-ES. For EPMP-ES, every S-D pair has at least 12 paths. For the cluster topology, EPMP-NH finds 16,704 paths while ECMP finds 9900 paths. There are, however, a total of 49,248 valid paths available for EPMP, all of which are found by EPMP-ES. Out of the 1260 S-D pairs, 648 pairs have only a single path with ECMP, while all S-D pairs have at least 2 paths with EPMP-NH and EPMP-ES. For EPMP-ES, every S-D pair has at least 12 paths.
SPF-based ECMP not only produces fewer paths, it also does not guarantee multiple paths between all S-D pairs.
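To put the reported path counts in perspective, the short Python sketch below derives the average number of paths per S-D pair and the ratio to ECMP from the figures quoted above. The variable and function names are hypothetical; only the counts come from the text.

```python
# Path counts and S-D pair counts as reported for the two topologies.
counts = {
    "constellation": {"pairs": 240, "ECMP": 1760, "EPMP-NH": 2592, "EPMP-ES": 3840},
    "cluster": {"pairs": 1260, "ECMP": 9900, "EPMP-NH": 16704, "EPMP-ES": 49248},
}


def summarize(c):
    """Map each algorithm to (average paths per S-D pair, ratio to ECMP)."""
    return {alg: (round(n / c["pairs"], 2), round(n / c["ECMP"], 2))
            for alg, n in c.items() if alg != "pairs"}


summary = {topology: summarize(c) for topology, c in counts.items()}
```

The ratios work out to about 1.47 (EPMP-NH) and 2.18 (EPMP-ES) for the constellation topology, and about 1.69 and 4.97 for the cluster topology.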
Throughput is usually the most important performance measure of a network. Network topology, routing algorithm, link capacity, etc. all influence the network throughput. In this section, we use the bisection bandwidth [
First, we assume that two hosts are connected to each edge switch. A fixed number of flows are generated at each source host and terminated at some destination host. Various flow patterns are considered [
• Stride (i): flows from host x are destined to host
• Random: flows from host x are destined randomly to one of other
• Hotspot(n): flows from host x are destined to one of n hosts, where
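The three patterns can be generated with a few lines of Python. Because the exact formulas are not reproduced here, the sketch below relies on commonly used definitions and should be read as an assumption: Stride(i) sends host x to host (x + i) mod N, Random picks uniformly among the other N - 1 hosts, and Hotspot(n) picks uniformly among a given list of n hot hosts (with n at least 2 so that a hot host can avoid itself). All function names are hypothetical.

```python
import random


def stride(num_hosts, i):
    """Assumed definition: host x sends to host (x + i) mod num_hosts."""
    return {x: (x + i) % num_hosts for x in range(num_hosts)}


def random_pattern(num_hosts, rng):
    """Each host sends to a uniformly chosen host other than itself."""
    return {x: rng.choice([h for h in range(num_hosts) if h != x])
            for x in range(num_hosts)}


def hotspot(num_hosts, hot_hosts, rng):
    """Each host sends to one of the hot hosts, never to itself.

    Which hosts are hot is an assumption left to the caller; at least two
    hot hosts are required.
    """
    return {x: rng.choice([h for h in hot_hosts if h != x])
            for x in range(num_hosts)}


rng = random.Random(1)          # seeded only to make the sketch repeatable
s4 = stride(8, 4)
rnd = random_pattern(8, rng)
hot = hotspot(8, [0, 1], rng)
```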
To evaluate the bisection bandwidth, we note that the concept of a non-blocking network needs to be clarified. In circuit switching fabric (or interconnection network) design, a non-blocking fabric is one that can always find a path connecting an idle input port and an idle output port. Cross-bar and Clos network [
The demand estimation in [
The flow demand matrix will go through the next round of modification by considering the incoming bandwidth contention at the destination hosts. The bandwidth demands on the receiving end of host 0 and host 1 both exceed the receiving interface capacity, one unit. Three flows compete for the bandwidth in both cases. Therefore flows that demand more than one third of the bandwidth will be shaped so that their demands are reduced to one third unit, again based on the TCP fairness mechanism. The results are shown with boldface in the flow matrix on the right of
The flow demand matrix then goes through a new round of modification by considering the outgoing bandwidth contention at the source hosts. For this example, since the flow from host 2 to host 0 is constrained by host 0's interface capacity, and the two flows from host 2 do not use up its outgoing interface capacity, the flow from host 2 to host 3 can then increase its demand to two-thirds of a unit, as shown in
The flow demand matrix iteratively checks the constraints set by the source host interface and the destination host interface until no more demand modification occurs. The converged flow demands are then used to test the bisection bandwidth of the network.
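The alternating source and destination passes can be sketched in Python as below. This is a minimal illustration of the iteration described above, not a reproduction of the cited estimator: the function `estimate_demands` is hypothetical, every interface is assumed to have capacity 1, and the four-host flow set is an assumed example chosen to be consistent with the description (three flows contending at host 0 and at host 1).

```python
from fractions import Fraction


def estimate_demands(flows, capacity=Fraction(1)):
    """flows is a list of (src, dst) pairs; returns one demand per flow."""
    demand = [Fraction(0)] * len(flows)
    limited = [False] * len(flows)       # True once capped by the receiver
    changed = True
    while changed:
        changed = False
        # Source pass: share what the capped flows leave over among the others.
        for src in sorted({s for s, _ in flows}):
            idx = [i for i, (s, _) in enumerate(flows) if s == src]
            free = [i for i in idx if not limited[i]]
            if not free:
                continue
            fixed = sum(demand[i] for i in idx if limited[i])
            share = (capacity - fixed) / len(free)
            for i in free:
                if demand[i] != share:
                    demand[i] = share
                    changed = True
        # Destination pass: cap the largest flows when the receiver is oversubscribed.
        for dst in sorted({d for _, d in flows}):
            idx = [i for i, (_, d) in enumerate(flows) if d == dst]
            if sum(demand[i] for i in idx) <= capacity:
                continue
            capped = set(idx)
            share = capacity / len(idx)
            while True:
                small = [i for i in capped if demand[i] < share]
                if not small:
                    break
                capped -= set(small)
                used = sum(demand[i] for i in idx if i not in capped)
                share = (capacity - used) / len(capped)
            for i in capped:
                if demand[i] != share or not limited[i]:
                    demand[i] = share
                    limited[i] = True
                    changed = True
    return demand


# Assumed example: host 0 -> 1, 2, 3; host 1 -> 0 (two flows), 2;
# host 2 -> 0, 3; host 3 -> 1 (two flows).
flows = [(0, 1), (0, 2), (0, 3), (1, 0), (1, 0), (1, 2),
         (2, 0), (2, 3), (3, 1), (3, 1)]
demands = estimate_demands(flows)
```

For this flow set the iteration converges with every flow at one third of a unit except the flow from host 2 to host 3, which rises to two thirds of a unit.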
From
EPMP-ES (least-loaded) always results in higher bisection bandwidth than the other three. This is because it uses the longer paths with discretion. In certain traffic patterns, it achieves the same result as that of the non-blocking network. For Stride (4) and Stride (8), there is a bigger gap between them. This is mainly due to an intrinsic weakness in the 36-node constellation topology, which does not have adequate links/paths for the Stride (4) and Stride (8) traffic patterns. The Stride (i) traffic patterns in combination with the EPMP-ES (least-loaded) routing method serve as a useful tool to identify the bottleneck links in network topology design.
For hotspot traffic patterns, the differences in bisection bandwidth between different routing algorithms are smaller. This is because flow demands have been shaped to lower values due to traffic contention. The networks have relatively more bandwidth to carry the flow demands, making the routing algorithm a less significant factor.
With the random traffic pattern, the average bisection bandwidths for EPMP-ES (least-loaded path routing), EPMP-NH, and ECMP are 43.87, 38.03, and 37.88, respectively. EPMP-ES can achieve over 15 percent more bisection bandwidth than ECMP.
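As a quick check of the relative gains implied by these averages, taking ECMP as the baseline:

$$
\frac{43.87 - 37.88}{37.88} \approx 0.158, \qquad \frac{38.03 - 37.88}{37.88} \approx 0.004
$$

That is, EPMP-ES gains roughly 15.8 percent over ECMP under the random pattern, while EPMP-NH gains well under 1 percent.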
EPMP in general should generate more paths than ECMP, but not always. Consider a 4-node network topology with 2 levels as shown in
In the first iteration of matrix multiplication, we obtain
We make two observations. First, there is no valid path between node 2 and node 3 based on EPMP. Second, EPMP-NH finds a total of 13 paths while ECMP finds a total of 14 paths. Both find the following 10 paths: 1-0, 2-0, 3-0, 0-1, 2-1, 3-1, 0-2, 1-2, 0-3, and 1-3. EPMP-NH finds three additional paths (1-2-0, 0-2-1, and 1-0-2), while ECMP finds four additional paths (3-0-2, 3-1-2, 2-0-3, and 2-1-3).
What is special about this example is that node 2 needs to rely on other same-level nodes (nodes 0 and 1) to reach a higher level node 3. In addition, the paths under consideration can end at different levels, unlike in the constellation and cluster networks, where we assumed that paths end on the lowest level (edge switches). This illustrates a potential deficiency in EPMP if the paths under consideration do not terminate on edge switches.
We have investigated the Equal Preference Multi-Path (EPMP) routing algorithm, a non-SPF routing algorithm based on an ordered semi-group preference algebra. We showed that its use in hierarchical networks, such as data center networks, provides several benefits. EPMP can provide higher throughput (bisection bandwidth) than ECMP because it allows more paths to be used. By comparing EPMP with ECMP, we showed that EPMP provides more paths (up to 2 times for the constellation topology and 5 times for the cluster topology), increases network bisection bandwidth (by 10 percent on average for the constellation topology and 15 percent on average for the cluster topology), allows a variety of policies to be exercised (by manipulating how same-level links are used), and can be used to identify bottleneck links in the network topology as different traffic patterns are applied.
However, the use of EPMP requires some caution. For example, EPMP may not find any path for an S-D pair when paths are allowed to terminate at non-edge switches in a hierarchical network. In addition, the original EPMP algorithm (EPMP-NH, which was called multipath policy-based routing in [
For future work, the flexibility of EPMP can be further investigated. In a hierarchical network, the labelling of same-level links (R, L, S1) and the order of their preference in the composition operation create many routing policy possibilities with which the network administrator can maximize the network bisection bandwidth. In addition, EPMP can be a helpful tool for network topology design, identifying weak links in existing data center networks and metropolitan networks.
This work was supported by grants from MOST (MOST 105-2221-E-194-022, MOST 104-3115-E-194-001, MOST 104-2218-E-194-008), Taiwan.
Hou, T.-C. and Tsai, H.-C. (2016) Equal Preference Multi-Path Routing for L2 Hierarchical Networks. Journal of Computer and Communications, 4, 37-56. http://dx.doi.org/10.4236/jcc.2016.414004