16  Abstract

18  The Multi-Chassis Link Aggregation Group (MC-LAG) technology enables
19  establishing a logical link-aggregation connection with a redundant
20  group of independent nodes. The purpose of multi-chassis LAG is to
21  provide a solution to achieve higher network availability, while
[nit] please remove the comma after "availability"
22  providing different modes of sharing/balancing of traffic. RFC7432
23  defines EVPN based MC-LAG with single-active and all-active
[nit] s/EVPN based/EVPN-based
24  multi-homing load-balancing mode. The current draft expands on
[nit] s/current draft/this document/g - applies to other references to "draft"
25  existing redundancy mechanisms supported by EVPN and introduces
26  support for port-active load-balancing mode.

85  1. Introduction

87  EVPN, as per [RFC7432], provides all-active per flow load-balancing
[nit] s/per flow/per-flow/g
88  for multi-homing. It also defines single-active with service carving
89  mode, where one of the PEs, in redundancy relationship, is active per
[nit] s/in redundancy/in a redundancy
90  service.

92  While these two multi-homing scenarios are most widely utilized in
[minor] Would it be good to give the reference to RFC7432? Suggestion: ... two multi-homing scenarios (specified in [RFC7432]) are ...
93  data center and service provider access networks, there are scenarios
94  where active-standby per interface multi-homing load-balancing is
95  useful and required. The main consideration for this mode of
[minor] Suggestion: ... for this new mode of ...
96  load-balancing is the determinism of traffic forwarding through a
97  specific interface rather than statistical per flow load-balancing
98  across multiple PEs providing multi-homing. The determinism provided
99  by active-standby per interface is also required for certain QOS
[minor] Suggestion: ... provided by this per-interface active-standby mode is also ...
[nit] s/per interface/per-interface/g
100  features to work.
While using this mode, customers also expect
101  minimized convergence during failures.
[major] The terms "active-standby per-interface", "per-interface active-standby" and "port-active" are used interchangeably throughout the document. Is it possible to converge on one term that is used consistently? Perhaps define the term in this Sec 1 and then use just "port-active" through the rest of the document?
[minor] "minimized" sounds a bit odd. Did you mean "fast convergence" perhaps?

103  A new type of load-balancing mode, port-active load-balancing, is
104  defined. This draft describes how the new load-balancing mode can be
105  supported via EVPN. The new mode may also be referred to as per
106  interface active/standby.
[minor] Text seems a bit fragmented. Suggestion: This document defines a new type of multi-homing mode called port-active load-balancing, and describes how this new mode can be supported via EVPN.
[major] The new mode does provide multi-homing, but I am not sure that it provides load-balancing of traffic in the true sense. Can you please clarify what is meant by load-balancing?

108                   +-----+
109                   | PE3 |
110                   +-----+
111                +-----------+
112                |  MPLS/IP  |
113                |   CORE    |
114                +-----------+
115              +-----+   +-----+
116              | PE1 |   | PE2 |
117              +-----+   +-----+
118                 |         |
119                 I1       I2
120                  \       /
121                   \     /
122                    +---+
123                    |CE1|
124                    +---+

126  Figure 1: MC-LAG Topology

128  Figure 1 shows a MC-LAG multi-homing topology where PE1 and PE2 are
[nit] s/a MC-LAG/an MC-LAG
129  part of the same redundancy group providing multi-homing to CE1 via
130  interfaces I1 and I2. Interfaces I1 and I2 are members of a LAG
131  running LACP protocol. The core, shown as IP or MPLS enabled,
132  provides wide range of L2 and L3 services. MC-LAG multi-homing
[nit] s/provides wide/provides a wide
133  functionality is decoupled from those services in the core and it
134  focuses on providing multi-homing to the CE.
With per-port active/
135  standby load-balancing, only one of the two interface I1 or I2 would
[nit] s/two interface/two interfaces
136  be in forwarding, the other interface will be in standby. This also
[nit] s/forwarding, the/forwarding and the
137  implies that all services on the active interface are in active mode
138  and all services on the standby interface operate in standby mode.

140  1.1. Requirements Language

142  The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
143  "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
144  "OPTIONAL" in this document are to be interpreted as described in BCP
145  14 [RFC2119] [RFC8174] when, and only when, they appear in all
146  capitals, as shown here.

148  2. Multi-Chassis Link Aggregation
[minor] Is this mode only applicable for MC-LAG or other types of access as well?

150  When a CE is multi-homed to a set of PE nodes using the
151  [IEEE.802.1AX_2014] Link Aggregation Control Protocol (LACP), the PEs
152  must act as if they were a single LACP speaker for the Ethernet links
153  to form and operate as a Link Aggregation Group (LAG). To achieve
154  this, the PEs connected to the same multi-homed CE must synchronize
155  LACP configuration and operational data among them. Interchassis
156  Communication Protocol (ICCP) [RFC7275] has been used for that
157  purpose. EVPN LAG simplifies greatly that solution. Along with the
158  simplification come a few assumptions:
[major] Are these assumptions or requirements/constraints? Please consider using normative language for such operational requirements as done in Sec 3.

160  * a CE device connected to multi-homing PEs may have a single LAG
161    with all its active links i.e. links in the LAG operate in all-
162    active load-balancing mode.
[major] Why "may have"? Is it not a requirement that the CE considers all links to both the PEs as active, and that it is the PEs that set the link down/out-of-sync on their side based on the EVPN signaling?
164  * Same LACP parameters MUST be configured on peering PEs such as
165    system id, port priority and port key.
[nit] s/priority and/priority, and

167  Any discrepancies from this list are out of the scope of this
[minor] If both of the above are made normative MUSTs, then it is not really out of scope, right? The handling of misconfigurations/miswiring can be out of scope.
168  document, as are mis-configuration and mis-wiring detection across
[nit] misconfiguration & miswiring
169  peering PEs.

171  3. Port-active Load-balancing Procedure

173  Following steps describe the proposed procedure with EVPN LAG to
[nit] The following
174  support port-active load-balancing mode:

176  a. The Ethernet-Segment Identifier (ESI) MUST be assigned per access
177     interface as described in [RFC7432], which may be auto derived or
[nit] auto-derived
178     manually assigned. Access interface MAY be a Layer-2 or Layer-3
[nit] The access
179     interface. The usage of ESI over Layer-3 interface is newly
[nit] over a Layer-3
180     described in this document.

182  b. Ethernet-Segment (ES) MUST be configured in port-active
183     load-balancing mode on peering PEs for specific access interface.

185  c. Peering PEs MAY exchange only Ethernet-Segment (ES) route
186     (Route Type-4) when ESI is configured on a Layer-3 interface.

188  d. PEs in the redundancy group leverage the DF election defined in
189     [RFC8584] to determine which PE keeps the port in active mode and
190     which one(s) keep it in standby mode. While the DF election
[nit] one keeps
191     defined in [RFC8584] is per [ES, Ethernet Tag] granularity, for
192     port-active mode of multi-homing, the DF election is done per
[nit] the port-active
193     <ES>. The details of this algorithm are described in Section 4.

195  e. DF router MUST keep corresponding access interface in up and
196     forwarding active state for that Ethernet-Segment

198  f.
Non-DF routers will by default implement a bidirectional blocking
199     scheme for all traffic in line with [RFC7432] Single-Active
200     blocking scheme, albeit across all VLANS.
[nit] VLANs

202     * Non-DF routers MAY bring and keep peering access interface
203       attached to it in operational down state.
[nit] an operational

205     * If the interface is running LACP protocol, then the non-DF PE
206       MAY also set the LACP state to OOS (Out of Sync) as opposed to
207       interface state down. This allows for better convergence on
[nit] an interface down state
208       standby to active transition.

210  g. For EVPN-VPWS service, the usage of primary/backup bits of EVPN
211     Layer-2 attributes extended community [RFC8214] is highly
212     recommended to achieve better convergence.

214  4. Designated Forwarder Algorithm to Elect per Port-active PE

216  The ES routes, running in port-active load-balancing mode, are
217  advertised with the new Port Mode Load-Balancing capability in the DF
218  Election Extended Community defined in [RFC8584]. Moreover, the ES
219  associated to the port leverages existing procedure of Single-Active,
[nit] associated with
[nit] leverages the existing
220  and signals Single-Active Multihomed site redundancy mode along with
221  Ethernet-AD per-ES route (Section 7.5 of [RFC7432]). Finally the
222  ESI-label based split-horizon procedures in Section 8.3 of [RFC7432]
[nit] ESI label-based
223  should be used to avoid transient echo'ed packets when Layer-2
224  circuits are involved.

226  The various algorithms for DF Election are discussed in Sections 4.2
227  to 4.5 for completeness, although the choice of algorithm in this
[nit] completeness even though the choice of the algorithm
228  solution doesn't affect complexity or performance as in other load-
229  balancing modes.

231  4.1. Capability Flag

233  [RFC8584] defines a DF Election extended community, and a Bitmap
234  field to encode "capabilities" to use with the DF election algorithm
235  in the DF algorithm field.
Bitmap (2 octets) is extended by the
236  following value:
[major] The extension is only the P bit. The text gives the wrong impression that the D and AC-DF bits are also being extended by this document. Please consider changing this text to clarify that the D and AC-DF bits are existing bits that are also used by this mode.

238                       1 1 1 1 1 1
239   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
240  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
241  |D|A|     |P|                   |
242  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

244  Figure 2: Amended Bitmap field in the DF Election Extended Community

246  Bit 0: D bit or 'Don't Preempt' bit, as explained in
247         [I-D.ietf-bess-evpn-pref-df].

249  Bit 1: AC-DF Capability (AC-Influenced DF election), as explained
250         in [RFC8584].

252  Bit 5: (corresponds to Bit 29 of the DF Election Extended
253         Community and it is defined by this document): 'Port Mode
[minor] Suggest removing this "Bit 29" - I don't see similar counting of bits within the entire ExtComm being done anywhere. The "Bit 5" of the field is clear enough.
254         Load-Balancing' Capability (P bit hereafter), determines
[nit] the use of quotes seems odd
255         that the DF-Algorithm should be modified to consider the
256         port ES only and not the Ethernet Tags.
[major] Seems odd to call this "port mode load-balancing" when there is no load-balancing? Wouldn't "port active mode multihoming" be more accurate?

258  4.2. Modulo-based Algorithm

260  The default DF Election algorithm, or modulus-based algorithm as in
261  [RFC7432] and updated by [RFC8584], is used here, at the granularity
262  of ES only. Given that ES-Import Route Target extended community may
263  be auto-derived and directly inherits its auto-derived value from ESI
264  bytes 1-6, many operators differentiate ESI primarily within these
265  bytes.
As a result, bytes 3-6 are used to determine the designated
266  forwarder using Modulo-based DF assignment, achieving good entropy
267  during Modulo calculation across ESIs:
268  Assuming a redundancy group of N PE nodes, the PE with ordinal i is
269  the DF for an <ES> when (Es mod N) = i, where Es represents bytes 3-6
270  of that ESI.

272  4.3. HRW Algorithm

274  Highest Random Weight (HRW) algorithm defined in [RFC8584] MAY also
275  be used and signaled, and modified to operate at the granularity of
276  <ES> rather than per .

278  Section 3.2 of [RFC8584] describes computing a 32 bit CRC over the
[nit] 32-bit
279  concatenation of Ethernet Tag and ESI. For port-active
280  load-balancing mode, the Ethernet Tag is simply removed from the CRC
281  computation.

283  DF(Es) denotes the DF and BDF(Es) denote the BDF for the ESI es; Si
284  is the IP address of PE i; and Weight is a function of Si, and Es.

286  1. DF(Es) = Si| Weight(Es, Si) >= Weight(Es, Sj), for all j. In the
287     case of a tie, choose the PE whose IP address is numerically the
288     least. Note that 0 <= i,j < number of PEs in the redundancy
289     group.

291  2. BDF(Es) = Sk| Weight(Es, Si) >= Weight(Es, Sk), and Weight(Es,
292     Sk) >= Weight(Es, Sj). In the case of a tie, choose the PE whose
293     IP address is numerically the least.

295  Where:

297  * DF(Es) is defined to be the address Si (index i) for which
298    Weight(Es, Si) is the highest; 0 <= i < N-1.

300  * BDF(Es) is defined as that PE with address Sk for which the
301    computed Weight is the next highest after the Weight of the DF. j
302    is the running index from 0 to N-1; i and k are selected values.

304  4.4. Preference-based DF Election

306  When the new capability 'Port-Mode' is signaled, the algorithm is
307  modified to consider the port only and not any associated Ethernet
308  Tags. Furthermore, the "port-based" capability MUST be compatible
309  with the "Don't Preempt" bit.
When an interface recovers, a peering
310  PE signaling D-bit will enable non-revertive behaviour at the port
[nit] behavior
311  level.

313  4.5. AC-Influenced DF Election

315  The AC-DF bit MUST be set to 0 when advertising Port Mode Load-
316  Balancing capability (P=1). When an AC (sub-interface) goes down, it
317  does not influence the DF election. The peer's Ethernet A-D per EVI
318  is ignored in all Port Mode DF Election algorthms.
[nit] algorithms

320  Upon receiving AC-DF bit set (A=1) from a remote PE, it MUST be
[nit] the AC-DF bit set
321  ignored when performing Port-Mode DF Election.

323  5. Convergence considerations

325  To improve the convergence, upon failure and recovery, when
[nit] when the
326  port-active load-balancing mode is used, some advanced
327  synchronization between peering PEs may be required. Port-active is
328  challenging in a sense that the "standby" port is in down state. It
[nit] in the sense
[nit] in a down
329  takes some time to bring a "standby" port in up-state and settle the
[nit] port to an up state
330  network. For IRB and L3 services, ARP / ND cache may be
331  synchronized. Moreover, associated VRF tables may also be
332  synchronized. For L2 services, MAC table synchronization may be
333  considered.

335  Finally, for members of a LAG running LACP the ability to set the
336  "standby" port in "out-of-sync" state a.k.a "warm-standby" can be
337  leveraged.

339  5.1. Primary / Backup per Ethernet-Segment

341  The EVPN Layer 2 Attributes Control Flags extended community SHOULD
342  be advertised in Ethernet A-D per ES route for fast convergence.

344  Only the P and B bits are relevant to this document, and only in the
345  context of Ethernet A-D per ES routes:
[minor] Please consider providing references for the ExtComm and the bits on their first use.

347  * When advertised, the EVPN Layer 2 Attributes Control Flags
348    extended community SHALL have only P or B bits set and all other
349    bits and fields MUST be zero.
351  * A remote PE receiving the optional EVPN Layer 2 Attributes Control
352    Flags extended community in Ethernet A-D per ES routes SHALL
353    consider only P and B bits.
[minor] In other words, the other bits are ignored and this is not considered an error/malformed, right?

355  For EVPN Layer 2 Attributes Control Flags extended community sent and
356  received in Ethernet A-D per EVI routes used in [RFC8214], [RFC7432]
357  and [I-D.ietf-bess-evpn-vpws-fxc]:

359  * P and B bits received are overridden by "parent" bits on Ethernet
360    A-D per ES above.

362  * Other fields and bits of the extended community are used according
363    to the procedures of those documents.

365  5.2. Backward Compatibility

367  Implementations that comply with [RFC7432] or [RFC8214] only (i.e.,
368  implementations that predate this document) will not advertise the
[nit] predate this specification
369  EVPN Layer 2 Attributes Control Flags extended community in Ethernet
370  A-D per ES routes. That means that all remote PEs in the ES will not
371  receive P and B bit per ES and will continue to receive and honour
[major] Don't we need normative language to this effect in Sec 4 or 5 above?
[nit] honor
372  the P and B bits received in Ethernet A-D per EVI route(s).
373  Similarly, an implementation that complies with [RFC7432] or
374  [RFC8214] only and that receives an EVPN Layer 2 Attributes Control
375  Flags extended community will ignore it and will continue to use the
376  default path resolution algorithm.
[minor] The Sec Cons section touches upon this, but it would be good to briefly describe here the multi-homing/load-balancing mode that would result, with some reference pointers.

378  6. Applicability
[minor] Suggestion: Consider rolling the first half of this section into Section 1 to give better context to the reader, and the second half into Section 2.

380  A common deployment is to provide L2 or L3 service on the PEs
381  providing multi-homing.
The services could be any L2 EVPN such as
382  EVPN VPWS, EVPN [RFC7432], etc. L3 service could be in VPN context
[nit] a VPN
383  [RFC4364] or in global routing context. When a PE provides first hop
[nit] in a global
384  routing, EVPN IRB could also be deployed on the PEs. The mechanism
385  defined in this document is used between the PEs providing L2 and/or
386  L3 services, when per interface single-active load-balancing is
387  desired.

389  A possible alternate solution is the one described in this draft is
390  MC-LAG with ICCP [RFC7275] active-standby redundancy. However, ICCP
391  requires LDP to be enabled as a transport of ICCP messages. There
392  are many scenarios where LDP is not required e.g. deployments with
393  VXLAN or SRv6. The solution defined in this draft with EVPN does not
394  mandate the need to use LDP or ICCP and is independent of the
395  underlay encapsulation.

397  7. Overall Advantages
[minor] Suggestion: Consider moving this text up front to give the reader better context on the benefits of, and reasons for, introducing this mode.

399  The use of port-active multi-homing brings the following benefits to
400  EVPN networks:

402  a. Open standards based per interface single-active load-balancing
[nit] standards-based
403     mechanism that eliminates the need to run ICCP and LDP (e.g. they
[nit] e.g.,
404     may be running VXLAN or SRv6 in the network).

406  b. Agnostic of underlay technology (MPLS, VXLAN, SRv6) and
407     associated services (L2, L3, Bridging, E-LINE, etc).

409  c. Provides a way to enable deterministic QOS over MC-LAG attachment
410     circuits.

412  d. Fully compliant with [RFC7432], does not require any new protocol
413     enhancement to existing EVPN RFCs.

415  e. Can leverage various DF election algorithms e.g. modulo, HRW,
416     etc.

418  f.
Replaces legacy MC-LAG ICCP-based solution, and offers following
[nit] the following
419     additional benefits:

421     * Efficiently supports 1+N redundancy mode (with EVPN using BGP
422       RR) where as ICCP requires full mesh of LDP sessions among PEs
423       in redundancy group.
[nit] whereas
[nit] requires a full
[nit] in the redundancy

425     * Fast convergence with mass-withdraw is possible with EVPN, no
426       equivalent in ICCP.

428  8. IANA Considerations

430  This document solicits the allocation of the following values:
[major] Please specify that this is from the "BGP Extended Communities" registry group

432  * Bit 5 in the [RFC8584] DF Election Capabilities registry, with
433    name "P" for Port Mode Load-Balancing.
[minor] Consider naming it "P bit - XXX" so it is more descriptive.

435  9. Security Considerations

437  The same Security Considerations described in [RFC7432] and [RFC8584]
438  are valid for this document.

440  By introducing a new capability, a new requirement for unanimity (or
441  lack thereof) between PEs is added. Without consensus on the new DF
442  election procedures and Port Mode, the DF election algorithm falls
443  back to the default DF election as provided in [RFC8584] and
444  [RFC7432]. This behavior could be exploited by an attacker that
445  manages to modify the configuration of one PE in the ES so that the
446  DF election algorithm and capabilities in all the PEs in the ES fall
447  back to the default DF election. If that is the case, the PEs will
448  be exposed to the same unfair load balancing, service disruption, and
449  possibly black-holing or duplicate traffic mentioned in those
450  documents and their security sections.
[minor] If we are talking about attackers modifying configs, would they not do more harm by making the configs on the dual-homed PEs inconsistent? Without a detection mechanism, the service impact may be far greater in this case.

452  10. Acknowledgements

509  12.2.
Informative References

511  [I-D.ietf-bess-evpn-vpws-fxc]
512             Sajassi, A., Brissette, P., Uttaro, J., Drake, J.,
513             Boutros, S., and J. Rabadan, "EVPN VPWS Flexible Cross-
514             Connect Service", Work in Progress, Internet-Draft, draft-
515             ietf-bess-evpn-vpws-fxc-05, 8 February 2022,
516             .

519  [IEEE.802.1AX_2014]
520             IEEE, "IEEE Standard for Local and metropolitan area
521             networks -- Link Aggregation", IEEE 802.1AX-2014,
522             DOI 10.1109/IEEESTD.2014.7055197, 24 December 2014,
523             .
[major] Should the reference to MC-LAG not be normative, since the document talks about setting the port in "out-of-sync" state?

526  [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
527             Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
528             2006, .
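
Reviewer's illustration (not part of the draft): the modulo-based election of Section 4.2 could be sketched roughly as follows, which may help when checking the text against an implementation. This is a minimal sketch under stated assumptions, not a definitive implementation. The function name `port_active_df` is hypothetical. The ESI is assumed to be the full 10-octet value with octets numbered from 0, so that "bytes 3-6" are octets 3 through 6. PE ordinals are assumed to follow the default procedure of [RFC7432], i.e. PE IP addresses sorted in increasing numeric order, starting at ordinal 0.

```python
import ipaddress


def port_active_df(esi: bytes, pe_addresses: list[str]) -> str:
    """Return the address of the PE elected DF for the whole ES (port).

    Assumptions (illustrative only): `esi` is the 10-octet ESI, and
    "bytes 3-6" means octets 3..6 counted from 0. No Ethernet Tag is
    involved, since port-active mode elects per ES only.
    """
    if len(esi) != 10:
        raise ValueError("ESI must be exactly 10 octets")
    if not pe_addresses:
        raise ValueError("redundancy group must contain at least one PE")

    # Ordinals: PEs ordered by increasing numeric IP address, starting at 0.
    ordered = sorted(pe_addresses, key=lambda a: int(ipaddress.ip_address(a)))

    # Es: octets 3..6 of the ESI, read as an unsigned big-endian integer.
    es = int.from_bytes(esi[3:7], "big")

    # The PE with ordinal i is the DF when (Es mod N) == i.
    return ordered[es % len(ordered)]


if __name__ == "__main__":
    example_esi = bytes.fromhex("00000000000005000000")  # octets 3..6 = 5
    print(port_active_df(example_esi, ["192.0.2.2", "192.0.2.1"]))
```

In the example, Es is 5 and N is 2, so ordinal 1 is elected, which is the PE with the numerically higher address. Because the Ethernet Tag never enters the computation, every service on the port follows the same PE.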