Hello,

I have been selected as the Routing Directorate reviewer for this draft. The Routing Directorate seeks to review all routing or routing-related drafts as they pass through IETF last call and IESG review, and sometimes on special request. The purpose of the review is to provide assistance to the Routing ADs. For more information about the Routing Directorate, please see:

http://trac.tools.ietf.org/area/rtg/trac/wiki/RtgDir

Although these comments are primarily for the use of the Routing ADs, it would be helpful if you could consider them along with any other IETF Early Review/Last Call comments that you receive, and strive to resolve them through discussion or by updating the draft.

Document: draft-ietf-bess-evpn-per-mcast-flow-df-election-09
Reviewer: Acee Lindem
Review Date: 07/20/2023
IETF LC End Date: N/A
Intended Status: Standards Track

Summary:

This document describes a per-multicast-flow DF election mechanism which supports per-multicast-flow load balancing of EVPN ES forwarding among the PEs in a redundancy group. While the document describes a fairly straightforward function, it really needs some editing and never should have been adopted as a WG document in this condition. Consequently, I have entered a “Not Ready” disposition for the review.

Major Issues:

1. Precisely explain the hashing algorithm in Sections 4.1 and 4.2. As written, they are subject to multiple interpretations. Provide a reference for CRC_32() and expand acronyms on first use (e.g., MSB).

2. In addition to explaining the hashing algorithm, the document should provide a discussion of why this hashing algorithm provides a good distribution of flows.

3. While this is a minor comment, it also pertains to the hashing algorithm. To better distribute the flows, why not exclude the current BUM DF from the list of PEs from which to choose a per-flow DF?

Minor Issues:

1. Acronyms from RFC 7432 and RFC 8584 are used without first expansion.
For example, none of the acronyms in the figures are defined. I'd suggest adding a glossary with terms from other documents.

2. The acronym "Es" is used for Ethernet Segment, whereas "ES" is used in other EVPN documents.

3. Missing articles make the text unwieldy to read.

4. There are multiple problems with subject-verb agreement.

5. Define what is referred to by DFn. Presumably, this is the selected PE in the redundancy group.

6. Item 5 in Section 5 doesn't make sense as written. I tried to fix it, but it needs attention from the author.

7. The abstract cannot include citations of the draft's RFC references. However, the RFCs may be mentioned without the brackets.

8. The security considerations for RFC 8584 are also applicable. Additionally, I expect that you will be asked for a discussion of how the existing security mechanisms apply to per-flow DF selection, so you might as well provide it now.

Nits: See diff below.

Thanks,
Acee

*** draft-ietf-bess-evpn-per-mcast-flow-df-election-09.txt.orig Wed Jul 19 11:43:37 2023 --- draft-ietf-bess-evpn-per-mcast-flow-df-election-09.txt Wed Jul 19 12:21:57 2023 *************** *** 18,33 **** Abstract ! [RFC7432] describes mechanism to elect designated forwarder (DF) at the granularity of (ESI, EVI) which is per VLAN (or per group of VLANs in case of VLAN bundle or VLAN-aware bundle service). However, the current level of granularity of per-VLAN is not adequate for some ! applications.[RFC8584] improves base line DF election by introducing ! HRW DF election. [RFC9251] introduces applicability of EVPN to ! Multicast flows, routes to sync them and a default DF election. This ! document is an extension to HRW base draft [RFC8584] and further enhances HRW algorithm for the Multicast flows to do DF election at ! the granularity of (ESI, VLAN, Mcast flow). Status of This Memo --- 18,33 ---- Abstract ! 
RFC7432 describes a mechanism to elect a designated forwarder (DF) at the granularity of (ESI, EVI) which is per VLAN (or per group of VLANs in case of VLAN bundle or VLAN-aware bundle service). However, the current level of granularity of per-VLAN is not adequate for some ! applications. RFC8584 improves the base DF election by introducing ! Highest Random Weight (HRW) DF election. RFC9251 introduces the applicability of EVPN to ! Multicast flows, routes to sync them, and a default DF election. This ! document is an extension to the HRW DF election of RFC8584 and further enhances the HRW algorithm for Multicast flows to do DF election at ! the granularity of (ESI, VLAN, Multicast flow). Status of This Memo *************** *** 91,104 **** deployments as well as service provider access/aggregation networks. [RFC7432] defines the role of a designated forwarder as the node in the redundancy group that is responsible to forward Broadcast, ! Unknown unicast, Multicast (BUM) traffic on that Ethernet Segment (CE ! device or network) in All-Active multi-homing. The default DF election mechanism allows selecting a DF at the granularity of (ES, VLAN) or (ES, VLAN bundle) for BUM traffic. ! While [RFC8584] improve on the default DF election procedure, some service provider residential applications require a finer ! granularity, where whole multicast flows are delivered on a single VLAN. --- 91,104 ---- deployments as well as service provider access/aggregation networks. [RFC7432] defines the role of a designated forwarder as the node in the redundancy group that is responsible to forward Broadcast, ! Unknown unicast, Multicast (BUM) traffic on that Ethernet Segment ! (Customer Edge (CE) device or network) in All-Active multi-homing. The default DF election mechanism allows selecting a DF at the granularity of (ES, VLAN) or (ES, VLAN bundle) for BUM traffic. ! While [RFC8584] improves on the default DF election procedure, some service provider residential applications require a finer ! 
granularity, where specific multicast flows are delivered on a single VLAN. *************** *** 154,161 **** Consider the above topology, which shows a typical residential deployment scenario, where multiple receivers are behind an all- ! active multihoming segments. All of the multicast traffic is ! provisioned on EVI-1. Assume PE-2 get elected as DF. According to [RFC7432], PE-2 will be responsible for forwarding multicast traffic to that Ethernet segment. --- 154,161 ---- Consider the above topology, which shows a typical residential deployment scenario, where multiple receivers are behind an all- ! active multihoming segment. All of the multicast traffic is ! provisioned on EVI-1. Assume PE-2 gets elected as DF. According to [RFC7432], PE-2 will be responsible for forwarding multicast traffic to that Ethernet segment. *************** *** 172,194 **** * Forcing sole data plane forwarding responsibility on PE-2 is a limitation in the current DF election mechanism. The topology at ! Figure 1 would always have only one of the PE to be elected as DF ! irrespective of which current DF election mechanism is in use ! defined in [RFC7432] or [RFC8584]. * The problem may also manifest itself in a different way. For example, AC1 happens to use 80% of its available bandwidth to forward unicast data. And now there is need to serve multicast ! receivers where it would require more than 20% of AC1 bandwidth. In this case, AC1 becomes oversubscribed and multicast traffic drop would be observed even though there is already another link ! (AC2) present in network which can be used more efficiently load balance the multicast traffic. ! In this document, we propose an extension to the HRW base draft to allow DF election at the granularity of (ESI, VLAN, Mcast flow) which ! would allow multicast flows to be better distributed among redundancy ! group PEs to share the load. 2. 
Terminology --- 172,194 ---- * Forcing sole data plane forwarding responsibility on PE-2 is a limitation in the current DF election mechanism. The topology at ! Figure 1 would always have only one of the PEs elected as DF ! irrespective of which DF election mechanism (defined in [RFC7432] ! or [RFC8584]) is in use. * The problem may also manifest itself in a different way. For example, AC1 happens to use 80% of its available bandwidth to forward unicast data. And now there is need to serve multicast ! receivers where it would require more than 20% of AC1's bandwidth. In this case, AC1 becomes oversubscribed and multicast traffic drop would be observed even though there is already another link ! (AC2) present in the network which can be used more efficiently to load balance the multicast traffic. ! In this document, we define an extension to the HRW base DF election [RFC8584] to allow DF election at the granularity of (ESI, VLAN, Mcast flow) which ! would allow multicast flows to be better distributed among PEs in a ! redundancy group to share the load. 2. Terminology *************** *** 201,212 **** 3. The DF Election Extended Community ! [RFC8584] defines an extended community, which would be used for PEs in redundancy group to reach a consensus as to which DF election procedure is desired. A PE can notify other participating PEs in ! redundancy group about its willingness to support Per multicast flow base DF election capability by signaling a DF election extended ! community along with Ethernet-Segment Route (Type-4). The current proposal extends the existing extended community defined in [RFC8584]. This draft defines new a DF type. --- 201,212 ---- 3. The DF Election Extended Community ! [RFC8584] defines an extended community, which is used by PEs in redundancy group to reach a consensus as to which DF election procedure is desired. A PE can notify other participating PEs in ! 
the redundancy group as to its willingness to support per-multicast-flow-based DF election capability by signaling a DF election extended ! community along with an Ethernet-Segment Route (Type-4). The current proposal extends the existing extended community defined in [RFC8584]. This draft defines new a DF type. *************** *** 229,254 **** - Type 5: HRW base per (*,G) multicast flow DF election (explained in this document) ! * The [RFC8584] describes encoding of capabilities associated to the ! DF election algorithm using Bitmap field. When these capabilities bits are set along with the DF type-4 and type-5, they need to be ! interpreted in context of this new DF type-4 and type-5. For example, consider a scenario where all PEs in the same redundancy ! group (same ES) can support both AC-DF, DF type-4 and DF type-5 and receive such indications from the other PEs in the ES. In this scenario, if a VLAN is not active in a PE, then the DF election procedure on all PEs in the ES should factor that in and ! exclude that PE in the DF election per multicast flow. ! * A PE SHOULD attach the DF election Extended Community to ES route ! and Extended Community MUST be sent if the ES is locally ! configured for DF type Per Multicast flow DF election. Only one ! DF Election Extended community can be sent along with an ES route. * When a PE receives the ES Routes from all the other PEs for the ES, it checks if all of other PEs have advertised their desire to ! proceed by Per multicast flow DF election. If all peering PEs ! have done so, it performs DF election based on Per multicast flow procedure. But if: - There is at least one PE which advertised route-4 ( AD per ES --- 229,254 ---- - Type 5: HRW base per (*,G) multicast flow DF election (explained in this document) ! * [RFC8584] describes the encoding of capabilities associated with the ! DF election algorithm using a Bitmap field. When these capabilities bits are set along with the DF type-4 and type-5, they need to be ! 
interpreted in the context of DF type-4 and type-5. For example, consider a scenario where all PEs in the same redundancy ! group (same ES) can support AC-DF, DF type-4, and DF type-5 and receive such indications from the other PEs in the ES. In this scenario, if a VLAN is not active in a PE, then the DF election procedure on all PEs in the ES should factor that in and ! exclude that PE from the per-multicast-flow DF election. ! * A PE SHOULD attach the DF election Extended Community to an ES route ! and the Extended Community MUST be sent if the ES is locally ! configured for the Per-multicast-flow DF election type. Only one ! DF Election Extended community can be sent with an ES route. * When a PE receives the ES Routes from all the other PEs for the ES, it checks if all of other PEs have advertised their desire to ! proceed with Per-multicast-flow DF election. If all peering PEs ! have done so, it performs DF election based on the Per-multicast-flow procedure. But if: - There is at least one PE which advertised route-4 ( AD per ES *************** *** 258,264 **** - There is at least one PE signaling single active in the AD per ES route ! it MUST be considered as an indication to support of only Default DF election [RFC7432] and DF election procedure in [RFC7432] MUST be used. --- 258,264 ---- - There is at least one PE signaling single active in the AD per ES route ! it MUST be considered as an indication of support for only the Default DF election [RFC7432] and DF election procedure in [RFC7432] MUST be used. *************** *** 268,276 **** repeat the description of HRW algorithm itself. EVPN PE does the discovery of redundancy groups based on [RFC7432]. ! If redundancy group consists of N peering EVPN PE nodes, after the ! discovery all PEs build an unordered list of IP address of all the ! nodes in the redundancy group. The procedure defined in this draft does not require the list of PEs to be ordered.
Address [i] denotes the IP address of the [i]th EVPN PE in redundancy group where (0 < i <= N ). --- 268,276 ---- repeat the description of HRW algorithm itself. EVPN PE does the discovery of redundancy groups based on [RFC7432]. ! If a redundancy group consists of N peering EVPN PE nodes, after the ! discovery, all PEs build an unordered list of IP addresses of all the ! nodes in the redundancy group. The procedure defined in this document does not require the list of PEs to be ordered. Address [i] denotes the IP address of the [i]th EVPN PE in redundancy group where (0 < i <= N ). *************** *** 284,290 **** 4.1. DF election for IGMP (S,G) membership request ! The DF is the PE who has maximum weight for (S, G, V, Es) where * S - Multicast Source --- 284,290 ---- 4.1. DF election for IGMP (S,G) membership request ! The DF is the PE that has the maximum weight for (S, G, V, ES) where * S - Multicast Source *************** *** 292,309 **** * V - VLAN ID. ! * Es - Ethernet Segment Identifier Address[i] is address of the ith PE. The PEs IP address length does not matter as only the lower-order 31 bits are modulo significant. 1. Weight ! * The weight of PE(i) to (S,G,VLAN ID, Es) is calculated by ! function, weight (S,G,V, Es, Address(i)), where (0 < i <= N), PE(i) is the PE at ordinal i. ! * Weight (S,G,V, Es, Address(i)) = (1103515245. ((1103515245.Address(i) + 12345) XOR D(S,G,V,ESI))+12345) (mod 2^31) --- 292,309 ---- * V - VLAN ID. ! * ES - Ethernet Segment Identifier Address[i] is address of the ith PE. The PEs IP address length does not matter as only the lower-order 31 bits are modulo significant. 1. Weight ! * The weight of PE(i) to (S,G,VLAN ID, ES) is calculated by ! function, weight (S,G,V, ES, Address(i)), where (0 < i <= N), PE(i) is the PE at ordinal i. ! * Weight (S,G,V, ES, Address(i)) = (1103515245. ((1103515245.Address(i) + 12345) XOR D(S,G,V,ESI))+12345) (mod 2^31) *************** *** 312,333 **** 2. Digest ! * D(S,G,V, Es) = CRC_32(S,G,V, Es) ! 
* Here D(S,G,V,Es) is the 31-bit digest (CRC_32 and discarding ! the MSB) of the Source IP, Group IP, Vlan ID and Es. The CRC MUST proceed as if the architecture is in network byte order (big-endian). 4.2. DF election for IGMP (*,G) membership request ! The DF is the PE who has maximum weight for (G, V, Es) where * G - Multicast Group * V - VLAN ID. ! * Es - Ethernet Segment Identifier --- 312,333 ---- 2. Digest ! * D(S,G,V, ES) = CRC_32(S,G,V, ES) ! * Here D(S,G,V,ES) is the 31-bit digest (CRC_32 and discarding ! the MSB) of the Source IP, Group IP, Vlan ID and ES. The CRC MUST proceed as if the architecture is in network byte order (big-endian). 4.2. DF election for IGMP (*,G) membership request ! The DF is the PE that has the maximum weight for (G, V, ES) where * G - Multicast Group * V - VLAN ID. ! * ES - Ethernet Segment Identifier *************** *** 343,353 **** 1. Weight ! * The weight of PE(i) to (G,VLAN ID, Es) is calculated by ! function, weight (G,V, Es, Address(i)), where (0 < i <= N), PE(i) is the PE at ordinal i. ! * Weight (G,V, Es, Address(i)) = (1103515245. ((1103515245.Address(i) + 12345) XOR D(G,V,ESI))+12345) (mod 2^31) --- 343,353 ---- 1. Weight ! * The weight of PE(i) to (G,VLAN ID, ES) is calculated by ! function, weight (G,V, ES, Address(i)), where (0 < i <= N), PE(i) is the PE at ordinal i. ! * Weight (G,V, ES, Address(i)) = (1103515245. ((1103515245.Address(i) + 12345) XOR D(G,V,ESI))+12345) (mod 2^31) *************** *** 356,376 **** 2. Digest ! * D(G,V, Es) = CRC_32(G,V, Es) ! * Here D(G,V,Es) is the 31-bit digest (CRC_32 and discarding the ! MSB) of the Group IP, Vlan ID and Es. The CRC MUST proceed as if the architecture is in network byte order (big-endian). 4.3. Default DF election procedure ! Per multicast DF election procedure would be applicable only when ! host behind Attachment Circuit (of the Es) start sending IGMP ! membership requests. Membership requests are synced using procedure ! 
defined in [RFC9251], and each of the PE in redundancy group can use ! per flow DF election and create DF state per multicast flow. The HRW DF election "Type 1" procedure defined in [RFC8584] MUST be used for ! the Es DF election and SHOULD be performed on Es even before learning multicast membership request state. This default election procedure MUST be used at port level but will be overwritten by Per flow DF election as and when new membership request state are learnt. --- 356,376 ---- 2. Digest ! * D(G,V, ES) = CRC_32(G,V, ES) ! * Here D(G,V,ES) is the 31-bit digest (CRC_32 and discarding the ! MSB) of the Group IP, Vlan ID and ES. The CRC MUST proceed as if the architecture is in network byte order (big-endian). 4.3. Default DF election procedure ! The per-multicast-flow DF election procedure is applicable only when ! a host behind the Attachment Circuit (of the ES) starts sending IGMP ! membership requests. Membership requests are synced using the procedure ! defined in [RFC9251], and each of the PEs in a redundancy group can use ! per-multicast-flow DF election and create DF state per multicast flow. The HRW DF election "Type 1" procedure defined in [RFC8584] MUST be used for ! the ES DF election and SHOULD be performed on the ES even before learning multicast membership request state. This default election procedure MUST be used at port level but will be overwritten by Per flow DF election as and when new membership request state are learnt. *************** *** 394,400 **** Internet-Draft Per multicast flow Designated Forwarder July 2023 ! Multicast Source | | | --- 394,400 ---- Internet-Draft Per multicast flow Designated Forwarder July 2023 ! Multicast Source | | | *************** *** 436,446 **** Route. This draft does not change any of this procedure, it still uses the procedure defined in [RFC7432]. ! 2. Each of the PEs in redundancy group advertise Ethernet segment ! route with extended community indicating their ability to ! 
participate in per multicast flow DF election procedure. Since ! Per multicast flow would not be applicable unless PE learns about ! membership request from receiver, there is a need to have the default DF election among PEs in redundancy group for BUM --- 436,446 ---- Route. This draft does not change any of this procedure, it still uses the procedure defined in [RFC7432]. ! 2. Each of the PEs in the redundancy group advertises an Ethernet segment ! route with an extended community indicating its ability to ! participate in the per-multicast-flow DF election procedure. Since ! per-multicast-flow DF election would not be applicable unless the PE learns about ! multicast membership from a receiver, there is a need to have the default DF election among PEs in redundancy group for BUM *************** *** 450,484 **** Internet-Draft Per multicast flow Designated Forwarder July 2023 ! traffic. Until multicast membership state are learnt, we use the ! the DF election procedure in Section 4.3, namely HRW per (v,Es) as defined in [RFC8584] . 3. When a receiver starts sending membership requests for (s1,g1), ! where s1 is multicast source address and g1 is multicast group address, CE-1 could hash membership request (IGMP join) to any of the PEs in redundancy group. Let's consider it is hashed to PE- 2. [RFC9251] defines a procedure to sync IGMP join state among ! redundancy group of PEs. Now each of the PE would have ! information about membership request (s1,g1) and each of them run ! DF election procedure Section 4.1 to elect DF among participating ! PEs in redundancy group. Consider PE-2 gets elected as DF for ! multicast flow (s1,g1). 1. PE-1 forwarding state would be nDF for flow (s1,g1) and DF for rest other BUM traffic. ! 2. PE-2 forwarding state would be DF for flow (s1,g1) and nDF for rest other BUM traffic. 3. PE-3 forwarding state would be nDF for flow (s1,g1) and rest other BUM traffic. ! 4. As and when new multicast membership request comes, same ! 
procedure as above would continue. 5. If Section 3 has DF type 4, For membership request (S,G) it MUST ! use Section 4.1 to elect DF among participating PEs. And membership request (*,G) MUST use Section 4.2 to elect DF among participating PEs. --- 450,485 ---- Internet-Draft Per multicast flow Designated Forwarder July 2023 ! traffic. Until multicast membership state is learnt, we use the ! DF election procedure in Section 4.3, namely HRW per (V,ES) as defined in [RFC8584] . 3. When a receiver starts sending membership requests for (s1,g1), ! where s1 is a multicast source address and g1 is a multicast group address, CE-1 could hash membership request (IGMP join) to any of the PEs in redundancy group. Let's consider it is hashed to PE- 2. [RFC9251] defines a procedure to sync IGMP join state among ! PEs in a redundancy group. Now each of the PEs would have ! information about the membership request (s1,g1) and each of them would run ! the DF election procedure (refer to Section 4.1) to elect ! a DF among participating PEs in the redundancy group. Consider PE-2 ! gets elected as DF for multicast flow (s1,g1). 1. PE-1 forwarding state would be nDF for flow (s1,g1) and DF for rest other BUM traffic. ! 2. PE-2 forwarding state would be the DF for flow (s1,g1) and nDF for rest other BUM traffic. 3. PE-3 forwarding state would be nDF for flow (s1,g1) and rest other BUM traffic. ! 4. When a new multicast membership request arrives, the same ! procedure as above would be used to select a DF for the ! multicast flow. 5. If Section 3 has DF type 4, For membership request (S,G) it MUST ! use Section 4.1 to elect a DF among participating PEs. And membership request (*,G) MUST use Section 4.2 to elect DF among participating PEs. *************** *** 487,494 **** There are multiple triggers which can cause DF re-election. Some of the triggers could be ! 1. Local ES going down due to physical failure or configuration ! change triggers DF re-election at peering PE. 2.
Detection of new PE through ES route. --- 488,495 ---- There are multiple triggers which can cause DF re-election. Some of the triggers could be ! 1. Local ES going down due to physical failure or a configuration ! change that triggers DF re-election at peering PE. 2. Detection of new PE through ES route. *************** *** 509,515 **** 6. Local configuration change of DF election Type and peering PE consensus on new DF Type ! This document does not provide any new mechanism to handle DF re- election procedure. It uses the existing mechanism defined in [RFC7432]. Whenever either of the triggers occur, a DF re-election would be done. and all of the flows would be redistributed among --- 510,516 ---- 6. Local configuration change of DF election Type and peering PE consensus on new DF Type ! This document does not provide any new mechanisms to handle DF re- election procedure. It uses the existing mechanism defined in [RFC7432]. Whenever either of the triggers occur, a DF re-election would be done. and all of the flows would be redistributed among