TI-LFA Draft Version 11 review:
Review Result: Serious Issues

This draft has serious issues and is not ready for publication. The interaction between TI-LFA and micro-loop (uLoop) avoidance needs to be detailed in the specification; I expand on this topic in detail below. The draft should also clearly state what "post-convergence" means, and how it extends the base IP FRR specification (RFC 5286) by replacing T-LDP based RLFA (RFC 7490) with SR-based TI-LFA RLFA. Below is an overview of the draft and a description of the issues found that should be addressed, which I hope will improve the draft's readiness for publication.

The basic high-level concept behind TI-LFA is to provide a post-convergence RLFA (DLFA) that is guaranteed loop-free and stateless, rather than a "stateful", pre-convergence, "pre-computed", "pre-programmed" backup path that depends on the IGP LSDB SPF topology tree, on IGP metrics and possibly on other constraints. By using a "post-convergence" RLFA calculation for the RLFA PQ node, TI-LFA can provide 100% coverage of all prefixes, compared to T-LDP based RLFA. SR's statelessness makes the repair path independent of the IGP topology, metrics and constraints, because it is built from a static SID list: in most cases a single node SID, which can leverage ECMP; otherwise a node SID plus an adjacency SID, or at most 4 SIDs for the backup path.

TI-LFA works in conjunction with uLoop (micro-loop) avoidance. Later in this review I discuss the interaction between the two, why I believe it is important to reference that interaction between the two specifications, and the possibility of combining them.

Now for a deep dive into the draft and the details of the TI-LFA specification. IP FRR (base LFA, RFC 5286), T-LDP based RLFA and TI-LFA all provide per-prefix "local protection": link, node and SRLG protection. This is similar to RSVP-TE FRR (RFC 4090), which provides link, node and SRLG protection. RSVP-TE can also provide path protection, whereas all LFA flavors provide "local protection" only.
TI-LFA is an extension of LFA in which T-LDP based RLFA is replaced with Segment Routing based RLFA, using a static SID list from the extended P-space to an RLFA PQ node obtained from a post-convergence calculation. This concept of TI-LFA as an extension of LFA should be expressed throughout the draft: the original base IP FRR LFA (RFC 5286) is not changing and remains in place with its existing, standards-based, pre-programmed backup path. We are only replacing the T-LDP based RLFA pre-convergence algorithm for the RLFA PQ node calculation with the TI-LFA post-convergence calculation of the RLFA PQ node.

T-LDP based RLFA is stateful, uses the MPLS data plane, and computes a backup path that depends on the underlying topology, IGP metrics and constraints. TI-LFA based RLFA is stateless, uses the SR data plane (SR-MPLS or SRv6), and is completely topology independent because it uses SR mechanics via a static SID list. AFAIK, TI-LFA uses SR mechanics to build the static SID list based backup path, which makes it independent of the local RIB/FIB and therefore of the IGP LSDB link-state topology, metrics and constraints. Is that true? If so, it should be added to clarify what is meant by "topology independent".

IP FRR (base LFA) comes into play when there are multiple uplinks and downlinks, i.e. criss-cross links between nodes (more redundancy). RLFA (TI-LFA RLFA or T-LDP RLFA) comes into play in loop or ring topologies with a single uplink and a single downlink and less redundancy, which is a very common situation in Service Providers' physical fiber plant infrastructure. Thus TI-LFA plus uLoop avoidance play an important role in link, node and SRLG protection to provide sub-50 ms convergence.

Capacity planning always has to be taken into account for link and node failures, because traffic rerouted across a failover path may exceed what that path is sized for. Just as with base IP FRR (RFC 5286) and T-LDP RLFA (RFC 7490), the goal of TI-LFA is to provide link, node and SRLG protection and sub-50 ms convergence onto a failover backup path.
So if capacity issues exist before turning on any LFA flavor, those same capacity issues will exist with any flavor of LFA, including TI-LFA. The claimed advantage of TI-LFA over T-LDP based RLFA in distributing traffic over ECMP paths using a single node SID also exists with T-LDP RLFA, so ECMP distribution with TI-LFA is no different than with T-LDP RLFA.

Base IP FRR LFA (RFC 5286) and T-LDP RLFA (RFC 7490) require tie-breaking rules to decide which LFA flavor (LFA or RLFA) to prioritize; this is no longer required with TI-LFA, which brings simplicity and optimization. TI-LFA also now has a configurable tie-breaking knob for implementation-specific protection schemes (link, node, SRLG), where link protection is the default and link, node and SRLG can be prioritized to prefer one protection mechanism over another.

An implementation consideration which should be discussed in detail in the draft: if all nodes have TI-LFA enabled (typical in a service provider network), protected SIDs are used for the repair path, and TI-LFA is triggered on those protected SIDs, what are the complexity and the pros and cons of using protected versus unprotected SIDs for the repair path, and what issues could arise with link congestion when nested TI-LFAs are triggered? Also, are there any limitations on having many TI-LFAs, and multi-layer nested TI-LFAs, caused by protected SIDs along a path with an SR policy? It may be an implementation and deployment consideration not to use, or not to allow, protected SIDs for the repair path. Another possibility is to use a stateful PCE / SDN controller to instantiate the triggered TI-LFAs, including deeply nested TI-LFAs if they occur, so that the controller can manage the bandwidth and constraints along the repair paths.

Applicability of RFC 7916 to TI-LFA: this should be discussed in detail, and I think it even deserves a separate applicability section. RFC 7916 is mentioned, but its applicability is not discussed in detail.
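The configurable protection preference described above (link protection by default, with link, node and SRLG reorderable by the operator) can be illustrated with a minimal Python sketch. The function name, the protection-type strings and all label values are hypothetical and are not taken from the draft or from any implementation; the sketch only shows the selection logic, assuming the PLR has already computed at most one candidate repair SID list per protection type.

```python
from typing import Optional, Sequence

# Hypothetical protection types for a configurable preference knob.
# Per the review text, link protection is the default.
DEFAULT_PREFERENCE: tuple[str, ...] = ("link",)


def select_repair_path(
    candidates: dict[str, Optional[list[int]]],
    preference: Sequence[str] = DEFAULT_PREFERENCE,
) -> Optional[tuple[str, list[int]]]:
    """Return (protection_type, sid_list) for the first preferred protection
    type that has a computed repair path, or None if none is available."""
    for protection in preference:
        sid_list = candidates.get(protection)
        if sid_list is not None:
            return protection, sid_list
    return None


# Hypothetical per-destination candidates; no SRLG-disjoint path was found.
candidates = {"link": [16005], "node": [16005, 24056], "srlg": None}
print(select_repair_path(candidates))                            # ('link', [16005])
print(select_repair_path(candidates, ("srlg", "node", "link")))  # ('node', [16005, 24056])
```

Keeping the preference as an ordered sequence lets link protection remain the default while an operator reorders the types; a protection type with no computed candidate simply falls through to the next one.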
SR Policy can use link coloring, similar to the RSVP-TE IGP extensions, and AFAIK TI-LFA should be able to take advantage of link coloring (administrative groups) to control the choice of TI-LFA paths for the RLFA PQ node calculation, including or excluding links by color to help build and optimize the repair path. A PCE can also gather link speeds for bandwidth-based TI-LFA, to avoid already congested links as well as congestion caused by FRR activation. The "no transit" condition on the LFA computing node described in RFC 7916 (IS-IS overload bit or OSPF R-bit) is applicable to TI-LFA and should be included in the operational considerations section.

What I described in the above paragraphs is, AFAIK, how TI-LFA works internally, and I believe it should be discussed in the abstract, the introduction and throughout the document.

Major issues:

TI-LFA by itself is optimized to use ECMP and the least number of SIDs for the IGP ECMP path. The problem is that during convergence uLoops exist on R1, R4 and R5 in the diagram below until those nodes have converged. With SR uLoop avoidance, a timer is started at time T and the TI-LFA backup path is replaced with a uLoop avoidance path: a static SID list that statically routes traffic across the nodes that have not yet converged, bypassing the local FIB programming on R1, R4 and R5 until they are completely converged. After that, the TI-LFA post-convergence path can revert to the optimized loose prefix-SID path, and stateless flows can use the post-convergence LSP path. So with uLoop avoidance, a temporary static strict SID list path is created from the extended P-space to the RLFA PQ node, and a timer is set that expires after T seconds, once all nodes have converged; the static strict SID list is then removed, reverting to the original TI-LFA post-convergence backup path.
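The timer-based behavior described above can be sketched in Python as follows. This is a minimal model of the mechanism as described in this review, not the uLoop draft's specification: the class, its fields, the timer value and all label values are hypothetical, and it assumes the PLR selects the strict SID list until T seconds after the failure and the loose post-convergence SID list afterwards.

```python
from dataclasses import dataclass


@dataclass
class RepairPathState:
    """Hypothetical per-destination state on the PLR after a failure."""
    loose_sid_list: list[int]   # optimized TI-LFA post-convergence path (e.g. one node SID)
    strict_sid_list: list[int]  # temporary uLoop-avoidance path (e.g. adjacency SIDs only)
    failure_time: float         # time the failure was detected, in seconds
    uloop_delay: float          # hypothetical timer T, in seconds

    def active_sid_list(self, now: float) -> list[int]:
        """Strict path while the timer runs; loose path once it has expired."""
        if now - self.failure_time < self.uloop_delay:
            return self.strict_sid_list
        return self.loose_sid_list


# Hypothetical labels: three adjacency SIDs for the strict path,
# one node SID of the PQ node for the loose path.
state = RepairPathState([16002], [24014, 24045, 24056], failure_time=0.0, uloop_delay=5.0)
print(state.active_sid_list(1.0))  # [24014, 24045, 24056]
print(state.active_sid_list(6.0))  # [16002]
```

The strict list pins the path hop by hop so that non-converged FIBs along the way are not consulted; the cost is a deeper SID list, which is the MSD concern discussed below.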
If there is no uLoop static SID list tunnel to carry traffic across all the non-converged nodes (in this case R1, R4 and R5, which have microloops), the result is an outage, with traffic black-holed until all P- and Q-space nodes have converged. As I see it, this is a problem with TI-LFA working independently, without SR uLoop avoidance.

In this example, the link between R2 and R3 fails:

    CE - R1 - R2 - R3 - CE
         |         |
         R4 - R5 - R6

    Extended P-space: R1, R4, R5
    Q-space:          R4, R5, R6
    PQ-space:         R4, R5

AFAIK, TI-LFA cannot work without uLoop avoidance, as a static SID list tunnel is needed across the entire path. TI-LFA plus uLoop avoidance would therefore be a better comparison to an RLFA T-LDP tunnel and/or RSVP-TE link and node protection. In the RLFA T-LDP case, we create a tunnel from the extended P-space node to the RLFA calculated PQ-space node; with RSVP-TE, we create a bypass tunnel from the PLR node to the merge point. Because both RLFA T-LDP and RSVP-TE FRR use a tunnel with additional labels across the intermediate nodes, they do not use IGP FIB entries for forwarding: traffic is tunneled around S-F from the PLR to the merge point. With TI-LFA, since the SID list is optimized to a single prefix-SID or a prefix-SID plus an adj-SID, the intermediate nodes perform ECMP forwarding using IGP FIB/LFIB entries, so the uLoop and the subsequent outage occur until all intermediate nodes along the path to the destination have converged. That is the main reason why SR uLoop avoidance is so critical and why TI-LFA cannot be used without uLoop avoidance.

After reading through this document many times, I think the uLoop draft should be merged with TI-LFA: the two are so closely coupled that they should be integrated into the TI-LFA solution. The other alternative is, when the TI-LFA RLFA PQ node is calculated, to build the static SID list using only adj-SIDs from the PLR node to the TI-LFA calculated PQ node.
That would also eliminate the extra step of the separate uLoop avoidance specification, as well as the operator complexity of having to configure both. The issue with that approach is MSD and having an optimized SID list. However, a PCE/SDN controller combined with IGP MSD signaling could be used to learn the MSD and manage SID list platform limitations, just as is done today with SR policies on the headend, where MSD limitations are handled by the PCE.

Minor issues:

Better clarity is needed on the post-convergence backup path and on what is meant by "topology independent", per my description above. It should also be clearer that TI-LFA is an extension of the base IP FRR specification (RFC 5286). The draft should mention that, even though TI-LFA uses a post-convergence backup path, because TI-LFA is an extension of LFA the semantics of a pre-programmed backup path still apply: the backup path is installed at time T1 once configured, and when a failure occurs at time T2 the backup path is updated with the post-convergence backup path information. This may be implementation specific, but I think it should be included in the specification.

I recommend removing any marketing or subjective language and keeping to the details of the specification. Below I give rewrite recommendations to make the document clearer.

Sections 1 and 3: I recommend combining them.

Abstract: I tried to make this section clearer with my rewrite. Why does it say "any two connected networks"? Is it referring to the source and destination covered by TI-LFA? I think of coverage as one of the improvements from LFA to TI-LFA: TI-LFA provides 100% coverage for all prefixes, where base IP LFA does not. I think the abstract is lengthy and could be made briefer by moving the last sentence to the introduction. I have rewritten the abstract, including the last sentence, which should be moved to the introduction.
Old
This document presents Topology Independent Loop-free Alternate Fast Re-route (TI-LFA), aimed at providing protection of node and adjacency segments within the Segment Routing (SR) framework. This Fast Re-route (FRR) behavior builds on proven IP-FRR concepts being LFAs, remote LFAs (RLFA), and remote LFAs with directed forwarding (DLFA). It extends these concepts to provide guaranteed coverage in any two connected networks using a link-state IGP. A key aspect of TI-LFA is the FRR path selection approach establishing protection over the expected post-convergence paths from the point of local repair, reducing the operational need to control the tie-breaks among various FRR options.

New
This document defines Topology Independent Loop-free Alternate Fast Re-route (TI-LFA), aimed at providing link, node and SRLG protection using prefix and adjacency segments within the Segment Routing (SR) framework. TI-LFA Fast Re-route (FRR) behavior is an extension to the base IP-FRR framework using LFAs, remote LFAs (RLFA), and remote LFAs with directed forwarding (DLFA). It extends these LFA concepts by providing 100% coverage of all prefixes. A key aspect of the TI-LFA extension to base IP FRR (LFA) is that a tie-breaker between LFA and RLFA is no longer required.

Introduction

Old
By relying on SR this document provides a local repair mechanism for standard link-state IGP shortest path capable of restoring end-to-end connectivity in the case of a sudden directly connected failure of a network component. Non-SR mechanisms for local repair are beyond the scope of this document. Non-local failures are addressed in a separate document [I-D.bashandy-rtgwg-segment-routing-uloop].

The term topology independent (TI) refers to the ability to provide a loop free backup path irrespective of the topologies used in the network.
This provides a major improvement compared to LFA [RFC5286] and remote LFA [RFC7490] which cannot provide a complete protection coverage in some topologies as described in [RFC6571].

For each destination in the network, TI-LFA pre-installs a backup forwarding entry for each protected destination ready to be activated upon detection of the failure of a link used to reach the destination.

New
Using SR, this document provides a local repair mechanism for standard SPF path calculation, capable of restoring end-to-end connectivity in the case of a sudden failure of a directly connected network component. Non-SR mechanisms for local repair are beyond the scope of this document. Micro-loop avoidance is a critical component of TI-LFA post-convergence, providing a temporary SR policy across intermediate P- and Q-space nodes that have not yet converged [I-D.bashandy-rtgwg-segment-routing-uloop].

The term topology independent (TI) refers to the ability to provide a loop-free backup path that is independent of the underlying link-state database, IGP metrics and constraints, because it is based on a segment routing policy. This solution is an extension of base IP FRR (LFA) [RFC5286] and replaces T-LDP based remote LFA [RFC7490], which cannot provide complete protection coverage in some topologies as described in [RFC6571].

For each destination in the network, TI-LFA pre-installs a backup forwarding entry for each protected destination, ready to be activated upon detection of the failure of a link, node or SRLG used to reach the destination, and then updated from the current SPF with the post-convergence backup path.
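The pre-installed backup semantics in the proposed text (backup installed at T1, activated on failure at T2, then replaced by the post-convergence path) can be modeled with a small Python sketch. The class and method names are hypothetical and do not describe any real FIB API. The label values reuse 16001 (node SID of R3) and 16002 (node SID of R6) from the Section 7.1 comments later in this review, and the sketch assumes R1 is the PLR with R2 as the primary next hop and R4 as the backup next hop.

```python
from dataclasses import dataclass


@dataclass
class ProtectedFibEntry:
    """Hypothetical FIB entry with a TI-LFA backup pre-installed at time T1."""
    prefix: str
    primary_nexthop: str
    primary_labels: list[int]   # label stack, top of stack first
    backup_nexthop: str
    backup_labels: list[int]    # repair list followed by the active segment
    backup_active: bool = False

    def on_failure(self, failed_nexthop: str) -> None:
        """Time T2: switch to the pre-installed backup if the primary is affected."""
        if failed_nexthop == self.primary_nexthop:
            self.backup_active = True

    def forward(self) -> tuple[str, list[int]]:
        """Return (next hop, label stack) currently used for this prefix."""
        if self.backup_active:
            return self.backup_nexthop, self.backup_labels
        return self.primary_nexthop, self.primary_labels

    def on_convergence(self, nexthop: str, labels: list[int]) -> None:
        """After IGP convergence the post-convergence path becomes the primary.
        A new backup would then be computed; the old one is left stale here."""
        self.primary_nexthop, self.primary_labels = nexthop, labels
        self.backup_active = False


entry = ProtectedFibEntry("3.3.3.3/32", "R2", [16001], "R4", [16002, 16001])
entry.on_failure("R2")
print(entry.forward())  # ('R4', [16002, 16001])
```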
**These next few paragraphs are rewritten to make clear that TI-LFA is an extension to base IP FRR and that T-LDP based RLFA is replaced with TI-LFA based RLFA**

Old
By using SR, TI-LFA does not require the establishment of TLDP sessions (Targeted Label Distribution Protocol) with remote nodes in order to take advantage of the applicability of remote LFAs (RLFA) [RFC7490][RFC7916] or remote LFAs with directed forwarding (DLFA)[RFC5714]. All the Segment Identifiers (SIDs) are available in the link state database (LSDB) of the IGP. As a result, preferring LFAs over RLFAs or DLFAs, as well as minimizing the number of RLFA or DLFA repair nodes is not required anymore.

By using SR, there is no need to create state in the network in order to enforce an explicit FRR path. This relieves the nodes themselves from having to maintain extra state, and it relieves the operator from having to deploy an extra protocol or extra protocol sessions just to enhance the protection coverage.

New
TI-LFA replaces the RLFA provided by establishing TLDP (Targeted Label Distribution Protocol) sessions with remote nodes, which are needed to take advantage of remote LFAs (RLFA) [RFC7490][RFC7916] or remote LFAs with directed forwarding (DLFA) [RFC5714], with SR-based RLFA. All the Segment Identifiers (SIDs) are available in the link state database (LSDB) of the IGP. Thus, tie-breaking of LFAs over RLFAs or DLFAs, as well as minimizing the number of RLFA or DLFA repair nodes, is no longer required, as the SR framework advertises the SIDs via IGP extensions.

**The sentence below promotes the technology and is well known to anyone familiar with SR reading this document, so I think it can be excluded**

By using SR, there is no need to create state in the network in order to enforce an explicit FRR path.
This relieves the nodes themselves from having to maintain extra state, and it relieves the operator from having to deploy an extra protocol or extra protocol sessions just to enhance the protection coverage.

Section Terminology:

*Is there any change in the definitions of P-space, extended P-space and Q-space from RFC 7490? If so, it should be specified.

Similar to [RFC7490], we use the concept of P-Space and Q-Space for TI-LFA.

*R, X: please define R. R is the source node, but please define what R is within the topology, i.e. R = PLR. The resource X (a link, node or SRLG) is also not completely clear, so I think it should be explicitly spelled out.

The P-space P(R,X) of a router R with regard to a resource X (e.g. a link S-F, a node F, or a SRLG) is the set of routers reachable from R using the pre-convergence shortest paths without any of those paths (including equal-cost path splits) transiting through X.

*The extended P-space is the union of the P-spaces of the neighbors of R, but how is the reduced set of neighbors derived? The end of the sentence is confusing.

Consider the set of neighbors of a router R and a resource X. Exclude from that set of neighbors that are reachable from R using X. The Extended P-Space P'(R,X) of a node R with regard to a resource X is the union of the P-spaces of the neighbors in that reduced set of neighbors with regard to the resource X.

*Should this be Q(D,X), where D is the destination, i.e. the set of routers from which D can be reached?

The Q-space Q(R,X) of a router R with regard to a resource X is the set of routers from which R can be reached without any path (including equal-cost path splits) transiting through X.

*What does EP mean: "explicit path"? Please explicitly define EP (a path from the P node to the Q node).

*AFAIK, would the explicit path not be from the PLR (i.e. R) to the RLFA PQ node calculated by SPF?

EP(P, Q) is an explicit SR-based path from a node P to a node Q.

*Should we mention that an asymmetric network is one where the metric is not the same on both ends of a link?
A symmetric network is a network such that the IGP metric of each link is the same in both directions of the link.

Section 5

Should the Section 5.2 Q-space computation be all the nodes that can reach destination D without using resource X? Should there be a section on the TI-LFA calculation that finds the intersection of the extended P-space and the Q-space, i.e. the PQ-space from which the RLFA node is calculated?

Section 5.1 defines the extended P-space set of nodes. How is that any different from the extended P-space in RFC 7490 (it is just the set of nodes that are part of the P-space)?

Section 5.2 defines the Q-space set of nodes. How is that any different from the Q-space in RFC 7490 (it is just the set of nodes that are part of the Q-space)?

So would the static SID list be built from the PLR to the calculated RLFA PQ-space node, traversing intermediate nodes in the Section 5.1 extended P-space and the Section 5.2 Q-space?

Section 5.3 Scaling considerations

With RFC 7490 RLFA, we build a T-LDP tunnel from the PLR to the RLFA calculated PQ-space node, which is used for each prefix, so it is an RLFA tunnel per prefix. With TI-LFA, a static SID list is built per prefix from the PLR to the RLFA calculated PQ-space node, which seems like a lot of heavy lifting to build that many steered static LSP paths. Is it computing the RLFA PQ-space per destination, and not only the Q-space per destination? We need the intersection of the P-space and the Q-space, which gives the RLFA node used to build the SID list from the PLR to the destination node.

Section 6

I think this section should be renamed "TI-LFA node protection". S and D should be defined as Source and Destination. Since R = PLR = S, we should keep the terminology and naming conventions for nodes consistent throughout the document.
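To make the Section 5 questions concrete, the following Python sketch computes P-space, extended P-space and Q-space using the pre-convergence definitions quoted in the Terminology comments, and then their intersection. It is an illustration, not the draft's post-convergence algorithm. It assumes the ring topology used in this review (CEs omitted) with unit metrics, R1 as the PLR, the failed link R1-R2 and destination R3, and it computes Q-space relative to the destination D, as suggested above. A node is excluded from a space if any equal-cost shortest path crosses the protected link in either direction.

```python
import heapq

# Example ring from this review (CEs omitted), unit IGP metrics, symmetric links.
GRAPH: dict[str, dict[str, int]] = {
    "R1": {"R2": 1, "R4": 1},
    "R2": {"R1": 1, "R3": 1},
    "R3": {"R2": 1, "R6": 1},
    "R4": {"R1": 1, "R5": 1},
    "R5": {"R4": 1, "R6": 1},
    "R6": {"R5": 1, "R3": 1},
}


def spf(graph: dict[str, dict[str, int]], src: str) -> dict[str, int]:
    """Dijkstra: shortest-path cost from src to every reachable node."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in graph[u].items():
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return dist


DIST = {n: spf(GRAPH, n) for n in GRAPH}  # all-pairs costs (graph is connected)


def transits(src: str, dst: str, link: tuple[str, str]) -> bool:
    """True if any equal-cost shortest path src->dst crosses link, either direction."""
    a, b = link
    return any(
        DIST[src][x] + GRAPH[x][y] + DIST[y][dst] == DIST[src][dst]
        for x, y in ((a, b), (b, a))
    )


def p_space(r: str, link: tuple[str, str]) -> set[str]:
    return {n for n in GRAPH if not transits(r, n, link)}


def extended_p_space(r: str, link: tuple[str, str]) -> set[str]:
    # Reduced neighbor set: neighbors of r that are not reached via the link.
    neighbors = {n for n in GRAPH[r] if not transits(r, n, link)}
    return set().union(*(p_space(n, link) for n in neighbors))


def q_space(d: str, link: tuple[str, str]) -> set[str]:
    return {n for n in GRAPH if not transits(n, d, link)}


failed = ("R1", "R2")
pq = extended_p_space("R1", failed) & q_space("R3", failed)
print(sorted(pq))  # ['R5', 'R6']
```

On this topology the sketch yields P(R1) = {R1, R4, R5}, extended P-space {R1, R4, R5, R6}, Q-space {R2, R3, R5, R6}, and therefore PQ candidates R5 and R6.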
*I would rewrite the sentence below: "header" applies to SRv6 (the SRH, or the uSID/G-SID carrier for SRv6 C-SID compression), whereas SR-MPLS uses a label stack and has no header.

Old
The TI-LFA repair path (RP) consists of an outgoing interface and a list of segments (repair list (RL)) to insert on the SR header in accordance with the dataplane used.

New
The TI-LFA repair path (RP) consists of an outgoing interface and a list of segments (segment list (SL)). For SRv6, the segment list is inserted using the T.Insert behavior or, depending on hardware capabilities, T.Encap; for SR-MPLS, the SID list is represented by a label stack.

s/repair list/SID list/: replace "repair list" with "SID list" everywhere in the document. I understand that RL (repair list) was chosen to match the RP (repair path) nomenclature; however, you could call the repair path a "bypass loop", which is a common term for local repair. "Repair list" refers to the SID list but does not say explicitly a list of what, whereas a SID list is really what is meant, so I think "SID list" is a more appropriate term than "repair list". SL (SID list) refers to the active path but can also refer to the backup path, since SL simply denotes the SID list. This can be added to the terminology section.

*The sentence below is not entirely true, as it depends on the implementation: to avoid MSD scalability issues, most implementations prefer a mix of node SIDs and adj-SIDs rather than all adj-SIDs. When building the static SID list from the PLR node to the TI-LFA calculated RLFA PQ node, if adj-SIDs are not used for the entire path, the intermediate nodes rely on ECMP / IGP-programmed FIB entries, resulting in microloops while P- and Q-space nodes have not converged. This is one of the major issues I see in the specification, and why uLoop avoidance must be included as a dependency of this specification.
“The repair list encodes the explicit post-convergence path to the destination, which avoids the protected resource X and, at the same time, is guaranteed to be loop-free irrespective of the state of FIBs along the nodes belonging to the explicit path.”

This is not true unless all adj-SIDs are used, which is not scalable; the only solution is uLoop avoidance.

Old
As an example, in Figure 1, we are interested by the TI-LFA backup from S to D considering the failure of node N1.

New
In the example of Figure 1, we are interested in the TI-LFA backup from S to D considering the failure of node N1.

Section 6.1

Would this be base IP FRR (RFC 5286)?

Section 6.2 FRR path using a PQ node

Was Section 6 not already describing the PQ node scenario? I would call the remote node something other than R, since R is reserved for the PLR / S node; maybe call it Y (remote node).

"This is comparable to a post-convergence RLFA repair tunnel."

**This goes back to a point I made throughout the draft: TI-LFA is an extension of base IP FRR (LFA) RFC 5286, and T-LDP RLFA is replaced by TI-LFA RLFA, so TI-LFA is an "RLFA"; it is just an RLFA that uses SR. I would not say "comparable"; I would say: "This is a post-convergence RLFA repair tunnel."

Section 6.3 FRR path with P node and Q node adjacent

Here also I would not say "comparable".

Old
This is comparable to a post-convergence DLFA (LFA with directed forwarding) repair tunnel.

New
This is a post-convergence DLFA (LFA with directed forwarding) repair tunnel.

Section 6.4 Connecting distant P and Q nodes

I think you should say that the P-space and Q-space are not adjacent, since the P-space represents all P nodes and the Q-space represents all Q nodes, and those two spaces have no common (intersecting) nodes. So here we are saying there are no intersecting P- and Q-space nodes, and so they are not adjacent. For RLFA, my understanding is that there should be an intersection of the P- and Q-spaces, which is the PQ-space, and that is what is used to calculate the RLFA PQ node to build the tunnel or static SID list.
If the P- and Q-spaces are not adjacent to each other, which I guess is possible, then, as mentioned, it is still possible to create a static SID list from P to Q.

Section 7. Building TI-LFA repair list

The crux of this section is the procedure for inserting the TI-LFA repair path SID list (repair list) into an existing SID list, i.e. into the SR-MPLS label stack or the SRv6 SRH; once you exit the bypass loop (repair path), you are back on the original SR policy path. I think the procedures may vary, since MPLS processing has PUSH, NEXT and CONTINUE operations, while with the SRv6 SRH we decrement the Segments Left pointer (SL = SL - 1) as we process the SIDs. Also, for C-SID, with the Next SID (uSID) flavor or the Replace SID (G-SID) flavor, the operation happens within the uSID carrier.

My suggestion: since the process of building the repair list differs between SR-MPLS label switching and SRv6 endpoint behavior and SRH SID list processing, it would make sense to move Section 7 into the Section 8 data plane sections, and in doing so to separate SR-MPLS and SRv6 into separate sections.

The sections show the node SID and adjacency SID scenarios, but I think they should cover the prefix SID as well. A node SID is a type of prefix SID, but a prefix SID can be an intermediate steering hop, whereas the final destination would be the node SID of the egress PE FEC (Loopback0). So I think the prefix SID scenario should be added, as it is more pertinent than the node SID, since TI-LFA link, node and SRLG protection could occur on any transit node along the path. I think you could replace the node SID with the prefix SID.

Section 7.1 Active segment is a node segment

Wherever "SR header" is mentioned, for readability and accuracy, we should replace "header" with "SID list", since a SID list is applicable to both SR-MPLS and SRv6.
“The active segment MUST be kept on the SR header unchanged and the repair list MUST be added.”

Also, maybe mention why the active segment must be kept unchanged in the header (segment list): once the TI-LFA segment list is popped, the original active segment can be processed and SR policy steering can continue where it left off. I am not following why the active segment should be the first segment of the repair list; this question applies to the other sections as well.

My thought process on TI-LFA: say the headend node has an SR Policy with only a single SID to steer the traffic, and the active segment is a node SID, in this case the node SID of the egress PE FEC Loopback0, so there is an ECMP path to the final destination. However, the S-F link is down, so we should now steer to the node SID along the repair path (bypass loop). The PLR source router performs a CONTINUE (swap) operation on the active SID, and the packet is forwarded to the next router, which processes the first segment in the repair list; the repair path takes the packet back to the merge point at the far end of the bypass loop. Once all the SIDs in the repair path have been processed, the repair list is empty. What happens next, and how is the packet forwarded to the egress PE using the original SR policy, which had the active prefix SID?

In this example, the link between R1 and R2 fails:

    CE - R1 - R2 - R3 - CE
         |         |
         R4 - R5 - R6

R1 is the PLR source node, so R1-R4-R5-R6-R3 is the bypass loop that the repair path takes using the repair list of SIDs. R1 has an SR policy with label 16001, the single node SID in the SR policy, which is bound to prefix 3.3.3.3 on node R3. The link R1-R2 goes down.
R1 already has its pre-programmed backup path configured with TI-LFA, and when R1-R2 goes down it calculates the post-convergence backup path to the PQ node and installs in the FIB node SID 16002, the label binding for R6 (6.6.6.6). We then perform CONTINUE (label swap) across R4 and R5, and when we arrive at R6, the merge point, we are back on the SR policy pre-failure path. However, we no longer have the original SR policy node SID 16001 to forward the traffic to R3.

My thought on how this should work: if the SR policy has a node SID and a TI-LFA failover occurs, the repair path is processed and popped, and then the active node SID is processed. So in this case the label stack during the S-F failure would be 16002 followed by 16001: 16002 takes the packet along the bypass loop repair path to R6, and then 16001 (CONTINUE) is processed and takes it to the egress PE R3.

Section 7.2 Active segment is the adjacency segment

I am trying to understand this paragraph:

"The simplest approach for link protection of an adjacency segment S-F is to create a repair list that will carry the traffic to F. To do so, one or more “PUSH” operations are performed. If the repair list, while avoiding S-F, terminates on F, S only pushes segments of the repair list. Otherwise, S pushes a node segment of F, followed by the segments of the repair list. For details on the "NEXT" and "PUSH" operations, refer to [RFC8402]."

"If the repair list, while avoiding S-F, terminates on F, S only pushes segments of the repair list." This makes sense: it pushes the repair list, which in this use case is all adj-SIDs, correct?

"Otherwise, S pushes a node segment of F, followed by the segments of the repair list." So it is saying that S pushes only the repair list (all adj-SIDs) in the first case, and otherwise pushes the node SID followed by the repair list (all adj-SIDs).
This does not make sense, as it runs into the same issue I described for Section 7.1: once the repair path SID list is processed, how do you get to the egress PE final destination, which is what the node SID / adj-SID is for? So I would think that S would push the repair list of adj-SIDs followed by the node SID of F.

What I may be missing, and what is unclear, is that the repair path SID list should contain just the SIDs needed to get back to the pre-failure SR policy path (the "merge point"); once we get there, the SR policy should take over. So I think what is being described is just how to get around S-F to F, the far-end node of the link, and not to the final destination. Once the repair path SID list is processed and we are back on the pre-failure path, the remaining SID list in the label stack is processed.

How single or multiple TI-LFAs occur while an SR policy is in place, including multiple switchovers, really needs to be addressed from the SR policy perspective: how end-to-end steering works, and, once the TI-LFA repair path SID lists are processed and we are back on the pre-failure path at F (the far end of S-F), how the rest of the SID list (label stack or SRH SID list) is processed. A picture of the SR-MPLS label stack, the SRv6 SRH, and the SRv6 C-SID Next SID (uSID) or Replace SID (G-SID) container would be very helpful for understanding overall SID list processing along the repair path.

Old
The simplest approach for link protection of an adjacency segment S-F is to create a repair list that will carry the traffic to F. To do so, one or more “PUSH” operations are performed. If the repair list, while avoiding S-F, terminates on F, S only pushes segments of the repair list. Otherwise, S pushes a node segment of F, followed by the segments of the repair list. For details on the "NEXT" and "PUSH" operations, refer to [RFC8402].
New
The simplest approach for link protection of an adjacency segment S-F is to create a repair list that will carry the traffic to F. To do so, one or more “PUSH” operations are performed. If the repair list, while avoiding S-F, terminates on F, S only pushes segments of the repair list (in this use case, only adj-SIDs). Otherwise, as described in Section 7.1, S pushes a node segment of F, followed by the segments of the repair list. For details on the "NEXT" and "PUSH" operations, refer to [RFC8402].

It is not clear why the node SID has to be pushed before the repair path.

Section 7.2.1 Protecting [Adj, Adj] segment list

This is an important consideration regarding the use of protected versus unprotected SIDs for the repair path. I think the draft should address why protected SIDs are used, and the pros and cons of protected versus unprotected SIDs. Use of protected SIDs could result in complex failure scenarios, with TI-LFA going many layers deep, which can become overly complicated. I don't think the description accurately covers the protected SID scenario: it seems to describe the TI-LFA activation path from S-F, but not the case where a protected SID fails and triggers a nested TI-LFA activation on that SID for a new bypass loop repair path.

Section 7.2.2 Protecting [Adj, Node] segment list

The same comment as for Section 7.2.1 applies: the draft should address the pros and cons of protected versus unprotected SIDs for the repair path, and the case where a protected SID fails and triggers a nested TI-LFA activation on that SID for a new bypass loop repair path.
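The following Python sketch encodes one possible reading of the Section 7.1 and 7.2 rules, to illustrate the ordering question raised above. The assumptions are not confirmed by the draft text: a label stack is written as a list with the top of stack first, each PUSH places a label on top, and for an active adjacency segment S first performs NEXT on the adjacency SID it owns. Under this reading, "S pushes a node segment of F, followed by the segments of the repair list" leaves the repair list on top and the node SID of F beneath it, so the packet follows the repair list, reaches F, and then continues with the rest of the original SID list. Labels 16001 (R3) and 16002 (R6) come from the Section 7.1 example; 24012, 24063 and 24032 (adjacency SIDs) and 16020 (node SID of F = R2) are hypothetical.

```python
def node_active_stack(stack: list[int], repair_list: list[int]) -> list[int]:
    """Section 7.1 reading: the active node segment (stack[0]) is kept
    unchanged and the repair list is placed on top of it."""
    return repair_list + stack


def adj_active_stack(
    stack: list[int],
    repair_list: list[int],
    repair_ends_on_f: bool,
    node_sid_f: int,
) -> list[int]:
    """Section 7.2 reading: S consumes its adjacency segment S-F (NEXT),
    pushes the node segment of F unless the repair list already ends on F,
    then pushes the repair list on top."""
    remaining = stack[1:]                     # NEXT: adjacency SID S-F consumed
    if not repair_ends_on_f:
        remaining = [node_sid_f] + remaining  # first PUSH: node SID of F
    return repair_list + remaining            # then PUSH the repair list


print(node_active_stack([16001], [16002]))                   # [16002, 16001]
print(adj_active_stack([24012, 16001], [16002], False, 16020))  # [16002, 16020, 16001]
```

In both cases the original remainder of the SID list (here 16001) stays at the bottom, so after the repair list is consumed the packet resumes the original SR policy steering, which is the behavior asked about in the Section 7.1 comments.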
Section 8.1 MPLS data plane considerations
I recommend combining the SR-MPLS data plane parts of Section 7 and Section 8 into a new SR-MPLS data plane section.

1. How is the active segment signaled with PHP (implicit null, label value 3)? The egress PE must signal PHP per RFC 3032 to the penultimate hop node, which then performs the POP operation. I think we are talking about the link S-F, where S is the PLR. So what we are saying is that if the active segment is signaled with PHP and the repair list ends with an Adj-SID, then on the PLR (node S) the active segment must be popped before pushing the repair list. TI-LFA activation could happen on any transit node along the path, and theoretically there could be many TI-LFA activations, even nested ones, occurring simultaneously. So why does PHP come into play here? AFAIK, PHP implicit null signaling should only come into play at the penultimate hop node, which has been signaled by the egress PE with implicit null (label value 3) to pop the topmost label. What if TI-LFA activation happened on the penultimate hop node?

2. Only one condition is stated, so what other conditions are we referring to, other than the active segment on node S being signaled with PHP? Here we are saying that if the other conditions (please specify them) are met, then the active segment is popped on node S and pushed again with a label from the SRGB representing Q, where Q is the endpoint of the repair list. Is this the RLFA Q node, or the PQ node at the intersection of the P and Q spaces? If it is the PQ node, which is my guess, we should say explicitly which use case this is. If it differs for each use case, then we should specify each use case from Section 6: direct, PQ node, P and Q nodes that are adjacent, and distant P and Q nodes.
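To make the two Section 8.1 cases concrete, the following sketch models the label stack that S sends when the active segment is a node segment, following the two cases as paraphrased above (not the draft's normative text). It assumes the SR-MPLS mapping of RFC 8660, where the label for a prefix SID is the SRGB base of the node that will process it plus the SID index; this is why the re-pushed active segment must be encoded in Q's SRGB rather than the original next hop's. All label values, SRGB ranges, and names (`Srgb`, `repair_stack_at_s`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Srgb:
    """Hypothetical SRGB of one node: labels [base, base + size)."""
    base: int
    size: int

    def label_for(self, index: int) -> int:
        # RFC 8660 style mapping: label = SRGB base + prefix SID index.
        if not 0 <= index < self.size:
            raise ValueError(f"SID index {index} outside SRGB of size {self.size}")
        return self.base + index


def repair_stack_at_s(stack: List[int], active_index: int,
                      repair_labels: List[int], php_signaled: bool,
                      repair_ends_with_adj: bool, srgb_q: Srgb) -> List[int]:
    """Stack S sends on the repair path; stack[0] is the active node-segment label."""
    remaining = stack[1:]  # the active segment is popped in both cases
    if php_signaled and repair_ends_with_adj:
        # Case 1: the repair list itself delivers the packet; nothing re-pushed.
        return list(repair_labels) + remaining
    # Case 2: re-push the active segment, encoded in the SRGB of Q,
    # the endpoint of the repair list, since Q is the next node to read it.
    return list(repair_labels) + [srgb_q.label_for(active_index)] + remaining


srgb_q = Srgb(base=20000, size=8000)  # hypothetical SRGB of Q
stack = [16005, 30001]  # active node SID (index 5) for the original next hop, then an inner label
print(repair_stack_at_s(stack, 5, [16010, 24003], php_signaled=False,
                        repair_ends_with_adj=True, srgb_q=srgb_q))
# [16010, 24003, 20005, 30001]
```

Whether "Q" here is the Q node or the PQ node is exactly the open question raised above; the sketch treats it simply as the last node reached by the repair list.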
Section 8.2 SRv6 data plane considerations
I recommend combining the SRv6 data plane parts of Section 7 and Section 8 into a new SRv6 data plane section. This section does shed some light on the data plane dependency: with SR-MPLS you need a node SID followed by an Adj-SID, but with SRv6 you can use just the Adj-SID, since with SRv6 both the Adj-SID and the node SID (locator) are advertised in the IGP. With SR-MPLS, both Adj-SIDs and node SIDs are also advertised in the IGP, and the SRGB can be used for both Adj-SIDs and node SIDs; they come out of the same label block on all nodes and so are globally advertised. Since the Adj-SID is advertised by the IGP, it is dynamically learned; however, for persistence across reboots, which operators generally require, a static, manually configured Adj-SID must be added, whereas a prefix/node SID is always static and globally advertised.

I think the following should be noted, as it is a critical part of the TI-LFA implementation: for full SIDs, for SRv6 compression using NEXT-C-SID (uSID), and for SRv6 compression using REPLACE-C-SID (G-SID) alike, both T.Insert and T.Encaps.Red apply to the repair list, and T.Insert is the recommended implementation if the hardware supports it, otherwise T.Encaps.Red. If there are any special cases where T.Encaps should be used instead of T.Insert, they should be noted. If there are any special TI-LFA considerations for the PSP and USP endpoint flavors, they should be noted as well.

Section 9 TI-LFA and SR Algorithms (Flex Algo)
Since we are talking about TI-LFA with SR algorithms here, I don't think we need to reference RFC 8402, so I would remove this line: "SR allows an operator to bind an algorithm to a prefix SID (as defined in [RFC8402]."
I would put this sentence at the beginning of this section: "[RFC9350] defines a flexible algorithm (FlexAlgo) framework to be associated with Prefix SIDs. FlexAlgo allows a user to associate a constrained path to a Prefix SID rather than using the regular IGP shortest path."

I think the entire paragraph below can be removed. The whole document is about the default algorithm (algo 0), so this text would be more appropriate further up, in the introduction. The point that, when TI-LFA uses a node SID, there is no guarantee the path will be loop-free because a local policy may have overridden the expected path is, AFAIK, applicable to any algorithm, not only the default one:
"The SR default algorithm allows an operator to override the IGP shortest path by using local policies. When TI-LFA uses Node-SIDs associated with the default algorithm, there is no guarantee that the path will be loop-free as a local policy may have overriden the expected IGP path."

The last two sentences, below, should be placed under a new Operational Considerations section in the draft, as local policy is applicable to any algorithm and not just algo 0:
"As the local policies are defined by the operator, it becomes the responsibility of this operator to ensure that the deployed policies do not affect the TI-LFA deployment. It should be noted that such situation can already happen today with existing mechanisms as remote LFA."

Why would the Adj-SID have to be unprotected? Please add text explaining the reason. Also, is the text trying to say Node-SIDs and Adj-SIDs that are part of the FlexAlgo sub-topology? If so, the sentence should be rewritten as below.

Old
An implementation MUST only use Node-SIDs bound to the FlexAlgo and/or Adj-SIDs that are unprotected bound to the FlexAlgo to build the repair list.

New
An implementation MUST only use Node-SIDs bound to the FlexAlgo and/or Adj-SIDs that are unprotected to build the repair list.
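The eligibility rule in the proposed New sentence can be expressed as a simple filter over candidate SIDs. This is a minimal sketch, assuming each candidate SID carries a kind ("node" or "adj"), the algorithm a Node-SID is bound to, and a protection flag for Adj-SIDs; the `Sid` type, its fields, and the SID labels are hypothetical, and algorithm 128 is used only because RFC 9350 places Flex-Algorithms in the 128-255 range. The Old wording would add one more check: that the Adj-SID is also bound to the FlexAlgo.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Sid:
    label: str                       # hypothetical SID identifier
    kind: str                        # "node" or "adj"
    algorithm: Optional[int] = None  # algorithm a Node-SID is bound to
    protected: bool = False          # protection flag of an Adj-SID


def usable_in_repair_list(sid: Sid, flex_algo: int) -> bool:
    """Eligibility per the proposed "New" wording for Section 9."""
    if sid.kind == "node":
        # Node-SIDs must be bound to the FlexAlgo being protected.
        return sid.algorithm == flex_algo
    if sid.kind == "adj":
        # Adj-SIDs must be unprotected.
        return not sid.protected
    return False


def eligible(sids: Iterable[Sid], flex_algo: int) -> List[Sid]:
    return [s for s in sids if usable_in_repair_list(s, flex_algo)]


candidates = [
    Sid("N1-algo0", "node", algorithm=0),
    Sid("N1-algo128", "node", algorithm=128),
    Sid("A1-unprotected", "adj"),
    Sid("A1-protected", "adj", protected=True),
]
print([s.label for s in eligible(candidates, flex_algo=128)])
# ['N1-algo128', 'A1-unprotected']
```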
Section 10 Usage of Adjacency Segments in repair list
Why would TI-LFA be only for a single planned failure? Wouldn't it also apply to unplanned failures, which are the major benefit of TI-LFA? Here we are confusing two different scenarios. At the beginning, we mention that an Adj-SID can be protected or unprotected. That refers to a single TI-LFA activation, where a protected SID can fail and trigger a nested TI-LFA activation. The topic of protected SIDs and multiple nested failures should be added to this section.

In this section, we are talking about multiple simultaneous failures along a path from S-F. There may be cases where you have a very long path from S-F and many intermediate nodes that have TI-LFA configured, so how do you pick and choose which nodes to enable TI-LFA on? Also, once TI-LFA is configured, how can you guarantee that you will not have multiple simultaneous FRR activations due to unplanned failures? You could also have nested failures within the same TI-LFA activation if the SID is protected, as mentioned above.

I don't understand why TI-LFA activation will not work if you have multiple unplanned failures along a path from S-F, and this needs to be explained. TI-LFA provides a protection optimization for FRR. If you had multiple link failures and FRR was not configured, the failover would still work; it would just take much longer to recover. In a large SP network with a very long path and many simultaneous link failures across bypass loops, the network would still recover without FRR, but recovery would take much longer; with FRR, link, node, and SRLG protection make it near-instantaneous.

In the example below, we have 2 simultaneous FRR activations. This can work, and I don't see any issue with it with or without FRR enabled. You can extrapolate the same scenario to 100 FRR activations happening simultaneously, and the network should still recover with or without FRR.
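To sanity-check the claim that two simultaneous failures still leave a usable path, the following minimal sketch models the dual-failure example topology (main path R1-R2-R3-R5-R6-R7 with bypass loops R2-R4-R3 and R5-R8-R6) as an undirected graph. It assumes unit link metrics, which the review does not specify, and computes a breadth-first shortest path from R1 to R7 before and after links R2-R3 and R5-R6 fail. It models only the converged topology; it does not model FRR timing, SID lists, or the TI-LFA computation itself, and `LINKS` and `shortest_path` are hypothetical names.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Example topology: main path plus two bypass loops.
LINKS: List[Tuple[str, str]] = [
    ("R1", "R2"), ("R2", "R3"), ("R3", "R5"), ("R5", "R6"), ("R6", "R7"),
    ("R2", "R4"), ("R4", "R3"),  # bypass loop R2-R4-R3
    ("R5", "R8"), ("R8", "R6"),  # bypass loop R5-R8-R6
]


def shortest_path(src: str, dst: str,
                  failed: Set[FrozenSet[str]]) -> Optional[List[str]]:
    """BFS shortest path (unit metrics) that skips failed links."""
    adj: Dict[str, Set[str]] = {}
    for a, b in LINKS:
        if frozenset((a, b)) in failed:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk predecessors back to the source, then reverse.
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nbr in sorted(adj.get(node, set())):
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    return None  # dst unreachable


failed = {frozenset(("R2", "R3")), frozenset(("R5", "R6"))}
print(shortest_path("R1", "R7", set()))
# ['R1', 'R2', 'R3', 'R5', 'R6', 'R7']
print(shortest_path("R1", "R7", failed))
# ['R1', 'R2', 'R4', 'R3', 'R5', 'R8', 'R6', 'R7']
```

With both failures, the path simply detours through R4 and R8; the two detours share no links, so each repair can be handled independently at R2 and R5.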
Here we have 2 bypass loops ("repair paths"), R2-R4-R3 and R5-R8-R6; link R2-R3 fails and link R5-R6 fails.

    R1 --- R2 ------ R3 --- R5 ------ R6 --- R7
           |         |      |         |
           +-- R4 ---+      +-- R8 ---+

Section 11 Advantages of using expected post convergence path during FRR
Capacity planning is always a consideration when designing a network, and AFAIK, as far as the specification is concerned, capacity planning is something operators have to be cognizant of for any type of link, node, or SRLG failure, regardless of FRR. That being said, I don't think under- or over-provisioning is any different with or without FRR, as the failure path for S-F will be close to the same along a bypass loop. ECMP exists for LFA and RLFA, as well as now TI-LFA, to distribute the load during a failure, so that does not change. I think what I have stated here should be mentioned from a capacity planning perspective.

Section 12 Analysis based on real world topologies
Section 12 provides an analysis based on real-world topologies, similar to RFC 7490 (RLFA). I agree that this data is important to the specification and should remain in the body of the document. I recommend mentioning that T1-T9 represent the 9 Service Provider network topologies studied. As the number of links and nodes specified per topology is plenty, I don't think it is pertinent, and it may not even be possible, to provide the actual topologies, as they are NDA information.

In each of the tables, should the "number of SIDs" % columns all add up to 100%? I noticed that in the tables the % is the percentage of prefixes that fall into each category, so if you total all the columns, it falls short of 100%. I thought that TI-LFA should yield 100% prefix coverage due to the post-convergence computation and static SID list, so all prefixes should be covered. Why would it not be 100% coverage, as that is one of the main advantages of TI-LFA used for RLFA as opposed to T-LDP RLFA?
The tables are very confusing and hard to follow, and I do not see in the tables that 1-SID or 2-SID repair paths yield 99% coverage in all topology cases. Does the statement below apply to all tables? If so, I am not seeing 99% in the 1 SID or 2 SID columns:
“The measurements listed in the tables indicate that for link and local SRLG protection, 1 SID repair path is sufficient to protect more than 99% of the prefix in almost all cases. For node protection 2 SIDs repair paths yield 99% coverage.”

Section 13 Security considerations
No issues.

Recommendation for an Operational Considerations section
I think an Operational Considerations section should be added to the draft, detailing:
the possible caveats of multiple layers of nested TI-LFA within a single repair path, along with the resulting complexity and scale;
the scalability of the number of TI-LFA and nested TI-LFA activations within a single SR Policy path;
graphical representations of the SR-MPLS and SRv6 data plane repair paths in the case of multi-layer nested TI-LFA and multiple TI-LFA activations in a single SR Policy using all protected SIDs;
the use of a PCE/SDN controller for SR Policy and TI-LFA to aid in FRR activation, instantiation, and management of bandwidth and capacity.

Nits:
There are minor grammatical errors, which I addressed in the rewrites discussed in the Minor issues section.