DC-MPLS High Availability

Many networks that support MPLS today require exceptionally high availability. High availability is notoriously hard to "back-apply" to old software, but it has been architected into DC-MPLS from the very first designs, based on our 25 years' experience in building carrier-class systems and in close consultation with our customers. There are two categories of high availability features:

  • Network Availability features, whereby traffic is automatically and rapidly diverted around link or node failures in the network.

  • Network Element Availability features, whereby a node has a redundant data plane and, optionally, redundant control plane capacity, and is able to recover from failures without other network elements having to react.




Network Availability Features

To make both packet and optical networks resilient, it is necessary to provide protected LSPs that ensure data can continue to flow even if some devices fail completely. DCL supports multiple mechanisms for protection switching and fast restoration, which are highlighted below.

  • Per-LSP backups, which may be dedicated or shared. Such backups may cover the entire path, or just part of it. This allows protection switching, in which traffic fails over to a backup path at the ingress of that backup path. Backup LSPs are available with Data Connection MPLS for both optical and packet network environments.

  • Backup paths that protect a part of the network, for example certain links or nodes, so that all LSPs passing through the protected part are themselves protected. Where required, protection can be limited to only some of the LSPs through the protected part. Backup paths are only available with Data Connection packet-based MPLS.

  • Fast Reroute of LSPs per RFC 4090. Both the detour and facility backup methods are supported and interoperable. Fast Reroute enables LSP recovery in tens of milliseconds, because traffic is redirected as close to the failure as possible and the backup LSPs/tunnels are signalled in advance. The Fast Reroute extensions are available with DC-RSVP-TE for packet-based MPLS networks.

  • End-to-End (e2e) LSP recovery for optical-based GMPLS networks. Data Connection's e2e recovery support is an extension of DC-RSVP-TE that is fully compliant with the IETF's draft-ietf-ccamp-gmpls-recovery-e2e-signaling specification. This feature enables end-to-end (ingress to egress) LSP recovery in a variety of modes:

    • 1+1 protection
    • 1:N protection with pre-planned rerouting
    • 1:N protection with shared mesh
    • Full rerouting

E2e is fully supported with hot software upgrades/downgrades, Graceful Restart and DCL's fault tolerant state replication functionality.
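
As a rough illustration of the protection switching described above, the following sketch models a 1:N group in which several working LSPs share a single backup, and the ingress switches a failed LSP onto the backup if it is free. The class and method names (ProtectionGroup, on_failure, on_repair) are hypothetical and chosen for this example only; this is not the DC-MPLS API, and a real implementation would also signal the switchover to the other end of the LSP.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ProtectionGroup:
    """Hypothetical 1:N group: N working LSPs share one backup LSP."""
    backup_lsp: str
    working_lsps: Set[str]
    # Which working LSP (if any) currently occupies the shared backup.
    backup_in_use_by: Optional[str] = None
    # Path currently used for forwarding, per working LSP, at the ingress.
    active_path: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Initially every working LSP forwards on its own path.
        for lsp in self.working_lsps:
            self.active_path[lsp] = lsp

    def on_failure(self, lsp: str) -> bool:
        """Switch a failed working LSP onto the backup; False if it is taken."""
        if lsp not in self.working_lsps:
            raise KeyError(lsp)
        if self.backup_in_use_by == lsp:
            return True  # already switched over
        if self.backup_in_use_by is not None:
            return False  # backup occupied by another LSP in the group
        self.backup_in_use_by = lsp
        self.active_path[lsp] = self.backup_lsp
        return True

    def on_repair(self, lsp: str) -> None:
        """Revert to the working path and release the shared backup."""
        if self.backup_in_use_by == lsp:
            self.backup_in_use_by = None
            self.active_path[lsp] = lsp
```

With a dedicated (1:1) backup the group would contain a single working LSP, so the "backup occupied" case never arises.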




Network Element Availability Features

In addition to the high availability features for the network, Data Connection offers switch and router OEMs enhanced functionality to increase the availability of their network elements. The common goal of these features is to contain and resolve planned and unplanned system downtime within the individual nodes.

Data Connection's Network Element availability features include full state replication, hot software upgrade/downgrade, and Graceful Restart.




Full State Replication

DC-MPLS is designed to support systems that require carrier-class availability, such that data forwarding is unaffected and MPLS signaling is only minimally affected by a failure of software or hardware. DC-MPLS achieves this by replicating state and configuration information to a hot standby, which can quickly take over if the primary fails. The key design goals in providing this function are as follows.

Minimize the performance impact.

  • This is achieved by choosing a minimal number of replication points in the mainline processing and processing these replication requirements efficiently.

Ensure that all active LSPs are maintained across a hardware or software failure.

  • The availability of LSPs for data flow need not be affected by the MPLS signaling software - data can flow regardless of the state of the control plane. However, it is necessary for the control plane to correctly reflect the data plane to ensure that LSPs are refreshed correctly (in RSVP-TE) and that they can be torn down cleanly. DC-MPLS replicates sufficient information to achieve this.

Ensure that no resources are lost.

  • Once a failover has occurred, DC-MPLS audits neighboring components to verify that all parts of the system are in sync. Where there are discrepancies, DC-MPLS cleans up appropriately.

DC-MPLS supports all high availability features even in combination with distribution.

  • Controller or line cards can fail over independently.

  • 1:1 or N:1 redundancy.
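
The replication and post-failover audit goals above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the DC-MPLS design: the names Primary, Standby and audit are hypothetical, state is keyed by an LSP identifier, and the "neighboring component" being audited is reduced to a set of LSP identifiers known to the data plane.

```python
from typing import Dict, Optional, Set, Tuple


class Standby:
    """Hypothetical hot standby holding a replica of LSP control state."""

    def __init__(self) -> None:
        self.lsps: Dict[str, dict] = {}

    def apply(self, op: str, lsp_id: str, state: Optional[dict] = None) -> None:
        if op == "put":
            self.lsps[lsp_id] = dict(state or {})
        elif op == "delete":
            self.lsps.pop(lsp_id, None)
        else:
            raise ValueError(f"unknown replication op: {op}")


class Primary:
    """Hypothetical primary that replicates only at LSP setup and teardown."""

    def __init__(self, standby: Standby) -> None:
        self.lsps: Dict[str, dict] = {}
        self.standby = standby

    def lsp_established(self, lsp_id: str, state: dict) -> None:
        self.lsps[lsp_id] = state
        self.standby.apply("put", lsp_id, state)  # replication point

    def lsp_torn_down(self, lsp_id: str) -> None:
        self.lsps.pop(lsp_id, None)
        self.standby.apply("delete", lsp_id)  # replication point


def audit(control_plane: Dict[str, dict],
          data_plane_lsps: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Compare replicated state with the data plane after a failover.

    Returns (orphaned, missing): LSPs the data plane holds that the control
    plane does not know about, and LSPs the control plane expects but the
    data plane lacks. A caller would clean up both sets.
    """
    known = set(control_plane)
    return data_plane_lsps - known, known - data_plane_lsps
```

Keeping the replication points to setup and teardown, rather than every message, reflects the goal of minimizing the performance impact on mainline processing.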

The following diagram shows the replication and auditing interfaces in DC-MPLS.


[Diagram: DC-MPLS High Availability]



Hot Software Upgrade/Downgrade

There may be a need to upgrade the MPLS software running on a device, either to add new function or to apply bug fixes. In highly available systems, it is often not acceptable to restart the software, because this would cause all LSPs to be terminated. Instead, it is necessary to upgrade the software without a significant interruption in service. DC-MPLS provides this function, largely using the state replication mechanisms described above.

  • A backup instance of the software is started using the up-level version. This can be on a redundant piece of hardware, or on the same hardware as the primary back-level software.

  • The high availability function on the back-level primary replicates the current state to the up-level backup, and the up-level software translates these requests such that it can populate its control blocks.

  • The primary is then stopped and the up-level backup takes over, using the same failover mechanism as for high availability.

  • The up-level version can also be downgraded by the same mechanism, with the up-level version being responsible for translating the replication requests into a format understood by the back-level version.

In this way, devices can upgrade and downgrade the MPLS software without a significant interruption to service.
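
The translation step in the procedure above can be illustrated with a small sketch. It assumes, purely for illustration, that replication records are dictionaries carrying a "version" field and that version 2 adds a "priority" field; the record layout, the version numbers and the translate function are hypothetical and not taken from DC-MPLS. In keeping with the description above, both translators would live in the up-level software.

```python
from typing import Callable, Dict, Tuple

Record = Dict[str, object]


def _v1_to_v2(rec: Record) -> Record:
    """Upgrade: fill in a default for the field that version 1 lacks."""
    out = dict(rec)
    out["version"] = 2
    out.setdefault("priority", 0)
    return out


def _v2_to_v1(rec: Record) -> Record:
    """Downgrade: drop the field that version 1 does not understand."""
    out = {k: v for k, v in rec.items() if k != "priority"}
    out["version"] = 1
    return out


# Translators shipped with the (hypothetical) up-level version 2 software.
TRANSLATORS: Dict[Tuple[int, int], Callable[[Record], Record]] = {
    (1, 2): _v1_to_v2,
    (2, 1): _v2_to_v1,
}


def translate(record: Record, target_version: int) -> Record:
    """Return a copy of a replication record in the target version's format."""
    source_version = record["version"]
    if source_version == target_version:
        return dict(record)
    try:
        translator = TRANSLATORS[(source_version, target_version)]
    except KeyError:
        raise ValueError(
            f"no translation from v{source_version} to v{target_version}"
        ) from None
    return translator(record)
```

For example, a version 1 record translated to version 2 gains a default priority, and translating it back yields the original record.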




Graceful Restart

Data Connection supports Graceful Restart for both packet-based MPLS and optical-based GMPLS networks, and is compliant with RFCs 3473 and 3478.

Graceful Restart, also known as hitless restart or control plane restart, is the ability of the control plane to restart and re-learn its state information from the data plane, safely stored configuration, or peer nodes. This allows LSPs to remain in service and unaffected during certain network element failures, and provides a method to alert peer nodes of a failure. This technique can typically also be used to provide hot software upgrade.
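
One common way to realize this re-learning is to mark preserved forwarding state as stale on restart, refresh entries as they are re-learned, and discard whatever is still stale when a recovery timer expires; RFC 3478 describes a procedure of this kind for LDP. The sketch below illustrates the idea only. The RestartingTable class and its methods are hypothetical, the timer is reduced to an explicit method call, and nothing here is taken from the DC-MPLS implementation.

```python
from typing import Dict, List


class RestartingTable:
    """Hypothetical LSP table rebuilt by a control plane after a restart."""

    def __init__(self, preserved: Dict[str, int]) -> None:
        # Entries preserved in the data plane across the restart are kept
        # in service but marked stale until they are re-learned.
        self.entries: Dict[str, dict] = {
            lsp_id: {"label": label, "stale": True}
            for lsp_id, label in preserved.items()
        }

    def relearn(self, lsp_id: str, label: int) -> None:
        """Record state re-learned from a peer or from stored configuration."""
        self.entries[lsp_id] = {"label": label, "stale": False}

    def recovery_timer_expired(self) -> List[str]:
        """Remove entries that were never re-learned and return their ids."""
        removed = sorted(k for k, e in self.entries.items() if e["stale"])
        for lsp_id in removed:
            del self.entries[lsp_id]
        return removed
```

Because stale entries stay in the table until the timer expires, traffic on existing LSPs keeps flowing while the control plane recovers.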



For more information about Data Connection's MPLS products and expertise, contact dcmpls@dataconnection.com.