On September 6, 2024, a critical network incident affected the Luxembourg-1 (ED-9) data center and the Cloud Platform services hosted there. MAC addresses flapped repeatedly across some of the production VLANs, which triggered the EVPN protocol to blacklist the affected MAC addresses and disrupted network traffic.
The root cause of the incident was excessive movement of MAC addresses between network interfaces, likely caused by misconfigurations or instability in the network topology. As a result, several MAC addresses were blacklisted and then recovered automatically as the network stabilized. The service disruption lasted from approximately 12:46 to 14:10 UTC and affected API availability and related network services.
Immediate actions were taken to stop the flapping, and automatic recovery of the blacklisted MAC addresses helped restore network traffic. Corrective actions are being implemented to prevent recurrence, including removing a list of VLANs, improving network isolation, and working with Arista to ensure that the EVPN configurations are stable.
This incident highlights the need for enhanced network policy enforcement, stricter separation of production and development environments, and better monitoring to detect early signs of network instability.
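As an illustration of what such early-warning monitoring could look like, the following is a minimal sketch of a sliding-window MAC move counter. It is not Arista's implementation and was not part of the incident response. The class name `MacMoveMonitor`, the `warn_moves` parameter, and the shape of the input events are hypothetical. The defaults of 5 moves within 180 seconds mirror the duplicate-MAC detection defaults described in RFC 7432; the values actually configured on the affected switches are not stated in this report and may differ.

```python
from collections import defaultdict, deque


class MacMoveMonitor:
    """Counts MAC moves per (VLAN, MAC) inside a sliding time window.

    Hypothetical helper for illustration only. The defaults follow the
    RFC 7432 duplicate-MAC defaults (5 moves in 180 s); warn_moves is an
    assumed lower threshold used to raise an alert before blacklisting.
    """

    def __init__(self, max_moves=5, window_s=180.0, warn_moves=3):
        self.max_moves = max_moves
        self.window_s = window_s
        self.warn_moves = warn_moves
        self._last_port = {}                 # (vlan, mac) -> last seen port
        self._moves = defaultdict(deque)     # (vlan, mac) -> move timestamps

    def observe(self, ts, vlan, mac, port):
        """Record one MAC learn event; return 'ok', 'warn', or 'duplicate'."""
        key = (vlan, mac.lower())
        previous = self._last_port.get(key)
        self._last_port[key] = port

        moves = self._moves[key]
        # The first learn is not a move; only a change of port counts.
        if previous is not None and previous != port:
            moves.append(ts)

        # Drop moves that have fallen out of the sliding window.
        while moves and ts - moves[0] > self.window_s:
            moves.popleft()

        if len(moves) >= self.max_moves:
            return "duplicate"   # the point at which EVPN would blacklist
        if len(moves) >= self.warn_moves:
            return "warn"        # early sign of instability
        return "ok"
```

Feeding the monitor with MAC learn events (for example, parsed from switch logs) would let an alert fire at the "warn" level, before the blacklist threshold is reached.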
Key Points:
Date: September 6, 2024
Impact: Cloud Platform services in Luxembourg-1 (ED-9) experienced disruption due to MAC address blacklisting.
Root Cause: MAC address flapping on VLANs, preliminarily attributed to switch misbehavior; to be confirmed or rejected by Arista.
Mitigation: Automatic recovery of blacklisted MAC addresses and subsequent network reconfiguration.
Next Steps: VLAN cleanup and network isolation to prevent recurrence.