Gcore - 服务 | 地点 事件详情 – 故障详情

所有系统运行中

服务 | 地点 事件详情

已解决
严重故障
开始于 12 天前持续 大约 13 小时

受到影响

捷核系统

严重故障 从 7:24 PM 至 8:15 PM, 运行正常 从 8:05 PM 至 8:41 AM

API

严重故障 从 7:24 PM 至 8:05 PM, 运行正常 从 8:05 PM 至 8:41 AM

计费系统

严重故障 从 7:24 PM 至 8:15 PM, 运行正常 从 8:15 PM 至 8:41 AM

客户门户

严重故障 从 7:24 PM 至 8:15 PM, 运行正常 从 8:15 PM 至 8:41 AM

网站

严重故障 从 7:24 PM 至 8:15 PM, 运行正常 从 8:15 PM 至 8:41 AM

内容分发网络 (CDN)

严重故障 从 7:24 PM 至 8:15 PM, 运行正常 从 8:05 PM 至 8:15 PM, 严重故障 从 8:15 PM 至 8:23 PM, 运行正常 从 8:15 PM 至 8:23 PM, 部分故障 从 8:23 PM 至 8:27 PM, 运行正常 从 8:23 PM 至 8:41 AM

更新
  • 事后分析
    事后分析

    Public RCA about the Incident:

    Date: May 13, 2026 | Duration: 19:08 – 20:14 UTC (1 hour 6 minutes)


    Summary

    On May 13, 2026, from 19:08 to 20:14 UTC, part of Gcore's CDN service experienced a global disruption. CDN edges worldwide failed to process requests and returned HTTP 502 errors for a subset of client resources. The disruption affected gcore.com, the Customer Portal, public API endpoints, and CDN delivery on part of the infrastructure. 


    Impact

    • gcore.com and portal.gcore.com were unreachable.

    • api.gcore.com returned 502 errors, affecting API-based operations across CDN, Cloud, DNS, Streaming, Storage, WAAP, and IAM services.

    • SSO/SAML-based authentication to the portal was disrupted during and briefly after the window.

    • Customers with CDN resources served through the affected infrastructure saw 502 errors for their end-user traffic.


    Root Cause

    This was a stacked-defect incident — three independent gaps in the CDN configuration pipeline combined to turn a single configuration change into a global edge failure. Any one of the three defects, had it been absent, would have prevented the outage.

    1. API input validation gap: An internal origin routing field, originally intended as an admin-only configuration knob, lost its access restriction in a 2023 API rewrite and was later published in the public API documentation (March 2026) without specifying its allowed values. This allowed a non-standard value to be submitted and accepted via the API.

    2. Configuration generation logic error: When the CDN configuration pipeline processed the resource with the non-standard value, a bug in the rule-level config generation silently dropped all origin servers — producing a configuration with an empty upstream list. 

    3. Edge initialization crash: When a CDN edge node received a configuration with an empty upstream list, an edge-side script crashed during the initialization phase. Because the configuration file is global (shared across all resources on a node), this single malformed entry caused the entire node to fail initialization — returning HTTP 502 for all traffic, not just the affected resource. This crash propagated across all edge nodes on the affected infrastructure.


    Timeline (UTC)

    Time

    Event

    19:08

    CDN configuration containing the malformed resource pushed to edge nodes globally

    19:08–19:14

    Edge nodes begin returning HTTP 502 globally

    19:15

    P1 incident declared

    19:24

    Public status page incident posted

    19:42

    Mitigation begins: critical services routed via alternative edge infrastructure

    19:53

    Customer portal migrated to alternative infrastructure

    20:01

    Additional resources migrated

    20:14

    Fix applied, Offending resource disabled via API; edge nodes recover

    22:05

    API-level validation fix merged

    23:06

    API fix deployed to production


    Resolution

    Service was restored at 20:14 UTC by disabling the resource containing the malformed configuration. Engineers had been mitigating the impact since 19:42 UTC by routing critical control-plane services (API, portal) through alternative edge infrastructure.


    Corrective Actions

    #

    Action

    Status

    1

    API-level input validation: reject non-allowed values for the origin routing field

    Deployed 

    2

    Fix config generation logic to correctly handle inherited origin groups, eliminating the silent origin-drop bug

    In progress

    3

    Harden edge initialization to degrade gracefully (rule-level 502) instead of crashing the entire node on empty upstream configuration

    In progress

    4

    Audit related API fields from the 2023 rewrite for similar access control regressions

    In progress

    5

    Review and update API documentation to clearly specify allowed values for all origin configuration fields

    In progress


    We sincerely apologize for the disruption this caused. We are committed to completing the remaining fixes and implementing additional safeguards to prevent a similar configuration pipeline failure from causing a global impact in the future.

  • 已解决
    已解决

    我们很高兴地通知您,我们网站、全球CDN交付、所有服务的API访问以及客户门户的重大故障已得到解决。但是,如果您仍然遇到任何问题,请随时联系我们的支持团队。我们的团队将竭诚为您提供帮助,并确保及时解决您提出的任何其他问题。

    我们还将在获得详细的根本原因分析 (RCA) 后提供该分析报告。

    感谢您在此次事件中的耐心和理解,也感谢您的合作。

    如需进一步帮助,请通过support@gcore.com联系我们的支持团队。

  • 持续监控中
    持续监控中

    The issue has been resolved, and all services have fully recovered. We are continuing to monitor the situation to ensure stability. We apologise for the inconvenience.

  • 更新
    更新

    We are recovering, and all services are up and running. CDN service has mostly recovered; however, it may still be partially unavailable for some users. We are continuing to monitor the situation and working on the fix.

  • 更新
    更新

    Website and Customer Portal are up again. We are continuing to fix other services.

  • 已确认问题
    已确认问题

    The API access has been recovered, and we are continuing to fix other services. We will keep you updated.

  • 调查中
    调查中

    我们的[插入具体服务或操作详情]目前发生严重故障,导致服务完全不可用。对于由此造成的不便,我们深表歉意,并非常感谢您在这段关键时刻的耐心和理解。

    我们的工程团队正在积极努力找出根本原因,并尽快实施解决方案。我们将在收到更多有关解决方案进展的信息后定期更新。

    感谢您的理解与合作。