Gcore - Streaming | API Incident Details

Resolved
Partial outage 0 %
Started 2 months ago
Lasted about 1 hour

Affected

Streaming

Partial outage from 9:52 AM to 10:49 AM, Operational from 10:49 AM to 11:11 AM

API

Partial outage from 9:52 AM to 10:49 AM, Operational from 10:49 AM to 11:11 AM

Updates
  • Postmortem

    During a scheduled maintenance update on April 21, our Streaming service experienced an unplanned outage lasting approximately 1 hour and 9 minutes (09:27 to 10:36 UTC). The disruption caused our public API to return errors and made the service temporarily unavailable to customers. The issue was fully resolved by 10:36 UTC, and our team has taken steps to prevent recurrence.
    A routine database update was deployed as part of a planned maintenance window. During the final step of that update, two database operations conflicted with each other at the same moment, causing the database cluster to lose synchronization across all its nodes. The database became unavailable, which cascaded into API errors visible to customers.

    Timeline

    09:22 Scheduled database maintenance started as planned.
    09:27 Database cluster became unavailable due to a conflict during the final step of the update.
    09:29 Automated monitoring alerts fired. DevOps team notified.
    09:32 Engineering team confirmed the database failure.
    09:33 Public API began returning errors. Investigation started immediately.
    09:47 Formal incident declared. Status page updated to maintenance.
    09:54 Root cause confirmed from production logs.
    10:19 Database recovery underway.
    10:30 Database confirmed operational in single-node mode.
    10:36 Streaming API fully restored. Customer-facing service operational.
    10:45 Incident closed. Status page updated to resolved.
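
    The durations quoted in this postmortem can be cross-checked against the timeline. The following is a minimal sketch, assuming the outage is measured from the 09:27 database failure to the 10:36 API restoration; the helper name `minutes_between` is illustrative and not part of any Gcore tooling.

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Timestamps copied from the timeline (UTC, April 21).
outage = minutes_between("09:27", "10:36")     # database failure -> API restored
detection = minutes_between("09:27", "09:29")  # database failure -> first alert

hours, minutes = divmod(outage, 60)
print(f"Outage: {hours} h {minutes} min; time to first alert: {detection} min")
# prints "Outage: 1 h 9 min; time to first alert: 2 min"
```

    The result matches the 1 hour and 9 minutes stated in the summary.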

    Mitigation

    • Isolated the database. To safely restore service, the database cluster was reconfigured to run in single-node mode, removing the multi-node synchronization that had failed.

    • Verified stability. The database was reloaded and its operational status confirmed before any traffic was re-routed, ensuring a clean and stable recovery.

    • Restored customer traffic. With the database stable, the Streaming API was brought back online and monitored closely to confirm all customer-facing services had fully recovered.

    Action points

    • Added pre-deployment safeguards that detect and block high-risk operation sequences before they reach production.

    • Improved automated recovery procedures to reduce the time between detection and database restoration.

    • Reviewed and updated our database migration process to prevent conflicting operations from running simultaneously during updates.
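
    To illustrate the kind of pre-deployment safeguard described above, here is a hypothetical sketch of a check that blocks a migration plan when two steps scheduled to run at the same time modify the same database object. It is not the safeguard actually deployed: the `Step` fields, the `batch` convention, and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Step:
    name: str
    targets: frozenset  # assumed: database objects the step modifies
    batch: int          # assumed: steps sharing a batch number run concurrently


def find_conflicts(steps):
    """Return pairs of step names that share a batch and modify a common target."""
    return [
        (a.name, b.name)
        for a, b in combinations(steps, 2)
        if a.batch == b.batch and a.targets & b.targets
    ]


def check_plan(steps):
    """Raise before deployment if any concurrent steps would conflict."""
    conflicts = find_conflicts(steps)
    if conflicts:
        raise ValueError(f"blocked: conflicting concurrent steps {conflicts}")
    return True


# Hypothetical plan: the last two steps run together and touch the same table.
plan = [
    Step("add_column", frozenset({"streams"}), batch=1),
    Step("backfill", frozenset({"streams"}), batch=2),
    Step("rebuild_index", frozenset({"streams"}), batch=2),
]
print(find_conflicts(plan))
```

    Moving one of the conflicting steps into its own batch makes `check_plan` pass, which corresponds to running the operations one after another instead of simultaneously.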

  • Resolved

    We are happy to inform you that the partial outage in our Streaming API service has been resolved. However, if you continue to experience any issues, please do not hesitate to contact our support team. Our team will be happy to assist you and ensure that any further concerns are addressed promptly. We will also provide a detailed Root Cause Analysis (RCA) once it becomes available.

    We appreciate your patience and understanding throughout this incident, and we thank you for your cooperation.

    For further assistance, please contact our support team at support@gcore.com.

  • Monitoring

    We are pleased to inform you that our engineering team has implemented a fix to resolve the partial outage in our Streaming API service. However, we are still closely monitoring the situation to ensure stable performance.

    We will provide you with an update as soon as we have confirmed that the issue has been completely resolved.

  • Investigating

    We are currently experiencing a partial outage in our Streaming API service, which may result in partial unavailability for users. We apologize for any inconvenience this may cause and appreciate your patience and understanding during this time.

    We will provide updates as soon as more information becomes available on the progress of the resolution. Thank you for your understanding and cooperation.