Affected
Major outage from 4:47 PM to 5:10 PM, Operational from 5:10 PM to 7:28 PM
- Postmortem
Root Cause Analysis
Issue
On March 4, 2026, a storage capacity exhaustion condition in one of the storage clusters within the Amsterdam region caused write operations to be temporarily blocked. As a result, virtual machines relying on the affected storage system experienced disk I/O failures and service disruption.
The issue occurred during a maintenance activity involving a storage platform upgrade. While the upgrade completed successfully, a combination of background data redistribution and unusually high write activity caused storage utilization to exceed a critical threshold, triggering a protective mechanism that temporarily blocked new write operations.
Service functionality was restored after emergency actions were taken to reclaim storage capacity.
Timeline (UTC)
March 4, 2026 – 15:50 : Storage platform upgrade completes and background data redistribution begins
March 4, 2026 – 16:20 : Storage utilization reaches critical threshold; cluster transitions to write-blocking state
March 4, 2026 – 16:21 : Monitoring system becomes unavailable as dependent virtual machines experience storage errors
March 4, 2026 – 16:23 : Multiple customer reports indicate widespread virtual machine availability issues
March 4, 2026 – 17:25 : Investigation identifies storage capacity exhaustion as the underlying cause
March 4, 2026 – 17:28 : Storage engineering team initiates emergency capacity recovery procedure
March 4, 2026 – 17:40 : Write operations restored and affected virtual machines begin recovering
March 4, 2026 – 18:00 : Platform functionality verified through system tests
March 4, 2026 – 18:59 : Remaining virtual machines recovered and incident closed
Root Cause
The incident occurred due to a temporary spike in storage utilization caused by two simultaneous conditions. After a storage system restart during maintenance, an automated data redistribution process began, temporarily increasing storage usage while data was being rebalanced across nodes.
At the same time, unusually high write activity from a workload consumed additional storage capacity. Because the cluster was already operating at relatively high utilization, the combined effect pushed storage usage beyond the system’s safety threshold. As a result, the storage platform automatically blocked write operations to protect data integrity, causing virtual machines using the affected storage to experience disk I/O failures.
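The protective mechanism described above can be pictured with a small model: once utilization crosses a critical threshold, the cluster refuses new writes to protect data integrity. The sketch below is illustrative only; the threshold value and all names are assumptions, not the actual storage platform code.

```python
# Minimal sketch (hypothetical names) of a cluster that blocks new writes
# once utilization crosses a critical threshold.
from dataclasses import dataclass

CRITICAL_UTILIZATION = 0.95  # assumed threshold; the real value is platform-specific


@dataclass
class StorageCluster:
    capacity_bytes: int
    used_bytes: int

    @property
    def utilization(self) -> float:
        return self.used_bytes / self.capacity_bytes

    def write(self, size_bytes: int) -> None:
        # Rebalancing traffic and tenant writes both consume capacity, so a
        # spike in either can push utilization past the threshold and block
        # all further writes.
        if self.utilization >= CRITICAL_UTILIZATION:
            raise IOError("cluster is write-blocked to protect data integrity")
        self.used_bytes += size_bytes


cluster = StorageCluster(capacity_bytes=100, used_bytes=96)
try:
    cluster.write(1)  # fails: utilization (0.96) already exceeds the threshold
except IOError as exc:
    print(exc)
```

In this simplified picture, virtual machines see the raised error as disk I/O failures, which matches the customer impact observed during the incident.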
Action Items
Implement a mandatory pre-maintenance validation step to ensure automated data redistribution mechanisms are disabled or controlled during storage service restarts or upgrades (see the sketch following these items).
Introduce capacity safeguards in maintenance procedures to prevent upgrades from being executed when storage utilization exceeds safe operational thresholds.
Increase available storage capacity within the affected region to maintain sufficient headroom for background data operations and workload spikes.
Review operational procedures for scheduling maintenance activities to ensure adequate system capacity and visibility during upgrades.
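To make the first two action items concrete, a pre-maintenance gate could refuse to start an upgrade unless rebalancing is paused and the cluster has enough capacity headroom for redistribution plus workload spikes. The sketch below is a hypothetical illustration; the function name and threshold values are assumptions, not the actual maintenance tooling.

```python
# Hypothetical pre-maintenance validation gate. All names and thresholds
# are illustrative assumptions.
MAX_UTILIZATION_FOR_MAINTENANCE = 0.75  # assumed safe ceiling before an upgrade
REQUIRED_HEADROOM = 0.15                # assumed margin for rebalancing traffic


def validate_maintenance_window(utilization: float, rebalance_paused: bool) -> None:
    """Raise if the cluster is not in a safe state to begin a storage upgrade."""
    if not rebalance_paused:
        raise RuntimeError("automated data redistribution must be paused first")
    if utilization > MAX_UTILIZATION_FOR_MAINTENANCE:
        raise RuntimeError(
            f"utilization {utilization:.0%} exceeds the "
            f"{MAX_UTILIZATION_FOR_MAINTENANCE:.0%} maintenance ceiling"
        )
    if 1.0 - utilization < REQUIRED_HEADROOM:
        raise RuntimeError("insufficient headroom for background data operations")


# Example: the check would refuse an upgrade on a nearly full cluster.
try:
    validate_maintenance_window(utilization=0.9, rebalance_paused=True)
except RuntimeError as exc:
    print(f"upgrade blocked: {exc}")
```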
- Resolved
We are happy to inform you that the major outage in Amsterdam affecting our Cloud services has been resolved. However, if you continue to experience any issues, please do not hesitate to contact our support team. Our team will be happy to assist you and ensure that any further concerns are addressed promptly.
We appreciate your patience and understanding throughout this incident, and we thank you for your cooperation.
A formal Root Cause Analysis (RCA) is currently being prepared and will be published once available. For further assistance, please contact our support team via support@gcore.com.
- Monitoring
We are pleased to inform you that our engineering team has implemented a fix to resolve the major outage in the Cloud service. However, we are still closely monitoring the situation to ensure stable performance.
We will provide you with an update as soon as we have confirmed that the issue has been completely resolved.
- Identified
We are continuing to work on a fix for this incident.
- Update
We are currently investigating this incident.
- Investigating
We are currently experiencing a major outage in our Cloud service, resulting in the complete unavailability of the service. We sincerely apologize for any inconvenience this may cause and greatly appreciate your patience and understanding during this critical time.
Our engineering team is actively working to identify the root cause and implement a resolution as quickly as possible. We will provide regular updates as we receive more information on the progress of the resolution.
Thank you for your understanding and cooperation.

