Storage | SFTP Frankfurt incident details
Incident Report for Gcore
Postmortem

Several days ago we expanded the DRF SFTP Storage (drf.origin.gcdn.co), adding four new disk enclosures of the same type as the existing ones, along with new disk drives. The expansion itself completed successfully. Two days later, both storage nodes of the cluster suffered a kernel panic at the same time.

Rebooting the servers did not help, so we asked datacenter support to power drain the servers and the new disk enclosures. This did not help either. After several attempts, we booted the servers into a service OS and found that the expanders (controllers) in the new and old disk enclosures were running different firmware versions. This mismatch appears to have caused some kind of SAS bus failure. We upgraded the firmware, and after another power drain of the disk enclosures the system booted normally and the service became available again.

During troubleshooting we also found a failed memory module in one storage node and requested a server repair. As a result of this incident, we have added a firmware version check to our storage expansion manual to prevent such outages in the future.
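For illustration only, a firmware version check of the kind mentioned above could be sketched as follows. This is not the actual procedure from our manual. The function names, enclosure labels, and version strings are hypothetical, and the sketch assumes the expander firmware versions have already been collected by whatever inventory tooling is in use.

```python
from collections import defaultdict


def group_by_firmware(expander_firmware):
    """Group enclosure labels by the expander firmware version they report.

    expander_firmware: dict mapping a (hypothetical) enclosure label to the
    firmware version string reported by its expander.
    """
    groups = defaultdict(list)
    for enclosure, version in expander_firmware.items():
        # Strip stray whitespace so "1.0 " and "1.0" count as the same version.
        groups[version.strip()].append(enclosure)
    return {version: sorted(labels) for version, labels in groups.items()}


def firmware_is_uniform(expander_firmware):
    """Return True when all enclosures report one and the same version."""
    return len(group_by_firmware(expander_firmware)) <= 1


if __name__ == "__main__":
    # Hypothetical inventory: labels and versions are made up for the example.
    inventory = {"old-1": "1.0", "old-2": "1.0", "new-1": "2.0"}
    if not firmware_is_uniform(inventory):
        for version, labels in sorted(group_by_firmware(inventory).items()):
            print(f"firmware {version}: {', '.join(labels)}")
```

A check like this would run before new enclosures are attached to a production SAS bus. Any result with more than one version group means the firmware should be aligned first.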

We will need to perform additional maintenance in the near future. Please follow our Status page for details.

Posted Aug 29, 2023 - 09:24 UTC

Resolved
We'd like to inform you that the issue has been resolved, and we are closely monitoring the performance to ensure there are no further disruptions. We will provide a Root Cause Analysis (RCA) report in the coming days to help you understand what caused the incident and the steps we have taken to prevent it from happening again in the future.

We apologize for any inconvenience this may have caused and thank you for your patience and understanding throughout this process.
Posted Aug 29, 2023 - 09:23 UTC
Monitoring
We are pleased to inform you that our Engineering team has implemented a fix to resolve the issue. However, we are still closely monitoring the situation to ensure stable performance.

We will provide you with an update as soon as we have confirmed that the issue has been completely resolved.
Posted Aug 25, 2023 - 07:07 UTC
Investigating
We are currently experiencing a degradation in performance, which may result in service unavailability. We apologize for any inconvenience this may cause and appreciate your patience and understanding during this time.

We will provide you with an update as soon as we have more information on the progress of the resolution. Thank you for your understanding and cooperation.
Posted Aug 24, 2023 - 20:30 UTC
This incident affected: Object Storage (SFTP Frankfurt).