Incidents
Full history of incidents.
March 2019
We are investigating an elevated error rate and elevated response times on Cellar. Only some buckets/files are affected by this issue.
EDIT 14:01 UTC: Error rate is back to normal. Response times are going down; we are still watching the situation closely.
EDIT 15:40 UTC: We are seeing an elevated error rate again. It was caused by a restart of a node, which triggered a very high load on other nodes (which is not supposed to happen). We are investigating.
EDIT 16:30 UTC: The error rate went down significantly but it's not over yet. We sadly cannot give any meaningful ETA as of now.
EDIT 16:55 UTC: The error rate is close to normal. One node is still in trouble and it's causing a few errors; it should resolve quickly.
EDIT 17:15 UTC: The failing node went back to normal at 17:02. We are still seeing a few errors for write requests as of now.
EDIT 17:23 UTC: The error rate is back to normal. A few nodes are still a bit slower than usual, so performance may be inconsistent, but everything should be completely back to normal within an hour.
February 2019
We are experiencing TLS issues on some HTTPS requests.
EDIT 15:33 UTC: fixed.
Intermittent network issues have been identified affecting several systems. These issues resulted in various timeouts or longer-than-expected connections to databases or applications.
We haven't seen any new timeouts since 23:45 UTC, but we continue to monitor the service.
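For client applications hit by transient timeouts like these, retrying the connection with exponential backoff usually rides out the blip. A minimal sketch in Python, assuming a `connect` callable supplied by your own database driver (the function name and error types here are illustrative, not part of our platform):

```python
import random
import time

def connect_with_retry(connect, attempts=5, base_delay=0.5):
    """Call `connect` (any zero-argument function that opens a connection
    and raises on failure), retrying with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return connect()
        except (TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise  # out of retries, surface the error
            # Backoff: 0.5s, 1s, 2s, ... plus up to 100 ms of jitter so
            # many clients don't all reconnect at exactly the same moment.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```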
Live logs are currently unavailable. Newer logs should be available by refreshing the logs panel. Log drains may be impacted too.
EDIT 10:30 UTC: fixed.
Due to an incident in an Online datacenter (https://status.scaleway.com/incident/286), we are experiencing issues.
EDIT 21:00 UTC: fixed.
A hypervisor restarted. Applications have been redeployed and add-ons have been restarted.
EDIT 15:28 UTC: All add-ons should be back online; some of them took longer than expected to recover. The cause of the reboot will be investigated.
Deployments are currently unavailable due to an ongoing issue with our deployment system.
EDIT 16:42 UTC: The root cause has been found. We are redeploying core components to clean everything.
EDIT 16:50 UTC: Deployments have been available again for a few minutes. We are still cleaning things up. Sorry about the issue.
We are experiencing issues on one hypervisor, which can impact add-ons.
EDIT 20:50 UTC: we are hard rebooting the hypervisor.
EDIT 20:55 UTC: the hypervisor is up; the add-ons hosted on it are starting.
EDIT 20:58 UTC: fixed.
Some deployments are taking longer than usual to start.
10:27 UTC: The issue is now fixed.
Applications in the console are reporting an "unknown" state.
We are investigating the issue. Deployments are stopped until we find the root cause.
EDIT 16:35 UTC: Application states should now be OK. Deployments are still stopped until we figure out the issue.
EDIT 16:40 UTC: The problem has been identified. We will resume deployments in a few minutes. All deployment actions were queued and will be processed.
EDIT 16:42 UTC: Deployments are enabled again. It may take a few minutes before your actions are handled. We consider this incident over.
We are seeing an abnormal number of 503 errors served by our reverse proxies. We are investigating.
EDIT 11:31 UTC: The cause has been identified; we are currently fixing the issue on our reverse proxies.
EDIT 11:34 UTC: All reverse proxies now have a consistent state. The issue is fixed.
The issue happened after a configuration error was made during a manual operation on some of the reverse proxies. Applications that redeployed since 11:08 UTC were impacted by the issue; other applications were fine. The changes were rolled back and will be tested thoroughly again on our test infrastructure.
A maintenance window is scheduled for Monday 2019-02-18 at 11:00 UTC (12:00 noon, Paris time, CET); it will affect the main API, deployments, and GIT repositories.
The maintenance will last at least 5 minutes but no more than 20 minutes.
Different parts of the system will be affected throughout this maintenance. Please wait until the end of the maintenance before reporting any issues you may encounter.
EDIT 11:00 UTC: The maintenance will start in a few minutes. Deployments and GIT repositories will be unavailable. The console might report an "unknown" or not up-to-date state for applications. This is expected.
EDIT 11:05 UTC: Maintenance is starting; deployments are down and so are GIT repositories (push actions will be rejected).
EDIT 11:09 UTC: Deployments are available again. Push actions on GIT repositories are still disabled.
EDIT 11:10 UTC: Our main API is entering read-only mode. 500 errors might appear during this time.
EDIT 11:11 UTC: Git repositories are now available. You might need to clear your DNS cache to be able to push again.
EDIT 11:20 UTC: Our main API should be fully available again. We are checking that everything is fine.
EDIT 11:23 UTC: Everything looks fine. The maintenance is over. You might experience git push errors for up to 45 minutes. To avoid this, please clear your DNS cache.
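If pushes keep failing well after a maintenance like this one, the usual culprit is a cached DNS answer pointing at the old address. A minimal sketch in Python to see what your machine currently resolves (the hostname below is a placeholder; use the host part of `git remote get-url origin`):

```python
import socket

# Placeholder hostname; replace with the host of your Git remote.
GIT_HOST = "git.example.com"

def resolved_addresses(host):
    """Ask the local resolver for the addresses of `host` (port 22, TCP,
    since git pushes typically go over SSH)."""
    infos = socket.getaddrinfo(host, 22, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

print(resolved_addresses(GIT_HOST))
# If these addresses look stale, flush your OS DNS cache and run this again.
```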
The main API is having issues with its database connections; it is replying with a 500 error to most requests.
EDIT 07:11 UTC: The issue is resolved
Our deployment system will be unavailable from 11:00 UTC to 13:00 UTC on Wednesday, February 6th.
All deployment actions will be queued and started once the deployment stack is back up. The maintenance shouldn't last longer than 2 hours.
Feel free to ask our support any questions regarding this maintenance.
EDIT 11:03 UTC: the maintenance will start soon. Deployments will be shut down in a few minutes. Push actions on our GIT repositories are disabled.
EDIT 11:06 UTC: Deployments are shut down.
EDIT 11:20 UTC: Deployments should be back; we are still cleaning things up.
EDIT 12:20 UTC: We have been keeping a close eye on deployments, everything is going smoothly. Maintenance is over.
Deployments in the Paris zone are currently unavailable. We are investigating the issue and working on bringing them back.
UPDATE 9:40 UTC: deployments have been back for 20 minutes; we are still cleaning things up.
UPDATE 10:30 UTC: Everything is back to normal, sorry for the issue.
January 2019
We are having problems with our deployment system. We are investigating.
EDIT 8:21 UTC: fixed.
Deployments are currently slowed down. We are working to bring them back to their regular speed.
EDIT 18:10: Deployments should be back to normal.
We are running maintenance on live logs and log drains. These services will be intermittently unavailable.
EDIT 11:05 UTC: maintenance is finished.
We will disable MongoDB add-on creation and deletion while we perform maintenance to add new features. We will edit this post when the maintenance is finished.
EDIT 16:29 UTC: the new add-on dashboard is available. The maintenance is still in progress.
EDIT 17:30 UTC: maintenance finished.
We are experiencing issues on our API which is impacting the console. We are investigating.
EDIT 18/01/19 00:53 UTC: The root issue has most probably been identified. It was coming from an internal tool. We will investigate this further. In the meantime, the tool has been deactivated and shouldn't cause any harm.