Incidents
Full history of incidents.
August 2019
Logs are currently partially unavailable through the console and the CLI. Logs are still being collected, but the most recent entries may not be displayed, and those that are may appear out of order.
EDIT 06:22 UTC: Logs are now available again. No logs should have been lost, but entries from before 06:15 UTC may appear out of order.
We are currently seeing elevated error rates on the old Cellar cluster. A few nodes went down, making operations slower than usual and leading to timeouts or 500/503 errors. The nodes are already coming back up.
EDIT 00:21 UTC: The cluster is getting back to normal; errors have already decreased significantly, and most requests should now succeed. We are still monitoring failed requests.
EDIT 03:00 UTC: No failed requests over the last 30 minutes; the incident is closed. We are still in the process of migrating this cluster's data to the new cluster. Until we migrate your buckets automatically, you can migrate them yourself (a sketch follows below); feel free to contact our support for more information.
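For reference, a manual migration only needs an S3-compatible client, since Cellar exposes an S3-compatible API. Below is a minimal sketch using Python and boto3; the old-cluster endpoint URL, the credentials, and the bucket name are assumptions to adapt to your own setup, and only cellar-c2.services.clever-cloud.com comes from this page.

```python
# Minimal sketch of a manual bucket migration between Cellar clusters,
# assuming both expose an S3-compatible API. The old-cluster endpoint,
# credentials and bucket name below are placeholders.
import boto3
from botocore.exceptions import ClientError

old = boto3.client(
    "s3",
    endpoint_url="https://cellar.services.clever-cloud.com",  # assumed old cluster endpoint
    aws_access_key_id="OLD_KEY",
    aws_secret_access_key="OLD_SECRET",
)
new = boto3.client(
    "s3",
    endpoint_url="https://cellar-c2.services.clever-cloud.com",  # new cluster (from this page)
    aws_access_key_id="NEW_KEY",
    aws_secret_access_key="NEW_SECRET",
)

bucket = "my-bucket"  # hypothetical bucket name
try:
    new.create_bucket(Bucket=bucket)  # create the target bucket on the new cluster
except ClientError:
    pass  # bucket may already exist

# Copy every object from the old cluster to the new one, page by page.
for page in old.get_paginator("list_objects_v2").paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        data = old.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
        new.put_object(Bucket=bucket, Key=obj["Key"], Body=data)
```

Note that this sketch reads each object fully into memory; for large objects you would want streaming or multipart copies instead.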
The new Cellar cluster (cellar-c2.services.clever-cloud.com) had a brief interruption between 19:31:30 and 19:33:20 UTC on 2019-08-23, during which most requests could not be handled or were dropped if already started. The problem has been identified and automatic actions have restored access to the cluster. The root cause will be investigated.
Deployments are delayed; we are looking into it.
12:11: An orchestrator was experiencing intermittent network issues. The issue is now fixed.
From 20:30 UTC to 21:30 UTC, 16% of the hypervisors in the Paris zone failed to resolve the monitoring service's domain name.
Applications that had instances on these hypervisors were redeployed automatically because the monitoring service could not reach them (even though they were actually up).
The MySQL c4 shared cluster in the EU zone is experiencing issues. We are investigating.
EDIT 09:29 UTC: Fixed.
July 2019
Metrics are unavailable; we are looking into it. Write requests are still being processed.
13:27 UTC: Issue fixed.
One of our hypervisors is experiencing issues and is unresponsive; we are restarting it. The applications on it have been redeployed on other hypervisors. Add-ons will be down during the restart.
EDIT 22:05 UTC: The hypervisor has been restarted.
EDIT 22:20 UTC: Incident fixed.
Our API will undergo maintenance at 19:40 UTC. Deployments will be disabled for a few minutes, and the Dashboard won't be available either.
EDIT 19:43 UTC: The maintenance is starting; the API will be unavailable shortly.
EDIT 19:49 UTC: The maintenance is over!
An exceptional network maintenance is planned today at 20:00 UTC. Intermittent network interruptions are expected over a few hours; none should last long. All applications will be redeployed onto non-impacted servers, and some add-ons will be unreachable at some point. Unfortunately, we could not postpone this maintenance due to scheduling constraints. Do not hesitate to ping us on the support channel if you have any questions.
EDIT 20:03 UTC: The maintenance should start shortly. We will keep you updated on its progress.
EDIT 20:53 UTC: The maintenance is still ongoing. Nothing unusual to report as of now.
EDIT 21:20 UTC: Everything is going smoothly according to our tests. Nothing unusual to report as of now.
EDIT 21:42 UTC: The maintenance is over. No network interruptions have been noticed by our monitoring systems. Everything is back to normal.
Our payment processor is currently having trouble, causing some of our calls to their API to fail. Several endpoints on our API call the payment processor's API, and some of those calls will fail.
Here is a non-exhaustive list of affected actions (some of them will still succeed):
- Application or add-on creation
- Invoice payment
- Credit card management
EDIT 23:30 UTC: Our payment processor's issues should now be resolved. Everything should be back to normal on our side as well.
Maintenance on components used by the main API will take place on 2019-07-11 at 10:00 UTC (12:00 CEST). The main API will be unavailable for a few minutes (up to 20).
10:02 UTC: Deployments queued from now on will be postponed until the end of the maintenance.
10:04 UTC: The main API is now unavailable.
10:06 UTC: The main API is restarting.
10:09 UTC: Maintenance is over. The main API is available, pending deployments are starting.
A human error caused a configuration error on all public PAR reverse proxies, which prevented them from reloading their configuration from 09:20:35 UTC to 09:24:40 UTC.
An automatic restart at 09:21:48 UTC made them unavailable until the configuration was regenerated without the error at 09:24:40 UTC.
Steps will be taken to prevent this error from happening again.
June 2019
We are experiencing network loss on some of our servers in one of our datacenters. We have identified the root cause and are working on it.
EDIT 22:49 UTC: Our API is also down for now; this is expected. The console is therefore down too. Clients' websites remain accessible.
EDIT 23:11 UTC: The network came back 5 minutes ago; we are currently checking that everything is OK.
EDIT 23:26 UTC: Applications with an fs-bucket (including PHP applications) may have issues loading if they lost their connection to the fs-bucket server, i.e. if that server was in the datacenter that lost connectivity.
EDIT 00:26 UTC: Applications with fs-buckets are currently redeploying. Most of them successfully reconnected (sometimes after several minutes) to their bucket server. The incident is over.
Deployments will be disabled for up to 15 minutes on Thursday 2019-06-27 at 19:00 UTC (21:00 CEST).
We will perform a migration of the Git repositories. Once deployments are enabled again, you may have to wait a few more minutes depending on your DNS cache.
19:00 UTC: Maintenance is starting, deployments are now disabled (except for Github deployments).
19:13 UTC: The maintenance will last longer than initially planned, we are experiencing an issue and are looking into it.
19:15 UTC: The issue is fixed. We are making sure that everything is indeed fine. Some deployments may now go through, depending on your DNS cache.
19:30 UTC: Maintenance is over; if you encounter an issue, please refresh your DNS cache.
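If you want to see which IP your resolver currently returns for the migrated Git endpoint, here is a quick sketch in Python; the hostname is a hypothetical placeholder, so use your own application's Git push host instead.

```python
# Quick check of which IP your DNS cache currently returns for a host,
# e.g. your application's Git push endpoint. The hostname below is a
# hypothetical placeholder.
import socket

host = "push.example.com"  # replace with your application's Git push host
print(host, "resolves to", socket.gethostbyname(host))
```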
Deployments will be disabled for up to 15 minutes on Thursday 2019-06-27 at 10:00 UTC (12:00 CEST).
We will perform a migration of the Git repositories. Once deployments are enabled again, you may have to wait a few more minutes depending on your DNS cache.
EDIT: This has been postponed.
Deployments will be disabled for up to an hour on Thursday 2019-06-20 starting at 10:00 UTC (12:00 CEST).
The maintenance should take less time than that, but if you do have deployments planned, make sure to start them well before it begins.
EDIT 10:01 UTC: Maintenance is starting now, deployments are disabled.
EDIT 10:19 UTC: Deployments are enabled again.
EDIT 10:31 UTC: Deployments are disabled again. Dedicated reverse proxies for the Clever Cloud APIs are out of sync, and our APIs are down at the moment. We are working on it.
EDIT 10:39 UTC: Main API is back online.
EDIT 10:47 UTC: Reverse proxies are in sync, deployments are enabled again. We are cleaning up.
EDIT 10:53 UTC: Maintenance is over.
This cluster is experiencing elevated error rates and response times. It is currently somewhat overloaded following the restart of a few nodes that crashed.
It should go back to normal in 30 to 60 minutes.
EDIT 08:56 UTC: Clean-up operations are still in progress and are slowing the cluster down. The error rate is going down, though.
EDIT 09:55 UTC: The incident has been over since 09:40 UTC.
Some cleverapps.io domains are experiencing timeouts; we are investigating the issue.
EDIT 18:41 UTC: The problem has been fixed for a couple of minutes now. We gathered information about why it happened and will try to narrow down the cause.
A human error triggered a lot of false positives regarding application status. This in turn queued hundreds of automatic deployments.
The issue is now fixed, but deployments will take a little while longer to start until the queue is consumed.
EDIT: Incident over at 09:40 UTC.