Clever Cloud Status

Incident History

Full history of incidents.

Newest first

July 2023

DNS migration
Fixed · cleverapps.io domains · Global

We asked our provider to transfer the domain name cleverapps.io. The transfer ended at 12:30 UTC, and we found that some records are missing or have incorrect values.

EDIT 15:00 UTC: We found that the NS and SOA records were incorrect and have updated them.

EDIT 16:00 UTC: Everything is back to normal.

Fixed · cleverapps.io domains · Global

Following yesterday's deployment, we had issues with HTTP and TCP redirections, which caused infinite loops and timeouts. We are investigating the issue.

EDIT 09:00 UTC: The issue was found and fixed.

Fixed · cleverapps.io domains · Global

Some apps are not available.

21h11: Only apps with redirect_https enabled are impacted.

21h56: We rolled back to the old cleverapps load balancers.

Fixed · Infrastructure · Global

20h15: We have lost our hypervisors in the SYD region.

20h30: Our infrastructure provider in SYD lost its connectivity.

21h00: Hypervisors are back online.

Fixed · Reverse Proxies · Global

We are currently experiencing issues on the reverse proxies of the JED region. We are investigating.

EDIT 16:44 UTC: The root cause has been identified and a fix has been applied. We are monitoring the results.

EDIT 16:50 UTC: The service is now operational.

Fixed · FS Buckets · Global

An FSBucket server is currently being investigated for connection timeouts when mounting buckets. The problem has been partially identified and a first fix has been applied. Additional steps will be taken shortly to make sure everything is working as intended.

EDIT 10:49 UTC: The underlying issue has been fixed. Some applications may have had trouble mounting FSBuckets, or writing or reading files stored on that server, between 08:50 UTC and 10:25 UTC. Impacted applications are currently being redeployed out of caution (most of them successfully reconnected to the server after the fix was issued).

June 2023

Fixed · Access Logs · Global

We are detecting some errors on the storage layer responsible for storing metrics and access log data. Queries are unavailable.

EDIT 14:10 UTC: Queries are available again.

We continue to investigate.

Fixed · Infrastructure · Global

The monitoring system has detected that a hypervisor is unreachable. We are investigating.

EDIT 16:27 UTC: The hypervisor took some time to reboot but it is now up and running. We are making sure services are working fine following this incident.

EDIT 17:10 UTC: The incident is now over. The underlying problem has been identified but the hypervisor is currently in the upgrade queue.

Fixed · Global

A hypervisor in the Paris region needs to be rebooted due to a kernel issue. The reboot will take place tonight (June 21, 2023) at 18:00 UTC. Services on that hypervisor have already been migrated, apart from a few of them. Impacted customers will shortly receive an email with more details.

EDIT 18:14 UTC: The maintenance is starting

EDIT 22:00 UTC: The maintenance is now over

Fixed · Global

A hypervisor in the Paris region needs to be rebooted due to a kernel issue. The reboot will take place tonight (June 21, 2023) at 20:00 UTC. Services on that hypervisor will be migrated starting at 18:00 UTC. Impacted users will shortly receive an email with more details.

EDIT 18:13 UTC: The maintenance is starting

EDIT 23:11 UTC: The maintenance is now over

Fixed · Deployments · Global

A deployment issue has been identified; we are working on a fix.

EDIT 20:43 UTC: Fixed.

Fixed · Access Logs · Global

We are detecting some errors on our storage layer responsible for storing metrics and access logs data. We are investigating.

EDIT 04:58 PM UTC: A storage node had a hardware issue; it has been rebooted.

Fixed · Access Logs · Global

This Tuesday we will start a maintenance operation designed to improve the performance of our storage layer for metrics and access logs. During the maintenance, you may not see the latest data points and access logs.

Maintenance will start on June 20 at 03:30 PM UTC.

EDIT 03:45 PM UTC: Maintenance is starting.

EDIT 04:58 PM UTC: Maintenance is over.

Fixed · API · Global

The MySQL add-on API started to time out when creating add-ons. Existing add-ons still work.

We are investigating the issue.

EDIT 09:00 PM UTC: the root cause has been corrected.

Fixed · Access Logs · Global

This Sunday we will start a maintenance operation designed to improve the performance of our storage layer for metrics and access logs. During the maintenance, you may not see the latest data points and access logs.

Maintenance will start on June 18 at 02:30 PM UTC.

EDIT 02:36 PM UTC: Maintenance is starting.

EDIT 08:21 PM UTC: Maintenance is still ongoing; the storage layer is lagging by a few minutes on average.

EDIT 08:51 PM UTC: Maintenance is over; we are catching up on the lag.

EDIT 08:00 PM UTC: An error while catching up on the lag has put the storage layer into an inconsistent state. Queries are disabled for now.

EDIT 11:00 PM UTC: The storage layer is still inconsistent.

EDIT 00:47 PM UTC D+1: The storage layer is (finally?) consistent. We are catching up on the lag.

EDIT 04:30 PM UTC D+1: We have caught up on the lag.

EDIT 07:29 AM UTC D+1: The storage layer has inconsistencies. We are investigating the cause.

EDIT 08:10 AM UTC D+1: The storage layer is up and running. We are consuming the lag. Queries are disabled during this phase.

EDIT 08:45 AM UTC D+1: We have consumed the lag. Queries are available.

Fixed · Infrastructure · Global

The monitoring system is having difficulty reaching some services. We are investigating.

EDIT 00:50 UTC: The monitoring system no longer sees network issues.

EDIT 01:00 UTC: The monitoring system has detected connectivity issues; we are fixing them.

EDIT 01:30 UTC: The monitoring system has detected new connectivity issues; we are on it.

Fixed · Infrastructure · Global

We are impacted by an incident at our infrastructure provider. More details are available on their incident page: https://network.status-ovhcloud.com/incidents/9vzvvwrm69ps

Fixed · SSH Gateway · Global

SSH connections may fail with the message 'Error: This application has no instances you can ssh to' or may ask you for a password during connection initialization. We are currently investigating this issue.

08:10 UTC: We have found the component causing this issue and restarted it. We are still investigating the root cause.

21/06: The problem was most likely caused by the network instability observed at this time. We haven't detected any problems since.

Fixed · Infrastructure · Global

One hypervisor only responds to ping. It does not take new VMs anymore and does not delete VMs that should be deleted.

19:57 UTC: We are going to reboot it. Some databases (that run on this hypervisor) will become unresponsive for a few minutes.

20:18 UTC: Hypervisor has been rebooted. All services hosted on it have been checked: everything is up and running.

Logs show a kernel panic.

Fixed · Services Logs · Global

The storage layer of the live logs system has fallen into read-only mode. We are investigating the issue.

EDIT 09:30 UTC: Following the incident https://www.clevercloudstatus.com/incident/669, the storage layer did not perform scheduled tasks.

EDIT 09:45 UTC: The storage layer is accepting writes. The logging system is operating normally.