Clever Cloud Status

Incidents

Full history of incidents.

March 2023

Fixed · API · Global

Clever Cloud Core API is currently experiencing performance issues. We are investigating it.

EDIT 16:03 UTC: We are seeing improvements. We continue to monitor the situation and investigate the root cause, and we are adding more data collection around the various points of contention.

Fixed · Infrastructure · Global

A hypervisor went down. We are investigating. Applications are being redeployed.

Update 11:11 AM UTC: The hypervisor has been rebooted, and add-ons should be reachable. The root cause of the issue will be determined later. In the meantime, applications hosted on that hypervisor are still redeploying. We continue to monitor the situation.

Update 03:13 PM UTC: The same hypervisor went down again. It has been rebooted. Add-ons should be reachable. In the meantime, applications hosted on that hypervisor are still redeploying. We continue to monitor the situation.

Fixed · API · Global

Clever Cloud Core API is currently experiencing performance issues. We are investigating it.

EDIT 14:37 UTC: We are seeing improvements. We continue to monitor the situation.

EDIT 16:23 UTC: The incident is now over.

Fixed · Deployments · Global

We are facing a network issue between MTL and our control plane, which is causing some deployment issues. A workaround has been found and deployments are, as of now, OK in this region. A ticket has been opened with our subcontractor to solve the root cause.

EDIT 03/03 02:15 PM UTC: Connectivity between MTL and our control plane is fully restored.

Fixed · Deployments · Global

We are experiencing failures when deploying apps to RBX. We are investigating.

EDIT 10:32 AM UTC: A connectivity issue has been detected between RBX and our control plane. The issue is now fixed.

February 2023

Fixed · Global

Maintenance is planned on our Ticket Center tool on February 28th, 2023 at 19:00 UTC. Users will need to refresh their Clever Cloud Console (https://console.clever-cloud.com) to complete the update. Otherwise, the Ticket Center might display an authentication error. During that time, actions on tickets (creation, comments, etc.) might fail.

The maintenance is expected to last 5 minutes. If you urgently need to contact us, you can send an email to support@clever-cloud.com.

EDIT 19:38 UTC: The maintenance is now over. Actions on the Ticket Center should be fully available. If you encounter any problems following this update, please email us at support@clever-cloud.com.

Fixed · Global

We need to conduct an update on our Jeddah hypervisors on February 28th, 2023. Services of impacted users will be migrated starting at 20:00 UTC before the update begins.

Impacted users will receive an email for each impacted service.

EDIT 2023-02-28 20:25 UTC: The maintenance is starting.

EDIT 2023-02-28 22:18 UTC: The maintenance is now over.

Fixed · Infrastructure · Global

We are currently experiencing degraded performance when reaching github.com services from our Paris infrastructure. We are investigating the issue. Tools relying on GitHub (composer, go, ...) might take longer than usual to fetch their dependencies or experience connection timeouts or instability.

EDIT 15:48 UTC: We are seeing improvements and the situation is currently back to normal. The root cause seems to have been a BGP announcement change on GitHub's side that made our traffic go through suboptimal routes, leading to degraded performance. We keep monitoring the situation.

EDIT 16:30 UTC: The incident is fully resolved.

Fixed · API · Global

Clever Cloud Core API is currently experiencing performance issues. We are investigating it.

Fixed · Global

This is a follow-up to the various hypervisor incidents we have had over the last few weeks. A first batch of hypervisors will be updated to try to fix the issue. Impacted users will be contacted by email shortly.

The reboot is planned for tonight (15/02/2023) at 22:00 UTC. Maintenance will start at 21:00 UTC.

EDIT 21:07 UTC: The maintenance is starting. Add-ons will be automatically migrated in the next few minutes.

EDIT 22:52 UTC: The maintenance is over.

Fixed · Infrastructure · Global

A hypervisor went down. We are investigating. Applications are being redeployed.

EDIT 22:47 UTC: The hypervisor is back online, and add-ons have been up for a few minutes. The root cause of the issue will be determined later. In the meantime, applications hosted on that hypervisor are still redeploying. We continue to monitor the situation.

EDIT 23:44 UTC: The incident is now over. Sorry for the inconvenience.

Fixed · Deployments · Global

We are currently seeing applications having trouble completing their deployments, especially when using a dedicated build VM. They may be stuck or very slow at the cache archive upload step. We are investigating.

EDIT 10:55 UTC: The root cause has been found. It was only impacting multipart uploads. For deployments already at the upload phase, you will need to cancel the current deployment and start a new one for the problem to be fixed. Sorry for the inconvenience.

Fixed · Infrastructure · Global

A hypervisor went down. We are investigating.

EDIT 22:24 UTC: The hypervisor has been up again for 10 minutes. Add-ons are available again. We are making sure all applications are redeployed.

EDIT 00:17 UTC: The incident is over.

Fixed · Infrastructure · Global

At 16:29 UTC, a staff member started investigating an alert on one of our hypervisors. They saw the hypervisor could not be logged into anymore.

All services running on that hypervisor are still up and running, but deployments fail to stop the obsolete VMs and we cannot connect to the host itself. We suspect a partial ("semi") kernel crash on the hypervisor's host. We are investigating and may reboot the hypervisor in the following minutes or hours. First, we will try to migrate as many important services as possible to avoid causing too much downtime for our customers.

EDIT 16:46 UTC: We are starting to migrate add-ons on the impacted hypervisor.

EDIT 18:54 UTC: We rebooted the hypervisor and everything went well. All the remaining services are up again.

Fixed · Git repositories · Global

Between 12:39 UTC and 20:10 UTC, some users may have seen the error message "WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!" when pushing code using git+ssh to our Git repositories. This was due to an update of the allowed signature algorithms of our SSH servers. Users who had an old signature algorithm stored in their SSH known_hosts file were impacted.

The change has been rolled back.
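
To check which key types a known_hosts file holds for a given host, a minimal Python sketch could look like the following. It is an illustration only, not a Clever Cloud tool: the function name key_types_for_host and the host git.example.invalid are hypothetical, and the sketch assumes plain (unhashed) known_hosts entries.

```python
def key_types_for_host(known_hosts_text, host):
    """Return the set of key types stored for `host`.

    Assumption: entries are plain (unhashed) known_hosts lines of the form
    "[marker] hostnames keytype base64key [comment]".
    """
    types = set()
    for line in known_hosts_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Drop an optional marker such as @cert-authority or @revoked.
        if fields[0].startswith("@"):
            fields = fields[1:]
        if len(fields) < 3:
            continue
        hostnames, key_type = fields[0], fields[1]
        # Hostnames are comma-separated; hashed entries (|1|...) never match here.
        if host in hostnames.split(","):
            types.add(key_type)
    return types


# Hypothetical sample content, not real keys.
sample = (
    "# example file\n"
    "git.example.invalid ssh-rsa AAAAexample\n"
    "other.example.invalid ssh-ed25519 AAAAexample\n"
)
stored = key_types_for_host(sample, "git.example.invalid")
```

If the only key type stored for a host is one the server no longer offers, the client sees a different host key than the one it recorded, which is the situation that can trigger the warning described above.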

January 2023

Fixed · MySQL shared cluster · Global

A few customers have complained about performance issues on the MySQL shared cluster. We are investigating.

EDIT 10:00 UTC: We have made a hardware upgrade to the MySQL shared cluster.

Fixed · Infrastructure · Global

Monitoring has detected an increasing number of unreachable virtual machines. It seems related to the deployment of an update.

EDIT 01:00 UTC: The update deployment has been rolled back.

Fixed · API · Global

Monitoring reports that the number of timeouts is increasing on the Clever Cloud API. We are investigating why.

EDIT 9:08 UTC: Backends behind the Clever Cloud API are up and running. The number of timeouts has decreased. Everything is operating normally.

Fixed · FS Buckets · Global

One server that hosts FS Buckets needs additional disk space.

EDIT 10:10 UTC: The operation to increase the disk space is done. We are redeploying the associated applications.

Fixed · Services Logs · Global

The live logs system has an issue with its storage backend, which has gone into read-only mode.

EDIT 22:56 UTC: The storage backend has left read-only mode.