Incidents
Full history of incidents.
September 2018
One of the nodes of the shared RabbitMQ cluster crashed. It's currently restarting.
EDIT 18:50 UTC: The node has successfully restarted; the cluster should now be operational as usual.
August 2018
The main API is unavailable, and the console cannot be loaded either.
We are looking into it.
EDIT 15:30 UTC: Our API is back online. The console can now be loaded.
A hypervisor is unreachable; we are working on fixing the issue.
Applications on this hypervisor are being automatically redeployed. Add-ons are unreachable.
EDIT 12:21 UTC: The hypervisor is back online and is restarting the add-ons.
EDIT 12:32 UTC: All add-ons are now reachable.
Deployments will be interrupted for 30 minutes at 12:30 UTC+2 today. A core component upgrade will be performed. This will not impact already running applications or add-ons. All deployments will be queued and executed at the end of the maintenance.
The maintenance shouldn't last longer than 30 minutes, but some delays may occur. We will update this ticket to let you know about the status of the maintenance.
EDIT 12:25 UTC+2: New deployments are no longer being consumed.
EDIT 12:30 UTC+2: The maintenance has started.
EDIT 12:56 UTC+2: Deployments have been back for ~10 minutes. We are still cleaning things up.
EDIT 13:03 UTC+2: Maintenance is over and was successful. Do not hesitate to contact us if anything's wrong on your side.
Deployments are temporarily disabled while we fix an issue with a component of the deployment system.
EDIT 19:17 UTC: This was actually a false positive from our monitoring. After verifying that the component is working fine and fixing the monitoring probe, we re-enabled deployments.
One FS Buckets server is unavailable, we are awaiting news from our provider.
EDIT 05:28 UTC: The server is partially and intermittently available. Our provider has identified the problem: it comes from the switch the server is connected to. They are working on fixing the issue.
EDIT 08:04 UTC: The issue has been fully fixed since 07:30 UTC.
The MongoDB cluster will not accept writes until the failure is fixed.
The failing node is up again.
Creation of add-ons and buckets on Cellar is temporarily failing. We are working on it.
EDIT 15:30 UTC: The creation of add-ons and buckets is now fixed. It may take a little longer than usual, but this slowness will be resolved in a few hours.
There is an issue with the entry point to the cluster.
Users are stretching the "fair usage" concept well beyond reasonable limits. We are working with them to enforce the fair usage.
Performance should now be restored.
We are still watching the cluster.
A network issue is preventing the logs system from working.
EDIT 13:17 UTC: Logs should be available again; the cluster is slowly recovering.
EDIT 13:23 UTC: The logs cluster is UP and running again; logs shouldn't have been lost, thanks to buffering.
Sorry for the inconvenience.
Maintenance of our Git repositories will take place on Thursday (2018-08-09) at 1 pm, UTC+2.
Write operations like "git push" or "clever deploy" to Clever Cloud repositories won't be possible for 30 minutes. Read access won't be affected during this time.
Thanks for your patience.
EDIT 13:00 UTC+2: The maintenance is starting.
EDIT 13:05 UTC+2: The maintenance is now complete. Do not hesitate to open a support ticket if anything goes wrong. Thanks for your patience!
July 2018
We are investigating connectivity issues on the File System Buckets.
EDIT 10:27 UTC: Connections should now be working again. Already established connections were also impacted and were slower than expected; this should now be fixed as well.
EDIT 10:27 UTC: The FS Buckets service is now fully operational.
We are currently experiencing issues on our deployment systems.
EDIT 13:25 UTC: Recovery is taking longer than expected; we are still working on it.
EDIT 13:59 UTC: We are still working on fixing these issues.
EDIT 14:08 UTC: We are still having issues but deployments can start.
EDIT 14:41 UTC: Deployment performance has been back to normal for more than 15 minutes now. We are still watching the situation closely. If you have an issue, please contact us.
June 2018
One of our hypervisors had a network issue for approximately 5 minutes.
Some of our internal services were impacted by this network issue, and thus automatic redeployment of applications has been delayed.
Everything is back to normal, applications are currently finishing their redeployment.
Due to ongoing maintenance from our provider, the logs system and a shared (and free) Redis cluster are unreachable. Logs may be lost. It should not last more than 15 minutes according to them. A few minutes might be needed to restart the logs cluster.
Redis should be back as soon as the maintenance ends.
EDIT 13:35 UTC: The maintenance is still ongoing.
EDIT 13:50 UTC: The maintenance is over. The Redis cluster is UP. The logs cluster is coming back UP. Logs should be saved but might not be immediately available through the console.
EDIT 14:30 UTC: The logs cluster is now fully operational too
One of our hypervisors has hard drive I/O failures. We are looking into it.
EDIT 11:08 UTC: The server was shut down a few minutes ago. Applications on it are being redeployed. Add-ons are currently unavailable.
EDIT 11:52 UTC: We are still waiting for news from our provider regarding the hard drive issue.
EDIT 21:20 UTC: Our provider is still working on finding the root cause of the issue.
EDIT 2018-06-29 07:05 UTC: We received an answer from our provider: the server can't be brought back online. Databases will need migration. We are waiting for an answer to know whether we can access the disks in read-only mode to transfer the databases. If not, backups from the 28th of June will be used.
EDIT 2018-06-29 07:18 UTC: The disks can't be read. Backups will need to be used.
The logs collector needs to be restarted. Some logs might be lost for one to two minutes.
EDIT 22:00 UTC: The restart took approximately 30 seconds; most applications re-sent the logs they couldn't send during that time.
A hypervisor is down/unreachable. There seems to be a hardware problem. We are investigating it.
Some databases are unreachable.
EDIT 2018-06-18 23:25 UTC: It seems to be a malfunctioning fan. The server is still down for investigation. We are waiting for more information from our hypervisor provider.
EDIT 2018-06-19 00:37 UTC: The malfunctioning fans have been replaced. The server is up again. All the databases are up and running.