Incident History

Full history of incidents.

July 2017

[Cellar] Maintenance - 2nd step 8 years ago

Fixed · Global

We will be performing maintenance on the Cellar cluster starting on 2017-07-26 at 08:00 UTC.

This is the 2nd step of the maintenance started on the 20th (https://status.clever-cloud.com/incident/31).

This should not have an impact on availability but may have a slightly bigger impact on performance than the first step (which did not have any noticeable impact).

It should take around 10 hours. This is a very rough estimate, though; we will be posting updates along the way.

EDIT 08:01 UTC: Maintenance is starting now.

EDIT 11:55 UTC: Everything is going smoothly. Performance impact is very low.

EDIT 19:35 UTC: Maintenance is still in progress, with no significant impact. As with the 1st step, consider this event over.

[Europe] One hypervisor is unreachable 8 years ago

Fixed · Global

One hypervisor is unreachable. Affected applications are being redeployed automatically. Affected addons are unreachable.

EDIT 12:05 UTC: All affected applications have finished redeploying; we are awaiting an answer from our provider.

EDIT 12:47 UTC: Our provider is "running tests" on the affected server and has not given any ETA as of now.

EDIT 13:00 UTC: The server is reporting a hardware error, not disk-related. Our provider is working on fixing the issue.

EDIT 13:31 UTC: The server fails to start. Our provider is giving us another server and will put the disks of the old server into the new one.

EDIT 14:30 UTC: The server is ready and the disks are up and running. We are now rebooting the server in operational mode. We will make sure everything starts up fine, then update the network configuration.

EDIT 15:11 UTC: All databases are available again.

[Cellar] Maintenance - 1st step 8 years ago

Fixed · Global

We will be performing maintenance on the Cellar cluster starting on 2017-07-20 at 08:00 UTC.

This is a 2-step maintenance; the second step will be scheduled at a later stage.

This should not have an impact on availability but may have a light to moderate impact on upload / download speeds.

No ETA as of now, we will be posting updates along the way.

EDIT 2017-07-20 08:00 UTC: Maintenance is starting now

EDIT 10:00 UTC: We are expecting the maintenance to end between 21:00 UTC and 2017-07-21 01:00 UTC; we are seeing no significant impact on upload / download speeds as of now.

EDIT 14:45 UTC: The maintenance is running fine and still has no significant impact on performance; we are keeping it as-is. Consider this event over. If something goes wrong, we will create a new event.

Maintenance: Log system will be unavailable on 2017-07-18 10am UTC 8 years ago

Fixed · Global

Maintenance of the log system will happen at 10am UTC. Application logs will be unavailable during this maintenance.

The maintenance should not last more than 1 hour.

EDIT 10:18 UTC: Maintenance started a few minutes ago; log collection will be disabled in a few seconds.

EDIT 10:44 UTC: Maintenance has been over for a few minutes; logs are now available.

API availability issues 8 years ago

Fixed · API · Global

An issue occurred on the main API. It was mostly unavailable, answering only ~30% of requests at best for close to 10 minutes, until we switched to a backup system.

At this point, most services were available except for logs, events and notifications.

30 minutes after the beginning of this issue, the API is now fully available.

Network issue in Europe zone 8 years ago

Fixed · Infrastructure · Global

The network is flaky in the Europe zone; we are seeing intermittent unreachability issues on multiple elements of our infrastructure. We are investigating.

EDIT 06:48 UTC: The network seems to work fine now. Deployments are unavailable, we are working on bringing them back up.

EDIT 07:35 UTC: Deployments have been back up since 07:15 UTC; we are still cleaning up the remaining items.

EDIT 07:40 UTC: Everything is cleaned up and functional now. If you have an issue, come ping us.

June 2017

Deployments suspended 8 years ago

Fixed · Deployments · Global

Deployments are disabled for a short maintenance operation.

EDIT 16:12 UTC: Deployments are back

Delayed deployments in Europe zone 8 years ago

Fixed · Deployments · Global

We are currently experiencing performance issues on a component of our deployment system. Deployments are delayed by a few minutes.

[Europe] Maintenance operation 9 years ago

Fixed · Deployments · Global

We are doing a maintenance operation on a component of our monitoring system. Deployments may be delayed until the end of the operation.

This should last no more than 10 minutes. Deployments should not be delayed by more than a couple of minutes.

Maintenance operation will start at 09:10 UTC.

EDIT 09:19 UTC: Deployments should go back to normal in the next few minutes. Maintenance is over, we are now checking that everything is working fine.

EDIT 09:24 UTC: Deployment delays are back to normal; end of incident.

[Europe] One hypervisor unreachable 9 years ago

Fixed · Infrastructure · Global

One hypervisor went down; affected applications are being automatically redeployed. Addons on this hypervisor are unreachable (~2% of dedicated addons in the Europe zone).

We are awaiting news from our provider.

EDIT 15:30 UTC: We are still awaiting a manual operation from our provider

EDIT 15:37 UTC: They have rebooted the server manually but "observed an error" and are "analyzing" the issue

EDIT 16:04 UTC: The power supply is out of order and is being replaced

EDIT 16:55 UTC: The operation is over, the server just rebooted and will now start recovering / cleaning up after the forced reboot. Databases will be coming back online automatically.

EDIT 17:50 UTC: Most databases have been available since 17:15 UTC. The remaining databases are now available as well.

Monitoring issue 9 years ago

Fixed · Deployments · Global

An incident occurred in our monitoring tools. Old instances are unable to stop, which is causing instability in applications.

Deployments are stopped until the monitoring is back up and running.

[Europe] Monitoring system issue 9 years ago

Fixed · Deployments · Global

We are working on fixing an issue with the application and addon monitoring system in the Europe zone. Deployments have been disabled to allow the monitoring to catch up faster.

Addons connectivity issue 9 years ago

Fixed · Infrastructure · Global

The addon gateway has been restarted, some connections have been forcibly closed.

Addons connectivity issue 9 years ago

Fixed · Infrastructure · Global

The addon gateway has been restarted, some connections have been forcibly closed.

[Europe] Deployment infrastructure upgrade 9 years ago

Fixed · Global

A core component of the deployment infrastructure will be upgraded to improve stability and performance. As a result, deployments will be stopped for up to 60 minutes (hopefully less).

EDIT 11:05 UTC: Maintenance is fully over now, deployments have been available since 10:50 UTC.

Deployment delays 9 years ago

Fixed · Deployments · Global

Deployments take more time to start due to higher than usual activity. We are working on fixing the problem.

EDIT 16:00 UTC: The deployment starting time is back to normal

May 2017

Deployment delays 9 years ago

Fixed · Deployments · Global

Deployments take more time to start due to higher than usual activity. We are working on fixing the problem.

Deployments are disabled in the Europe zone 9 years ago

Fixed · Deployments · Global

Deployments are disabled following an incident on a component of our deployment system. We are working on bringing it back up.

ETA is about an hour.

Network split triggered redeployments and caused delays 9 years ago

Fixed · Deployments · Global

Our monitoring system had a small network split, making it think applications were unreachable. This triggered a lot of redeployments. It did not make applications unreachable. You might receive some emails with a "Monitoring/Unreachable" deployment reason.

Also, deployments are delayed until we clean up the non-essential redeployments.

UPDATE 5:07PM UTC: The incident has been resolved. Sorry for those redeployments.

April 2017

One of our reverse proxies for addons is unreachable, leading to addons being unreachable 9 years ago

Fixed · Infrastructure · Global

We are investigating the problem.

UPDATE 12:43PM UTC: The problem has been resolved. We will investigate why it happened and how to prevent it from happening again.