Incidents
Full history of incidents.
June 2018
Some databases are unreachable.
EDIT 2018-06-18 16:29 UTC: The hypervisor is up again and the databases are coming back up.
Applications that were on this HV were redeployed on another one.
We have detected network instabilities on one of our reverse proxies for the *.cleverapps.io domain, affecting the Paris zone. Our network provider has been notified.
EDIT 15:08 UTC: We are still waiting for our network provider to find the root cause.
EDIT 2018-06-15 13:00 UTC: The instabilities ceased this morning. Everything should be back to normal.
One of the nodes of the shared RabbitMQ cluster went down. We are bringing it back up.
EDIT 10:40 UTC: The node has been restarted, we continue to monitor the situation.
EDIT 13:20 UTC: The cluster has been running fine since the incident
One of our add-on reverse proxies had to be restarted following an increasing rate of refused connections. We will continue to monitor the situation closely.
May 2018
Our git repository will be shut down for up to 15 minutes at 13:30 UTC, May 25th. Deployments will be stopped and git push / clone will be unavailable.
EDIT 13:30 UTC: The maintenance has begun. Deployments are stopped (but queued) and git repositories aren't available anymore.
EDIT 13:39 UTC: The maintenance is over, deployments and git repositories are available again
Some instances are having trouble reaching targets through our VPN service; we are investigating. Timeouts or unreachable routes are expected.
EDIT 09:45 UTC: We may have found why connections are hanging; we are currently running some tests.
EDIT 10:10 UTC: The tests worked fine and a fix has been deployed. All connections should have been restarted. If you still experience trouble connecting to a particular service, please let us know at support@clever-cloud.com with the service you're trying to access.
A maintenance operation is in progress on the storage backend of Metrics. Metrics are currently unavailable.
EDIT 14:40 UTC: Metrics have been back since 14:15. Performance is gradually returning to its usual level.
Deployments are having trouble starting or completing. We are working on it.
EDIT 08:05 UTC: Deployments should be back to normal, we are keeping an eye on the situation.
EDIT 08:33 UTC: Some deployments still won't start
EDIT 09:00 UTC: Deployments should be back to normal again. We are still keeping an eye on the situation and cleaning up the remaining issues
EDIT 12:28 UTC: Again, some deployments are failing to finish even though they appear as successful in the logs. We are looking into it.
EDIT 13:27 UTC: Deployments are going to be stopped to fully clean the system. It should not last more than 15 minutes. The maintenance is starting now.
EDIT 14:08 UTC: Deployments have been available again since 13:45 UTC. The maintenance period is over. We are keeping watch until everything is back to normal.
EDIT 16:30 UTC: Everything seems to be back to normal
We (or one of our clients) were targeted by a DDoS attack starting at 10:05 UTC. We removed the targeted IP from our front pool. The issue has been mitigated. We are still watching it.
At 8:13am Paris time today, our hypervisor hv-par2-036 was detected as unreachable.
A hard reboot has been requested from our hosting provider.
Around 20 add-ons are impacted.
9:17am Paris time: the incident is fixed. All add-ons have recovered.
Network instabilities are affecting one of our reverse proxies, leading to packet / request loss.
EDIT 13:50 UTC: The instabilities stopped 10 minutes ago; we are still closely monitoring the situation.
The SSH Gateway asks for a password for PHP applications instead of letting you connect. We are investigating the issue.
EDIT 08:00 UTC: A new version of the PHP image has been released. Redeploying your application should be enough to let you SSH into the machine again.
We have started a maintenance operation on a component of the Metrics cluster. This operation is taking more time than expected.
Until it's over, Metrics are not available. Metrics agents on scalers should push their data once the service is back.
EDIT 15:14 UTC: Metrics have been back since 15:12 UTC.
April 2018
A dedicated add-ons reverse proxy stopped accepting new connections at 15:28 UTC and was restarted at 15:31:30 UTC.
Traffic was back to normal at 15:32:00 UTC.
Log drains are currently stopped; we are working on fixing this issue.
Due to a network issue, deployments are not working properly. Also, the state of applications might be displayed incorrectly in the console (grey disc instead of a green one).
March 2018
Cellar is having network issues on some nodes. Some requests are failing, both requests to GET resources and requests to send resources.
We are investigating the problem
EDIT 19:35 UTC: The problem seems to be gone. It may be due to a maintenance operation made on the Cellar cluster, which shouldn't have caused this. This maintenance has been done multiple times without problems. We will keep an eye on the cluster when this maintenance starts again, probably tomorrow.
Multiple reports indicate a network slowdown for some clients. We are investigating the issue. Applications may take longer than usual to respond.
EDIT 15:10 UTC: The source of the problem is one of our customers receiving a DDoS attack on their application. While the infrastructure can handle such load, we detected a problem with the configuration of our reverse proxies which prevents us from correctly handling the load of this DDoS. We are looking at how we can improve that. In the meantime, traffic targeting that customer's application has been blocked.
EDIT 16:45 UTC: Most of the traffic is filtered. We will continue to watch the issue over the following hours.
A dedicated add-ons reverse proxy is refusing new connections. It is being restarted.
EDIT 11:49 UTC: Incident over since 11:45 UTC
Real-time log delivery is affected by an outage on our message broker. Log drains are affected as well. Logs are still archived.
EDIT 17:03 UTC: Real-time delivery has been back since 16:50 UTC.