Clever Cloud Status

Incidents

Full history of incidents.


January 2019

Fixed · MongoDB shared cluster · Global

The MongoDB free shared cluster is having trouble accepting connections. We are investigating the issue.

EDIT 16/01/2019 09:45 UTC: The problem might be due to old client drivers being used on the cluster. We have set up a new cluster (version 4.0.3) which should greatly improve things. You can create a new add-on to migrate your database.

To dump the data from your existing add-on, you can use this command: mongodump -u "${MONGODB_ADDON_USER}" -p "${MONGODB_ADDON_PASSWORD}" -h "${MONGODB_ADDON_HOST}" -d "${MONGODB_ADDON_DB}" --archive --gzip

You can then import the data into the new database by using the mongorestore command displayed in the dashboard of your new add-on.
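
For reference, here is a minimal sketch of a two-step migration. The NEW_MONGODB_ADDON_* variables and the dump.gz filename are hypothetical placeholders for the credentials of your new add-on; the exact restore command is the one displayed in the new add-on's dashboard, and the --nsFrom/--nsTo options assume the database name differs between the two add-ons.

# Dump the old database to a compressed archive file (placeholders: dump.gz)
mongodump -u "${MONGODB_ADDON_USER}" -p "${MONGODB_ADDON_PASSWORD}" -h "${MONGODB_ADDON_HOST}" -d "${MONGODB_ADDON_DB}" --archive=dump.gz --gzip

# Restore the archive into the new add-on, renaming the database on the fly
# (NEW_MONGODB_ADDON_* are hypothetical names; check your new add-on's dashboard)
mongorestore -u "${NEW_MONGODB_ADDON_USER}" -p "${NEW_MONGODB_ADDON_PASSWORD}" -h "${NEW_MONGODB_ADDON_HOST}" --archive=dump.gz --gzip --nsFrom "${MONGODB_ADDON_DB}.*" --nsTo "${NEW_MONGODB_ADDON_DB}.*"

Depending on how the new user is set up, you may also need to pass --authenticationDatabase to mongorestore.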

An automatic migration tool for MongoDB should be available in the next few days.

Fixed · RabbitMQ shared cluster · Global

One node of the shared RabbitMQ cluster lost its connectivity for about one minute, then re-joined the cluster as expected. Real-time logs were unavailable during that time because of this issue.

Fixed · Global

Redis add-on creation will be disabled starting at 13:00 UTC. The dashboard might be unavailable too. This should not affect the Redsmin integration, which should remain available.

16:15 UTC: The maintenance is over. Add-on creation and the dashboard are now fully available again.

Fixed · Infrastructure · Global

(Hours in UTC)

At 22:27, one of our hypervisors lost access to some of its disks. Among other things, this impacted a deprecated front reverse proxy for applications and a front reverse proxy for add-ons (databases). We moved the IP of one of the proxies. The other one, related to the application reverse proxy (62.210.92.244), could not be moved and is now unreachable. If you still use it, you should update your DNS records: https://www.clever-cloud.com/doc/admin-console/custom-domain-names/#personal-domain-names
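
If you are not sure whether one of your domains still points to that address, a plain DNS lookup is enough to check; example.com below is a placeholder for your own domain:

# Show the IP(s) the A record currently resolves to
dig +short A example.com

If 62.210.92.244 shows up in the output, update the record as explained in the documentation linked above.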

The situation is stabilized, but we do not yet consider the infrastructure fully recovered.

MySQL Addon
Fixed · Global

We are adding new features to the MySQL add-on. The add-on dashboard and management operations (creation, deletion) will be offline during the maintenance.

EDIT 15:00 UTC: The new add-on dashboard is available, but add-on creation is still unavailable.

EDIT 17:28 UTC: The maintenance is now finished.

Fixed · Infrastructure · Global

16:36:30 UTC: A load balancer stops accepting new connections

16:38:00 UTC: An alert due to an important change in network traffic is triggered

16:39:30 UTC: The load balancer is restarted

Everything is back to normal now.

Fixed · Infrastructure · Global

A reverse proxy is dropping some of the TLS connections it receives

EDIT 10:07 UTC: The reverse proxy has been restarted and the issue seems to be resolved. We are monitoring the situation.

Fixed · MongoDB shared cluster · Global

The front "mongos" component of the free shared cluster is behaving erratically. We are investigating it.

EDIT: There were sudden drops in free disk space. We changed the logging method, which seems to have stabilized the system. We are still working on identifying the root cause.

Fixed · MongoDB shared cluster · Global

A maintenance operation is in progress on the Europe MongoDB shared cluster.

We are having issues with the authentication component. Open connections are working fine, but new connections cannot be established for now.

17:21 UTC: It should be fixed. We are making sure.

17:30 UTC: Incident over.

December 2018

Fixed · Infrastructure · Global

Two reverse proxies are having intermittent networking failures. Those reverse proxies are only used when your domain is configured with A records; domains using CNAME records should be reachable as usual. We are working on it.
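
If you are unsure which kind of record a domain uses, a quick lookup can tell you; www.example.com below is a placeholder for your own domain:

# A non-empty answer means the domain uses a CNAME (not affected);
# an empty answer means it is configured with an A record (potentially affected)
dig +short CNAME www.example.com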

EDIT 20:15 UTC: Incident resolved; it was due to a network misconfiguration. We will make sure it does not happen again.

Fixed · Global

Deployments will be unavailable for up to 30 minutes starting at 13:00 UTC because of maintenance on our deployment system. Deployment actions like START, RESTART, STOP, ... will be unavailable but will remain queued and will be processed at the end of the maintenance.

EDIT 13:06 UTC: The maintenance is starting.

EDIT 13:17 UTC: Deployments are now available again. Queued deployments have been processed.

Maintenance is over.

Clever Cloud API
Fixed · API · Global

The Clever Cloud API is currently down; we are investigating.

EDIT 16:53 UTC: The API is back up. We detected a problem on our reverse proxies and are currently fixing it.

EDIT 16:54 UTC: fixed.

PostgreSQL Addon
Fixed · Console · Global

The PostgreSQL Addon Dashboard is currently unavailable, we are working to fix it.

EDIT 15:17 UTC: fixed.

Fixed · SSH Gateway · Global

The SSH gateway is currently unavailable. We are working on bringing it back as soon as possible.

EDIT 12:18 UTC: We are still trying to figure out a fix for the issue.

EDIT 12:47 UTC: The problem should now be fixed. A configuration error made this incident last longer than it should have. Applications may need to be redeployed to get the SSH service back online.

Sorry about this incident.

Fixed · Infrastructure · Global

One of our reverse proxies became largely unresponsive but was still able to process some requests and report its state to our monitoring. Most of the requests it received were not processed. This is now fixed.

Sorry for the inconvenience

Fixed · MySQL shared cluster · Global

A MySQL shared cluster is overloaded at the moment. We are looking into which users are over-using it.

14:26 UTC: One culprit has been found. The cluster's load has been reduced significantly.

14:38 UTC: The cluster's load is back to normal since 14:30.

Fixed · Deployments · Global

Some deployments fail to start and/or are not being properly reported by the API. We are investigating.

08:38 UTC: We are restarting part of the deployment system.

08:49 UTC: Deployments have been processed again for the last 5 minutes, with some delay.

08:54 UTC: Back to normal.

Fixed · API · Global

Some API endpoints seem to be unavailable, making the console unavailable too. We are currently investigating what is causing this.

10:00 UTC: We found the root cause. The console still can't be loaded at the moment, but other services (like deployments) should now be available.

10:06 UTC: There was an underlying issue preventing the console from loading. It is now fixed. The incident is over. Sorry for the inconvenience.

Fixed · Access Logs · Global

Metrics are currently unavailable for read requests. Write requests are working as expected.

EDIT 14:04 UTC: Metrics are coming back up.

EDIT 14:10 UTC: Metrics are fully recovered. Sorry for the inconvenience

November 2018

Fixed · Deployments · Global

As stated on the Cogent status page:

Cogent will be performing code upgrades in the following areas.
During these upgrades, customers in or transiting the area may experience
intermittent periods of packet loss and latency between 15 and 45 minutes 
for the duration of the window.

Location: Paris, France
Start time: 11/30 00:01 CET
End time: 11/30 06:00 CET
Work order number: NC840-119

During this window, our link with the Montréal (MTL) zone may be affected, so our systems (deployments, monitoring, etc.) in Montréal (MTL) may experience issues.