Incidents
Full history of incidents.
February 2018
Our monitoring system has detected network connectivity issues. They were caused by a network configuration inconsistency and have been resolved.
Node.js applications are failing to deploy because of a missing nomnom module. We are investigating the issue.
EDIT 10:53 UTC: You can set the following environment variable as a temporary workaround: CC_PRE_RUN_HOOK=npm install nomnom@1.8.1 -g
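For reference, a minimal way to apply this workaround with the clever-tools CLI (a sketch assuming clever-tools is installed and linked to the affected application; the incident text only specifies the variable itself):

clever env set CC_PRE_RUN_HOOK "npm install nomnom@1.8.1 -g"   # set the pre-run hook
clever restart                                                 # redeploy so the hook runs before startup

Setting the same variable from the Console's environment variables panel works too; either way, the application must be redeployed for the hook to take effect.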
EDIT 11:33 UTC: A fix has been made and the new image version is now deploying on our servers.
EDIT 12:33 UTC: The new image is now live. All Node.js applications will be redeployed so they no longer use the broken image.
The metrics data cluster is under unusually high load. Metrics display is currently unavailable, but metrics are still being collected.
EDIT 17:35 UTC: Service is back to normal and collected metrics have all been correctly persisted.
The proxy is being restarted. Some add-ons may be unreachable until it's done.
EDIT 15:42 UTC: The incident has been over since 15:40 UTC.
The log storage cluster is experiencing network issues. We are working on it. In the meantime, only realtime logs are available.
January 2018
The proxy is being restarted. Some add-ons may be unreachable until it's done.
EDIT 16:41 UTC: The proxy has been successfully restarted. Add-ons should be reachable again. Applications that do not support the loss of an established connection will be redeployed. We continue to monitor the proxy.
EDIT 17:30 UTC: The incident is now over.
A Redis cluster went down and is restarting.
EDIT 20:17 UTC: The cluster has been restarted and impacted applications have been redeployed. The incident is over.
PostgreSQL add-on dashboards will be unavailable for about 15 minutes, starting on 2018-01-25 at 12:30 UTC.
EDIT: Delayed to 12:50 UTC.
EDIT 12:50 UTC: The maintenance will start in a few seconds.
EDIT 13:07 UTC: Maintenance is over. If you encounter an issue, please tell us.
Logs are currently unavailable. We are working on restoring them. All logs sent in the last 30 minutes won't be stored.
EDIT 03:15 UTC: Logs are available again.
The MongoDB shared cluster needs to be upgraded to have more resources.
Performance issues and/or a partial outage are to be expected. We will try to keep the impact as low as possible.
The maintenance starts at 22:00 UTC.
EDIT 02:00 UTC: The maintenance is now over.
An add-on reverse proxy is restarting. Connections are being dropped, and impacted applications will be redeployed.
EDIT 20:45 UTC: The reverse proxy took ~1 minute to restart. It is now back up.
EDIT 20:48 UTC: Impacted applications were redeployed as expected. The incident is now over and all add-ons are reachable again.
All deployments from around 15:40 UTC might be shown in a FAILED state, even though they were successful. This is only a display issue: instances that deployed correctly are put into production as usual.
The Activity pane (Console), clever status (CLI) and the API endpoint /applications/<app>/deployments incorrectly report the deployment status.
Notifications (Slack webhooks, emails) correctly report the deployment status (failed or successful) and can be trusted; see the CLI cross-check sketch below.
EDIT 21:48 UTC: It should now be fixed. Deployments already displayed in the "FAILED" state will keep that incorrect display.
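For reference, a quick way to cross-check a deployment from the command line during an incident like this one (a sketch assuming the clever-tools CLI referenced above, installed and linked to the application):

clever status     # reported application state (listed above as affected by the display bug)
clever activity   # deployment history (clever-tools command, assumed available; also display-affected here)

As stated above, only notifications (Slack webhooks, emails) were reliable while this incident was ongoing.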
Network instability on Online DC2 makes some products unreachable:
- MySQL shared cluster
- PostgreSQL shared cluster
- MongoDB shared cluster
- One of the cleverapps front proxies
The shared MongoDB cluster is experiencing issues. We are working on bringing it back up.
Due to disk space constraints, we temporarily need to reduce log retention. Only the last 4 days of logs are kept, instead of the usual 7 days.
EDIT 2018-06-15 UTC: All 7 days are now available again.
December 2017
A core component will be upgraded. Deployments will be disabled for an hour starting at 11:30 UTC. This upgrade should fix some deployment delays, among other things.
EDIT 11:31 UTC: Maintenance is starting
EDIT 12:06 UTC: Deployments are back. We are now cleaning up some old artefacts.
EDIT 13:00 UTC: The maintenance is over
Our deployment system is encountering some slowdowns. Some applications may take longer than usual to deploy. We are working on it.
EDIT 19:25 UTC: These slowdowns might require an infrastructure change that will be done next week. Until then, slowdowns should be less frequent and less severe.
EDIT 2017-12-08 12:00 UTC: Deployments take less time after some fixes on our end. The migration will still happen to fully resolve the issue. The incident is considered closed because we no longer observe extra deployment times.
We've observed an elevated error rate on two front load balancers newly added to the pool. We're pulling traffic back from these load balancers.
November 2017
Some deployments might have trouble starting. We are investigating.
EDIT 17:31 UTC: Deployments are disabled for now.
EDIT 17:38 UTC: Deployments are now back up but may be stopped again in a few minutes if needed.
EDIT 17:55 UTC: The incident is now resolved. We will keep an eye on it for the upcoming days.
We are experiencing a network issue on one of our front servers. The support team is actively working on this.
EDIT 14:56 UTC+1: Unreachable servers are being restarted and will be available shortly. In the meantime, impacted applications are being redeployed.
EDIT 15:26 UTC+1: The team is performing the final cleanup. The issue is about to be closed. The remaining apps and add-ons are being restarted.
EDIT 15:50 UTC+1: The outage is now resolved. Contact support if you encounter any trouble.