Incidents
Full history of incidents.
August 2025
Hypervisor Issue
Issue: One of our hypervisors crashed, causing service interruptions for add-ons hosted on the affected infrastructure. Applications are currently redeploying.
Status: Our engineering team was notified immediately and is actively working to restore service. We are investigating the root cause.
Duration: 14:25 - 17:45 CEST (3 hours 20 minutes)
Affected Services: Services connecting to add-ons in the Paris region through the impacted load balancer: PostgreSQL, MySQL, MongoDB, Redis, Elasticsearch, Jenkins.
Impact: One load balancer handling add-on connections in the Paris region experienced increased latency during normal operations. This may have resulted in:
- Delayed connection establishment to add-ons
- Increased data transfer times (both sending and receiving)
Current Status: Resolved - All metrics returned to normal at 17:45 CEST. The load balancer is now operating within expected parameters.
Next Steps: Root cause analysis is in progress to identify the underlying issue and prevent recurrence. We will continue to monitor the situation for the next few hours.
We are experiencing a communication issue between our internal RabbitMQ cluster and several infrastructure components. This is blocking deployments. We are investigating the root cause.
We are investigating issues with our Git repositories; some operations are failing.
A hypervisor went down in the PAR region. We successfully rebooted it.
July 2025
Newly booted add-ons such as PostgreSQL, MySQL, Redis, or MongoDB are currently experiencing availability issues (when accessed via their public domain names).
We are investigating
We are investigating deployment failures. Deployments may fail without logs or any indication of why they failed. We are looking into it.
06:12 UTC: A hypervisor is down due to a disk failure; we're working to bring it back up. Some specific stateful instances (databases...) may be impacted.
We lost one dark fiber between two AZs of the Paris region (GDN <-> TH2), causing a loss of redundancy in the connectivity between those two AZs. No customer impact is expected, and we are investigating with our providers.
We are investigating an issue with the configuration of regions and scalability. Users are no longer able to modify these parameters in the information tab (a "red" error message appears at the bottom of the console).
Edit 15:25 CEST: Not all users are impacted by this issue (only Java users are affected).
The shared cluster for our RabbitMQ service is experiencing issues. A node appears to have left the cluster while still running. We are investigating.
June 2025
A hypervisor is unreachable in Singapore; we are investigating.
Some deployments are currently blocked, we are investigating the issue
We are investigating various issues on the Paris region.
The Metrics service is currently under heavy load and is struggling to handle read/write operations. We are investigating and fixing the issue. Metrics may lag (by a few minutes at most); no data is lost.
15:14 UTC: Situation is back to normal
Following Friday's incident, FS bucket creation in several regions was prevented by a sporadic network issue.
This also prevented deployments of new PHP applications that did not disable the automatic FS bucket.
This is now resolved.
Between 12:00 UTC and 13:40 UTC, all the load balancers stopped consuming orders. The monitoring alerts fired in an unexpected way, which led to a slower on-call response than expected.
The situation has been resolved
We identified availability issues on newly created add-ons in the gra-hds zone. We are investigating.
A maintenance operation caused a disruption that may prevent the creation of add-ons and applications. Panels in the console are impacted as well.
Applications and add-ons are still running; only the console is impacted.
12:22 UTC: The problem has been identified; our on-call team is investigating and deploying fixes.
13:29 UTC: The problem is fixed.
A hypervisor in the RBX region is not responding.
All applications on it have been redeployed. The database services running on it are unreachable.
We are rebooting the machine and investigating.