Clever Cloud Status

Incidents

Full history of incidents.


November 2023

Fixed · Reverse Proxies · Global

Some cleverapps.io domains may display 404 or 503 errors instead of the website they should point to. We are reloading the reverse proxies' configuration.

Update 08:40 UTC - The reverse proxies have been resynchronized. We are monitoring them and investigating the cause of the desynchronization.

Fixed · Deployments · Global

Due to an issue with our message broker, deployments are not working properly. We are investigating.

Update 20:04 UTC - We have fixed the broker issue and restarted every service that failed to reconnect. The situation is back to normal.

October 2023

Fixed · Reverse Proxies · Global

We are experiencing issues with the public reverse proxies in the Paris region.

These issues affect TLS and the proxies' ability to answer requests correctly.

EDIT 20:32 UTC - Fixed.

Fixed · Access Logs · Global

We are detecting errors on our new metrics stack. We are investigating.

Edit Sat Oct 28 2023 14:51 UTC: The infrastructure has been scaled up and optimizations on the load balancers are underway. You may still experience errors during queries.

Fixed · Reverse Proxies · Global

We have to update the load balancers in the Paris region. For each load balancer, we will remove its DNS A record, wait for the TTL to expire, update the load balancer behind it, and then add the DNS record back. If you have long-running connections, they will be closed at the end of the TTL, as we will stop the load balancer.
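The rolling procedure described above can be sketched as an ordered plan. This is a minimal illustration under stated assumptions, not Clever Cloud's actual tooling: the step names, the `plan_rolling_update` and `minimum_duration` functions and the 600-second TTL are hypothetical; only the four IP addresses come from this incident's updates.

```python
from dataclasses import dataclass

# Hypothetical TTL in seconds; the real TTL of these records is not stated.
ASSUMED_TTL = 600


@dataclass(frozen=True)
class Step:
    action: str  # "remove_a_record", "wait_ttl", "update_lb" or "add_a_record"
    target: str  # IP address of the load balancer concerned


def plan_rolling_update(ips: list[str]) -> list[Step]:
    """Build the ordered steps to update load balancers one DNS A record at a time.

    Only one record is out of the DNS pool at any moment, so the other
    load balancers keep serving traffic during the whole operation.
    """
    steps: list[Step] = []
    for ip in ips:
        steps.append(Step("remove_a_record", ip))
        # Wait until cached DNS answers pointing at this IP have expired.
        steps.append(Step("wait_ttl", ip))
        steps.append(Step("update_lb", ip))
        steps.append(Step("add_a_record", ip))
        # Let the restored record propagate before removing the next one.
        steps.append(Step("wait_ttl", ip))
    return steps


def minimum_duration(steps: list[Step], ttl: int = ASSUMED_TTL) -> int:
    """Lower bound on the duration in seconds: the TTL waits alone."""
    return sum(ttl for step in steps if step.action == "wait_ttl")


if __name__ == "__main__":
    ips = ["46.252.181.103", "46.252.181.104", "185.42.117.108", "185.42.117.109"]
    plan = plan_rolling_update(ips)
    for step in plan:
        print(f"{step.action:16} {step.target}")
    print("TTL waits alone:", minimum_duration(plan) // 60, "minutes")
```

Each address costs two TTL waits, which is why the total duration of such a maintenance is dominated by the TTL rather than by the load balancer update itself.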

Edit 15:00 UTC: We are starting to roll the load balancer records for domain.par.clever-cloud.com.

Edit 15:50 UTC: We have finished rolling the first IP address (46.252.181.103); the next ones should be faster.

Edit 16:00 UTC: We have removed the second record (46.252.181.104) and are waiting for the TTL to expire before beginning the update.

Edit 16:10 UTC: We have added back the second record (46.252.181.104) and are waiting for the TTL to expire before going further.

Edit 16:15 UTC: We have removed the third record (185.42.117.108) and are waiting for the TTL to expire before beginning the update.

Edit 16:25 UTC: We have added back the third record (185.42.117.108) and are waiting for the TTL to expire before going further.

Edit 16:30 UTC: We have removed the fourth and last record (185.42.117.109) and are waiting for the TTL to expire before beginning the update.

Edit 16:40 UTC: We have added back the fourth record (185.42.117.109). The maintenance is finished.

Edit 17:38 UTC: We are seeing an increase in TLS errors for incoming requests and are looking into it.

Edit 18:08 UTC: We found a potential issue. We are deploying a fix and will monitor the situation closely.

Edit 19:06 UTC: The fix was deployed at 18:55 UTC and we are monitoring the situation.

Edit D+1 16:00 UTC: We have found the issue introduced by the update and patched the software. We will apply the patch in a few moments.

Edit D+1 16:30 UTC: We are going to update the first IP address, 46.252.181.103.

Edit D+1 17:15 UTC: We have updated the second IP address, 46.252.181.104, and will now begin with the third address, 185.42.117.108.

Edit D+1 17:30 UTC: We have updated the fourth IP address, 185.42.117.109.

Edit D+1 18:30 UTC: We have finished the operation and are monitoring it.

Fixed · cleverapps.io domains · Global

We are currently experiencing issues with TLS requests on *.cleverapps.io domains. We are looking into the issue.

EDIT 13:00 UTC: The problem has been fixed; we will investigate further to pinpoint its origin.

EDIT 13:30 UTC: We have applied a patch to solve the issue.

Fixed · Deployments · Global

There is an issue with the deployment stack. We have identified the issue and have begun the recovery process.

07:34 UTC: We have fixed the issue and are continuing to monitor it.

13:00 UTC: The issue did not occur again. This incident is now over.

Fixed · Global

This schedule concerns the availability of Stats API, Grafana metrics and Web console metrics (like heatmaps and HTTP statistics).

Friday 6PM UTC (20h CEST): we will activate the new Logging and Metrics infrastructure for your services.
  
Clever Cloud Observability has been in beta for a while now, while we worked on the underlying infrastructure needed to make it a generally available service.

Not satisfied with the current quality of service, over the last months we have been building and testing a new customer experience for Logs and Metrics, with a whole new infrastructure optimized for performance and durability. Part of this work is already available as a tech preview for Clever Tools users who want to consume their Logs. This maintenance is how we will deliver it for all other services.

What does it mean?

Logs

There are 3 kinds of Logs:

  • Access Logs
  • Services Logs for Apps and AddOns
  • Audit Logs

Services Logs are exposed in the Web Console and the CLI, while Access Logs are exposed in the CLI only, and Audit Logs are not currently exposed.

The new infrastructure unifies Logs and Access Logs behind the same Logs API, using our Topic as a Service offering under the hood. This means you will be able to set up a custom retention period for all your Logs. A new API will also let you sync them with other services (Pulsar, Otel, Datadog, etc.). In the coming weeks, we will deliver our brand-new Web Console Logging experience, which we hope you will love.

Meanwhile, the Clever Tools CLI will be updated to reflect the new Logs API capabilities, providing Live and Replay streams of your Logs data. During the maintenance window, this data may be unavailable. Be sure to update your Clever Tools CLI to benefit from the new Logs API for your Access Logs.

Metrics

Metrics data has multiple uses:

  • Generated Grafana Dashboards
  • Statsd pushed metrics
  • Stats API for different products
  • Metrics shown in the Web Console
  • Geolocated heatmaps of your requests and connections

They all share the same storage layer, which did not meet our quality expectations for general availability (GA). This storage technology has been replaced and is expected to bring more stability to all of Clever Cloud's Observability metrics.

All services will be switched to the new infrastructure, which will cause some unavailability for the duration of the operation.
We hope you will be happy with the new overall Observability experience it brings, as this is a big accomplishment for us :)

For all operations, a follow-up will be maintained on https://www.clevercloudstatus.com/

Edit 18:08 UTC: We are starting the maintenance operation by redeploying apps with Token dependencies (Grafana, scheduler, etc.).

Edit 18:11 UTC: Grafana is being shut down to reconfigure the managed service behind it.

Edit 18:40 UTC: Token management is now up to date. Apps are being redeployed to switch their metrics endpoint.

Edit 18:46 UTC: Web Console metrics are unavailable for a few minutes (this is expected).

Edit 19:31 UTC: Server metrics are now available in the Web Console.

Edit 20:16 UTC: All Grafana dashboards are back online. If you encounter an "Error 500: invalid token" error, go to your organization's home page > Metrics in Grafana and click the RESET ALL DASHBOARDS button.

Edit 21:20 UTC: Only dashboards based on access logs remain unavailable.

Fixed · Global

We are going to migrate our DEV PostgreSQL services in the Paris (PAR) region. Applications using those services will be impacted.

For this reason, we have deployed a new cluster running version 15. Starting today, you can migrate your DEV add-on to this new cluster; by Thursday at the latest, we will automatically migrate all add-ons that are compatible with PostgreSQL 15.

For incompatible add-ons, we are planning a maintenance to update the PAR DEV cluster. This maintenance will take place on Thursday, 26 October 2023, between 15:00 UTC+2 and 17:00 UTC+2.

For the entire duration of the update, services will be unavailable. The time required to perform the update is estimated between 1 and 2 hours. However, total downtime might be longer as every application using the cluster will need to be restarted.

In case you have connection issues after those updates, you can manually trigger a redeployment of your linked applications.

If you do not want to be impacted by your DEV add-on being offline, you can still order or migrate to a dedicated one before this maintenance starts.

Our support team is available for any questions via the ticket center in the console.

EDIT 2023-10-25 15:00 UTC+2: We have delayed the maintenance to 26 October 2023 at 15:00 UTC+2.

EDIT 2023-10-26 15:00 UTC+2: Most of the DEV add-ons have been migrated; we are going to start the maintenance.

EDIT 2023-10-26 15:35 UTC+2: DEV cluster par-postgresql-c4 is back online.

EDIT 2023-10-26 16:30 UTC+2: Everything is now back to normal. The maintenance is over.

Fixed · Global

For security reasons, we will migrate our public load balancers in the Paris (PAR) region, including cleverapps.io domains.

The maintenance will take place on Sunday 22 October 2023, between 14:00 UTC+2 and 20:00 UTC+2.

During the maintenance, applications and add-ons in this region may experience unexpected connection closures or resets, especially on long-running connections, beginning at 16:00 UTC+2. If you see connection issues, you can restart your application to resolve them.

To check which of your services are impacted, you can consult the information section of your applications to see the region where each one is deployed.

14:45 UTC+2: We are beginning the preparation steps to update the load balancer that receives cleverapps.io traffic.

16:00 UTC+2: We have identified a bug, so we will skip the update of the cleverapps.io load balancers for now.

16:30 UTC+2: We are beginning the update of the last load balancer.

18:00 UTC+2: We will soon update the DNS records to send traffic to the new load balancer.

18:15 UTC+2: The DNS records have been updated.

18:20 UTC+2: Monitoring is green; the maintenance is done.

Fixed · Global

Due to security updates, we will need to shut down and upgrade a component of our deployment infrastructure.

As a result, we will need to shut down the deployment component for approximately 1 hour.

The maintenance is over, deployments are now usable again.

Fixed · Global

For security reasons, we will migrate our public load balancers in the Paris (PAR) region, including cleverapps.io domains.

The maintenance will take place on Saturday 21 October 2023, between 14:00 UTC+2 and 20:00 UTC+2.

During the maintenance, applications and add-ons in this region may experience unexpected connection closures or resets, especially on long-running connections, beginning at 16:00 UTC+2. If you see connection issues, you can restart your application to resolve them.

To check which of your services are impacted, you can consult the information section of your applications to see the region where each one is deployed.

14:15 UTC+2: The maintenance will start soon; we are finishing the preparation steps.

15:15 UTC+2: The preparation steps took longer than estimated; we are rolling out some configuration updates on dedicated load balancers.

16:15 UTC+2: The rolling update of dedicated load balancers is complete; we are beginning with the public shared load balancer.

17:15 UTC+2: We are updating the domain name resolution for the public shared load balancer of addons.

18:30 UTC+2: We have updated two of the eight servers of the public shared load balancer of addons.

19:00 UTC+2: We have updated four of the eight servers of the public shared load balancer of addons.

19:15 UTC+2: We have updated six of the eight servers of the public shared load balancer of addons.

19:15 UTC+2: We have updated seven of the eight servers of the public shared load balancer of addons.

19:50 UTC+2: We have updated all servers of the public shared load balancer of addons. As it is late and we are reaching the end of the maintenance window, we will update the last load balancers tomorrow afternoon.

Fixed · Global

For security reasons, we will migrate our DEV MySQL services in the Paris (PAR) region. Applications using those services will be impacted.

Only the PAR DEV cluster will be updated during this maintenance.

The maintenance will take place on Monday 23rd of October 2023, between 11:45 UTC+2 and 15:00 UTC+2.

For the entire duration of the update, the services will not be available.

The time required to perform the update is estimated between 1 and 2 hours. However, total downtime might be longer as every application using the cluster will need to be restarted.

In case you have connection issues after those updates, you can manually trigger a redeployment of your linked applications.

If you do not want to be impacted by your DEV addon being offline, you can still order or migrate to a dedicated one before this maintenance starts.

Our support team is available for any questions via the ticket center in the console.

EDIT 2023-10-23 11:50 UTC+2: Maintenance is starting.

EDIT 2023-10-23 12:20 UTC+2: DEV addons are now available again. We will restart linked applications.

EDIT 2023-10-23 12:22 UTC+2: We are investigating an error when creating new DEV addons.

EDIT 2023-10-23 12:40 UTC+2: New DEV addons can now be created. All applications linked to DEV addons are currently restarting.

EDIT 2023-10-23 13:00 UTC+2: All applications linked to DEV addons have restarted

Fixed · Notifications · Global

Maintenance on our notification stack is scheduled for this afternoon. During this time, services such as WebHooks and Email Notifications will not work.

Downtime is expected to last between 30 minutes and 1 hour.

[14:15 UTC] All notifications services are now up and running

Fixed · Infrastructure · Global

Last night, we had network connectivity issues when adding a new BGP peer. These connectivity issues resulted in a desynchronization of our load balancers. You may have experienced connectivity issues to your databases and unavailability of your applications.

Fixed · FS Buckets · Global

PHP+FTP applications may have experienced degraded performance and/or timeouts over the past few hours in the Paris region, due to a configuration issue with the underlying disk.

The configuration was fixed at 21:00 UTC and disk access times are now back in the normal range. We will keep monitoring the situation in the coming days to make sure performance stays within normal ranges.

Fixed · Global

Our security updates require us to update the nodes of our DEV MongoDB cluster in PAR.

If your application uses the MongoDB URI correctly, the update should only disrupt it for a few seconds. Otherwise, expect up to two hours of maintenance.
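One possible reading of "using the MongoDB URI correctly" is connecting with the full connection string provided for the add-on, rather than a single hard-coded host, so that the driver can reach another cluster member while a node is being updated. This is an assumption, not a statement of what the cluster requires. The sketch below only inspects a standard `mongodb://` connection string and reports the hosts and options it lists; the function name `mongodb_hosts` and the example URI are hypothetical.

```python
from urllib.parse import parse_qs


def mongodb_hosts(uri: str) -> tuple[list[str], dict[str, list[str]]]:
    """Split a standard mongodb:// connection string into its host list and options.

    Handles the multi-host form mongodb://user:pass@h1:port,h2:port/db?opt=val.
    Reserved characters in credentials are assumed to be percent-encoded,
    as the connection string format requires.
    """
    prefix = "mongodb://"
    if not uri.startswith(prefix):
        raise ValueError("expected a mongodb:// connection string")
    rest = uri[len(prefix):]
    rest, _, query = rest.partition("?")
    authority, _, _database = rest.partition("/")
    # Drop the optional "user:password@" part; hosts follow the last "@".
    host_part = authority.rpartition("@")[2]
    hosts = [host for host in host_part.split(",") if host]
    return hosts, parse_qs(query)


if __name__ == "__main__":
    # Hypothetical URI: use the one provided for your own add-on instead.
    uri = "mongodb://user:secret@node1.example.com:27017,node2.example.com:27017/mydb?replicaSet=rs0"
    hosts, options = mongodb_hosts(uri)
    print(hosts)    # ['node1.example.com:27017', 'node2.example.com:27017']
    print(options)  # {'replicaSet': ['rs0']}
```

If the URI your application actually uses lists a single host, it may be worth comparing it with the one provided for the add-on before the maintenance.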

Fixed · Services Logs · Global

We are encountering issues with our log systems.

Fetching logs may take a while.

EDIT 21:37 UTC - Fixed.

Fixed · Global

For security reasons, we will update our logs collection systems.

Log collection (including log drains) will be unavailable during the maintenance.

EDIT 00:00 UTC: The maintenance is now over.

Fixed · Global

For security reasons, we will update our FS Bucket services in the Paris (PAR) region. Applications using those services will be impacted.

FS Bucket hosts that will be updated during this maintenance are: n19 and n20.

The maintenance will take place on Friday 20 October 2023, between 12:00 UTC+2 and 14:00 UTC+2.

During the update of each server host, the services will only be available in read-only mode. Once the update is complete, linked applications will be restarted automatically to take into account the environment variables of the updated services and to restore write capacity.

The required update time is estimated at 1 hour but the total time until the applications are restarted might be longer.

In case you have write issues after those updates, you can manually initiate a redeployment of your linked applications in order to avoid waiting for the automatic redeployment.

To check whether your services are impacted, you can look up your FS Bucket’s server in the Dashboard tab of your add-ons, in the “Cluster information” section, and thus determine the update day(s) that concern you.

Specific case - old applications

Please check whether you have any old applications (more than 5 years old) that still use a buckets.json file in their code repository: we will not be able to prioritize the redeployment of these applications, and they will most likely be stuck with a read-only FS Bucket for an extended time. We therefore recommend that you now mount your FS Bucket through an environment variable instead (ideally by linking the add-on to your application). See more details on this documentation page: https://www.clever-cloud.com/doc/deploy/addon/fs-bucket/#configuring-your-application

Our support is available for any questions via the ticket center in the console.

EDIT 2023-10-20 12:00 UTC+2: Maintenance is starting.

EDIT 2023-10-20 13:24 UTC+2: Applications are currently redeploying.

EDIT 2023-10-20 14:20 UTC+2: Applications have redeployed. We are cleaning things up.

EDIT 2023-10-20 16:12 UTC+2: The maintenance is over.