Clever Cloud Status

Incidents

Full history of incidents.


October 2023

Fixed · Global

For security reasons, we will update our FS Bucket services in the Paris (PAR) region. Applications using those services will be impacted.

The FS Bucket hosts updated during this maintenance are n10 and n17.

The maintenance will take place on Thursday 19 October 2023, between 12:00 UTC+2 and 14:00 UTC+2.
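For readers outside France, the window converts to UTC by subtracting two hours. The Python sketch below does this with the standard datetime module. It assumes a fixed UTC+2 offset exactly as written in the notice (Paris is still on summer time on that date), and the helper name window_in_utc is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset taken from the notice: UTC+2 (Paris summer time in October 2023).
PARIS_SUMMER = timezone(timedelta(hours=2))


def window_in_utc(year, month, day, start_hour, end_hour, tz=PARIS_SUMMER):
    """Convert a same-day maintenance window expressed in `tz` to UTC."""
    start = datetime(year, month, day, start_hour, tzinfo=tz)
    end = datetime(year, month, day, end_hour, tzinfo=tz)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


# Thursday 19 October 2023, 12:00-14:00 UTC+2.
start_utc, end_utc = window_in_utc(2023, 10, 19, 12, 14)
print(start_utc.isoformat(), end_utc.isoformat())
# prints: 2023-10-19T10:00:00+00:00 2023-10-19T12:00:00+00:00
```

The same call works for the other days by changing the date arguments.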

During the update of each server host, the services will only be available in read-only mode. Once the update is complete, linked applications will be restarted automatically to take into account the environment variables of the updated services and to restore write capacity.

The update itself is estimated to take 1 hour, but the total time until applications are restarted may be longer.

If you encounter write issues after these updates, you can manually trigger a redeployment of your linked applications instead of waiting for the automatic one.

To check whether your services are impacted, look up your FS Bucket’s server in the “Cluster information” section of your add-on’s Dashboard tab; it tells you which update day(s) concern you.
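Once you have the server name, you can match it against the hosts listed in the three FS Bucket notices for the PAR region. The Python sketch below encodes that schedule. It assumes you have reduced the server shown in the Dashboard to its short host name (n10, n12, and so on), which may not be exactly how the Dashboard displays it; the table and the update_day helper are illustrative, not part of any Clever Cloud tool.

```python
from datetime import date
from typing import Optional

# FS Bucket hosts and their update days, as listed in the PAR notices.
# Every window runs 12:00-14:00 UTC+2.
FS_BUCKET_SCHEDULE = {
    "n12": date(2023, 10, 17),
    "n13": date(2023, 10, 17),
    "n15": date(2023, 10, 18),
    "n16": date(2023, 10, 18),
    "n10": date(2023, 10, 19),
    "n17": date(2023, 10, 19),
}


def update_day(server: str) -> Optional[date]:
    """Return the update day for a short host name such as "n12".

    Assumption: `server` is the short host name; case and surrounding
    whitespace are normalised. Returns None if the host is not listed.
    """
    return FS_BUCKET_SCHEDULE.get(server.strip().lower())


for host in ["n12", "N17", "n99"]:
    print(host, update_day(host))
```

A None result means the host does not appear in these FS Bucket notices.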

Specific case - old applications

Please check whether you have any old applications (more than 5 years old) that still use a buckets.json file in their code repository. We will not be able to prioritize the redeployment of these applications, so they will most likely be stuck with a read-only FS Bucket for an extended time. We therefore recommend mounting your FS Bucket through an environment variable instead (ideally by linking the add-on to your application). See this documentation page for details: https://www.clever-cloud.com/doc/deploy/addon/fs-bucket/#configuring-your-application
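To find such applications, scanning each repository for a file named buckets.json is enough. The Python sketch below is one way to do it. It assumes only that the file is literally named buckets.json (its directory is not stated here, so the whole tree is searched) and skips the .git directory; the helper name find_buckets_json is hypothetical.

```python
import os
from pathlib import Path
from typing import List


def find_buckets_json(repo_root: str) -> List[Path]:
    """Return every file named buckets.json under repo_root, sorted."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Prune .git in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        if "buckets.json" in filenames:
            matches.append(Path(dirpath) / "buckets.json")
    return sorted(matches)
```

Calling find_buckets_json("path/to/repo") on each repository and getting a non-empty list flags an application that should move to environment-variable mounting.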

Our support is available for any questions via the ticket center in the console.

Fixed · Global

For security reasons, we will update our FS Bucket services in the Paris (PAR) region. Applications using those services will be impacted.

The FS Bucket hosts updated during this maintenance are n15 and n16.

The maintenance will take place on Wednesday 18 October 2023, between 12:00 UTC+2 and 14:00 UTC+2.

During the update of each server host, the services will only be available in read-only mode. Once the update is complete, linked applications will be restarted automatically to take into account the environment variables of the updated services and to restore write capacity.

The update itself is estimated to take 1 hour, but the total time until applications are restarted may be longer.

If you encounter write issues after these updates, you can manually trigger a redeployment of your linked applications instead of waiting for the automatic one.

To check whether your services are impacted, look up your FS Bucket’s server in the “Cluster information” section of your add-on’s Dashboard tab; it tells you which update day(s) concern you.

Specific case - old applications

Please check whether you have any old applications (more than 5 years old) that still use a buckets.json file in their code repository. We will not be able to prioritize the redeployment of these applications, so they will most likely be stuck with a read-only FS Bucket for an extended time. We therefore recommend mounting your FS Bucket through an environment variable instead (ideally by linking the add-on to your application). See this documentation page for details: https://www.clever-cloud.com/doc/deploy/addon/fs-bucket/#configuring-your-application

Our support is available for any questions via the ticket center in the console.

Fixed · Global

For security reasons, we will update our FS Bucket services in the Paris (PAR) region. Applications using those services will be impacted.

The FS Bucket hosts updated during this maintenance are n12 and n13.

The maintenance will take place on Tuesday 17 October 2023, between 12:00 UTC+2 and 14:00 UTC+2.

During the update of each server host, the services will only be available in read-only mode. Once the update is complete, linked applications will be restarted automatically to take into account the environment variables of the updated services and to restore write capacity.

The update itself is estimated to take 1 hour, but the total time until applications are restarted may be longer.

If you encounter write issues after these updates, you can manually trigger a redeployment of your linked applications instead of waiting for the automatic one.

To check whether your services are impacted, look up your FS Bucket’s server in the “Cluster information” section of your add-on’s Dashboard tab; it tells you which update day(s) concern you.

Specific case - old applications

Please check whether you have any old applications (more than 5 years old) that still use a buckets.json file in their code repository. We will not be able to prioritize the redeployment of these applications, so they will most likely be stuck with a read-only FS Bucket for an extended time. We therefore recommend mounting your FS Bucket through an environment variable instead (ideally by linking the add-on to your application). See this documentation page for details: https://www.clever-cloud.com/doc/deploy/addon/fs-bucket/#configuring-your-application

Our support is available for any questions via the ticket center in the console.

EDIT 2023-10-17 12:10 UTC+2: The maintenance is starting. FS Bucket servers are now in read-only mode.

EDIT 2023-10-17 12:47 UTC+2: Applications are being redeployed to use the new FS Bucket server. You can also start a deployment on your side to speed things up.

EDIT 2023-10-17 16:15 UTC+2: The maintenance is over. All applications should have had access to their FS Bucket since 14:00 UTC+2. Please reach out to our support team if you have any issues following this maintenance.

Fixed · Global

For security reasons, we will update the PHP FTP services in the Paris (PAR) region used by PHP+FTP applications. PHP applications using the Git deployment method are not impacted by this maintenance.

The PHP FTP hosts updated during this maintenance are n11 and n18.

The maintenance will take place on Monday 16 October 2023, between 12:00 UTC+2 and 14:00 UTC+2.

During the update of each server host, the services will only be available in read-only mode. Once the update is complete, linked applications will be restarted automatically to take into account the environment variables of the updated services and to restore write capacity.

The update itself is estimated to take 1 hour, but the total time until applications are restarted may be longer.

If you encounter write issues after these updates, you can manually trigger a redeployment of your PHP+FTP applications instead of waiting for the automatic one.

Our support is available for any questions via the ticket center in the console. This incident will be updated during the maintenance window.

EDIT 2023-10-16 12:03 UTC+2: The maintenance will begin shortly. FS Bucket add-ons hosted on these servers will soon become read-only.

EDIT 2023-10-16 12:09 UTC+2: FS Buckets are now read-only.

EDIT 2023-10-16 12:54 UTC+2: Applications are being redeployed to use the new FS Bucket server. You can also start a deployment on your side to speed things up.

EDIT 2023-10-16 14:04 UTC+2: The maintenance is over. All applications should have had access to their FS Bucket since 13:30 UTC+2. Please reach out to our support team if you have any issues following this maintenance.

Fixed · Infrastructure · Global

A hypervisor's internal network was unreachable between 15:16 and 15:20 UTC. During that time, services on this hypervisor may have suffered a total loss of network connectivity. Affected applications were redeployed because they were unreachable. The issue has been fixed and we are monitoring the situation.

EDIT 15:49 UTC: All services are reachable again; the incident is over.

Fixed · Access Logs · Global

We are detecting a data lag on our Metrics / AccessLog stack. You may not see the latest datapoints. We are working on it.

EDIT 10:00 PM: The lag has been fully absorbed.

Fixed · Infrastructure · Global

One hypervisor is unreachable. We are investigating it.

It may impact some databases that are hosted on top of this hypervisor.

06:20 The hypervisor appears to have encountered a kernel panic. It has been rebooted, and we pinned its kernel version to avoid future kernel panics.

06:45 Now that the hypervisor is back up, we are cleaning up: checking that all add-on instances have rebooted and that all applications have redeployed successfully.

07:21 Everything is now back to normal.

Fixed · Global

Starting 2023-10-06 23:00 UTC, we will conduct a maintenance operation on one of the key parts of the deployment system. Including safety margins, we expect it to take 2 hours.

As a result, deployments in all our zones will be disabled between 23:00 UTC (2023-10-06) and 01:00 UTC (2023-10-07).
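Because the window starts at 23:00 UTC, adding the announced 2 hours rolls it over to the next calendar day. The short Python sketch below makes that date arithmetic explicit with timezone-aware datetimes, and compares the planned end with the 01:32 UTC end reported in the closing EDIT. The variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Planned window: start at 23:00 UTC, 2 hours including safety margins.
start = datetime(2023, 10, 6, 23, 0, tzinfo=timezone.utc)
planned_end = start + timedelta(hours=2)

# End time reported when the maintenance was declared over.
actual_end = datetime(2023, 10, 7, 1, 32, tzinfo=timezone.utc)
overrun = actual_end - planned_end

print("planned:", start.isoformat(), "->", planned_end.isoformat())
# prints: planned: 2023-10-06T23:00:00+00:00 -> 2023-10-07T01:00:00+00:00
print("overrun:", overrun)
# prints: overrun: 0:32:00
```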

EDIT 2023-10-07 01:32 UTC: The maintenance is over. Deployments are now working again.

Fixed · Infrastructure · Global

Part of our network infrastructure in Paris is unreachable. We are looking into it.

09:25 PM UTC: The network is back online. We are bringing back services that are not healthy.

09:30 PM UTC: The network is still flapping; we are on it.

10:06 PM UTC: The network seems stable. We are bringing back services that are not healthy.

10:37 PM UTC: We are still bringing back services that are not healthy.

10:52 PM UTC: All services are back online. Good night.

Fixed · Global

The SSH Gateway will be unavailable on October 3rd, 2023, starting at 20:00 UTC. During that time, SSH access to services will be unavailable, whether through ssh -t ssh@sshgateway-clevercloud-customers.services.clever-cloud.com or through our CLI command clever ssh. Existing SSH connections to services through the gateway will be interrupted.

Once the maintenance is over, some applications may need to be restarted before they can be reached through the SSH Gateway again.

The maintenance is planned to last less than 30 minutes.

EDIT 20:05 UTC: The maintenance is starting.

EDIT 20:27 UTC: The maintenance is now over. We are monitoring the results. You should now be able to access your services using the SSH gateway.

EDIT 20:27 UTC: Everything is working as intended. If you have any issues using the SSH gateway, you can try to redeploy your service and contact our support team.

Fixed · Deployments · Global

Some pushes are not starting new deployments. We are investigating.

EDIT 09:06 UTC: We implemented a fix and are monitoring the results. If you pushed new commits that were not deployed, you can either contact our support team with your application ID and the associated commit, or use our CLI: clever restart --commit <commit>.
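If you have several repositories to catch up, the restart can be scripted. The Python sketch below builds the clever restart --commit <commit> call from this update for the repository's current HEAD commit. It assumes the Clever Tools CLI is installed and the working directory is already linked to the application; the helper names are hypothetical, and dry_run defaults to True so nothing is redeployed unless you ask for it.

```python
import subprocess
from typing import List


def restart_cmd(commit: str) -> List[str]:
    """Build the CLI call that redeploys a specific commit."""
    if not commit:
        raise ValueError("commit must be a non-empty Git commit ID")
    return ["clever", "restart", "--commit", commit]


def head_commit(repo: str = ".") -> str:
    """Return the full SHA of HEAD via `git rev-parse HEAD`."""
    out = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def redeploy_head(repo: str = ".", dry_run: bool = True) -> List[str]:
    """Redeploy HEAD; with dry_run=True, only return the command."""
    cmd = restart_cmd(head_commit(repo))
    if not dry_run:
        # Assumes `repo` is linked to the target Clever Cloud application.
        subprocess.run(cmd, cwd=repo, check=True)
    return cmd
```

Running redeploy_head() first shows the exact command; redeploy_head(dry_run=False) executes it.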

EDIT 12:25 UTC: The incident is now resolved.

Fixed · Global

A maintenance operation on deployments and git repositories will take place at 22:00 UTC on October 2nd, 2023. During that time, deployments may stay queued longer than usual, and git repositories will be unavailable for a few minutes. Git pushes may fail to start new deployments or be rejected.

The maintenance is expected to last less than 30 minutes.

EDIT 22:00 UTC: The maintenance is starting.

EDIT 22:35 UTC: The maintenance is mostly over. Deployments and git repositories have been back for 15 minutes. We are still making sure everything is running smoothly.

EDIT 23:05 UTC: Everything has been back to normal since 22:20 UTC. The maintenance is over. Thanks for your patience.

September 2023

Fixed · Infrastructure · Global

We are experiencing some network instabilities. We are investigating.

EDIT 02:55 PM UTC: The network instability has been fixed. Some customers may have experienced connection resets.

Fixed · Infrastructure · Global

Some applications may have been redeployed at an elevated rate with the Monitoring/Unreachable reason even though they were reachable. A fix has been implemented and we are monitoring the situation.

EDIT 14:46 UTC: This incident is now over. No more incorrect Monitoring/Unreachable alerts have been emitted.

Query latency
Fixed · Access Logs · Global

We are observing some latency when retrieving metrics and access logs from our storage layer. We are investigating.

EDIT 02:46 PM UTC: Latencies have been fixed by rebalancing data.

EDIT 27/09 at 13:00 UTC: Queries were unavailable.

EDIT 27/09 at 14:10 UTC: Queries are available again.

We continue to investigate.

EDIT 27/09 at 16:10 UTC: Closing the incident.

Fixed · API · Global

Our main API is affected by the Pulsar outage. We are looking into it.

EDIT 15:02 UTC: We deployed a new version of the API that will survive future Pulsar outages.

Fixed · Pulsar · Global

We are experiencing issues on one of our Pulsar clusters.

We have identified the issue and are working on it.

EDIT 13:31 UTC - we are still working on the issue.

EDIT 14:44 UTC - we are still working on the issue.

EDIT 16:09 UTC - Fixed.

Fixed · Access Logs · Global

The storage layer has lost some nodes. We are investigating the issue.

EDIT 13:45 UTC: We found a network issue that caused storage nodes to time out and then crash. Those nodes are now up and running, and we are beginning the recovery process.

EDIT 15:10 UTC: We have finished the recovery process and are consuming the lag.

EDIT 18:52 UTC: We have consumed almost all of the data lag (an estimated 30 minutes remain), but there are still 2 hours of metadata lag.

EDIT 21:00 UTC: We have caught up on the data and metadata lag; queries are now open.

Fixed · API · Global

Our main API is currently unreachable. We are aware of the issue and working towards bringing it back.

EDIT 12:56 UTC: The main issue is now resolved and the API is back online. We continue to see some errors and are working towards identifying their source.

EDIT 14:25 UTC: The API has stabilized but we are still looking for the origin of the troubles.

EDIT 13/09 09:03 UTC: The API is unreachable again; we are working on it.

EDIT 13/09 09:15 UTC: The API is now operational, the root cause has been identified.

Fixed · API · Global

We are performing security updates on some core components.

Our main API may be unavailable for 1 hour.

EDIT 00:30 UTC: The maintenance ended 25 minutes ago. We are monitoring the results.