Incidents
Full history of incidents.
May 2022
The SSH Gateway will undergo maintenance that will stop the service. Expected downtime is 30 minutes. During this time, SSH access to instances will be unavailable both from the CLI and from the regular SSH tool. Existing SSH connections will be closed.
Maintenance is expected to start in a few minutes.
EDIT 17:56 UTC: Service is back online. You should now be able to SSH into your instances. Sorry for the inconvenience.
Metrics and access logs are currently having some ingestion/query issues. We are working on it.
EDIT 23:06 UTC - Storage cluster is now up. We are now catching up on the accumulated ingestion lag. Query components will be restarted in a rolling fashion over the next 6 hours.
EDIT Sunday 11:27 UTC - Some query components are still reloading.
EDIT Sunday 20:27 UTC - We are still experiencing issues on the query components.
EDIT Monday 07:20 UTC - Query is back online
A few hypervisors in the Paris zone had a configuration issue between 12:21 UTC and 14:16 UTC, which prevented the instances hosted on them from being properly monitored. This caused Monitoring/Unreachable deployments for those instances.
Because of this, those hypervisors ended up emptier than the others. More VMs were scheduled on them since they had more resources available, which then led to more Monitoring/Unreachable events.
For the most part, instances weren't actually unreachable, but they were redeployed anyway.
This should now be fixed. Sorry for the inconvenience.
Some FS-Bucket add-ons will need to be migrated to a different server for security reasons. During this migration, the Buckets will be in Read-Only mode. Any attempt to create or update a file on the add-on will fail, including for FTP operations. Errors related to Read-only file system are expected during this migration.
The migration is expected to last at most 1 hour. All impacted applications will be redeployed during the migration. After the deployment, applications will be able to write to the bucket. Read operations will not be impacted.
Users of buckets that need to be migrated have received emails.
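For applications that write to a bucket during such a read-only window, the following is a minimal sketch of how a write could be retried instead of failing outright. It assumes the failure surfaces as an OSError with errno EROFS ("Read-only file system"), as the description above suggests. The function name write_with_retry and its parameters are hypothetical, chosen for illustration only; they are not part of any Clever Cloud API.

```python
import errno
import time


def write_with_retry(path, data, attempts=3, delay=1.0,
                     sleep=time.sleep, opener=open):
    """Write bytes to path, retrying while the file system is read-only.

    Hypothetical helper: returns True once a write succeeds, or False if
    every attempt failed with EROFS. Any other OSError is re-raised.
    """
    for attempt in range(attempts):
        try:
            with opener(path, "wb") as handle:
                handle.write(data)
            return True
        except OSError as exc:
            # Only the read-only condition is treated as temporary.
            if exc.errno != errno.EROFS:
                raise
            # Wait before the next attempt, but not after the last one.
            if attempt < attempts - 1:
                sleep(delay)
    return False
```

A caller could queue the data elsewhere when the function returns False and replay it once the application has been redeployed. The delay and number of attempts are placeholders to adjust to the expected length of the migration.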
EDIT 2022-05-31 10:00 UTC: The migration is starting, buckets will be put into read-only.
EDIT 2022-05-31 10:25 UTC: The migration is over. Applications have started redeploying; this should take around 2 hours. You can redeploy your application earlier to finish the migration.
EDIT 2022-05-31 13:11 UTC: All applications have been redeployed. The migration is now over.
Metrics and access logs are currently having some query issues. We are working on it.
EDIT 07:16 UTC - Indexes have been rebuilt. Query is now available.
There is currently a delay in monitoring actions for some applications. This may result in longer detection times for crashed application instances and delayed upscale / downscale events. Actions are currently queued and will resume shortly. ETA is 30 minutes.
EDIT 17:12 UTC: The queue is still being consumed.
EDIT 17:27 UTC: The queue is now empty. All monitoring actions should now be working as expected.
A hypervisor is currently unavailable. Applications are currently restarting. Add-ons hosted on that hypervisor are currently unavailable. We are looking into the root cause.
EDIT 10:50 UTC: Hypervisor is back online. Add-ons hosted on that hypervisor are now available.
Metrics and access logs are currently having some query issues. We are working on it.
EDIT 15:02 UTC - Indexes have been rebuilt. Query is now available.
Metrics and access logs are currently having some query issues. We are working on it.
EDIT 09:20 UTC - Indexes have been rebuilt. Query is now available.
An FSBucket server was unreachable for 15 minutes, leading to increased response times for basic read / write operations on some FSBuckets. This has been fixed. Impacted applications will be redeployed.
Some FS-Bucket add-ons will need to be migrated to a different server for security reasons. During this migration, the Buckets will be in Read-Only mode. Any attempt to create or update a file on the add-on will fail, including for FTP operations. Errors related to Read-only file system are expected during this migration.
The migration is expected to last at most 1 hour. All impacted applications will be redeployed during the migration. After the deployment, applications will be able to write to the bucket. Read operations will not be impacted.
Users of buckets that need to be migrated have received emails.
EDIT 24/05/2022 12:00 UTC+2: The migration will start soon. FSBuckets will be put into read-only for a couple of minutes so that all buckets are correctly synchronized.
EDIT 24/05/2022 12:03 UTC+2: FSBuckets are now in read-only mode.
EDIT 24/05/2022 12:39 UTC+2: Synchronization is over. Applications are being redeployed. If you wish to recover faster, you can trigger a deployment through the web Console or CLI. Deployments are expected to all be started within the next 30 minutes.
EDIT 24/05/2022 13:34 UTC+2: The migration is over. If you have any issues, please contact our support team.
Metrics and access logs are currently having ingestion and query issues. We are working on it.
EDIT 08:28 UTC - We are consuming the lag.
EDIT 08:28 UTC - Indexes are rebuilding.
EDIT 09:34 UTC - Indexes are rebuilt. Query is available.
EDIT 16:03 UTC - Fixed.
AccessLogs/Metrics are experiencing issues.
EDIT 23:41 UTC - Issue has been identified and we are consuming the lag.
EDIT 07:28 UTC - Lag has been consumed.
EDIT 07:30 UTC - Fixed.
Logs are experiencing issues.
EDIT 23:55 UTC - Issue has been identified and we are consuming the lag.
EDIT 00:19 UTC - Lag has been consumed.
EDIT 00:20 UTC - Fixed.
Metrics/AccessLogs are experiencing issues.
EDIT 09:11 UTC - Metrics/AccessLogs are catching up their lag.
EDIT 16:34 UTC - Fixed.
Logs and drains systems are experiencing issues. We are working on it.
EDIT 09:06 UTC - The logs are catching up.
EDIT 11:15 UTC - Fixed.
Deployment components are experiencing issues due to deployment lag triggered by the Core API issues.
EDIT 08:00 UTC - We have identified ongoing issues.
EDIT 08:02 UTC - New deployments are currently disabled to reduce the impact on our infrastructure. We will reactivate them once the queued ones have been deployed.
EDIT 08:45 UTC - Deployments are still flaky. We are working to resolve the issues.
EDIT 09:08 UTC - The deployment queue is catching up. When it is empty, we will redeploy part of the PAR zone to ensure deployments and monitoring are consistent.
EDIT 09:25 UTC - The mentioned deployments are running.
EDIT 11:16 UTC - About 75% of the deployments are completed.
EDIT 12:06 UTC - Finished and fixed.
We are investigating issues with our Core API.
EDIT 06:34 UTC - Our orchestrator is impacted and the deployments are experiencing issues.
EDIT 06:44 UTC - Core API is fixed.
EDIT 08:34 UTC - We are experiencing issues affecting the Console and the CLI. We are investigating.
EDIT 08:45 UTC - Core API is fixed.
The Clever Cloud API behind the domain name api.clever-cloud.com experienced some slowdowns.
We found an issue with a shard of our indexes. Some metrics may be unavailable during the reloading period.