[Global] Access logs ingestion delay
Resolved:
We are currently investigating an issue affecting the access logs ingestion pipeline.
Impact:
- Access logs may be delayed or temporarily unavailable in the console
- Applications and addons continue to run normally without any impact
Updates
We have identified the root cause of the access logs ingestion delay and are currently deploying a fix. The issue was caused by excessive time required to create producers on our Pulsar messaging system, which resulted in a bottleneck in the logs ingestion pipeline.
During our investigation and deployment of the initial fix, we have identified an underlying issue with our Pulsar cluster that requires additional attention.
Current Situation:
- The producer creation delays were a symptom of a broader issue affecting the Pulsar cluster stability
- Our infrastructure team is actively working on stabilizing the cluster
- Access logs ingestion remains degraded while we address the root infrastructure issue
Impact (updated):
- Access logs continue to experience significant delays
- Some access logs may be queued for extended periods
- Applications remain unaffected and continue to run normally
- Real-time metrics and monitoring remain operational
Current Actions:
- Infrastructure team investigating Pulsar cluster health
- Implementing cluster-level remediation measures
- Monitoring system performance closely
- Preparing contingency measures if needed
We are impacted by the following Pulsar cluster incident: https://www.clevercloudstatus.com/incidents/1026
We are catching up on the access logs lag.
At the current rate, it will take 6 hours to catch up on the lag accumulated during the incident. We will post an update once the catch-up is finished.
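For readers who want to reason about this kind of estimate, the arithmetic can be sketched as follows. This is only an illustration: the function name and all numbers are hypothetical, chosen so that the result matches the 6-hour figure, and they are not measurements from this incident.

```python
def catch_up_hours(backlog_msgs: float, drain_rate: float, ingest_rate: float) -> float:
    """Estimate the hours needed to clear a backlog.

    backlog_msgs: messages currently queued (hypothetical value)
    drain_rate:   messages consumed per second (hypothetical value)
    ingest_rate:  new messages arriving per second (hypothetical value)
    """
    # The backlog only shrinks by the difference between the two rates.
    net_rate = drain_rate - ingest_rate
    if net_rate <= 0:
        raise ValueError("backlog will not drain: drain_rate must exceed ingest_rate")
    return backlog_msgs / net_rate / 3600


# Hypothetical example: 216 million queued messages, consumed at 15,000/s
# while 5,000/s of new traffic keeps arriving.
print(catch_up_hours(216_000_000, 15_000, 5_000))
```

The point of the sketch is that the estimate depends on the net drain rate, so it shifts if incoming traffic changes while the backlog is being processed.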
We have caught up on the lag; the incident is closed.