Console timing out
Incident Report for Xandr
Postmortem

Incident Summary

On March 18, 2019, the Console API and UI were partially unresponsive for about 50 minutes, from 12:38 UTC to 13:29 UTC.

Scope of Impact

During the incident window, users were unable to access the Console UI and API, and user sessions timed out.

Timeline (UTC)

2019-03-18 12:38: Incident started. This is a rough estimate of the start time, based on our monitoring dashboard
2019-03-18 12:40: We received notifications from clients about the timeout errors
2019-03-18 13:02: IM ticket was created
2019-03-18 13:02: Engineers were notified
2019-03-18 13:16: Engineers rolled out a fix
2019-03-18 13:29: Incident resolved
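
For reference, the elapsed time at each step can be derived from the timestamps above. The following is a minimal sketch in Python; the event labels and the helper name `minutes_since_start` are illustrative and not part of any Xandr tooling.

```python
from datetime import datetime

# Timestamps (UTC) copied from the timeline above; labels are paraphrased.
events = [
    ("Incident started (estimate)", "2019-03-18 12:38"),
    ("Client notifications received", "2019-03-18 12:40"),
    ("IM ticket created, engineers notified", "2019-03-18 13:02"),
    ("Fix rolled out", "2019-03-18 13:16"),
    ("Incident resolved", "2019-03-18 13:29"),
]


def minutes_since_start(events):
    """Return (label, whole minutes elapsed since the first event) per event."""
    fmt = "%Y-%m-%d %H:%M"
    start = datetime.strptime(events[0][1], fmt)
    return [
        (label, int((datetime.strptime(ts, fmt) - start).total_seconds() // 60))
        for label, ts in events
    ]


for label, elapsed in minutes_since_start(events):
    print(f"+{elapsed:>2} min  {label}")
```

By this calculation the incident lasted 51 minutes from estimated start to resolution, with the ticket opened 24 minutes after the estimated start.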

Cause Analysis

The incident was caused by a long-running delete job that progressively slowed down the database.

Resolution Steps

Our engineers killed the long-running query, and the timeout errors disappeared almost immediately.

Next Steps

Review the alert time window for long-running jobs.
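
As an illustration of what an alert time window means in practice, the sketch below flags any job whose runtime exceeds a threshold. It is a simplified example under stated assumptions, not a description of our monitoring system: the `Job` record, the `long_running_jobs` helper, the job names, and the 10-minute threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Job:
    # Hypothetical job record; a real system would read this from the
    # database's process list or a job scheduler.
    job_id: str
    started_at: datetime  # timezone-aware, UTC


def long_running_jobs(jobs, now, threshold):
    """Return jobs running longer than `threshold`, longest-running first."""
    overdue = [job for job in jobs if now - job.started_at > threshold]
    return sorted(overdue, key=lambda job: job.started_at)


# Made-up sample data for illustration only.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
jobs = [
    Job("delete-job", now - timedelta(minutes=45)),
    Job("report-job", now - timedelta(minutes=3)),
]

for job in long_running_jobs(jobs, now, threshold=timedelta(minutes=10)):
    print(f"ALERT: {job.job_id} has been running for {now - job.started_at}")
```

A shorter threshold surfaces a slow job earlier at the cost of more false alarms, which is the trade-off the review is meant to weigh.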

Posted Apr 01, 2019 - 19:14 UTC

Resolved

The incident has been fully resolved. We apologize for the inconvenience this issue may have caused, and thank you for your continued support.

Posted Mar 18, 2019 - 14:22 UTC

Investigating

We are currently investigating the following issue:

  • Component(s): Console API, Console UI
  • Impact(s):
    • Page load failures and errors in user interface
    • Unable to save/edit objects
    • Latency, timeouts and errors in API
  • Severity: Major Outage
  • Datacenter(s): Global

We will provide an update as soon as more information is available. Thank you for your patience.

Posted Mar 18, 2019 - 13:22 UTC