Ad serving partially down
Incident Report for Xandr
Postmortem

Summary

From approximately 13:05 UTC to 16:23 UTC on August 15, 2019, ad serving in the New York and Singapore regions was unavailable.

Scope of Impact

During the incident window, blank ads were served in these regions and requests to ib.adnxs.com failed.

Timeline (UTC)

  • 2019-08-15 13:05: New Console Bidder version released to a small set of servers.
  • 2019-08-15 13:06: Engineers alerted to outage.
  • 2019-08-15 13:12: Issue escalated as incident. Investigation continues.
  • 2019-08-15 13:32: Issue reported on status.appnexus.com.
  • 2019-08-15 13:53: New hotfix attempt built and released. Hotfix did not resolve issue.
  • 2019-08-15 14:03: Rollback to old version completed for a few servers in NYM2. Rollback did not resolve issue.
  • 2019-08-15 15:40: “Bake” Console Bidder releases found in SIN1 and NYM2.
  • 2019-08-15 15:46: Releases rolled back.
  • 2019-08-15 16:05: NYM2 servers 100% recovered.
  • 2019-08-15 16:23: SIN1 servers 100% recovered.
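
The intervals implied by this timeline can be worked out directly from the timestamps. The short Python sketch below does that arithmetic. The dictionary keys and the helper name `minutes_between` are illustrative labels chosen for this example, not terms from the incident tooling.

```python
from datetime import datetime

# Timestamps copied from the timeline above (all UTC, 2019-08-15).
# The keys are illustrative labels chosen for this sketch.
TIMELINE = {
    "bake_released": "13:05",
    "engineers_alerted": "13:06",
    "status_page_posted": "13:32",
    "bake_releases_found": "15:40",
    "releases_rolled_back": "15:46",
    "nym2_recovered": "16:05",
    "sin1_recovered": "16:23",
}


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes from TIMELINE[start] to TIMELINE[end]."""
    fmt = "%H:%M"
    delta = datetime.strptime(TIMELINE[end], fmt) - datetime.strptime(TIMELINE[start], fmt)
    return int(delta.total_seconds() // 60)


time_to_alert = minutes_between("bake_released", "engineers_alerted")
time_to_find_bake = minutes_between("bake_released", "bake_releases_found")
rollback_to_full_recovery = minutes_between("releases_rolled_back", "sin1_recovered")
total_outage = minutes_between("bake_released", "sin1_recovered")

print(time_to_alert, time_to_find_bake, rollback_to_full_recovery, total_outage)
```

Run as written, this gives 1 minute from release to alert, 155 minutes to locate the bake releases, 37 minutes from rollback to full recovery, and 198 minutes (3 h 18 min) of total impact.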

Cause Analysis

The outage was caused by a Console Bidder “bake” (a release to a very small number of servers) in NYM2. The release triggered a latent bug in the Impression Bus that produced an infinite loop, which ultimately exhausted server memory. The release had been tested, but those tests did not exercise the exact code path that triggered the latent bug.
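
This report does not describe the Impression Bus code, so the following Python sketch is purely illustrative of the general failure mode: a loop whose exit condition is never met, and which allocates on every pass, grows until memory runs out. All names here (`drain`, `MAX_ITERATIONS`, `LoopLimitExceeded`) are hypothetical. An iteration cap is one common guard that turns unbounded growth into a fast, visible error.

```python
from typing import Callable, List, Optional

MAX_ITERATIONS = 10_000  # hypothetical safety cap, tuned per workload


class LoopLimitExceeded(RuntimeError):
    """Raised when a loop runs longer than its safety cap allows."""


def drain(next_item: Callable[[], Optional[str]],
          max_iterations: int = MAX_ITERATIONS) -> List[str]:
    """Collect items until next_item() returns None.

    Without the cap, a source that never returns None would make this
    loop append forever and eventually exhaust memory.
    """
    collected: List[str] = []
    for _ in range(max_iterations):
        item = next_item()
        if item is None:
            return collected
        collected.append(item)
    raise LoopLimitExceeded(f"no terminator after {max_iterations} items")


# A well-behaved source terminates normally.
finite = iter(["a", "b", None])
print(drain(lambda: next(finite)))

# A source that never terminates is stopped by the cap instead of
# growing without bound.
try:
    drain(lambda: "x", max_iterations=100)
except LoopLimitExceeded as err:
    print("guard tripped:", err)
```

A cap like this does not fix the underlying bug, but it limits the damage to a single failed request rather than an exhausted server.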

Resolution Steps

Our engineers resolved the issue by rolling back the bidder release on the affected servers.

Next Steps

  • Fix the latent bug in the Impression Bus that caused the infinite loop.
  • Design a more advanced “bake” implementation such that bake servers are entirely isolated from downstream production systems.

Posted Aug 19, 2019 - 14:41 UTC

Resolved

The incident has been fully resolved. We apologize for the inconvenience this issue may have caused, and thank you for your continued support.

Posted Aug 15, 2019 - 16:29 UTC

Investigating

We are currently investigating the following issue:

  • Component(s): Ad Serving
  • Impact(s):
    • Increase in blanks for Console customers
    • Some calls to ib.adnxs.com failing
  • Severity: Major Outage
  • Datacenter(s): NYM2, SIN1

We will provide an update as soon as more information is available. Thank you for your patience.

Posted Aug 15, 2019 - 13:32 UTC