Renewing Expired SSL Certificate
Incident Report for AWAIR
Postmortem

Awair’s mission is to provide a reliable (always-on) data service. Over the past couple of days we failed to deliver on this promise.

We want to take ownership of our mistake, and sincerely apologize for any inconvenience caused by disruptions to the Awair service.

We further want to communicate as clearly and quickly as possible what happened, how to resolve any remaining issues with Awair Omni, and what we are doing to prevent this from happening in the future.

What caused the disruption:

A web certificate expired on October 26th, 2020 at approximately 17:35 PT, disrupting our service. We renewed the certificate on October 28th at approximately 02:35 PT, which we thought resolved the issue.

However, about two hours later, at 04:45 PT on October 28th, we noticed a further problem the outage had created for Awair Omni devices. While the certificate was expired, Transport Layer Security (TLS) communications between Awair Omni devices and our servers failed. As a result, Omni devices could not synchronize their clocks with our servers, and these repeated failures caused devices to crash and remain disconnected.

How to resolve remaining issues with Omni:

NOTE: Your Awair Omni device does not require a factory reset.

It may, however, require a manual power cycle.
To power cycle your Awair Omni:

  • Hold down the right side button for ~3-5 seconds
  • Let go for a second
  • Push the button again until the Awair logo appears on the front of the device.

The service disruption for some Omni devices began on October 27th at 07:30 PT. You may see gaps in data from then until October 28th at 04:45 PT and/or until you power cycle your device.

How we will prevent this from happening in the future:

Awair Omni’s response to the server disconnection was an anomaly. We have debugged and addressed it to make sure that it does not occur in the future.
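For readers who maintain connected devices, one common way to keep repeated sync failures from cascading is to retry with a capped, jittered backoff and then give up cleanly. The sketch below is illustrative only and is not Awair's firmware: `sync_with_retry`, `backoff_delays`, and all parameter values are hypothetical names and assumptions.

```python
import random
import time
from typing import Callable, Iterator, Optional


def backoff_delays(base: float = 1.0, cap: float = 300.0,
                   factor: float = 2.0) -> Iterator[float]:
    """Yield exponentially growing delays in seconds, never above `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def sync_with_retry(sync_clock: Callable[[], float],
                    max_attempts: int = 8,
                    sleep: Callable[[float], None] = time.sleep,
                    jitter: Callable[[], float] = random.random) -> Optional[float]:
    """Call `sync_clock` until it succeeds or `max_attempts` is reached.

    `sync_clock` is a hypothetical callable that returns the server time
    and raises OSError on failure (ssl.SSLError is a subclass of OSError,
    so TLS handshake failures are covered). Returns None instead of
    raising when every attempt fails, so the caller can keep running
    with its local clock and try again later.
    """
    delays = backoff_delays()
    for _ in range(max_attempts):
        try:
            return sync_clock()
        except OSError:
            # Scale each delay into [50%, 100%] of its nominal value so
            # many devices do not retry in lockstep.
            sleep(next(delays) * (0.5 + 0.5 * jitter()))
    return None
```

The design choice here is that a failed sync is treated as a normal, recoverable outcome: the function bounds both the number of attempts and the wait between them, and it reports failure with a return value rather than an unhandled exception.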

Additionally, we have set up internal cross-checks regarding ownership of certificate renewal to ensure that a renewal does not lapse again.
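Process cross-checks like these are often paired with an automated expiry check. As a rough sketch, assuming a 30-day renewal window (the window and the function names `parse_not_after`, `needs_renewal`, and `fetch_expiry` are illustrative assumptions, not a description of Awair's tooling), such a check could look like this:

```python
import socket
import ssl
from datetime import datetime, timedelta, timezone

# Assumed threshold: flag a certificate once it is within 30 days of expiry.
RENEWAL_WINDOW = timedelta(days=30)


def parse_not_after(not_after: str) -> datetime:
    """Convert a certificate 'notAfter' string to an aware UTC datetime.

    Expects the format returned by ssl.SSLSocket.getpeercert(),
    e.g. 'Jan  5 09:34:43 2018 GMT'.
    """
    seconds = ssl.cert_time_to_seconds(not_after)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def needs_renewal(expires_at: datetime, now: datetime,
                  window: timedelta = RENEWAL_WINDOW) -> bool:
    """Return True if the certificate expires within `window` of `now`."""
    return expires_at - now <= window


def fetch_expiry(host: str, port: int = 443, timeout: float = 5.0) -> datetime:
    """Read the expiry time of the certificate served by `host`.

    The default context verifies the certificate, so the handshake raises
    ssl.SSLCertVerificationError if the certificate has already expired.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return parse_not_after(cert["notAfter"])
```

A scheduled job could call `needs_renewal(fetch_expiry(host), datetime.now(timezone.utc))` for each public hostname and alert the certificate's owner when it returns True.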

We deeply apologize for the inconvenience caused by this disruption in service. We are immensely appreciative of your patience and support as we grow as a global company. If you find that your team or your Omni devices continue to have any issues following a power cycle, please reach out to hello@getawair.com.

Posted Oct 28, 2020 - 19:49 PDT

Resolved
This incident has been resolved.
Posted Oct 28, 2020 - 04:45 PDT
Monitoring
New certificate applied and monitoring for device reconnection.
Posted Oct 27, 2020 - 23:53 PDT
Update
A certificate has been issued and we are working hard to apply it as soon as possible.
Posted Oct 27, 2020 - 23:17 PDT
Identified
Devices are refreshing their TLS handshakes with our backend services. Data loss is expected.
We are working with the certificate authority to renew the certificate as soon as possible.
Posted Oct 27, 2020 - 19:20 PDT
Update
Temporary fix applied to address onboarding issues. Let us know if you see any unintended behaviors (hello@getawair.com) while we work on a permanent fix.
Posted Oct 27, 2020 - 10:42 PDT
Update
Dashboard, web, API, and most app features are working. We are working on resolving new (or factory reset) device onboarding. If you are experiencing issues, please avoid factory resetting your device until we can confirm that it is working.
Posted Oct 27, 2020 - 10:14 PDT
Monitoring
A fix has been applied by replacing the CA with Let's Encrypt, and we are monitoring the results.
Posted Oct 27, 2020 - 08:03 PDT
Update
Awaiting paperwork from DigiCert.
Posted Oct 26, 2020 - 23:57 PDT
Identified
The issue has been identified and a fix is being implemented.
Posted Oct 26, 2020 - 17:45 PDT
Investigating
We are currently investigating this issue.
Posted Oct 26, 2020 - 17:35 PDT
This incident affected: Dashboard, Developer APIs, OAuth 2.0, App, and Data Pipeline.