Postmortem -
Jun 21, 09:20 AEST
Resolved -
The platform has been stable for an hour now, so we believe disabling KernelCare has resolved the issue.
If you have not already done so, please check your VPS as soon as convenient to ensure it is functioning correctly following the host power-cycle. The following host nodes were affected by this incident:
- bnecompute03
- bnecompute04
- bnecompute05
- bnecompute08
- bnecompute10
- sydcompute01
- sydcompute02
- sydcompute03
- sydcompute04
- sydcompute05
- sydcompute06
- sydcompute07
- sydcompute09
- sydcompute10
- sydcompute18
- melcompute01
We will provide an incident post-mortem later in the week.
Jun 18, 23:03 AEST
Monitoring -
We have disabled KernelCare on all host nodes and have not seen any further kernel faults. We will continue to monitor the platform, but at this point we believe that this incident has been resolved.
An incident post-mortem will be available later in the week.
Jun 18, 22:05 AEST
Update -
We are continuing to disable KernelCare and restart affected host nodes.
Jun 18, 21:19 AEST
Update -
We believe this issue relates to a KernelCare update released today and are in the process of disabling it.
Jun 18, 20:41 AEST
Update -
Host nodes bnecompute03, bnecompute09, and melcompute01 are affected.
Jun 18, 20:25 AEST
Identified -
This host node experienced a kernel fault that required a reboot to correct. We have rebooted it and are currently bringing customer VPSs back online.
Jun 18, 20:19 AEST
Investigating -
We are currently investigating this issue.
Jun 18, 20:13 AEST