All services are operating normally at this time.
Data Center | Current | Last Day | Last Month | Last Year |
---|---|---|---|---|
Dallas, TX | Up | 100% | 100% | 100% |
Seattle, WA | Up | 100% | 100% | 100% |
Piscataway, NJ | Up | 100% | 100% | 100% |
Los Angeles, CA | Up | 100% | 100% | 100% |
London, UK | Up | 100% | 100% | 100% |
Maidenhead, UK | Up | 100% | 100% | 100% |
The Netherlands | Up | 100% | 100% | 100% |
dalstorage4 issue
dalvz7highram11 has crashed.
Emergency reboot of ukpure11
Packet loss in our UK location
ukvz7highram2 down
Connectivity issues affecting Seattle DC
Seattle reachability issues
Seattle network issues
Rebooting dalvz7highram12
NJ network problem
seapure1 connectivity
dalvz7highram6
seavz7highram1 accessibility issues
seavz7highram1 connectivity issues
Network Issue in Dallas
Hello,
OFFICIAL RFO - 10/28/2019
Summary of Incident:
———————————————
Yesterday, Monday October 28th 2019, at approximately 4:23pm, a portion of customers in our TPA1, TPA2 and DAL1 data centers experienced a loss of network connectivity that lasted anywhere from a few minutes to a few hours, depending on server location. The cause of the issue has been identified and is as follows:
At roughly 4:23pm one of our Network Engineers applied a policy update to our DAL1 edge routers. This policy update was incomplete, which led to the full internet routing table being propagated throughout the aggregation layer of DAL1. This mistake was further exacerbated when that full routing table was automatically injected into the Hivelocity DDoS protection network, resulting in the full routing table being distributed to other Hivelocity facilities, i.e. TPA1 and TPA2. The full internet routing table injection exhausted the resources of multiple network devices, which ultimately led to the network disruption. Once our Network Engineers identified the cause of the issue, we began reloading each of the affected network devices to correct the problem. Ultimately, yesterday's network event was a result of human error.
Service Impact Times:
———————————————
October 28th, 4:23pm - 6:44pm EST
Remediation Plans:
———————————————
We have implemented new router policies that will prevent full route tables from being similarly propagated should human error ever occur again. Additionally, we have implemented new review protocols to minimize the likelihood of any human error occurring.
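The RFO does not say what the new router policies are. One common safeguard against this kind of leak is a maximum-prefix limit on each routing session, so that a neighbor sending far more routes than expected is cut off before downstream devices run out of resources. The sketch below only illustrates that idea in Python; the `PrefixLimit` class, its thresholds, and the example numbers are hypothetical and are not taken from the provider's configuration.

```python
from dataclasses import dataclass


@dataclass
class PrefixLimit:
    """Hypothetical per-session guard: caps how many routes a neighbor may send."""

    max_prefixes: int          # hard limit; above this the session is dropped
    warn_ratio: float = 0.8    # fraction of the limit at which to raise a warning

    def evaluate(self, received: int) -> str:
        """Return the action to take for a session that has sent `received` prefixes."""
        if received > self.max_prefixes:
            return "shutdown"
        if received >= self.max_prefixes * self.warn_ratio:
            return "warn"
        return "accept"


# Illustrative numbers only: an aggregation-layer session that should carry
# a few thousand internal routes, never a full internet table.
aggregation_limit = PrefixLimit(max_prefixes=5_000)

print(aggregation_limit.evaluate(1_200))    # accept
print(aggregation_limit.evaluate(4_500))    # warn
print(aggregation_limit.evaluate(800_000))  # shutdown
```

With a guard like this, an incomplete policy that leaks a full table would trip the limit on the first receiving session instead of spreading to other facilities.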
For years most of our customers have experienced 100% uptime due to our redundancies and nearly 2 decades of experience. We take our responsibility to you very seriously and no one hates it more than us when we fall short of our goals. We are deeply sorry for the inconvenience and any negative impact this disruption had on your operation.
seavz7highram1 connectivity errors
LA server connectivity issues
ukhighram2 is down
lastorage1 is offline
Tuesday, 01 October 2019, 15:21 - We're aware that this server is currently offline. We've reached out to the datacenter and are currently assessing the situation. Further updates will be tagged to this announcement as they become available. Thank you for your patience.
Wednesday, 02 October 2019, 23:12 - From what we can tell, some of the hard disks in the RAID60 array silently produced errors that the RAID card (HP P420) was not aware of, and the filesystem crashed as a result. We have brought the node back online and are currently running fsck.
Friday, 04 October 2019, 18:10 - We're still working on this. We'll update you as soon as further details become available.
Sunday, 06 October 2019, 04:56 - fsck process is still ongoing.
Tuesday, 08 October 2019, 11:55 - fsck remains ongoing at this time, and is expected to take some time given the size of the disk array. We'll update this announcement as we have more information.
Thursday, 10 October 2019, 09:06 - fsck process continues.
Monday, 14 October 2019, 02:06 - The fsck process has finished. Some of the VPSes are online. Others are not booting due to corrupt files. We are working on this.
Monday, 23 October 2019, 14:18 - We were only able to recover around half of the VPSes from this node. We have built a new node and are syncing the data to it right now. A ticket will follow for your situation.
Dallas network connectivity issues
08/05/2019 06:57 - We are having connectivity issues in Dallas at the moment. Our engineers are working to resolve the issue.
08/05/2019 07:49 - Our engineers are still trying to resolve the issue. Your patience is appreciated.
08/05/2019 08:02 - We have found the root cause of the issue and 90% of the servers are back online now.
08/05/2019 09:53 - All the servers are back online. The issue was related to a bug in our networking gear. We have applied a fix for this, and within the following month we'll upgrade our core networking gear to have full redundancy to avoid issues like these. A full RFO will be posted here.
------------------------------
Reason for Outage:
All of a sudden, we started to notice some of the IP addresses disappearing from the ARP table of our core router, and we began investigating the issue.
All the network configuration was correct, and it had been running with the same configuration for the past 4 years without issues.
We then identified that whenever one particular MAC address came online, the ARP table filled up within a few seconds, which was causing the instability.
We narrowed down which MAC address was causing the issue and then replaced it.
We have notified the manufacturer of the routing gear about the issue and we are awaiting an explanation.
We are sorry for the inconvenience caused by this.
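The narrowing-down step described above can be illustrated with a small sketch. It assumes, purely for illustration, that the ARP table was being filled by a single MAC address announcing many IP addresses in a short period; the RFO does not confirm the exact mechanism. The function name `find_arp_flooders`, the event format, and the thresholds are all hypothetical.

```python
from collections import defaultdict, deque


def find_arp_flooders(events, window_s=5.0, max_ips=50):
    """Flag MAC addresses that claim too many distinct IPs in a short window.

    events: iterable of (timestamp_seconds, mac, ip) tuples sorted by timestamp.
    Returns the set of MACs seen with more than `max_ips` distinct IPs
    inside any `window_s`-second span.
    """
    recent = defaultdict(deque)  # mac -> deque of (timestamp, ip) inside the window
    flagged = set()
    for ts, mac, ip in events:
        entries = recent[mac]
        entries.append((ts, ip))
        # Drop entries that have fallen out of the sliding window.
        while entries and ts - entries[0][0] > window_s:
            entries.popleft()
        if len({seen_ip for _, seen_ip in entries}) > max_ips:
            flagged.add(mac)
    return flagged


# Synthetic data: one well-behaved host and one host claiming 200 addresses in 2 seconds.
normal = [(i * 1.0, "aa:aa:aa:aa:aa:01", "10.0.0.1") for i in range(10)]
noisy = [(0.01 * i, "aa:aa:aa:aa:aa:02", f"10.0.0.{i + 1}") for i in range(200)]
events = sorted(normal + noisy)

print(find_arp_flooders(events))  # {'aa:aa:aa:aa:aa:02'}
```

A sliding window per MAC address keeps memory bounded and separates a burst of announcements from a host that legitimately accumulates a few addresses over a long period.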