The issues in Miami are known and an investigation is underway. Updates will be posted here.

 

Update 1# 

 

This is a DC-wide issue and all servers are unreachable. The exact reason is not known but is likely power related or network related. Service credits will be issued where appropriate, depending on the final RFO.


Update 2#

 

The DC has informed me that the issue affecting our servers is network related, although they have not expanded much beyond that. This means the servers are not physically affected, so when the network is restored everything should be as it was.


Update 3#

 

Inception Hosting uses ServerAxis in Miami, who are in Miami-IX (the DC). The lack of information forthcoming is beyond a joke, but Inception Hosting is being kept largely in the dark. It is not an acceptable situation that we collectively find ourselves in. No ETA can be given on restoring the network. Once it is up, plans will be made and accelerated to switch to another transit provider in Miami, as this can never happen again.

 

Update 4#

 

As communication from the DC has completely stopped, steps are being taken to start moving services elsewhere. An email is being drafted to all affected users now. While it is hoped that service will be resumed, obviously no ETA can be given, and this situation is unacceptable, so changes must be made. Further details will be in the RFO.


Update 5#

 

RFO sent out to all customers affected.

Update 6#

 

Copy of the RFO for those who host their email on the affected servers:

 

 

Late in the afternoon of 29th February, connectivity to all Inception Hosting servers in Miami was gradually lost over a period of a few hours. This included all remote and out of band access via iLO/IPMI equipment. All controls within the provider’s control panels also failed to respond to any requests. Requests for information and remote hands assistance were sent immediately following this.

 

 

 

Over the past 24 hours there have been precisely 3 responses from ServerAxis, whom we trusted to provide service and networking within Miami-IX (datacentre). Initially they were looking into it; at that point the issue was incorrectly assumed to be power related due to the nature of the failure. Despite numerous attempts to gain updates, very little was forthcoming until confirmation was sent that it was a networking issue affecting all servers and services.

 

 

 

Many further attempts were made to glean more information including around 100 phone calls, none of which were answered. A final update came from ServerAxis around 6 hours ago stating they are still checking.

 

 

 

Obviously, this level of service is completely unacceptable. As it turns out, from communications with others, this issue is not isolated to Inception Hosting but affects all ServerAxis customers. It was then discovered that the entire /20 subnet was dropped by Cogent (network upstream), and it is assumed that there is a financial dispute between ServerAxis and either Cogent or Miami Internet Exchange directly waiting to be resolved. Offers have been made by Inception Hosting to pay any outstanding bills to restore service temporarily for the benefit of our customers; however, Cogent will not communicate with a third party and ServerAxis simply are not responding.

 

 

 

With this in mind and no predictable ETA in sight for getting things restored, I have been liaising with other hosts to get alternative hardware provisioned within Florida. The best time scale offered in Florida due to the hardware spec. required is 14 days which is obviously not acceptable. Therefore requests were sent further afield. One of our long term, trusted suppliers has been able to offer immediate interim hardware (within the next 4 hours) pending the delivery of the actual replacement hardware this weekend.


It will take some time to get the interim hardware set up and configured, but please be assured that everything that can be done is being done. ServerAxis were initially selected due to their long record in business. This has come as a complete surprise and one that could not be predicted. It has left us, our customers and our customers’ customers in a bad and in some cases vulnerable position, with a completely unacceptable response. In light of this, even when (hopefully a when and not an if) access to the servers is restored, Inception Hosting will not continue service with ServerAxis beyond some access for recovering customer data. A migration was being planned for better network diversity anyway, but it was never expected to happen under these circumstances.

 

 

 

On a lighter note, the change will bring some strengths and advantages. The new platform will be KVM based rather than Xen, which will offer some significant flexibility advantages. OpenVZ services will remain the same but will be switched to SolusVM for management instead of Virtualizor. The new hardware will have newer, faster CPUs as well as enterprise drives in RAID 10 with 1TB of SSD cache on board the RAID controller with cachevault licenses. Additionally, all inbound bandwidth will become unmetered. The only shortcoming is a lack of IPv6 initially; however, this will be in place within a few weeks of the new hardware delivery.

 

 

 

If you are able to wait until the weekend you can go directly on the new hardware/network. If you require your server sooner please open a support ticket with ‘request to be placed on interim hardware’ in the subject line. These tickets will then be actioned in the order they are opened when the hardware is ready. Limited space is also available in the Netherlands and UK if you would rather not wait at all.


I would like to apologise again for the inconvenience this will no doubt cause to many people but due to the nature of the problem it is impossible to mitigate the impact in any other way.

 

 

 

Further instructions will follow where necessary.

 

Update 7#

 

The interim server is online with IOflood. Templates, plans and WHMCS linking still need to be done, as well as final checks. It is expected to be possible to start creating services between 8am and 10am GMT+0 on 2nd March.


Update 8#

 

Some delays were experienced in getting the config right on the interim hardware, and obviously dealing with a massive ticket volume right now is impeding progress. I expect to be provisioning servers for users within the hour.

Once the initial queue of server provisioning has been done, a new mass email will be sent with a brief update that better explains some of the finer points, as it seems it was not clear that there is no guarantee of getting access to the old hardware, and as time goes by the chances diminish. As of the time of writing this update (11:26 GMT+1, 2nd March) no response has been given by ServerAxis. At this stage it is a case of hoping for the best but planning for the worst.

 

Update 9#

All affected customers were sent the following email to clarify the current position and suggested next steps.

 

 

This email is an update to the RFO sent out yesterday to all customers regarding the Miami outage.

 

 

 

Sadly there has still been absolutely no communication from ServerAxis. It seems that they are either simply ignoring everyone or are not in a position to provide any answers. At this stage it is expected that the chances of regaining access to the now old servers in Miami are slim. If access is not regained by Monday the data will be considered lost, although the situation will continue to be pursued beyond Monday.

 

 

 

To clarify the position right now: the interim hardware is configured and available, and provisioning of fresh servers can be done now on request. Once the long term replacement hardware is delivered at the weekend, anyone on the interim hardware will be moved on to it at a suitable time. The new IP issued to you will not change during that process, and the migration will take between 10 and 30 minutes depending on the size of your disk image.

 

 

 

It is simply not possible to recover data at this stage, as it is physically not accessible by any means. You are obviously welcome to wait in the hope that access is restored so you can migrate, but the feeling at this stage is that you should consider getting set up fresh so you can start to rebuild and restore from your own backups.

 

 

 

The free backup space offered to all VPS customers, which many of you take advantage of, is in an independent location and so is unaffected by this outage. If you have not taken advantage of it and do not already have your own backup plan in place, in light of the current situation I urge you to do so at your earliest convenience.

 

 

 

Those that have requested to be placed on interim hardware have been issued a fresh VPS. If anyone else would like to proceed with this, please open a ticket requesting it and it will be actioned as soon as possible.

 

 

 

Once the long term replacement hardware is in place, the process of issuing everyone a fresh server will start. You will get an email confirming your new server details and login information.

 

 

 

Should the old Miami hardware become available after this time, a bulk email will go out to everyone urging you to grab your data as fast as possible. Due to the sheer volume of data and the time it would take to manually migrate the raw disk images, this will not be done for you, as it would take an estimated 15 days, during which time neither your new nor your old server would be usable.

 

 

 

If the old Miami servers do come up you will be able to access them over SSH but not via SolusVM. To give you the best possible chance of recovery if they do come back online, I suggest you sign up for a monitoring service such as Uptime Robot (https://uptimerobot.com/). It is free and can check whether your server is up every 5 minutes. Any restoration will also be communicated to you via email by Inception Hosting, but you will be able to react faster to a real time alert.
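For anyone who would rather run their own check alongside a hosted monitor, the idea above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names `is_reachable` and `wait_until_up` are made up for this example, the address in the usage comment is a documentation placeholder rather than a real server, and it assumes SSH is listening on the default port 22.

```python
import socket
import time
from typing import Optional


def is_reachable(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unroutable networks.
        return False


def wait_until_up(host: str, port: int = 22, interval: float = 300.0,
                  max_checks: Optional[int] = None) -> bool:
    """Poll every `interval` seconds; return True as soon as the host answers.

    Returns False if `max_checks` attempts all fail (None means keep trying).
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        if is_reachable(host, port):
            return True
        checks += 1
        if max_checks is None or checks < max_checks:
            time.sleep(interval)
    return False


# Example (203.0.113.10 is a placeholder, replace it with your old Miami IP):
# if wait_until_up("203.0.113.10"):
#     print("Old server is answering on SSH, start copying your data now")
```

The default interval of 300 seconds mirrors the 5 minute check mentioned above. A successful TCP connection only shows that the port is answering, so you would still need to log in and confirm your data is intact.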

 

 

 

Thank you for the many words of support received during this difficult time, it really is appreciated.

 

 

Anthony.


Update 9#

The new hardware is in place. Sadly the cachevault needed to be replaced after initial tests. This is being done now and everything will be re-tested. It was hoped that the software install would have started by now, but nothing is ever that straightforward.

Slight delay from being a day ahead but VPS provisioning should start first thing tomorrow.


Update 10#

 

There are continuing issues with the storage and CacheCade config that are taking a long time to address. Everyone is working very hard to get things right to avoid long term IO issues, however this has introduced further delays. To combat this, the OpenVZ services will start moving to the interim hardware immediately, as it is a more efficient process to bulk move all OpenVZ containers in one go later down the line.


Should these issues persist into Monday the same approach will be taken for KVM too.

Sorry for the further delays. It was hoped that things could be in their final state by now, but sadly that has just not been possible. Work is continuing as fast as possible.

Update 11#

All OpenVZ and KVM services have now been re-provisioned. If you feel you have been missed for any reason, open a ticket as soon as possible. A few more checks will be done manually tomorrow anyway to ensure everyone was captured.

People who were initially moved on to the interim hardware will be moved over the coming week. Further details will be sent out on that.

It seems at this stage that any hope of recovering data is gone. Restoring from your own backups or rebuilding should be the priority now if it was not already.

A 'wrap up' email will be sent tomorrow afternoon to all those affected in which service credits will be discussed.

Known issues at this point:

1) rDNS for IPv4 is not yet available. It will be by the end of the day tomorrow if everything goes to plan, and you will be able to set your own in the network tab in SolusVM.

2) IPv6 is not available. An ETA will be given in the coming weeks.

3) To get things up and running quickly, KVM templates were used rather than ISOs. ISOs will be added over the next 24 hours to give you more installation options (CD-ROM tab in SolusVM). Initially only the system rescue CD ISO was available; all ISOs have now been added.

Update 12#

 

This email was sent to all users on 8th March:

 

 

Inception Hosting

 

Good Evening,

 

 

 

This email is a round up and final update on the situation with the loss of service in Miami.

 

 

 

Data recovery

 

 

 

Sadly there has been no communication from ServerAxis on the situation, either publicly or directly. As a result it is being assumed that no data will be recoverable at this stage. I would still encourage you to run an uptime monitor on your old IPs for the rest of the month, and obviously if I do hear anything I will make everyone aware.

 

 

 

As ServerAxis are still trading, options for further action, if any, are being investigated; however, they are expected to be severely limited.

 

 

 

Restoration of services

 

 

 

As of now all OpenVZ and KVM services have been restored (fresh deployments) within PhoenixNAP. Those that required immediate service before the interim hardware was available were provisioned within the EU locations to restore from their own backups.

 

 

 

Those of you who opted to go on to EU servers are welcome to remain there if it suits you, or you can open a ticket to discuss migration. For those who decided to wait, the interim hardware was available and running within 36 hours. In the end, all OpenVZ services were placed on the interim hardware to avoid further delays.

 

 

 

Next steps

 

 

 

Those of you on the interim hardware will be migrated at a suitable time over the next 10 days. All OpenVZ servers will be migrated in bulk and KVM servers will be done one at a time. Best estimates are around 2 hours for all OpenVZ servers and around 5 to 10 minutes on average per KVM server. This time may increase a little for those with the larger plans (8+ GB).

 

 

 

IPv6 is being worked on. The best estimate is 14th March. Everyone will be allocated a /64 subnet.

 

 

 

rDNS for IPv4 is still in progress. It has not been a high priority as it can be set manually on request. This will follow in the coming week.

 

 

 

 

 

Known Issues

 

 

 

Some OpenVZ containers have been deployed without vswap. This means you have no swap. You can check this with the ‘free -m’ command. If yours reads 0 swap please open a ticket. It is a quick and easy fix.

Due to human error the ISO storage on the KVM side was not correctly mounted. As such, a reboot from the control panel may leave your server offline. While it only affected a handful of people, if you have found yourself in this position please unmount any ISO media then hit the reboot button in SolusVM.

The KVM product templates mirrored the older Xen HVM product templates. All this means is that some plans see 5GB less disk space. If you want or need the extra space that is fine. Just put a ticket in to have your allocation updated. This only affected some of the legacy Xen PV customers.

 

 

 

Some people that opted to go on to interim hardware may see the old product within the billing area marked as on a free billing cycle. These will be cleaned up in time.
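For the vswap check mentioned above, the same test can be scripted if you manage several containers. The following is a rough sketch only: the helper names `swap_total_mb` and `has_swap` are invented for this example, and it assumes the usual `free -m` layout in which the swap figures appear on a line beginning with `Swap:`.

```python
import subprocess


def swap_total_mb(free_output: str) -> int:
    """Return the total swap in MB from the text printed by `free -m`."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Swap:":
            # Second column of the Swap: line is the total.
            return int(fields[1])
    raise ValueError("no 'Swap:' line found in free output")


def has_swap() -> bool:
    """Run `free -m` on this machine and report whether any swap is configured."""
    result = subprocess.run(["free", "-m"], capture_output=True, text=True, check=True)
    return swap_total_mb(result.stdout) > 0
```

If `has_swap()` returns False inside your container, that corresponds to the "0 swap" case described above and a ticket is the way to get it fixed.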


Backups

The following is not intended to upset or belittle anyone and gives me no pleasure to reiterate, but it needs to be said. The single biggest issue that became apparent during this entire episode is that very few people back up what they consider to be critical data. I fully appreciate no one could have planned for this particular scenario, but catastrophic RAID failure and hardware failure can and do happen in this industry all the time.


Inception Hosting does not back up end user data. This is your responsibility. Inception Hosting also provides free backup space for you to do this in case you don’t have your own plan in place.

 

 

 

All of the above is covered in one way or another in the terms (which I know very few people will ever read through) and also in your VPS details email, which the vast majority of people do read to get login information and which includes the offer of free backup space.

Please consider your backup plan carefully going forward if this caught you out. I will be looking for some sort of viable and more automated options to offer as an at-cost add-on to end users. Although the fact remains that your data is your own responsibility, there is no harm in trying to make this even easier for you.

Finally, if anyone is simply unsure how to back up data please feel free to open a ticket to discuss this. I am more than happy to assist you in coming up with a suitable backup plan and recovery strategy based on your unique circumstances.
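As a starting point for those without any backup routine, the first step of most plans is simply producing a dated archive of the data that matters. The sketch below illustrates that step only and is not an official Inception Hosting tool: the function name `make_backup` and the example paths are hypothetical, and copying the resulting archive to the free backup space (or anywhere else off the server) is left out because the access method depends on your own account details.

```python
import tarfile
import time
from pathlib import Path


def make_backup(source_dir: str, dest_dir: str) -> Path:
    """Create a timestamped .tar.gz of source_dir inside dest_dir and return its path.

    dest_dir should sit outside source_dir, otherwise the archive would
    try to include itself.
    """
    source = Path(source_dir).resolve()
    if not source.is_dir():
        raise NotADirectoryError(f"{source} is not a directory")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the directory name.
        tar.add(source, arcname=source.name)
    return archive


# Example with hypothetical paths:
# make_backup("/var/www", "/root/backups")
```

An archive that stays on the same server does not protect against the kind of outage described here, so the important part is moving it to an independent location afterwards and testing that it restores.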

 


Service credits

Inception Hosting has a fairly uncomplicated policy on uptime/SLA/credits for outages (http://inceptionhosting.com/tos-aup/#faq-item-211), and this outage would strongly fall under the category of being caused by a third party. That said, everyone likes to feel like they have been made whole, and those of you who opted to wait for the interim or new hardware really helped things along, so 10 days of service credit is available per VPS.


Anyone wishing to take advantage of this please open a ticket requesting it.

 

 

 

Finally, I would like to thank everyone for their patience and support throughout this entire situation. It has been highly motivating and encouraging. While these may seem like empty words that are expected in official communications, I want to be clear that they are genuinely felt.

 

 

 

This will be the final bulk email on the situation; however, if you feel there is anything else you would like to discuss please do open a ticket and I will respond directly. If it is regarding recovery of old data, I am sorry but there is literally nothing I can do to help with that.

 

 

 

Update 13#

Notice of a scheduled maintenance window will be sent out soon to those on the interim node, for their migration on to the final hardware.

 



Monday, February 29, 2016
