RESOLVED: Service Outage 24 November 22:09 GMT
By Walt on Tuesday, November 24 2009, 22:14 - Service status - Permalink
DETAIL: We are currently experiencing another problem with our internet connectivity.
RESPONSE: We are working with our provider to restore services as soon as possible.
UPDATED 22:24: Our network provider has confirmed the problem and is working to restore service.
UPDATED 22:52: Services appear to have been restored.
UPDATED 11:03 30 Nov: Incident report from Telstra added.
FOLLOW UP: We have received the following incident report from our upstream provider:-
UPDATE TO TELSTRA INCIDENT REPORT FOR OUTAGES ON 23 AND 24 NOV 2009
Having identified the root cause of the previous day’s incident, engineers planned an activity to configure the x.x.0.0/16 summary route onto the appropriate Juniper core routers to allow a future decommissioning of the legacy routers. The configuration of this summary route on the Juniper core routers resulted in Internet services for certain customers being affected again.
The engineers were unable to quickly isolate the cause of this issue and so reversed the change in order to restore service. However, once the change had been reversed, service was not restored for all customers as it should have been. The engineers identified a spurious route being received from the legacy routers which appeared to be causing the problem. The engineers reset the BGP sessions to the legacy routers, which removed the spurious route and restored service to affected customers.
The engineers later identified a Juniper OS bug that had caused the reversal to be unsuccessful. Telstra has already been testing a later version of the Juniper OS in their labs which is intended for network-wide deployment. Juniper has confirmed that this specific bug is resolved in the release in test, but Telstra will also include this bug in their test planning prior to deployment in production networks.
REMEDIAL ACTION
- An urgent cross-functional review of the current MPLAN process has been scheduled (including a detailed analysis of our planning and handling of this incident).
- All MPLANs now have an extended Director-level approval policy whilst we review the current planned works process.
- All MPLANs will be checked after completion to ensure that the works have been carried out in accordance with the plan.
Telstra would like to take this opportunity to sincerely apologise for the disruption and inconvenience that these incidents have caused. Please be assured that the immediate actions stated above have been given the highest priorities within Telstra to be implemented as quickly as possible. This is in order to avoid further incidents and to provide the highest levels of service to our customers.
Comments
We are ready to launch on Dec 1st with 6k spent on radio advertising. I really hope this is a rare problem, as I do not have the time to organise my site to be made again elsewhere!
I'm afraid this is not the first time this has happened. Tsk!
Hi Dan,
Despite the outage today and yesterday, our network provider is generally quite reliable. They were performing a routine upgrade yesterday and omitted a configuration. Once updated, the service was returned to normal.
With regard to the issue today, our priority is to get the problem fixed first. Then we'll work with them to find out what happened, whether it was related to work they did yesterday, and what they'll be doing to ensure this does not happen again.
Please be assured we will be doing everything in our power to ensure this does not happen again.
We will provide an update once we have it.
- Walt
right in the middle of a save booo hoo
I was in the middle of building a site when moonfruit went down. Oh well! It's time to have a coffee break. I hope moonfruit can resolve this quickly.
I love the site maker for retards. I too am retarded and even I have built a professional website. Hopefully we are talking hours of downtime and not days!
Wait? That's all well and good and I know you can't wave a magic wand but it is damned annoying as my site prides itself on its football results service - and none of my customers can use it tonight. 10pm outage - just at the height of my midweek usage. Grrrrrrr
Sites are back. I have about 8 and all are working fine!
Back on line.
iFans, I understand your frustration; we do not enjoy these outages either. We take our service delivery very seriously and we do our damned best to provide a good service. Sadly we are at times thwarted. Our current provider is a very large global provider and normally provides us with a faultless service. But we will be seeking undertakings from them that they will investigate these recent incidents and provide us with details of how they will minimise the risk of recurrence. I can only offer our apologies for any inconvenience caused.
If any customers continue to encounter problems getting their site to come up, please do try clearing your cache just in case your browser has cached errored pages.
Thanks for your patience.
-Walt