Category - Service status

Fri, 26 Feb 2010


Eirik Pettersen

Fri, 26 Feb 2010, 10:38


RESOLVED: Service Outage 10:39 GMT 26 Feb 2010

DETAIL: A routine disk resize has hung and we need to restart the filers.

RESPONSE: We are currently rebooting the necessary servers. Service should be restored shortly.

UPDATE 10:48: Service has been restored. Sorry for any inconvenience.

UPDATE 10:52: Another reboot has been required.

UPDATE 11:00: Service has been restored.

Tue, 5 Jan 2010

Eirik Pettersen

Tue, 5 Jan 2010, 21:44



RESOLVED: Service Outage 20:57 GMT 05 Jan 2010

DETAIL: Our database server has required an emergency restart.

RESPONSE: We are currently restarting all dependent services.

UPDATED 21:47 GMT: Services have been restored.

UPDATED 10:17 GMT 06 Jan 2010: Our apologies for this very unexpected outage. All website data is safe and all services were returned to normal in 50 minutes.

We were due to upgrade the software licences on our database layer, which should have happened automatically. Unfortunately, a miscommunication with our database vendor caused our database layer to shut down. After emergency discussions with our vendor, the issue was resolved.

This is a unique event and will not happen again. We can only apologise for the inconvenience caused by this downtime.

Fri, 27 Nov 2009

Eirik Pettersen

Fri, 27 Nov 2009, 14:35



RESOLVED: Service Outage 14:31 GMT 27 Nov 2009

DETAIL: We are currently experiencing a problem accessing our filers.

RESPONSE: We will need to reset the service; this should not take much time.

UPDATED 14:47: The restart was successful.

Tue, 24 Nov 2009

Walt

Tue, 24 Nov 2009, 22:14



RESOLVED: Service Outage 24 November 22:09 GMT

DETAIL: We are currently experiencing another problem with our internet connectivity.

RESPONSE: We are working with our provider to restore services as soon as possible.

UPDATED 22:24: Our network provider has confirmed the problem and is working to restore service.

UPDATED 22:52: Services appear to have been restored.

UPDATED 11:03 30 Nov: Incident report from Telstra added.

FOLLOW UP: We have received the following incident report from our upstream provider:

UPDATE TO TELSTRA INCIDENT REPORT FOR OUTAGES ON 23 AND 24 NOV 2009

Having identified the root cause of the previous day’s incident, engineers planned an activity to configure the x.x.0.0/16 summary route onto the appropriate Juniper core routers to allow a future decommissioning of the legacy routers. The configuration of this summary route on the Juniper core routers resulted in Internet services for certain customers being affected again.

The engineers were unable to quickly isolate the cause of this issue and so reversed the change in order to restore service. However, once the change had been reversed, service was not restored for all customers as it should have been. The engineers identified a spurious route being received from the legacy routers which appeared to be causing the problem. The engineers reset the BGP sessions to the legacy routers, which removed the spurious route and restored service to affected customers.

The engineers later identified a Juniper OS bug that had caused the reversal to be unsuccessful. Telstra has already been testing a later version of the Juniper OS in their labs which is intended for network wide deployment. Juniper has confirmed that this specific bug is resolved in the release in test but Telstra will also include this bug in their test planning prior to deployment in production networks.

REMEDIAL ACTION

  1. An urgent cross-functional review of the current MPLAN process has been scheduled (including a detailed analysis of our planning and handling of this incident).
  2. All MPLANs now have an extended Director-level approval policy whilst we review the current planned works process.
  3. All MPLANs will be checked after completion to ensure that the works have been carried out in accordance with the plan.

Telstra would like to take this opportunity to sincerely apologise for the disruption and inconvenience that these incidents have caused. Please be assured that the immediate actions stated above have been given the highest priorities within Telstra to be implemented as quickly as possible. This is in order to avoid further incidents and to provide the highest levels of service to our customers.

Mon, 23 Nov 2009

Walt

Mon, 23 Nov 2009, 13:10



RESOLVED: Service Outage 23 November 13:02 GMT

DETAIL: We are currently experiencing a problem with our internet connectivity.

RESPONSE: We are working with our provider to restore services as soon as possible.

UPDATE: Full service was restored at 13:25 GMT. We are now in contact with our provider to find out the cause of the outage.

Wed, 28 Oct 2009

Walt

Wed, 28 Oct 2009, 17:53




Scheduled Mail Service Maintenance 00:00 - 06:00 GMT 28, 29 and 30 Oct 2009

REASON: Network distribution maintenance

PLANNED DURATION: 3 days (limited to a few hours each morning)

NOTES: Our domain partner Gandi will be performing major upgrades to their network distribution infrastructure during the nights of 28 to 30 October. During this time email services will experience a number of intermittent disruptions in connectivity beginning at 12:00am (GMT) with maintenance activities finishing by 6:00am (GMT) each day. We will endeavour to minimise impact to services where possible, but given the scale of the upgrades being performed, most services will encounter at least some disruption. No mail will be lost but access may be restricted and delivery of mail delayed.

We apologise in advance for any inconvenience this work may cause.

Thu, 15 Oct 2009

Walt

Thu, 15 Oct 2009, 15:39



RESOLVED: Service Disruption 15:30 - 16:20 BST (GMT +1) 15 Oct 2009

DETAIL: We are currently experiencing some unexpected problems with our service delivery.

RESPONSE: We are investigating this and expect to have this resolved promptly.

DURATION: We will provide an update on the cause and resolution once completed.


UPDATE: 16:20 BST (GMT +1) 15 Oct 2009

We were performing a routine maintenance operation to expand disk space. This normally requires no downtime; however, the operation hung, causing a cascading problem that required us to restart the entire service.

Because the restart was unscheduled, we took great care to minimise risk, which meant that restarting the entire service took time.


Tue, 22 Sep 2009

Walt

Tue, 22 Sep 2009, 11:20



POSTPONED: Scheduled Downtime 06:00 - 08:00 BST (GMT +1) 25th Sep 2009

REASON: Data cleansing and archiving

PLANNED DURATION: 2 hours

NOTES: To improve the efficiency of our database, we plan to archive legacy data. This operation needs to be done off-line; that is, all websites must be taken off-line for this period. Customers will be shown a page advising them that their site is unavailable and will be back up shortly.

We apologise for any inconvenience.

Fri, 21 Aug 2009

Walt

Fri, 21 Aug 2009, 07:32




Scheduled Downtime planned 07:30 BST (GMT +1) 21, 24 & 26 August 2009


We will be performing routine database maintenance this morning, which will take approximately 1 hour to complete. We will also be performing this action next week on Monday (24th) and Wednesday (26th), in similar one-hour blocks, so that we can minimise disruption.

Thank you for your patience and apologies for any inconvenience caused.

REASON: Database Maintenance

PLANNED DURATION: Approximately 60 minutes each time.

NOTES: This is routine maintenance on the database which should result in improved overall performance of sites. We opted for these changes to be split into three different time slots during our periods of low activity in order to reduce the impact on customers.

NB. For the most up-to-date details on SiteMaker status, please visit our SiteMaker Service Status site and subscribe to the RSS feed for this service.


UPDATE: 8:39 BST (GMT +1) 21 August 2009

Maintenance completed successfully and service fully restored.

UPDATE: 8:32 BST (GMT +1) 24 August 2009

Maintenance completed successfully and service fully restored.

UPDATE: 8:15 BST (GMT +1) 26 August 2009

Maintenance completed successfully and service fully restored.


Thu, 2 Jul 2009

Walt

Thu, 2 Jul 2009, 09:47



RESOLVED: Service Disruption 09:11 BST (GMT +1) 02 Jul 2009

DETAIL: We are experiencing a distributed denial of service (DDoS) attack on our services, which is currently overloading our firewalls.

RESPONSE: Our technical team are working hard to get this attack under control and return full service to all customers. We hope to have this completely resolved shortly.

UPDATE: We've managed to block what appears to have been three different types of DDoS attack. Combating the three attacks took longer than expected. We appear to have had them under control as of 11:35 but continued to experience minor issues until 12:32 BST (GMT +1).
