Service status

Build your own website at Moonfruit.com


Friday, February 26 2010

RESOLVED: Service Outage 10:39 GMT 26 Feb 2010

By Eirik Pettersen on Friday, February 26 2010, 10:38

DETAIL: A routine disk resize operation has hung and we need to restart the filers.

RESPONSE: We are currently rebooting the necessary servers. Service should be restored shortly.

UPDATE 10:48: Service has been restored. Sorry for any inconvenience.

UPDATE 10:52: A further reboot was required.

UPDATE 11:00: Service has been restored.

Tuesday, January 5 2010

RESOLVED: Service Outage 20:57 GMT 05 Jan 2010

By Eirik Pettersen on Tuesday, January 5 2010, 21:44

DETAIL: Our database server required an emergency restart.

RESPONSE: We are currently restarting all dependent services.

UPDATED 21:47 GMT: Services have been restored.

UPDATED 10:17 GMT 06 Jan 2010: Our apologies for this very unexpected outage. All website data is safe and all services were returned to normal in 50 minutes.

We were due to upgrade the software licences on our database layer, which should have happened automatically. Unfortunately, a miscommunication with our database vendor caused our database layer to shut down. After emergency discussions with our vendor, the issue was resolved.

This is a unique event and will not happen again. We can only apologise for the inconvenience caused by this downtime.


Friday, November 27 2009

RESOLVED: Service Outage 14:31 GMT 27 Nov 2009

By Eirik Pettersen on Friday, November 27 2009, 14:35

DETAIL: We are currently experiencing a problem accessing our filers.

RESPONSE: We will need to reset the service; this should not take long.

UPDATED 14:47: The restart was successful.


Tuesday, November 24 2009

RESOLVED: Service Outage 24 November 22:09 GMT

By Walt on Tuesday, November 24 2009, 22:14

DETAIL: We are currently experiencing another problem with our internet connectivity.

RESPONSE: We are working with our provider to restore services as soon as possible.

UPDATED 22:24: Our network provider has confirmed the problem and is working to restore service.

UPDATED 22:52: Services appear to have been restored.

UPDATED 11:03 30 Nov: Incident report from Telstra added.

FOLLOW UP: We have received the following incident report from our upstream provider:

UPDATE TO TELSTRA INCIDENT REPORT FOR OUTAGES ON 23 AND 24 NOV 2009

Having identified the root cause of the previous day’s incident, engineers planned an activity to configure the x.x.0.0/16 summary route onto the appropriate Juniper core routers to allow the future decommissioning of the legacy routers. Configuring this summary route on the Juniper core routers caused Internet services for certain customers to be affected again.

The engineers were unable to isolate the cause of this issue quickly and so reversed the change to restore service. However, once the change had been reversed, service was not restored for all customers as it should have been. The engineers identified a spurious route being received from the legacy routers which appeared to be causing the problem. They reset the BGP sessions to the legacy routers, which removed the spurious route and restored service to affected customers.

The engineers later identified a Juniper OS bug that had caused the reversal to be unsuccessful. Telstra had already been testing a later version of the Juniper OS in its labs, intended for network-wide deployment. Juniper has confirmed that this specific bug is resolved in the release under test, but Telstra will also include this bug in its test planning prior to deployment in production networks.

REMEDIAL ACTION

  1. An urgent cross-functional review of the current MPLAN process has been scheduled (including a detailed analysis of our planning and handling of this incident).
  2. All MPLANs now have an extended Director-level approval policy while we review the current planned works process.
  3. All MPLANs will be checked after completion to ensure that the works have been carried out in accordance with the plan.

Telstra would like to take this opportunity to sincerely apologise for the disruption and inconvenience that these incidents have caused. Please be assured that the immediate actions stated above have been given the highest priorities within Telstra to be implemented as quickly as possible. This is in order to avoid further incidents and to provide the highest levels of service to our customers.
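The Telstra report does not say exactly why the spurious route from the legacy routers disrupted traffic. One common mechanism, offered here only as an assumption, is longest-prefix match: a router forwards each packet using the most specific route covering the destination, so a stray more-specific route takes precedence over a broader summary such as the /16 described above. The minimal Python sketch below illustrates this with hypothetical prefixes (10.20.0.0/16 stands in for the elided x.x.0.0/16) and invented next-hop labels.

```python
import ipaddress

# Hypothetical routing table mapping prefixes to next-hop labels.
# 10.20.0.0/16 stands in for the elided "x.x.0.0/16" summary route and
# 10.20.5.0/24 for a stray, more-specific route learned from legacy routers.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "default",
    ipaddress.ip_network("10.20.0.0/16"): "core-summary",
    ipaddress.ip_network("10.20.5.0/24"): "legacy-spurious",
}


def lookup(destination: str) -> str:
    """Return the next hop of the longest (most specific) matching prefix."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]


print(lookup("10.20.1.1"))  # core-summary
print(lookup("10.20.5.9"))  # legacy-spurious: the /24 beats the /16
print(lookup("192.0.2.1"))  # default

# Withdrawing the spurious route lets that range fall back to the summary.
del ROUTES[ipaddress.ip_network("10.20.5.0/24")]
print(lookup("10.20.5.9"))  # core-summary
```

Removing the more-specific entry, as the BGP session reset did for the spurious route, lets traffic fall back to the summary. Real routers also weigh BGP attributes when choosing between routes for the same prefix; this sketch models prefix length only.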


Monday, November 23 2009

RESOLVED: Service Outage 23 November 13:02 GMT

By Walt on Monday, November 23 2009, 13:10

DETAIL: We are currently experiencing a problem with our internet connectivity.

RESPONSE: We are working with our provider to restore services as soon as possible.

UPDATE: Full service was restored at 13:25 GMT. We are now in contact with our provider to find out the cause of the outage.


Wednesday, October 28 2009

Scheduled Mail Service Maintenance 00:00 – 06:00 GMT 28, 29 and 30 Oct 2009

By Walt on Wednesday, October 28 2009, 17:53


REASON: Network distribution maintenance

PLANNED DURATION: 3 days (limited to a few hours each morning)

NOTES: Our domain partner Gandi will be performing major upgrades to their network distribution infrastructure during the nights of 28 to 30 October. During this time email services will experience a number of intermittent disruptions in connectivity beginning at 12:00am (GMT) with maintenance activities finishing by 6:00am (GMT) each day. We will endeavour to minimise impact to services where possible, but given the scale of the upgrades being performed, most services will encounter at least some disruption. No mail will be lost but access may be restricted and delivery of mail delayed.

We apologise in advance for any inconvenience this work may cause.


Thursday, October 15 2009

RESOLVED: Service Disruption 15:30 – 16:20 BST (GMT +1) 15 Oct 2009

By Walt on Thursday, October 15 2009, 15:39

DETAIL: We are currently experiencing some unexpected problems with our service delivery.

RESPONSE: We are investigating this and expect to have this resolved promptly.

DURATION: We will provide an update on the cause and resolution once completed.


UPDATE: 16:20 BST (GMT +1) 15 Oct 2009

We were performing a routine maintenance operation to expand disk space. This normally requires no downtime; however, the operation hung, causing a cascading problem that required us to restart the entire service.

Because the restart was unscheduled, we took great care to minimise risk, which is why restarting the entire service took some time.



Tuesday, September 22 2009

POSTPONED: Scheduled Downtime 06:00 – 08:00 BST (GMT +1) 25th Sep 2009

By Walt on Tuesday, September 22 2009, 11:20

REASON: Data cleansing and archiving

PLANNED DURATION: 2 hours

NOTES: To improve the efficiency of our database, we plan to archive legacy data. This operation must be done offline, which means all websites must be taken offline for this period. Customers will be shown a page advising them that their site is unavailable and will be back up shortly.

We apologise for any inconvenience.
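The entry does not say how the holding page is served. A common approach, sketched here as an assumption rather than SiteMaker's actual setup, is to answer every request with HTTP 503 Service Unavailable plus a Retry-After header, so that browsers and search engines treat the outage as temporary. This minimal WSGI app uses only the Python standard library; the page text, the port and the serve() helper are hypothetical, and the 7200-second Retry-After value is simply the 2-hour planned duration.

```python
from wsgiref.simple_server import make_server

PLANNED_DOWNTIME_SECONDS = 2 * 60 * 60  # "PLANNED DURATION: 2 hours"

HOLDING_PAGE = (
    "<!doctype html><title>Back soon</title>"
    "<h1>This site is temporarily unavailable</h1>"
    "<p>Scheduled maintenance is in progress. Please check back shortly.</p>"
).encode("utf-8")


def maintenance_app(environ, start_response):
    """WSGI app that answers every request with the holding page and a 503."""
    start_response(
        "503 Service Unavailable",
        [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(HOLDING_PAGE))),
            # Hint to browsers and crawlers: retry after this many seconds.
            ("Retry-After", str(PLANNED_DOWNTIME_SECONDS)),
        ],
    )
    return [HOLDING_PAGE]


def serve(port: int = 8000) -> None:
    """Serve the holding page on a hypothetical port until interrupted."""
    with make_server("", port, maintenance_app) as server:
        server.serve_forever()
```

Calling serve() during the maintenance window would run it; in practice the same behaviour is often configured in a front-end web server or load balancer instead.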


Friday, August 21 2009

Scheduled Downtime planned 07:30 BST (GMT +1) 21, 24 & 26 August 2009

By Walt on Friday, August 21 2009, 07:32


Scheduled Downtime

We will be performing routine database maintenance this morning, which will take approximately 1 hour to complete. We will also perform this maintenance next week, on Monday (24th) and Wednesday (26th), in similar one-hour blocks to minimise disruption.

Thank you for your patience and apologies for any inconvenience caused.

REASON: Database Maintenance

PLANNED DURATION: Approximately 60 minutes each time.

NOTES: This is routine maintenance on the database which should result in improved overall performance of sites. We opted for these changes to be split into three different time slots during our periods of low activity in order to reduce the impact on customers.

NB. For the most up-to-date details on SiteMaker status, please visit our SiteMaker Service Status site and subscribe to the RSS feed for this service.


UPDATE: 8:39 BST (GMT +1) 21 August 2009

Maintenance completed successfully and service fully restored.

UPDATE: 8:32 BST (GMT +1) 24 August 2009

Maintenance completed successfully and service fully restored.

UPDATE: 8:15 BST (GMT +1) 26 August 2009

Maintenance completed successfully and service fully restored.



Thursday, July 2 2009

RESOLVED: Service Disruption 09:11 BST (GMT +1) 02 Jul 2009

By Walt on Thursday, July 2 2009, 09:47

DETAIL: We are experiencing a distributed denial of service (DDoS) attack on the services, which is currently overloading our firewalls.

RESPONSE: Our technical team are working hard to get this attack under control and return full service to all customers. We hope to have this completely resolved shortly.

UPDATE: We have managed to block what appear to have been three different types of DDoS attack. Combating the three attacks took longer than expected; we appear to have had them under control as of 11:35, although minor issues continued until 12:32 BST (GMT +1).


Thursday, June 25 2009

RESOLVED: Service Disruption 06:10 BST (GMT +1) 25 Jun 2009

By Walt on Thursday, June 25 2009, 09:59

DETAIL: We are experiencing a large DDoS attack on one of our hosted sites, which is currently overloading our firewalls.

RESPONSE: Our technical team are working to resolve this issue as quickly as possible and should have it resolved shortly.

UPDATE: An attack on one of our sites is flooding our network: 75% of all incoming traffic is coming from over 2,300 IP addresses, which is congesting all traffic and slowing page load times. We are still investigating how to block this traffic dynamically.
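The post mentions looking for a way to block attack traffic dynamically. One simple starting point, sketched here under stated assumptions (source IPs are available per sampling window, e.g. from firewall logs; the function names, threshold and sample addresses are hypothetical), is to count requests per source IP and flag the heaviest senders, then measure how much of the window's traffic they account for.

```python
from collections import Counter


def find_heavy_sources(window, threshold):
    """Return source IPs that sent more than `threshold` requests in one window.

    `window` is a list of source-IP strings observed during a single
    sampling period (for example, one minute of firewall logs).
    """
    counts = Counter(window)
    return {ip for ip, n in counts.items() if n > threshold}


def traffic_share(window, sources):
    """Fraction of the window's requests that came from `sources`."""
    if not window:
        return 0.0
    matched = sum(1 for ip in window if ip in sources)
    return matched / len(window)


# Hypothetical sample: two noisy sources and two ordinary visitors.
window = (["198.51.100.7"] * 50 + ["203.0.113.9"] * 40
          + ["192.0.2.1"] * 3 + ["192.0.2.2"] * 2)
blocked = find_heavy_sources(window, threshold=10)
print(sorted(blocked))                           # ['198.51.100.7', '203.0.113.9']
print(round(traffic_share(window, blocked), 3))  # 0.947
```

With traffic spread across more than 2,300 addresses, many sources may each stay under any single threshold, so a per-source count like this would be only one signal; the actual blocking would happen at the firewall or upstream.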


Friday, May 15 2009

Scheduled Downtime planned 09:30 BST (GMT +1) 18 May 2009

By Walt on Friday, May 15 2009, 11:31

Scheduled Downtime

We will be upgrading the database software to the latest version and performing a number of data updates in preparation for the release of the next version of SiteMaker.

Thank you for your patience and apologies for any inconvenience caused.

REASON: Database upgrade and software updates

PLANNED DURATION: Up to 60 minutes.

NOTES: Full details on the release and the new features will be provided following the upgrade.

NB. For the most up-to-date details on SiteMaker status, please visit our SiteMaker Service Status site and subscribe to the RSS feed for this service.


Tuesday, April 7 2009

RESOLVED: Service Disruption 16:40 BST (GMT +1) 07 Apr 2009

By Walt on Tuesday, April 7 2009, 23:48

RESOLVED: Service Disruption – Please see entries below for more details on the disruptions and on the essential maintenance being carried out…



Wednesday, November 26 2008

Incident Report re: Service Outage 25th November 12:52-13:05 and 15:32-17:12 GMT

By Walt on Wednesday, November 26 2008, 10:40

What Happened

Yesterday we suffered two incidents: one starting at 12:52 GMT and lasting 13 minutes, which resulted in very slow response times, and another starting at 15:32 GMT and lasting 1 hour 40 minutes, which included some periods of complete loss of service.

Service was restored to normal at 17:12 GMT and all sites were up and running with normal response times.



Friday, October 31 2008

Incident Report re: Service Outage 29th October 16:42 – 17:37 GMT

By Walt on Friday, October 31 2008, 18:35

At 16:42 on 29th October we were alerted that our main file server appeared to be offline. This meant that customers’ uploaded files were unavailable. Once we had confirmed that the file server was not accessible, we immediately put our recovery plan into action and brought the standby file server online. We then attached the network storage to it, restoring access to customers’ uploaded files. Services were brought back up by 17:32.

On further investigation, we found that the server had detected an inconsistent state and, as a result, halted itself as a safety measure. This is an extremely rare occurrence and the first time we have encountered this behaviour since SiteMaker began. However, it is exactly why we have redundant hardware, which enabled us to fail over quickly to our standby system and recover services rapidly.

As a precaution, we have updated the operating systems of the file servers and, although we have no reason to suspect any damage, we are running full diagnostics on the hardware.


Incident Report re: Service Outage 23rd October 19:10 – 21:30 BST

By Walt on Friday, October 31 2008, 18:18

We host our servers at a secure data centre managed by Telstra, a reputable colocation provider. As part of their colocation service, they provide dual uninterruptible power supplies (UPS). Each server on site therefore has two independent power feeds, so that if either supply fails, the servers continue running without interruption.

At 19:10 (18:10 GMT) on 23rd October, Telstra engineers started planned maintenance work on one of their UPS (uninterruptible power supply) units. During this work, the servers were to be moved temporarily onto mains power. This should have happened transparently, but the additional load on the mains supply tripped a circuit breaker. At approximately 19:47 (18:47 GMT) the breaker was reset and power was restored; the servers were then brought back online one by one. The servers have since been returned to redundant UPS power and will continue to be fully protected in future.

A full report on the outage and the remedial actions from Telstra is attached below.

In the hours that followed, the file server detected problems with some of its filesystems. As each problem was detected, the affected filesystem was remounted in ‘read-only’ mode to protect data integrity. This ultimately left 50% of the filesystems in a read-only state, meaning that for half of our sites, file uploads and other ‘write’ activities (such as form submissions) failed. This could not be rectified until the following morning.

To perform a thorough filesystem consistency check, each affected filesystem had to be taken offline. We therefore took the affected filesystems offline one at a time to minimise disruption. This meant that affected customers may have experienced ‘Internal Server Errors’ for a 20-minute period while their filesystem was checked.

We have taken multiple measures to ensure that our services remain available in the event of power failures. The failure of Telstra to guarantee a permanent supply was an unforeseen and extraordinary event. Again, we apologise for any inconvenience you may have experienced.



Wednesday, October 29 2008

RESOLVED: Service Outage: 29th October 16:42 – 17:37 GMT

By Walt on Wednesday, October 29 2008, 17:17

Service Outage: 29th October 16:42 GMT

We have had a very unexpected failure of the disk array. We are currently trying to bring up the standby system to restore services to all customers.

We will notify you immediately of any change to this situation, and of an expected time for resolution as soon as we have one.

We appreciate this is poor timing after our last outage, but this failure is completely unrelated and is being worked on by our own technical team, who are doing everything in their power to bring up the standby service as quickly as possible.

Thanks

================

UPDATE: 17:37 GMT

All services have been restored to normal operation. We will investigate the cause of this incident and, once we have identified it and received an explanation from our data centre of the cause of last week’s downtime, we will inform all customers. We will also tell you what steps we are taking to prevent these situations from recurring.

If you have any difficulty viewing your site, we recommend clearing your browser cache (temporary Internet files) and reloading the page. This should fix problems with pages loading and file uploads.

Thanks again for your patience.


Thursday, October 23 2008

RESOLVED: Service Outage: 23rd October 19:10 – 21:30 BST (GMT +1)

By Joe on Thursday, October 23 2008, 21:43

Service Outage: 19:10 – 21:30 BST

A power failure at our data centre resulted in a complete loss of service at 19:10. This caused all of our servers to shut down unexpectedly and then restart. We then had to run full diagnostic checks to make sure they were running correctly before bringing the service back online at 21:30 BST.

The service has now been fully restored, with no loss of data or other negative side effects. Latency may be a little bit higher than normal over the next few hours while caching is restored on the live system.

We apologise for this unexpected outage, and will be following up with our network provider to discover the cause and why the secondary power supplies were not brought into use.

Thank you for your patience during this outage.

==============

UPDATE: 10:12 BST (GMT +1)

Because the power failure caused our machines to shut down unexpectedly, our disk array, which contains uploaded files, restarted in ‘read-only’ mode as a precaution. This means we have to run full disk integrity checks before we can return to ‘write’ mode. Only 50% of sites have been affected by this, and we will repair them 10% at a time.

In any event, all content on these disks is backed up both on and off site, so in the event of any disk errors we can fully restore the files.

We will let you know when this is complete.
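Repairing affected sites 10% at a time, as described in this update, is a rolling schedule: only one batch is offline at any moment while the rest keep serving. A minimal Python sketch of that batching, assuming the list of affected filesystems is known in advance; the filesystem names and the check_batch callback are hypothetical stand-ins for the real offline-check-remount step.

```python
def batches_by_percent(items, percent):
    """Split `items` into consecutive batches of about `percent`% of the total.

    Integer ceiling division avoids floating-point rounding in the batch size.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    size = max(1, (len(items) * percent + 99) // 100)
    return [items[i:i + size] for i in range(0, len(items), size)]


def rolling_check(filesystems, percent, check_batch):
    """Check filesystems batch by batch so only one batch is offline at once.

    `check_batch` is a hypothetical callable that takes a batch offline,
    runs the integrity check and remounts it read-write.
    """
    batches = batches_by_percent(filesystems, percent)
    for batch in batches:
        check_batch(batch)
    return len(batches)


affected = [f"fs{n:02d}" for n in range(20)]  # hypothetical filesystem names
order = []
print(rolling_check(affected, 10, order.append))  # 10
print(order[0])                                   # ['fs00', 'fs01']
```

Smaller batches shorten each site's outage window but lengthen the total repair time; 10% is the trade-off the update describes.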

==============

UPDATE: 14:00 BST (GMT +1)

The service is now fully restored.

This means that all websites should now:

  • Be fully accessible
  • No longer display a “500 Internal Server Error” message
  • Permit files to be uploaded normally
  • Enable forms to be submitted without error
  • Allow pages to be saved correctly

Now that the service has been fully restored we will begin our investigation into the causes of the outage and how we can ensure that this is not repeated. We will ensure that you are kept informed of our findings as well as the solutions.

If you are still encountering problems related to this outage (see above) you may wish to:

  1. Flush or clear your cache/temporary Internet files within your web browser. This will ensure any error pages saved in your browser will be cleared and your website reloaded from fresh.
  2. Contact support using the query form and give us as much detail as possible about the error, e.g. any error messages, when it occurs and which pages it occurs on.

Thanks again for your patience, understanding and support with this issue. It is very much appreciated.

==============


Tuesday, October 14 2008

Mail maintenance on October 15th, 2008 at 1:00 AM BST (Midnight GMT)

By Walt on Tuesday, October 14 2008, 16:37

Our domain name partner will be running a major maintenance operation on the Mail platform. This operation will lead to greater stability of the service in the event of an excessive load.

The service will be unavailable during the maintenance period, which will be on Wednesday (October 15th, 2008) between 8:00 PM BST (19:00 GMT) and 1:00 AM BST (00:00 GMT). Incoming mail will still be received, but you will not be able to access it until the maintenance work is complete. POP, IMAP and Webmail will all be unavailable.
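Times on this page are given in BST (GMT +1). The conversion for this maintenance window can be checked with a short Python sketch, assuming the window ends at 1:00 AM on the morning of Thursday 16 October (the entry does not state the end date explicitly). It confirms that 1:00 AM BST is 00:00 GMT, the midnight given in the entry title, and that the window lasts five hours. The bst_to_gmt helper is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets as used on this page: BST is GMT +1. This is only valid while
# British Summer Time is in force; it does not apply daylight-saving rules.
BST = timezone(timedelta(hours=1), "BST")
GMT = timezone.utc


def bst_to_gmt(year, month, day, hour, minute=0):
    """Convert a BST wall-clock time to the equivalent GMT time."""
    return datetime(year, month, day, hour, minute, tzinfo=BST).astimezone(GMT)


start = bst_to_gmt(2008, 10, 15, 20)  # 8:00 PM BST, Wednesday 15 October
end = bst_to_gmt(2008, 10, 16, 1)     # 1:00 AM BST, assumed to be the next morning
print(start.strftime("%d %b %H:%M"))  # 15 Oct 19:00
print(end.strftime("%d %b %H:%M"))    # 16 Oct 00:00
print(end - start)                    # 5:00:00
```

For dates outside summer time, Python's zoneinfo module with the Europe/London zone would pick the correct offset automatically instead of a fixed +1 hour.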


Tuesday, October 7 2008

Database Update 09/10/2008 – 10:30 BST (GMT +1) – Rescheduled to 13/10/08 – 15:00 BST (GMT +1)

By Josh on Tuesday, October 7 2008, 16:04

A number of database changes need to be run to enable new features in SiteMaker. Unfortunately, this requires a service outage of no more than 15 minutes on Thursday morning, 9 October 2008, at 10:30 BST (GMT +1).

During this time, visitors to SiteMaker sites will be presented with a holding page informing them that the site should be back soon.

We are sorry for any inconvenience this may cause.

Regards,

Hiren Joshi

Systems Manager

Update:

This has now been rescheduled to 13/10/08 15:00 BST (GMT +1). Thank you for your understanding.
