Category - Service status

Thu, 25 Jun 2009


Walt

Thu, 25 Jun 2009, 09:59



RESOLVED: Service Disruption 06:10 BST (GMT +1) 25 Jun 2009

DETAIL: We are experiencing a large DDoS attack on one of our hosted sites, which is currently overloading our firewalls.

RESPONSE: Our technical team are working to resolve this issue as quickly as possible and should have it resolved shortly.

UPDATE: One of our sites is being flooded by an attack, with 75% of all incoming traffic coming from over 2,300 IPs. This is congesting all traffic and slowing page load times. We are still investigating how to block this traffic dynamically.
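
One common way to approach dynamic blocking of this kind is to count requests per source IP over a short window and flag the addresses that exceed a threshold. The Python sketch below is illustrative only: the function name, the threshold and the sample addresses are assumptions, and it does not describe the firewall rules actually in use.

```python
from collections import Counter

def find_heavy_hitters(source_ips, max_requests):
    """Return the set of IPs that sent more than max_requests requests."""
    counts = Counter(source_ips)
    return {ip for ip, n in counts.items() if n > max_requests}

# Hypothetical sample window using documentation-range addresses.
window = ["192.0.2.1"] * 50 + ["192.0.2.2"] * 3 + ["198.51.100.7"] * 120

# Hypothetical threshold: more than 40 requests in the window is suspicious.
blocked = find_heavy_hitters(window, max_requests=40)
print(sorted(blocked))
```

With 2,300 or more sources, a list like this would normally be fed to the firewall automatically rather than maintained by hand.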

Fri, 15 May 2009

Walt

Fri, 15 May 2009, 11:31



Scheduled Downtime planned 09:30 BST (GMT +1) 18 May 2009

Scheduled Downtime

We will be upgrading the database software to the latest version and performing a number of data updates in preparation for the release of the next version of SiteMaker.

Thank you for your patience and apologies for any inconvenience caused.

REASON: Database upgrade and software updates

PLANNED DURATION: Up to 60 minutes.

NOTES: Full details on the release and the new features will be provided following the upgrade.

NB. For the most up-to-date details on SiteMaker status, please visit our SiteMaker Service Status site and subscribe to the RSS feed for this service.

Tue, 7 Apr 2009

Walt

Tue, 7 Apr 2009, 23:48



RESOLVED: Service Disruption 16:40 BST (GMT +1) 07 Apr 2009

RESOLVED: Service Disruption - Please see entries below for more details on the disruptions and on essential maintenance being carried out...


Wed, 26 Nov 2008

Walt

Wed, 26 Nov 2008, 10:40



Incident Report re: Service Outage 25th November 12:52-13:05 and 15:32-17:12 GMT

What Happened

Yesterday we suffered two incidents, one starting at 12:52 GMT lasting for 13 minutes resulting in very slow response times and another one starting at 15:32 GMT lasting for 1hr 40 minutes, which included some periods of full loss of service.

Service was restored to normal at 17:12 GMT and all sites were up and running with normal response times.


Fri, 31 Oct 2008

Walt

Fri, 31 Oct 2008, 18:35



Incident Report re: Service Outage 29th October 16:42 - 17:37 GMT

At 16:42 on the 29th October we were alerted to the fact our main file server appeared to be offline. This meant that customers' uploaded files were unavailable. Once it was confirmed that the file server was not accessible, we immediately instigated our recovery plan and brought the standby file server online. We then attached the network storage to it, thus restoring access to customer uploaded files. Services were brought back up by 17:32.

On further investigation, we found that the server had detected that it had entered an inconsistent state and, as a result, halted itself as a safety measure. This is an extremely rare occurrence and is the first time we have encountered this behaviour since SiteMaker began. However, this is the reason we have redundant hardware, enabling us to fail over quickly to our standby system and recover services rapidly.

As a precaution, we have updated the operating systems of the file servers and, although we have no reason to suspect any damage, we are running full diagnostics on the hardware.

Walt

Fri, 31 Oct 2008, 18:18



Incident Report re: Service Outage 23rd October 19:10 - 21:30 BST

We host our servers at a secure data centre managed by Telstra, a reputable colocation provider. As part of their colocation service, they provide dual uninterruptible power supplies. Each server on site thus has two independent power feeds to ensure that in the event of either power supply failing, the servers will continue running without interruption.

At 19:10 (18:10 GMT) on 23rd October, Telstra engineers started planned maintenance work on one of their UPS (uninterruptible power supply) units. During this work, the servers were to be moved temporarily onto mains power. This should have happened transparently, but the additional load on the mains tripped a circuit breaker. At approximately 19:47 (18:47 GMT) the breaker was reset and power was restored; the servers were then brought back online one by one. The servers have subsequently been restored to redundant UPS and will continue to be fully protected in the future.

A full report on the outage and the remedial actions from Telstra is attached below.

In the hours that followed, the file server detected problems with some of the filesystems. As each problem was detected, the affected filesystem was remounted in 'read-only' mode to ensure data integrity. This ultimately resulted in 50% of the filesystems going into a read-only state, meaning that, for half of our sites, file uploads and other 'write' activities (such as form submissions) failed. This could not be rectified until the following morning.

In order to do a thorough filesystem consistency check, each affected filesystem needed to be taken offline. We therefore took the affected filesystems offline one at a time to perform these checks, in order to cause minimum disruption. This meant that affected customers may have experienced 'Internal Server Errors' for a 20-minute period while their filesystem was checked.
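
For readers who run their own servers, a read-only state like the one described above can be detected from a script on Unix-like systems. The following is a minimal Python sketch, not the tooling used during this incident; the helper names and the mount point passed in are assumptions.

```python
import os

def is_read_only(mount_point):
    """Return True if the filesystem containing mount_point is mounted read-only."""
    flags = os.statvfs(mount_point).f_flag
    return bool(flags & os.ST_RDONLY)

def read_only_mounts(mount_points):
    """Filter a list of mount points down to those currently read-only."""
    return [m for m in mount_points if is_read_only(m)]

# Hypothetical usage: "/" stands in for the real list of filesystems to watch.
print(read_only_mounts(["/"]))
```

Running a check like this periodically would flag write failures early, before they show up as failed uploads or form submissions.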

We have taken multiple measures to ensure that our services remain available in the event of power failures. The failure of Telstra to guarantee a permanent supply was an unforeseen and extraordinary event. Again, we apologise for any inconvenience you may have experienced.


Wed, 29 Oct 2008

Walt

Wed, 29 Oct 2008, 17:17



RESOLVED: Service Outage: 29th October 16:42 - 17:37 GMT

Service Outage: 29th October 16:42 GMT

We have had a very unexpected failure with the Disc Array. We are currently trying to bring up the Standby system to restore services to all customers.

You will be notified immediately of any changes to this condition and the expected ETA when we have one.

We appreciate this is not great timing after our last outage, but this is completely unrelated and is being worked on by our own technical team, who are doing everything in their power to bring up the standby service as quickly as possible.

Thanks

================

UPDATE: 17:37 GMT

All services have been restored to normal operation. We will investigate the cause of the incident. Once we have established it, and have an explanation from our Data Centre for last week's downtime, we will inform all customers. We will also notify you of the steps we are taking to avoid these situations occurring again.

If you encounter any difficulties with viewing your site we recommend clearing your cache (temporary Internet files) and reloading your browser. This will fix problems with pages loading and file uploads.

Thanks again for your patience.

Thu, 23 Oct 2008

Joe

Thu, 23 Oct 2008, 21:43



RESOLVED: Service Outage: 23rd October 19:10 - 21:30 BST (GMT +1)

Service Outage: 19:10 - 21:30 BST

A power failure at our data centre resulted in a complete loss of service at 19:10 BST. This caused all of our servers to shut down unexpectedly and then restart. We then had to run full diagnostic checks to make sure they were running correctly before bringing the service back online at 21:30 BST.

The service has now been fully restored, with no loss of data or other negative side effects. Latency may be a little bit higher than normal over the next few hours while caching is restored on the live system.

We apologise for this unexpected outage, and will be following up with our network provider to discover the cause, and why secondary power supplies were not called into use.

Thank you for your patience during this outage.

==============

UPDATE: 10:12 BST (GMT +1)

Because the power failure caused our machines to shut down unexpectedly, our disk array which contains uploaded files restarted in 'read only' mode as a precaution. This means we have to run full disk integrity checks before we can go back to 'write' mode. Only 50% of sites have been affected by this, and we will repair them 10% at a time.

In any event, all content on these disks is backed up both on and off site, so in the event of any errors on the disk, we can fully restore the files.

We will let you know when this is complete.

==============

UPDATE: 14:00 BST (GMT +1)

The service is now fully restored.

This means that all websites should now:

  • Be fully accessible
  • No longer display a "500 Internal Server Error" message
  • Permit files to be uploaded normally
  • Enable forms to be submitted without error
  • Allow pages to be saved correctly

Now that the service has been fully restored we will begin our investigation into the causes of the outage and how we can ensure that this is not repeated. We will ensure that you are kept informed of our findings as well as the solutions.

If you are still encountering problems related to this outage (see above) you may wish to:

  1. Flush or clear your cache/temporary Internet files within your web browser. This will ensure any error pages saved in your browser will be cleared and your website reloaded from fresh.
  2. Contact support using the query form and give us as much detail as possible about the error e.g. any error messages, when it occurs, what pages it occurs on etc.

Thanks again for your patience, understanding and support with this issue. It is very much appreciated.

==============

Tue, 14 Oct 2008

Walt

Tue, 14 Oct 2008, 16:37



Mail maintenance on October 15th, 2008 at 1:00 AM BST (Midnight GMT)

Our domain name partner will be running a major maintenance operation on the Mail platform. This operation will lead to greater stability of the service in the event of an excessive load.

The service will be unavailable during the maintenance period, which will be on Wednesday (October 15th, 2008) between 8:00 PM BST (19:00 GMT) and 1:00 AM BST (23:59 GMT). Incoming mail will be received, but you will not be able to access it until we have completed the maintenance work. POP, IMAP and Webmail will all be unavailable.
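
The BST and GMT times quoted above differ by a fixed one-hour offset. As a rough illustration, the Python sketch below converts the two clock times; the helper name is an assumption and the calendar date is only a placeholder, since the conversion depends on the clock time alone. Note that 1:00 AM BST falls exactly on midnight GMT, which the notice writes as 23:59.

```python
from datetime import datetime, timedelta, timezone

BST = timezone(timedelta(hours=1), "BST")  # British Summer Time, GMT +1
GMT = timezone.utc

def bst_to_gmt_clock(hour, minute=0):
    """Convert a BST clock time to the matching GMT clock time as HH:MM."""
    # Placeholder date: only the clock time matters for a fixed offset.
    local = datetime(2008, 10, 15, hour, minute, tzinfo=BST)
    return local.astimezone(GMT).strftime("%H:%M")

start_gmt = bst_to_gmt_clock(20)  # 8:00 PM BST
end_gmt = bst_to_gmt_clock(1)     # 1:00 AM BST
print(start_gmt, end_gmt)
```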

Tue, 7 Oct 2008

Josh

Tue, 7 Oct 2008, 16:04



Database Update 09/10/2008 - 10:30 BST (GMT +1) - Rescheduled to 13/10/08 - 15:00 BST (GMT +1)

A number of database changes need to be run to enable new features in SiteMaker. Unfortunately this means a service outage for no more than 15 minutes on Thursday morning, 9 October 2008 10:30 BST (GMT+1).

During this time, visitors to SiteMaker sites will be presented with a holding page informing them that the site should be back soon.

We are sorry for any inconvenience this may cause.

Regards,

Hiren Joshi

Systems Manager

Update:

This has now been rescheduled to 13/10/08 15:00 BST (GMT +1). Thank you for your understanding.
