<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet title="XSL formatting" type="text/xsl" href="http://www.moonfruitlounge.com/feed/rss2/xslt" ?><rss version="2.0"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:wfw="http://wellformedweb.org/CommentAPI/"
  xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
  <title>Moonfruit Lounge - Service status</title>
  <link>http://www.moonfruitlounge.com/</link>
  <description></description>
  <language>en</language>
  <pubDate>Wed, 14 May 2008 22:25:02 +0100</pubDate>
  <copyright></copyright>
  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
  <generator>Dotclear</generator>
  
    
  <item>
    <title>Notification of Planned System Downtime</title>
    <link>http://www.moonfruitlounge.com/post/2008/04/11/Notification-of-Planned-System-Downtime</link>
    <guid isPermaLink="false">urn:md5:0918f6ac9536f9f234544629d5d1dc65</guid>
    <pubDate>Fri, 11 Apr 2008 11:59:00 +0100</pubDate>
    <dc:creator>Walt</dc:creator>
        <category>Service status</category>
            
    <description>    &lt;p&gt;&lt;strong&gt;Time: Thursday, 17th April 2008 10 AM BST (GMT+1)&lt;/strong&gt;&lt;/p&gt;


&lt;p&gt;&lt;strong&gt;Duration: 1 hour&lt;/strong&gt;&lt;/p&gt;



&lt;p&gt;We will be performing an upgrade to our database layer. During this upgrade, sites will be unavailable for editing or viewing. Visitors will see a notice informing them the site is down for maintenance with the expected return time. Normal service should resume by 11 AM.&lt;/p&gt;



&lt;p&gt;We apologise for any inconvenience.&lt;/p&gt;</description>
    
    
    
          <comments>http://www.moonfruitlounge.com/post/2008/04/11/Notification-of-Planned-System-Downtime#comment-form</comments>
      <wfw:comment>http://www.moonfruitlounge.com/post/2008/04/11/Notification-of-Planned-System-Downtime#comment-form</wfw:comment>
      <wfw:commentRss>http://www.moonfruitlounge.com/feed/rss2/comments/710</wfw:commentRss>
      </item>
    
  <item>
    <title>Resolved: service outage: 08.10 GMT 25 Jan 2008</title>
    <link>http://www.moonfruitlounge.com/post/2008/01/25/Service-outage%3A-0810-GMT-25-Jan-2008</link>
    <guid isPermaLink="false">urn:md5:406063ad011e0338b68c3cd9e3d17b40</guid>
    <pubDate>Fri, 25 Jan 2008 08:32:00 +0000</pubDate>
    <dc:creator>Joe</dc:creator>
        <category>Service status</category>
            
    <description>    &lt;p&gt;All services should be back to normal from 10.15 GMT. We will provide further information on the cause of the outage once our network provider makes it available to us. Apologies for this service disruption.&lt;/p&gt;


&lt;hr /&gt;


&lt;p&gt;SiteMaker services are currently unavailable due to a network outage at our datacentre, which is affecting more than just SiteMaker. Our network provider's engineers are working to resolve the issue as soon as possible. We will provide updates as soon as we have them.&lt;/p&gt;</description>
    
    
    
          <comments>http://www.moonfruitlounge.com/post/2008/01/25/Service-outage%3A-0810-GMT-25-Jan-2008#comment-form</comments>
      <wfw:comment>http://www.moonfruitlounge.com/post/2008/01/25/Service-outage%3A-0810-GMT-25-Jan-2008#comment-form</wfw:comment>
      <wfw:commentRss>http://www.moonfruitlounge.com/feed/rss2/comments/591</wfw:commentRss>
      </item>
    
  <item>
    <title>Essential Maintenance at 9:27 Thursday 22 Nov.</title>
    <link>http://www.moonfruitlounge.com/post/2007/11/22/Essential-Maintenance-at-9%3A27-Friday-22-Nov</link>
    <guid isPermaLink="false">urn:md5:a80497683a70628cb29b13b83b7b2b74</guid>
    <pubDate>Thu, 22 Nov 2007 09:25:00 +0000</pubDate>
    <dc:creator>Walt</dc:creator>
        <category>Service status</category>
            
    <description>    &lt;p&gt;As already highlighted in our Message of the Day to all users (all site leaders will see this when logging in), our services have been taken down briefly for essential maintenance. This will be a brief period of downtime and should take no longer than 10-15 minutes.&lt;/p&gt;


&lt;p&gt;The site is back up and running (9:45). If you continue to see a maintenance page then your browser will have stored a copy of that page and you will need to clear your cache (or Temporary Internet Files). Please use the Help in your web browser to find out how to clear your cache or temporary Internet files.&lt;/p&gt;


&lt;p&gt;We thank you for your patience and understanding.&lt;/p&gt;</description>
    
    
    
      </item>
    
  <item>
    <title>Resolved: Service outage 14:15 - 14:36 GMT</title>
    <link>http://www.moonfruitlounge.com/post/2007/10/09/Issue%3A-Service-outage-14%3A15-GMT</link>
    <guid isPermaLink="false">urn:md5:c94fdc2e0d8d2be1dcfaa6c7e16312ea</guid>
    <pubDate>Tue, 09 Oct 2007 15:20:00 +0100</pubDate>
    <dc:creator>Joe</dc:creator>
        <category>Service status</category>
            
    <description>&lt;p&gt;Normal service has been restored. We will follow up with our service provider and post a summary of what happened once we know more. Apologies for the loss of service during this time.&lt;/p&gt;    &lt;p&gt;-- 14:15 GMT&lt;/p&gt;


&lt;p&gt;Our network provider is currently experiencing a failure at a critical point in their network. This has caused all of their London clients to go offline, including SiteMaker. We will update you as soon as we know more.&lt;/p&gt;</description>
    
    
    
      </item>
    
  <item>
    <title>Issue Resolved - re: 9 August 2007 1pm GMT - user uploaded files on read-only mode</title>
    <link>http://www.moonfruitlounge.com/post/2007/08/09/Issue-9-August-2007-1pm-GMT-user-uploaded-files-on-read-only-mode</link>
    <guid isPermaLink="false">urn:md5:aaa16abffc9e86aafe2c75c93eae8498</guid>
    <pubDate>Thu, 09 Aug 2007 14:48:00 +0100</pubDate>
    <dc:creator>Joe</dc:creator>
        <category>Service status</category>
            
    <description>&lt;p&gt;UPDATE: 12.32 GMT 13 August 2007: The SiteMaker network was offline for approximately 13 hours between 7pm GMT on 9 August and 8am GMT on 10 August 2007. For 24 hours following this a minority of users (approximately 6%) suffered a loss of images on their sites while files were restored from backups. By 8am GMT Saturday 11 August all files were restored and all systems and sites fully operational again, though most users had service returned 24 hours prior to this.&lt;/p&gt;    &lt;p&gt;This was the result of a critical systems failure in our file system that serves user uploaded files (e.g. JPEGs, MP3s, etc.). This was the first such failure of this system in our 8-year service history, and also the worst outage to affect the SiteMaker network during that time.&lt;/p&gt;


&lt;p&gt;The events that led to the service going offline were complex and partly driven by safety procedures that kicked in to ensure that no further data was lost or corrupted. The network is designed to cope with system failures with no loss of service, but in this case we were unable to prevent it.&lt;/p&gt;


&lt;p&gt;We fully understand that service disruption of this kind and duration is unacceptable and for this we sincerely apologise. We are taking further action in the next few days to ensure that this doesn't happen again.&lt;/p&gt;


&lt;p&gt;If you are still seeing any lost files or unusual site behaviour, please submit a support ticket and we will follow it up to resolve this issue for you.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;Detailed explanation of what happened&lt;/strong&gt;&lt;br /&gt;
The SiteMaker network is made up of a large number of servers and data storage devices which work together to provide the service. Each machine has layers of redundancy in it, e.g. multiple pieces of hardware providing the same function, and each service has multiple machines in a pool to draw from, meaning that one or more could go offline with no loss to the overall service provision. In addition to this, all data is backed up both locally and in an offline facility.&lt;/p&gt;


&lt;p&gt;This means that in most cases, the SiteMaker network is extremely resilient to failures, and customers will not notice any disruption while pieces of hardware are replaced or new machines added or removed from the pool.&lt;/p&gt;


&lt;p&gt;&lt;em&gt;So what went wrong?&lt;/em&gt;&lt;br /&gt;
Well, the file system for user uploaded files is a large array of disks with many terabytes of data (1000s of gigabytes) on it. The disks are set up so that a number of them could simultaneously fail with no loss of data, and like the other services, all the data is backed up elsewhere.&lt;/p&gt;


&lt;p&gt;The cause of the failure was a corruption of the file system data, which meant that the locations of, and references to, some of the files on the disks were lost. While the files were still there, their location had become unknown to the system, and we could no longer serve them.&lt;/p&gt;


&lt;p&gt;This failure caused the disk system's safety features to kick in and prevent any new files being added. It further required a full disk check to be run before services could be restored. This forced us to bring the whole service down to run the check. With many terabytes of data, this took 13 hours.&lt;/p&gt;


&lt;p&gt;Once the disk system was happy that there were no disk errors, we brought the service back up and began the restore job for those files that had become 'lost'. For this we had to rely on our offsite backup, which had to sync back old data to the live system over the internet. Despite a 100 Mbps connection between the 2 facilities, this again took 12 hours before all files were restored.&lt;/p&gt;



&lt;p&gt;&lt;em&gt;What are we doing about it?&lt;/em&gt;&lt;br /&gt;
Two things caused us trouble here. Firstly, despite having large redundancy (multiple disks and backups) in the file system, we did not have a parallel file system capable of serving the files from backup while the primary file server was down. This we will rectify by investing in a parallel system which could kick in in the event of the primary service going offline. This would allow us to avoid the first 13-hour downtime in the event of a failure of this kind again (despite it having happened only once in 8 years, it's not good enough).&lt;/p&gt;


&lt;p&gt;Secondly, the restoration of lost files took a long time to complete. This is because only a chunk of the live backup data is stored locally, with the majority held in an off-site facility. When this failure required us to check everything, we had no choice but to use the off-site copy. To rectify this we will invest in an additional full local backup, so that any future restorations can happen much more quickly. This would allow us to reduce the second 24-hour period to around 1 or 2 hours, though the service would remain online throughout.&lt;/p&gt;


&lt;p&gt;These changes will be in place within 3 weeks on the live system.&lt;/p&gt;



&lt;p&gt;&lt;em&gt;What can we take from this?&lt;/em&gt;&lt;br /&gt;
Systems are designed to minimise the risk of failure in any part. However, there are always risks to any system which can lead to outages. Indeed most of the biggest names on the internet do lose service from time to time.&lt;/p&gt;


&lt;p&gt;In any system designed to be robust, when failure does happen, it's usually bad. So in this case, I can say that we are very proud of and grateful to the engineering team that was able to put the system back together over this 36-hour period. We are also extremely grateful for the help and support provided by customers in identifying problems and resolving them.&lt;/p&gt;


&lt;p&gt;However, this 'unlikely event' explanation is never an excuse, and we fully accept that this has been a disruptive and painful experience for our customers. For this we sincerely apologise and commit to taking the action described above to prevent it happening again.&lt;/p&gt;


&lt;p&gt;Thank you again for your support.&lt;/p&gt;


&lt;p&gt;Joe&lt;/p&gt;




&lt;hr /&gt;


&lt;p&gt;UPDATE 9.57 GMT 11 August 2007: The full restore finished around 9am GMT this morning, so all the images should be back. If you have specific pages where you are still missing images, then please submit these to the support queue, as we will have to restore these manually. This may be the result of pages being saved again while the images were missing, but such cases should be rare. Thanks for your patience during this serious issue, and we'll follow with a full explanation and details of future plans.&lt;/p&gt;


&lt;hr /&gt;


&lt;p&gt;UPDATE 8.41 GMT 10 August 2007: Site access has now been restored. File upload/delete has now been restored. There continue to be a minority of cases where files are missing. These are currently being restored from back-up and will be available over the next few hours. A full and detailed explanation of the incident will be provided once all systems are fully restored. We appreciate your patience in this.&lt;/p&gt;


&lt;hr /&gt;


&lt;p&gt;Following a change on the file system, SiteMaker users are currently unable to upload new files or delete old files. Existing files are being served in 'read-only' mode, so that in most cases your site will be unaffected. There is a minority of cases where the 'read-only' files cannot be displayed. We are working to resolve this issue.&lt;/p&gt;</description>
    
    
    
      </item>
    
  <item>
    <title>Service status RSS</title>
    <link>http://www.moonfruitlounge.com/post/2007/08/06/Service-status-RSS</link>
    <guid isPermaLink="false">urn:md5:e8b7ff087cb965000eaedf1290e980fb</guid>
    <pubDate>Mon, 06 Aug 2007 15:28:00 +0100</pubDate>
    <dc:creator>Joe</dc:creator>
        <category>Service status</category>
            
    <description>    &lt;p&gt;We've set up a new section on the blog for service notifications. Here you will find details about any planned downtime, unexpected outages and updates, and completion notes of any work on the server infrastructure. If you want to subscribe to the 'service status' RSS feed, then you will be notified whenever we publish something to this section so you can keep up to date with any issues.&lt;/p&gt;</description>
    
    
    
          <comments>http://www.moonfruitlounge.com/post/2007/08/06/Service-status-RSS#comment-form</comments>
      <wfw:comment>http://www.moonfruitlounge.com/post/2007/08/06/Service-status-RSS#comment-form</wfw:comment>
      <wfw:commentRss>http://www.moonfruitlounge.com/feed/rss2/comments/475</wfw:commentRss>
      </item>
    
  <item>
    <title>Issue - 24 July 2007 - Access/Speed Issues with SiteMaker</title>
    <link>http://www.moonfruitlounge.com/post/2007/07/24/Access/Speed-Issues-with-SiteMaker</link>
    <guid isPermaLink="false">urn:md5:a11a83b8425d7a82aa8ae86e2426549d</guid>
    <pubDate>Tue, 24 Jul 2007 10:47:00 +0100</pubDate>
    <dc:creator>Walt</dc:creator>
        <category>Service status</category>
            
    <description>    &lt;p&gt;As some of you will have noticed, loading speeds and access to your websites have not been optimal.  This is down to continued problems that we have been encountering with our network. We are in ongoing discussions with our supplier to correct this.&lt;/p&gt;


&lt;p&gt;We are also reconfiguring our current system to find additional speed improvements. In the short term this may cause some further access or speed problems, but in the long term it should fix the problem outright.&lt;/p&gt;


&lt;p&gt;This is the first time in over 5 years we have had sustained difficulties with our network and we are confident that this will be corrected shortly and permanently. All of us at SiteMaker know how important your sites are to you and your visitors and we are working as quickly and as diligently as we can to fix this.&lt;/p&gt;


&lt;p&gt;We do apologise for the inconvenience this has caused you. We ask only for a little more patience while we resolve this problem and return our sites to a high and consistent level of access.&lt;/p&gt;


&lt;p&gt;Thanks.&lt;/p&gt;</description>
    
    
    
      </item>
    
</channel>
</rss>