Moonfruit Lounge - Service status http://www.moonfruitlounge.com/ en Thu, 29 Jul 2010 17:23:29 +0100 RESOLVED: Service Outage 10:39 GMT 26 Feb 2010 http://www.moonfruitlounge.com/post/2010/02/26/Service-Outage-10%3A39-GMT-26-Feb-2011 Fri, 26 Feb 2010 10:38:00 +0000 Eirik Pettersen Service status <p>DETAIL: A routine disk resize has hung and we need to restart the filers.</p> <p>RESPONSE: We are currently rebooting the necessary servers. Service should be restored shortly.</p> <p>UPDATE 10:48: Service has been restored. Sorry for any inconvenience.</p> <p>UPDATE 10:52: A further reboot has been required.</p> <p>UPDATE 11:00: Service has been restored.</p> RESOLVED: Service Outage 20:57 GMT 05 Jan 2010 http://www.moonfruitlounge.com/post/2010/01/05/Service-Outage-20%3A57-GMT-05-Jan-2010 Tue, 05 Jan 2010 21:44:00 +0000 Eirik Pettersen Service status <p>DETAIL: Our database server has required an emergency restart.</p> <p>RESPONSE: We are currently restarting all dependent services.</p> <p>UPDATED 21:47 GMT: Services have been restored.</p> <p>UPDATED 10:17 GMT 06 Jan 2010: Our apologies for this very unexpected outage. All website data is safe, and all services were returned to normal within 50 minutes.</p> <p>The software licences on our database layer were due to be upgraded, which should have happened automatically. Unfortunately, a miscommunication with our database vendor caused our database layer to shut down. After emergency discussions with our vendor, the issue was resolved.</p> <p>This was a unique event and will not happen again. 
We can only apologise for the inconvenience caused by this downtime.</p> http://www.moonfruitlounge.com/post/2010/01/05/Service-Outage-20%3A57-GMT-05-Jan-2010#comment-form http://www.moonfruitlounge.com/feed/rss2/comments/1508 RESOLVED: Service Outage 14:31 GMT 27 Nov 2009 http://www.moonfruitlounge.com/post/2009/11/27/Service-Outage-14%3A31-GMT-27-Nov-2009 Fri, 27 Nov 2009 14:35:00 +0000 Eirik Pettersen Service status <p>DETAIL: We are currently experiencing a problem accessing our filers.</p> <p>RESPONSE: We will need to reset the service; this should not take long.</p> <p>UPDATED 14:47: The restart was successful.</p> http://www.moonfruitlounge.com/post/2009/11/27/Service-Outage-14%3A31-GMT-27-Nov-2009#comment-form http://www.moonfruitlounge.com/feed/rss2/comments/1478 RESOLVED: Service Outage 24 November 22:09 GMT http://www.moonfruitlounge.com/post/2009/11/24/Service-Outage-24-November-22%3A09-GMT Tue, 24 Nov 2009 22:14:00 +0000 Walt Service status <p>DETAIL: We are currently experiencing another problem with our internet connectivity.</p> <p>RESPONSE: We are working with our provider to restore services as soon as possible.</p> <p>UPDATED 22:24: Our network provider has confirmed the problem and is working to restore service.</p> <p>UPDATED 22:52: Services appear to have been restored.</p> <p>UPDATED 11:03 30 Nov: Incident report from Telstra added.</p> <p>FOLLOW UP: We have received the following incident report from our upstream provider:</p> <blockquote><p><strong>UPDATE TO TELSTRA INCIDENT REPORT FOR OUTAGES ON 23 AND 24 NOV 2009</strong></p></blockquote> <blockquote><p>Having identified the root cause of the previous day’s incident, engineers planned an activity to configure 
the x.x.0.0/16 summary route onto the appropriate Juniper core routers to allow a future decommissioning of the legacy routers. The configuration of this summary route on the Juniper core routers resulted in Internet services for certain customers being affected again.</p></blockquote> <blockquote><p>The engineers were unable to quickly isolate the cause of this issue and so reversed the change in order to restore service. However, once the change had been reversed, service was not restored for all customers as it should have been. The engineers identified a spurious route being received from the legacy routers which appeared to be causing the problem. The engineers reset the BGP sessions to the legacy routers, which removed the spurious route and restored service to affected customers.</p></blockquote> <blockquote><p>The engineers later identified a Juniper OS bug that had caused the reversal to be unsuccessful. Telstra has already been testing a later version of the Juniper OS in their labs which is intended for network-wide deployment. Juniper has confirmed that this specific bug is resolved in the release in test, but Telstra will also include this bug in their test planning prior to deployment in production networks.</p></blockquote> <blockquote><p><strong>REMEDIAL ACTION</strong></p></blockquote> <ol> <li>An urgent cross-functional review of the current MPLAN process has been scheduled (including a detailed analysis of our planning and handling of this incident).</li> <li>All MPLANs now have an extended Director-level approval policy whilst we review the current planned works process.</li> <li>All MPLANs will be checked after completion to ensure that the works have been carried out in accordance with the plan.</li> </ol> <blockquote><p>Telstra would like to take this opportunity to sincerely apologise for the disruption and inconvenience that these incidents have caused. 
Please be assured that the immediate actions stated above have been given the highest priority within Telstra to be implemented as quickly as possible. This is in order to avoid further incidents and to provide the highest levels of service to our customers.</p></blockquote> http://www.moonfruitlounge.com/post/2009/11/24/Service-Outage-24-November-22%3A09-GMT#comment-form http://www.moonfruitlounge.com/feed/rss2/comments/1473 RESOLVED: Service Outage 23 November 13:02 GMT http://www.moonfruitlounge.com/post/2009/11/23/Service-Outage-23-November-13%3A02-GMT Mon, 23 Nov 2009 13:10:00 +0000 Walt Service status <p>DETAIL: We are currently experiencing a problem with our internet connectivity.</p> <p>RESPONSE: We are working with our provider to restore services as soon as possible.</p> <p>UPDATE: Full service was restored at 13:25 GMT. We are now in contact with our provider to find out the cause of the outage.</p> http://www.moonfruitlounge.com/post/2009/11/23/Service-Outage-23-November-13%3A02-GMT#comment-form http://www.moonfruitlounge.com/feed/rss2/comments/1470 Scheduled Mail Service Maintenance 00:00 - 06:00 GMT 28, 29 and 30 Oct 2009 http://www.moonfruitlounge.com/post/2009/10/28/Scheduled-Mail-Service-Maintenance-00%3A00-06%3A00-GMT-28-29-and-30-Oct-2009 Wed, 28 Oct 2009 17:53:00 +0000 Walt Service status email mail maintenance <p>REASON: Network distribution maintenance</p> <p>PLANNED DURATION: 3 days (limited to a few hours each morning)</p> <p>NOTES: Our domain partner Gandi will be performing major upgrades to their network distribution infrastructure during the nights of 28 to 30 October. 
During this time, email services will experience a number of intermittent disruptions in connectivity, beginning at 00:00 GMT, with maintenance activities finishing by 06:00 GMT each day. We will endeavour to minimise the impact on services where possible, but given the scale of the upgrades being performed, most services will encounter at least some disruption. No mail will be lost, but access may be restricted and mail delivery delayed.</p> <p>We apologise in advance for any inconvenience this work may cause.</p> http://www.moonfruitlounge.com/post/2009/10/28/Scheduled-Mail-Service-Maintenance-00%3A00-06%3A00-GMT-28-29-and-30-Oct-2009#comment-form http://www.moonfruitlounge.com/feed/rss2/comments/1437 RESOLVED: Service Disruption 15:30 - 16:20 BST (GMT +1) 15 Oct 2009 http://www.moonfruitlounge.com/post/2009/10/15/Service-Distruption-15%3A30-BST-GMT-1-15-Oct-2009 Thu, 15 Oct 2009 15:39:00 +0100 Walt Service status <p>DETAIL: We are currently experiencing some unexpected problems with our service delivery.</p> <p>RESPONSE: We are investigating this and expect to have it resolved promptly.</p> <p>DURATION: We will provide an update on the cause and resolution once completed.</p> <hr /> <p>UPDATE: 16:20 BST (GMT +1) 15 Oct 2009</p> <p>We were performing a routine maintenance operation to expand disk space. This normally requires no downtime; however, the operation hung, causing a cascading problem that required us to restart the entire service.</p> <p>Because the restart was unscheduled, we took great care to minimise risk. 
This is why restarting the entire service took some time.</p> <hr /> http://www.moonfruitlounge.com/post/2009/10/15/Service-Distruption-15%3A30-BST-GMT-1-15-Oct-2009#comment-form http://www.moonfruitlounge.com/feed/rss2/comments/1420 POSTPONED: Scheduled Downtime 06:00 - 08:00 BST (GMT +1) 25th Sep 2009 http://www.moonfruitlounge.com/post/2009/09/22/Scheduled-Downtime-06%3A00-08%3A00-BST-GMT-1-25th-Sep-2009 Tue, 22 Sep 2009 11:20:00 +0100 Walt Service status <p>REASON: Data cleansing and archiving</p> <p>PLANNED DURATION: 2 hours</p> <p>NOTES: To improve the efficiency of our database, we plan to archive legacy data. This operation must be performed offline; that is, all websites must be taken offline for this period. Customers will be shown a page advising them that their site is unavailable and will be back up shortly.</p> <p>We apologise for any inconvenience.</p> http://www.moonfruitlounge.com/post/2009/09/22/Scheduled-Downtime-06%3A00-08%3A00-BST-GMT-1-25th-Sep-2009#comment-form http://www.moonfruitlounge.com/feed/rss2/comments/1394 Scheduled Downtime planned 07:30 BST (GMT +1) 21, 24 &amp; 26 August 2009 http://www.moonfruitlounge.com/post/2009/08/21/Scheduled-Downtime-planned-07%3A30-BST-GMT-1-21-August-2009 Fri, 21 Aug 2009 07:32:00 +0100 Walt Service status Maintenance <p><strong>Scheduled Downtime</strong></p> <p>We will be performing routine database maintenance this morning, which will take approximately 1 hour to complete. 
We will also be performing this maintenance next Monday (24th) and Wednesday (26th), in similar one-hour blocks, to minimise disruption.</p> <p>Thank you for your patience and apologies for any inconvenience caused.</p> <p>REASON: Database Maintenance</p> <p>PLANNED DURATION: Approximately 60 minutes each time.</p> <p>NOTES: This is routine database maintenance, which should improve the overall performance of sites. We chose to split these changes into three time slots during periods of low activity to reduce the impact on customers.</p> <p>NB. For the most up-to-date details on SiteMaker status, please visit our <a href="http://status.sitemakerlive.com/" hreflang="en">SiteMaker Service Status</a> site and subscribe to the RSS feed for this service.</p> <hr /> <p>UPDATE: 8:39 BST (GMT +1) 21 August 2009</p> <p>Maintenance completed successfully and service fully restored. <br /> <br /></p> <p>UPDATE: 8:32 BST (GMT +1) 24 August 2009</p> <p>Maintenance completed successfully and service fully restored. 
<br /> <br /></p> <p>UPDATE: 8:15 BST (GMT +1) 26 August 2009</p> <p>Maintenance completed successfully and service fully restored.</p> <hr /> http://www.moonfruitlounge.com/post/2009/08/21/Scheduled-Downtime-planned-07%3A30-BST-GMT-1-21-August-2009#comment-form http://www.moonfruitlounge.com/feed/rss2/comments/1372 RESOLVED: Service Disruption 09:11 BST (GMT +1) 02 Jul 2009 http://www.moonfruitlounge.com/post/2009/07/02/Service-Disruption-09%3A11-BST-GMT-1-02-JuL-2009 Thu, 02 Jul 2009 09:47:00 +0100 Walt Service status <p>DETAIL: We are experiencing a distributed denial-of-service (DDoS) attack on our services, which is currently overloading our firewalls.</p> <p>RESPONSE: Our technical team are working hard to get this attack under control and return full service to all customers. We hope to have this completely resolved shortly.</p> <p>UPDATE: We've managed to block what appear to have been three different types of DDoS attack. 
Combating the three attacks took longer than expected. We appear to have had them under control as of 11:35, although minor issues continued until 12:32 BST (GMT +1).</p> http://www.moonfruitlounge.com/post/2009/07/02/Service-Disruption-09%3A11-BST-GMT-1-02-JuL-2009#comment-form http://www.moonfruitlounge.com/feed/rss2/comments/1315 RESOLVED: Service Disruption 06:10 BST (GMT +1) 25 Jun 2009 http://www.moonfruitlounge.com/post/2009/06/25/Service-Disruption-06%3A10-BST-GMT-1-25-Jun-2009 Thu, 25 Jun 2009 09:59:00 +0100 Walt Service status <p>DETAIL: We are experiencing a large DDoS attack on one of our hosted sites, which is currently overloading our firewalls.</p> <p>RESPONSE: Our technical team are working to resolve this issue as quickly as possible and should have it resolved shortly.</p> <p>UPDATE: An attack on one of our sites, coming from over 2,300 IP addresses, accounts for 75% of all incoming traffic. This is congesting all traffic and slowing page load times. 
We are still investigating how to block this traffic dynamically.</p> Scheduled Downtime planned 09:30 BST (GMT +1) 18 May 2009 http://www.moonfruitlounge.com/post/2009/05/15/Scheduled-Downtime-planned-09%3A30-BST-GMT-1-18-May-2009 Fri, 15 May 2009 11:31:00 +0100 Walt Service status <p><strong>Scheduled Downtime</strong></p> <p>We will be upgrading the database software to the latest version and performing a number of data updates in preparation for the release of the next version of SiteMaker.</p> <p>Thank you for your patience and apologies for any inconvenience caused.</p> <p>REASON: Database upgrade and software updates</p> <p>PLANNED DURATION: Up to 60 minutes.</p> <p>NOTES: Full details on the release and the new features will be provided following the upgrade.</p> <p>NB. For the most up-to-date details on SiteMaker status, please visit our <a href="http://status.sitemakerlive.com/" hreflang="en">SiteMaker Service Status</a> site and subscribe to the RSS feed for this service.</p> http://www.moonfruitlounge.com/post/2009/05/15/Scheduled-Downtime-planned-09%3A30-BST-GMT-1-18-May-2009#comment-form http://www.moonfruitlounge.com/feed/rss2/comments/1232 RESOLVED: Service Disruption 16:40 BST (GMT +1) 07 Apr 2009 http://www.moonfruitlounge.com/post/2009/04/07/Service-Disruption%3A-16%3A40-BST-GMT-1-07-Apr-2009 Tue, 07 Apr 2009 23:48:00 +0100 Walt Service status <p>RESOLVED: Service Disruption - Please see the entries below for more details on the disruptions and on the essential maintenance being carried out.</p> <hr /> <p><strong>RESOLVED: Service Disruption 14:34 BST (GMT +1) 04 Apr 2009</strong></p> <p>DETAIL: On April 4, a section of our system went read-only. 
This currently affects 20% of our customers and has an impact on file uploads, new site building, form submissions and forum posts. The other 80% of our customers are completely unaffected.</p> <p>RESPONSE: Our technical team are working to resolve this issue as quickly as possible.</p> <p>UPDATE: We are working on repairing the file system for the affected customers. The current estimate for completion is 2 to 4 hours.</p> <p>UPDATE: 13:40 BST - We will be taking the service offline for 10 minutes at approximately 13:55 to reconfigure the file system as part of our recovery actions.</p> <p>UPDATE: 14:04 BST - The file system reconfiguration was successful and enabled us to restore full service to half of the affected customers. We expect to restore full service to the remaining 10% of customers (the other half of those affected by this issue) within the next couple of hours.</p> <hr /> <p>RESOLVED: 15:02 BST</p> <p>DETAIL: Full service has now been restored to all customers.</p> <p>RECOVERY: We failed over to our backup file server and performed consistency checks to ensure that the data was not corrupted.</p> <p>FOLLOW UP: We will run background consistency checks on all our data over the next few days to ensure that no further problems occur. Visitors may experience slower page loads while these checks take place. We will also investigate the cause of this event and take steps to prevent a repeat of this incident.</p> <hr /> <hr /> <p><strong>RESOLVED: Service Disruption 16:40 BST (GMT +1) 07 Apr 2009</strong></p> <p>DETAIL: On April 7, a section of our system went read-only. This currently affects only 10% of our customers and has an impact on file uploads, new site building, form submissions and forum posts. 
The other 90% of our customers are completely unaffected by this event.</p> <p>RESPONSE: Our technical team are working to resolve this issue as quickly as possible and should have it resolved shortly.</p> <p>We apologise for any inconvenience this causes our customers. Updates will follow.</p> <p>UPDATE: 01:45 BST - All affected systems have now been fully restored. We will continue to monitor the situation and post more details later in the morning.</p> <hr /> <hr /> <p><strong>Essential Maintenance: 11:20 BST (GMT +1) 09 Apr 2009</strong></p> <p>REASON: Our investigations into the recent service disruptions indicated that running consistency checks on the affected data stabilised the file system. To be certain that the data is secure, we will be running consistency checks on the remaining data in a controlled manner.</p> <p>PLANNED DURATION: 40 minutes per site</p> <p>NOTES: Those communities not already affected by the recent disruptions will go read-only for about 40 minutes. This will only affect file uploads, new site building, forum posts and saving of data. All sites will remain on view to visitors and owners. There will be six 40-minute periods, in each of which 10% of sites will be affected.</p> <hr /> <p>UPDATE: This has now been completed. 
The consistency checks were successful.</p> <hr /> <hr /> http://www.moonfruitlounge.com/post/2009/04/07/Service-Disruption%3A-16%3A40-BST-GMT-1-07-Apr-2009#comment-form http://www.moonfruitlounge.com/feed/rss2/comments/1190 Incident Report re: Service Outage 25th November 12:52-13:05 and 15:32-17:12 GMT http://www.moonfruitlounge.com/post/2008/11/26/Incident-Report-re%3A-Service-Outage-25th-November-12%3A52-13%3A05-and-15%3A32-17%3A12-GMT Wed, 26 Nov 2008 10:40:00 +0000 Walt Service status <p><strong>What Happened</strong></p> <p>Yesterday we suffered two incidents: the first started at 12:52 GMT and lasted 13 minutes, causing very slow response times; the second started at 15:32 GMT and lasted 1 hour 40 minutes, including some periods of complete loss of service.</p> <p>Service was restored to normal at 17:12 GMT, with all sites up and running at normal response times.</p> <p><strong>Why It Happened</strong></p> <p>We suspect the initial cause of the incident may have been a user-uploaded file (or files) that created a denial-of-service condition by causing our image conversion software to use excessive resources while processing it. The file may have been uploaded multiple times, and this repetition exacerbated the problem. Our image processing software has processed millions of files in the past without such issues. 
We are taking immediate steps to identify the cause and have already released a patch that we believe should prevent this from happening again.</p> <p><strong>More Details</strong></p> <p>For a more detailed picture of this outage, please visit our main <a href="http://status.sitemakerlive.com/" hreflang="en">SiteMaker Service Status</a> site, where we provide detailed information on programmed/scheduled downtime and the reasons for it, as well as on unexpected downtime, its causes/diagnosis and solutions. The site also provides a feed that you can add to an RSS reader; you can do the same with the Moonfruit Lounge, on subjects of your choice.</p> <p>Thanks,</p> <p>Walt</p> http://www.moonfruitlounge.com/post/2008/11/26/Incident-Report-re%3A-Service-Outage-25th-November-12%3A52-13%3A05-and-15%3A32-17%3A12-GMT#comment-form http://www.moonfruitlounge.com/feed/rss2/comments/1037 Incident Report re: Service Outage 29th October 16:42 - 17:37 GMT http://www.moonfruitlounge.com/post/2008/10/31/Incident-Report-re%3A-Service-Outage%3A-29th-October-16%3A42-17%3A37-GMT Fri, 31 Oct 2008 18:35:00 +0000 Walt Service status <p>At 16:42 on 29th October, we were alerted that our main file server appeared to be offline. This meant that customers' uploaded files were unavailable. Once it was confirmed that the file server was not accessible, we immediately initiated our recovery plan and brought the standby file server online. We then attached the network storage to it, restoring access to customer-uploaded files. Services were brought back up by 17:32.</p> <p>On further investigation, we found that the server had detected that it had entered an inconsistent state and, as a result, had halted itself as a safety measure. 
This is an extremely rare occurrence and the first time we have encountered this behaviour since SiteMaker began. It is, however, precisely why we have redundant hardware, which enables us to fail over quickly to our standby system and recover services rapidly.</p> <p>As a precaution, we have updated the operating systems of the file servers and, although we have no reason to suspect any damage, we are running full diagnostics on the hardware.</p> http://www.moonfruitlounge.com/post/2008/10/31/Incident-Report-re%3A-Service-Outage%3A-29th-October-16%3A42-17%3A37-GMT#comment-form http://www.moonfruitlounge.com/feed/rss2/comments/1002 Incident Report re: Service Outage 23rd October 19:10 - 21:30 BST http://www.moonfruitlounge.com/post/2008/10/31/Incedent-Report-re-Service-Outage%3A-23rd-October-19%3A10-21%3A30-BST Fri, 31 Oct 2008 18:18:00 +0000 Walt Service status <p>We host our servers at a secure data centre managed by Telstra, a reputable colocation provider. As part of their colocation service, they provide dual uninterruptible power supplies. Each server on site therefore has two independent power feeds, so that if either power supply fails, the servers will continue running without interruption.</p> <p>At 19:10 (18:10 GMT) on 23rd October, Telstra engineers started planned maintenance work on one of their UPSs (uninterruptible power supplies). During this work, the servers were to be moved temporarily onto mains power. This should have happened transparently, but the additional load on the mains tripped a circuit breaker. At approximately 19:47 (18:47 GMT), the breaker was reset and power was restored; the servers were then brought back online one by one. 
The servers have since been restored to redundant UPS power and will continue to be fully protected in the future.</p> <p>A full report from Telstra on the outage and the remedial actions is included below.</p> <p>In the hours that followed, the file server detected certain problems with the filesystems. As each problem was detected, the affected filesystem was remounted in 'read-only' mode to ensure data integrity. Ultimately, 50% of the filesystems went into a read-only state, meaning that for half of our sites, file uploads and other 'write' activities (such as form submissions) failed. This could not be rectified until the following morning.</p> <p>To perform a thorough consistency check, each affected filesystem needed to be taken offline. To minimise disruption, we therefore took the affected filesystems offline one at a time to perform these checks. As a result, affected customers may have seen 'Internal Server Errors' for about 20 minutes while their filesystem was checked.</p> <p>We have taken multiple measures to ensure that our services remain available in the event of power failures. Telstra's failure to maintain an uninterrupted supply was an unforeseen and extraordinary event. Again, we apologise for any inconvenience you may have experienced.</p> <hr /> <blockquote><p> <strong>Telstra Outage Description</strong></p> <p> At 18:45:00 on 23rd October, a maintenance window was used to make enhancements to the current UPS equipment on the 3rd floor of Telstra’s London Hosting Center. This work resulted in the total loss of power in the 3rd floor co-location facility.</p> <p> A thorough Method Statement and Risk Assessment had been carried out to make sure all switching was correct and the load would be smoothly transferred through the static switch to raw mains for the duration of the maintenance. 
Telstra were unable to access this particular breaker and so could not predict that it would fail under a load well below its capacity. The thermal and ultrasonic survey of the electrical infrastructure carried out recently showed normal operation.</p> <p> The activity being undertaken in the maintenance window was to temporarily move the load from one leg to another. This would allow Telstra to undertake works to enhance the current UPS equipment. The load in the 3rd floor co-location facility is currently under 1250 amps. This load is fed by two feeders rated at 1250 amps each. The schematic below shows the basic configuration of the power layout.</p> <p> In order to conduct the maintenance on the UPS, the load needed to be swapped on one leg, allowing the Static Bypass Switch to throw and in turn allowing Telstra to move the load back, but with the UPS isolated and ready for safe working.</p> <p> When the load was moved, a breaker rated at 1250 amps failed. This is the root cause of the issue: as already mentioned, the total load was under 1250 amps and should have been easily supported by this breaker.</p> <p> The power outage started at 18:45:10 and power was restored at 18:47:05. As a direct result, all services within the 3rd floor co-location facility powered down.</p> <p> As soon as the breaker tripped, we immediately isolated the UPSs and static switch, reset the breaker and put level 3 onto the wrap-round bypass supply. This keeps the UPS and static switch out of the circuit and supplies the load on raw mains.</p> <p> As the load is well below the capacity of the breaker, it was assumed the breaker had tripped early. At the time, Telstra had no way of safely checking this breaker without switching the power off. 
Telstra engineers took the decision to remain on raw mains until further investigations and preparations could be made.</p> <p> On Friday 24th October, Telstra arranged for a Metropolitan Electrical Tegg service team to attend site with all the necessary H&amp;S equipment to remove the cover and check the breaker whilst live. Telstra will be better placed to decide on the best way forward once these findings have been collected and assessed.</p> <p> <strong> Telstra Remedial Action</strong></p> <p> The most appropriate actions will be determined as soon as Telstra are given the findings of the specialist report.</p> <p> Clearly, Telstra needs to migrate back onto UPS supply as soon as possible. Based on the outcome of the analysis of the breaker, Telstra will be looking to undertake an emergency planned outage at 00:00 on Sunday morning.</p> <p> This document provides a high-level report of the events, highlights areas of concern and sets out the remedial steps being put in place to prevent the same type of episode arising in the future. Telstra would like to offer its sincere apologies for the unforeseen disruption caused to our customers. All teams involved are working together to prevent this situation arising again.</p></blockquote> http://www.moonfruitlounge.com/post/2008/10/31/Incedent-Report-re-Service-Outage%3A-23rd-October-19%3A10-21%3A30-BST#comment-form http://www.moonfruitlounge.com/feed/rss2/comments/1000 RESOLVED: Service Outage: 29th October 16:42 - 17:37 GMT http://www.moonfruitlounge.com/post/2008/10/29/Outage%3A-29th-October-16%3A42-GMT Wed, 29 Oct 2008 17:17:00 +0000 Walt Service status <p>Service Outage: 29th October 16:42 GMT</p> <p>We have had a very unexpected failure with the disk array. 
We are currently trying to bring up the standby system to restore services to all customers.</p> <p>We will notify you immediately of any change in this situation, and of the expected ETA as soon as we have one.</p> <p>We appreciate that this is not great timing after our last outage, but this incident is completely unrelated and is being worked on by our own technical team, who are doing everything in their power to bring up the standby service as quickly as possible.</p> <p>Thanks</p> <p>================</p> <p><strong>UPDATE: 17:37 GMT</strong></p> <p>All services have been restored to normal operation. We will investigate the cause of this incident and, once we have identified it and received an explanation from our data centre for the cause of last week's downtime, we will inform all customers. We will also let you know the steps we are taking to prevent these situations from occurring again.</p> <p>If you have any difficulty viewing your site, we recommend clearing your cache (temporary Internet files) and reloading the page. This should fix problems with page loading and file uploads.</p> <p>Thanks again for your patience.</p> http://www.moonfruitlounge.com/post/2008/10/29/Outage%3A-29th-October-16%3A42-GMT#comment-form http://www.moonfruitlounge.com/feed/rss2/comments/988 RESOLVED: Service Outage: 23rd October 19:10 - 21:30 BST (GMT +1) http://www.moonfruitlounge.com/post/2008/10/23/Outage%3A-23th-October-19%3A10-BST-GMT-1-2130-BST Thu, 23 Oct 2008 21:43:00 +0100 Joe Service status <p>Service Outage: 19:10 - 21:30 BST</p> <p>A power failure at our data centre resulted in a complete loss of service at 19:10. This caused all of our servers to shut down unexpectedly and then restart. 
We then had to run full diagnostic checks to make sure they were running correctly before bringing the service back online at 21.30 BST.</p> <p>The service has now been fully restored, with no loss of data or other negative side effects. Latency may be slightly higher than normal over the next few hours while caching is restored on the live system.</p> <p>We apologise for this unexpected outage and will be following up with our network provider to discover its cause, and why the secondary power supplies were not brought into use.</p> <p>Thank you for your patience during this outage.</p> <p>==============</p> <p>UPDATE: 10:12 BST (GMT +1)</p> <p>Because the power failure caused our machines to shut down unexpectedly, our disk array, which contains uploaded files, restarted in 'read only' mode as a precaution. This means we have to run full disk integrity checks before we can switch back to 'write' mode. Only 50% of sites are affected, and we will repair them 10% at a time.</p> <p>In any event, all content on these disks is backed up both on and off site, so if any disk errors are found, we can fully restore the files.</p> <p>We will let you know when this is complete.</p> <p>==============</p> <p>UPDATE: 14:00 BST (GMT +1)</p> <p>The service is now fully restored.</p> <p>This means that all websites should now:</p> <ul> <li>Be fully accessible</li> <li>No longer display a &quot;500 Internal Server Error&quot; message</li> <li>Permit files to be uploaded normally</li> <li>Enable forms to be submitted without error</li> <li>Allow pages to be saved correctly</li> </ul> <p>Now that the service has been fully restored, we will begin investigating the causes of the outage and how we can ensure that it is not repeated. 
We will keep you informed of our findings and of the solutions.</p> <p>If you are still encountering problems related to this outage (see above), you may wish to:</p> <ol> <li>Clear your web browser's cache/temporary Internet files. This will remove any error pages stored by your browser, so that your website is reloaded fresh.</li> <li><a href="http://www.moonfruit.com/support.html">Contact support</a> using the query form and give us as much detail as possible about the error, e.g. any error messages, when it occurs and which pages it occurs on.</li> </ol> <p>Thanks again for your patience, understanding and support with this issue. It is very much appreciated.</p> <p>==============</p> http://www.moonfruitlounge.com/post/2008/10/23/Outage%3A-23th-October-19%3A10-BST-GMT-1-2130-BST#comment-form http://www.moonfruitlounge.com/post/2008/10/23/Outage%3A-23th-October-19%3A10-BST-GMT-1-2130-BST#comment-form http://www.moonfruitlounge.com/feed/rss2/comments/987 Mail maintenance on October 15th, 2008 at 1h00 AM BST (Midnight GMT) http://www.moonfruitlounge.com/post/2008/10/14/Mail-maintenance-on-October-15th-2008-at-1h00-AM-BST-Midnight-UTC urn:md5:8475e6988857b61f807bcb0d1244afde Tue, 14 Oct 2008 16:37:00 +0100 Walt Service status <p>Our domain name partner will be running a major maintenance operation on the Mail platform. This operation will make the service more stable under excessive load.</p> <p>The service will be unavailable during the maintenance period, which will be on Wednesday (October 15th, 2008) between 8:00 PM BST (19:00 GMT) and 1:00 AM BST (00:00 GMT). Incoming mail will still be received, but you will not be able to access it until the maintenance work is complete. 
POP, IMAP and Webmail will all be unavailable.</p> http://www.moonfruitlounge.com/post/2008/10/14/Mail-maintenance-on-October-15th-2008-at-1h00-AM-BST-Midnight-UTC#comment-form http://www.moonfruitlounge.com/post/2008/10/14/Mail-maintenance-on-October-15th-2008-at-1h00-AM-BST-Midnight-UTC#comment-form http://www.moonfruitlounge.com/feed/rss2/comments/970 Database Update 09/10/2008 - 10:30 BST (GMT +1) - Rescheduled to 13/10/08 - 15:00 BST (GMT +1) http://www.moonfruitlounge.com/post/2008/10/07/Database-Update-09/10/2008-10%3A30-BST-GMT-1 urn:md5:81dc33bdd6e305cb4a937d7ef1eef5a4 Tue, 07 Oct 2008 16:04:00 +0100 Josh Service status <p>A number of database changes need to be run to enable new features in SiteMaker. Unfortunately, this means a service outage of no more than 15 minutes on Thursday morning, 9 October 2008, at 10:30 BST (GMT+1).</p> <p>During this time, visitors to SiteMaker sites will be presented with a holding page informing them that the site should be back soon.</p> <p>We are sorry for any inconvenience this may cause.</p> <p>Regards,</p> <p>Hiren Joshi</p> <p>Systems Manager</p> <p>Update:</p> <p>This has now been rescheduled to 13/10/08 15:00 BST (GMT +1). We thank you for your understanding.</p> http://www.moonfruitlounge.com/post/2008/10/07/Database-Update-09/10/2008-10%3A30-BST-GMT-1#comment-form http://www.moonfruitlounge.com/post/2008/10/07/Database-Update-09/10/2008-10%3A30-BST-GMT-1#comment-form http://www.moonfruitlounge.com/feed/rss2/comments/964