SiteMaker Server Status


Service status


Wednesday, July 13 2011

COMPLETED: SiteMaker Update 10:30 BST (GMT+1) 14 Jul 2011

We will be updating the core software to version R5.3.6 at 10:30 BST (GMT+1), 14 July 2011. There may be a momentary service disruption, but it should go largely unnoticed. We recommend users save any changes before this time. After this time, we advise customers to reload their browsers to ensure they have the latest version.

Wednesday, June 22 2011

COMPLETED: SiteMaker Update 10:00 BST (GMT+1) 23 Jun 2011

We will be updating the core software to version R5.3.5 at 10:00 BST (GMT+1), 23 June 2011. There may be a momentary service disruption, but it should go largely unnoticed. We recommend users save any changes before this time. After this time, we advise customers to reload their browsers to ensure they have the latest version.

Wednesday, June 1 2011

RESOLVED: Service Outage 11:19 BST (GMT+1) 1 June 2011

DETAIL: At 11:19 BST (GMT+1), we experienced a spike in incoming visitor traffic which, coming during a period of server maintenance, caused a high load on our dynamic serving layer.

RESPONSE 11:30: We have fully restored the dynamic serving layer and the traffic spike has been dealt with. Apologies for the inconvenience.

Sunday, May 15 2011

RESOLVED: Service Degradation 08:03 BST (GMT+1) 15 May 2011

DETAIL: At 08:03 BST (GMT+1), we saw a spike in traffic resulting in a high load on our dynamic serving layer.

RESPONSE 08:20: Service has returned to normal.

UPDATE 15:13 BST 16 May: On further investigation, the incident appears to be related to a known bug that can cause a cascade of blocking operations on the backend layer. The bug has been reprioritised and should be fixed shortly, which should prevent similar incidents in the future.

Thursday, April 28 2011

RESOLVED: Service Outage 21:00 BST 28 Apr 2011

DETAIL: At 21:00 BST, we started to see a service degradation across the platform, with large numbers of connections to our back-end dynamic layer timing out. As a result, sites only partially loaded.

RESPONSE 23:10: Service has returned to normal. We are investigating our logs to determine the cause of the problem to prevent this happening again. We apologise for any inconvenience.

Monday, April 18 2011

COMPLETED: SiteMaker Update 10:30 BST (GMT+1) 19 Apr 2011

We will be updating the core software to version R5.3.0 at 10:30 BST (GMT+1), 19 April 2011. There may be a momentary service disruption, but it should go largely unnoticed. We recommend users save any changes before this time. After this time, we advise customers to reload their browsers to ensure they have the latest version.

UPDATE 10:40: There was a slight problem with the release, which is currently being addressed. We plan to continue with the release once a patch is ready and the fix has been verified. We hope to roll out this patch within the next hour.

UPDATE 12:10: The patch was successful and the update has been rolled out. Thank you for your patience, and we apologise for any inconvenience.

Wednesday, April 13 2011

RESOLVED: Service Outage 11:10 BST 13 Apr 2011

DETAIL: At 11:00 BST, the main gateway router was rebooted to apply a configuration change and failed over to the secondary router. Usually this happens transparently and does not affect the running of the service. On this occasion, however, the secondary router's configuration sent all traffic to a single point, causing an outage. The configuration on the secondary router was updated manually and all services were restored within 10 minutes.

RESPONSE: We are currently looking into what may have caused the configuration mismatch, as router fail-over has previously happened transparently without the need for manual intervention.

Saturday, March 19 2011

RESOLVED: Service Outage 19:46 GMT 18 Mar 2011

DETAIL: Last night one of our senior technical staff was carrying out a routine operation to increase the capacity of one of our file servers. This operation can normally be done without downtime, which benefits customers, but on this occasion a lockup left the resize process waiting. As a result, some web requests (website access by customers or visitors) were also left waiting for disk access behind the stalled resize process. Once the queue of waiting web requests reached capacity, the platform went down. Rebooting the file server and letting the fail-over mechanism take over solved the issue. The outage lasted from 19:10 until 19:46.

RESPONSE: This issue was caused by human error on SiteMaker's part. We sincerely apologise for this incident, and we will be updating our processes to make sure it does not happen again.

Wednesday, March 16 2011

RESOLVED: Service Outage 03:47 GMT 16 Mar 2011

DETAIL: Between 03:00 and 03:47 GMT, half of our application servers (which help render the layout of the websites) went down due to a power failure at the data centre. Normally an incident like this would not affect our service delivery; on this occasion, however, the database did not release the dead connections, which prevented new connections from being made.

UPDATE: All services are now back up and running.

RESPONSE: We are currently working with our service provider to establish why the backup power did not kick in as expected. We are also looking into why the database connection issue occurred, as this was contrary to the normal behaviour of the database application.

Thursday, March 3 2011

COMPLETED: SiteMaker Update 12:30 GMT 3 Mar 2011

We will be updating the core software for SiteMaker to version R5.2.0 at 12:30 GMT, 3 Mar 2011. There may be a momentary service disruption, but it should go largely unnoticed. We recommend users save any changes before this time. After this time, customers should reload their sites to ensure they have the latest version.

Friday, February 4 2011

RESOLVED: Service Degradation 11:18 GMT 04 Feb 2011

DETAIL: We released a patch this morning to fix some release-related bugs, but since the update the database has been running slowly. We have rolled back the update, but the database is still struggling.

RESPONSE: We are currently trying to diagnose the cause.

UPDATE 18:01: We are still looking at the problem and have now involved our database vendor to help us diagnose it.

UPDATE 21:18: With the help of our database vendor's technicians, we have reconfigured our database to provide more processing power, which has brought our request response times back to normal. We will perform further analysis over the next few days to investigate the cause of the additional load.

UPDATE 19:00 07 Feb: We have identified some optimisations that target the cause of the additional load, a side effect of the new features introduced with the HTML generation architecture. We have now successfully rolled out the optimisations and the load has returned to normal.

Wednesday, February 2 2011

RESOLVED: Service Outage 17:11 GMT 02 Feb 2011

DETAIL: We have lost outbound connectivity from our datacentre. This is unrelated to this morning's software release.

RESPONSE: We are currently working with our network providers to find out the cause.

UPDATE 17:18: Connectivity seems to have been restored. We are still trying to determine the cause.

UPDATE 18:34: We are still chasing our network provider for an explanation.

UPDATE 11:56 05 Feb: Our network provider came back with the following response:

Telstra can advise that the outage of your service may have been
affected indirectly by an emergency network change. Our 3rd line network
support team were required to make a small amendment to a network IP
address in one of our core devices and your service may have been
affected during a short period whilst the change converged through our
network. The change should have been non service affecting. However it
appears that the times you reported, and the fact that your cct is build
across the same core device we believe that the cct may have taken
itself down whilst the routing re-established itself.

Tuesday, February 1 2011

COMPLETED: SiteMaker Update 10:00 GMT 2 Feb 2011

We will be updating the core software for SiteMaker to version R5.1.0 at 10:00 GMT, 2 Feb 2011. There may be a momentary service disruption, but it should go largely unnoticed. We recommend users save any changes before this time. After this time, customers should reload their sites to ensure they have the latest version.

Thursday, January 27 2011

ROLLED BACK 13:10 GMT: SiteMaker Update 09:30 GMT 27 Jan 2011

We will be updating the core software for SiteMaker to version R5.1.0 at 09:30 GMT, 27 Jan 2011. There may be a momentary service disruption, but it should go largely unnoticed. We recommend users save any changes before this time. After this time, customers should reload their sites to ensure they have the latest version.

UPDATE 09:06: We have postponed the update for 30 minutes. It will now go out at 10:00 GMT.

UPDATE 10:00: We are still having some configuration issues. It will now go out at 10:30 GMT.

UPDATE 10:20: We believe we have identified the cause of our last-minute issue. We now need to write, test and apply a patch, and we estimate a go-live time of 11:30 GMT. Apologies for the inconvenience.

UPDATE 11:06: The patch is taking longer than expected to complete. We have pushed the go-live time back to 12:30 GMT.

UPDATE 12:30: We have gone live with the update.

UPDATE 13:10: The new code is still causing load issues and we have been forced to roll back. We will advise you of the new release date once the issues have been diagnosed and fixed. Again, sorry for the inconvenience and thank you for your patience.

Monday, January 17 2011

RESOLVED: Service Outage 15:52 GMT 17 Jan 2011

DETAIL: Following the release of the latest version of the SiteMaker Platform (R5.1.0), we suffered severe backend performance issues, which caused a service outage.

RESPONSE: We are trying to identify the cause of the high load in order to mitigate the effects and restore service.

UPDATE 16:06: Despite various attempts to restore service, we have decided to roll back the release to R5.0.4.

UPDATE 16:23: Service has been restored. We have collected the logs and we will work on identifying the causes of the high load so that we can attempt the release again next week.

Saturday, January 15 2011

RESOLVED: Service Outage 18:21 GMT 15 Jan 2011

DETAIL: There has been a severe degradation of service in the new caching layer that has locked up the entire platform.

RESPONSE: We are reverting to the old caching layer to restore service.

UPDATE 18:28: Service has been restored.

UPDATE 19:30: The indices on the caching layer had been incorrectly configured and the caching layer had slowed down gradually until finally locking up at 18:28 today. The index configuration has been corrected and the issue should not happen again.

RESOLVED: Service Outage 03:46 GMT 15 Jan 2011

DETAIL: There has been a node failure in our new, higher-performance caching layer that has brought the entire platform down.

RESPONSE: We are reverting to the old caching layer to restore service.

UPDATE 04:12: Service has been restored.

UPDATE 05:50: The new caching layer has been reinstated.

Tuesday, March 2 2010

COMPLETED: Scheduled Downtime 08:30-09:00 GMT 09 Mar 2010

REASON: To commission new database replication technology

PLANNED DURATION: 30 minutes

Siteleaders will not be able to log in to their sites, and visitors to sites will see an unbranded page letting them know that essential maintenance is underway and that they should return later.

UPDATE 08:48: Maintenance completed successfully

Friday, February 26 2010

RESOLVED: Service Outage 10:39 GMT 26 Feb 2010

DETAIL: A routine disk resize has hung and we need to restart the filers.

RESPONSE: We are currently rebooting the necessary servers. Service should be restored shortly.

UPDATE 10:48: Service has been restored. Sorry for any inconvenience caused.

UPDATE 10:52: Another reboot has been required.

UPDATE 11:00: Service has been restored.

Tuesday, January 5 2010

RESOLVED: Service Outage 20:57 GMT 05 Jan 2010

DETAIL: Our database server required an emergency restart.

RESPONSE: We are currently restarting all dependent services.

UPDATE 21:47: All services have been restored.

UPDATE 10:17 GMT 06 Jan 2010: Our apologies for this very unexpected outage. All website data is safe and all services were returned to normal within 50 minutes.

We were due to upgrade the software licences on our database layer, which should have happened automatically. Unfortunately, a miscommunication with our database vendor caused our database layer to shut down. After emergency discussions with our vendor, the issue was resolved.

This is a unique event and will not happen again. We can only apologise for the inconvenience caused by this downtime.
