RECOVERY: We are still investigating the causes, but it appears that machine resources on each server in the webserver layer were exhausted, one server after another. The lack of free resources meant that our system administrators were either locked out or severely impaired when trying to diagnose the problem remotely. We managed to restore service to half of our webservers after the first incident and dispatched a member of the system administration team to our dedicated datacentre to diagnose the problems on site. The remaining webservers were rebooted and brought back up shortly after 14:15 GMT, but as the engineer left the datacentre the service started to deteriorate again and the engineer returned. To speed up the return to normal service, the affected webservers were power cycled, but this unfortunately left connections open on our database. As a result, once the webservers were back up they were unable to connect, because the maximum number of database connections had been reached. Clearing those connections required restarting the database, but after the restart the query optimiser appears to have started returning inefficient query plans, resulting in very slow response times. Our database administrator was brought in and, after several attempts, cleared the inefficient query plans from the cache, and normal service returned.

FOLLOW UP: We suspect the initial cause of the incident may have been a user-uploaded file (or files) that created a denial-of-service condition by causing our image conversion software to consume excessive resources during processing. The file may have been uploaded multiple times, and this repetition exacerbated the problem. Our image processing software has processed millions of files in the past without such issues; this is an extraordinary occurrence and we are taking immediate steps to identify the cause. We have already released a patch that we hope will prevent this from happening again.
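
For readers who run similar upload pipelines, one general way to stop a single file from exhausting a server is to run each conversion in a child process with hard CPU, memory and wall-clock caps. The sketch below illustrates that idea only; it is not the patch we released. The function name `run_limited`, the helper `_cap` and all limit values are hypothetical, and it assumes a POSIX system where Python's `resource` module is available.

```python
import resource
import subprocess


def _cap(kind, value):
    """Lower one resource limit for the current process (never raises it)."""
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)  # cannot exceed an existing hard limit
    resource.setrlimit(kind, (value, value))


def run_limited(cmd, cpu_seconds=10, memory_bytes=512 * 1024 * 1024,
                wall_seconds=30):
    """Run cmd in a child process with CPU, memory and wall-clock caps.

    Returns True if the command exited with status 0, False if it failed,
    was killed for exceeding a limit, or ran past the wall-clock timeout.
    All default values are illustrative assumptions.
    """
    def apply_limits():
        # Executed in the child just before the command starts (POSIX only).
        _cap(resource.RLIMIT_CPU, cpu_seconds)   # CPU time in seconds
        _cap(resource.RLIMIT_AS, memory_bytes)   # address space in bytes

    try:
        result = subprocess.run(
            cmd,
            preexec_fn=apply_limits,
            timeout=wall_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return False
    return result.returncode == 0
```

A caller would pass the real converter command line as `cmd` and treat a `False` result as a rejected upload, so that a pathological file fails quickly instead of tying up the webserver.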