Case Study: Office for National Statistics (Cloud 1)
The Office for National Statistics (ONS) is the UK’s largest independent producer of official statistics and is the recognised national statistical institute for the UK. It also conducts the census in England and Wales.
Every 10 years, for over 200 years, every household in England and Wales has been required to respond to the census. Traditionally, this is done on a given day, with householders providing paper or electronic returns. But what if there were another way?
Improvements in technology and in Government data sources offer opportunities to develop an alternative census method that reuses existing data already held within Government.
As part of this programme, SCISYS was engaged by the ONS to help them understand the potential for undertaking cloud-based matching and statistical analysis on large distributed Government databases.
The approach involved two stages:
+ A Discovery stage project to research the needs of the users and explore suitable technical options. SCISYS experts conducted user research and investigated suitable patterns for matching large administrative data sources across distributed infrastructure.
+ An Alpha stage project to prototype three of the scenarios identified during the Discovery stage in a cloud-based environment.
The second stage involved the creation of a cloud-based sandpit. Within this, ONS could run multiple models to establish the characteristics of performing large distributed matching and modelling processes. The solution was deployed on Amazon Web Services (AWS). SCISYS set up a solution whereby virtual machines, networking and storage could be dynamically provisioned directly by ONS for the duration of a modelling session. Once a session was complete, the results were saved to permanent storage and the compute resources were dynamically de-provisioned. ONS can bring new data and models and, other than minimal general maintenance of the environments by SCISYS, is entirely able to self-serve.
The benefits of this approach included:
+ Significant cost savings compared with provisioning this service on premises
+ Ability to model multiple scenarios in parallel
+ Ability to mimic likely network issues between distributed datasets
+ Self-service approach streamlined overall project delivery
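The session lifecycle described above (provision resources, run a model, save the results to permanent storage, de-provision) can be illustrated with a minimal Python sketch. It is not the SCISYS or ONS implementation and makes no AWS calls: the `Sandbox` class, the `modelling_session` function, the resource names and the toy matching step are all hypothetical, in-memory stand-ins chosen only to show the pattern.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Sandbox:
    """Hypothetical in-memory stand-in for the cloud environment."""
    active_resources: set = field(default_factory=set)   # currently provisioned resources
    permanent_store: dict = field(default_factory=dict)  # outlives every session


@contextmanager
def modelling_session(sandbox, session_id):
    """Provision resources for one session, persist its results, then tear down."""
    # Assumed resource kinds, mirroring "virtual machines, networking and storage".
    resources = {f"{session_id}-{kind}" for kind in ("vm", "network", "storage")}
    sandbox.active_resources |= resources  # provision on demand
    results = {}
    try:
        yield results  # the analyst's model writes its outputs here
        # Reached only if the model finished without raising an exception.
        sandbox.permanent_store[session_id] = dict(results)
    finally:
        sandbox.active_resources -= resources  # always de-provision


# Toy usage: "matching" here is just a set intersection of made-up keys.
sandbox = Sandbox()
with modelling_session(sandbox, "scenario-a") as results:
    results["matched"] = len({"a", "b", "c"} & {"b", "c", "d"})
```

Wrapping the session in a context manager means the teardown step runs even if a model fails, which is one simple way to keep compute costs tied to the duration of a session. Independent sessions could also be started side by side to model several scenarios in parallel.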