From Imitation to Innovation – Synthetic Data in Policing
- Phil Tomlinson
- Oct 19
Updated: Oct 20
We know access to law enforcement data has always been a challenge. But what if those barriers could be removed? What skills, capabilities and insights could be unlocked across policing?
Having access to realistic data could transform training, accelerate the development of new investigative tools and enable cutting-edge tactics to be tested within safe environments.
This is no longer wishful thinking; Principle One has developed a Synthetic Data solution capable of making this a reality.
So how have we done this?
We've brought together a multi-disciplinary team of Police SMEs, software developers and data scientists, who have developed a synthetic data environment capable of mirroring the complexity, scale, and nuance of the real world and the reality of today’s digital investigations. This allows us to model the digital footprints created as individuals travel to work, text their friends, post comments on social media, spend their money and commit crime.
Within a virtual model of the UK, the team have introduced a range of ‘scenario agents’ to engage in various forms of criminality, enabling us to play out a wide range of different digital investigations.

In each scenario, these agents generate data that would be of analytical value to the police, including Communication Data, ANPR, financial data, telematics, public Wi-Fi and even CCTV images. The scenarios include Violence against Women and Girls (covering offences that take place in both the physical and virtual worlds), County Lines drug trafficking, Organised Immigration Crime and even terrorist attack planning.
Data is generated not just for criminal activity, but also to reflect the broader digital footprint left by each Subject of Interest’s day-to-day activities. Finally, these ‘needles’ of intelligence and evidence are seeded into a ‘haystack’ of data generated by the wider population, ready to test the ability of investigators, analysts and new technologies to find them.
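To make the ‘needles in a haystack’ idea concrete, here is a minimal Python sketch. It is an illustration only and not a description of our production method: the `Event` fields, the source and location names, the agent identifier and the `build_dataset` function are all assumptions invented for this example.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta

# All names below are hypothetical and chosen purely for illustration.
SOURCES = ["ANPR", "CALL", "CARD"]
LOCATIONS = ["Town A", "Town B", "Town C", "Town D"]
START = datetime(2025, 1, 1)
MINUTES_IN_WEEK = 7 * 24 * 60


@dataclass(frozen=True)
class Event:
    person_id: str
    source: str          # which dataset the record belongs to
    timestamp: datetime
    location: str
    is_scenario: bool    # ground-truth label, withheld from the investigator


def routine_events(rng, person_id, n):
    """Generate n random day-to-day events for one synthetic person."""
    return [
        Event(
            person_id=person_id,
            source=rng.choice(SOURCES),
            timestamp=START + timedelta(minutes=rng.randrange(MINUTES_IN_WEEK)),
            location=rng.choice(LOCATIONS),
            is_scenario=False,
        )
        for _ in range(n)
    ]


def build_dataset(seed=0, population=200, events_per_person=20):
    """Build a haystack of population events and seed scenario 'needles' into it."""
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible

    # The haystack: routine activity from the wider population.
    events = []
    for i in range(population):
        events += routine_events(rng, f"P{i:04d}", events_per_person)

    # The scenario agent also has an ordinary day-to-day footprint...
    agent = "S0001"
    events += routine_events(rng, agent, events_per_person)

    # ...plus a short scripted sequence that forms the needles.
    events += [
        Event(agent, "ANPR", START + timedelta(days=2, hours=22), "Town B", True),
        Event(agent, "CALL", START + timedelta(days=2, hours=22, minutes=15), "Town B", True),
        Event(agent, "ANPR", START + timedelta(days=2, hours=23), "Town A", True),
    ]

    # Interleave everything chronologically so the needles are not grouped together.
    events.sort(key=lambda e: e.timestamp)
    return events
```

Because the same `person_id` appears across every source, the records stay linked in the way an investigator would expect, and the hidden `is_scenario` label gives a ground truth against which an analyst or tool can be scored.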

Our Synthetic Data capability ensures the creation of diverse, interlinked datasets that mirror the relationships found in operational investigations. This consistency across multiple data sources delivers far greater value than developing Synthetic Data on a case-by-case basis, offering a richer, more realistic foundation for analysis. It also accelerates innovation by reducing the need to navigate complex and often onerous Data Protection Impact Assessments (DPIAs) at the early stages of an engagement. It creates an environment for experimentation and contributes to a deeper knowledge of emerging datasets.
We work with our customers to ensure the scenarios and data types meet their specific requirements. They may have a particular crime type they wish to explore, a new tactic they wish to test, or a new tool in development. The Synthetic Data team then develops the scenario, aligns the necessary datasets to it and presents both back to the customer to confirm they meet requirements.

The method we use to generate synthetic data is scalable, efficient and automated, and takes advantage of data science and emerging artificial intelligence technologies. This has enabled us to expand the range of data sources we can develop, more quickly and in greater depth, and we have used a range of publicly available Generative AI products to complement and enhance our original capability. This allows us to rapidly generate and include ultra-realistic intelligence reports, witness statements, social media posts, audio and CCTV imagery in the data generated. These products include ChatGPT4, Google Veo 3.1, Sora 2 Pro, Flux Kontext Max, and many others.
This week we will be showcasing the capability at the NPCC Innovation and Digital Summit in Liverpool with a demo scenario based around a kidnapping and manhunt and the digital breadcrumbs created. The demo highlights how synthetic data and emerging AI technologies can enhance police capabilities, and showcases the art of the possible when synthetic data is available.
It focuses on a tech billionaire who is kidnapped in London on the way back from a state dinner at Windsor Castle. As the Senior Investigating Officer, where would you start?

Philip Tomlinson, Digital Intelligence Lead at Principle One, has been overseeing the development of the capability, which has already enabled multiple customer projects to access Synthetic Data to meet a wide range of different requirements.
He said, “For modern policing, synthetic data represents a new opportunity, one that enables testing of new investigative strategies, training of investigators and technology innovation without exposing real people or sensitive personal information. It allows the police to prepare for emerging threats, refine their digital investigative techniques and tools, and build trust in technology, all while protecting privacy and ensuring high standards in data ethics. Synthetic Data enables innovation through imitation.”
If you would like to know more about Principle One’s Synthetic Data capability, please contact us directly at policingapps@principleone.co.uk.
