Turing Data Study Groups: KE Award Winners

Guest blog from The Alan Turing Institute

Turing Data Study Groups (DSGs) are intensive, collaborative, hackathon-type events hosted by the Alan Turing Institute, the UK National Institute for Data Science and Artificial Intelligence. The concept for DSGs was developed in 2016 by Dr Sebastian Vollmer along with Alan Turing Institute colleagues, and it builds on the successful 1968 Mathematics Study Group concept.

 

While the Turing’s DSGs have evolved, at their heart is the continued aim to bring together organisations from industry, government, and the third sector with talented multidisciplinary researchers. Organisations act as DSG ‘Challenge Owners’ (COs), putting forward real-world problems to be tackled by small groups of gifted, carefully selected researchers. Researchers brainstorm, engineer and refine data science solutions before presenting their work back to the COs.


DSGs are one of the flagship programmes the Turing uses to support its aim of developing the leaders of the future. They offer training opportunities that are traditionally hard to come by. Participants are recruited from a broad spectrum of data science backgrounds and encouraged to join challenges that might be outside their normal domain. Working in teams of around 10, participants have many different experiences and perspectives to learn from. Added to this mix are the CO and the DSG Principal Investigator (DSG PI), whose knowledge and contrasting perspectives on the challenge help create truly multidisciplinary teams. It is this intersection of industry and academia, working to solve real-world problems, that forms the basis of a unique learning environment.


One of the core features that makes DSGs special is how we prepare and design the challenge. The challenge is a framework for the participants to explore the data in whichever way they choose. It should be broad enough to allow multiple directions of exploration using many different types of data science approaches, but constrained enough to guide participants towards answering the questions the CO is ultimately interested in. There is no guarantee of a solution, just a report on what the data tells us, its limitations, and what could be done to move closer to the answers the CO wants. It is a fine balancing act.


In preparation for this, the problem statement is transformed from a very business-focused one into an academic challenge. This process can be quite difficult for COs, whose primary motivation for solving the challenge is usually quite different from that of the researchers. But the definition of the challenge itself is only part of this transition. To help with this, we have developed the DSG PI role, a training opportunity for post-doc level researchers.


So if the CO brings the business problem and the domain knowledge, it is the DSG PI who converts this into an academic challenge for the DSG. The CO has final say on the domain focus and direction of the overall challenge; the DSG PI has final say on the structure and framing of the challenge. They will need to compromise, but this is part of the learning process for both parties.
Both COs and DSG PIs are guided and supported by the Turing throughout the engagement.


The DSG Challenge Owner pack guides partners through what they need to consider and evaluate about their own organisation and data before submitting a challenge. This includes how much time and resource they can commit to shaping the challenge; details about the data itself, such as whether they can share it and how sensitive it is; and what kind of impact they envisage, including impact on the wider community, not just themselves.


For the DSG PI, we are developing and refining micro courses to help them build the skills needed for working with industry. Learning to communicate academic ideas and constraints to non-academics can mitigate mismatches in motivation. Using the Turing Way, a handbook to reproducible, ethical and collaborative data science, we can guide DSG PIs towards best practices for designing projects. And then there is ethics.


The CO is expected to be fully engaged throughout the process of scoping and preparation. Working with the DSG PI, they begin to understand the nuances of placing their problem in an academic context, and also learn what it is like to work with academia. This leads to a better research challenge that will be exciting and interesting for the event participants.


Participants are usually at PhD level, and we accept a broad range of skill levels into the event. At the beginning of the event they choose the challenge that interests them most, and then focus on that challenge for the rest of the event. Groups include facilitators (picked from the pool of participants) to help the teams hit the ground running. The DSG PI is on hand to provide academic support, as is the CO, who is invited to get involved as a participant as well as a domain expert. It is through this mixture of data science backgrounds and skill levels, with no expectation of fully solving the challenge, that we create an environment conducive to creativity, learning and engineered serendipity.


“I was amazed at the amount of ground covered by the team in such a short space of time. They immediately grasped the nature of the challenge and all the specific difficulties around finding a solution. They wasted no time in sharing out tasks and applying a number of methods to the datasets.” - Louise Brown, Challenge Owner, Project Lead for STAMPEDE, Challenge: Identifying poor performance at recruitment sites participating in clinical trial research, Dec 2021.


This freedom lays the foundation for interesting insights and future, sometimes novel, research directions. During the event, everyone learns something from one another, be that new techniques, new research ideas, or even what it is like to work in a truly multidisciplinary team under time pressure. For the DSG, this generates the best results.


DSG is still under active development. There is a lot more that we can do, and many more organisations, both academic and industrial, that could benefit from this model. We are adding more training opportunities for participants and DSG PIs so they can better prepare for the event and optimise their experience and overall impact. We will be adding more opportunities for researchers to get involved in preparing the event itself and to support the curation of challenges. We are creating more opportunities for researchers and COs, either to better prepare their challenges or to continue the work after an event. When the time comes, we will look to incorporate many of the lessons we learned from moving online back into the in-person event. And we want others to be able to host DSGs so they too can benefit and support their communities and partnerships.


The Turing can support the implementation of this kind of event. We are particularly proud of some of the tools and platforms we have been developing to drive this growth, from the training modules and IT systems to support with challenge, participant and DSG PI selection. All are things we are looking to share with the wider community to lower the barriers to organising these collaborative data explorations. We hope that, with further adoption of this model and scope to expand opportunities across the board, the DSG will establish itself as an exciting way for research and industry to come together, create novel solutions and learn from each other.


In five years, 15 DSG events have been delivered (including 2 externally), tackling over 70 challenges from industry, third sector and government partners and involving over 600 participants (30% female). The model is being piloted at some of our university partners so we can develop models and processes to help others run their own data study groups locally, using the open-source model developed by the Alan Turing Institute team and drawing on its practical experience (e.g. Turing Data Study Group at Lida). In the future we will open this up further so other institutions can also benefit from running such a programme.


If you are interested in learning more, or think you might like to organise your own Data Study Group, please do not hesitate to get in contact with us at datastudygroup@turing.ac.uk.

 

Turing Data Study Groups were awarded the Academic Engagement of the Year Award at the 2021 KE Awards Ceremony. Watch the finalist videos here.