Exercises on Real World Data Science will provide users with the opportunity to put knowledge, skills and new learnings into practice, helping them to challenge and refine problem-solving approaches while strengthening the analytical mindset.
We will achieve this by supporting contributors to design tasks that replicate real-world data science processes.
1 Structure
The structure of each published exercise will vary based on the nature of the task(s) being set by contributors and the outcomes they have in mind. In general, though, exercises must do more than simply ask users to download a dataset, analyse it and report the results. Contributors should aim to:
- Set the scene
  - Provide a believable, realistic scenario for the exercise. Establish the “client challenge” within that context, the resources available to the data scientist, and the outputs expected.
- Give users space to map the problem themselves
  - Have users translate the “client challenge” into a data analytic question that can be answered using the resources available.
- Make data exploration, cleaning and tidying part of the process
  - Data exploration and preparation are an important part of most – if not all – data science projects, so let users loose on messy datasets so they can figure out for themselves (a) what they’re working with and (b) what analytical approach makes sense given the features of the data and the problem at hand.
- Integrate ethics
  - Prompt users to think about and work through ethical questions and issues that are relevant to the exercise: the challenge they’ve been set, the data they’ve been given, their proposed approach to analysis and modelling, etc.
- Encourage users to document their work and their thinking
  - Through computational notebooks (e.g., Jupyter notebooks), users can record not only what they’ve done and how they’ve done it, but why.
- Make data presentation part of the expected project outputs
  - Ask users to think about presenting to specific audiences: not just fellow data scientists, but to decision-makers, policy experts, the public – whatever makes most sense given the exercise scenario.
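To illustrate the exploration and cleaning step described in the list above, here is a minimal sketch of the kind of first-pass check a user might run on a messy dataset. Everything in it is an assumption made for illustration: the sample data is invented, `profile_columns` is a hypothetical helper, and the set of tokens treated as missing is a choice rather than a standard. It uses only the Python standard library and is one possible approach, not a required one.

```python
import csv
import io

# Hypothetical messy extract, invented purely for illustration.
RAW_CSV = """site,visit_date,reading
A,2021-01-04,12.5
a ,2021-01-05,
B,05/01/2021,n/a
B,2021-01-06,14.1
"""

# Tokens treated as "missing" in this sketch (an assumption, not a standard).
MISSING_TOKENS = {"", "na", "n/a", "null", "none"}


def profile_columns(rows):
    """Count missing and distinct (whitespace-stripped) values per column."""
    profile = {}
    for row in rows:
        for column, value in row.items():
            stats = profile.setdefault(column, {"missing": 0, "values": set()})
            cleaned = (value or "").strip()
            if cleaned.lower() in MISSING_TOKENS:
                stats["missing"] += 1
            else:
                stats["values"].add(cleaned)
    return {
        column: {"missing": stats["missing"], "distinct": len(stats["values"])}
        for column, stats in profile.items()
    }


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
summary = profile_columns(rows)
for column, stats in summary.items():
    print(column, stats)
```

A summary like this tends to surface the questions an exercise wants users to ask for themselves: why `site` has three distinct values when there appear to be two sites, why `visit_date` mixes formats, and how the missing `reading` values should be handled.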
2 Advice and recommendations
Invite users to share their work. If users have followed the advice to document their work and thinking, encourage them to share their computational notebooks (including their results and outputs) on their own websites, in a GitHub repository, through social media, etc. This could be a great way for others to discover your exercise, and we could also link to a selection of these notebooks through the exercise page itself.
Think about building in hints and tips. Some users – depending on their prior level of experience – might appreciate some additional guidance here and there.
Collapsible callouts can make hints and tips visible only to those who want to see them.
Bring different resources together. Exercises provide an ideal opportunity to draw together different strands of the Real World Data Science platform: case studies could provide inspiration or pointers for how to tackle a particular challenge; explainers might contain useful information about the tools and techniques to apply to specific types of data; and if you’re looking for a suitable set of data, it may already be listed in our datasets section.