Data science skills

August 22, 2019

ds_for_research.png

Data science is dominating discussions in academia and the private sector these days. Everyone seems to agree that they need more data scientists, but few know how to hire them or grow them internally. I’ve developed the following diagram to help split out the various types of data science skills. I’m focusing mostly on research applications of data science, but there really isn’t a huge difference between research and business applications.

Along the top of this skills chart we have an axis going from local brain/computer skills to global/distributed cloud skills. I think this is an important way to organise skill development and learning, and it is something that Software Carpentry has acknowledged for a long time: learners benefit from mastering local-scale compute before they grow to “beyond-the-desktop” compute. So as we think about ways to build training in TensorFlow, scikit-learn, R, Python and cloud systems, we need to acknowledge that we can’t teach everything at once; learners first need a mental model for how things fit together.

This diagram is a stab at organising where and how skills might be developed, or not developed, depending on the research or business needs of the learner. Many researchers can have incredibly productive careers in their area of research without ever leaving the desktop. Others will find an immediate need to move into the cloud and may struggle if they don’t have a strong local-compute model. Still others will operate naively in the cloud, not really understanding how things work at a basic level, but with working mental models that help them to be productive.

To build a robust and agile community of data scientists, we need to acknowledge that all of these ways of using and seeing the infrastructure, methods and tools are OK.

What do you think is missing or needs to be added to this chart? Comment below or open an issue on GitHub.

High-resolution versions of the above graphic are available on GitHub.