Skill Set for Data Scientists

General approach

Image credit: elavateleaders.com

You can find many definitions of data science on the Internet. My preferred description comes from the Harvard Data Science Review (HDSR), where Rafael A. Irizarry defines data science as “an umbrella term to describe the entire complex and multistep processes used to extract value from data” (Irizarry, 2020). Following this definition, a strong data scientist needs expertise in the following areas (Diesinger, 2016):

  • Technical skills.
  • Analytical skills.
  • Business skills.

Technical skills usually relate to computing and coding abilities. Irizarry calls this area backend data science, or data engineering. Currently, the most popular languages for data science are R and Python (or a mix of both), although Julia may play a big role in the future. For example, in this tweet, Viral B. Shah explains that Julia is better than Python at high-performance computing.

Viral B. Shah and Elon Musk tweet

Additionally, technical skills include the capacity to manage various computational architectures, such as databases and operating systems, as well as related skills such as parallel computing and high-performance computing.

In summary, the ability to manage, cleanse, consolidate, and model data is a crucial requirement for data scientists. Diesinger (2016) also notes that some data scientist profiles focus only on this aspect.

Analytical skills, on the other hand, relate more to data analysis. Irizarry calls this branch frontend data science. This area can be further divided by task into data analysts and machine learning engineers. Data analysts are involved in modeling, simulation, and causal inference. Machine learning engineers design and develop prediction algorithms that need large amounts of data. Both roles usually require strong skills in advanced statistics, math, experiment design, research, and data visualization.

Finally, the goal of data science is to solve business problems using scientific approaches. The two previous skill areas focus on the scientific approach, but data scientists also need a good understanding of business processes in order to give effective solutions. It is also very easy to fall into the false authority fallacy when data scientists use technical terms to express their solutions. As a result, “it is important for data scientists to communicate effectively with business users utilizing business lingua” (Diesinger, 2016).

References

Cesar Conejo Villalobos
Graduate Student/Data Scientist

My research interests include anomaly detection, imbalanced data, and fraud detection.