In light of the recent news that my article was chosen for publication (among other great pieces by subject matter experts) in EMC Corporation's Proven Professional Knowledge Sharing Competition, I have decided to share a piece of the article now.
Lately I’ve been reading a lot about “what is a Data Scientist”, and after much research I want to showcase the qualities an individual must possess to don the white coat!
Data Scientists are unique individuals who push the boundaries of machine and human learning in an effort to discover what others cannot see. The role is a balanced one: it requires someone with the skills to organize, develop, create and share their work with colleagues and upper management. This section explains in detail three personas that every Data Scientist possesses: the Nerd, the Artist, and the Business Professional (Dutra 2015).
Figure 2 - The Nerd (Dutra 2015)
The roots of Data Science today lie largely in the more rigorous academic fields: mathematics, statistics and computer science. A Data Scientist is skilled to some degree (or at the very least has an interest) in these areas. A Data Scientist should also have some domain experience within their area of research. The combination of educational background and domain experience makes a Data Scientist well suited to creating statistical models. It is important that a Data Scientist have both experience in solving problems this way and the curiosity to do so. Data Scientists train these models over several data sets so that the results are as accurate as possible. If you want to be a Data Scientist, it’s probably best to read further on the statistical fundamentals below, as they will be the foundations of your analysis when working on an issue or topic:
- Linear/Logistic Regression
- Naïve Bayes Classifiers
- K-Means Clustering
- Decision Trees
- Autoregressive integrated moving average (ARIMA) modeling
- Analysis of variance (ANOVA) modeling
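To make the first item on this list a little more concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares, using only the Python standard library. The function name `fit_line` and the hours/scores numbers are made up purely for illustration; in practice a Data Scientist would more likely reach for a statistics package than write this by hand.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: hours studied vs. test score (illustrative only).
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 61, 63, 70]

slope, intercept = fit_line(hours, scores)
print(f"score ~ {slope:.2f} * hours + {intercept:.2f}")
```

The fitted line can then be used to estimate a score for an unseen number of hours, which is the "train on data, then predict" pattern the other techniques on the list share.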
There are also less mathematical, more computer-science-based principles on which a Data Scientist builds models. One example is text mining. Text mining (also referred to as “text analytics”) is a way to derive value from unstructured, qualitative data. I’ll show an example of this later in the article, but for now, the key point is that a Data Scientist must be well versed in each discipline.
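As a rough illustration of the idea (this is not the example from the full article), one of the simplest forms of text mining is counting which terms occur most often across a set of documents. The sketch below uses only the Python standard library; the `top_terms` helper, the tiny stop-word list and the sample reviews are all hypothetical.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list, assumed for this example only.
STOP_WORDS = {"the", "a", "is", "and", "to", "of", "was", "but", "it"}


def top_terms(documents, n=3):
    """Return the n most frequent non-stop-word terms across documents."""
    counts = Counter()
    for doc in documents:
        # Lowercase the text and split it into simple word tokens.
        tokens = re.findall(r"[a-z']+", doc.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)


# Made-up customer reviews standing in for unstructured, qualitative data.
reviews = [
    "The delivery was late and the support was slow.",
    "Support was helpful, but the delivery was late again.",
    "Great product, late delivery.",
]

top = top_terms(reviews, n=2)
print(top)
```

Even a count this basic starts to turn free text into something measurable: here the recurring terms point at late deliveries as the common complaint.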
These fundamental mathematical, statistical and computer science principles, backed by a solid understanding of scripting languages and a basic understanding of Big Data hardware functions, are important aspects of a Data Scientist's domain. Remember, these are areas of knowledge that can be taught. It is crucial that an aspiring Data Scientist learn and practice these fundamentals as a daily routine. It is closed-minded to state that only individuals who hold a degree in a mathematical or computer-based field fit the bill. Data Scientists are curious individuals who challenge the norm, and there are other, more natural, aspects of a Data Scientist that are crucial to the position.
These aspects are derived from a Data Scientist's passion for finding and interpreting rich data sources. A Data Scientist must be able to manage large amounts of data despite hardware, software, and bandwidth constraints, merge data sources, ensure consistency of datasets and build mathematical models using such data. They must be able to utilize tools and learn the newest technologies as quickly as possible. For most people, these tasks seem like a chore; for Data Scientists, they are only a portion of the job. An aspiring Data Scientist needs to embody and share the love of data analysis.
Figure 3 - The Artist (Dutra 2015)
It’s one skill to analyze the data; it’s another to create with it. Our world is more visual than ever: we are constantly viewing videos and sharing images with one another. Each day a new photo or video is created in an attempt to make a point on a specific topic. If the piece fails to grab the attention of its audience, the intended message is lost. In order to capture an audience, we must get them to listen with their eyes. The creative element of a Data Scientist is arguably the most important aspect of the role. It is not enough to be able to find patterns amongst the numbers; a Data Scientist has to be able to paint the picture of the data and inspire change amongst their peers and superiors. A Data Scientist must be critical of their work and open to ideas and input from their colleagues.
This artistic function of Data Science requires an individual with visualization skills, meaning that they, as the artist, have the ability to work the results of their findings into a comprehensive and creative masterpiece using visualization tools.
A Data Scientist is passionate about the business they are involved in. The expertise in their respective domains gives an edge to the Data Scientist over the typical analyst, so it is crucial that when creating these visualizations, the passion is demonstrated in their work.
Figure 4 - The Business Professional (Dutra 2015)
Even with the math skills of a thousand computational devices and the artistic skillset of Pablo Picasso, at the end of the day the Data Scientist is still all business. A Data Scientist must have basic business soft skills in order to be successful. A Data Scientist must proactively look towards the broader landscape of any problem, and be able to explain the value of their contributions to the larger body of work. Of course, obstacles of this kind cannot be taken on alone, so a Data Scientist usually works as part of a team (similar to other delivery roles such as program management, solutions specialists, etc.) whose members have knowledge and skills that complement their own.
By far one of the most critical aspects of a Data Scientist's job is presenting and communicating not only the resulting insights of the data, but also their value, to the colleagues, specialists and upper management on their team. Story-telling skills are crucial to explaining the value of a long, resource-intensive project.
Final Thought on the Three Personas
The Nerd. The Artist. The Business Professional (Dutra 2015). In my findings, these three personas comprise the essence of every Data Scientist. All of them can be developed by any individual striving to don the white coat. Data Science is an intense discipline, blending many skillsets into one role.
Hope you enjoyed my post! Look for my full article in the coming months!