Machine learning and artificial intelligence are buzzwords that span well beyond the technology industry. No matter the industry, most companies are trying to figure out how to make the best use of their data and implement some type of ML and AI into their workflows.
To make the most of ML and AI, an organization needs data scientists, and not just one. It needs a team of data scientists with a broad combination of skills that enable the organization to get the most out of an ML or AI implementation.
Leveraging data science has an almost immediate return on investment for any organization. According to McKinsey & Co., 69% of problems can be solved better with data science techniques.
Take consumer goods: traditional forecasting techniques are about 68% accurate. That isn't bad, and it's enough to get the job done, but it's not great. According to McKinsey, even a modest 10-20% improvement in forecast accuracy results in an average 5% reduction in inventory costs and a 2-3% increase in revenue for that organization.
Data science techniques like machine learning can often deliver that modest improvement. Forecasting, for example, is low-hanging fruit for a data scientist: by adding external features, such as holiday and promotion calendars, to a standard ARIMA model, you'll often get that 10-20% increase.
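The mechanism behind that forecasting upgrade can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production method: instead of a full ARIMA model it fits a simple AR(1) model by least squares, then adds a promotion-calendar indicator as an exogenous regressor (an ARX model; libraries such as statsmodels expose the same idea through the `exog` argument of `SARIMAX`). The demand series, the every-eighth-week promotion calendar and the +30 promotion lift are all synthetic, hypothetical values, and the comparison says nothing about the 10-20% figure; it only shows how a calendar feature enters the model.

```python
import random


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def make_series(n=100, seed=0):
    """Synthetic weekly demand: AR(1) around 100 plus a hypothetical +30 lift in promo weeks."""
    rng = random.Random(seed)
    promo = [1 if t % 8 == 5 else 0 for t in range(n)]  # hypothetical promotion calendar
    y = [100.0]
    for t in range(1, n):
        y.append(50 + 0.5 * y[-1] + 30 * promo[t] + rng.gauss(0, 3))
    return y, promo


def features(y, promo, t, use_calendar):
    row = [1.0, y[t - 1]]  # intercept and last week's demand (the AR(1) term)
    if use_calendar:
        row.append(float(promo[t]))  # promotions are planned ahead, so known at forecast time
    return row


def holdout_mae(y, promo, use_calendar, split=80):
    """Fit on weeks 1..split-1, then score one-step-ahead forecasts on the remaining weeks."""
    train = range(1, split)
    w = least_squares([features(y, promo, t, use_calendar) for t in train],
                      [y[t] for t in train])
    errors = []
    for t in range(split, len(y)):
        forecast = sum(wi * xi for wi, xi in zip(w, features(y, promo, t, use_calendar)))
        errors.append(abs(y[t] - forecast))
    return sum(errors) / len(errors), w


y, promo = make_series()
ar_mae, _ = holdout_mae(y, promo, use_calendar=False)
arx_mae, arx_w = holdout_mae(y, promo, use_calendar=True)
print(f"AR(1) MAE: {ar_mae:.2f}  ARX MAE: {arx_mae:.2f}  estimated promo lift: {arx_w[2]:.1f}")
```

On this synthetic data the calendar-aware model's held-out error should come out well below the plain AR(1) model's, because the plain model cannot anticipate promotion spikes. In practice you would use a maintained forecasting library and validate against your own history.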
The key to bringing this kind of value to an organization is a well-rounded data science team. There are specific skills to look for when hiring data scientists that can be broken down into hard skills and soft skills.
Communication – Communication matters in many positions, but in data science it carries particular weight. Technical people can't spend all day talking only to other technical people. To be successful, your technical people need to communicate effectively with non-technical people.
Storytelling – In that same vein, your data scientists need to be able to tell the story of the work they're doing. If an organization is working on a proof of concept (POC) but your data scientists can't explain to leadership the value that project will bring, the project will never move beyond the POC.
Collaboration – While most employees need to be able to collaborate, data scientists need to be able to collaborate outside of their technical world. They must collaborate with people who don’t know what a neural network is. Your data scientists may have to build one, but they also have to work with people who don’t know what it is and don’t want to be taught, so collaboration is critical.
Business acumen – This is an often overlooked trait of a data scientist. You need people who understand the big picture. They may be able to build a cool model to predict how long the lunch line will be in the cafeteria, but if there’s no ROI for the business, why would they do it? Your data scientists have to be able to prioritize what features should be added to projects and what projects should be pursued.
Leadership – This is important for any team, and it is equally critical for a team of data scientists.
Problem solving – If I had to pick one trait for data scientists, it would be problem solving. It isn't enough to be experts in machine learning and deep learning; they have to be able to talk to stakeholders, understand their challenges and then turn those challenges into a well-defined problem statement that an algorithm can solve.
Software development – There are many open-source tools to help data scientists, such as Jupyter notebooks, but at some point data scientists have to pull code out of those tools and put it into production, and that is when software development skills are needed.
Data wrangling – This skill goes well beyond being handed an Excel file and knowing what to do with it. You need people who can hunt down and scrape raw data, clean it, load it into databases for the rest of the team and transform it as needed.
Advanced mathematics – Data scientists are often described as statisticians. To understand how machine learning and deep learning algorithms work and be able to excel at attacking those kinds of problems, data scientists must possess advanced mathematics skills.
Machine learning – Although there is a lot of buzz around this phrase, there is some real merit to it. It’s important to have data scientists who know how to choose the right tool from the huge number of options available in the machine learning toolkit.
Deep learning – Although deep learning is a subset of machine learning, treat the two as separate disciplines, because you would not usually invest in both at the same time within your team. Designing experimental network architectures is an exciting area of research, but it is usually pretty far down the list of priorities for most data science teams.
Building out a data science team is a balance of finding a diverse set of data scientists who not only cover all the data science bases but also work well with the business side of the house. It should also happen in phases because when you’re just getting started, you likely don’t need a Ph.D. statistician on your team just yet.
Phase 1 - Prioritizing communication between teams
Different phases of a data science project require different things. When you're testing the waters, trying to decide whether data science can help your organization with a problem, you need to be able to demonstrate value to the business.
The people you should look for in this phase have average data science hard skills but lean heavily on soft skills. Early in a project, data scientists interface frequently with the business side of the house, trying to identify problems worth solving. Once the team has built some POCs, they have to tell the stories that get stakeholders on board, explaining clearly why the product will be useful, how it will create value and how it will impact the business. The people needed in this phase can often be found internally: they are frequently rockstar analysts who have been around a while.
Phase 2 - Managing data in production
As you go from exploration to productizing and need to expand your data science team, you should look for those with higher skill levels in a few key areas.
Organizations need data science engineers who specialize in software development and data wrangling. These are the people who will help you productize what you have. They're better equipped to manage data consistently in production and maintain models over the long term. Data science in production is different from other software in production: as your organization ingests more data, models can stop performing as well as they once did, and that degradation has to be tracked in production. That is where data science engineers thrive.
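Tracking whether a model has stopped performing as it once did can start very simply. The sketch below is a hypothetical monitor, not a specific product or library: it keeps a rolling window of absolute prediction errors and flags possible degradation when the recent mean error exceeds a baseline (measured, say, on a validation set before deployment) by a chosen factor. The class name, the window size and the 1.5x threshold are illustrative assumptions.

```python
from collections import deque


class ErrorDriftMonitor:
    """Hypothetical monitor that flags possible model degradation in production.

    Compares the mean absolute error over a rolling window of recent
    predictions against a baseline error measured before deployment.
    """

    def __init__(self, baseline_mae, window=50, threshold=1.5):
        self.baseline_mae = baseline_mae
        self.threshold = threshold  # assumed tolerance: alert when recent MAE > threshold x baseline
        self.recent = deque(maxlen=window)  # oldest errors drop out automatically

    def update(self, actual, predicted):
        """Record one prediction once its true value is known; return the current flag."""
        self.recent.append(abs(actual - predicted))
        return self.drifting()

    def recent_mae(self):
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def drifting(self):
        # Only judge a full window, so a handful of noisy early points can't trigger an alert
        full = len(self.recent) == self.recent.maxlen
        return full and self.recent_mae() > self.threshold * self.baseline_mae


# Usage: errors of 2 match the baseline; errors of 5 are 2.5x the baseline
monitor = ErrorDriftMonitor(baseline_mae=2.0, window=5, threshold=1.5)
before = [monitor.update(actual=10, predicted=8) for _ in range(5)]
after = [monitor.update(actual=10, predicted=5) for _ in range(5)]
print(before[-1], after[-1])  # → False True
```

A rising error alone won't explain why performance dropped (new data sources, shifted customer behavior or upstream pipeline changes); diagnosing the cause is still the data science engineer's job.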
Phase 3 - Strengthening your leadership
Once you've had success and put that product into someone's hands, the next step is to fan out and find other problems to solve. This is when an organization needs to start bringing in data science leads: for each effort, you need someone who can guide it. The skill set is similar to phase 1, with an emphasis on soft skills, but leads need more experience in those areas. They also need more expertise in machine learning, advanced math, problem solving and leadership, so you will likely need to hire them externally. Leading a team of data scientists can be like herding cats. Data scientists are highly technical people who know they're smart, so personality conflicts happen; it's part of the process. The key is hiring a leader with enough technical merit that the other members of the team see that person as a leader. If you bring in someone high on business acumen but lower on technical skills, your team will be hesitant to follow.
Phase 4 - Developing customized tools for your business
Once you've identified other problems to solve and are ready to productize, you can't follow the same path you did in phase 2. This is the time to bring in data science developers: people who live and breathe code and build frameworks themselves rather than working in Jupyter notebooks. It is also a good time to bring in more machine learning expertise, because building frameworks and productizing machine learning tools are challenges distinct from ordinary software development. These people also need strong collaboration skills, because they'll work across teams and must understand each team's challenges and come up with patterns that work for everyone. This phase doesn't always require external hires; you can also level up people on your existing team.
Phase 5 - Bringing in hyperspecialists
When you make it to this point, you're on the bleeding edge of AI and machine learning. You've productized many things and built a team of experts. The only hard skills left to cover are advanced math and deep learning. This is where advanced math becomes essential: at the bleeding edge of research, understanding the underlying math is critical. This phase is also where you will start building many frameworks in-house, so you'll need rockstars in data wrangling and software development.
If this all sounds a little overwhelming, you can work with Blueprint, an experienced technology partner whose team of data scientists has diverse specializations. Let's have a conversation about how our data scientists can apply their skills to help you get the most out of an ML/AI implementation.