However, organizations need not fork out on human capital anymore: easy-to-use software is democratizing even those complex tasks that previously required data scientists. Artificial intelligence tools like DataRobot and AutoML help build models without the need to code. There are solutions to convert speech to text or detect objects, made available free of charge by open-source pioneers such as Hugging Face. This is great news for organizations that want to put data at the core of their business model.
The commoditization of data science
For most data scientists, a considerable chunk of the work involves “scrubbing” the data: collecting, cleaning, structuring, and processing it to avoid creating fatally flawed algorithms. But in the next few years, we expect that menial data preparation will be carried out using software tools, such as IBM InfoSphere Information Server or innovative solutions from newcomers like Sweephy, a low-code data cleaning startup. Gartner predicts that 70% of new applications developed by organizations will use low- or no-code solutions by 2025, up from under 25% in 2020.
Even the tasks where data scientists once stood out due to their creativity, analytical strength, and business acumen are slowly being automated. That includes the creation of algorithms, which transform clean and reliable data into predictions and recommendations. The rapid growth and adoption of generative AI tools, such as ChatGPT, have given even more impetus to this automation. Specifically, three forces are slowly eroding the need for these individuals.
First, we are starting to see large-scale commoditization of frequently used data services such as recommender engines, chatbots, and matching systems. The argument for building a customized data service is quickly losing ground in the face of emerging libraries of affordable and off-the-shelf algorithms and low-code solutions.
Second, the sheer volume of data, rather than the algorithm itself, is what is most valuable to organizations, especially those using advanced machine-learning techniques. The importance of volume over sophistication is visible in many winning applications, including TikTok, Spotify, and Netflix, which all leverage simple recommendation engines to generate value for users.
Third, many data scientists are technical specialists who lack a business background. This can lead to a lack of appreciation of the actual problems their tool is meant to address, as well as a misunderstanding of how it will be used by customers.
The data scientist: 2.0
Taken together, these three forces are leading to a different role for the data scientist: from that of an astronaut (using state-of-the-art technology to venture into uncharted territory) to that of a champion race-car driver (using standardized technologies for real-world navigation).