The data science field has changed greatly with the advent of AI. Artificial intelligence has enabled the rise of citizen data scientists and the automation of data scientists’ workloads, and has increased the need for more skilled data scientists.
Vincent Granville, co-founder of Data Science Central, a community and resource site for data specialists, expects to see an increase in AI and IoT in data science over the next few years, even as AI continues to change the data science field.
In this Q&A, Granville discusses data science trends, the impact of AI and IoT on data scientists, how organizations and data scientists will have to adapt to increased data privacy regulations, and the evolution of AI.
Data Science Central was acquired by TechTarget on March 4.
Will an increase in citizen data scientists due to AI, as well as an increase in formal data science education programs, help fix the so-called data scientist shortage?
Vincent Granville: I believe that we will see an increase on two fronts. We will see more data science programs being offered by universities, perhaps even doctorates in addition to master's degrees. We will also see more bootcamps and online training aimed at practitioners who work with data but lack some skills, such as statistical programming or modern techniques such as deep learning, which is an old idea that became popular recently thanks to the computational power now available to train and optimize these models.
There is also a parallel trend that will increase: hiring professionals not traditionally thought of as data scientists, such as physicists, who have significant experience working with data. This is already the case in fintech, where these professionals learn the new skills required on the job. Along with corporations training staff internally by sending selected employees to tech and data bootcamps, this will help increase the pipeline of potential recruits for the needed positions.
Also, AI itself will help build more tools to automate some of the grunt work, like data exploration, that many data scientists do today and that currently eats up to 80% of their time. Think of it as AI to automate AI.
Similarly, how has data science changed, or how will it change, with the advent of AI that can automate various parts of the data science workflow?
Granville: We will see more automation of data science tasks. In my day-to-day activities, I have automated as much as I can, or outsourced or used platforms to do a number of tasks — even automating pure mathematical work such as computing integrals or finding patterns in number sequences.
The issue is resistance by employees to using such techniques, which they may perceive as a way to replace them. But the contrary is true: Anything you do [manually] that can be automated actually lowers your job security. A change in mentality must occur for further adoption of automated data science for specific tasks, simple or not so simple, such as the creation of taxonomies, or programs that write programs.
The trend probably started at least 15 years ago, with the advent of machine-to-machine communications, using APIs and the internet at large for machines, aka robots, to communicate among themselves, and even make decisions. Now, with a huge amount of unexploited sensor data available, it even has a term of its own: IoT.
An example is this: EBay purchases millions of keywords on Google; the process, including predicting the value, ROI and set[ting] the pricing for keywords, is fully automated. There is a program at eBay that exchanges info with one running at Google to make this transparent, including keyword purchasing, via programmed APIs. Yet eBay employs a team of data scientists and engineers to make sure things run smoothly and are properly maintained, and the same goes for Google.
How will increased data privacy regulations and a larger focus on cybersecurity change data science?
Granville: It will always be a challenge to find the right balance. People are realizing that their data is worth something, more than just $20, and don’t like to see this data sold and resold time and again by third parties, or worse, hijacked or sold for nefarious purposes such as surveillance. Anything you post on Facebook can be analyzed by third parties and end up in the hands of government agencies from various countries, for profiling purposes, or detection of undesirable individuals.
Some expectations are unrealistic: You cannot expect corporations to tell what is hidden in the deep layers of their deep learning algorithms. This is protected intellectual property. When Google shows you search results, nobody, not even Google, knows how what you see — sometimes personalized to you — came up that way. But Google publishes patents about these algorithms, and everyone can check them.
The same is true of credit scoring and the refusal to offer a loan. I think in the future, we will see more and more auditing of these automated decisions. Sources of bias will have to be found and handled. Sources of errors due to ID theft, for example, will have to be found and addressed. The algorithms are written by human beings, so they are no less biased than the human beings who designed them in the first place. Some seemingly innocuous decisions, such as deciding which features, or variables, to introduce in your algorithm, potentially carry a bias.
I could imagine some companies [may] relocate … or even stop doing business altogether in some countries that cause too many challenges. This is more likely to happen to small companies, as they don’t have the resources to comply with a large array of regulations. Yet we might see in the future AI tools that do just that: help your business comply transparently with all local laws. We have that already for tax compliance.
What other data science trends can we expect to see in 2020 and beyond?
Granville: We live in a world with so many problems arising all the time — some caused by new technologies. So, the use of AI and IoT will increase.
Some problems, such as fake news or robocalls, will find solutions in the next few years, though it may take time: it took over 10 years to fix email spam. But it is not just a data science issue: If companies benefit financially in the short term from the bad stuff, like more revenue to publishers because of fake news or clickbait, or more revenue to mobile providers due to robocalls, it needs to be addressed with more than just AI.
Some industries evolve more slowly and will see benefits in using AI in the future: Think about automated medical diagnostics or personalized dosing of drugs, small lawsuits handled by robots, or even kids at school being taught, at least in part, by robots. And one of the problems I face all the time with my spell-checker is its inability to detect whether I am writing in French or English, so it creates new typos rather than fixing them.
Chatbots will get better too, eventually, for tasks such as customer support, or purchasing your groceries via Alexa without even setting foot in a grocery store or typing your shopping list. In the very long term, I could imagine the disappearance of written language, replaced by humans communicating orally with machines.