By Abhijit Dasgupta, Director, Bachelor of Data Science program, SP Jain School of Global Management
In the second decade of the 21st century, the buzz is all about machine learning, artificial intelligence et al. Open a corporate news article and you will find how the company is planning or executing projects on artificial intelligence. A random Google search for ‘Data Science’ yields about 2,90,00,00,000 results, ‘Machine Learning’ about 2,43,00,00,000 results and ‘Artificial Intelligence’ about 88,10,00,000 results. These fields are all inter-related and mostly form the core subjects of a data science course. Interestingly, disciplines like Management Information Systems have been in place since the 1980s, and the work largely involved data wrangling, cleaning and presentation, often providing insights into the problem a company was trying to solve. So, what changed?
At the beginning of the 1990s, with the advent of the Internet, firms started generating huge amounts of data. Many of these were new companies working primarily online, such as Amazon and eBay (the early pioneers of internet retail), Hotmail for email, and the Napster music-sharing platform. Since the market was the whole world, the data generated by business activities increased substantially. The technologies available then struggled to keep up, and companies came out with newer technologies such as virtualization of compute platforms, storage and networks. The early 2000s saw the rise of companies like Google, Facebook and Amazon Web Services, which constantly pushed the boundaries of the available technologies. Towards the latter half of the decade, technologies like Hadoop and Spark were born, allowing users to store and process very large amounts of data. Along with these newer technologies for processing, storing and sharing data, companies came up with packaged solutions integrating machine learning algorithms and made them open source. This process of democratization happened quite fast, all in about 7-10 years, and made technologies that were once used only by large, rich companies available to any practitioner, free of cost. Hence the data revolution started, and today there is hardly any area where data science isn’t used. Below are a few such examples:
Marketing: Every student of marketing would have done a course on segmentation-targeting-positioning and often found it enormously complex to use customer data and generate the matrices needed to come up with a solution. Not anymore; today’s students use free libraries in Python or R to do this simple-looking (yet mathematically challenging) work in a jiffy on their desktops. In India we have come across OTT platforms, and one of the key technologies that drives their marketing is called ‘Recommender Systems’. You watch a movie on Netflix, and you get recommendations on what to watch next. Recommender systems are also ubiquitous in internet retail stores like Flipkart and Amazon.
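To make the idea of a recommender system concrete, here is a minimal sketch of one common approach, item-based collaborative filtering with cosine similarity. It is not how Netflix, Flipkart or Amazon actually build their systems; the viewers, titles, ratings and function names are all made up for illustration, and only the Python standard library is used.

```python
from math import sqrt

# Hypothetical ratings (1-5) given by viewers to titles; illustrative data only.
ratings = {
    "asha":  {"Thriller A": 5, "Thriller B": 4},
    "bilal": {"Thriller A": 5, "Thriller B": 5, "Thriller C": 4},
    "chen":  {"Comedy D": 5, "Comedy E": 4, "Thriller A": 1},
    "dev":   {"Comedy D": 4, "Comedy E": 5},
}


def item_vector(title):
    """Ratings for one title, keyed by the viewer who gave them."""
    return {user: r[title] for user, r in ratings.items() if title in r}


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, top_n=1):
    """Rank unseen titles by how similar they are to titles the user rated highly."""
    seen = ratings[user]
    all_titles = {t for r in ratings.values() for t in r}
    scores = {}
    for candidate in all_titles - set(seen):
        cand_vec = item_vector(candidate)
        # Weight each similarity by the rating the user gave the seen title.
        scores[candidate] = sum(
            cosine(cand_vec, item_vector(title)) * rating
            for title, rating in seen.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("asha"))
```

In this toy data, the viewer who liked two thrillers is pointed to the third thriller, because another viewer with similar taste rated it well. Production systems work on the same intuition but with far larger data and more sophisticated models.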
Finance: Ever wondered how a Wall Street investment banker trades? The banker looks up past data and a few other resources such as Bloomberg to take a calculated guess on whether to buy or sell a certain stock, and uses Time Series Analysis (a well-known statistical method) to predict price movements. In banks, lending has always been fraught with risk. It is possible to assess the risk of a few loans individually, but what happens when there are millions of customers? Artificial intelligence powered software agents can automatically identify the risk of lending, or flag a transaction as probably fraudulent.
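As a small illustration of time series analysis, the sketch below applies simple exponential smoothing, one of the most basic forecasting techniques, to a short price series. It is only a teaching example and not a trading method; the prices, the smoothing factor `alpha` and the function name are assumptions made for illustration.

```python
def exponential_smoothing(prices, alpha=0.5):
    """Return the smoothed series; its last value is a one-step-ahead forecast."""
    if not prices:
        raise ValueError("prices must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [prices[0]]
    for price in prices[1:]:
        # Blend the newest observation with the previous smoothed value.
        smoothed.append(alpha * price + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical daily closing prices; illustrative numbers only.
closing_prices = [100.0, 102.0, 101.0, 105.0, 107.0]
forecast = exponential_smoothing(closing_prices, alpha=0.5)[-1]
print(forecast)
```

A higher `alpha` makes the forecast react faster to recent prices, while a lower one smooths out short-term noise. Real-world models add trend, seasonality and many external signals on top of this idea.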
Healthcare: Healthcare analytics is the process of analyzing current and historical industry data to predict trends, improve outreach, and even better manage the spread of diseases. It can reveal paths to improvement in patient care quality, clinical data, diagnosis, and business management.
In the times of Covid, it is important to understand how data science is helping the healthcare sector. A statistical tool called ‘Design of Experiments’ can help create the matrices of candidate drug combinations and identify the most effective ones. This method is widely used by vaccine, API (active pharmaceutical ingredient) and pharma companies around the world to roll out drugs, vaccines and supplements.
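The ‘matrix’ in Design of Experiments can be pictured with the simplest design of all, a full factorial, in which every level of every factor is tried with every level of the others. The sketch below builds such a matrix; the factor names and levels are hypothetical, and real studies often use more economical designs (fractional factorial, for example) and then analyze a measured response for each run.

```python
from itertools import product

# Hypothetical factors and levels; names and values are illustrative only.
factors = {
    "drug_a_dose_mg": [0, 50],
    "drug_b_dose_mg": [0, 100],
    "schedule": ["once daily", "twice daily"],
}


def full_factorial(factors):
    """Return one run (a dict of factor -> level) for every combination of levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


design = full_factorial(factors)
for run_number, run in enumerate(design, start=1):
    print(run_number, run)
```

With three factors at two levels each, the design has 2 x 2 x 2 = 8 runs. The number of runs grows quickly as factors are added, which is why statisticians have developed smaller designs that still allow the main effects to be estimated.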
Climate Change: Many scientists agree that we are already late in acting, and people are only now becoming conscious of the problem. With people comes politics, and with politics comes money. That is why the coming years will see a major push towards research in the energy sector, and data science is going to play a big role in this battle. Finding new patterns in data is a clear path to powerful solutions for our energy-hungry world. Predictive analytics tools are being used extensively to identify and solve problems associated with climate change. DeepMind, an AI company acquired by Google, has helped Google shrink its carbon footprint by reducing the energy used for cooling its data centres by up to 40%.
Marine Biology: Here, data science is all about uncovering findings from marine data, such as oceanographic data collected using both in situ methods and remote sensing. It typically starts with data exploration, followed by quantitative techniques drawn from mathematics, statistics, information science, and computer science in order to go a level deeper, e.g., inferential models, segmentation analysis, time series forecasting or synthetic control experiments. The overall intent is to scientifically piece together a forensic view of what the data is really saying about marine system dynamics. The data used, however, need not be exclusively marine biotic and abiotic data. In fact, any type of data describing a climatological, terrestrial, or socio-economic component that could affect the marine system can be analyzed. Amongst these, regional and large-scale climatological indices and fisheries, agricultural, and demographic data are commonly used to study the extent of external forcing on the system and potential feedback mechanisms.
Agriculture: Data science is changing the way farmers and agricultural professionals make decisions. Modern technology has made it possible to collect data on soil, water, and minerals from farms through connected sensors, an approach popularly known as the Internet of Things (IoT), and to store it in a centralized system. IoT refers to the idea of connecting interrelated devices to the Internet so that they can share and exchange data independently. Such data can be combined with data from external sources such as satellites, weather stations and even neighboring farms to form a bigger volume. Data analytics can then be applied to the accumulated data to obtain information that farmers can use to optimize their farming. Farmers can thus make smart decisions throughout the production cycle, from planning and plantation to harvesting and all the way to marketing the produce.
Similarly, other areas that use data science extensively include transportation and logistics, environmental engineering, government bodies trying to identify possible terrorists, robotic warfare, and the list goes on. It’s a good time to learn data science, since the future is largely going to be driven by collaboration between man and machine, and this is where newer technologies like AI are slowly going to become mainstream. There will be a definite demand for the skills, and the people, needed to drive those systems.