Following the path of digitalization in Slovenia and Europe: World of data: where the faster beat the bigger

In the age of digitalization, more and more companies are facing huge amounts of data that are generated very quickly by many different sources. We call this big data. What can we do with big data, and what role does it play in the field of data science? How can big data impact businesses, the economy and society? Let’s try to explain some basic concepts and connect them into a meaningful, perhaps even understandable, whole.

Various data can be created by people or generated by machines or devices, such as sensors gathering climate information, satellite imagery, digital pictures and videos, purchase transaction records, GPS signals, and other information. The main advantage of big data analytics is that it can reveal patterns and connections between different sources and datasets, allowing for useful insights and better decisions.

On its website, the European Commission cites healthcare, manufacturing, food security, intelligent transport systems, energy efficiency and urban planning as examples of areas where big data is used. Applying big data in these areas ultimately allows for increased productivity and better services, which are a source of economic growth.

Generating value at the different stages of the data value chain will be at the heart of the future knowledge economy.  Photo: Getty images.

The future is in the data

Generating value at the different stages of the data value chain will be at the heart of the future knowledge economy. Improved analytics and processing of data, especially of big data, will enable the transformation of Europe’s service industries by generating a wide range of innovative information products and services.

According to the European Commission, analytics and processing of big data will increase the productivity of all sectors of the economy through improved business intelligence and enable more efficient solutions to many of the challenges that face our societies. The Commission also expects improved research and faster innovation, cost reductions through more personalised services, and increased efficiency in the public sector.

Due to the exponential growth of the volume, variety and velocity of data, data sets are becoming increasingly difficult to capture, manage and process with conventional means. Getting value from the vast amounts of data that users generate daily has become crucial for companies such as Google and Facebook.

In doing so, such companies benefit from real-time market data, which they use to make decisions easier for other companies; this in turn can lead to higher revenues and lower costs. Analysing large volumes of data can provide detailed business information on customer behaviour or consumer profiles.

Data science

Big data is essentially a special application of data science, which involves many specific domains and skills. The general definition is that data science encompasses all the ways in which information and knowledge are extracted from data.

As already mentioned, data is everywhere and is found in huge and exponentially increasing quantities. Data science as a whole reflects the ways in which data is discovered, conditioned, extracted, compiled, processed, analysed, interpreted, modelled, visualized, reported on, and presented regardless of the size of the data being processed. Big data is therefore a special application of data science.

Data science is a very complex and intertwined field, as it incorporates mathematics, statistics, computer science and programming, statistical modelling, database technologies, signal processing, data modelling, artificial intelligence and machine learning, natural language processing, visualization, predictive analytics and so on. It is applicable to all the areas we have mentioned in connection with big data, and to many others.

How is the data processed?

The life cycle of useful data, collected in various ways, usually includes its capture, pre-processing, storage, retrieval, post-processing, analysis, visualization, and so on. Once captured, data is usually referred to as being structured, semi-structured, or unstructured. These distinctions are important because they are directly related to the type of database and storage technologies required, the software and methods used to query and process data, and the complexity of dealing with the data.

Structured data refers to data that is defined by a structure or schema in a database or spreadsheet. Unstructured data is data that is not defined by any schema, model, or structure, and is not organized in a specific way. In other words, it is just raw data that has been stored. It follows naturally that semi-structured data is a combination of the two.
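To make the three categories more concrete, here is a minimal sketch in Python. All sample values, field names and the two helper functions (total_amount, field_names) are hypothetical and chosen only for illustration; they do not come from any real system.

```python
import csv
import io
import json

# Structured: every row follows the same fixed schema (the CSV columns).
structured_csv = "customer_id,amount,currency\n101,19.99,EUR\n102,5.50,EUR\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: records describe themselves with keys,
# but individual records may have different shapes.
semi_structured = json.loads(
    '[{"customer_id": 101, "tags": ["new"]},'
    ' {"customer_id": 102, "gps": {"lat": 46.05, "lon": 14.51}}]'
)

# Unstructured: raw text with no schema; any structure has to be inferred.
unstructured = "Customer 101 called on Monday and asked about a late delivery."


def total_amount(records):
    """Sum the 'amount' column; this works because the schema is known."""
    return sum(float(record["amount"]) for record in records)


def field_names(records):
    """Collect the union of keys seen across semi-structured records."""
    return sorted({key for record in records for key in record})


print(total_amount(rows))
print(field_names(semi_structured))
```

The structured rows can be queried directly by column name, the semi-structured records first need their fields discovered, and the unstructured sentence would require text processing before anything could be computed from it.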

In order for data to be used in a meaningful way, it must first be captured, pre-processed and stored, experts say. Following this process, the data can be mined, processed, described, analysed, and used to build models that are both descriptive and predictive. Descriptive statistics is the application of statistics to a data set in order to describe and summarize the information that the data contains. It basically covers describing the data, along with other forms of analysis and visualization.
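As a small illustration of descriptive statistics, the sketch below summarizes a made-up list of daily purchase counts using Python's standard statistics module. The data and the describe helper are assumptions introduced only for this example.

```python
import statistics

# Hypothetical number of purchases recorded on seven consecutive days.
daily_purchases = [12, 15, 11, 19, 15, 22, 18]


def describe(values):
    """Return a few common summary measures for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


summary = describe(daily_purchases)
for name, value in summary.items():
    print(name, value)
```

These figures only describe the days that were actually observed; they say nothing yet about days for which no data was collected.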

Inferential statistics and data modelling, on the other hand, are very powerful tools that can be used to gain a deep understanding of the data and to predict outcomes for conditions beyond those for which data has been collected. Using certain techniques, models can be created, and decisions can be made dynamically based on the data involved.
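One of the simplest predictive models is a straight line fitted by ordinary least squares. The sketch below is only one possible illustration, not a method prescribed by the article: the data points and the function names fit_line and predict are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Estimate y for an x value, including one outside the observed range."""
    return intercept + slope * x


# Hypothetical observations, for example weeks 1 to 5 and units sold.
weeks = [1, 2, 3, 4, 5]
units = [2, 4, 5, 4, 5]

slope, intercept = fit_line(weeks, units)
forecast = predict(6, slope, intercept)  # week 6 was never observed
print(slope, intercept, forecast)
```

The forecast for week 6 is a prediction for a condition outside the collected data, so it should be treated as an estimate whose reliability depends on how well a straight line really describes the underlying process.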

What have we learned?

We have never before collected as much varying data as we do today, nor have we needed to handle it as quickly. The variety and amount of data that we collect through different mechanisms is growing exponentially. This growth requires new strategies and techniques to capture, store, process, analyse and visualize data.

Data science is therefore an umbrella term that encompasses all of the techniques and tools used during the life cycle stages of useful data. On the other hand, big data typically refers to extremely large data sets that require specialized and often innovative technologies and techniques in order to use data efficiently.

Both of these fields are going to get bigger and become much more important with time. The demand for qualified practitioners in both fields is growing rapidly and they are becoming some of the hottest and most lucrative fields to work in. Armed with at least a basic explanation of key concepts involved with data science and big data, you may now be better able to understand some of the other technologies we already have or are going to introduce.

Author: Rok Žontar

Keywords: Big Data, Data science, technology, digitization, European Commission.

Disclaimer:

This article is part of a joint project of the Wilfried Martens Centre for European Studies and the Anton Korošec Institute (INAK), "Following the path of digitalization in Slovenia and Europe". This project receives funding from the European Parliament.

The information and views set out in this article are those of the author and do not necessarily reflect the official opinion of the European Union institutions, the Wilfried Martens Centre for European Studies or the Anton Korošec Institute. The organizations mentioned above assume no responsibility for facts or opinions expressed in this article or any subsequent use of the information contained therein.