Article by UniPi, Department of Computer Science. Author: Fernando De Nitto
What is pizza made of?
The concept of “granularity” in data analysis
The concept of “granularity” often comes up when working with data.
What does it mean? Granularity indicates the level of detail of the data. High granularity means a high level of detail; conversely, low granularity means a low level of detail.
Practically speaking, the more finely subdivided and specific a piece of data is, the more granular it is considered to be. Thus, the “granularity” and the “level of detail” of data are the same thing.
However, using the correct terms is a prerequisite for approaching the culture of data, and command of the right terminology is a necessary (though often not sufficient) condition for making oneself understood when dealing with data.
The pizza example
An everyday example that shows the importance of granularity is pizza (our data). In the initial stages, pizza consists of many grains of flour, which are then mixed with water to form the basic pizza dough. No grains were lost during the mixing process, so the data remained the same. However, the information we can extract from that data has diminished dramatically: we can no longer distinguish the grains of flour, nor separate them!
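The pizza analogy can be sketched in a few lines of Python. This is only an illustration: the grain weights are invented numbers, and "mixing the dough" is modelled, as an assumption, by summing them into a single value.

```python
# Hypothetical weights (in milligrams) of individual grains of flour.
grains_mg = [12, 15, 9, 14, 10]

# High granularity: questions about single grains can still be answered.
heaviest_grain = max(grains_mg)
grains_over_10 = sum(1 for g in grains_mg if g > 10)

# "Mixing the dough": every grain is merged into one aggregated value.
dough_mg = sum(grains_mg)

# A completely different set of grains yields exactly the same dough,
# so the original grains cannot be reconstructed from dough_mg alone.
other_grains_mg = [20, 20, 20]
same_dough = sum(other_grains_mg) == dough_mg

print(dough_mg, heaviest_grain, grains_over_10, same_dough)
```

The total is preserved, but once only `dough_mg` is stored, questions such as "how heavy was the heaviest grain?" can no longer be answered.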
Computing power and granularity
A high level of detail opens up many possibilities, but it also requires more effort to process the data. In the years when the first data warehouses appeared, computers had much less computing power than any personal computer today, so processing was slower and the potential for analysis rather limited. Low granularity was necessary at that time to deliver results within the timeframe required by the market or by scientific research, or simply to cope with limited technological possibilities.
Today, thanks to the evolution of technology, the vast majority of data can be analysed on any computer no more than 5 years old. Even so, it is important to bear in mind the limitations that a high level of detail brings with it, so as not to get lost in projects that are far beyond our reach.
Name and surname: a concrete example in building a very simple database
Another good (more technical) example of granularity is a piece of data representing a person’s first and last name: if these are stored together in a single field, the level of detail is lower than if the two were stored in separate fields.
Such a trivial example actually hides problems that could be enormous and that illustrate well the importance of data granularity. Suppose we want to count how many people are called ‘Mario’ in the visitors’ register of a museum. This is possible without ambiguity only in the second case, where the two fields are stored separately. In the first case, the first name must first be separated from the surname, since the information is not directly accessible.
With a minimum of technical skill, one could think of splitting the combined field on the space, assuming that the first name comes first and the surname second, but this may not be sufficient. Just think of possible data entry errors: sometimes first names and surnames appear in swapped positions (this also happens when filling in paper forms), and sometimes the two are not even separated by a space (e.g. due to a typing error). In such cases we cannot carry out our count reliably. Moreover, what is a first name for some people may be a surname for others (e.g. John Ross).
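The weakness of the space-splitting approach can be seen in a minimal Python sketch. The register entries and the helper `naive_first_name` are hypothetical, invented here purely for illustration; we assume that all four visitors are in fact called Mario.

```python
from collections import Counter


def naive_first_name(full_name: str) -> str:
    """Assume 'first name, space, surname' and return the first token."""
    parts = full_name.split()
    return parts[0] if parts else ""


# Hypothetical register in which name and surname share a single field.
register = [
    "Mario Rossi",    # well-formed entry
    "Rossi Mario",    # name and surname in swapped positions
    "MarioBianchi",   # missing space (typing error)
    "Mario Verdi",    # well-formed entry
]

counts = Counter(naive_first_name(entry) for entry in register)
mario_count = counts["Mario"]

print(mario_count)  # only the two well-formed entries are counted
```

The swapped entry is counted under ‘Rossi’ and the entry without a space under ‘MarioBianchi’, so the count is too low, and nothing in the data itself tells us that it is wrong.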
By storing name and surname separately (and therefore with high granularity), these problems simply do not arise, and no further processing (based on assumptions that do not always hold) is necessary.
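As a rough sketch of the high-granularity alternative, the same kind of register can be modelled with two separate fields. The `Visitor` class and the sample entries are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visitor:
    first_name: str
    surname: str


# Hypothetical register with name and surname stored separately.
register = [
    Visitor("Mario", "Rossi"),
    Visitor("Mario", "Bianchi"),
    Visitor("John", "Ross"),
    Visitor("Ross", "John"),  # 'Ross' is a first name here, a surname above
]

# Counting needs no parsing and no assumptions about word order.
mario_count = sum(1 for v in register if v.first_name == "Mario")

# Moving from high to low granularity is always possible ...
full_names = [f"{v.first_name} {v.surname}" for v in register]

print(mario_count, full_names[0])
```

Combining the two fields into a full name is trivial, whereas the reverse direction requires guesswork; this asymmetry is why it is usually safer to collect data at the finer level of detail.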
Granularity in the Me-Mind project
In the Me-Mind project, we have often dealt with the concept of data granularity in the data analysis for our two use cases.
In the data collection phase, we could appreciate how important granularity is in defining a targeted strategy for achieving specific objectives. For example, the questionnaires proposed in the installation of the DDS during the Internet Festival 2021 were structured so that the questions captured distinct details of the multiple aspects to be analysed. At the same time, we realised that a lack of granularity in some of the data could be a strong limitation during the analysis.
In the case of data collection from external sources, it was not always possible to find a level of detail that would meet our needs in terms of data analysis. In other cases, however, a high level of granularity allowed us to go well beyond our expectations.
Conclusions
In conclusion, the concept of data granularity is very important because it affects every step of any data application. Practically speaking, when collecting data, it is important to define precisely the level of detail required to meet your needs.
At the same time, when analysing the data, it is important to keep in mind the level of detail that can be achieved in order to be clear about the options available.
For small and medium-sized enterprises that are still far from a complete data culture, the concept of granularity is a first, invaluable step: knowing what can be extracted from data, and reasoning in advance about its potential and limitations, saves an enormous amount of time and money, which are scarce and precious resources for companies of this size.