Introduction
The purpose of the ME-MIND project is to enable small and medium-sized cultural and creative industries to evaluate their local impact through the collection, manipulation, cleaning and subsequent analysis of data.
The two use cases of the project, the Estonian National Museum and the Internet Festival, differ greatly in their nature, their territorial reach, the way they are disseminated and the time span over which they operate.
In this post we describe the main characteristics of the two use cases, which help explain the different approaches used in the data analysis, as well as the challenges and opportunities we faced in working with them.
The use cases
The Estonian National Museum, based in Tartu, is the reference point for the entire country regarding Estonian history, customs and traditions. In addition to the permanent exhibition, the museum hosts temporary exhibitions and events every year. The museum is also an avant-garde model in terms of the logistics and technologies offered to visitors, who can, for example, interact with electronic ink devices to obtain information in multiple languages.
The Internet Festival is a four-day Italian national event based in Pisa, organised by Fondazione Sistema Toscana every year in October. The festival is made up of hundreds of events scattered across various locations within the city of Pisa. Every year the events, while dealing with substantially different themes, are linked by a shared keyword that serves as the common thread of that year's edition.
Data collection: similarities, differences, challenges and opportunities in the two use cases
The profound diversity of the two use cases shapes the challenges of the data analysis and, more generally, represents a challenge for the project, which must find a common pattern to propose to other small and medium-sized cultural and creative organisations.
The collection of the data itself was the first challenge of the data analysis for both use cases. Initially, both organisations collected their internal data, such as financial data, data on their attendees and anything else that could be extracted from Excel files or internal databases. In the case of the Estonian National Museum, we also extracted technical data from the museum devices, which record the interactions of individual tickets with those devices.
Furthermore, we collected external data. In the case of the Internet Festival, we obtained external data from various enterprises, bodies and organisations operating in the city of Pisa, while other data came from the questionnaires distributed to the audience during the 2021 edition of the festival.
Finally, for both use cases we adopted data extraction techniques, in particular on the TripAdvisor platform.
Once all the data were collected, we reflected on which analyses were the most useful and consequently what deductions could be made.
Three main questions guided our data analysis:
- Is it possible to get useful information about the visitors using questionnaires?
- Is it possible to correlate internal data with external data in order to better observe how the two use cases are impacting their local environment?
- Can purely technical data say something about the visitor behaviour within a museum?
For both use cases, the same data analyses were performed, barring substantial differences in the nature of the use cases themselves. A subset of the results will be shown below in the form of examples, also illustrating some of the difficulties encountered.
Internet Festival
For the Internet Festival, the most important data analysis concerned the data collected from external sources, such as:
- The hotel attendance data obtained through the extraction of data from the TripAdvisor platform
- The data on the flow of car traffic provided by the Tuscan region
- The questionnaires filled in by the participants in the event
- The attendance data provided by the museums of the city of Pisa
The statistical indices calculated for the reference attributes were then compared with the values of the same attributes during the Internet Festival period. We then calculated the percentage variation between the general trend (given by the central tendency indices) and the values during the festival period. The results allowed us to assess whether there were trends associated with the Festival period. In almost all cases the resulting trend was positive: the data show that the quantities considered grow during the event.
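The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the function names, the choice of the median as the central tendency index, the grouping by ISO week and the sample dates are all hypothetical.

```python
from collections import Counter
from datetime import date
from statistics import median


def weekly_counts(review_dates):
    """Count reviews per ISO (year, week) pair."""
    counts = Counter()
    for d in review_dates:
        iso = d.isocalendar()  # (ISO year, ISO week, ISO weekday)
        counts[(iso[0], iso[1])] += 1
    return counts


def percentage_variation(baseline, observed):
    """Percentage change of the observed value relative to the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (observed - baseline) / baseline * 100.0


def festival_variation(review_dates, festival_week):
    """Compare the festival week with the median of all other weeks.

    Assumption: the median of the non-festival weeks stands in for the
    "general trend"; another central tendency index could be used instead.
    """
    counts = weekly_counts(review_dates)
    other_weeks = [n for week, n in counts.items() if week != festival_week]
    baseline = median(other_weeks)
    return percentage_variation(baseline, counts[festival_week])


# Made-up review dates, for illustration only (not project data).
sample_dates = [
    date(2021, 10, 4), date(2021, 10, 5),                # ISO week 40
    date(2021, 10, 11), date(2021, 10, 12),
    date(2021, 10, 13), date(2021, 10, 14),              # ISO week 41
    date(2021, 10, 18), date(2021, 10, 19),              # ISO week 42
]
# Week 41 is treated as the "festival week" purely as an example.
print(festival_variation(sample_dates, festival_week=(2021, 41)))
```

A positive result means the quantity was higher during the chosen week than in a typical week; the same function applies to traffic counts or museum attendance once they are expressed as dated records.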
For example, as shown in the following figure, in the last 5 years (from 2016 to 2021) we can see an increase in hotel reviews in the city of Pisa during the Internet Festival week.
Weekly Hotel Reviews in Pisa from 2016 to 2021
Another interesting example comes from the questionnaires: as shown in the next figure, they made it possible to identify the gender distribution of the visitors who took part in the 2021 edition of the Internet Festival.
Visitors’ Gender Distribution during Internet Festival 2021
Results of Traffic Sensor Data Analysis
Estonian National Museum
For the Estonian National Museum use case, we focused on analysing the technical data obtained from the internal logs of the interactive devices installed inside the museum. Museum visitors can view a lot of information, in different languages, by using their ticket, and the devices record every interaction for each individual ticket. Each record within the dataset refers to a single interaction of a single ticket with a particular device. One difficulty with this part of the data analysis was the format of the data itself: most of the technical data recorded by the devices was in the form of codes. Apart from the timestamps, most of the codes were unusable for data analysis purposes as they stood.
Fortunately, thanks to the correspondence tables provided by the Estonian National Museum, it was possible to derive the following data from the purely technical data:
- Date and time of the interactions of the single ticket with a specific device.
- Language used by the single ticket in interacting with the device.
- Visited position, with respect to the museum, during the interaction of a ticket with a specific device.
- The time spent by a single ticket inside the museum.
- The most requested and visited topics within the permanent exhibition of the museum.
- The path inside the museum of each single ticket.
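As a rough sketch of how such derived data can be obtained from coded log records, consider the following Python example. Everything specific in it is an assumption made for illustration: the record layout (ticket id, ISO timestamp, device code, language code), the contents of the two correspondence tables and the function name do not come from the museum's actual logs. The time spent is approximated as the interval between a ticket's first and last interaction, which is only a lower bound on the real visit duration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical correspondence tables; the real ones were provided by the museum.
LANGUAGE_BY_CODE = {"01": "Estonian", "02": "English", "03": "French"}
ROOM_BY_DEVICE = {"D10": "Entrance hall", "D20": "Gallery A", "D30": "Gallery B"}


def summarise_tickets(records):
    """Derive time spent, language and path for each ticket.

    Each record is assumed to be a tuple:
    (ticket_id, iso_timestamp, device_code, language_code).
    """
    by_ticket = defaultdict(list)
    for ticket_id, timestamp, device_code, language_code in records:
        event = (datetime.fromisoformat(timestamp), device_code, language_code)
        by_ticket[ticket_id].append(event)

    summary = {}
    for ticket_id, events in by_ticket.items():
        events.sort(key=lambda event: event[0])  # chronological order
        first_seen, last_seen = events[0][0], events[-1][0]
        summary[ticket_id] = {
            # Interval between first and last interaction, in minutes.
            "minutes": (last_seen - first_seen).total_seconds() / 60,
            # Language of the most recent interaction.
            "language": LANGUAGE_BY_CODE.get(events[-1][2], "unknown"),
            # Ordered list of locations where the ticket was used.
            "path": [ROOM_BY_DEVICE.get(code, "unknown") for _, code, _ in events],
        }
    return summary


# Made-up log records, deliberately out of order.
sample_records = [
    ("T1", "2021-10-08T10:00:00", "D10", "03"),
    ("T1", "2021-10-08T10:45:00", "D30", "03"),
    ("T1", "2021-10-08T10:20:00", "D20", "03"),
]
print(summarise_tickets(sample_records))
```

Grouping the resulting summaries by language then gives per-language distributions of time spent.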
For the analysis of the time spent inside the museum, it was possible not only to calculate general metrics for all visitors but also to break them down by the language used. Language is an approximate indicator of where visitors come from. The following figure shows the distribution of time spent inside the museum by French-speaking visitors.



Conclusion
It’s worth saying a few words about the difficulties we encountered during the data collection and analysis process, in order to offer some suggestions for speeding up the entire data analysis process and making it scalable.
1. Try to make your internal datasets homogeneous from the beginning
In the analysis of the internal data provided by the two use cases, one of the major difficulties was the profound difference in the formats of the data. While the nature of the data cannot be changed, it is possible to think from the outset about the format the data should have. Having the same data format, for example Excel sheets or JSON, speeds up data analysis and any further manipulation. A common format also makes the process scalable, so that more complex analyses can be performed. Different data formats require a conversion process, which is not always trivial. Therefore, standardising the data format from the beginning will facilitate the internal data analysis process.
2. Turn everything into a source of data
In the previous paragraphs we have seen how, in the case of the Estonian National Museum, we were able to derive information on the behaviour of museum visitors from purely technical data recorded by the devices inside the museum. This result suggests an important piece of advice for the entire cultural industries chain: use everything you have available as a source of data. For example, when organising an event, the registration process can be turned into a data acquisition process in order to evaluate trends among the participants directly or to obtain other information. In this way it is possible to collect data indirectly and obtain part of the information you would otherwise get from questionnaires.
3. Never underestimate networking with other organisations in the area
The collection of external data around the Internet Festival led us to get in touch with multiple organisations in Tuscany. In some cases, getting the data from these organisations was difficult, and in some cases impossible. In most cases, however, we obtained a lot of data from organisations willing not only to provide the data but also to view the results of our analysis. Establishing strong relationships with other organisations brings multiple advantages.
4. Don’t confuse public data with open data
Finally, do not confuse public data with open data. Data that is publicly available (free) is not always open data. A fundamental characteristic of open data is usability, which depends on the specific case: data that can be used for one purpose may not be usable for another. Our advice is to evaluate immediately whether the open data you have available are usable for your purposes; if they are not, look for valid alternatives.
5. If you can’t read it, try to view it!
As regards the analysis of ticket routes, the visualisation of the data (designed by Domestic Data Streamers) was fundamental in obtaining useful information. The raw numbers alone were of little help, since ordinary statistical measures cannot by themselves explain how a visitor moves through the museum. The suggestion in this case is not to give up if some data are incomprehensible because of the large number of entries: sometimes colours explain the data better than numbers!
The difficulties described so far, although tied to a single use case, offer a general lesson applicable to any context: every kind of data, if properly used, is a gold mine of information!