Visualizing the 2018 Lombok earthquake in Indonesia using crowdsourcing data: how people experienced it

With advances in science and technology, map makers can use big data, in particular crowdsourced social media data from Twitter, to obtain the location of users at the time they upload tweets; such tweets are called geolocated tweets. Earthquakes, which occur very often in Indonesia, regularly attract public attention, especially among netizens who use social media such as Twitter. One of the major earthquakes in Indonesia in 2018 was the Lombok earthquake, which occurred twice in a row from July to August 2018. Twitter data can provide information on social responses to the 2018 Lombok earthquake, which can serve as evaluation material for public disaster handling and response. The information is then visualized in various forms, and the best visualization method is selected. This study uses the Twint package in Python to obtain location data from Twitter. The data collection method is a case study of the social impact of the 2018 Lombok earthquake in Indonesia. The data observation methods are a simulation of several types of map visualization and a survey to select the best type of visualization. The analysis maps the number of tweets as the main object using various types of maps and calculates the survey results by scoring each group of questions. The results


Introduction
There are six known data acquisition sources for making maps, namely terrestrial surveys, statistical data, aerial photographs, remote sensing imagery, existing maps, and census data. However, technological developments in the current digital era have given rise to new sources of spatial data acquisition, one of which is big data. Big data is a term for online database providers with a broad scope. The advantage of big data lies in its characteristics known as the 5Vs, namely volume, variety, veracity, velocity, and value (Hadi et al., 2015). One of the most widely accessed sources of big data is social media. Social media is a source of information from people expressing opinions via the internet, or crowdsourcing (Hargrave, 2020). Around 63 million Indonesians are internet users, and 95% of them use the internet to access social media (Kominfo, 2013). Twitter receives more than 500 million tweets daily, about 80% of them from mobile devices (Carley et al., 2016). Twitter has a geotagging function that provides location information as longitude and latitude coordinates (Kryvasheyeu et al., 2016).
Indonesia lies at the meeting point of three tectonic plates, namely the Indo-Australian Plate, the Eurasian Plate, and the Pacific Plate, so it is prone to earthquakes. The Head of the Earthquake and Tsunami Center of the BMKG (Meteorology, Climatology and Geophysics Agency) said that 2018 saw a drastic increase of 4,648 tectonic earthquake events over the previous year (Umasugi, 2018). BMKG also recorded 23 damaging earthquakes in Indonesia in 2018 (BMKG Earthquake and Tsunami Center, 2018). The earthquake on Lombok Island, West Nusa Tenggara, was one of the worst earthquakes of 2018 (Widowati, 2019). The event was unusual in that it consisted of successive earthquakes that caused various impacts. After the first earthquake, several of the subsequent earthquakes were even stronger. Several large earthquakes occurred within a short period: on July 29 with a magnitude of 6.4 Mw; on August 5 with a magnitude of 7 Mw; on August 9 with a magnitude of 5.9 Mw; on August 19, when two earthquakes measuring 6.3 Mw and 7 Mw occurred; and on August 25 with a magnitude of 5.5 Mw. Of all the Lombok earthquakes in 2018, the BMKG recorded six earthquakes with a magnitude of more than 5.5 Mw and a total of 2,000 earthquakes, both felt and not felt (Zulfakriza, 2018).
Earthquake recordings can be obtained from physical and social sensors (Kropivnitskaya et al., 2017). Indonesia has only 175 physical sensors in the form of earthquake measuring devices (seismographs) across a total of 183 BMKG stations (BMKG n.d., CNN Indonesia 2019). Social sensors, by contrast, are the witnesses who experienced the earthquake themselves (Kropivnitskaya et al., 2017). Community responses from users of social media such as Twitter can be classified as data from social sensors. Physical impacts can quickly be observed and interpreted through remote sensing imagery, in contrast to other impacts, such as the social impact of earthquakes, which cannot be observed visually.
The map is one of the best forms of data visualization for describing the spatial distribution of data. Geovisualization is a technique for visually representing spatial data at a geographic scale in order to explore, analyze, and synthesize it (Longley et al., 2015). Geovisualization is one of the keys to compiling data sets into information delivery media, especially for large amounts of data such as big data (Dharmawan, 2017). New data acquisition sources, such as big data, will be assessed for their ability to provide geospatial data. Spatial data obtained from social media takes the form of large point data sets, which is a challenge for visualization if new information and knowledge are to be obtained from the data. Therefore, this study uses crowdsourced data from the social media platform Twitter to map the social impact of the 2018 Lombok earthquake. The best type of visualization for mapping point data sets from big data will be sought, so that spatial data is displayed in a way suited to the data characteristics and the information conveyed.

Methods
This study uses two data collection methods: a case study method for the map data sources and a sampling method for selecting the best visualization. Data observation was carried out using a simulation method on the various maps produced from the Twitter data and a survey method for choosing the best visualization. The observations and acquired data were then analyzed both quantitatively and qualitatively: each tweet location from netizens was visualized with quantitative maps, the various map types were compared qualitatively, and the best visualization was then selected using a quantitatively scored questionnaire.

Tools and materials
1. Computer devices connected to the internet
2. Python 3.8 software with Twint and pandas packages
3. Microsoft Excel software
4. ArcGIS software
5. Vector data of Indonesian administrative boundaries per province and district
6. Web site qualtrics.com

Object of research
This research maps the social impacts of earthquake events, using the Lombok earthquakes from 29 July 2018 to 25 August 2018 as the case study. The objects to be mapped are several categories of community tweets related to the social impacts of the earthquake. The first category is responses of concern and the Indonesian people's aid movements for the victims of the Lombok earthquake. The second category is responses of empathy and sympathy from the Indonesian people for the victims of the Lombok earthquake. The third category is complaints about the social impacts of the Lombok earthquake within the earthquake coverage area, from people who were directly or indirectly affected.
Tweets related to care and assistance responses include fundraising movements, donations, distribution of aid, volunteer movements, and social services. Tweets related to empathy and sympathy responses are expressions of feelings and concern, whether or not they involve shared emotions, including condolences, pity, sorrow, and prayers for the victims of the Lombok earthquake (Dian, 2017). Tweets related to the social impact of the earthquake cover several aspects, such as household conditions, local economic conditions, social and cultural functions, tourism conditions, political conditions, education, and community conditions.

Research sites
The research area covers all of Indonesia because tweets expressing care, sympathy, and empathy can come from anywhere in the country. The mapped research area is bounded by longitudes 94.674264° E to 141.322472° E and latitudes 6.543635° N to 11.267858° S.
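The coordinate boundaries above can be applied as a simple bounding-box test. The following is only a minimal sketch: the function name `in_study_area` and the two sample points are assumptions made for illustration and are not part of the study's workflow.

```python
# Bounding box of the study area, taken from the text (decimal degrees).
WEST, EAST = 94.674264, 141.322472
SOUTH, NORTH = -11.267858, 6.543635


def in_study_area(lat: float, lon: float) -> bool:
    """Return True if a point lies inside the study-area bounding box."""
    return SOUTH <= lat <= NORTH and WEST <= lon <= EAST


# Hypothetical sample points (approximate city coordinates).
print(in_study_area(-8.58, 116.10))   # a point on Lombok Island
print(in_study_area(35.68, 139.69))   # a point in Tokyo, outside the box
```

A rectangular box also includes parts of neighbouring countries, so a filter like this would only be a first pass before overlaying the administrative boundary data.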

Twitter Data Extraction
Data acquisition in this study was carried out by extracting (scraping) Twitter data for the case study of the 2018 Lombok earthquake disaster. The data to be extracted is the location distribution of community tweets about the earthquake that express care, sympathy, empathy, and assistance, as well as the location distribution of community tweets about the social impact of the earthquake. The time range of the tweets was from the first earthquake to the end of the year, namely July 29, 2018, to December 31, 2018. Twitter data was scraped with the Twint package (Twitter Intelligence Tool) in Python 3.8. Twint scrapes by using the search operators of the Twitter website to find data in the required categories, such as users, tweets about specific events, hashtags, and followers (Irekponor, 2019).
The whole extraction, from installing the package to scraping, is done by typing the required Twint commands on the Windows command line (Command Prompt, cmd). The computer must be connected to the internet. Twint must be configured to search for the specific tweets required. The main argument is "--search", which searches for tweets by keyword. The time range is set with the argument "--since" for the start date and "--until" for the end date. To get tweets with geotagging, the search area must be limited with the argument "--geo" or "-g", whose parameter contains the coordinates of a centre point and the coverage radius around it (latitude, longitude, km). The argument "--json" stores the collected data in JSON format. Data downloaded in JSON format needs to be converted into CSV format so that it can be easily read and further processed. The conversion is carried out with the pandas package in Python. The function that reads the JSON file is "read_json", and the method that writes the pandas data frame to CSV is "to_csv".
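The extraction steps described above can be sketched as follows. This is an illustration under stated assumptions, not the study's actual script: the keyword, centre point, radius, and file names are hypothetical, the `--output` flag is assumed to be Twint's output-file option, and Twint's JSON output is assumed to contain one JSON object per line (hence `lines=True`).

```python
import pandas as pd


def build_twint_command(keyword, since, until, lat, lon, radius_km, output):
    """Assemble a Twint command line (as an argument list) for one keyword."""
    return [
        "twint",
        "--search", keyword,
        "--since", since,            # start date, yyyy-mm-dd
        "--until", until,            # end date, yyyy-mm-dd
        # Passed as one token with "=" so that a negative (southern)
        # latitude is not mistaken for another command-line option.
        f"--geo={lat},{lon},{radius_km}km",
        "--output", output,          # assumed output-file flag
        "--json",
    ]


def json_to_csv(json_path, csv_path):
    """Read line-delimited JSON (assumed Twint layout) and save it as CSV."""
    df = pd.read_json(json_path, lines=True)
    df.to_csv(csv_path, index=False)
    return df


# Hypothetical keyword, centre point, and radius, for illustration only.
command = build_twint_command(
    "gempa lombok", "2018-07-29", "2018-12-31", -8.65, 116.32, 100, "tweets.json"
)
print(" ".join(command))
```

The argument list could be passed to `subprocess.run` on a machine where Twint is installed; `json_to_csv("tweets.json", "tweets.csv")` would then produce the CSV table used in the later stages.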

Spatial Data Mining
Spatial Data Mining (SDM) is a big data processing stage in KDD (Knowledge Discovery in Databases) that includes selection, cleaning, pre-processing, transformation, and analysis to produce hypotheses and knowledge (Mennis & Guo, 2009). The data selection stage reselects, from the collected tweets, those that contain appropriate keywords. Tweets about the Lombok earthquake that express care, sympathy, and empathy can be selected using keywords such as "donation", "assistance", "support", "sharing", and "condolences". The data cleaning stage eliminates duplicate data and removes unnecessary attribute columns from the data table. The data pre-processing stage rearranges the coordinate components so that they are better organized and can be read by the system that transforms the data: latitude and longitude values are placed in separate columns. The data transformation stage converts the table data into spatial data using ArcGIS.
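The selection, cleaning, and pre-processing stages above can be sketched in plain Python. This is a minimal sketch under assumptions: the field names `id`, `tweet`, and `coordinates`, the "latitude,longitude" text layout, and the sample rows are hypothetical, since the structure of the scraped table is not given in the text.

```python
# Example keywords quoted in the text.
KEYWORDS = ("donation", "assistance", "support", "sharing", "condolences")


def select_and_clean(tweets, keywords=KEYWORDS):
    """Keep tweets that mention a keyword, drop duplicates, split coordinates."""
    seen_ids = set()
    cleaned = []
    for row in tweets:
        text = row["tweet"].lower()
        if not any(word in text for word in keywords):
            continue                       # selection: no keyword match
        if row["id"] in seen_ids:
            continue                       # cleaning: duplicate tweet
        seen_ids.add(row["id"])
        lat_text, lon_text = row["coordinates"].split(",")
        cleaned.append({                   # cleaning: keep needed columns only
            "id": row["id"],
            "tweet": row["tweet"],
            "latitude": float(lat_text),   # pre-processing: separate columns
            "longitude": float(lon_text),
        })
    return cleaned


# Hypothetical rows; field names and values are assumptions.
sample = [
    {"id": 1, "tweet": "Donation drive for Lombok", "coordinates": "-8.58,116.10", "lang": "en"},
    {"id": 1, "tweet": "Donation drive for Lombok", "coordinates": "-8.58,116.10", "lang": "en"},
    {"id": 2, "tweet": "Traffic is heavy today", "coordinates": "-6.20,106.80", "lang": "en"},
]
print(select_and_clean(sample))
```

The resulting table, with latitude and longitude in separate columns, is the kind of input a GIS package can convert into point features.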

Data Visualization
Point location data related to the earthquake events and their social impacts are visualized using five different types of maps, namely choropleth maps, proportional symbol maps, dot maps, hexagonal tessellation maps, and heat maps. On choropleth maps, proportional symbol maps, and dot maps, the selected mapping units are the administrative districts of Indonesia.
Choropleth maps are visualized using colour gradation symbols per mapping unit. The number of tweets per district is classified into five classes using geometric intervals.
The proportional symbol map is visualized with a circle symbol proportional to the number of tweets per district. The radius of each proportional circle is calculated using formula (1).
Dot maps are visualized with dot symbols representing the number of tweets within each district. One dot represents a specific number of tweets, so the dot symbol is repeated in each district according to the number of tweets represented.
The hexagonal tessellation map is visualized using a hexagon-shaped tessellation. The size of the hexagon cell in the tessellation is 32.5 kilometres, or the same as a hexagon with an area of 1,372.1 km². The tweet point data related to the social impact of the earthquake are overlaid with the tessellation cells so that the number of tweets per hexagon cell is obtained. The number of tweets per hexagon is classified into five classes with geometric intervals and then visualized with colour gradations for each class.
The heat map is visualized using a density clustering method, namely Kernel Density. According to Ivan and Horak (2015) and Anderson (2009), kernel density is one of the primary methods that can be used to create heat map visualizations for data points that overlap each other (Netek, 2018). This method is a form of data interpolation. The estimated density in Kernel Density is calculated with formula (2). The output is a raster whose interpolated pixel values are represented by colour gradation symbols.
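Formulas (1) and (2) are not reproduced in this text, so the following sketch uses two widely used forms as stand-ins and should not be read as the study's exact formulas. It assumes square-root scaling for proportional circles (so that circle area is proportional to the tweet count) and a quartic kernel for the density estimate; the function names, sample values, and bandwidth are hypothetical.

```python
import math


def proportional_radius(value, max_value, max_radius):
    """Circle radius scaled so that circle AREA is proportional to the value."""
    return max_radius * math.sqrt(value / max_value)


def quartic_kernel_density(x, y, points, bandwidth):
    """Estimated density at (x, y) from point events, using a quartic kernel.

    Each point within the bandwidth contributes (3/pi) * (1 - (d/h)^2)^2,
    and the sum is divided by h^2, so the kernel integrates to one per point.
    """
    total = 0.0
    for px, py in points:
        d = math.hypot(x - px, y - py)
        if d < bandwidth:
            total += (3 / math.pi) * (1 - (d / bandwidth) ** 2) ** 2
    return total / bandwidth ** 2


# Hypothetical values: a district with 25 tweets when the maximum is 100.
print(proportional_radius(25, 100, 10.0))

# Hypothetical projected coordinates in kilometres.
tweets = [(0.0, 0.0), (3.0, 4.0), (40.0, 40.0)]
print(quartic_kernel_density(0.0, 0.0, tweets, bandwidth=10.0))
```

Evaluating the density function at the centre of every raster cell would give the pixel values that a heat map colours; points farther away than the bandwidth contribute nothing.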

Best Visualization Selection
The best visualization is selected to determine which type of visualization is most appropriate for large point data sets, especially those related to the social impact of the 2018 Lombok earthquake. The selection is supported by three parameters: which visualization is most appropriate, which visualization has the best appearance, and which visualization is easiest for map users to read and understand. The best visualization is selected by considering the choices, reading, and understanding of the maps by map users, collected through a questionnaire survey. The questionnaire is web-based because this method is more convenient, cheap, and fast, can provide multimedia displays, can be completed anywhere (mobile), and reduces the data input stage (Fraenkel, Wallen and Hyun 2012). This study used Qualtrics.com, one of the websites that provide free survey creation services.
Respondents are divided into two groups, namely people with cartographic understanding and the general public. The age limit for respondents is adults aged 17-65 years, because they are considered able to think critically, so their answers can be accounted for. The sampling technique is convenience sampling, based on the respondents the researchers can reach. This technique was chosen because the total population of map users is unknown, which makes random sampling very difficult, if not impossible (Fraenkel, Wallen and Hyun 2012).
Three aspects are assessed: the visual appearance of the map, including symbolization, the figure-ground concept, and the overall appearance of the map face, legend, and marginal information; the user's reading and understanding of the map contents, both quantitative and qualitative; and the ranking of the visualizations from best to worst according to map readers. Respondents were asked to give a score of 1 to 5 on each question for each map.
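The scoring described above can be sketched as a sum of ratings per map type within each respondent group. The record layout (group, map type, score) and the sample answers below are hypothetical; they are not the survey data.

```python
from collections import defaultdict


def total_scores(responses):
    """Sum the 1-5 ratings per map type within each respondent group."""
    totals = defaultdict(lambda: defaultdict(int))
    for group, map_type, score in responses:
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range: {score}")
        totals[group][map_type] += score
    return {group: dict(maps) for group, maps in totals.items()}


# Hypothetical answers to one question, for illustration only.
answers = [
    ("cartographic", "choropleth", 5),
    ("cartographic", "dot", 2),
    ("non-cartographic", "choropleth", 4),
    ("non-cartographic", "choropleth", 5),
    ("non-cartographic", "dot", 1),
]
print(total_scores(answers))
```

Summing by group keeps the two respondent groups comparable question by question; because the groups differ in size, averages rather than totals may be preferable when comparing across groups.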

Results and Discussion
There are 31 keywords selected for scraping Twitter data, based on words related to the Lombok earthquake disaster and the resulting social impacts. All keywords used are as follows.
The time range of the tweets used is from one day before the first day of the Lombok earthquake, namely July 26, 2018, to the end of 2018, December 31, 2018. The parameters of the two date arguments are written in the format yyyy-mm-dd (year-month-day). The total number of tweets obtained from scraping is 11,584. The raw Twitter data cannot be used directly for visualization and analysis because it may contain noise and its structure is not yet suitable. Data selection, cleaning, pre-processing, and transformation are needed before the data can be visualized and analyzed. These activities make up the SDM (Spatial Data Mining) stage. Data selection (filtering) used several words related to the community's concern, sympathy, and empathy for the 2018 Lombok earthquake, referring to some of the words that appear most frequently in Table 1. The clean tweet data needs to be rearranged in the data pre-processing stage so that the data structure meets the needs of the next steps. Latitude and longitude coordinates are separated from the other components and placed in two different columns. The data transformation stage converts the coordinate text data into spatial data that can be visualized as a map.
The data are visualized as choropleth maps, proportional symbol maps, dot maps, hexagonal tessellation maps, and heat maps. The map title is "Map of Concern, Sympathy, and Empathy of the Indonesian People's Response to the 2018 Lombok Earthquake Disaster Based on Twitter Data". The selected colour symbol is a purple gradation from light to dark that indicates the difference in the number of tweets: the tweet count increases from light purple to dark purple. The class intervals do not include areas with zero tweets (0) or no data. The same map layout is used on every map.
On the choropleth map, the district mapping unit is quite detailed, so it represents the location of the tweets quite well. However, districts with tiny areas cannot be seen clearly. The proportional symbol map depicts the general location of tweets at district centroids with proportional circle symbols. A small district with many tweets produces a large circle that covers the district and the districts around it. The map looks quite cluttered because the symbols overlap. The dot map depicts the distribution of tweets in each district, with each dot representing 20 tweets. The dots cannot be seen clearly and are difficult to count because they are too small. However, if the dot size is enlarged, the dots will overlap in small districts with many tweets. The heat map depicts the absolute location of the tweets well. The resulting interpolation represents the density of tweets at each location, but the absolute number of tweets cannot be read from it. This is because the kernel density value is a pixel value calculated by the algorithm, so it is not easy to understand. The hexagonal tessellation map represents tweet locations with a regular tessellation that is spread evenly over the area. Location bias persists and is influenced by the cell size used.
The five types of visualization were compared by respondents using a questionnaire. There were 86 respondents, of whom 55 had a cartographic background and 31 did not. The respondents consisted of 41 males and 45 females. Most cartographic respondents are university students, especially those majoring in Cartography and Remote Sensing at Gadjah Mada University. Others are students, employees, freelancers, and people in other jobs. Questions related to the visual appearance of the map face yielded the answers shown in Graph 1.
Non-cartographic respondents most often gave a score of 5 to the choropleth map and a score of 4 to the proportional symbol map. Scores of 1 and 2 were most often given to the dot map. Questions related to the visual appearance of the map legend yielded almost the same answers from the two groups of respondents, as shown in Graph 2. Almost all respondents gave a score of 5. Among both cartographic and non-cartographic respondents, the choropleth map most often received the highest score of 5. The assessment of the overall map composition is shown in Graph 3. Most respondents gave scores of 5 and 4, with 5 the most frequent. Therefore, the composition of the map layout is reasonable and appropriate.
Graph 1. Bar graph of the total comparison of answers related to the visual appearance of the map face
Graph 2. Bar graph of the total answer comparison in relation to the visual appearance of the map legend
Graph 3. Bar graph of total assessment of map composition as a whole
The results of the qualitative assessment of map reading and understanding by respondents can be seen in Graph 4. According to non-cartographic respondents, choropleth and proportional symbol maps are the easiest to read qualitatively. According to cartographic respondents, choropleth maps and heat maps are the easiest to read qualitatively. For quantitative reading and understanding, non-cartographic respondents found the choropleth map, the proportional symbol map, and the heat map the easiest to read. Cartographic respondents found the choropleth map the easiest to read and understand, with the heat map in second position. The overall assessment can be seen in Graph 5.
Questions about the concluding assessment between maps produced answers in the form of a rank for each map. Rank 1 indicates that the selected map is the best and most appropriate, while rank 5 indicates that the selected map is the worst and least appropriate. Most cartographic and non-cartographic respondents placed the choropleth map at rank 1 and the dot map at rank 5. The complete assessment can be seen in Graphs 6 and 7. The sum of all scores given by respondents can be seen in Graph 8. The total score from the two groups of respondents shows that the choropleth map is the most suitable type of map visualization for point data sets, especially Twitter data related to the social impact of the 2018 Lombok earthquake in Indonesia. The dot map is the least suitable type of visualization for point data sets, especially Twitter data on the social impact of the 2018 Lombok earthquake in Indonesia.

Conclusions
Crowdsourced Twitter data can provide spatial data related to the social impact of the 2018 Lombok earthquake in Indonesia through data scraping with the Twint package in Python. Five types of visualization that show point density can be used to visualize the social impact data obtained from Twitter crowdsourcing: choropleth maps, proportional symbol maps, dot maps, hexagonal tessellation maps, and heat maps. The five maps differ in their characteristics and symbolization, each with its own strengths and weaknesses. The choropleth map is the best type of visualization according to the 86 selected respondents. The dot map is the least appropriate type of visualization according to the same 86 respondents.

Suggestions
Further studies are needed on acquiring spatial data through social media or other acquisition methods. In line with the development of existing technology, future studies can explore other types of visualization that are more varied and interesting than static digital maps.
The selection of respondents needs to be expanded, with groups of equal size, so that the assessment results can represent the wider population.

Figure 2. Results of scraping at the command prompt

Figure 4. Choropleth Map
Graph 4. Bar graph of comparison of total answers related to qualitative reading and understanding
Graph 5. Bar graph comparison of total answers related to quantitative reading and understanding
Graph 6. Bar graph of total comparison giving ranking to questions about the conclusion of assessment between maps by non-cartographic respondents
Graph 7. Bar graph of total comparison giving ranking to questions about the conclusion of assessment between maps by cartographic respondents
Graph 8. Bar graph of the total score of the conclusion of the assessment between maps by cartographic and non-cartographic respondents
Graph 9. Bar graph of the total score of all comparisons between maps

Table 1. Words related to the Lombok earthquake with the most mentions
The next stage in SDM is data cleaning, which eliminates noise in the form of duplicate data and unimportant table attribute columns. After the selection process and data cleaning, the final number of tweets was 2,032 out of 11,584.
://www.bmkg.go.id/profil/stasiun-upt.bmkg
Carley, K. M., Malik, M., Landwehr, P. M., Pfeffer, J., & Kowalchuck, M. (2016). Crowd sourcing disaster management: The complex nature of Twitter usage in Padang Indonesia.