Enhancing Granular Data by Leveraging Geolocation Approach
Conference
65th ISI World Statistics Congress 2025
Format: CPS Abstract - WSC 2025
Keywords: granular, haversine, standardisation
Session: CPS 72 - Enhancing Data Quality and Analysis through Spatial and Geolocation Techniques
Monday 6 October 4 p.m. - 5 p.m. (Europe/Amsterdam)
Abstract
Granular data, valued for its fine level of detail, is crucial for in-depth insight and accurate decision-making. Location data, an essential component of detailed data, frequently suffers from accuracy problems caused by typographical errors, incomplete entries, and inconsistent address formats. This study addresses these challenges by applying address standardisation and enrichment techniques to administrative data sources.
We demonstrate the effectiveness of combining textual similarity (Levenshtein distance) with spatial distance (Haversine formula) to link records between two datasets derived from administrative records. The methodology begins with address standardisation and enrichment to resolve errors and ensure consistency, followed by geocoding to obtain geographical coordinates. Textual similarity is then measured with the Levenshtein distance, and spatial proximity is estimated with the Haversine formula. Addresses that are highly similar in text and physically close to each other are treated as matches, which improves the accuracy of location data.
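The combined matching rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the normalisation of the Levenshtein distance into a 0 to 1 similarity score, the thresholds (`min_similarity`, `max_distance_km`) and the sample addresses and coordinates are all assumptions made for illustration, and the abstract does not state which values or libraries were used.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def text_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, 0.0 for no overlap."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, h)))


def is_match(rec_a, rec_b, min_similarity=0.85, max_distance_km=0.1):
    """Match only if the addresses are textually similar AND spatially close.

    Each record is (standardised_address, latitude, longitude).
    Both thresholds are illustrative assumptions, not values from the study.
    """
    addr_a, lat_a, lon_a = rec_a
    addr_b, lat_b, lon_b = rec_b
    similar = text_similarity(addr_a, addr_b) >= min_similarity
    close = haversine_km(lat_a, lon_a, lat_b, lon_b) <= max_distance_km
    return similar and close


# Made-up records: one typo in the street name, coordinates a few metres apart.
record_a = ("12 main street", 52.0000, 4.0000)
record_b = ("12 main stret", 52.0001, 4.0001)
print(is_match(record_a, record_b))
```

Requiring both conditions means that a near-identical address string geocoded to a distant location, or a nearby point with an unrelated address, is not treated as a match. In practice the thresholds would need tuning against the quality of the geocoder and of the source addresses.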
The results of our study suggest that this geolocation method enhances the precision of location data. Standardisation, enrichment and geocoding rectify typographical and format inconsistencies, and combining textual and spatial matching improves the accuracy of address matching. In summary, this approach not only improves the accuracy of detailed data but also supports well-informed decision-making across industries and fields.