StructAM: Enhancing Address Matching through Semantic Understanding of Structure-aware Information

Zhaoqi Zhang, Pasquale Balsebre, Siqiang Luo, Zhen Hai, Jiangping Huang


Abstract
Address matching is the task of linking unstructured addresses to standard ones in a database. It poses several challenges, including misspellings, incomplete information, and variations in address content. Although entity matching has been studied in natural language processing, existing address matching approaches still rely on string-based similarity or manually designed rules. In this paper, we propose StructAM, a novel method that uses pre-trained language models (LMs) and graph neural networks to extract the textual and structured information of addresses. The method leverages the knowledge acquired by large language models during pre-training and refines it through fine-tuning on the address domain to obtain address-specific semantic features. It also applies an attribute attention mechanism based on a Graph Sampling and Aggregation (GraphSAGE) module to capture the internal hierarchy of the address text. To further improve accuracy in dirty settings, we incorporate spatial coordinates and contextual information from the surrounding area as auxiliary guidance. We conduct extensive experiments on real-world datasets from four countries, and the results show that StructAM outperforms state-of-the-art baselines for address matching.
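
For context, the string-based similarity matching that the abstract names as the prevailing approach can be sketched roughly as follows. This is a minimal illustration of that kind of baseline, not StructAM and not code from the paper: the helper names `normalize` and `best_match`, the sample addresses, and the use of Python's standard `difflib.SequenceMatcher` as the similarity measure are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def normalize(address: str) -> str:
    """Lowercase the address and replace punctuation with single spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in address.lower())
    return " ".join(cleaned.split())


def best_match(query: str, database: list[str]) -> tuple[str, float]:
    """Return the database entry most similar to the query, with its score.

    The score is SequenceMatcher's ratio in [0, 1], computed on the
    normalized strings. It only compares characters, so it has no notion
    of address structure (street, city, postcode) or of location.
    """
    q = normalize(query)
    scored = [
        (candidate, SequenceMatcher(None, q, normalize(candidate)).ratio())
        for candidate in database
    ]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical standard addresses, used only for illustration.
DATABASE = [
    "10 Downing Street, London SW1A 2AA",
    "221B Baker Street, London NW1 6XE",
    "1 Infinite Loop, Cupertino, CA 95014",
]

# A misspelled, abbreviated, incomplete query.
match, score = best_match("221b bakr st london", DATABASE)
print(match, round(score, 2))
```

A character-level score like this tolerates small misspellings, but it treats every part of the address the same way and ignores coordinates and surrounding context, which are the gaps the abstract says StructAM targets.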
Anthology ID:
2024.lrec-main.1333
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
15350–15361
URL:
https://aclanthology.org/2024.lrec-main.1333
Cite (ACL):
Zhaoqi Zhang, Pasquale Balsebre, Siqiang Luo, Zhen Hai, and Jiangping Huang. 2024. StructAM: Enhancing Address Matching through Semantic Understanding of Structure-aware Information. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 15350–15361, Torino, Italia. ELRA and ICCL.
Cite (Informal):
StructAM: Enhancing Address Matching through Semantic Understanding of Structure-aware Information (Zhang et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.1333.pdf