Automatic Consistency Checking of Table and Text in Financial Documents
- A company's financial documents use tables alongside text to organize data containing key performance indicators (KPIs), such as profit and loss, and the financial quantities linked to them. The quantity linked to a KPI in a table might not equal the quantity of the same KPI as described in the text. Auditors spend substantial time manually checking for such mistakes, a process called consistency checking. In contrast to existing work, this paper attempts to automate this task with the help of transformer-based models. For consistency checking, it is essential that the embeddings of the table's KPIs encode both the semantic knowledge of the KPIs and the structural knowledge of the table. Therefore, this paper proposes a pipeline that uses a tabular model to obtain the embeddings of the table's KPIs. The pipeline takes table and text KPIs as input, generates their embeddings, and then checks whether these KPIs are identical. The pipeline is evaluated on financial documents in German, and a comparative analysis of the quality of the cell embeddings from three tabular models is also presented. In the evaluation, the experiment that used English-translated text and table KPIs and the Tabbie model to generate the table KPIs' embeddings achieved an accuracy of 72.81% on the consistency checking task, outperforming the benchmark and the other tabular models.
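
The matching step described in the abstract (embed a table KPI and a text KPI, decide whether they refer to the same KPI, then compare their quantities) can be illustrated with a minimal sketch. This is not the authors' pipeline: the embeddings here are hand-written toy vectors rather than outputs of a tabular or text model, and the function names, the `threshold` and `tolerance` parameters, and the use of cosine similarity as the matching criterion are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def check_consistency(table_kpi, text_kpi, threshold=0.9, tolerance=1e-6):
    """Hypothetical check: match KPIs by embedding similarity, then compare values.

    Each KPI is a dict with an "embedding" (list of floats) and a "value" (float).
    The threshold and tolerance are illustrative assumptions, not values from the paper.
    """
    similarity = cosine_similarity(table_kpi["embedding"], text_kpi["embedding"])
    if similarity < threshold:
        return "different_kpi"   # embeddings too far apart to be the same KPI
    if abs(table_kpi["value"] - text_kpi["value"]) <= tolerance:
        return "consistent"      # same KPI, same quantity
    return "inconsistent"        # same KPI, but the quantities disagree


# Toy inputs; a real pipeline would obtain embeddings from trained models.
table_profit = {"embedding": [1.0, 0.0], "value": 100.0}
text_profit_wrong = {"embedding": [1.0, 0.0], "value": 90.0}
print(check_consistency(table_profit, text_profit_wrong))
```

In this sketch the two decisions are kept separate on purpose: the similarity test only answers whether the two mentions describe the same KPI, and the numeric comparison only runs once that match is established.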
Document Type: | Article |
---|---|
Language: | English |
Author: | Syed Musharraf Ali, Tobias Deußer, Sebastian Houben, Lars Hillebrand, Tim Metzler, Rafet Sifa |
Parent Title (English): | Proceedings of the Northern Lights Deep Learning Workshop |
Volume: | 4 |
Number of pages: | 9 |
ISSN: | 2703-6928 |
URN: | urn:nbn:de:hbz:1044-opus-65950 |
DOI: | https://doi.org/10.7557/18.6816 |
Publisher: | Septentrio Academic Publishing |
Place of publication: | Tromsø, Norway |
Publishing Institution: | Hochschule Bonn-Rhein-Sieg |
Date of first publication: | 2023/01/23 |
Copyright: | Copyright (c) 2023 Syed Musharraf Ali, Tobias Deußer, Sebastian Houben, Lars Hillebrand, Tim Metzler, Rafet Sifa. This work is licensed under a Creative Commons Attribution 4.0 International License. |
Keyword: | deep learning; natural language processing; text mining |
Departments, institutes and facilities: | Fachbereich Informatik; Institut für Technik, Ressourcenschonung und Energieeffizienz (TREE) |
Dewey Decimal Classification (DDC): | 0 Computer science, information and general works / 00 Computer science, knowledge and systems / 005 Computer programming, programs and data |
Entry in this database: | 2023/02/01 |
Licence (German): | Creative Commons - CC BY - Namensnennung 4.0 International |