64th ISI World Statistics Congress

Recognizing Fintechs: A graph approach. Automatically classifying companies for statistical purposes.

Author

Ulf von Kalckreuth

Co-author

  • Andy Bosyi
  • Maximilian König

Conference

64th ISI World Statistics Congress

Format: IPS Abstract

Keywords: artificial intelligence, classification, fintech, graph, master-data

Session: IPS 239 - Financial innovation and official statistics

Tuesday 18 July 2 p.m. - 3:40 p.m. (Canada/Eastern)

Abstract

In dealing with fintechs, we have to do without the cornerstones of traditional statistics. There are few if any standardised reporting requirements, no developed taxonomy and no established set of quantitative measures. By definition, innovation involves new activities, and this is intrinsically difficult for traditional statistics, which need stable classifications. Company registers are mostly useless for recognising fintechs. The business environment and market structures are changing rapidly. The segment is characterised by a high rate of “metabolism”: entries, mergers and acquisitions, exits. Any list of fintech companies is rapidly outdated.
Following the guidance of the final report of the IFC Working Group on Fintech Data Issues, we investigate non-standard ways of collecting and organising information. Specifically, we explore the use of AI to find new fintechs and to monitor changes in the characteristics of known entities. The project aims at identifying fintechs where they are active: on the web. In a pilot carried out by the Deutsche Bundesbank in co-operation with an external developer, we use a graph approach that classifies a company on the basis of its position within a network whose nodes are named entities and whose edges are the links between them. The graph is built by scraping the websites of the firms to be classified, as well as of related entities. The results are incomplete, but encouraging.
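To make the idea of classifying a company by its position in a graph more concrete, the following is a minimal sketch and not the pilot's implementation. The company names, entities, labels and the voting rule are all hypothetical. It assumes that scraping has already produced, for each company, a set of named entities and keywords, and it scores an unlabelled company by the labels of companies that share entities with it.

```python
from collections import defaultdict

# Hypothetical toy data: company -> named entities / keywords found on its website.
mentions = {
    "PayCo": {"BaFin", "payments", "Berlin"},
    "LendIt": {"BaFin", "credit", "payments"},
    "RoboShop": {"robotics", "Munich"},
    "NewCo": {"payments", "credit", "Munich"},
}
# Hypothetical training labels: 1 = fintech, 0 = non-fintech. "NewCo" is unlabelled.
labels = {"PayCo": 1, "LendIt": 1, "RoboShop": 0}


def build_graph(mentions):
    """Build an undirected graph linking each company node to its entity nodes."""
    graph = defaultdict(set)
    for company, entities in mentions.items():
        for entity in entities:
            graph[company].add(entity)
            graph[entity].add(company)
    return graph


def fintech_score(graph, labels, company):
    """Share of fintech votes among labelled companies two hops away.

    Each entity shared with a labelled company contributes one vote,
    so companies sharing several entities weigh more heavily.
    Returns None if the company shares no entity with a labelled company.
    """
    votes = []
    for entity in graph[company]:
        for other in graph[entity]:
            if other != company and other in labels:
                votes.append(labels[other])
    return sum(votes) / len(votes) if votes else None


graph = build_graph(mentions)
score = fintech_score(graph, labels, "NewCo")
print(score)
```

A real system would presumably use a richer model over the graph; the neighbour vote here only illustrates how network position can carry classification signal.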
The proof of concept is carried out on the basis of 1,200 technology-oriented German companies: 400 of them fintech, the rest non-fintech. The graph embodies the relationships between companies, named entities (organisations, persons, locations) and keywords. The baseline specification results in an overall accuracy (share of correct predictions) of 88%, with a precision of 86% and a recall of 75%. The information set was very limited, and richer training data ought to result in better performance.
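As a plausibility check on how the three reported figures relate, the sketch below backs out an implied confusion matrix. It rests on an assumption the abstract does not state: that the metrics refer to a sample with the same class balance as the full set (400 fintechs among 1,200 companies). The function name and the fractional counts are illustrative only.

```python
def confusion_counts(n_total, n_pos, precision, recall):
    """Derive implied (possibly fractional) confusion-matrix cells.

    Assumes precision = TP / (TP + FP) and recall = TP / (TP + FN),
    with fintech as the positive class.
    """
    tp = recall * n_pos              # fintechs correctly recognised
    fn = n_pos - tp                  # fintechs missed
    fp = tp / precision - tp         # non-fintechs flagged as fintech
    tn = (n_total - n_pos) - fp      # non-fintechs correctly rejected
    return tp, fp, fn, tn


# Assumption: evaluation sample mirrors the 400 / 800 split of the full set.
tp, fp, fn, tn = confusion_counts(1200, 400, 0.86, 0.75)
accuracy = (tp + tn) / 1200
print(round(accuracy, 2))
```

Under that assumption, a recall of 75% and a precision of 86% imply an accuracy that rounds to the reported 88%, so the three figures are mutually consistent.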
The methodology is particularly helpful for dealing with financial innovation, but it has potential applications to many other issues in company-level statistics. It may show a way towards more informative and timely statistics using information in the public domain, without onerous new reporting requirements.