Bencont: Streamlining the debt collection process
Debt collection is a crucial component of a sustainable financial system. The key challenge lies in managing the substantial volume of debts that require attention. Artificial intelligence and machine learning are effective tools for optimizing the debt collection process, helping sellers and service providers obtain at least partial compensation for delivered products or services.
Companies engaged in debt collection assist sellers and service providers in recovering at least a portion of the owed amount when standard communication with the customer fails.
The biggest challenge in this area is the sheer number of debts that need to be processed: within a short period, tens to hundreds of thousands of cases may require attention. Each debt has a different probability of successful recovery, some higher, some lower. An essential part of the work of analysts and other specialists is therefore prioritizing debts by the estimated likelihood of successful collection.
Artificial intelligence and machine learning offer effective solutions for enhancing the debt collection process. Primarily, they can automate various tasks usually performed by employees, such as extracting essential data from official documents or contracts.
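As an illustration of such automation, the sketch below pulls two typical fields (an owed amount and a due date) out of free-form contract text. The regular expressions, field names, and formats are hypothetical assumptions for this example; a production system would rely on trained document-understanding models rather than hand-written patterns.

```python
import re

# Hypothetical patterns: an amount in EUR and a day.month.year date.
AMOUNT_RE = re.compile(r"(\d+(?:[.,]\d{2})?)\s*EUR")
DATE_RE = re.compile(r"\b(\d{1,2}\.\s?\d{1,2}\.\s?\d{4})\b")

def extract_fields(text: str) -> dict:
    """Extract the owed amount and due date from contract text, if present."""
    amount = AMOUNT_RE.search(text)
    date = DATE_RE.search(text)
    return {
        # Normalize the decimal comma used in Slovak documents.
        "amount_eur": float(amount.group(1).replace(",", ".")) if amount else None,
        "due_date": date.group(1) if date else None,
    }
```

For example, `extract_fields("The debtor shall pay 1200,50 EUR no later than 15. 3. 2023.")` yields the amount `1200.5` and the due date string; fields that are not found are simply left as `None`.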
Additionally, machine learning has the potential to assist in prioritizing debts, reducing the human effort dedicated to unsuccessful debt recovery.
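The prioritization idea can be sketched as follows. The `Debt` fields and the scoring heuristic are invented for illustration; in practice the score would come from a classifier trained on historical recovery outcomes, but the downstream step is the same: sort cases by predicted recoverability so analysts work the most promising ones first.

```python
from dataclasses import dataclass

@dataclass
class Debt:
    debt_id: str
    amount: float          # outstanding amount in EUR
    days_overdue: int      # time since the due date
    prior_contacts: int    # failed contact attempts so far

def recovery_score(debt: Debt) -> float:
    """Toy stand-in for a model's predicted recovery probability.

    Hand-made heuristic: older debts and debts with many failed
    contact attempts score lower. A real system would replace this
    with a trained model's output.
    """
    age_penalty = min(debt.days_overdue / 365.0, 1.0)
    contact_penalty = min(debt.prior_contacts / 10.0, 1.0)
    return max(0.0, 1.0 - 0.6 * age_penalty - 0.4 * contact_penalty)

def prioritize(debts: list[Debt]) -> list[Debt]:
    """Order debts from most to least likely to be recovered."""
    return sorted(debts, key=recovery_score, reverse=True)
```

Ranking by a single score keeps the human workflow unchanged: specialists still handle each case, they just encounter the low-probability ones last.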
The pilot project consisted of two key components:
- Addressing the issue of classifying official texts
- Conducting a series of workshops on artificial intelligence and natural language processing
Classification of Official Texts
As part of the pilot project, we addressed the issue of classifying official texts (court decisions) related to the debt collection process. We introduced and evaluated various approaches based on natural language processing and language models, including the SlovakBERT model. Our primary focus was on approaches addressing the problem as a classification task and the task of determining the semantic similarity of texts.
The first solution, addressing the classification task, is based on a standard approach. We first preprocessed the texts and then proceeded to train and compare a variety of classifiers based on the transformer architecture.
The second approach, focusing on semantic similarity, first converts input texts into “embeddings.” These embeddings are high-dimensional vectors of real numbers that carry semantic information. In essence, texts sharing similar content should be represented by vectors that lie close to each other (in this case, we measure their cosine similarity). We can then classify an unknown text by identifying the K most similar known texts and assigning a class to the unknown document based on their “voting.” The notable advantage of this approach lies in its ability to dynamically add a new class for categorizing new documents without retraining the machine learning model — all that’s needed is to incorporate such texts into the set of known texts.
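The voting step described above can be sketched in a few lines. The embeddings here are toy two-dimensional vectors and the class labels are invented; in practice the vectors would come from a sentence-embedding model (e.g. one built on SlovakBERT) and have hundreds of dimensions, but the nearest-neighbour logic is identical.

```python
import math
from collections import Counter

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def knn_vote(query: list[float],
             known: list[tuple[list[float], str]],
             k: int = 3) -> str:
    """Classify a query embedding by majority vote among its K most
    similar known embeddings."""
    ranked = sorted(known,
                    key=lambda pair: cosine_similarity(query, pair[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Note how the dynamic-class advantage shows up directly: supporting a new document category means appending labelled embeddings to `known`, with no retraining step.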
After a detailed analysis and comparison of both approaches, we found that the classifier-based approach performed better, despite the undeniable advantages of the semantic-similarity-based approach.
Series of Workshops and Knowledge Transfer
We also organized and facilitated a series of six interactive half-day workshops within the project. These workshops were designed to support knowledge transfer and cultivate a strong proficiency in the practical application of machine learning at Bencont.
“When you are setting out on a remarkable, yet long and strenuous journey, it is a good idea to have quality guides. It will make your journey pleasant and entertaining. Thank you, KInIT.”
Software Architect, Bencont