When you are setting out on a remarkable, yet long and strenuous journey, it is a good idea to have quality guides. It will make your journey pleasant and entertaining. Thank you, KInIT.
Juraj Štít
Software Architect, Bencont
Companies engaged in the debt collection area assist sellers and service providers in obtaining at least a portion of the owed amount in cases when standard communication with the customer fails.
The biggest challenge in this area is the sheer number of debts that need to be processed. In a brief period, tens to hundreds of thousands of debts may require processing. Each debt has a different probability of successful recovery, so an essential part of the work for analysts and other specialists is prioritizing debts by their estimated likelihood of successful collection.
Artificial intelligence and machine learning offer effective solutions for enhancing the debt collection process. Primarily, they can automate various tasks usually performed by employees, such as extracting essential data from official documents or contracts.
Additionally, machine learning has the potential to assist in prioritizing debts, reducing the human effort dedicated to unsuccessful debt recovery.
The pilot project consisted of two key components: the automated classification of official texts and a series of knowledge-transfer workshops.
As part of the pilot project, we addressed the issue of classifying official texts (court decisions) related to the debt collection process. We introduced and evaluated various approaches based on natural language processing and language models, including the SlovakBERT model. Our primary focus was on approaches that treat the problem either as a classification task or as a task of determining the semantic similarity of texts.
The first solution, addressing the classification task, is based on a standard approach. We first preprocessed the text and then trained and compared a variety of classifiers based on the transformer architecture.
The second approach, focusing on semantic similarity, first converts input texts into “embeddings.” These embeddings are high-dimensional vectors of real numbers that carry semantic information. In essence, texts with similar content should be represented by vectors that lie close to each other; in this case, we measure closeness by cosine similarity. We can then use these representations to classify an unknown text by identifying the K most similar known texts and assigning it the class that wins their “vote.” The notable advantage of this approach is that a new class can be added dynamically, without retraining the machine learning model: all that’s needed is to add example texts of that class to the set of known texts.
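The voting scheme described above can be illustrated with a minimal sketch. It is not the project's actual implementation: the function names, the class labels, and the tiny three-dimensional vectors are hypothetical stand-ins, and in practice the embeddings would come from a language model such as SlovakBERT rather than being written by hand.

```python
from collections import Counter

import numpy as np


def cosine_similarities(query, known_vectors):
    """Cosine similarity between one query vector and each row of known_vectors."""
    query_unit = query / np.linalg.norm(query)
    known_unit = known_vectors / np.linalg.norm(known_vectors, axis=1, keepdims=True)
    return known_unit @ query_unit


def knn_classify(query, known_vectors, known_labels, k=3):
    """Assign the label that wins the vote among the k most similar known texts."""
    sims = cosine_similarities(query, known_vectors)
    nearest = np.argsort(-sims)[:k]  # indices of the k highest similarities
    votes = Counter(known_labels[i] for i in nearest)
    # On a tie, most_common keeps the label encountered first,
    # which here is the label of the closer neighbor.
    return votes.most_common(1)[0][0]


# Toy "embeddings" (assumption: real ones have hundreds of dimensions).
known_vectors = np.array([
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.1],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
])
# Hypothetical class names, used only for illustration.
known_labels = ["payment_order", "payment_order",
                "enforcement_stay", "enforcement_stay"]

print(knn_classify(np.array([0.95, 0.15, 0.05]), known_vectors, known_labels, k=3))
```

Adding a new class in this sketch means appending its example vectors and labels to `known_vectors` and `known_labels`; nothing is retrained, which mirrors the advantage noted in the text.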
After a detailed analysis and comparison of both approaches, we found that, despite the clear advantages of the semantic similarity-based approach, the classifier-based approach performed better.
We also organized and facilitated a series of six interactive half-day workshops within the project. These workshops were designed to support knowledge transfer and cultivate a strong proficiency in the practical application of machine learning at Bencont.