A core goal of Financial Crime Investigations is to identify activity that deviates from the ordinary. Unlike evidence in most other forms of crime analysis, financial data rarely offers a “smoking gun.” On the surface, illicit activity often mimics legitimate transactions: there may be nothing inherently wrong with a transfer to a specific country or a payment to a particular account, and a login from a different continent might appear routine.
What matters is recognizing when something is out of place. To do this, one must first understand what “normal” looks like. What is normal for one customer may not be for another, and norms also vary by industry. Additionally, what was considered abnormal last year may be entirely routine this year.
The methods for identifying anomalies are constantly evolving. How many investigations have been solved simply because someone noticed an odd detail that connected the dots? It’s tempting to think that AI and Large Language Models (LLMs) can replicate this intuition by processing vast amounts of data. And perhaps they can, but simply launching a model such as llama3.1:8b through Ollama and expecting it to analyze the financial activity of a carpenter client will likely lead to disappointment.
For starters, an LLM doesn’t possess inherent knowledge of your customer’s financial activities: you must provide the data along with your question. Today, many rely on Retrieval Augmented Generation (RAG) to enhance these models. The idea is to pinpoint the most relevant parts of your data and incorporate them into the query sent to the LLM.
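Before any retrieval machinery, the basic pattern is simply to inline the data with the question. Here is a minimal sketch using the ollama Python client; the model tag, transactions, and prompt are illustrative placeholders, not a recommended setup:

```python
# Minimal "data with the question" pattern via the ollama Python client.
# The transactions below are made-up placeholders; in practice they would
# come from your own systems.
import ollama

transactions = """\
2024-03-01  EUR  1,200  SE -> DE  invoice payment
2024-03-04  EUR 14,500  SE -> AE  consulting fee
2024-03-05  EUR  9,900  SE -> AE  consulting fee
"""

response = ollama.chat(
    model="llama3.1:8b",
    messages=[{
        "role": "user",
        "content": (
            "You are assisting a financial crime investigator.\n"
            "Given these transactions, point out anything unusual:\n\n"
            + transactions
        ),
    }],
)
print(response["message"]["content"])
```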
However, applying RAG to an entire database can be equally underwhelming. First, indexing everything can take an impractically long time. Moreover, the vector databases used for RAG struggle to represent relationships between records. Metadata helps, but the approach can start to feel like an attempt to recreate the structure of a traditional SQL database. The result is often an imprecise search, leading to irrelevant context and equally irrelevant answers.
When AI is applied to Financial Crime Investigations, a critical step is often missed at the outset: identifying all the relevant data for an entity or a set of entities. By “entity,” I mean customers, accounts, and transactions, and not just individual records but every Know Your Customer (KYC) response, every login location, and every transaction of certain types or from specific regions. The analysis requires two datasets: one representing the current activity being examined and another serving as a baseline for comparison. The latter often represents normal behavior, such as last year’s activity or that of similar customers, while the former is scrutinized for anomalies.
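As a sketch of that preparation step, assume the transactions live in a pandas DataFrame with hypothetical columns `customer_id` and `booked_at`; the column names, the 12-month windows, and the choice of last year as the baseline are all assumptions for illustration:

```python
# Split one customer's transactions into the activity under examination
# (last 12 months) and a behavioral baseline (the 12 months before that).
# Assumes booked_at is a timezone-aware UTC timestamp column.
import pandas as pd

def split_activity(
    df: pd.DataFrame, customer_id: str
) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (current, baseline) datasets for a single customer."""
    now = pd.Timestamp.now(tz="UTC")
    tx = df[df["customer_id"] == customer_id]
    current = tx[tx["booked_at"] >= now - pd.DateOffset(months=12)]
    baseline = tx[
        (tx["booked_at"] >= now - pd.DateOffset(months=24))
        & (tx["booked_at"] < now - pd.DateOffset(months=12))
    ]
    return current, baseline

# Usage: current, baseline = split_activity(transactions_df, "C-1042")
```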
Once these datasets are prepared, they can either be sent to the LLM in full (if they fit within the model’s context window) or processed with RAG to narrow already relevant data down to its most relevant parts. The LLM should then be tasked with assisting in comparing the two datasets. The user defines which types of comparisons matter, such as transaction frequency, size, and counterpart relationships; a sketch of one such comparison follows the list below.
Some examples of useful comparisons include:
- Comparing a customer’s transactions from the past 0–12 months to those from 12–24 months ago
- Comparing a customer’s counterpart countries to those of customers with a similar NACE code
- Comparing a customer’s counterpart countries and transaction amounts to their latest KYC submission
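To illustrate the first comparison, the sketch below hands the model two compact summaries and a narrow question instead of raw records and a vague one. All figures, field names, and the summary format are made up for the example:

```python
# Ask the model to compare current activity against a baseline, with the
# comparison criteria (frequency, size, counterpart countries) spelled out.
import json
import ollama

current = {"period": "0-12 months ago", "tx_count": 412,
           "total_eur": 1_250_000, "countries": {"SE": 390, "DE": 14, "AE": 8}}
baseline = {"period": "12-24 months ago", "tx_count": 398,
            "total_eur": 1_180_000, "countries": {"SE": 395, "DE": 3}}

prompt = (
    "Compare the customer's recent activity to their baseline. "
    "Focus on transaction frequency, size, and counterpart countries, "
    "and flag anything that deviates from the baseline.\n\n"
    f"Recent: {json.dumps(current)}\n"
    f"Baseline: {json.dumps(baseline)}"
)
response = ollama.chat(model="llama3.1:8b",
                       messages=[{"role": "user", "content": prompt}])
print(response["message"]["content"])
```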
Even smaller models like llama3.1:8b can offer valuable insights if they are fed the right data, properly normalized for analysis.
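What “properly normalized” might mean in practice is collapsing thousands of raw records into a compact aggregate that fits comfortably in a small model’s context window. A sketch, reusing the hypothetical pandas columns from above and producing the kind of summary shown in the previous example:

```python
# Reduce raw transactions to aggregate figures the LLM can actually compare.
# Column names (amount, counterpart_country) are illustrative assumptions.
import pandas as pd

def summarize(tx: pd.DataFrame, period: str) -> dict:
    """Collapse a set of transactions into a compact, comparable summary."""
    return {
        "period": period,
        "tx_count": int(len(tx)),
        "total_eur": float(tx["amount"].sum()),
        "median_eur": float(tx["amount"].median()),
        "countries": tx["counterpart_country"].value_counts().to_dict(),
    }
```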
If you are curious about what this looks like in action, I invite you to check out Convier, a company I co-founded.