Softology

Saving you time and money.

Clover - Identity Resolution and Matching

So, fuzzy searching: let us refer you to the left-hand clover leaf, labelled Identity Resolution.

This is where we tie up data from an incoming document, say a scanned invoice, to a record on a host computer. That record may be a supplier from the supplier table, or an order (it’s all agnostic and driven by parameters).

If the information to be matched has come from an unstructured document, probably by scanning and OCR processing, we need to be able to check the extracted text from the document against the relevant host tables in a database in order to resolve the identity of the document.

This presents a whole host of problems: the words in an address may not appear in the same order, some words may be lost or mangled by the OCR, and others may simply be misspelled. Depending on the quality of the incoming document, whole batches might never match under an equality search, and need approximate matching as a matter of course.
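To make those problems concrete, here is a minimal sketch in Python of how an approximate match can succeed where an equality search fails. It is an illustration only, not the Clover implementation: the function names, the sample addresses and the 0.8 threshold are all assumptions made for the example, and the scoring comes from the standard library's difflib rather than anything described above.

```python
from difflib import SequenceMatcher


def normalise(text: str) -> list[str]:
    """Lower-case, replace punctuation with spaces, and sort the words
    so that word order no longer affects the comparison."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return sorted(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Score two addresses from 0.0 to 1.0 as order-insensitive word lists."""
    return SequenceMatcher(
        None, " ".join(normalise(a)), " ".join(normalise(b))
    ).ratio()


def best_match(query: str, candidates: list[str], threshold: float = 0.8):
    """Return (candidate, score) for the closest candidate, or (None, score)
    if nothing reaches the threshold. The threshold is an assumed value."""
    if not candidates:
        return None, 0.0
    score, candidate = max((similarity(query, c), c) for c in candidates)
    return (candidate, score) if score >= threshold else (None, score)


# Hypothetical host table and OCR output: reordered words, two misspellings.
host_table = [
    "Acme Supplies Ltd, 12 High Street, Leeds",
    "Apex Tools, 4 Mill Lane, York",
]
scanned = "12 Hgh Street Leeds, Acme Suplies Ltd"

print(scanned in host_table)            # equality search finds nothing
print(best_match(scanned, host_table))  # approximate search finds the Acme record
```

Sorting the words deals with the ordering problem, and the similarity ratio tolerates the dropped letters. A production matcher would need more than this (abbreviations, missing words, weighting of postcodes and so on), but the principle is the same.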

This is where fuzzy matching comes into the equation. It is not a new technology, but its downside is that it does so much processor-intensive work that it can be quite slow, even against small target data sets. Often the only way to get a batch matched is with a server process that might take several minutes. So, you just sit and wait.
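To give a feel for why the work is so processor-intensive, here is a minimal sketch of one classic fuzzy-matching measure, the Levenshtein edit distance, in Python. We are not saying this is the algorithm inside Clover; it is simply a well-known example, and the function name is our own for the illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the fewest single-character insertions,
    deletions or substitutions needed to turn a into b."""
    # previous[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b; we keep only two rows of the table.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(edit_distance("invoice", "invoce"))  # one dropped letter
```

Every comparison fills in a table of len(a) × len(b) cells. Assuming addresses of around 40 characters, that is about 1,600 cell updates per comparison, so a naive scan of one document against a million addresses comes to roughly 1.6 billion updates.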

We focused on real-time fuzzy matching with one million addresses to match against. We chose one million addresses as the test set because it exceeded the largest of our customers’ source tables many times over. We initially thought that if we could get to a match in 10 seconds (i.e. 1 second per hundred thousand addresses) then we would be looking at very quick server matching, or even real-time identity resolution, for our customers.

In testing, on a quick PC, we achieved a match against one million addresses in under 2 seconds. The impact on the CPU was a brief spike, and it only sipped RAM, even though we ran the code in one thousand threads.

This puts real-time Identity Resolution and Matching firmly on the table for Softology’s Clover Node. Mind you, we had to fall back on skills learnt thirty-odd years ago to cut the code in x86 Assembly.

Development