Check out my Open Domain Question Answering series for more context and technical details regarding question-answering modeling.
Organizations across the legal and compliance industry deal daily with complex unstructured documents such as contracts, invoices, and agreements. Managing these documents, including searching and validating information, is essential to reducing risk and staying compliant.
Traditional search tools use simple keyword matching to find information in a document (for example, pressing Ctrl+F in a PDF viewer like Adobe Acrobat to search for or validate relevant information). However, exact keyword matching is an inefficient approach to searching for information.
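To make the problem concrete, here is a minimal sketch of a Ctrl+F style search. The function name `exact_match` and the sample sentence are my own illustrative assumptions, not part of any real tool.

```python
def exact_match(query: str, text: str) -> bool:
    """Case-insensitive substring search, similar in spirit to Ctrl+F."""
    return query.lower() in text.lower()


# Illustrative sentence, not quoted from any specific document.
sentence = "The General Data Protection Regulation protects personal data."

found_full_name = exact_match("General Data Protection Regulation", sentence)
found_acronym = exact_match("GDPR", sentence)

print(found_full_name)  # True: the exact phrase is present
print(found_acronym)    # False: the acronym never appears verbatim
```

The second query fails even though a human reader would treat it as the same thing, which is the gap that semantic search tries to close.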
Limitations of exact keyword matching:
Today’s state-of-the-art AI models can enable search tools to use the semantics (meaning) and layout (structure) of documents to find information accurately.
How can AI models use semantics?
Mapping synonyms and acronyms is essential when extracting information. However, explicitly adding rules for every synonym and acronym becomes tedious at scale. Today, many AI language models capture semantic relationships from context, so they can tell that “apple as a company” is different from “apple as a fruit.” These language models have complex neural architectures and are trained on large text corpora to approximate how people understand language. As of November 2022, BLOOM is the world’s largest open-source language model. It has 176 billion parameters and was trained for 3.5 months on 384 A100 80GB GPUs.
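To show why hand-written rules get tedious, here is a rough sketch of rule-based query expansion. The `SYNONYMS` table, `expand_query`, and `rule_based_match` are hypothetical names I made up for illustration; every new term needs its own manually added entry.

```python
# Hand-maintained mapping; every entry is an illustrative assumption.
SYNONYMS = {
    "dna": ["genetic data"],
    "gdpr": ["general data protection regulation"],
}


def expand_query(query: str) -> list[str]:
    """Return the query plus any hand-written synonyms or expansions."""
    key = query.lower()
    return [key] + SYNONYMS.get(key, [])


def rule_based_match(query: str, text: str) -> bool:
    """True if the query or any of its listed variants appears in the text."""
    lowered = text.lower()
    return any(variant in lowered for variant in expand_query(query))


text = "Genetic data should be defined as personal data."
dna_hit = rule_based_match("DNA", text)        # covered by a rule
genome_hit = rule_based_match("genome", text)  # no rule exists, so no match
print(dna_hit, genome_hit)
```

Any term that nobody thought to add to the table is silently missed, which is the maintenance burden that language models are meant to remove.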
The screenshot above shows an example from Cognistx’s search tool, SQUARE, in which a user asked, “Is DNA considered personal data?” against a GDPR document from the Official Journal of the European Union. SQUARE understood that “DNA” is related to “genetic data” and retrieved a relevant answer: “Genetic data should be defined as personal data.”
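One common way to relate terms like these is to compare embedding vectors with cosine similarity. The sketch below is only an illustration of that general idea, not a description of how SQUARE works internally. The three-dimensional vectors are toy values I wrote by hand; a real embedding model would produce much longer vectors.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy, hand-made vectors standing in for what an embedding model would produce.
toy_embeddings = {
    "DNA": [0.9, 0.1, 0.0],
    "genetic data": [0.8, 0.2, 0.1],
    "base loan amount": [0.0, 0.1, 0.9],
}

query_vector = toy_embeddings["DNA"]
candidates = ["genetic data", "base loan amount"]

# Pick the candidate whose vector points in the most similar direction.
best = max(
    candidates,
    key=lambda c: cosine_similarity(query_vector, toy_embeddings[c]),
)
print(best)
```

With these toy values, the query lands closest to "genetic data" even though the two strings share no keywords.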
How can AI models use layout?
Unstructured documents such as contracts, bills, and standard documents often contain some structural information in tables and forms. The location of that information on the page can help label it correctly (for example, Amount = $552,500 in the above figure). Models like LayoutLM can use this layout information to answer a question asked in natural language (for example, “What is the base loan amount?”).
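To give a feel for why position matters, here is a simple hand-written heuristic that pairs a label with the value sitting to its right on the same row. This is not LayoutLM, which learns from text and bounding boxes together. The `Token` class, `value_right_of`, and all coordinates are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    text: str
    x: float  # left edge of the token's bounding box
    y: float  # top edge of the token's bounding box


def value_right_of(
    label: str, tokens: list[Token], row_tolerance: float = 5.0
) -> Optional[str]:
    """Return the nearest token to the right of `label` on the same row."""
    anchor = next((t for t in tokens if t.text.lower() == label.lower()), None)
    if anchor is None:
        return None
    same_row = [
        t for t in tokens
        if abs(t.y - anchor.y) <= row_tolerance and t.x > anchor.x
    ]
    if not same_row:
        return None
    return min(same_row, key=lambda t: t.x).text


# Hypothetical OCR output; the coordinates are made up for illustration.
tokens = [
    Token("Amount", x=40, y=100),
    Token("$552,500", x=200, y=101),
    Token("Term", x=40, y=130),
    Token("30 years", x=200, y=131),
]

print(value_right_of("Amount", tokens))
```

A fixed rule like this breaks as soon as the form layout changes, which is why a learned layout-aware model is attractive for real documents.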
An intelligent search tool can improve organizations’ productivity and reduce costs by efficiently automating the manual process of searching information across documents.
Semantic and layout-aware AI models for search can empower many use cases in the legal and compliance industry. The following describes the top use cases that can benefit from these models.
In the open search use case, a centralized repository stores data extracted from trusted sources by crawling the internet, querying relevant databases, or manually adding relevant documents in different formats. Users are often new to the domain and sometimes unsure what questions to ask. They interact with a Google-like search interface (as shown in the SQUARE GDPR example above) and get relevant answers to their natural language queries. Often, users explore through this interface, get answers, and frame new questions based on what they learned from previous answers. This tool helps users understand the domain better and avoid spending money on consulting domain experts. It can also help avoid bias from different consultants. In addition, the centralized repository can be updated every day or every hour so that users always get the latest information.
In the constrained search use case, a template of entities is created in consultation with the customer. For example, the entities for the lease contract due-diligence process include the licensee name, revenue share percentage, and buy-out amount. These predefined entities need to be extracted and validated. The users of constrained search are often domain experts. However, manually extracting and verifying entities at scale in a limited time is not feasible. Hence, these tools help domain experts such as attorneys validate predefined entities within a limited time.
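An entity template like this can be pictured as a small data structure plus a validation step. The sketch below is my own simplified illustration, not the template format used in any production system. `EntitySpec`, `LEASE_TEMPLATE`, the regular expressions, and the sample values are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class EntitySpec:
    name: str
    pattern: str  # regular expression a valid value must fully match


# Hypothetical template; the patterns are illustrative assumptions.
LEASE_TEMPLATE = [
    EntitySpec("licensee_name", r"[A-Za-z][A-Za-z .,&'-]*"),
    EntitySpec("revenue_share_percentage", r"\d{1,3}(\.\d+)?%"),
    EntitySpec("buy_out_amount", r"\$\d{1,3}(,\d{3})*(\.\d{2})?"),
]


def validate(extracted: dict[str, str]) -> dict[str, bool]:
    """Check that each templated entity is present and matches its pattern."""
    return {
        spec.name: bool(re.fullmatch(spec.pattern, extracted.get(spec.name, "")))
        for spec in LEASE_TEMPLATE
    }


# Hypothetical extraction output for one contract.
extracted = {
    "licensee_name": "Acme Holdings LLC",
    "revenue_share_percentage": "12.5%",
    "buy_out_amount": "552500",  # lacks the currency format, so it is flagged
}
report = validate(extracted)
print(report)
```

In a setup like this, a domain expert would only need to review the entities flagged as invalid instead of rereading every contract.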
I have described the open and constrained search use cases for the legal and compliance domain. However, the same approaches can also be applied to other industries such as aerospace, healthcare, and information security.
SQUARE is our scalable and production-ready intelligent search system that takes user queries and provides granular results across millions of documents in just a few seconds.
SQUARE powers Odds on Compliance’s PlayBook AI platform and enables its customers to ask natural language questions and get accurate granular responses for gaming compliance documents and websites across multiple states in the United States.
SQUARE powers Solvaire and enables its attorneys to avoid a human-dependent review process. It extracts and validates information from various lease documents so that Solvaire can conduct its operations efficiently within a limited time.
Add me on LinkedIn. Happy Learning!