At long last, organizations have a really powerful tool that boosts employees’ efficiency by speeding up their content search experience.
In the previous article in this series, Dawid discussed a great number of benefits of implementing a Knowledge Mining solution. This time, I would like to explain how Azure services make it work wonders.
Before we discuss the architecture specifics, we need to be clear on how the process works. Without further ado, let’s explain its three basic steps.
In the previous article on AI-driven search, we saw that Knowledge Mining lets you extract information from structured and unstructured data stored in different sources. To do this, the solution uses a range of pre-trained and custom AI services.
In a nutshell, Knowledge Mining works by orchestrating the entire enrichment pipeline. The process has three main phases: ingestion, enrichment, and exploration.
Here is a visualization of this structure:
Knowledge Mining orchestration steps (source: Microsoft)
What does each of these phases do? Let’s go over them one by one.
Ingestion is the first phase of Knowledge Mining. Here, both structured and unstructured data is processed.
What is the difference between these two types of data?
Structured data has a defined data model, and it typically resides in a relational database like Azure SQL. Unstructured data, on the other hand, does not have a predefined data model. It can come from sources such as NoSQL databases or file stores. File types include PDFs, images, Word documents, PowerPoint presentations, and more.
What about data ingestion? It is a process in which raw data, structured or unstructured, coming from various sources, is aggregated into a persistent, centralized data store.
At the end of the ingestion phase, documents are cracked. In other words, the solution extracts or creates text content from non-text sources. In this process, optical character recognition (OCR) is very useful. It proves especially helpful when we want to extract data from images or PDF files.
Enrichment is the second phase of Knowledge Mining, and this is where AI comes in. In this process, the solution identifies patterns, obtains information, and gains understanding from text extracted from images, PDF files, and other unstructured data sources.
Knowledge Mining performs enrichment on individual documents as a sequence of calls to AI models. Interestingly, you can use Azure cloud-based AI services, but you can also build your own custom models and plug them into this phase.
In the final stage, Knowledge Mining exposes the newly enriched, structured documents. They are now ready for exploration and analysis.
During exploration, the solution reviews the added enrichment to learn more about the collected data. The results are then available via search indexes or end-user and line-of-business applications, such as customer relationship management (CRM) or enterprise resource planning (ERP) systems.
After exploration, it’s time for analysis. Typically, this process involves applying analytics tools, such as Power BI. They serve to explore and gain a deeper understanding of the gathered information.
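To make the three phases more tangible, here is a minimal, self-contained sketch of the flow in Python. It is a toy model and not how Azure implements the pipeline: the `Document` class, the function names, and the sample texts are all assumptions made for illustration, and a trivial keyword extractor stands in for the AI models used during enrichment.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    """A document as it moves through the (hypothetical) pipeline."""
    doc_id: str
    raw: str  # stand-in for the text produced by document cracking
    enrichments: dict = field(default_factory=dict)


def ingest(sources: dict[str, str]) -> list[Document]:
    """Phase 1: gather content from every source into one central store."""
    return [Document(doc_id=name, raw=text) for name, text in sources.items()]


def enrich(doc: Document) -> Document:
    """Phase 2: run an enrichment step on a single document.

    A real solution would call AI models here; this toy version just
    keeps the longer words as "keywords".
    """
    words = [w.strip(".,").lower() for w in doc.raw.split()]
    doc.enrichments["keywords"] = sorted({w for w in words if len(w) > 6})
    return doc


def build_index(docs: list[Document]) -> dict[str, list[str]]:
    """Phase 3: expose enriched documents via a keyword -> document ids index."""
    index: dict[str, list[str]] = {}
    for doc in docs:
        for keyword in doc.enrichments["keywords"]:
            index.setdefault(keyword, []).append(doc.doc_id)
    return index


# Made-up sample content, keyed by a made-up file name.
sources = {
    "manual.pdf": "Replace the hydraulic actuator during scheduled maintenance.",
    "note.docx": "Maintenance window moved to Friday.",
}
index = build_index([enrich(doc) for doc in ingest(sources)])
print(index["maintenance"])  # → ['manual.pdf', 'note.docx']
```

The point of the sketch is the shape of the process: every document is enriched on its own, and only the enriched result is exposed for exploration.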
Let’s talk about the role the Azure cloud plays in building an intelligent search solution.
Microsoft Azure provides a number of useful services which make the solution work smoothly and effectively. Here are some examples.
Azure Cognitive Search is a search-as-a-service solution. It gives developers the tools to provide a rich search experience.
Let me give you an example. Imagine a mobile app that you use to shop. You are looking for a specific product, so you immediately use the search box. There, you have a list of similar products. So, you decide to apply filtering by price. Still, the list of products is endless and your search experience leaves a lot to be desired.
You know very well that implementing your own search engine can be time-consuming. In such cases, Azure Cognitive Search speeds up the whole process and increases the quality of the search experience.
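As a rough illustration of the shopping scenario, the sketch below builds (but does not send) a query against the Search Documents REST endpoint, combining a full-text search with a price filter. The service name, index name, API key, and the `price` field are assumptions for this example; the field would also need to be marked as filterable and sortable in your index for the query to work.

```python
import json


def build_search_request(service: str, index: str, api_key: str,
                         text: str, max_price: float) -> dict:
    """Describe a POST to the Search Documents endpoint without sending it."""
    return {
        "method": "POST",
        "url": (f"https://{service}.search.windows.net"
                f"/indexes/{index}/docs/search?api-version=2020-06-30"),
        "headers": {"Content-Type": "application/json", "api-key": api_key},
        "body": json.dumps({
            "search": text,                      # full-text query
            "filter": f"price le {max_price}",   # OData filter on a hypothetical field
            "orderby": "price asc",              # cheapest products first
            "top": 10,                           # keep the result list short
        }),
    }


# All argument values below are placeholders.
request = build_search_request("my-service", "products", "<api-key>",
                               "running shoes", 100)
print(request["url"])
print(request["body"])
```

You could pass the resulting method, URL, headers, and body to any HTTP client of your choice.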
In general, there are two basic approaches that Azure Cognitive Search uses to ingest data and populate an index. We call them pull data and push data. Dawid has already touched upon them in the previous post. Let me just remind you briefly how they work.
In the pull model, Azure Cognitive Search itself pulls data into the index from supported Azure data sources.
The second approach is quite different. The push model relies on custom applications to send documents directly into a search index. This is done programmatically. Applications can use either the Azure Cognitive Search REST API or the Azure Search SDK for .NET to send data into the index.
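To give you a feel for the push model, here is a minimal sketch that prepares a document batch for the REST API's document indexing endpoint. It only builds the request and does not send it. The service name, index name, API key, and document fields are assumptions for this example, and the `id` field is assumed to be the key field of the index.

```python
import json


def build_push_request(service: str, index: str, api_key: str,
                       documents: list, action: str = "mergeOrUpload") -> dict:
    """Describe a POST that pushes a batch of documents into a search index."""
    # Each document in the batch carries the action the service should apply.
    batch = {"value": [{"@search.action": action, **doc} for doc in documents]}
    return {
        "method": "POST",
        "url": (f"https://{service}.search.windows.net"
                f"/indexes/{index}/docs/index?api-version=2020-06-30"),
        "headers": {"Content-Type": "application/json", "api-key": api_key},
        "body": json.dumps(batch),
    }


# Placeholder documents; the field names are hypothetical.
docs = [
    {"id": "1", "title": "Landing gear inspection"},
    {"id": "2", "title": "Hydraulic pump replacement"},
]
push = build_push_request("my-service", "tech-docs", "<api-key>", docs)
print(push["url"])
print(push["body"])
```

The `mergeOrUpload` action updates a document if its key already exists and creates it otherwise, which is a convenient default when your application re-sends documents.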
When you use Azure Cognitive Search to build a Knowledge Mining solution, there is a wide range of pre-trained services in the Azure cloud that you can integrate. We call them Microsoft Cognitive Services. They can, for example, scan PDF files using OCR and extract the relevant content.
Here are some examples of pre-defined Azure Cognitive Services that a Knowledge Mining solution can use during the enrichment phase:
To see how these services work, you can check out the Microsoft website for a number of demos.
You can try the Face API for yourself here. You simply submit an image and the service detects the faces on it.
If you would like to check how Form Recognizer API extracts data from documents, then a useful visualization is available here.
The Azure Cognitive Search architecture is extensible, so it allows you to assemble an enrichment pipeline from both predefined and custom cognitive skills.
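As an illustration, the sketch below assembles a skillset definition that combines one predefined skill (OCR) with one custom Web API skill. The two `@odata.type` values are the ones Azure Cognitive Search uses for these skill types, but everything else is an assumption made for this example: the skillset name, the function URL, and the custom skill's input and output names (`text`, `tables`, `extractedTables`).

```python
import json


def build_skillset(name: str, custom_skill_uri: str) -> dict:
    """Build a skillset body mixing a predefined and a custom cognitive skill."""
    return {
        "name": name,
        "description": "OCR plus a custom table-extraction step (illustrative)",
        "skills": [
            {
                # Predefined skill: extracts text from images found in documents.
                # It relies on the indexer being configured to generate
                # normalized images during document cracking.
                "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
                "context": "/document/normalized_images/*",
                "inputs": [
                    {"name": "image", "source": "/document/normalized_images/*"}
                ],
                "outputs": [{"name": "text", "targetName": "ocrText"}],
            },
            {
                # Custom skill: calls our own endpoint, e.g. an Azure Function.
                # Input and output names here are hypothetical.
                "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
                "uri": custom_skill_uri,
                "httpMethod": "POST",
                "context": "/document",
                "inputs": [{"name": "text", "source": "/document/content"}],
                "outputs": [{"name": "tables", "targetName": "extractedTables"}],
            },
        ],
    }


# Placeholder name and URL.
skillset = build_skillset(
    "docs-skillset", "https://example.azurewebsites.net/api/extract-tables"
)
print(json.dumps(skillset, indent=2))
```

The resulting dictionary corresponds to the JSON body you would send when creating a skillset. Skills are wired together through their input sources and output target names.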
The custom skills I’ve just mentioned may prove useful for many different organizations, as they provide a way to insert transformations that are unique to your content.
A custom skill executes independently. It applies any enrichment step we require. A good example would be data extraction from Word document tables.
Should you decide to apply custom skills for the AI enrichment phase, Azure Functions are ideal for implementing them.
In the enrichment process, the solution can call Azure Functions, which in turn call other Cognitive Services, for example Form Recognizer. This service will analyze document content, and the solution will then pass the results to Azure Cognitive Search.
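Here is a minimal sketch of what the body of such a custom skill could look like. Azure Cognitive Search sends a custom Web API skill a JSON payload with a `values` array, where each record has a `recordId` and a `data` object, and it expects a response of the same shape. The sketch is plain Python so that it runs anywhere; in practice you would wrap `run_custom_skill` in an HTTP-triggered Azure Function. The `analyze_form` helper is a hypothetical stand-in for a call to a service such as Form Recognizer, and the `text` and `fields` names are assumptions.

```python
import json


def analyze_form(text: str) -> dict:
    """Hypothetical stand-in for a form-analysis call: parses 'key: value' lines."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def run_custom_skill(request_body: str) -> str:
    """Process a custom skill request and return the response as a JSON string."""
    payload = json.loads(request_body)
    results = []
    for record in payload["values"]:
        # Every output record must echo the recordId of its input record.
        result = {"recordId": record["recordId"], "data": {},
                  "errors": [], "warnings": []}
        try:
            result["data"]["fields"] = analyze_form(record["data"]["text"])
        except KeyError as exc:
            # Report the problem for this record instead of failing the batch.
            result["errors"].append({"message": f"Missing input: {exc}"})
        results.append(result)
    return json.dumps({"values": results})


# Sample request with one valid record and one record missing its input.
sample_request = json.dumps({
    "values": [
        {"recordId": "r1", "data": {"text": "Part: A-320\nStatus: open"}},
        {"recordId": "r2", "data": {}},
    ]
})
response = json.loads(run_custom_skill(sample_request))
print(response["values"][0]["data"]["fields"])
```

Reporting errors per record, rather than failing the whole request, lets the rest of the batch continue through the enrichment pipeline.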
Below, you’ll find a sample architecture which presents custom skills in use.
Intelligent search using Azure Functions
The image shows an Azure Function that calls the Form Recognizer API to perform two tasks: first, analyzing the content of form documents, and then inserting the results into the Azure Cognitive Search index.
Together with my team at Predica, we delivered a Knowledge Mining solution to one of our clients. It was quite a big project related to extracting content from aircraft technical documentation to help users respond to technical queries and service requests.
Our goal was to develop a quick and effective tool that would allow the users to quickly find potential answers and solutions through a web app user interface.
As we already had some positive experience with Knowledge Mining and Azure services, the answer was clear from the very start. We used the services I described earlier to create an intelligent search service.
First, we implemented Azure Cognitive Search, combined with Azure Cognitive Services. To meet the project requirements, we also added Form Recognizer, which locates the different parts of the forms.
Using Azure Functions, we developed custom skills. We also used Azure Cosmos DB to keep the additional configuration values.
Finally, we uploaded all source files to Azure Blob Storage, which provides a single source of data for the search engine.
Below you’ll find the architecture diagram.
If you would like to find out more about the solution we implemented, then keep an eye on our blog. Soon, there will be a dedicated article where I will provide more details.
Creating a Knowledge Mining solution certainly poses some challenges. It may be especially tricky if there are a lot of documents in many different formats.
This is why I recommend using the Azure cloud when implementing the solution. With the help of Azure Cognitive Search and Microsoft Cognitive Services, you can build the required service faster and more efficiently.
Begin with the three process stages I outlined at the beginning of this article. Then, add custom skills and required cloud services to process and analyze your data. With a single solution, you can fight off inefficient work practices for good.