Azure Search Power Skills
Power Skills are a collection of useful functions to be deployed as custom skills for Azure Cognitive Search. The skills can be used as templates or starting points for your own custom skills, or they can be deployed and used as they are if they happen to meet your requirements. We also invite you to contribute your own work by submitting a pull request.
Skills
This project provides the following custom skills:
Skill | Description | Type | Language | Environment | Deployment |
---|---|---|---|---|---|
GeoPointFromName | Retrieves coordinates from place names and addresses. | Geography | | | ARM Template |
AcronymLinker | Provides definitions for known acronyms. | Text | | | ARM Template |
Anonymizer | Uses Presidio to analyze and anonymize PII entities. | Text | | | Manual |
BingEntitySearch | Finds rich, structured information about public figures, locations, or organizations. | Text | | | ARM Template |
CustomEntityLookup | Finds custom entity names in text. This is a custom skill implementation of the custom entity lookup skill; consider using the built-in cognitive skill instead. | Text | | | ARM Template |
CustomNER | Extracts your custom entities, using Natural Language Processing with Text Analytics Custom NER. | Text | | | ARM Template |
CustomTextClassifier | Applies your custom text classification, using Natural Language Processing with Text Analytics Custom Text Classification. | Text | | | ARM Template |
Distinct | De-duplicates a list of terms. | Text | | | ARM Template |
Summarizer | Uses a Hugging Face/Facebook BART model (BART-Large-CNN) to summarize text. | Text | | | Manual |
TextAnalyticsForHealth | A wrapper for the Text Analytics for Health API. | Text | | | ARM Template |
TextQualityWatchdog | Uses a pretrained language model to detect low-quality text extracted during document cracking. | Text | | | Manual |
Tokenizer | Extracts non-stop words from a text. | Text | | | |
AbbyyOCR | Uses ABBYY Cloud OCR to extract text from images. | Vision | | | ARM Template |
FormRecognizer | Uses Form Recognizer to analyze a document. Supported model types: Layout, Invoice, Receipt, ID, Business Card, General key-value pairs, and Custom Form. | Vision | | | Manual |
AutoMLVisionClassifier | Gets your latest Data Labelling AML AutoML Vision model and runs inference on it. | Vision | | | Manual |
CustomVision | Classifies documents using Custom Vision models. | Vision | | | ARM Template |
HocrGenerator | Transforms the result of OCR into the hOCR format. | Vision | | | ARM Template |
ImageClustering | Uses clustering to automatically group and label images. | Vision | | | Manual |
ImageSegmentation | Breaks down a full image or PDF page into sub-images and uploads them to Azure Blob Storage. | Vision | | | Manual |
ImageSimilarity | Uses ResNet to find the top-n most similar images. | Vision | | | Manual |
P&ID Parser | Extracts equipment tags and text blocks from piping and instrumentation diagrams. | Vision | | | Manual |
DecryptBlobFile | Downloads, decrypts, and returns a file that was previously encrypted and stored in Azure Blob Storage. | Utility | | | ARM Template |
GetFileExtension | Returns the filename and extension as separate values, allowing you to filter on document type. | Utility | | | ARM Template |
ImageStore | Stores and fetches base64-encoded images to and from blob storage. The knowledge store is a cleaner implementation of the pattern for saving images to storage. | Utility | | | ARM Template |
Open AI Embeddings | Generates vector embeddings through the Azure OpenAI service. | Vector | | | Manual |
HelloWorld | A minimal skill that can be used as a starting point or template for your own skills. | Template | | | ARM Template |
PythonFastAPI | A production web server and API scaffold for a Python power skill. | Template | | | Terraform template |
Getting Started
Prerequisites
To use the functions in this project, you'll need an active Azure subscription. Most of the functions can be used on their own for quick evaluation and experimentation, but they are meant to be used as part of an Azure Cognitive Search pipeline. Each function may also have its own specific requirements, such as API keys for the services it relies on.
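To illustrate what using a function as part of a pipeline can look like, here is a minimal Python sketch that builds a custom Web API skill definition for a skillset. The URI placeholders, the `text` input, and the `result` output are hypothetical; check the readme of the skill you deploy for its actual input and output names.

```python
import json


def web_api_skill(uri, inputs, outputs, context="/document"):
    """Build a custom Web API skill definition to place in a skillset.

    `inputs` maps skill input names to enrichment-tree sources;
    `outputs` maps skill output names to target names in the tree.
    """
    return {
        "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
        "uri": uri,
        "httpMethod": "POST",
        "context": context,
        "inputs": [{"name": n, "source": s} for n, s in inputs.items()],
        "outputs": [{"name": n, "targetName": t} for n, t in outputs.items()],
    }


# Hypothetical values: replace the placeholders with your own deployment.
skill = web_api_skill(
    uri="https://<your-function-app>.azurewebsites.net/api/<function-name>?code=<function-key>",
    inputs={"text": "/document/content"},
    outputs={"result": "result"},
)
print(json.dumps(skill, indent=2))
```

The resulting JSON object would go in the `skills` array of a skillset definition.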
Visual Studio 2019 is recommended, but not required. You need a recent version of the C# compiler. Postman is highly recommended as a way to experiment and test skills.
Installation and deployment
If you use Visual Studio with the Azure workload installed, no installation is required: the functions can be run locally by pressing F5.
Deployment of a function to Azure can be done through Visual Studio, the Deploy to Azure button, or continuous deployment.
Some functions may require setting environment variables or configuration entries. Please refer to the readme file in the function's directory.
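Azure Functions exposes application settings to the function as environment variables. For a Python skill, reading such a setting could look like the sketch below. The helper and the setting names (`EXAMPLE_API_KEY`, `EXAMPLE_REGION`) are hypothetical and not taken from any skill in this project.

```python
import os


class MissingSettingError(RuntimeError):
    """Raised when a required configuration entry is not set."""


def get_setting(name, default=None, environ=os.environ):
    """Return a configuration value from the environment.

    Falls back to `default` when the variable is unset or empty, and
    raises MissingSettingError when there is no value at all.
    """
    value = environ.get(name) or default
    if value is None:
        raise MissingSettingError(
            f"Missing setting '{name}': define it as an application setting "
            "or environment variable."
        )
    return value


# Example with an explicit mapping standing in for the real environment.
fake_env = {"EXAMPLE_API_KEY": "abc123"}
api_key = get_setting("EXAMPLE_API_KEY", environ=fake_env)
region = get_setting("EXAMPLE_REGION", default="westus", environ=fake_env)
```

Failing early with a clear message makes a missing setting easier to diagnose than an error from the downstream service.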
Quickstart
- Clone the repository
- Open the PowerSkills solution in Visual Studio
- Set the project for the function to test as the startup project
- Hit F5
- Experiment with calling the function using Postman
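If you prefer a script to Postman, the last step can be sketched in Python as follows. Custom skills receive a JSON body with a `values` array of records, each with a `recordId` and a `data` object. The local URL, the `hello` route, and the `name` input field are assumptions; use the route and inputs of the function you are testing.

```python
import json
import urllib.request


def build_request(records):
    """Wrap a list of input dictionaries in the custom skill request format."""
    return {
        "values": [
            {"recordId": str(index), "data": data}
            for index, data in enumerate(records)
        ]
    }


def call_skill(url, payload, timeout=30):
    """POST the payload to a running function and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical input for a skill that takes a "name" field.
payload = build_request([{"name": "World"}])

# With the function running locally, a call could look like this
# (the port and route are assumptions):
# response = call_skill("http://localhost:7071/api/hello", payload)
```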
You can also create your own skills using our Hello World template skill as a starting point or, if you are using Python, our FastAPI template skill.
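As a rough idea of what a skill has to do, the sketch below processes each incoming record and returns one output record with the same `recordId`, reporting failures per record instead of failing the whole batch. It is an illustration in plain Python, not the code of either template; `run_skill`, `hello`, and the `name` and `greeting` fields are hypothetical.

```python
def run_skill(request_body, process_record):
    """Apply `process_record` to each input record and build the skill response."""
    results = []
    for record in request_body.get("values", []):
        output = {
            "recordId": record.get("recordId"),
            "data": {},
            "errors": [],
            "warnings": [],
        }
        try:
            output["data"] = process_record(record.get("data", {}))
        except Exception as exc:
            # Report the failure on this record only; other records still succeed.
            output["errors"].append({"message": str(exc)})
        results.append(output)
    return {"values": results}


def hello(data):
    """Hypothetical record processor: greets the name found in the input."""
    return {"greeting": f"Hello, {data['name']}"}


response = run_skill(
    {
        "values": [
            {"recordId": "r1", "data": {"name": "World"}},
            {"recordId": "r2", "data": {}},  # missing "name" yields an error entry
        ]
    },
    hello,
)
```

Keeping the per-record logic in a separate function makes it easy to test without a web server.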
Up for grabs
Here are a few suggestions of simple contributions to get you started:
- Improve documentation: adding sample code and writing better documentation are great ways to improve your understanding of existing code and to help others do the same.
- Configuration: some skills can be configured through application settings and environment variables. Others still have hard-coded configuration values that could be moved out of the code to make them easier to change.
- For skills that rely on an external Azure resource (such as Bing Entity Search), improve the deployment file so it gives the user the option to create and configure that service automatically.