
functionappcustomskillcogsearch

Cognitive Search Index Setup

This repo gives you two search indexes. The first contains each entire document. The second contains chunks of each document, plus a vector field.
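The chunking strategy the repo actually uses is not described here. The sketch below shows one simple way a document could be turned into chunk records for the second index. It assumes fixed-size character windows with overlap and uses the hypothetical field names `id`, `parent_id` and `chunk`. The vector field would be filled in separately with embeddings.

```python
from typing import Dict, List


def chunk_document(doc_id: str, text: str, size: int = 1000, overlap: int = 100) -> List[Dict[str, str]]:
    """Split text into overlapping character windows.

    size, overlap and the field names are illustrative assumptions,
    not the repo's actual schema.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    records = []
    for i, start in enumerate(range(0, max(len(text), 1), step)):
        piece = text[start:start + size]
        if not piece:
            break  # empty document: no chunks
        records.append({
            "id": f"{doc_id}-{i}",   # hypothetical chunk key
            "parent_id": doc_id,     # links the chunk back to the full document
            "chunk": piece,
        })
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return records
```

Each window repeats the last `overlap` characters of the previous window, so a sentence that straddles a boundary still appears whole in at least one chunk.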

The idea is to use BM25 to search across entire documents. A developer can then restrict the chunk search to the documents that matched best in the BM25 index.
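The sketch below shows why BM25 over whole documents makes a useful first pass. It implements the standard Okapi BM25 scoring function with the common defaults k1 = 1.2 and b = 0.75 and simple whitespace tokenization. It is illustrative only. Azure Cognitive Search computes BM25 internally, and its analyzers and parameters differ. The `b` term normalizes scores by document length.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score each document against the query with Okapi BM25 (whitespace tokenization)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0
    # Document frequency: how many documents contain each term
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        dl = len(tokens)
        score = 0.0
        for term in set(query.lower().split()):
            if tf[term] == 0:
                continue
            # BM25 IDF with +1 inside the log so it never goes negative
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # b controls how strongly long documents are penalized
            norm = k1 * (1 - b + b * dl / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores
```

A single occurrence of a query term counts for more in a short document than in a long one. Documents without the term score zero.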

Chunking is cool, but cosine similarity will only take you so far.

BM25 takes the length of the entire document into account. Use the first index to find the right documents. Then run a hybrid search on the second index, filtered to the documents returned by that initial search.
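The key step in the second stage is the filter. The sketch below builds the OData filter that limits the chunk search to the documents returned by the BM25 query, using Azure Cognitive Search's `search.in` function. The field name `parent_id` is a hypothetical placeholder for whichever field links a chunk to its source document.

```python
from typing import Iterable


def build_chunk_filter(doc_ids: Iterable[str], field: str = "parent_id") -> str:
    """Return an OData filter matching chunks whose parent is one of doc_ids.

    `field` is a hypothetical name for the chunk-to-document link field.
    """
    ids = [d for d in doc_ids if d]
    if not ids:
        raise ValueError("stage one returned no documents to filter on")
    for d in ids:
        # search.in splits the list on the delimiter, so ids must not contain it
        if "," in d:
            raise ValueError(f"document id contains the delimiter: {d!r}")
    # Single quotes inside an OData string literal are escaped by doubling them
    joined = ",".join(ids).replace("'", "''")
    return f"search.in({field}, '{joined}', ',')"
```

The resulting string would be passed as the `filter` argument of the hybrid query against the "-vector" index. In the azure-search-documents SDK, that is `SearchClient.search(search_text=..., filter=...)`.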

The first index uses the name set by "COG_SEARCH_INDEX" in your .env file. The second index uses the same name with "-vector" appended.
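The naming rule can be expressed in a few lines. This sketch assumes the .env values have already been loaded into the environment, for example with python-dotenv's `load_dotenv()`.

```python
import os
from typing import Mapping, Tuple


def get_index_names(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (full-document index, chunk/vector index) names from COG_SEARCH_INDEX."""
    base = env.get("COG_SEARCH_INDEX")
    if not base:
        raise KeyError("COG_SEARCH_INDEX is not set; check your .env file")
    return base, f"{base}-vector"
```

With `COG_SEARCH_INDEX="myindex"`, this returns `("myindex", "myindex-vector")`.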

Azure Cognitive Search indexers handle inserts and updates to your index, and now handle deletes as well (in preview).

The following code will handle:

  1. Inserts/Updates to your index.
  2. Deletes: when a file is deleted from blob storage, an additional function is required to delete its chunks from the second index. This is not yet implemented. It will be a function, triggered when a blob is deleted, that removes the item from the second index.
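The delete function is not implemented in this repo yet. The hypothetical sketch below shows what it would need to send. The Azure Cognitive Search REST API (`POST .../indexes/<name>/docs/index`) accepts a batch of documents, each tagged with `"@search.action": "delete"` and its key. The key field name `id` is an assumption. So is how the chunk keys are found, for example by a query on the second index filtered to the deleted parent document.

```python
import json
from typing import Iterable


def build_delete_batch(chunk_keys: Iterable[str], key_field: str = "id") -> str:
    """Build the JSON body for a docs/index request that deletes the given chunks.

    key_field is a hypothetical name; use the key field of your "-vector" index.
    """
    actions = [{"@search.action": "delete", key_field: key} for key in chunk_keys]
    return json.dumps({"value": actions})
```

With the azure-search-documents SDK, `SearchClient.delete_documents(documents=[...])` sends the same kind of batch from a list of key dictionaries.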

Cog Search Index Configuration

This repo holds 2 items.

  1. An Azure Function to be leveraged as a custom skill for Cog Search
  2. A notebook for configuring your Azure Cognitive Search index.
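Cognitive Search calls a custom skill with a JSON body of the form `{"values": [{"recordId": ..., "data": {...}}]}`. It expects a `values` array back with the same `recordId`s, each carrying `data`, `errors` and `warnings`. The sketch below shows that request/response mapping. It assumes an input field named `text`, an output field named `embedding`, and an `embed` callable standing in for the real Azure OpenAI call. All three are hypothetical.

```python
from typing import Any, Callable, Dict, List


def process_skill_request(body: Dict[str, Any],
                          embed: Callable[[str], List[float]]) -> Dict[str, Any]:
    """Map a custom-skill request body to a response body, one record at a time."""
    results = []
    for record in body.get("values", []):
        out = {"recordId": record["recordId"], "data": {}, "errors": [], "warnings": []}
        try:
            text = record["data"]["text"]            # hypothetical input name
            out["data"]["embedding"] = embed(text)   # hypothetical output name
        except Exception as exc:
            # Report the failure on this record instead of failing the whole batch
            out["errors"].append({"message": str(exc)})
        results.append(out)
    return {"values": results}
```

In the Function App, an HTTP trigger would parse the request JSON, call a function like this, and return the result as JSON.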

After cloning the repo, create a .env file in the notebook's directory. It holds the configuration parameters.

To run the notebook, install the python-dotenv library:

```
pip install python-dotenv
```

The .env file should contain the following items:

```
AZURE_OPENAI_ENDPOINT="https://xmm.openai.azure.com/"
AZURE_OPENAI_KEY="XXXXXXXX"
TEXT_EMBEDDING_ENGINE="text-embedding-ada-002"
COG_SEARCH_RESOURCE="mmx-cog-search"
COG_SEARCH_KEY="YYYYYYY"
COG_SEARCH_INDEX="myindex"

#function app configuration
STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=mmxcogstorage;AccountKey=XXXXXXXX;EndpointSuffix=core.windows.net"
STORAGE_ACCOUNT="mmxcogstorage"
STORAGE_CONTAINER="test"
STORAGE_KEY="XXXXU2EA=="
COG_SERVICE_KEY="786XXXXXf06f"
DEBUG="1"
functionAppUrlAndKey="https://funcationapp.azurewebsites.net/api/HttpTrigger1?code=XXXXXXXX"
```
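A quick check that the .env file is complete can save a failed notebook run. The sketch below lists the keys above that are unset or empty. It treats every listed key as required, which is an assumption. It loads .env with python-dotenv's `load_dotenv()` when that package is installed.

```python
import os
from typing import Iterable, List, Mapping

REQUIRED_SETTINGS = [
    "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_KEY", "TEXT_EMBEDDING_ENGINE",
    "COG_SEARCH_RESOURCE", "COG_SEARCH_KEY", "COG_SEARCH_INDEX",
    "STORAGE_CONNECTION_STRING", "STORAGE_ACCOUNT", "STORAGE_CONTAINER",
    "STORAGE_KEY", "COG_SERVICE_KEY", "DEBUG", "functionAppUrlAndKey",
]


def missing_settings(env: Mapping[str, str] = os.environ,
                     required: Iterable[str] = REQUIRED_SETTINGS) -> List[str]:
    """Return the required .env keys that are unset or empty."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    try:
        from dotenv import load_dotenv  # installed by `pip install python-dotenv`
        load_dotenv()  # load variables from a .env file into os.environ
    except ImportError:
        pass  # fall back to whatever is already in the environment
    missing = missing_settings()
    print("Missing settings:", missing or "none")
```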

Deploy Instructions

  1. Create a storage account if you don't have one already. This will be the storage account that Cognitive Search will target with the indexer.

  2. In Azure Cognitive Search, enable semantic search (Semantic Config).

  3. Deploy Azure AI services in the SAME region as Azure Cog Search.

  4. Deploy an Azure Function App (Python, Linux) in the SAME region as Azure Cog Search.

  5. In the VS Code editor, right-click "HttpTrigger1" and select Deploy.

  6. Configure your Azure Function App so it can connect to the other services:

    • Restart your function app.
    • On the Overview page, you will see the function "HttpTrigger1". Click it and go to "Code + Test".
    • Click "Get function URL", then copy and save the URL. You will need it to configure Cognitive Search.

    Example: https://mygithubfuncryh.azurewebsites.net/api/Embeddings?code=6zKqoduc-ezFurl==
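The function URL goes into the `uri` of a custom Web API skill in your skillset. The sketch below builds such a skill definition as a Python dict. It assumes the function takes a `text` input from `/document/content` and returns an `embedding` output at the `/document` context. The skill name, input/output names, context and batch size are hypothetical. Match them to the notebook's actual skillset.

```python
from typing import Any, Dict


def web_api_skill(function_url: str) -> Dict[str, Any]:
    """Build a custom WebApiSkill definition that points Cognitive Search at the Function App."""
    return {
        "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
        "name": "embeddings",        # hypothetical skill name
        "uri": function_url,         # the URL copied from "Get function URL"
        "httpMethod": "POST",
        "timeout": "PT230S",         # 230 seconds is the maximum the service allows
        "context": "/document",
        "batchSize": 1,              # hypothetical; one document per call
        "inputs": [{"name": "text", "source": "/document/content"}],
        "outputs": [{"name": "embedding", "targetName": "embedding"}],
    }
```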

Example Function App application settings, in the format of a local.settings.json file:

```json
{
  "IsEncrypted": false,
  "Values": {
    "API_BASE": "https://mmx-openai.openai.azure.com/",
    "API_KEY": "XXXXX",
    "API_TYPE": "azure",
    "API_VERSION": "2022-12-01",
    "APPINSIGHTS_INSTRUMENTATIONKEY": "YYYYY",
    "AzureWebJobsFeatureFlags": "EnableWorkerIndexing",
    "AzureWebJobsStorage": "DefaultEndpointsProtocol=https;AccountName=XXXXX;AccountKey=XXXXX;EndpointSuffix=core.windows.net",
    "COG_SEARCH_KEY": "XXXXX",
    "FUNCTIONS_EXTENSION_VERSION": "~4",
    "FUNCTIONS_REQUEST_BODY_SIZE_LIMIT": "360000000",
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "INDEX_NAME": "ithelpdeskv6-vector",
    "SERVICE_ENDPOINT": "https://mmx-cog.search.windows.net",
    "TEXT_EMBEDDING_MODEL": "text-embedding-ada-002",
    "STORAGE_ACCOUNT": "storage",
    "STORAGE_ACCOUNT_CONTAINER": "container"
  },
  "ConnectionStrings": {}
}
```
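The Azure CLI command `az functionapp config appsettings set` can push these values to the deployed Function App instead of entering them in the portal. The sketch below builds that command from a local.settings.json file. The app and resource-group names are placeholders. Use `skip` for any settings the Function App already manages itself.

```python
import json
from typing import Iterable, List


def appsettings_command(settings_json: str, app_name: str, resource_group: str,
                        skip: Iterable[str] = ()) -> List[str]:
    """Turn the "Values" section of local.settings.json into an az CLI argument list."""
    values = json.loads(settings_json)["Values"]
    skipped = set(skip)
    pairs = [f"{key}={value}" for key, value in values.items() if key not in skipped]
    return ["az", "functionapp", "config", "appsettings", "set",
            "--name", app_name, "--resource-group", resource_group,
            "--settings", *pairs]
```

Passing the returned list to `subprocess.run(cmd, check=True)` avoids shell-quoting problems with connection strings that contain `;` and `=`.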

Contributors

memasanz
