azuredataretrievalaugmentedgenerationsamples's Introduction

Samples for Retrieval-Augmented LLMs with Azure Data

This repo contains code samples and links to help you get started with retrieval-augmented generation (RAG) on Azure. The samples follow a RAG pattern that includes the following steps:

  1. Add sample data to an Azure database product
  2. Create embeddings from the sample data using an Azure OpenAI Embeddings model
  3. Link the Azure database product to Azure Cognitive Search (for databases without native vector indexing)
  4. Create a vector index on the embeddings
  5. Perform vector similarity search
  6. Perform question answering over the sample data using an Azure OpenAI Completions model
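
The steps above can be sketched end to end with an in-memory stand-in. This is a minimal illustration under stated assumptions, not the code used in the samples: `toy_embed` is a hypothetical placeholder for an Azure OpenAI Embeddings call, the list returned by `build_index` stands in for a vector index, and `build_prompt` only assembles the text that would be sent to a Completions model.

```python
import math
from collections import Counter

# Hypothetical vocabulary for the toy embedding; a real model returns dense vectors.
VOCAB = ["vector", "index", "search", "database", "redis", "postgres"]

def toy_embed(text):
    """Stand-in for an embeddings model: word counts over VOCAB."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in VOCAB]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def build_index(docs):
    """Steps 2 and 4: embed each document and keep (text, vector) pairs."""
    return [(doc, toy_embed(doc)) for doc in docs]

def search(index, query, k=1):
    """Step 5: rank documents by cosine similarity to the query embedding."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(question, context_docs):
    """Step 6 (input side only): ground the question in the retrieved text."""
    context = "\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = ["redis supports vector search", "postgres is a relational database"]
index = build_index(docs)
top = search(index, "which database is relational", k=1)
prompt = build_prompt("which database is relational", top)
```

In the actual samples, the embedding and completion calls go to Azure OpenAI, and the index lives in the database product or in Azure Cognitive Search.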

Resources and Coverage

The table below provides high-level guidance. Follow the links to the relevant resources.

| Azure data product | Native vector indexing OR Azure Cognitive Search (ACS) | Guidance: repo, blog or docs |
| --- | --- | --- |
| Azure Database for PostgreSQL | Native | Sample in this repo (Python) |
| CosmosDB - MongoDB vCore | Native | Docs, Blog, Repo, Sample in this repo (C#, Python) |
| Azure Cache for Redis | Native | Sample in this repo (Python) |
| CosmosDB - PostgreSQL | ACS | Sample in this repo (Python) |
| CosmosDB - MongoDB | ACS | Sample in this repo (C#, Python) |
| CosmosDB - NoSQL | ACS | Sample in this repo (C#, Python), Repo |
| AzureSQL | ACS | Sample in this repo (Python) |
| Fabric OneLake | ACS | Fabric Notebook |

Responsible AI

Microsoft is committed to the advancement of AI driven by ethical principles.

  • Learn more about responsible use of Azure OpenAI and LLMs here.
  • Learn more about responsible AI at Microsoft here.

Maintainer

As the maintainer of this project, please make a few updates:

  • Improve this README.MD file to provide a great experience
  • Update SUPPORT.MD with content about this project's support experience
  • Understand the security reporting process in SECURITY.MD
  • Remove this section from the README

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

azuredataretrievalaugmentedgenerationsamples's People

Contributors

abhinavtrips, aydan-at-microsoft, claudiopadilha, davidedelvecchio, enteli, hosseinheris, jcodella, jdubeau, journeyman-msft, markjbrown, microsoftopensource, sdk-ai, theovankraay

azuredataretrievalaugmentedgenerationsamples's Issues

Azure Cosmos DB create IVF index error cosmosSearchOptions code 197 InvalidIndexSpecificationOption

I am following this tutorial about taking advantage of the vector similarity search functionality in Azure Cosmos DB for MongoDB vCore. To do so, I created a Cosmos DB resource using "Try Azure Cosmos DB" with a resource group located in East US.

I connected to the database using this connection string:

import urllib.parse  # import the submodule explicitly so urllib.parse.quote is available
import pymongo

COSMOS_MONGO_USER = 'cosmosrgeastus3xxxxxxxxxxxxxxxxxxxxxb'
COSMOS_MONGO_PWD = 'zxxxxxxxxxxxxxxxxxxxxxxxxxxxxx='
COSMOS_MONGO_SERVER = 'cosmosrgeastus318282c5-ac03-48af-82f4db.mongo.cosmos.azure.com'
COSMOS_MONGO_PORT = '10255'

mongo_conn = "mongodb://"+urllib.parse.quote(COSMOS_MONGO_USER)+":"+urllib.parse.quote(COSMOS_MONGO_PWD)+"@"+COSMOS_MONGO_SERVER+':'+COSMOS_MONGO_PORT+"?ssl=true&replicaSet=globaldb&retrywrites=false&maxIdleTimeMS=120000&appName=@cosmosrgeastus318282c5-ac03-48af-82f4db@"

mongo_client = pymongo.MongoClient(mongo_conn)

Despite a warning ("You appear to be connected to a CosmosDB cluster"), the client seems to be created successfully.

Note:
According to the tutorial, the connection string is supposed to be

mongo_conn = "mongodb+srv://"+urllib.parse.quote(COSMOS_MONGO_USER)+":"+urllib.parse.quote(COSMOS_MONGO_PWD)+"@"+COSMOS_MONGO_SERVER+"?tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000"

However, using that raises an exception "ConfigurationError: The DNS query name does not exist: _mongodb._tcp.cosmosrgeastus318282c5-ac03-48af-82f4db.mongo.cosmos.azure.com."
That is why I changed it to the actual connection string provided by the Azure CosmosDB resource alongside the user, password and server values.
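
A `mongodb+srv://` URI relies on a DNS SRV lookup, and the `_mongodb._tcp.` prefix in the exception above is that lookup failing. The hostnames in these reports come in two shapes: `...mongo.cosmos.azure.com` with port 10255 here, and `...mongocluster.cosmos.azure.com` in the vCore reports further down. The sketch below classifies a host by suffix. The mapping of suffix to account type is an assumption drawn from those hostname shapes, and `classify_cosmos_mongo_host` is a hypothetical helper, not part of pymongo or the tutorial.

```python
def classify_cosmos_mongo_host(host):
    """Guess the Cosmos DB for MongoDB flavor from the hostname suffix.

    The suffix-to-flavor mapping is an assumption, not an official API.
    """
    host = host.strip().rstrip("/").lower()
    if host.endswith(".mongocluster.cosmos.azure.com"):
        return "vcore"    # usually reached with a mongodb+srv:// URI
    if host.endswith(".mongo.cosmos.azure.com"):
        return "ru"       # usually reached with mongodb://host:10255
    return "unknown"

kind = classify_cosmos_mongo_host(
    "cosmosrgeastus318282c5-ac03-48af-82f4db.mongo.cosmos.azure.com"
)
```

If the account created through "Try Azure Cosmos DB" is not a vCore cluster, that may matter for the rest of this report, because the tutorial targets vCore. This is an inference from the hostname only and is not confirmed in the issue.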

Then I created a database and a collection

# create a database called TutorialDB
db = mongo_client['TutorialDB']

# Create collection if it doesn't exist
COLLECTION_NAME = "CarrierManualCollection"

collection = db[COLLECTION_NAME]

if COLLECTION_NAME not in db.list_collection_names():
    # Creates an unsharded collection that uses the DB's shared throughput
    db.create_collection(COLLECTION_NAME)
    print("Created collection '{}'.\n".format(COLLECTION_NAME))
else:
    print("Using collection: '{}'.\n".format(COLLECTION_NAME))

As expected, this prints Created collection 'CarrierManualCollection'.

Then I tried to create an IVF index, since "IVF is supported on all cluster tiers, including the free tier".

db.command({
  'createIndexes': COLLECTION_NAME,
  'indexes': [
    {
      'name': 'VectorSearchIndex',
      'key': {
        "contentVector": "cosmosSearch"
      },
      'cosmosSearchOptions': {
        'kind': 'vector-ivf',
        'numLists': 1,
        'similarity': 'COS',
        'dimensions': 1536
      }
    }
  ]
})

But I got this error message:

OperationFailure: cosmosSearchOptions, full error: {'ok': 0.0, 'errmsg': 'cosmosSearchOptions', 'code': 197, 'codeName': 'InvalidIndexSpecificationOption'}

The expected behavior is to get a success message that allows me to continue with the tutorial adding data to the collection.

What am I missing?
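
For experimenting with the index options, the command document above can be produced by a small function, so that the field name, dimensions, and list count are varied in one place. This is a sketch that only rebuilds the dictionary shown in this report; `build_ivf_index_command` is a hypothetical helper name, and it does not change how the server validates `cosmosSearchOptions`.

```python
def build_ivf_index_command(collection_name, vector_field, dimensions,
                            num_lists=1, similarity="COS",
                            index_name="VectorSearchIndex"):
    """Return a createIndexes command document for a vector-ivf index."""
    if dimensions <= 0 or num_lists <= 0:
        raise ValueError("dimensions and num_lists must be positive")
    return {
        "createIndexes": collection_name,
        "indexes": [
            {
                "name": index_name,
                "key": {vector_field: "cosmosSearch"},
                "cosmosSearchOptions": {
                    "kind": "vector-ivf",
                    "numLists": num_lists,
                    "similarity": similarity,
                    "dimensions": dimensions,
                },
            }
        ],
    }

command = build_ivf_index_command("CarrierManualCollection", "contentVector", 1536)
# With a live connection, the call would be: db.command(command)
```

When the call fails, the `pymongo.errors.OperationFailure` exception carries the server's error code in its `code` attribute (197 in this report), which is easier to branch on than the message text.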

Postgres notebook rerun keeps hanging

The Postgres notebook has a rewrite issue. On my side, the notebook cell keeps running. I discussed this with Hossein: a different but related issue occurs on his side, where data keeps appending.
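
For the appending symptom, one common approach is to make the load step idempotent, so that rerunning a cell inserts nothing new. The sketch below shows the idea with `ON CONFLICT ... DO NOTHING`. It makes several assumptions: the table `food_reviews` and its columns are hypothetical, SQLite is used as a local stand-in so the example runs without a server, and it does not address the hanging cell, whose cause is not stated in this issue.

```python
import sqlite3

def load_rows(conn, rows):
    """Insert rows keyed by id; rows whose id already exists are skipped."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS food_reviews ("
        "id INTEGER PRIMARY KEY, review TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO food_reviews (id, review) VALUES (?, ?) "
        "ON CONFLICT (id) DO NOTHING",
        rows,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM food_reviews").fetchone()[0]

conn = sqlite3.connect(":memory:")
rows = [(1, "great coffee"), (2, "stale bread")]
first = load_rows(conn, rows)
second = load_rows(conn, rows)  # simulates rerunning the notebook cell
```

PostgreSQL accepts the same `ON CONFLICT (id) DO NOTHING` clause; with a driver such as psycopg2, the placeholders would be `%s` instead of `?`.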

Timeout error doing db.list_collection_names() in CosmosDB-MongoDB-vCore_AzureOpenAI_Tutorial.ipynb

Hi, I'm trying the tutorial notebook for CosmosDB-MongoDB-vCore.
I have no problems connecting with:

mongo_conn = "mongodb+srv://"+urllib.parse.quote(COSMOS_MONGO_USER)+":"+urllib.parse.quote(COSMOS_MONGO_PWD)+"@"+COSMOS_MONGO_SERVER+"?tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000"
mongo_client = pymongo.MongoClient(mongo_conn)

But then when creating the DB and listing collection names with:

db = mongo_client['ExampleDB']
COLLECTION_NAME = "ExampleCollection"
collection = db[COLLECTION_NAME]
if COLLECTION_NAME not in db.list_collection_names():
    db.create_collection(COLLECTION_NAME)

the call of:

db.list_collection_names()

doesn't go through and returns the error:

ServerSelectionTimeoutError: c.cosmos-db-openai-explore.mongocluster.cosmos.azure.com:10260: timed out (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30s, Topology Description: <TopologyDescription id: 6602e5685fa5c332a820d706, topology_type: Unknown, servers: [<ServerDescription ('c.cosmos-db-openai-explore.mongocluster.cosmos.azure.com', 10260) server_type: Unknown, rtt: None, error=NetworkTimeout('c.cosmos-db-openai-explore.mongocluster.cosmos.azure.com:10260: timed out (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>

Any advice? Thanks in advance!
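
`pymongo.MongoClient(...)` returns immediately and connects in the background, so the constructor succeeding does not show that the server is reachable; the first real operation, here `db.list_collection_names()`, is where a network problem surfaces. A quick way to separate network reachability from driver configuration is a plain TCP check against the host and port named in the error. This is a minimal sketch; `can_reach` is a hypothetical helper. A failed check would point toward something between the client and the cluster, such as a firewall rule, though that is an assumption the error message alone does not confirm.

```python
import socket

def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False

# Example, using the hostname and port from the error message above:
# can_reach("c.cosmos-db-openai-explore.mongocluster.cosmos.azure.com", 10260)
```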

Action required: migrate or opt-out of migration to GitHub inside Microsoft

Migrate non-Open Source or non-External Collaboration repositories to GitHub inside Microsoft

In order to protect and secure Microsoft, private or internal repositories in GitHub for Open Source which are not related to open source projects or require collaboration with 3rd parties (customer, partners, etc.) must be migrated to GitHub inside Microsoft a.k.a GitHub Enterprise Cloud with Enterprise Managed User (GHEC EMU).

Action

✍️ Please RSVP to opt-in or opt-out of the migration to GitHub inside Microsoft.

❗Only users with admin permission in the repository are allowed to respond. Failure to provide a response will result in your repository being automatically archived.🔒

Instructions

Reply with a comment on this issue containing one of the following optin or optout commands.

✅ Opt-in to migrate

@gimsvc optin --date <target_migration_date in mm-dd-yyyy format>

Example: @gimsvc optin --date 03-15-2023

OR

❌ Opt-out of migration

@gimsvc optout --reason <staging|collaboration|delete|other>

Example: @gimsvc optout --reason staging

Options:

  • staging : This repository will ship as Open Source or go public
  • collaboration : Used for external or 3rd party collaboration with customers, partners, suppliers, etc.
  • delete : This repository will be deleted because it is no longer needed.
  • other : Other reasons not specified

Unable to connect to Azure Cosmos DB for MongoDB vCore using pymongo

I'm trying to set up the connection to my MongoDB vCore cluster using pymongo. Here is the code I used:

mongo_conn = "mongodb+srv://"+COSMOS_MONGO_USER+":"+COSMOS_MONGO_PWD+"@"+COSMOS_MONGO_SERVER+"?tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000"

mongo_client = pymongo.MongoClient(mongo_conn)

This is the error I received:

ConfigurationError: nameserver is not a dns.nameserver.Nameserver instance or text form, IP address, nor a valid https URL

The COSMOS_MONGO_SERVER value was in this format:

sample-db.mongocluster.cosmos.azure.com/

Environment:

Python: 3.11.5

Pymongo : 4.5
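
Two details in the snippet above are worth ruling out: the credentials are concatenated without percent-escaping, and the server value ends with a trailing slash. The sketch below builds the same URI with both normalized. `build_vcore_uri` is a hypothetical helper, and the sample credentials are made up. The reported `ConfigurationError` mentions a nameserver, which suggests the DNS resolver configuration in the environment may be involved, so this sketch may not address the root cause.

```python
from urllib.parse import quote_plus

def build_vcore_uri(user, password, server):
    """Build a mongodb+srv URI, escaping credentials and normalizing the host."""
    host = server.strip()
    # Drop a scheme if the value was pasted with one.
    for prefix in ("mongodb+srv://", "mongodb://"):
        if host.startswith(prefix):
            host = host[len(prefix):]
    host = host.rstrip("/")
    options = ("tls=true&authMechanism=SCRAM-SHA-256"
               "&retrywrites=false&maxIdleTimeMS=120000")
    return f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}/?{options}"

uri = build_vcore_uri("demo_user", "p@ss/word",
                      "sample-db.mongocluster.cosmos.azure.com/")
```

The resulting string can be passed to `pymongo.MongoClient(uri)` in place of the hand-built one.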
