
Comments (5)

midneet commented on May 22, 2024

I've tried some examples of building an index with metadata using IndexBuild. I used simple word2vec data, so each line of my input looks like:
apple\t0.229233|0.21099|0.108552|0.135154|-0.045957|....|-0.040005|0.1802|0.103172|-0.202125|-0.135632|-0.057288
If you then test a query with IndexSearch, it returns one line per query listing the most similar (shortest-distance) entries in the index, in the format "distance@metadata" separated by "|", so it looks like:
apple:0.00@apple|0.001@Apple|0.05@apple pen|....|0.5@fruit|
So I think the metadata can be anything you want to use to annotate the vectors you input. In my case the metadata is the original word before it was transformed into an embedding. Still, it's just my guess :P
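
To make the two text formats above concrete, here is a minimal C# sketch that writes one input line and parses one result line. It assumes the layouts are exactly as described in this comment (which is itself a guess), that metadata contains no tab, "|" or "@", and that the query label contains no ":". The class and method names are made up for illustration and are not part of SPTAG.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper for illustration only; not part of SPTAG.
public static class SptagTextFormat
{
    // Builds "metadata<TAB>v1|v2|...|vn" for one vector.
    public static string FormatInputLine(string metadata, float[] vector)
    {
        var parts = vector.Select(x => x.ToString(CultureInfo.InvariantCulture));
        return metadata + "\t" + string.Join("|", parts);
    }

    // Parses "query:dist@meta|dist@meta|...|" into (distance, metadata) pairs.
    public static List<(float Dist, string Meta)> ParseResultLine(string line)
    {
        var results = new List<(float Dist, string Meta)>();
        // Drop the "query:" prefix; if there is no ':', the whole line is used.
        string body = line.Substring(line.IndexOf(':') + 1);
        foreach (string entry in body.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int at = entry.IndexOf('@');
            if (at < 0) continue; // skip anything that is not "distance@metadata"
            float dist = float.Parse(entry.Substring(0, at), CultureInfo.InvariantCulture);
            results.Add((dist, entry.Substring(at + 1)));
        }
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(FormatInputLine("apple", new[] { 0.5f, -0.25f }));
        foreach (var r in ParseResultLine("apple:0.00@apple|0.001@Apple|0.5@fruit|"))
            Console.WriteLine(r.Meta + " at distance " + r.Dist.ToString(CultureInfo.InvariantCulture));
    }
}
```

The invariant culture is used on both sides so that the decimal separator is always ".", whatever the machine's locale.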

from sptag.

shashi-netra commented on May 22, 2024

I just couldn't get it to work for any kind of data. And my questions here have remained unanswered. It seems the Microsoft team couldn't be bothered, unfortunately, and I have given up on using this tool.


joskei commented on May 22, 2024

I still don't get how to make this work for words. Do you have some simple sample data?


shashi-netra commented on May 22, 2024

BTW we have recently open-sourced pgANN that solves this problem with a PostgreSQL backend. HTH.


joskei commented on May 22, 2024

So here's my code. It returns something, but I'm not sure why it returns it that way. Can somebody explain? This is based on the sample code from the GitHub site.

`using Microsoft.ANN.SPTAGManaged;
using System;
using System.IO;
using System.Text;

namespace SPTAG_Tester
{
    class Program
    {
        static int dimension = 2; // components per vector
        static int n = 14;        // number of vectors in the index
        static int k = 3;         // number of neighbors to return

        // Packs n vectors into one flat byte array (row-major, 4 bytes per float).
        // Every component of vector i is set to i, so vector i is (i, i).
        static byte[] createFloatArray(int n)
        {
            byte[] data = new byte[n * dimension * sizeof(float)];

            for (int i = 0; i < n; i++)
                for (int j = 0; j < dimension; j++)
                    Array.Copy(BitConverter.GetBytes((float)i), 0, data, (i * dimension + j) * sizeof(float), 4);
            return data;
        }

        // One newline-terminated metadata entry per vector, in the same order as the vectors.
        static byte[] createMetadata(int n)
        {
            StringBuilder sb = new StringBuilder();

            sb.Append("kitten\n");
            sb.Append("hamster\n");
            sb.Append("tarantula\n");
            sb.Append("puppy\n");
            sb.Append("crocodile\n");
            sb.Append("dolphin\n");
            sb.Append("panda bear\n");
            sb.Append("lobster\n");
            sb.Append("capybara\n");
            sb.Append("elephant\n");
            sb.Append("mosquito\n");
            sb.Append("goldfish\n");
            sb.Append("horse\n");
            sb.Append("chicken\n");

            return Encoding.ASCII.GetBytes(sb.ToString());
        }

        static void Main()
        {
            // Build the index from the 14 vectors and their metadata, then save it.
            {
                AnnIndex idx = new AnnIndex("BKT", "Float", dimension);
                idx.SetBuildParam("DistCalcMethod", "L2");
                byte[] data = createFloatArray(n);

                byte[] meta = createMetadata(n);
                idx.BuildWithMetaData(data, meta, n, true);
                idx.Save("testcsharp");
            }

            // Reload the index and query it. createFloatArray(1) yields the single vector (0, 0).
            AnnIndex index = AnnIndex.Load("testcsharp");
            BasicResult[] res = index.SearchWithMetaData(createFloatArray(1), k);
            for (int i = 0; i < res.Length; i++)
                Console.WriteLine("result " + i.ToString() + ":" + res[i].Dist.ToString() + "@(" + res[i].VID.ToString() + "," + Encoding.ASCII.GetString(res[i].Meta) + ")");
            Console.WriteLine("test finish!");

            Console.ReadLine();
        }
    }
}`

The result is:
result 0:0@(0,kitten )
result 1:2@(1,hamster )
result 2:8@(2,tarantula )

So, some questions:

  • Why is it returning the first 3?
  • Can I do a search based on my metadata? How?
  • What should be the content of my "data" variable (the one generated by createFloatArray(int n))?
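
A hedged note on the first and third questions: in the code above, createFloatArray(14) stores the vectors (0, 0), (1, 1), ..., (13, 13), and the query createFloatArray(1) is the single vector (0, 0). The reported distances 0, 2 and 8 are consistent with squared Euclidean distance from (0, 0) to (0, 0), (1, 1) and (2, 2), so the first three entries would simply be the three nearest neighbors of that query. The sketch below checks this with an exhaustive search that does not use SPTAG at all. The class and method names are hypothetical, and the squared-L2 reading is inferred from the numbers in this post, not taken from SPTAG documentation.

```csharp
using System;
using System.Linq;

// Hypothetical brute-force check; it does not call SPTAG.
public static class BruteForceCheck
{
    // Packs vectors into a flat byte array, row-major, 4 bytes per float.
    // This is the same layout that createFloatArray produces in the post.
    public static byte[] Pack(float[][] vectors)
    {
        int dim = vectors[0].Length;
        int rowBytes = dim * sizeof(float);
        byte[] data = new byte[vectors.Length * rowBytes];
        for (int i = 0; i < vectors.Length; i++)
            Buffer.BlockCopy(vectors[i], 0, data, i * rowBytes, rowBytes);
        return data;
    }

    // Squared Euclidean distance between two vectors of equal length.
    public static float SquaredL2(float[] a, float[] b)
    {
        float sum = 0f;
        for (int j = 0; j < a.Length; j++)
        {
            float d = a[j] - b[j];
            sum += d * d;
        }
        return sum;
    }

    // Exhaustive search: ids and distances of the k vectors closest to the query.
    public static (int Id, float Dist)[] Nearest(float[][] vectors, float[] query, int k) =>
        vectors.Select((v, i) => (Id: i, Dist: SquaredL2(v, query)))
               .OrderBy(r => r.Dist)
               .Take(k)
               .ToArray();

    public static void Main()
    {
        string[] words = { "kitten", "hamster", "tarantula", "puppy", "crocodile", "dolphin", "panda bear",
                           "lobster", "capybara", "elephant", "mosquito", "goldfish", "horse", "chicken" };
        // Vector i is (i, i), as in the post.
        float[][] vectors = Enumerable.Range(0, words.Length).Select(i => new float[] { i, i }).ToArray();

        Console.WriteLine(Pack(vectors).Length + " bytes of vector data"); // 14 * 2 * 4 = 112

        foreach (var r in Nearest(vectors, new float[] { 0f, 0f }, 3))
            Console.WriteLine(r.Dist + "@(" + r.Id + "," + words[r.Id] + ")");
        // Prints 0@(0,kitten), 2@(1,hamster), 8@(2,tarantula), one per line.
    }
}
```

Under this reading, the "data" array should hold real embeddings (for example one word2vec vector per word, in the same order as the metadata lines) instead of the placeholder values (i, i). To look something up by word, one option is to keep your own word-to-vector table, fetch the word's vector from it, and pass that vector as the query.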

