Comments (10)

cuichenxu commented on May 24, 2024

I ran into the same issue. Have you solved it?

from raptor.

fatlism commented on May 24, 2024

I also encountered the same problem. Is there any solution?

isConic commented on May 24, 2024

@jeffreyzhanghc Can you pinpoint where in the repo this line of code is?

jeffreyzhanghc commented on May 24, 2024

@cuichenxu @fatlism Hi, I don't fully understand the cause yet. My initial guess is that it comes from the embedding step: I used the original raptor model on Chinese content, and with longer contexts this bug appeared very often. After I switched to an embedding/summarization model customized for Chinese, it has not shown up for a while. My suggestion: if you are processing longer text in a different language, consider trying an embedding method built specifically for that language, though I am not sure that will solve the issue.

jeffreyzhanghc commented on May 24, 2024

@isConic It is in raptor/cluster_utils.py, line 33.

jeffreyzhanghc commented on May 24, 2024

@isConic In the umap package, the call goes through .fit at umap_.py line 2379, and the error is raised from line 1777 in _validate_parameters().

cuichenxu commented on May 24, 2024

@jeffreyzhanghc Hi, thanks for your insights!
I only use English text, and the embedding model is SBertEmbeddingModel from raptor/EmbeddingModels.py, yet I still hit this error. I really don't understand why.

By the way, does your setup now run successfully for your use case? Could you please share your custom embedding model code? I tried to implement one, but an error occurred.

fatlism commented on May 24, 2024

if n_neighbors is None:
    # n_neighbors = int((len(embeddings) - 1) ** 0.5)
    n_neighbors = max(2, int((len(embeddings) - 1) ** 0.5))

I found that the aggregated vector array had length 2. In that case the default evaluates to int((2 - 1) ** 0.5) = 1. The error only occurs when dimensionality reduction is called with n_neighbors left unset (None), because that is when this default is computed. I worked around it temporarily with the code above.
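
For anyone following along, here is a minimal sketch of the arithmetic behind that workaround. The helper names default_n_neighbors and clamped_n_neighbors are made up for illustration and are not part of raptor; only the two formulas come from the snippet above. It assumes, based on the _validate_parameters() location mentioned earlier in this thread, that UMAP rejects n_neighbors values below 2.

```python
def default_n_neighbors(num_embeddings: int) -> int:
    """Original default from this thread: int((n - 1) ** 0.5)."""
    return int((num_embeddings - 1) ** 0.5)


def clamped_n_neighbors(num_embeddings: int) -> int:
    """Workaround from this thread: same formula, but never below 2."""
    return max(2, default_n_neighbors(num_embeddings))


if __name__ == "__main__":
    # Compare both formulas for a few cluster sizes.
    for n in (2, 3, 5, 10, 101):
        print(f"len(embeddings)={n}: default={default_n_neighbors(n)}, "
              f"clamped={clamped_n_neighbors(n)}")
```

With 2 or 3 embeddings the original formula gives 1, which is the value the clamp replaces. From 5 embeddings upward both formulas agree, so the change only affects very small clusters.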

cuichenxu commented on May 24, 2024

@fatlism How long does it take when the context is long?

fatlism commented on May 24, 2024

@cuichenxu A single-threaded execution might take several hours.
