Comments (4)

catle2aurecon commented on June 4, 2024

Here is how I avoided the above error:

  • I changed reduction_dimension to 5 instead of 10.
  • I restricted the number of layers to 2 when constructing the tree.

This configuration works for my type of data, so you might need a bit of trial and error with yours.
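
The trial and error mentioned above could be automated roughly as follows. This is only a sketch: `first_working_settings` and `fake_build` are hypothetical names, and `fake_build` is a stand-in for whatever call actually constructs the tree in your setup. It does not use the raptor API, and the exception is caught broadly because the exact error type is not quoted in this thread.

```python
from typing import Callable, Iterable, Optional, Tuple

# (reduction_dimension, num_layers), the two settings changed in this comment
Settings = Tuple[int, int]


def first_working_settings(
    build: Callable[[int, int], object],
    candidates: Iterable[Settings],
) -> Optional[Settings]:
    """Return the first candidate pair for which build() does not raise."""
    for reduction_dimension, num_layers in candidates:
        try:
            build(reduction_dimension, num_layers)
        except Exception as exc:  # broad on purpose: error type is an unknown here
            print(
                f"failed with reduction_dimension={reduction_dimension}, "
                f"num_layers={num_layers}: {exc}"
            )
            continue
        return (reduction_dimension, num_layers)
    return None  # no candidate worked


def fake_build(reduction_dimension: int, num_layers: int) -> str:
    """Hypothetical stand-in that fails for the larger settings."""
    if reduction_dimension > 5 or num_layers > 2:
        raise ValueError("simulated clustering failure")
    return "tree"


# Try the larger settings first, then fall back to smaller ones.
candidates = [(10, 5), (10, 2), (5, 2)]
result = first_working_settings(fake_build, candidates)
print(result)
```

To use it for real, replace `fake_build` with a function that builds the tree from your own documents using the given settings and raises on failure.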
2024-03-18 21:48:13,833 - Successfully initialized TreeBuilder with Config 
        TreeBuilderConfig:
            Tokenizer: <Encoding 'cl100k_base'>
            Max Tokens: 100
            Num Layers: 2
            Threshold: 0.5
            Top K: 5
            Selection Mode: top_k
            Summarization Length: 100
            Summarization Model: <__main__.ROOTSummarizationModel object at 0x7f3ee5622a10>
            Embedding Models: {'EMB': <__main__.SBertEmbeddingModel object at 0x7f3db828bf50>}
            Cluster Embedding Model: EMB
        
        Reduction Dimension: 5
        Clustering Algorithm: RAPTOR_Clustering
        Clustering Parameters: {}
        
2024-03-18 21:48:13,833 - Successfully initialized ClusterTreeBuilder with Config 
        TreeBuilderConfig:
            Tokenizer: <Encoding 'cl100k_base'>
            Max Tokens: 100
            Num Layers: 2
            Threshold: 0.5
            Top K: 5
            Selection Mode: top_k
            Summarization Length: 100
            Summarization Model: <__main__.ROOTSummarizationModel object at 0x7f3ee5622a10>
            Embedding Models: {'EMB': <__main__.SBertEmbeddingModel object at 0x7f3db828bf50>}
            Cluster Embedding Model: EMB
        
        Reduction Dimension: 5
        Clustering Algorithm: RAPTOR_Clustering
        Clustering Parameters: {}
        
2024-03-18 21:48:13,833 - Successfully initialized RetrievalAugmentation with Config 
        RetrievalAugmentationConfig:
            
        TreeBuilderConfig:
            Tokenizer: <Encoding 'cl100k_base'>
            Max Tokens: 100
            Num Layers: 2
            Threshold: 0.5
            Top K: 5
            Selection Mode: top_k
            Summarization Length: 100
            Summarization Model: <__main__.ROOTSummarizationModel object at 0x7f3ee5622a10>
            Embedding Models: {'EMB': <__main__.SBertEmbeddingModel object at 0x7f3db828bf50>}
            Cluster Embedding Model: EMB
        
        Reduction Dimension: 5
        Clustering Algorithm: RAPTOR_Clustering
        Clustering Parameters: {}
        
            
            
        TreeRetrieverConfig:
            Tokenizer: <Encoding 'cl100k_base'>
            Threshold: 0.5
            Top K: 5
            Selection Mode: top_k
            Context Embedding Model: EMB
            Embedding Model: <__main__.SBertEmbeddingModel object at 0x7f3db828bf50>
            Num Layers: None
            Start Layer: None
        
            
            QA Model: <__main__.ROOTQAModel object at 0x7f3dbc9ec6d0>
            Tree Builder Type: cluster

from raptor.

catle2aurecon commented on June 4, 2024

I ran into the same problem. It relates to the tree_builder.build_from_text function,
so it occurs regardless of which LLM you choose.

JacksonCakes commented on June 4, 2024

I had the same problem as well. It seems to be related to MaartenGr/BERTopic#97 (comment).

parthsarthi03 commented on June 4, 2024

This should be fixed with #16.
