
Comments (10)

x-tabdeveloping commented on May 20, 2024

Can I get the full stack trace on the first error, so that I know which function it might come from? :)


x-tabdeveloping commented on May 20, 2024

If there's no GDPR issue it would also be useful to know what data you used and what hyperparameters you supplied to the model.


vshourie-asu commented on May 20, 2024

Hello, thanks for the response. :)

Can I get the full stack trace on the first error, so that I know which function it might come from? :)

Absolutely. Here you go:

ValueError                                Traceback (most recent call last)
Cell In[10], line 1
----> 1 topicwizard.visualize(vectorizer=vectorizer, topic_model=dmm, corpus=corpus_cleaned, port=8080)

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\app.py:245, in visualize(corpus, vectorizer, topic_model, pipeline, document_names, topic_names, port, enable_notebook)
    242     (_, vectorizer), (_, topic_model) = pipeline.steps
    244 print("Preprocessing")
--> 245 app = get_dash_app(
    246     vectorizer=vectorizer,
    247     topic_model=topic_model,
    248     corpus=corpus,
    249     document_names=document_names,
    250     topic_names=topic_names,
    251 )
    252 return run_app(app, port=port)

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\app.py:73, in get_dash_app(vectorizer, topic_model, corpus, document_names, topic_names)
     42 def get_dash_app(
     43     vectorizer: Any,
     44     topic_model: Any,
   (...)
     47     topic_names: Optional[List[str]] = None,
     48 ) -> Dash:
     49     """Returns topicwizard Dash application.
     50 
     51     Parameters
   (...)
     71         Dash application object for topicwizard.
     72     """
---> 73     blueprint = get_app_blueprint(
     74         vectorizer=vectorizer,
     75         topic_model=topic_model,
     76         corpus=corpus,
     77         document_names=document_names,
     78         topic_names=topic_names,
     79     )
     80     app = Dash(
     81         __name__,
     82         blueprint=blueprint,
   (...)
     92         ],
     93     )
     94     return app

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\app.py:31, in get_app_blueprint(vectorizer, topic_model, corpus, document_names, topic_names)
     24 def get_app_blueprint(
     25     vectorizer: Any,
     26     topic_model: Any,
   (...)
     29     topic_names: Optional[List[str]] = None,
     30 ) -> DashBlueprint:
---> 31     blueprint = prepare_blueprint(
     32         vectorizer=vectorizer,
     33         topic_model=topic_model,
     34         corpus=corpus,
     35         document_names=document_names,
     36         topic_names=topic_names,
     37         create_blueprint=create_blueprint,
     38     )
     39     return blueprint

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\blueprints\template.py:31, in prepare_blueprint(vectorizer, topic_model, corpus, create_blueprint, document_names, topic_names)
     29 if topic_names is None:
     30     topic_names = [f"Topic {i}" for i in range(n_topics)]
---> 31 blueprint = create_blueprint(
     32     vocab=vocab,
     33     document_term_matrix=document_term_matrix,
     34     document_topic_matrix=document_topic_matrix,
     35     topic_term_matrix=topic_term_matrix,
     36     document_names=document_names,
     37     corpus=corpus,
     38     vectorizer=vectorizer,
     39     topic_model=topic_model,
     40     topic_names=topic_names,
     41 )
     42 return blueprint

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\blueprints\app.py:35, in create_blueprint(vocab, document_term_matrix, document_topic_matrix, topic_term_matrix, document_names, corpus, vectorizer, topic_model, topic_names)
     23 def create_blueprint(
     24     vocab: np.ndarray,
     25     document_term_matrix: np.ndarray,
   (...)
     33 ) -> DashBlueprint:
     34     # --------[ Collecting blueprints ]--------
---> 35     topic_blueprint = topics.create_blueprint(
     36         vocab=vocab,
     37         document_term_matrix=document_term_matrix,
     38         document_topic_matrix=document_topic_matrix,
     39         topic_term_matrix=topic_term_matrix,
     40         document_names=document_names,
     41         corpus=corpus,
     42         vectorizer=vectorizer,
     43         topic_model=topic_model,
     44         topic_names=topic_names,
     45     )
     46     documents_blueprint = documents.create_blueprint(
     47         vocab=vocab,
     48         document_term_matrix=document_term_matrix,
   (...)
     55         topic_names=topic_names,
     56     )
     57     words_blueprint = words.create_blueprint(
     58         vocab=vocab,
     59         document_term_matrix=document_term_matrix,
   (...)
     66         topic_names=topic_names,
     67     )

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\blueprints\topics.py:65, in create_blueprint(vocab, document_term_matrix, document_topic_matrix, topic_term_matrix, topic_names, **kwargs)
     56 (
     57     topic_importances,
     58     term_importances,
   (...)
     61     topic_term_matrix, document_term_matrix, document_topic_matrix
     62 )
     64 # --------[ Collecting blueprints ]--------
---> 65 intertopic_map = create_intertopic_map(
     66     topic_positions, topic_importances, topic_names
     67 )
     68 blueprints = [
     69     intertopic_map,
     70     relevance_slider,
   (...)
     74     wordcloud,
     75 ]
     76 # layouts = [blueprint.layout for blueprint in blueprints]
     77 
     78 # --------[ Creating app blueprint ]--------

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\components\topics\intertopic_map.py:29, in create_intertopic_map(topic_positions, topic_importances, topic_names)
     20 x, y = topic_positions
     22 intertopic_map = DashBlueprint()
     24 intertopic_map.layout = dcc.Graph(
     25     id="intertopic_map",
     26     responsive=True,
     27     config=dict(scrollZoom=True),
     28     animate=True,
---> 29     figure=plots.intertopic_map(
     30         x=x,
     31         y=y,
     32         topic_importances=topic_importances,
     33         topic_names=topic_names,
     34     ),
     35     className="flex-1",
     36 )
     38 intertopic_map.clientside_callback(
     39     """
     40     function(currentTopic, topicNames, currentPlot) {
   (...)
     61     prevent_initial_call=True,
     62 )
     63 return intertopic_map

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\plots\topics.py:18, in intertopic_map(x, y, topic_importances, topic_names)
     11 def intertopic_map(
     12     x: np.ndarray,
     13     y: np.ndarray,
     14     topic_importances: np.ndarray,
     15     topic_names: List[str],
     16 ) -> go.Figure:
     17     n_topics = x.shape[0]
---> 18     topic_trace = go.Scatter(
     19         x=x,
     20         y=y,
     21         mode="text+markers",
     22         text=topic_names,
     23         marker=dict(
     24             size=topic_importances,
     25             sizemode="area",
     26             sizeref=2.0 * max(topic_importances) / (100.0**2),
     27             sizemin=4,
     28             color="rgb(168,162,158)",
     29         ),
     30         customdata=np.atleast_2d(np.arange(x.shape[0])).T,
     31     )
     32     fig = go.Figure([topic_trace])
     33     fig.update_layout(
     34         clickmode="event",
     35         modebar_remove=["lasso2d", "select2d"],
   (...)
     40         margin=dict(l=0, r=0, b=0, t=0, pad=0),
     41     )

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\plotly\graph_objs\_scatter.py:3378, in Scatter.__init__(self, arg, alignmentgroup, cliponaxis, connectgaps, customdata, customdatasrc, dx, dy, error_x, error_y, fill, fillcolor, fillpattern, groupnorm, hoverinfo, hoverinfosrc, hoverlabel, hoveron, hovertemplate, hovertemplatesrc, hovertext, hovertextsrc, ids, idssrc, legend, legendgroup, legendgrouptitle, legendrank, legendwidth, line, marker, meta, metasrc, mode, name, offsetgroup, opacity, orientation, selected, selectedpoints, showlegend, stackgaps, stackgroup, stream, text, textfont, textposition, textpositionsrc, textsrc, texttemplate, texttemplatesrc, uid, uirevision, unselected, visible, x, x0, xaxis, xcalendar, xhoverformat, xperiod, xperiod0, xperiodalignment, xsrc, y, y0, yaxis, ycalendar, yhoverformat, yperiod, yperiod0, yperiodalignment, ysrc, **kwargs)
   3376 _v = marker if marker is not None else _v
   3377 if _v is not None:
-> 3378     self["marker"] = _v
   3379 _v = arg.pop("meta", None)
   3380 _v = meta if meta is not None else _v

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\plotly\basedatatypes.py:4865, in BasePlotlyType.__setitem__(self, prop, value)
   4863 # ### Handle compound property ###
   4864 if isinstance(validator, CompoundValidator):
-> 4865     self._set_compound_prop(prop, value)
   4867 # ### Handle compound array property ###
   4868 elif isinstance(validator, (CompoundArrayValidator, BaseDataValidator)):

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\plotly\basedatatypes.py:5276, in BasePlotlyType._set_compound_prop(self, prop, val)
   5273 # Import value
   5274 # ------------
   5275 validator = self._get_validator(prop)
-> 5276 val = validator.validate_coerce(val, skip_invalid=self._skip_invalid)
   5278 # Save deep copies of current and new states
   5279 # ------------------------------------------
   5280 curr_val = self._compound_props.get(prop, None)

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\_plotly_utils\basevalidators.py:2475, in CompoundValidator.validate_coerce(self, v, skip_invalid, _validate)
   2472     v = self.data_class()
   2474 elif isinstance(v, dict):
-> 2475     v = self.data_class(v, skip_invalid=skip_invalid, _validate=_validate)
   2477 elif isinstance(v, self.data_class):
   2478     # Copy object
   2479     v = self.data_class(v)

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\plotly\graph_objs\scatter\_marker.py:1674, in Marker.__init__(self, arg, angle, angleref, anglesrc, autocolorscale, cauto, cmax, cmid, cmin, color, coloraxis, colorbar, colorscale, colorsrc, gradient, line, maxdisplayed, opacity, opacitysrc, reversescale, showscale, size, sizemin, sizemode, sizeref, sizesrc, standoff, standoffsrc, symbol, symbolsrc, **kwargs)
   1672 _v = size if size is not None else _v
   1673 if _v is not None:
-> 1674     self["size"] = _v
   1675 _v = arg.pop("sizemin", None)
   1676 _v = sizemin if sizemin is not None else _v

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\plotly\basedatatypes.py:4873, in BasePlotlyType.__setitem__(self, prop, value)
   4869         self._set_array_prop(prop, value)
   4871     # ### Handle simple property ###
   4872     else:
-> 4873         self._set_prop(prop, value)
   4874 else:
   4875     # Make sure properties dict is initialized
   4876     self._init_props()

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\plotly\basedatatypes.py:5217, in BasePlotlyType._set_prop(self, prop, val)
   5215         return
   5216     else:
-> 5217         raise err
   5219 # val is None
   5220 # -----------
   5221 if val is None:
   5222     # Check if we should send null update

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\plotly\basedatatypes.py:5212, in BasePlotlyType._set_prop(self, prop, val)
   5209 validator = self._get_validator(prop)
   5211 try:
-> 5212     val = validator.validate_coerce(val)
   5213 except ValueError as err:
   5214     if self._skip_invalid:

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\_plotly_utils\basevalidators.py:777, in NumberValidator.validate_coerce(self, v)
    772             v_invalid = np.logical_not(v_valid)
    773             some_invalid_els = np.array(v, dtype="object")[v_invalid][
    774                 :10
    775             ].tolist()
--> 777             self.raise_invalid_elements(some_invalid_els)
    779     v = v_array  # Always numeric numpy array
    780 elif self.array_ok and is_simple_array(v):
    781     # Check numeric

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\_plotly_utils\basevalidators.py:303, in BaseValidator.raise_invalid_elements(self, invalid_els)
    301     def raise_invalid_elements(self, invalid_els):
    302         if invalid_els:
--> 303             raise ValueError(
    304                 """
    305     Invalid element(s) received for the '{name}' property of {pname}
    306         Invalid elements include: {invalid}
    307 
    308 {valid_clr_desc}""".format(
    309                     name=self.plotly_name,
    310                     pname=self.parent_name,
    311                     invalid=invalid_els[:10],
    312                     valid_clr_desc=self.description(),
    313                 )
    314             )

ValueError: 
    Invalid element(s) received for the 'size' property of scatter.marker
        Invalid elements include: [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]

    The 'size' property is a number and may be specified as:
      - An int or float in the interval [0, inf]
      - A tuple, list, or one-dimensional numpy array of the above
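
For reference, here is a minimal snippet that triggers the same Plotly validation error (an illustrative sketch, assuming only that a nan reaches marker.size as in the traceback above):

import numpy as np
import plotly.graph_objects as go

# A nan in the marker size array fails Plotly's NumberValidator with the
# "Invalid element(s) received for the 'size' property" error shown above.
go.Scatter(
    x=np.array([0.0, 1.0]),
    y=np.array([0.0, 1.0]),
    marker=dict(size=np.array([np.nan, np.nan])),
)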

If there's no GDPR issue it would also be useful to know what data you used and what hyperparameters you supplied to the model.

There are data privacy concerns (FERPA, to be exact). Therefore, it's not a good idea for me to share my dataset.

But here's a bit of domain context:

  • We're repurposing tweetopic to do STTM (short-text topic modeling) on Salesforce case descriptions written by university financial aid employees.
  • We have a preprocessing pipeline built with spaCy for tokenization, POS removal, stop-word removal, etc. (roughly as sketched below).
  • We only supply non-null, preprocessed documents to the model.
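
A rough sketch of that cleanup step (the spaCy model, POS tags, and filters here are illustrative assumptions, not our exact pipeline):

import spacy

nlp = spacy.load("en_core_web_sm")

def preprocess(texts):
    # Tokenize, lemmatize, and drop stop words, punctuation, and unwanted POS tags.
    cleaned = []
    for doc in nlp.pipe(texts):
        tokens = [
            tok.lemma_.lower()
            for tok in doc
            if not tok.is_stop and not tok.is_punct and tok.pos_ not in {"PRON", "DET", "ADP"}
        ]
        cleaned.append(" ".join(tokens))
    return cleaned

# raw_case_descriptions is a placeholder name for the Salesforce case texts.
corpus_cleaned = [text for text in preprocess(raw_case_descriptions) if text.strip()]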

Hyperparameters for the DMM model in tweetopic (model setup sketched after this list):

  • alpha and beta both set to 0.1
  • num_topics: 50
  • iterations: 25 (I get the same error when I set it to 50 or 100)
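
And the model setup, roughly (the DMM constructor arguments below are my best guess at tweetopic's sklearn-style parameter names and may not match exactly):

from sklearn.feature_extraction.text import CountVectorizer
from tweetopic import DMM
import topicwizard

vectorizer = CountVectorizer()
# n_components / n_iterations / alpha / beta are assumed parameter names.
dmm = DMM(n_components=50, n_iterations=25, alpha=0.1, beta=0.1)
dmm.fit(vectorizer.fit_transform(corpus_cleaned))
topicwizard.visualize(vectorizer=vectorizer, topic_model=dmm, corpus=corpus_cleaned, port=8080)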


x-tabdeveloping commented on May 20, 2024

Thanks for the info. I will try to deliver a fix as quickly as possible. I think you were right in your judgment: it has to be the nans output by tweetopic. In the meantime, you can try to identify which texts are problematic (i.e. result in nans) and remove them before you pass the corpus as a list of texts to topicwizard.


x-tabdeveloping commented on May 20, 2024

I checked your Colab notebook, and I think some pandas shenanigans are going on when you try to remove the texts. I would try something like this:

import numpy as np
import topicwizard

# Transform the corpus with the fitted pipeline (vectorizer + topic model)
transformed_corpus = topic_pipeline.transform(corpus)
# Turn the corpus into an array so you can index it with a boolean mask
corpus_array = np.array(corpus)
# Get the indices of documents whose topic distribution contains nan
problematic_indices = np.isnan(transformed_corpus).any(axis=1)
# Remove them
filtered_corpus = corpus_array[~problematic_indices]
topicwizard.visualize(pipeline=topic_pipeline, corpus=filtered_corpus)

I think this should work fine. I will try to address these issues in the meantime.


x-tabdeveloping commented on May 20, 2024

I managed to reproduce the error with a custom version of NMF that randomly assigns nans to certain observations.

import numpy as np
from sklearn.decomposition import NMF


class RandomNanNMF(NMF):
    """NMF variant that injects nans into the output rows of 30 random documents."""

    def transform(self, X):
        res = super().transform(X)
        n_docs = res.shape[0]
        nans = np.random.choice(np.arange(n_docs), size=30, replace=False)
        res[nans, :] = np.nan
        return res

    def fit_transform(self, X, y=None, W=None, H=None):
        res = super().fit_transform(X, y, W, H)
        n_docs = res.shape[0]
        nans = np.random.choice(np.arange(n_docs), size=30, replace=False)
        res[nans, :] = np.nan
        return res
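
A sketch of how such a model can be wired into topicwizard to trigger the error (the 20 newsgroups data and parameters here are placeholders, not the exact reproduction setup):

import topicwizard
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

corpus = fetch_20newsgroups(remove=("headers", "footers", "quotes")).data[:500]
vectorizer = CountVectorizer(stop_words="english", min_df=5)
model = RandomNanNMF(n_components=10)
# topicwizard runs the model over the corpus during preprocessing, so the
# injected nans end up in the document-topic matrix and break the intertopic map.
topicwizard.visualize(vectorizer=vectorizer, topic_model=model, corpus=corpus, port=8080)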

The solution was to filter out the nan values in topicwizard's preprocessing step and issue a warning informing the user that these documents were removed (roughly as sketched below).
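
Roughly, the preprocessing step now does something like this (a sketch of the behaviour, not the actual source):

import numpy as np
from warnings import warn

# Drop documents whose topic distribution contains nan and warn the user.
nan_documents = np.isnan(document_topic_matrix).any(axis=1)
n_nan = int(nan_documents.sum())
if n_nan:
    warn(
        f"{n_nan} documents had nan values in the output of the topic model, "
        "these are removed in preprocessing and will not be visible in the app."
    )
    document_topic_matrix = document_topic_matrix[~nan_documents]
    document_term_matrix = document_term_matrix[~nan_documents]
    corpus = list(np.asarray(corpus, dtype=object)[~nan_documents])
    document_names = list(np.asarray(document_names, dtype=object)[~nan_documents])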


x-tabdeveloping commented on May 20, 2024

The fix is merged into main and a new version has been built and published to PyPI. You should try installing topicwizard 0.2.6 and running your code again :)


vshourie-asu commented on May 20, 2024

Thank you! I'll give it a shot now.


x-tabdeveloping commented on May 20, 2024

Can you confirm that the fix worked?


vshourie-asu commented on May 20, 2024

Hi!

Sorry for the long wait time; this query got lost in my massive work email mountain.

I reran the visualization command with version 0.2.6 installed.

I get the following error after running. Note that the UserWarning shows up first, which means your validation is working as intended.

C:\Users\vshourie\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\blueprints\template.py:33: UserWarning: 31 documents had nan values in the output of the topic model, these are removed in preprocessing and will not be visible in the app.
  warn(
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[8], line 1
----> 1 topicwizard.visualize(vectorizer=vectorizer, topic_model=dmm, corpus=corpus_cleaned, port=8080)

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\app.py:245, in visualize(corpus, vectorizer, topic_model, pipeline, document_names, topic_names, port, enable_notebook)
    242     (_, vectorizer), (_, topic_model) = pipeline.steps
    244 print("Preprocessing")
--> 245 app = get_dash_app(
    246     vectorizer=vectorizer,
    247     topic_model=topic_model,
    248     corpus=corpus,
    249     document_names=document_names,
    250     topic_names=topic_names,
    251 )
    252 return run_app(app, port=port)

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\app.py:73, in get_dash_app(vectorizer, topic_model, corpus, document_names, topic_names)
     42 def get_dash_app(
     43     vectorizer: Any,
     44     topic_model: Any,
   (...)
     47     topic_names: Optional[List[str]] = None,
     48 ) -> Dash:
     49     """Returns topicwizard Dash application.
     50 
     51     Parameters
   (...)
     71         Dash application object for topicwizard.
     72     """
---> 73     blueprint = get_app_blueprint(
     74         vectorizer=vectorizer,
     75         topic_model=topic_model,
     76         corpus=corpus,
     77         document_names=document_names,
     78         topic_names=topic_names,
     79     )
     80     app = Dash(
     81         __name__,
     82         blueprint=blueprint,
   (...)
     92         ],
     93     )
     94     return app

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\app.py:31, in get_app_blueprint(vectorizer, topic_model, corpus, document_names, topic_names)
     24 def get_app_blueprint(
     25     vectorizer: Any,
     26     topic_model: Any,
   (...)
     29     topic_names: Optional[List[str]] = None,
     30 ) -> DashBlueprint:
---> 31     blueprint = prepare_blueprint(
     32         vectorizer=vectorizer,
     33         topic_model=topic_model,
     34         corpus=corpus,
     35         document_names=document_names,
     36         topic_names=topic_names,
     37         create_blueprint=create_blueprint,
     38     )
     39     return blueprint

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\blueprints\template.py:44, in prepare_blueprint(vectorizer, topic_model, corpus, create_blueprint, document_names, topic_names)
     42 if topic_names is None:
     43     topic_names = [f"Topic {i}" for i in range(n_topics)]
---> 44 blueprint = create_blueprint(
     45     vocab=vocab,
     46     document_term_matrix=document_term_matrix,
     47     document_topic_matrix=document_topic_matrix,
     48     topic_term_matrix=topic_term_matrix,
     49     document_names=document_names,
     50     corpus=corpus,
     51     vectorizer=vectorizer,
     52     topic_model=topic_model,
     53     topic_names=topic_names,
     54 )
     55 return blueprint

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\blueprints\app.py:46, in create_blueprint(vocab, document_term_matrix, document_topic_matrix, topic_term_matrix, document_names, corpus, vectorizer, topic_model, topic_names)
     23 def create_blueprint(
     24     vocab: np.ndarray,
     25     document_term_matrix: np.ndarray,
   (...)
     33 ) -> DashBlueprint:
     34     # --------[ Collecting blueprints ]--------
     35     topic_blueprint = topics.create_blueprint(
     36         vocab=vocab,
     37         document_term_matrix=document_term_matrix,
   (...)
     44         topic_names=topic_names,
     45     )
---> 46     documents_blueprint = documents.create_blueprint(
     47         vocab=vocab,
     48         document_term_matrix=document_term_matrix,
     49         document_topic_matrix=document_topic_matrix,
     50         topic_term_matrix=topic_term_matrix,
     51         document_names=document_names,
     52         corpus=corpus,
     53         vectorizer=vectorizer,
     54         topic_model=topic_model,
     55         topic_names=topic_names,
     56     )
     57     words_blueprint = words.create_blueprint(
     58         vocab=vocab,
     59         document_term_matrix=document_term_matrix,
   (...)
     66         topic_names=topic_names,
     67     )
     68     blueprints = [
     69         topic_blueprint,
     70         words_blueprint,
     71         documents_blueprint,
     72     ]

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\blueprints\documents.py:32, in create_blueprint(vocab, document_term_matrix, document_topic_matrix, topic_term_matrix, document_names, corpus, vectorizer, topic_model, **kwargs)
     19 def create_blueprint(
     20     vocab: np.ndarray,
     21     document_term_matrix: np.ndarray,
   (...)
     29 ) -> DashBlueprint:
     30     # --------[ Preparing data ]--------
     31     n_topics = topic_term_matrix.shape[0]
---> 32     document_positions = prepare.document_positions(
     33         document_term_matrix=document_term_matrix
     34     )
     35     dominant_topics = prepare.dominant_topic(
     36         document_topic_matrix=document_topic_matrix
     37     )
     38     # Creating unified color scheme

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\prepare\documents.py:47, in document_positions(document_term_matrix)
     41 perplexity = np.min((40, n_docs - 1))
     42 manifold = umap.UMAP(
     43     n_components=2,
     44     n_neighbors=perplexity,
     45     metric="cosine",
     46 )
---> 47 x, y = manifold.fit_transform(document_term_matrix).T
     48 return x, y

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\umap\umap_.py:2772, in UMAP.fit_transform(self, X, y)
   2742 def fit_transform(self, X, y=None):
   2743     """Fit X into an embedded space and return that transformed
   2744     output.
   2745 
   (...)
   2770         Local radii of data points in the embedding (log-transformed).
   2771     """
-> 2772     self.fit(X, y)
   2773     if self.transform_mode == "embedding":
   2774         if self.output_dens:

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\umap\umap_.py:2516, in UMAP.fit(self, X, y)
   2510     nn_metric = self._input_distance_func
   2511 if self.knn_dists is None:
   2512     (
   2513         self._knn_indices,
   2514         self._knn_dists,
   2515         self._knn_search_index,
-> 2516     ) = nearest_neighbors(
   2517         X[index],
   2518         self._n_neighbors,
   2519         nn_metric,
   2520         self._metric_kwds,
   2521         self.angular_rp_forest,
   2522         random_state,
   2523         self.low_memory,
   2524         use_pynndescent=True,
   2525         n_jobs=self.n_jobs,
   2526         verbose=self.verbose,
   2527     )
   2528 else:
   2529     self._knn_indices = self.knn_indices

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\umap\umap_.py:328, in nearest_neighbors(X, n_neighbors, metric, metric_kwds, angular, random_state, low_memory, use_pynndescent, n_jobs, verbose)
    325     n_trees = min(64, 5 + int(round((X.shape[0]) ** 0.5 / 20.0)))
    326     n_iters = max(5, int(round(np.log2(X.shape[0]))))
--> 328     knn_search_index = NNDescent(
    329         X,
    330         n_neighbors=n_neighbors,
    331         metric=metric,
    332         metric_kwds=metric_kwds,
    333         random_state=random_state,
    334         n_trees=n_trees,
    335         n_iters=n_iters,
    336         max_candidates=60,
    337         low_memory=low_memory,
    338         n_jobs=n_jobs,
    339         verbose=verbose,
    340         compressed=False,
    341     )
    342     knn_indices, knn_dists = knn_search_index.neighbor_graph
    344 if verbose:

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\pynndescent\pynndescent_.py:804, in NNDescent.__init__(self, data, metric, metric_kwds, n_neighbors, n_trees, leaf_size, pruning_degree_multiplier, diversify_prob, n_search_trees, tree_init, init_graph, init_dist, random_state, low_memory, max_candidates, n_iters, delta, n_jobs, compressed, parallel_batch_queries, verbose)
    793         print(ts(), "Building RP forest with", str(n_trees), "trees")
    794     self._rp_forest = make_forest(
    795         data,
    796         n_neighbors,
   (...)
    802         self._angular_trees,
    803     )
--> 804     leaf_array = rptree_leaf_array(self._rp_forest)
    805 else:
    806     self._rp_forest = None

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\pynndescent\rp_trees.py:1097, in rptree_leaf_array(rp_forest)
   1095 def rptree_leaf_array(rp_forest):
   1096     if len(rp_forest) > 0:
-> 1097         return np.vstack(rptree_leaf_array_parallel(rp_forest))
   1098     else:
   1099         return np.array([[-1]])

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\pynndescent\rp_trees.py:1089, in rptree_leaf_array_parallel(rp_forest)
   1088 def rptree_leaf_array_parallel(rp_forest):
-> 1089     result = joblib.Parallel(n_jobs=-1, require="sharedmem")(
   1090         joblib.delayed(get_leaves_from_tree)(rp_tree) for rp_tree in rp_forest
   1091     )
   1092     return result

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\joblib\parallel.py:1098, in Parallel.__call__(self, iterable)
   1095     self._iterating = False
   1097 with self._backend.retrieval_context():
-> 1098     self.retrieve()
   1099 # Make sure that we get a last message telling us we are done
   1100 elapsed_time = time.time() - self._start_time

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\joblib\parallel.py:975, in Parallel.retrieve(self)
    973 try:
    974     if getattr(self._backend, 'supports_timeout', False):
--> 975         self._output.extend(job.get(timeout=self.timeout))
    976     else:
    977         self._output.extend(job.get())

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\multiprocessing\pool.py:771, in ApplyResult.get(self, timeout)
    769     return self._value
    770 else:
--> 771     raise self._value

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\multiprocessing\pool.py:125, in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
    123 job, i, func, args, kwds = task
    124 try:
--> 125     result = (True, func(*args, **kwds))
    126 except Exception as e:
    127     if wrap_exception and func is not _helper_reraises_exception:

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\joblib\_parallel_backends.py:620, in SafeFunction.__call__(self, *args, **kwargs)
    618 def __call__(self, *args, **kwargs):
    619     try:
--> 620         return self.func(*args, **kwargs)
    621     except KeyboardInterrupt as e:
    622         # We capture the KeyboardInterrupt and reraise it as
    623         # something different, as multiprocessing does not
    624         # interrupt processing for a KeyboardInterrupt
    625         raise WorkerInterrupt() from e

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\joblib\parallel.py:288, in BatchedCalls.__call__(self)
    284 def __call__(self):
    285     # Set the default nested backend to self._backend but do not set the
    286     # change the default number of processes to -1
    287     with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 288         return [func(*args, **kwargs)
    289                 for func, args, kwargs in self.items]

File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\joblib\parallel.py:288, in <listcomp>(.0)
    284 def __call__(self):
    285     # Set the default nested backend to self._backend but do not set the
    286     # change the default number of processes to -1
    287     with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 288         return [func(*args, **kwargs)
    289                 for func, args, kwargs in self.items]

ValueError: cannot assign slice from input of different size

