
Comments (5)

pbosch avatar pbosch commented on August 27, 2024 3

@loicduffar @Chima-21 Do you use Windows or Linux? I encountered the issue myself in Windows using Miniconda (4.10.3) and using WSL or a native Ubuntu doesn't produce the issue. In WSL/Ubuntu it works with native Python and Miniconda.

The culprit is in the granularity calculation:

```python
if len(time_diff.unique()) == 1:  # check constant frequency
    freq = time_diff.unique()[0].astype("int")
    self.granularity_days = freq / (24 * 3600 * (10 ** 9))
else:
    raise RuntimeError(
        "Frequency of metrics is not constant."
        "Please check for missing or duplicate values"
    )
```

In WSL/Ubuntu I get a straight 1.0 for the example data. On Windows I get -2.149413925925926e-05. More precisely, the issue is in the `astype` call: it defaults to int32 on Windows, which causes an overflow. Using `int64` instead of `int` solves the problem.

I would recommend, in this instance and in general, using explicit types instead of assuming that the default type is correct. On Windows, int usually defaults to 32-bit while float usually defaults to 64-bit. By making the types explicit with `int64` and `float64`, whether with numpy or pandas, you would avoid this issue completely and make the code more robust. A quick search for `astype` in the repository shows that it's mostly implicit, so this kind of problem may occur in other places as well.
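For illustration, here is a minimal sketch (not Kats code) of how the overflow arises: a one-day time difference stored as nanoseconds, the resolution pandas uses, no longer fits in 32 bits, and since NumPy's default integer follows the platform's C `long` (32-bit on Windows with NumPy < 2), `astype("int")` silently wraps:

```python
import numpy as np

# One day, in nanoseconds -- the resolution pandas uses for time differences.
one_day_ns = np.timedelta64(1, "D").astype("timedelta64[ns]")

freq64 = one_day_ns.astype("int64")   # 86_400_000_000_000, well beyond 2**31 - 1
freq32 = freq64.astype("int32")       # silently wraps on overflow

print(freq64 / (24 * 3600 * 10**9))       # 1.0
print(int(freq32) / (24 * 3600 * 10**9))  # wrapped value, approx. -2.1494e-05
```

The wrapped result matches the bad Windows value reported above, which supports the int32-overflow diagnosis.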

from kats.

Chima-21 avatar Chima-21 commented on August 27, 2024

Excellent package.
I have the same issue when running the example and other multivariate anomaly detections.
It seems to stem from pandas' handling of the 'time' column.

```
~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 1575417598142906000

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.DatetimeEngine.get_loc()

KeyError: Timestamp('2019-12-03 23:59:58.142906')

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\datetimes.py in get_loc(self, key, method, tolerance)
    701         try:
--> 702             return Index.get_loc(self, key, method, tolerance)
    703         except KeyError as err:

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364 

KeyError: Timestamp('2019-12-03 23:59:58.142906')

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-51-5b7def4515a6> in <module>
      1 params = VARParams(maxlags=2)
      2 d = MultivariateAnomalyDetector(multi_anomaly_ts, params, training_days=40)
----> 3 anomaly_score_df = d.detector()
      4 d.plot()

~\anaconda3\lib\site-packages\kats\detectors\outlier.py in detector(self)
    300         while fcstTime < self.df.index.max():
    301             # forecast for fcstTime+ 1
--> 302             pred_df = self._generate_forecast(fcstTime)
    303             # calculate anomaly scores
    304             anomaly_scores_t = self._calc_anomaly_scores(pred_df)

~\anaconda3\lib\site-packages\kats\detectors\outlier.py in _generate_forecast(self, t)
    244             "index"
    245         )
--> 246         test = self.df.loc[t + dt.timedelta(days=self.granularity_days), :]
    247         pred_df["actual"] = test
    248 

~\anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
    923                 with suppress(KeyError, IndexError):
    924                     return self.obj._get_value(*key, takeable=self._takeable)
--> 925             return self._getitem_tuple(key)
    926         else:
    927             # we by definition only have the 0th axis

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_tuple(self, tup)
   1098     def _getitem_tuple(self, tup: tuple):
   1099         with suppress(IndexingError):
-> 1100             return self._getitem_lowerdim(tup)
   1101 
   1102         # no multi-index, so validate all of the indexers

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_lowerdim(self, tup)
    836                 # We don't need to check for tuples here because those are
    837                 #  caught by the _is_nested_tuple_indexer check above.
--> 838                 section = self._getitem_axis(key, axis=i)
    839 
    840                 # We should never have a scalar section here, because

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   1162         # fall thru to straight lookup
   1163         self._validate_key(key, axis)
-> 1164         return self._get_label(key, axis=axis)
   1165 
   1166     def _get_slice_axis(self, slice_obj: slice, axis: int):

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _get_label(self, label, axis)
   1111     def _get_label(self, label, axis: int):
   1112         # GH#5667 this will fail if the label is not present in the axis.
-> 1113         return self.obj.xs(label, axis=axis)
   1114 
   1115     def _handle_lowerdim_multi_index_axis0(self, tup: tuple):

~\anaconda3\lib\site-packages\pandas\core\generic.py in xs(self, key, axis, level, drop_level)
   3771                 raise TypeError(f"Expected label or tuple of labels, got {key}") from e
   3772         else:
-> 3773             loc = index.get_loc(key)
   3774 
   3775             if isinstance(loc, np.ndarray):

~\anaconda3\lib\site-packages\pandas\core\indexes\datetimes.py in get_loc(self, key, method, tolerance)
    702             return Index.get_loc(self, key, method, tolerance)
    703         except KeyError as err:
--> 704             raise KeyError(orig_key) from err
    705 
    706     def _maybe_cast_for_get_loc(self, key) -> Timestamp:

KeyError: Timestamp('2019-12-03 23:59:58.142906')
```


Chima-21 avatar Chima-21 commented on August 27, 2024

> @pbosch's diagnosis above applies: the `astype("int")` call in the granularity calculation overflows on Windows, and using `int64` instead resolves the error.

Many thanks. Issue resolved!!


krishpn avatar krishpn commented on August 27, 2024

@pbosch This may be related, but when I run the code below I get the same error. Is there a requirement that the dates column be a specific data type?

I have a pandas DataFrame `multi_ts` whose `time` column is a time series object created using `pandas.to_datetime(multi_ts['time'])`:

```python
from kats.models.var import VARModel, VARParams
from kats.detectors.outlier import MultivariateAnomalyDetector, MultivariateAnomalyDetectorType

params = VARParams(maxlags=2)
m = VARModel(multi_ts, params)
m.fit()
steps = 100

params = VARParams(maxlags=2)
d = MultivariateAnomalyDetector(multi_ts, params, training_days=60)
anomaly_score_df = d.detector()
d.plot()
```
```
/Documents/personal/resume/test.ipynb Cell 80' in <cell line: 10>()

File ~/miniconda3/lib/python3.8/site-packages/kats/detectors/outlier.py:191, in MultivariateAnomalyDetector.__init__(self, data, params, training_days, model_type)
    189     self.granularity_days: float = freq / (24 * 3600 * (10 ** 9))
    190 else:
--> 191     raise RuntimeError(
    192         "Frequency of metrics is not constant."
    193         "Please check for missing or duplicate values"
    194     )
    196 self.training_days = training_days
    197 self.detector_model = model_type

RuntimeError: Frequency of metrics is not constant.Please check for missing or duplicate values
```
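That `RuntimeError` fires whenever consecutive timestamps are not evenly spaced. The constant-frequency assumption can be checked directly before constructing the detector; a small sketch in plain pandas (not part of Kats) that mirrors the check and shows where gaps are:

```python
import pandas as pd

# Hypothetical 'time' column with one missing day (2019-01-03).
time = pd.Series(pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-04"]))
time_diff = time.diff().dropna()

if len(time_diff.unique()) != 1:
    # Non-constant frequency: this is the case where Kats raises.
    # value_counts() reveals which spacings occur, pointing to gaps or duplicates.
    print(time_diff.value_counts())
```

Resampling or deduplicating the data until `time_diff.unique()` has length 1 should get past this error.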


michaelbrundage avatar michaelbrundage commented on August 27, 2024

Renaming the issue to the root cause.

Most likely, we should mark Kats as requiring 64-bit. I think we don't intend to support legacy 32-bit Python installations.
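Worth noting when deciding on that requirement: the overflow reported above occurs even on 64-bit Python builds, because NumPy's default integer follows the platform's C `long`, which is 32-bit on Windows with NumPy < 2. A quick sanity-check sketch (not Kats code):

```python
import sys
import numpy as np

# True on any 64-bit Python build, including 64-bit Windows.
print(sys.maxsize > 2**32)

# Width of NumPy's default integer in bits: 64 on Linux/macOS, but 32 on
# Windows with NumPy < 2 -- so the default int width is the culprit,
# not a legacy 32-bit Python installation per se.
print(np.dtype(int).itemsize * 8)
```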

