I am using the following Python and R bfast versions at the date of the creation of th

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

I think that I have misunderstood you answer earlier <a class="user-mention notranslat

Thanks <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-u

R and Python version give different break dates about bfast HOT 10 CLOSED

diku-dk commented on July 21, 2024

R and Python version give different break dates

from bfast.

Comments (10)

mortvest commented on July 21, 2024

Thank you for reporting this. I'll look into it

from bfast.

mirt001 commented on July 21, 2024

made some minor edits to the initial comment that do not affect the gist of the issue.
In the R version I was wrongly assuming that calendars are 0 based, so I was miscalculating the day of year by 1.

from bfast.

mirt001 commented on July 21, 2024

after some more tests, in which it happened that there was a break in the last day of the monitoring period, I noticed that the breaks property is a 1 based index. Is this correct?
By this I mean that the first date in the monitoring period must have an index of 1, in order for the breaks property to be used correctly as an index. The implication being that my above script is wrong by 1 date. Unfortunately it moves the detected break even further, not closer to the R version.
Am I doing something wrong with the indexing?
I am editing the comment to correct this bug in my script, but please check the edits to better understand what I mean.

from bfast.

Carolina710 commented on July 21, 2024

Good evening,

I subscribed to this issue since I'm dealing with the same problem for some of my currents tests, both in Python and OpenCL.

I just was reading the paper (https://arxiv.org/pdf/1807.01751.pdf), and take some time seeing the R files implementation (https://www.rdocumentation.org/packages/strucchange/versions/1.5-2/source - monitoring.R), focusing on breaks calculation.

From the paper, I could understand that (among other things) the calculation of the breaks is dependent on the calculation of bounds values. In the paper the bounds are computed with the following expression:

with:

being t the monitoring period.

After inspecting the R files I found the "OLS-MOSUM" method the most identical to what is done in Python. In the method, the bounds are calculated as:

The formula is identical to what is done in Python since 'lam' (critval) is previously multiplied by the sqrt of 2. However, I think that in Python the division value inside the log is calculated differently from the R code. 't', as stated in the paper should be the monitoring period and 'n' the last value of the history, but in Python, you do this:

I think you divide by the last mapped_indice of all time serie. What do you think about:

Can you give me some feedback about this practice?

Thank you Dmitry!

from bfast.

mortvest commented on July 21, 2024

Hi @Carolina710. Python is 0-indexed, in comparison to R, which is 1-indexed. Furthermore, in Python, you can index with -1 to get the last value in the array, hence array[-1] and array[n-1] are equivalent. Se here for more information

I am currently looking into the issue

from bfast.

Carolina710 commented on July 21, 2024

Hi, thanks for your reply.

I understand that logic, I just thought that "n" or "self.n" (depending on your version) was the number of data images from the period of history, right? Like: "self.n = self._compute_end_history (dates)". I also thought that the mapped indices were calculated for all time series, including the history and the monitoring periods, with a total of "N" data images.

So, in this scenario, you take the last mapped index of all time series, that is also the last data from the monitoring period [-1] or [N-1], not the last one in history [n-1], I think ...? And my question was, shouldn't it be the last mapped index in history? Like, "histsize" in R.

Thank you very much!

from bfast.

mortvest commented on July 21, 2024

Hi @mirt001 and @Carolina710
This particular bug should be fixed now. However, I need to look more thoroughly into the boundary computation soon (including the issue @Carolina710 mentioned).

Thanks everyone!

from bfast.

mortvest commented on July 21, 2024

I think that I have misunderstood you answer earlier @Carolina710. You are right, it should be divided by the last index in the history period. I have pushed it to the develop branch. Nice catch!

from bfast.

mirt001 commented on July 21, 2024

Thanks @mortvest and also thanks @Carolina710
I just checked the results using the latest develop branch, on the above Python test, and while the results are closer, they are still different in the sense that the break is detected at a different date.
The new detected date for python is 2018 146 , or 2018-05-26, which is one date too early, when compared to the R version, which detects the break on 2018 162, or 2018-05-26. Are you getting different results?

I will test this more to see if Python is always 1 date away, or if there are more inconsistencies.

from bfast.

mortvest commented on July 21, 2024

The returned breakpoint is 0-indexed, so you don't need to subtract 1 from it. I get 2018 163 this way. The small error is due to the way the dates are converted, which is outside of the scope of this implementation. The output breakpoint index is the same as in R version.

from bfast.

R and Python version give different break dates about bfast HOT 10 CLOSED

Comments (10)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent