Giter VIP home page Giter VIP logo

Comments (24)

mikemccand avatar mikemccand commented on June 12, 2024

Not sure how related these two are but they're at least in the same vicinity.

[Legacy Jira: Erick Erickson (@ErickErickson) on Nov 09 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

Hi @mar-kolya, I like the idea of the patch. Actually not much about fsync is guaranteed on any OS, anywhere.So i agree with having an extra dir sync before the rename, just to be safe, although i don't know what filesystems might be affected by this.

[Legacy Jira: Robert Muir (@rmuir) on Nov 10 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

The guarantees about fsync are weak anyways (as Robert said), and on top - fsyncing directory metadata is a hack, the Java API does not allow to do it via API, you need a hack with a file handle - but it works in our testing ( @mikemccand had/has a test computer with a remote powerswitch to stress all this for weeks). The directory sync is at least documented in Linux Man Pages, for other UNIX-ial operating systems it not defined (lack of POSIX standard for it).

In short:

  • On Linux, the fsync on the directory really works, but we only know about usual file systems (ext4 and xfs I think)
  • In addition because the atomic rename use case is very common in Unix world to commit stuff, the kernel already does the right thing. If it sees an atomic rename, it automatically fsyncs directory under certain conditions (read source code). @rmuir, is this right? - it's long ago when I last looked at that code!
  • On MacOSX/Solaris the same applies like for linux, although it does not have the automatism in kernel. And we don't know if fsyncing directory is really done for all file systems. The Man page does not say anything and POSIX does not define it.
  • On Windows, the fsync on directory does not work at all (it is a no-op in Lucene -> we have a try-catch around it with an assertion on Windows in the exception block). But Windows file systems guarantee that after the atomic rename the directory is in an consistent state (it's documented). Happens-before also works.

[Legacy Jira: Uwe Schindler (@uschindler) on Nov 10 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

Just to clarify:

  • On Linux directory fsync does work, but it doesn't solve the problem currently because this fsync currently happens after segment list has been created. Essentially guarantees about fsync (weak as they are) are in case of failure: before fsync you can see changes in any combination, after fsync you are guaranteed to see exactly what was written.
  • This can potentially affect any FS that uses non trivial storage for directories (which is pretty much everything these days). Word on the internet is that btrfs is capable of doing out of order directory writes.
  • 'kernel automatically detects rename pattern' - I think this only works on some FSs (ext4) and only if certain mount options are present (auto_da_alloc). And I think generally this is about syncing file data with directory, not syncing directory as a whole on rename.

[Legacy Jira: Nikolay Martynov on Nov 10 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

I think any filesystem that behaves in this way on crash is choosing to corrupt users data by reflecting some later directory entries but not former ones.

But linux filesystems are straight up buggy as shit around this functionality so it wouldnt surprise me. So like i said, i like the extra sync before rename to try to coerce better behavior on crash during fsync(dir).

[Legacy Jira: Robert Muir (@rmuir) on Nov 11 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

+1 to fsync directory metadata after writing segments and before writing segments file.

[Legacy Jira: Michael McCandless (@mikemccand) on Nov 14 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

So it seems like the consensus is to go ahead and make this change. one-line patch attached, I'll run precommit and test on it but don't expect any issues.

Any objections to checking this in? If not I'll take care of it over the next day or two.

[Legacy Jira: Erick Erickson (@ErickErickson) on Nov 22 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

This change causes TestIndexWriter.testExceptionsDuringCommit() to fail.

The first two conditions fail:
new FailOnlyInCommit(false, FailOnlyInCommit.PREPARE_STAGE), // fail during global field map is written
new FailOnlyInCommit(true, FailOnlyInCommit.PREPARE_STAGE), // fail after global field map is written
new FailOnlyInCommit(false, FailOnlyInCommit.FINISH_STAGE) // fail while running finishCommit

Are these two conditions something that should fail with this change? I've attached a screen shot of the exception in question. With this change, this exception is not thrown.

[Legacy Jira: Erick Erickson (@ErickErickson) on Nov 22 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

Erick, can you hold off a bit. I don't think its a good idea to rush in changes to commit over a holiday, these changes really need review.

[Legacy Jira: Robert Muir (@rmuir) on Nov 22 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

But this is when I have spare time ;)...

Point taken, it can wait until next week. I'm a bit worried that the test case that now fails indicate something wouldn't be right anyway, it'd be good to give people a chance to look if I can't find anything...

[Legacy Jira: Erick Erickson (@ErickErickson) on Nov 22 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

OK, now that people have had a chance to catch up after Thanksgiving, any comments? Particularly for two bits:

1> whether the idea of the patch seems OK?
2> whether the failing tests are now obsolete and should be removed?

[Legacy Jira: Erick Erickson (@ErickErickson) on Nov 29 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

I couldn't get a repeated test failure in 100 tries, trying,1,000 over the night. If they succeed I'm probably going to check this in tomorrow unless there are objections.

[Legacy Jira: Erick Erickson (@ErickErickson) on Dec 04 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

1,000 runs later, no failures with TestIndexWriter so I'll chalk it up to hallucinations.

Running precommit and full test suite now, if the pass I'll check this in later today unless there are objections.

[Legacy Jira: Erick Erickson (@ErickErickson) on Dec 04 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

Well, that was short lived optimism. I was running the wrong test. So I'm back to looking at whether these two lines in TestIndexWriteExceptions.testExceptionDuringCommit are a test error or not:

    new FailOnlyInCommit(false, FailOnlyInCommit.PREPARE_STAGE), // fail during global field map is written
    new FailOnlyInCommit(true, FailOnlyInCommit.PREPARE_STAGE), // fail after global field map is written

Any help appreciated of course.

[Legacy Jira: Erick Erickson (@ErickErickson) on Dec 04 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

I think you need to modify the test and add this as a valid case, like the patch I attached.

[Legacy Jira: Simon Willnauer (@s1monw) on Dec 04 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

Fella goes out for a run and comes back and some kind person shows him how to solve the problem. Or maybe solves it for me... I'll run it through the rest of its paces, precommit test and the like, as well as, perhaps, finally figure out what that MockDirectoryWrapper is doing.

Thanks a bunch Simon!

[Legacy Jira: Erick Erickson (@ErickErickson) on Dec 05 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

Final patch with CHANGES.txt, Simon's test changes and orlginal.

[Legacy Jira: Erick Erickson (@ErickErickson) on Dec 05 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

Commit 4a5900728db6a84b4aea42545567a393989f25cf in lucene-solr's branch refs/heads/master from @ErickErickson
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4a59007

LUCENE-8048: Filesystems do not guarantee order of directories updates

[Legacy Jira: ASF subversion and git services on Dec 05 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

Commit 0688be6c667760c1a636d5426ee4355d878a47a7 in lucene-solr's branch refs/heads/master from @ErickErickson
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0688be6

LUCENE-8048: Filesystems do not guarantee order of directories updates

[Legacy Jira: ASF subversion and git services on Dec 05 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

Commit 2bfe7ed0f2f9ddfdd2049c60829869bbd9327d80 in lucene-solr's branch refs/heads/branch_7x from @ErickErickson
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2bfe7ed

LUCENE-8048: Filesystems do not guarantee order of directories updates

[Legacy Jira: ASF subversion and git services on Dec 05 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

Commit eb434dc47ac96e27676006216dd308d8f45beb81 in lucene-solr's branch refs/heads/master from @ErickErickson
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=eb434dc

LUCENE-8048: correcting misplaced entry in lucene/CHANGES.txt

[Legacy Jira: ASF subversion and git services on Dec 05 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

Sorry for the noise, apparently I double-commented when I merged my local branch and misplaced the update to lucene/CHANGES.txt...

[Legacy Jira: Erick Erickson (@ErickErickson) on Dec 05 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

latest patch / commit LGTM sorry for the late response.

[Legacy Jira: Simon Willnauer (@s1monw) on Dec 05 2017]

from stargazers-migration-test.

mikemccand avatar mikemccand commented on June 12, 2024

Hey, thank you for fixing up the tests for me, it'd have taken me much longer to figure out. And even then I wouldn't have been very confident.

[Legacy Jira: Erick Erickson (@ErickErickson) on Dec 06 2017]

from stargazers-migration-test.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.