Giter VIP home page Giter VIP logo

Comments (11)

grasmanek94 avatar grasmanek94 commented on May 27, 2024 2

@ErickTamayo I've discussed the issue with elasticsearch devs (both the php and application) in elastic/elasticsearch#25156 .

This issue still applies now as I think it's an issue with this package.

In short: As of now, whenever we save a model into the database or update it, it's always successful even when the indexing fails (I saw that models are updated using bulk update if this matters). The hooks that this package sets up, get run but the results get capsulated away from our application, which makes error-detection hard, and no exception is being thrown when indexing fails, which makes it impossible to catch such errors. Can you suggest steps to get access to the json response body when saving a model, as mentioned by @polyfractal in elastic/elasticsearch#25156 ?

Else I suggest that this package checks the response body when calling searchable() and if any errors are detected in the response body, that it throws the response body as message.

We have updated everything to the newest versions that we can support:

"require": {
    "laravel/framework": "5.4.*",
    (...)
    "tamayo/laravel-scout-elastic": "^3.0",
    (...)
},

(as for the scout version, it is determined by the dependency list, as of now only this package uses scout in our application)

If it should throw on indexing when the bulk update response body contains errors, can you tell us where we could look / determine why we don't get exceptions when an invalid index operation is issued?

Edit:
Me and the other dev have tracked the code down to here

elasticsearch-php does not throw on bulk update, and this package does nothing with the result from the bulk operation.

I would suggest to rewrite the code to something that checks the result and lets the user know in any way, in my opinion with an exception, like:

public function update($models)
{
    $params['body'] = [];
    $models->each(function($model) use (&$params)
    {
        $params['body'][] = [
            'update' => [
                '_id' => $model->getKey(),
                '_index' => $this->index,
                '_type' => $model->searchableAs(),
            ]
        ];
        $params['body'][] = [
            'doc' => $model->toSearchableArray(),
            'doc_as_upsert' => true
        ];
    });
    $result = $this->elastic->bulk($params);
    if (isset($result['errors']) === true && $result['errors'] === true)
    {
        throw new Exception(json_encode($result));
    }
}

or either expose an hasErrors()/getErrors() or similar function.

from laravel-scout-elastic.

grasmanek94 avatar grasmanek94 commented on May 27, 2024 1

Hey, sorry it took so long, we analysed the issue and it's a bug in Elasticsearch engine / application. (it took us a long time to update everything to newest versions)

See elastic/elasticsearch#25156

It seems that elasticsearch-php does not throw on a HTTP 200 OK status code, but does on everything else. What actually happens here is that elasticsearch returns a 200 OK code when it fails (ironically, in the JSON the status is 400).

We decided to take out the big guns and sniff the network and compare the traffic.

from laravel-scout-elastic.

ErickTamayo avatar ErickTamayo commented on May 27, 2024

Thats weird, I'm getting exceptions from ES just fine, I do not enclose anything in a try catch can you tell me which version of ES are you using?

from laravel-scout-elastic.

grasmanek94 avatar grasmanek94 commented on May 27, 2024

"tamayo/laravel-scout-elastic": "^1.0", <- That's our composer.json require entry for your plugin. Is this fixed in 2.x and 3.x?

from laravel-scout-elastic.

ErickTamayo avatar ErickTamayo commented on May 27, 2024

Hmm not sure, as I don't recall doing changes on that at all.
What version of Laravel/Scout are you using?

from laravel-scout-elastic.

grasmanek94 avatar grasmanek94 commented on May 27, 2024

5.3 at the moment but it does come from a laravel 4 project (that has been manually upgraded / ported). That is was L4 shouldn't matter. I'll try to investigate further asap when I'm at work.

from laravel-scout-elastic.

ErickTamayo avatar ErickTamayo commented on May 27, 2024

Alright, If you can please update to the latest version as Scout have been updated too

from laravel-scout-elastic.

polyfractal avatar polyfractal commented on May 27, 2024

Just chiming in to say that I agree with @grasmanek94's suggested code change (or something similar). Bulk requests will only throw exceptions when something about the coordinating node goes wrong, like the node being unavailable or a gross parsing exception (like if the bulk is improperly formatted).

The rest of the errors a user may care about will be embedded in the bulk response. Ignoring bulk responses can be especially dangerous because some documents may return with EsRejectedExecution exceptions, which is basically saying "the node is too busy and the bulk queue is full, you need to retry this document later". If you don't catch those, you can easily drop data on the floor and not realize it unless you do a comparison between source and Elasticsearch later.

from laravel-scout-elastic.

LMCom avatar LMCom commented on May 27, 2024

@grasmanek94 Elasticsearch not throwing any exceptions has caused me a lot of trouble. When some documents (of about 80000) were not indexed, it was pretty tiresome to find the reason. I see, you tracked the code down to there.
I went a little deeper and found this in the repo elastic/elasticsearch-php.

$future->promise()->then(
            //onSuccess
            function ($response) {
                $this->retryAttempts = 0;
                // Note, this could be a 4xx or 5xx error
            },
            //onFailure
            function ($response) {
                // Ignore 400 level errors, as that means the server responded just fine
                if (!(isset($response['code']) && $response['code'] >=400 && $response['code'] < 500)) {
                    // Otherwise schedule a check
                    $this->connectionPool->scheduleCheck();
                }
            }
        );
return $future;

So the devs are fully aware, that there could be errors, and ignore them. In the failure callback, all status codes in the range 400-499 are ignored. If the status code is outside this range, the implementation of scheduleCheck() of \Elasticsearch\ConnectionPool\AbstractConnectionPool is called. I looked through the implementations, but they do pretty much nothing, and definitely no error handling or exception throwing.
To put it in a nutshell: All requests fail silently. 🤐

But wait, it gets worse:
I added some logging into the aforementioned code to log all responses. It turned out, even when the requests were failing (I reproduced that via curl on the command line), they were logged with properties like successful: 1, failed: 0, status: 201 and so on. There wasn't a single status 400 in the log after trying to insert some 80000 documents one-by-one, of which some failed. Yet, when I tried the same via curl on the commandline, with the documents I identified as erroneous, I got a status 400, because of a mapping error.
It all totally made sense, and I was able to pinpoint the problems and fix all documents/mappings, although it was a chore. If only there was some form of error reporting!

from laravel-scout-elastic.

polyfractal avatar polyfractal commented on May 27, 2024

Hi @grasmanek94, I'm afraid you're interpreting that bit of code incorrectly. The onFailure section in that snippet only deals with if the connection pool should schedule a liveness check. E.g. if there was a 5xx error (something on the server went wrong), it's likely that a node died or the master is down or something and we should allow connection pools a chance to rebuild their list of nodes. The sniffing pool will go fetch a new list of nodes, the static connection pool is just a no-op, etc.

The way it works is like this:

  1. We default to using RingPHP's CurlMultiHandler to handle the request
  2. When the Transport object is built, we wrap the handler (CurlMultiHandler or something else, if the user overrides it).
  3. This wrapping does things like logging, handling network-level curl exceptions, non-network fatal errors, as well as HTTP status code errors. You'll notice that if we encounter a 4xx or 5xx HTTP status code error, those are turned into actual exceptions and thrown.
  4. Since all that happens in a wrapper around the handler, by time we get to the snippet you linked we've already executed the request, dealt with the appropriate logging/exception throwing, etc and we are merely chaining an additional execution to the promise, which does the connection pool rechecks if it's a non-4xx error.

I know the control flow here is a bit convoluted, mostly due to refactoring in async code at some point. I'm sure it could be cleaned up a bit.

What was happening in this ticket, and likely in your code, is what I described in #37 (comment). The status code of an Elasticsearch bulk request only relates to whether the bulk itself was successful, meaning acknowledged and processed by the server. All the bulk "action items" could have failed in the bulk (each with their own individual error code in the response body) and the API call itself will still have a successful HTTP code because the API call worked... just not the contents of the request.

You mentioned for example that you were indexing them one-by-one (e.g. not a bulk) so I suspect the bulk was slightly different from your single-indexing test, or your tests were not the same as the curl.

Having received many 4xx and 5xx errors myself while using the client, I can definitely say they get thrown :)

from laravel-scout-elastic.

LMCom avatar LMCom commented on May 27, 2024

Hi @polyfractal, you mentionend the wrong person, I guess. ;)
Thanks for taking the time to explain in such detail. In the meantime, I noticed the thing with the bulk requests. That's absolutely understandable.
But when indexing one-by-one via laravel/scout, e .g. $singleModel->searchable(), it still makes a bulk request with a single item, because a single model is turned into a collection first. See https://github.com/laravel/scout/blob/6.0/src/Searchable.php#L135
And that's, why we never get notified about failed inserts/updates.

I second @grasmanek94 's proposal to throw an exception in the update-method or expose a method to get errors from the last bulk operation.

from laravel-scout-elastic.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.