
Comments (6)

tw4l commented on June 17, 2024

@SuaYoo, in the draft PR this currently only sorts by the new order if you explicitly pass the ?sortBy=lastCrawlTime param. I modified the existing sort-by field instead of adding a new one, so it effectively behaves as lastUpdated.

I see your point, though, and will look into changing it to return a lastUpdated field and sort on that instead.

from browsertrix.

tw4l commented on June 17, 2024

Another option:

  • Add a currCrawlStartTime field to the db
  • Sort descending by {currCrawlStartTime: -1, lastCrawlTime: -1}, which should return running crawls first, followed by finished crawls in descending order of lastCrawlTime. A multi-field (compound) index would speed this up as well.

Or a variant: sort on lastCrawlStartTime (standing in for the current crawl), lastCrawlTime, and status.

I'll try a few options and see which is the simplest solution that gives the desired result.
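To make the proposed ordering concrete, here is a minimal in-memory sketch of what a {currCrawlStartTime: -1, lastCrawlTime: -1} sort would return. It assumes simplified workflow dicts (the `id` key and the sample dates are hypothetical) and mimics how MongoDB treats null or missing fields: they sort below any date, so they land last in a descending sort. Against the real collection, the equivalent pymongo call would be `cursor.sort([("currCrawlStartTime", -1), ("lastCrawlTime", -1)])`, ideally backed by a compound index created with the same key list.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional


def field_key(value: Optional[datetime]) -> tuple:
    # Missing/None sorts below every real timestamp, mirroring MongoDB's
    # null/missing ordering; in a descending sort these end up last.
    return (value is not None, value if value is not None else datetime.min)


def sort_workflows(workflows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Emulate a {currCrawlStartTime: -1, lastCrawlTime: -1} sort in memory."""
    return sorted(
        workflows,
        key=lambda wf: (
            field_key(wf.get("currCrawlStartTime")),  # running crawls first
            field_key(wf.get("lastCrawlTime")),       # then newest finished crawls
        ),
        reverse=True,
    )


# Hypothetical sample data.
workflows = [
    {"id": "finished-old", "lastCrawlTime": datetime(2024, 6, 1)},
    {
        "id": "running",
        "currCrawlStartTime": datetime(2024, 6, 17, 9),
        "lastCrawlTime": datetime(2024, 5, 1),
    },
    {"id": "never-crawled"},
    {"id": "finished-new", "lastCrawlTime": datetime(2024, 6, 16)},
]

order = [wf["id"] for wf in sort_workflows(workflows)]
print(order)
```

The running workflow sorts first even though its previous crawl is the oldest, and a workflow that has never been crawled falls to the end.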


SuaYoo commented on June 17, 2024

Sorry for the late response. To clarify, is this work handling default sorting in the backend, or adding a new lastUpdated field? The latter would be preferred, so that the frontend can both sort by the value and display it without having to recalculate it.


SuaYoo commented on June 17, 2024

Can sortBy accept secondary/additional sort fields? Then we could keep lastCrawlTime, and the frontend could request sortBy=lastCrawlTime,finished,started,modified,created.
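As a rough illustration of what accepting multiple fields could involve, here is a sketch that parses a comma-separated sortBy value into a pymongo-style list of (field, direction) pairs. The whitelist is copied from the example request above; the function name, the single shared `direction` argument, and the rejection of unknown or duplicate fields are assumptions made for this sketch, not existing browsertrix behavior.

```python
from typing import List, Tuple

# Hypothetical whitelist, taken from the example request above.
ALLOWED_SORT_FIELDS = frozenset(
    {"lastCrawlTime", "finished", "started", "modified", "created"}
)


def parse_sort_by(sort_by: str, direction: int = -1) -> List[Tuple[str, int]]:
    """Turn "a,b,c" into [("a", d), ("b", d), ("c", d)] for a multi-key sort."""
    if direction not in (1, -1):
        raise ValueError("direction must be 1 or -1")
    fields = [f.strip() for f in sort_by.split(",") if f.strip()]
    if not fields:
        raise ValueError("sortBy must name at least one field")
    seen = set()
    spec: List[Tuple[str, int]] = []
    for field in fields:
        if field not in ALLOWED_SORT_FIELDS:
            raise ValueError(f"unsupported sort field: {field}")
        if field in seen:
            raise ValueError(f"duplicate sort field: {field}")
        seen.add(field)
        spec.append((field, direction))  # earlier fields take precedence
    return spec


spec = parse_sort_by("lastCrawlTime,finished,started,modified,created")
print(spec)
```

The resulting list could be passed to pymongo's `Cursor.sort`. Every distinct field order is a different sort shape, though, and an index only helps a sort whose keys match it, which is the trade-off raised in the reply below.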


tw4l commented on June 17, 2024

> Can sortBy accept secondary/additional sort fields? Then we could keep lastCrawlTime, and the frontend could request sortBy=lastCrawlTime,finished,started,modified,created.

We can do that! The trade-off is that we would either need to create indexes for many possible combinations, or accept lookups that are slower than they would be with a predictable combination and a matching index.


tw4l commented on June 17, 2024

@SuaYoo, if you're cool with it, I think I'd prefer to just set a lastUpdated field in the db and build an index on it to keep querying and sorting fast. Otherwise we might end up back in the slow-lookup territory we recently moved away from.

Sorting options for workflows would then be ("created", "modified", "firstSeed", "lastCrawlTime", "lastUpdated"), with lastUpdated being the start time of the current crawl if a workflow is running, or the finish time of the last crawl if not.
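A minimal sketch of the stored-field approach, assuming the workflow document carries currCrawlStartTime (set only while a crawl runs) and lastCrawlTime (the last crawl's finish time). The helper names and the `$set` payloads are hypothetical; they only show that lastUpdated can be written whenever a crawl starts or finishes, so reads sort on a single indexed field instead of computing the value per query.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Sort options proposed in the comment above.
WORKFLOW_SORT_FIELDS = ("created", "modified", "firstSeed", "lastCrawlTime", "lastUpdated")


def compute_last_updated(
    curr_crawl_start_time: Optional[datetime],
    last_crawl_time: Optional[datetime],
) -> Optional[datetime]:
    """Running crawl's start time if there is one, else the last crawl's finish time."""
    if curr_crawl_start_time is not None:
        return curr_crawl_start_time
    return last_crawl_time  # None if the workflow has never been crawled


def crawl_started_update(start_time: datetime) -> dict:
    """Hypothetical $set payload to write when a crawl starts."""
    return {"$set": {"currCrawlStartTime": start_time, "lastUpdated": start_time}}


def crawl_finished_update(finish_time: datetime) -> dict:
    """Hypothetical $set payload to write when a crawl finishes."""
    return {
        "$set": {
            "currCrawlStartTime": None,  # no crawl running any more
            "lastCrawlTime": finish_time,
            "lastUpdated": finish_time,
        }
    }


def sort_spec(sort_by: str, direction: int = -1) -> List[Tuple[str, int]]:
    """Validate a sortBy value and return a pymongo-style sort list."""
    if sort_by not in WORKFLOW_SORT_FIELDS:
        raise ValueError(f"unsupported sortBy: {sort_by}")
    if direction not in (1, -1):
        raise ValueError("direction must be 1 or -1")
    return [(sort_by, direction)]


started = datetime(2024, 6, 17, 9)
finished = datetime(2024, 6, 16)
print(compute_last_updated(started, finished))
print(compute_last_updated(None, finished))
print(sort_spec("lastUpdated"))
```

The update dicts are shaped for pymongo's `update_one(filter, update)`, and `compute_last_updated` could also serve a one-off backfill of existing workflows before the index is built.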

