Comments (3)

shawnmjones commented on June 27, 2024

This work cannot truly be completed until other work is done, because many of the algorithms run by sample depend on the identify (#44), score (#42), order (#43), cluster (#45), and filter (#47) actions.

from hypercane.

shawnmjones commented on June 27, 2024

At this point, sample supports the following (not completely tested) algorithms out of the box:

# hc sample --help
usage: hc sample [-h] {DSA1,DSA2,DSA3,DSA4,filtered-random,order-by-memento-datetime-then-systematically-sample,simple-search-engine,true-random,systematic,stratified-random,stratified-systematic,random-cluster,random-oversample,random-undersample} ...

'sample' produces a list of exemplars from a collection by applying an existing algorithm

positional arguments:
  {DSA1,DSA2,DSA3,DSA4,filtered-random,order-by-memento-datetime-then-systematically-sample,simple-search-engine,true-random,systematic,stratified-random,stratified-systematic,random-cluster,random-oversample,random-undersample}
                        sampling methods
    DSA1                An implementation of the algorithm from AlNoamany's dissertation.
    DSA2                An implementation of the DSA2 algorithm from Jones' dissertation.
    DSA3                An implementation of the DSA3 algorithm from Jones' dissertation.
    DSA4                An implementation of the DSA4 algorithm from Jones' dissertation.
    filtered-random     Filter the collection for off-topic mementos and exclude near duplicates before randomly sampling from remainder.
    order-by-memento-datetime-then-systematically-sample
                        Select exemplars from a web archive collection by first ordering a collection, then systematically sampling every jth memento from the remainder.
    simple-search-engine
                        Search for mementos with a specific pattern, score results by BM25, order by descending score.
    true-random         sample probabilistically by randomly sampling k mementos from the input
    systematic          returns every jth memento from the input
    stratified-random   returns j items randomly chosen from each cluster, requires that the input be clustered with the cluster action
    stratified-systematic
                        returns every jth URI-M from each cluster, requires that the input be clustered with the cluster action
    random-cluster      return j randomly selected clusters from the sample, requires that the input be clustered with the cluster action
    random-oversample   randomly duplicates URI-Ms in the smaller clusters until they match the size of the largest cluster, requires input be clustered with the cluster action
    random-undersample  randomly chooses URI-Ms from the larger clusters until they match the size of the smallest cluster, requires input be clustered with the cluster action

optional arguments:
  -h, --help            show this help message and exit
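
To make the generic methods in the help text above concrete, here is a minimal sketch of systematic, true-random, stratified-random, random-undersample, and random-oversample sampling over plain Python lists. It is not Hypercane's code: the function names, the dict-of-lists cluster representation, and the choice to start systematic sampling at the jth item are all assumptions made for illustration.

```python
import random


def systematic(items, j):
    """Return every jth item (assumed here to mean the jth, 2jth, ... items)."""
    return items[j - 1::j]


def true_random(items, k, rng):
    """Randomly sample k items without replacement (fewer if the input is small)."""
    return rng.sample(items, min(k, len(items)))


def stratified_random(clusters, j, rng):
    """Pick up to j random items from each cluster.

    `clusters` is an assumed representation: cluster id -> list of URI-Ms.
    """
    chosen = []
    for members in clusters.values():
        chosen.extend(rng.sample(members, min(j, len(members))))
    return chosen


def random_undersample(clusters, rng):
    """Shrink every cluster to the size of the smallest one."""
    smallest = min(len(members) for members in clusters.values())
    return {cid: rng.sample(members, smallest) for cid, members in clusters.items()}


def random_oversample(clusters, rng):
    """Grow every non-empty cluster to the size of the largest one by duplicating members."""
    largest = max(len(members) for members in clusters.values())
    return {
        cid: members + rng.choices(members, k=largest - len(members))
        for cid, members in clusters.items()
    }


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only so repeated runs pick the same items
    urims = [f"urim-{i}" for i in range(1, 11)]  # placeholder identifiers
    clusters = {"a": urims[:6], "b": urims[6:]}
    print(systematic(urims, 3))
    print(stratified_random(clusters, 2, rng))
    print({cid: len(m) for cid, m in random_oversample(clusters, rng).items()})
```

The stratified and over/undersampling functions take clustered input, which mirrors the help text's requirement that the input first be clustered with the cluster action.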

The arguments for these all appear in Wooey, so it looks like sample works properly in the GUI as well.

I developed a method of annotating Bash scripts with some JSON so that Hypercane is aware of the arguments each script supports. This seems to have worked well. I will not implement any more algorithms until after we have tested more with NLA.
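
As a rough illustration of the annotation idea, the sketch below reads a JSON block embedded in a script's comments and builds an argparse parser from it. The actual annotation format Hypercane uses is not shown in this thread, so everything here is hypothetical: the `#@ ` comment prefix, the `description` and `arguments` keys, and the function names are invented for the example.

```python
import argparse
import json

# Hypothetical marker for annotation lines; not Hypercane's real convention.
ANNOTATION_PREFIX = "#@ "

# Hypothetical annotated script, used only to exercise the functions below.
EXAMPLE_SCRIPT = """#!/bin/bash
#@ {
#@   "description": "hypothetical sampling script",
#@   "arguments": [
#@     {"flag": "--k", "help": "number of exemplars", "default": "10"}
#@   ]
#@ }
echo "sampling..."
"""


def read_annotation(script_text):
    """Collect the annotated comment lines and parse them as one JSON document."""
    json_lines = [
        line[len(ANNOTATION_PREFIX):]
        for line in script_text.splitlines()
        if line.startswith(ANNOTATION_PREFIX)
    ]
    if not json_lines:
        return None  # the script carries no annotation
    return json.loads("\n".join(json_lines))


def build_parser(annotation):
    """Turn the (assumed) annotation structure into an argparse parser."""
    parser = argparse.ArgumentParser(description=annotation.get("description"))
    for arg in annotation.get("arguments", []):
        parser.add_argument(arg["flag"], help=arg.get("help"), default=arg.get("default"))
    return parser


if __name__ == "__main__":
    parser = build_parser(read_annotation(EXAMPLE_SCRIPT))
    print(parser.parse_args(["--k", "5"]))
```

Keeping the argument description inside the script itself means a wrapper can expose the script's options on the command line without a separate registry.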

shawnmjones commented on June 27, 2024

This works now that caching is enabled. Closing.
