Comments (3)
This work cannot truly be completed until other work is done, because many of the algorithms run by sample require identify (#44), score (#42), order (#43), cluster (#45), and filter (#47).
from hypercane.
At this point, sample supports the following (not completely tested) algorithms out of the box:
    # hc sample --help
    usage: hc sample [-h] {DSA1,DSA2,DSA3,DSA4,filtered-random,order-by-memento-datetime-then-systematically-sample,simple-search-engine,true-random,systematic,stratified-random,stratified-systematic,random-cluster,random-oversample,random-undersample} ...

    'sample' produces a list of exemplars from a collection by applying an existing algorithm

    positional arguments:
      {DSA1,DSA2,DSA3,DSA4,filtered-random,order-by-memento-datetime-then-systematically-sample,simple-search-engine,true-random,systematic,stratified-random,stratified-systematic,random-cluster,random-oversample,random-undersample}
                            sampling methods
        DSA1                An implementation of the algorithm from AlNoamany's dissertation.
        DSA2                An implementation of the DSA2 algorithm from Jones' dissertation.
        DSA3                An implementation of the DSA3 algorithm from Jones' dissertation.
        DSA4                An implementation of the DSA4 algorithm from Jones' dissertation.
        filtered-random     Filter the collection for off-topic mementos and exclude near duplicates before randomly sampling from remainder.
        order-by-memento-datetime-then-systematically-sample
                            Select exemplars from a web archive collection by first ordering a colleciton, then systematically sampling every jth memento from the remainder.
        simple-search-engine
                            Search for mementos with a specific pattern, score results by BM25, order by descending score.
        true-random         sample probabilistically by randomly sampling k mementos from the input
        systematic          returns every jth memento from the input
        stratified-random   returns j items randomly chosen from each cluster, requries that the input be clustered with the cluster action
        stratified-systematic
                            returns every jth URI-M from each cluster, requries that the input be clustered with the cluster action
        random-cluster      return j randomly selected clusters from the sample, requires that the input be clustered with the cluster action
        random-oversample   randomly duplicates URI-Ms in the smaller clusters until they match the size of the largest cluster, requires input be clustered with the cluster action
        random-undersample  randomly chooses URI-Ms from the larger clusters until they match the size of the smallest cluster, requires input be clustered with the cluster action

    optional arguments:
      -h, --help            show this help message and exit
The arguments for these all appear in Wooey, so it looks like sample works properly in the GUI as well.
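For readers unfamiliar with the terms in the help text, here is a rough Python sketch of what "systematic" and "stratified-random" sampling could look like. This is not Hypercane's code: the function names, the (cluster_id, URI-M) input shape, and the choice to start counting at the jth item are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def systematic_sample(urims, j):
    """Return every jth item from an ordered list (the jth, 2jth, ...).

    Assumption: counting starts at the jth item rather than the first.
    """
    if j < 1:
        raise ValueError("j must be a positive integer")
    return urims[j - 1::j]


def stratified_random_sample(clustered, j, seed=None):
    """Return up to j randomly chosen items from each cluster.

    `clustered` is a list of (cluster_id, urim) pairs, a hypothetical
    stand-in for input that has already been through a clustering step.
    """
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for cluster_id, urim in clustered:
        clusters[cluster_id].append(urim)

    sample = []
    for cluster_id in sorted(clusters):
        members = clusters[cluster_id]
        # A cluster smaller than j contributes all of its members.
        sample.extend(rng.sample(members, min(j, len(members))))
    return sample


# Placeholder identifiers, not real URI-Ms.
ordered = ["urim-%d" % i for i in range(1, 11)]
every_third = systematic_sample(ordered, 3)

clustered = [(0, "urim-1"), (0, "urim-2"), (0, "urim-3"), (1, "urim-4")]
per_cluster = stratified_random_sample(clustered, 2, seed=42)
```

The stratified variant needs cluster labels on its input, which matches the help text's note that those methods require the cluster action to have been run first.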
I developed a method of annotating Bash scripts with JSON so that Hypercane knows which arguments each script supports. This seems to have worked well. I will not implement any more algorithms until we have tested more with NLA.
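The comment does not show the annotation format, so the following is only a sketch of the general idea: JSON embedded in comment lines of a Bash script, read back and turned into an argparse parser. The `#@ ` prefix, the JSON keys, and the function names are invented for this example and should not be taken as Hypercane's actual syntax.

```python
import argparse
import json

# Hypothetical marker for annotation lines; not Hypercane's real syntax.
ANNOTATION_PREFIX = "#@ "

# A made-up Bash script whose header carries a JSON description of its arguments.
EXAMPLE_SCRIPT = """#!/bin/bash
#@ {
#@   "description": "example sampling script",
#@   "arguments": [
#@     {"flag": "--k", "type": "int", "help": "number of mementos to sample"},
#@     {"flag": "--seed", "type": "int", "help": "random seed"}
#@   ]
#@ }
echo "sampling..."
"""

# Map the type names used in the annotation to Python callables.
TYPES = {"int": int, "float": float, "str": str}


def read_annotation(script_text):
    """Collect the annotated comment lines and parse them as one JSON document."""
    json_lines = [
        line[len(ANNOTATION_PREFIX):]
        for line in script_text.splitlines()
        if line.startswith(ANNOTATION_PREFIX)
    ]
    return json.loads("\n".join(json_lines))


def build_parser(annotation):
    """Build an argparse parser from the parsed annotation."""
    parser = argparse.ArgumentParser(description=annotation.get("description", ""))
    for arg in annotation["arguments"]:
        parser.add_argument(
            arg["flag"],
            type=TYPES[arg.get("type", "str")],
            help=arg.get("help", ""),
        )
    return parser


annotation = read_annotation(EXAMPLE_SCRIPT)
parser = build_parser(annotation)
args = parser.parse_args(["--k", "5"])
```

Keeping the description inside the script means the script and its argument list cannot drift apart in separate files, which is one plausible reason for choosing this kind of design.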
This works now that caching is enabled. Closing.