hash-dataset

Implements a hashing technique to compare large-scale, out-of-core machine learning datasets.

Instructions

To compare two datasets:

python3 src/hash-dataset.py data/small-dataset-1 data/small-dataset-2

Available options:

python3 src/hash-dataset.py [OPTIONS] [FILES]

-m : Display samples that are matching
-n : Display samples that are not matching
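
The README does not spell out the hashing scheme, so the following is only a minimal sketch of one way such a comparison could work, assuming each sample is a file inside a dataset directory. The function names (`hash_file`, `hash_dataset`, `compare_datasets`) and the choice of SHA-256 are illustrative assumptions, not the script's actual implementation. Files are read in fixed-size chunks so that no sample has to fit in memory at once.

```python
import hashlib
from pathlib import Path


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Stream the file so large samples never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_dataset(root):
    """Map each sample's content digest to its path relative to root.

    Assumption: every regular file under root is one sample. If two files
    have identical content, only the last path is kept for that digest.
    """
    root = Path(root)
    return {
        hash_file(p): str(p.relative_to(root))
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_datasets(root_a, root_b):
    """Return (matching, not_matching) sets of digests for two directories."""
    hashes_a = hash_dataset(root_a)
    hashes_b = hash_dataset(root_b)
    matching = hashes_a.keys() & hashes_b.keys()      # present in both
    not_matching = hashes_a.keys() ^ hashes_b.keys()  # present in only one
    return matching, not_matching
```

In this sketch, the `matching` set corresponds to what `-m` would display and `not_matching` to what `-n` would display. Because only digests are compared, file names play no role in deciding whether two samples match.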

Contributors

rahulbshrestha

hash-dataset's Issues

test

# Example benchmark config script.
# Arguments set here are passed to each task.
# Specifications inside the tasks take precedence.

tpath: examples/tasks/task_vlcs.py # dataset

# List of all domains that are used as test domain
# in a leave-one-out setup, i.e. for each run,
# one domain from this list is chosen as test domain
# while training is performed on all other domains
# of the specified dataset.
# TODO: domains is a misleading name here.

domains:
  - caltech
  - sun
  - labelme

output_dir: zoutput/benchmarks/demo_benchmark

# Number of hyperparameter samples per task.
# Thus, the total runs of each task are given
# by len(domains) * num_param_samples * num_seeds (see below).

num_param_samples: 2

epos: 2
batchsize: 2

# The seed is increased by 1 until it reaches endseed.
# endseed is included, so in total endseed - startseed + 1
# different seeds are used to estimate the stochastic
# variance.

startseed: 1
endseed: 2 # currently included

# Each node containing the aname property is considered as a task.

Task1:  # name
  # Parameters that are fixed for all runs of this task.
  aname: diva
  nname: alexnet
  nname_dom: alexnet

  # Specification of parameters that shall vary
  # between runs to analyze the sensitivity
  # of this task w.r.t. these parameters.
  hyperparameters:
    # Each parameter must contain:
    # - distribution (uniform | loguniform | normal | lognormal)
    # - min and max if distribution is uniform or loguniform
    # - mean and std if distribution is normal or lognormal

    # step is optional and defines discrete parameters
    # with the given step size.
    # If min/mean and step are integer valued,
    # the hyperparameter is ensured to be integer valued too.
    # Otherwise, it is a float and rounding errors can occur.
    gamma_y:
      min: 10e4
      max: 10e6
      step: 1
      distribution: uniform

    gamma_d:
      min: 1e4
      max: 1e6
      step: 1
      distribution: uniform

Task2:
...
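
The config comments describe how the number of runs per task and the discrete hyperparameter values are derived. A minimal sketch of that arithmetic follows. The helper names (`seeds`, `total_runs_per_task`, `sample_uniform`) are hypothetical and not part of any benchmark tool's API, only the `uniform` distribution is covered, and snapping a draw to the grid `min + k * step` is an assumed reading of the `step` option.

```python
import math
import random


def seeds(startseed, endseed):
    """Seeds from startseed to endseed, with endseed included."""
    return list(range(startseed, endseed + 1))


def total_runs_per_task(domains, num_param_samples, startseed, endseed):
    """len(domains) * num_param_samples * num_seeds, as stated in the config."""
    return len(domains) * num_param_samples * len(seeds(startseed, endseed))


def sample_uniform(rng, low, high, step=None):
    """Draw from uniform(low, high); with step, snap to the grid low + k * step.

    Assumption: a discrete parameter takes values low, low + step, ...
    that do not exceed high.
    """
    value = rng.uniform(low, high)
    if step is None:
        return value
    # Nearest grid index, capped so the result never exceeds high.
    k = min(round((value - low) / step), math.floor((high - low) / step))
    snapped = low + k * step
    # Keep the value an integer when low and step are integer valued.
    if float(low).is_integer() and float(step).is_integer():
        return int(snapped)
    return snapped


if __name__ == "__main__":
    print(total_runs_per_task(["caltech", "sun", "labelme"], 2, 1, 2))
    print(sample_uniform(random.Random(0), 1e4, 1e6, step=1))
```

With the values from the config above (three domains, `num_param_samples: 2`, seeds 1 to 2), `total_runs_per_task` gives 3 * 2 * 2 = 12 runs per task.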
