Because you can run different classifiers on the same unsorted_{searchtag} images to compare how they perform, you can end up with duplicate images in your sorted_{timestamp} folders whenever a classifier is run multiple times against the SAME searchtag images at different times.
For example: download 100 images tagged robotart and classify them. Then download 100 more and classify again. The classifier looks at the same root unsorted_robotart dir and classifies all 200 images (into a different sorted_{timestamp} dir, since the timestamp will have changed). BUT if you then run a retrain with 'harvest' enabled, it will take ALL the high-confidence images from the as-yet-unharvested sorted_* dirs, and you get dupes in your training_photos dir.
Potential solution: when running a classifier, look through any unharvested basetag/sorted_{timestamp} dirs for the same image name BEGINNING. Since image names get a score appended, the exact name is unlikely to exist (e.g. robotart_2_1234.jpg becomes robotart_2_1234_875.jpg under one classifier, for an 87.5% score, and robotart_2_1234_825.jpg under another). So checking the start of the filename would certainly catch the dupes; it just hasn't been done yet.
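A minimal sketch of that prefix check, assuming the layout described above (basetag/sorted_* dirs holding .jpg files whose names end in an appended _NNN score; all function names here are hypothetical, not part of the project):

```python
# Sketch: detect whether an unsorted image was already classified into
# an unharvested sorted_* dir, by stripping the appended score suffix
# (robotart_2_1234_875.jpg -> robotart_2_1234) and comparing base names.
from pathlib import Path


def base_name(filename: str) -> str:
    """Strip the trailing _score token: robotart_2_1234_875.jpg -> robotart_2_1234."""
    return Path(filename).stem.rsplit("_", 1)[0]


def find_already_sorted(basetag_dir: str) -> set:
    """Collect base names already present in any sorted_* dir under basetag_dir."""
    seen = set()
    for sorted_dir in Path(basetag_dir).glob("sorted_*"):
        for img in sorted_dir.glob("*.jpg"):
            seen.add(base_name(img.name))
    return seen


def is_duplicate(unsorted_image: str, seen: set) -> bool:
    """True if this unsorted image's name (minus extension) matches a sorted base."""
    return Path(unsorted_image).stem in seen
```

A classifier run could build `seen` once up front and skip any unsorted image for which `is_duplicate` returns True.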
Current (temporary) workaround: BEFORE a harvest run, manually delete the dupes by inspecting filenames for the MAX value of a previous classifier run and deleting the overlap.
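That manual pass could be roughly automated like this (a sketch, not project code: it assumes sorted_{timestamp} dir names sort chronologically and that filenames carry the appended _NNN score; the function name and dry_run flag are made up):

```python
# Sketch: keep the oldest copy of each base image across sorted_* dirs
# and delete later duplicates, so a subsequent harvest sees each image once.
from pathlib import Path


def dedupe_sorted_dirs(basetag_dir: str, dry_run: bool = True) -> list:
    """List (or delete, if dry_run=False) files in newer sorted_* dirs
    whose base name already appeared in an older sorted_* dir."""
    seen = set()
    removed = []
    # sorted() walks older timestamp dirs first, so the first copy is kept
    for sorted_dir in sorted(Path(basetag_dir).glob("sorted_*")):
        for img in sorted(sorted_dir.glob("*.jpg")):
            base = img.stem.rsplit("_", 1)[0]  # strip the score suffix
            if base in seen:
                removed.append(str(img))
                if not dry_run:
                    img.unlink()
            else:
                seen.add(base)
    return removed
```

Running it with dry_run=True first shows what would be deleted without touching anything.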