Because you can run different classifiers on the same unsorted_{searchtag} images to compare how they perform, you can end up with duplicate images in your sorted_{timestamp} folders whenever a classifier is run multiple times against the SAME searchtag images at different times.
For example: download 100 images tagged robotart and classify them. Then download 100 more and classify again. The classifier looks at the same root unsorted_robotart dir and classifies all 200 images (into a different sorted_{timestamp} dir, since the timestamp will have changed). BUT if you then run a retrain with 'harvest' enabled, it will take ALL the high-confidence images from the as-yet-unharvested sorted_* dirs, and you get dupes in your training_photos dir.
Potential solution: when running a classifier, look through any unharvested basetag/sorted_{timestamp} dirs for the same image name BEGINNING. Since image names get a score appended, the exact name is unlikely to exist (e.g. robotart_2_1234.jpg becomes robotart_2_1234_875.jpg under one classifier, for an 87.5% score, and robotart_2_1234_825.jpg under another). So checking the start of the filename would certainly catch the dupes; it just hasn't been done yet.
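A minimal sketch of that prefix check, assuming the layout described above (basetag/sorted_* dirs holding .jpg files whose names end in an appended _NNN score; all function names here are hypothetical, not part of the project):

```python
# Sketch: detect whether an unsorted image was already classified into
# an unharvested sorted_* dir, by stripping the appended score suffix
# (robotart_2_1234_875.jpg -> robotart_2_1234) and comparing base names.
from pathlib import Path


def base_name(filename: str) -> str:
    """Strip the trailing _score token: robotart_2_1234_875.jpg -> robotart_2_1234."""
    return Path(filename).stem.rsplit("_", 1)[0]


def find_already_sorted(basetag_dir: str) -> set:
    """Collect base names already present in any sorted_* dir under basetag_dir."""
    seen = set()
    for sorted_dir in Path(basetag_dir).glob("sorted_*"):
        for img in sorted_dir.glob("*.jpg"):
            seen.add(base_name(img.name))
    return seen


def is_duplicate(unsorted_image: str, seen: set) -> bool:
    """True if this unsorted image's name (minus extension) matches a sorted base."""
    return Path(unsorted_image).stem in seen
```

A classifier run could build `seen` once up front and skip any unsorted image for which `is_duplicate` returns True.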
Current (temporary) workaround: BEFORE a harvest run, manually delete the dupes by inspecting filenames for the MAX value of a previous classifier run and deleting the overlap.
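That manual pass could be roughly automated like this (a sketch, not project code: it assumes sorted_{timestamp} dir names sort chronologically and that filenames carry the appended _NNN score; the function name and dry_run flag are made up):

```python
# Sketch: keep the oldest copy of each base image across sorted_* dirs
# and delete later duplicates, so a subsequent harvest sees each image once.
from pathlib import Path


def dedupe_sorted_dirs(basetag_dir: str, dry_run: bool = True) -> list:
    """List (or delete, if dry_run=False) files in newer sorted_* dirs
    whose base name already appeared in an older sorted_* dir."""
    seen = set()
    removed = []
    # sorted() walks older timestamp dirs first, so the first copy is kept
    for sorted_dir in sorted(Path(basetag_dir).glob("sorted_*")):
        for img in sorted(sorted_dir.glob("*.jpg")):
            base = img.stem.rsplit("_", 1)[0]  # strip the score suffix
            if base in seen:
                removed.append(str(img))
                if not dry_run:
                    img.unlink()
            else:
                seen.add(base)
    return removed
```

Running it with dry_run=True first shows what would be deleted without touching anything.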