wusaifei / garbage_classify

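This repository now also includes tutorials on classification, detection, and face swapping, assorted hyperparameter-tuning tips and tricks, detailed visualizations of convolutional structures, and annotated attention-mechanism code. The garbage classification challenge aims to build a deep-learning image classification model that accurately identifies the category of garbage images. The competition follows Shenzhen's garbage sorting standard, with four categories: recyclables, kitchen waste, hazardous waste, and other waste. The project contains a complete classification network, data augmentation, SVM, and other strategies for improving classification, and more classification techniques will be added later.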

Python 100.00%

garbage_classify's Issues

TypeError: Cannot interpret feed_dict key as Tensor: Tensor Tensor("input_1:0", shape=(?, 456, 456, 3), dtype=float32) is not an element of this graph.

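Where can I find the weights file (efficientnet-b5_notop.h5)?

About accuracy and the validation and test sets

A question for the author: is the accuracy in your results measured on the validation set or on the test set? Do you use the test set only once and then set it aside?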

Traceback (most recent call last):
File "/tmp/pycharm_project_329/run.py", line 166, in
tf.compat.v1.app.run()
File "/home/jx/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/jx/.local/lib/python3.5/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/jx/.local/lib/python3.5/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/tmp/pycharm_project_329/run.py", line 122, in main
check_args(FLAGS)
File "/tmp/pycharm_project_329/run.py", line 61, in check_args
raise Exception('FLAGS.num_classes error, '
Exception: FLAGS.num_classes error, should be a positive number associated with your classification task

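Could you take a look? I am using the environment on someone else's server: TensorFlow 1.14, NumPy 1.14.5, Python 3.5. I am worried that creating a new environment would mean reinstalling packages or running into missing dependencies. Thanks.

Problem running run.py

When I run it I get:
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper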
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (file signature not found)
How can I solve this?
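A "file signature not found" error from h5py generally means the file does not begin with the HDF5 signature, which often points to an incomplete download or an HTML error page saved under an .h5 name. The thread does not confirm that this is the cause here. A minimal sketch of a check, assuming the signature sits at offset 0 as it does in typical files; `looks_like_hdf5` is a hypothetical helper, not part of the repository:

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # the 8-byte HDF5 format signature


def looks_like_hdf5(path):
    """Return True if the file starts with the HDF5 signature."""
    with open(path, "rb") as fh:
        return fh.read(8) == HDF5_SIGNATURE


if __name__ == "__main__":
    import sys

    # Usage: python check_h5.py path/to/weights.h5
    for path in sys.argv[1:]:
        print(path, "looks like HDF5:", looks_like_hdf5(path))
```

If the check returns False, downloading the weights file again is the usual next step.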

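The specified module could not be found

Hello, I installed the whole environment as you described, but training fails because the specified module cannot be found. What is going on?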
“
Traceback (most recent call last):
File "run.py", line 17, in
import tensorflow as tf
File "C:\ProgramData\Anaconda3\envs\tf_1_13_1\lib\site-packages\tensorflow\__init__.py", line 24, in
from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
File "C:\ProgramData\Anaconda3\envs\tf_1_13_1\lib\site-packages\tensorflow\python\__init__.py", line 49, in
from tensorflow.python import pywrap_tensorflow
File "C:\ProgramData\Anaconda3\envs\tf_1_13_1\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 74, in
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\tf_1_13_1\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in
from tensorflow.python.pywrap_tensorflow_internal import *
File "C:\ProgramData\Anaconda3\envs\tf_1_13_1\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in
_pywrap_tensorflow_internal = swig_import_helper()
File "C:\ProgramData\Anaconda3\envs\tf_1_13_1\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "C:\ProgramData\Anaconda3\envs\tf_1_13_1\lib\imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "C:\ProgramData\Anaconda3\envs\tf_1_13_1\lib\imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: DLL load failed: 找不到指定的模块。
”

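I would also like to ask what "--train_url='./model_snapshots'" means in the training command; I could not find that folder.

What is the SVM for? Why add an SVM?

I am a beginner and do not really understand the reasoning here.
Does it mean that EfficientNet is first used to extract features, and then an SVM replaces the original Linear layer for classification? How much does this improve accuracy?

Hello, I followed your steps and ran python run.py --data_url='../datasets/garbage_classify/train_data' --train_url='./model_snapshots' --deploy_script_path='./deploy_scripts' after cd-ing into the run.py directory

Hello, I followed your steps and ran python run.py --data_url='../datasets/garbage_classify/train_data' --train_url='./model_snapshots' --deploy_script_path='./deploy_scripts' after cd-ing into the run.py directory, and got: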
File "run.py", line 68, in check_args
raise Exception('FLAGS.data_url: %s is not exist' % FLAGS.data_url)
Exception: FLAGS.data_url: './garbage_classify/train_data' is not exist
But the training set is in that directory. How can I fix this?
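The path in the error message appears wrapped in single quotes. That suggests, though the thread does not confirm it, that the quotes reached the program as part of the value, which can happen in shells such as Windows cmd that do not strip single quotes. A minimal sketch of a defensive normalization; `normalize_path_arg` is a hypothetical helper, not something run.py provides:

```python
import os


def normalize_path_arg(value, base_dir=None):
    """Strip stray surrounding quotes and return an absolute path."""
    cleaned = value.strip().strip("'\"")
    base = base_dir if base_dir is not None else os.getcwd()
    # os.path.join keeps `cleaned` unchanged if it is already absolute.
    return os.path.abspath(os.path.join(base, cleaned))


if __name__ == "__main__":
    raw = "'./garbage_classify/train_data'"
    resolved = normalize_path_arg(raw)
    print(resolved, "exists:", os.path.exists(resolved))
```

Printing the resolved absolute path also makes it easy to see which directory a relative path is actually interpreted against.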

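After 10 epochs of training, acc stays at 0.047 and never changes

I am using the Xception model, and acc has not changed after 10 epochs of training. Any advice, @saifeiwu?

Hello, how much memory does training need? I still get OOM with the batch size set to 10

Runtime environment

Hello, could you describe the runtime environment?

A small question about accuracy

Is the reported accuracy the number of correctly classified images divided by the size of the whole dataset, or the average of the per-class accuracies over the 40 classes?

About the error raised in the callbacks after the first epoch

After training for 1 epoch, the following error appeared when the callbacks started running: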
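The two definitions in the question can give quite different numbers when classes are imbalanced. A minimal sketch of both, in plain Python with hypothetical function names and toy labels:

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Correct predictions divided by the total number of samples."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mean_per_class_accuracy(y_true, y_pred):
    """Average of the accuracies computed separately for each true class."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


if __name__ == "__main__":
    y_true = [0, 0, 0, 0, 1]
    y_pred = [0, 0, 0, 0, 0]  # the rare class 1 is always missed
    print(overall_accuracy(y_true, y_pred))         # 0.8
    print(mean_per_class_accuracy(y_true, y_pred))  # 0.5
```

In the toy example the overall accuracy is 0.8 while the per-class average is 0.5, because the single sample of class 1 counts as a whole class in the second measure.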

Traceback (most recent call last):
File "run.py", line 166, in
tf.app.run()
File "e:\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "run.py", line 157, in main
train_model(FLAGS)
File "D:\garbage_classification\garbage_classify-master\train.py", line 134, in train_model
shuffle=False
File "e:\Anaconda3\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "e:\Anaconda3\lib\site-packages\keras\engine\training.py", line 1732, in fit_generator
initial_epoch=initial_epoch)
File "e:\Anaconda3\lib\site-packages\keras\engine\training_generator.py", line 242, in fit_generator
workers=0)
File "e:\Anaconda3\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "e:\Anaconda3\lib\site-packages\keras\engine\training.py", line 1791, in evaluate_generator
verbose=verbose)
File "e:\Anaconda3\lib\site-packages\keras\engine\training_generator.py", line 341, in evaluate_generator
callbacks._call_begin_hook('test')
File "e:\Anaconda3\lib\site-packages\keras\callbacks\callbacks.py", line 105, in _call_begin_hook
self.on_test_begin()
File "e:\Anaconda3\lib\site-packages\keras\callbacks\callbacks.py", line 239, in on_test_begin
callback.on_test_begin(logs)
AttributeError: 'WarmUpCosineDecayScheduler' object has no attribute 'on_test_begin'

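My environment is the same as the author's. I am running the code on a laptop, so I commented out the code that uses the GPU.
The error says the WarmUpCosineDecayScheduler class is missing on_test_begin. I checked the author's custom callback class and it indeed has no such method, but running the class on its own works fine. I do not know how to fix it. Any help would be much appreciated!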
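One way such an error can arise is that the scheduler inherits from a callback base class that predates the hooks the training loop calls. That can happen when the base class comes from a different Keras package or version than the one running fit_generator. Whether this applies here is an assumption. The classes below are hypothetical stand-ins and use no Keras code; they only illustrate the mechanism and a defensive fix:

```python
class LegacyCallback:
    """Hypothetical stand-in for an older base class without test hooks."""

    def on_epoch_end(self, epoch, logs=None):
        pass


class NewerCallback(LegacyCallback):
    """Hypothetical stand-in for a base class that adds no-op test hooks."""

    def on_test_begin(self, logs=None):
        pass

    def on_test_end(self, logs=None):
        pass


class SchedulerOnLegacyBase(LegacyCallback):
    """Scheduler built on the old base: has no on_test_begin."""


class SchedulerOnNewerBase(NewerCallback):
    """Same scheduler built on the newer base: inherits the no-op hooks."""


def run_validation(callbacks):
    """Mimic a loop that calls on_test_begin on every callback."""
    for cb in callbacks:
        cb.on_test_begin()
    return "validated"


if __name__ == "__main__":
    print(run_validation([SchedulerOnNewerBase()]))
    try:
        run_validation([SchedulerOnLegacyBase()])
    except AttributeError as exc:
        print("fails as in the traceback:", exc)
```

In the real project the analogous options would be to import the Callback base from the same Keras package that runs training, or to add no-op on_test_begin and on_test_end methods to the scheduler class.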

TypeError: Cannot interpret feed_dict key as Tensor: Tensor Tensor("input_1:0", shape=(?, 456, 456, 3), dtype=float32) is not an element of this graph.

`(tfenv) D:\DeepLearnning\Code\RubbishSort\garbage_classify-master>python run.py --mode=eval --eval_pb_path=./model_snapshots/model --test_data_url=./datasets/garbage_classify/tr
ain_data --num_classes=40
Using TensorFlow backend.
2020-03-12 10:09:15.415268: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-03-12 10:09:16.536963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce MX250 major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:06:00.0
totalMemory: 2.00GiB freeMemory: 1.62GiB
2020-03-12 10:09:16.571051: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-03-12 10:09:25.865212: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-12 10:09:25.879351: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-03-12 10:09:25.885838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-03-12 10:09:25.956745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1364 MB memory)
-> physical GPU (device: 0, name: GeForce MX250, pci bus id: 0000:06:00.0, compute capability: 6.1)
WARNING:tensorflow:From D:\DeepLearnning\Code\RubbishSort\garbage_classify-master\eval.py:131: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be re
moved in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function
for importing SavedModels in Tensorflow 2.0.
WARNING:tensorflow:From D:\DeepLearnning\anaconda3\envs\tfenv\lib\site-packages\tensorflow\python\training\saver.py:1266: checkpoint_exists (from tensorflow.python.training.chec
kpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2020-03-12 10:09:38.781264: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-03-12 10:09:38.846031: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-12 10:09:38.874307: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-03-12 10:09:38.876455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-03-12 10:09:38.889822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1364 MB memory)
-> physical GPU (device: 0, name: GeForce MX250, pci bus id: 0000:06:00.0, compute capability: 6.1)
Traceback (most recent call last):
File "D:\DeepLearnning\anaconda3\envs\tfenv\lib\site-packages\tensorflow\python\client\session.py", line 1092, in _run
subfeed, allow_tensor=True, allow_operation=False)
File "D:\DeepLearnning\anaconda3\envs\tfenv\lib\site-packages\tensorflow\python\framework\ops.py", line 3478, in as_graph_element
return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
File "D:\DeepLearnning\anaconda3\envs\tfenv\lib\site-packages\tensorflow\python\framework\ops.py", line 3557, in _as_graph_element_locked
raise ValueError("Tensor %s is not an element of this graph." % obj)
ValueError: Tensor Tensor("input_1:0", shape=(?, 456, 456, 3), dtype=float32) is not an element of this graph.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "run.py", line 169, in
tf.app.run()
File "D:\DeepLearnning\anaconda3\envs\tfenv\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "run.py", line 165, in main
eval_model(FLAGS)
File "D:\DeepLearnning\Code\RubbishSort\garbage_classify-master\eval.py", line 215, in eval_model
test_single_model(FLAGS)
File "D:\DeepLearnning\Code\RubbishSort\garbage_classify-master\eval.py", line 159, in test_single_model
pred_score = sess1.run([output_score], feed_dict={input_images: img})
File "D:\DeepLearnning\anaconda3\envs\tfenv\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
run_metadata_ptr)
File "D:\DeepLearnning\anaconda3\envs\tfenv\lib\site-packages\tensorflow\python\client\session.py", line 1095, in _run
'Cannot interpret feed_dict key as Tensor: ' + e.args[0])
TypeError: Cannot interpret feed_dict key as Tensor: Tensor Tensor("input_1:0", shape=(?, 456, 456, 3), dtype=float32) is not an element of this graph.
`
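I ran into this problem when evaluating with --mode=eval. Any advice from the author?

Is there a trained model for reference? Training is too slow on my side

Question about GPU configuration and data augmentation

Hello,
On an Alibaba Cloud server with an NVIDIA P100 (16 GB of GPU memory) running Ubuntu, a batch size of 8 does not seem to fit; I can only go up to 6. What system and configuration did you use for training?

Also, is data augmentation used only during training, or can it also be applied at final model inference, the way data normalization is?

Does the author have a trained model? I would like to build a web app

Question: run.py raises a ValueError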
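In the common setup, random augmentation is applied only while training, whereas normalization is applied in both training and inference so the model always sees inputs on the same scale. Test-time augmentation exists as a separate, optional technique. A minimal sketch of that split, using a single row of pixel values as a toy "image"; the function names and the mean/std values are illustrative assumptions, not taken from the repository:

```python
import random


def normalize(pixels, mean, std):
    """Scale pixel values; applied in both training and inference."""
    return [(p - mean) / std for p in pixels]


def horizontal_flip(row):
    """Reverse a row of pixels (a one-dimensional stand-in for a flip)."""
    return row[::-1]


def preprocess(row, training, rng, mean=127.5, std=127.5):
    """Augment only in training mode; always normalize."""
    if training and rng.random() < 0.5:
        row = horizontal_flip(row)
    return normalize(row, mean, std)


if __name__ == "__main__":
    rng = random.Random(0)
    print(preprocess([0, 255], training=False, rng=rng))  # deterministic
    print(preprocess([0, 255], training=True, rng=rng))   # may be flipped
```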

ValueError: Error when checking input: expected conv2d_1_input to have 4 dimensions, but got array with shape (16, 1)

Hello, when running save_pb I get an error whether or not the h5 file exists. Why is that?

I read the freeze_weights_file_path code in run.py but do not understand what it does:
if not os.path.exists(FLAGS.freeze_weights_file_path):
    raise Exception('FLAGS.freeze_weights_file_path: %s is not exist' %
                    FLAGS.freeze_weights_file_path)
if os.path.isdir(FLAGS.freeze_weights_file_path):
    raise Exception('FLAGS.freeze_weights_file_path must be a file path, not a directory, %s ' %
                    FLAGS.freeze_weights_file_path)
if os.path.exists(FLAGS.freeze_weights_file_path.rsplit('/', 1)[0] + '/model'):
    raise Exception('a model directory is already exist in ' +
                    FLAGS.freeze_weights_file_path.rsplit('/', 1)[0]
                    + ', please rename or remove the model directory ')
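Read literally, the quoted snippet makes three checks: the weights file must exist, it must be a file rather than a directory, and there must not already be a "model" directory next to it. The third check would explain an error even when the .h5 file is present. The sketch below restates those checks as a standalone function. `check_freeze_weights_path` is a hypothetical name, and it uses os.path.dirname instead of the original rsplit('/', 1), so it behaves slightly differently on paths without a "/" separator:

```python
import os


def check_freeze_weights_path(weights_path):
    """Restate the three checks; return the export directory on success."""
    if not os.path.exists(weights_path):
        raise FileNotFoundError("%s does not exist" % weights_path)
    if os.path.isdir(weights_path):
        raise IsADirectoryError("%s must be a file, not a directory" % weights_path)
    model_dir = os.path.join(os.path.dirname(weights_path), "model")
    if os.path.exists(model_dir):
        # An earlier export left a "model" directory beside the weights file.
        raise FileExistsError("%s already exists; rename or remove it" % model_dir)
    return model_dir
```

Under this reading, a second export into the same folder fails until the earlier "model" directory is renamed or removed.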

After training, running eval.py for testing fails with TypeError: Cannot interpret feed_dict key as Tensor: Tensor Tensor("input_1:0", shape=(?, 456, 456, 3), dtype=float32) is not an element of this graph.

Using TensorFlow backend.
2020-03-23 14:30:41.084890: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-03-23 14:30:44.398776: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5638b17ca080 executing computations on platform CUDA. Devices:
2020-03-23 14:30:44.398829: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-03-23 14:30:44.398840: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-03-23 14:30:44.405480: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2499995000 Hz
2020-03-23 14:30:44.408096: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5638b193fdc0 executing computations on platform Host. Devices:
2020-03-23 14:30:44.408128: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
2020-03-23 14:30:44.408312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:01:00.0
totalMemory: 10.76GiB freeMemory: 10.60GiB
2020-03-23 14:30:44.408403: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:02:00.0
totalMemory: 10.76GiB freeMemory: 10.60GiB
2020-03-23 14:30:44.408541: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1
2020-03-23 14:30:44.411191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-23 14:30:44.411219: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 1
2020-03-23 14:30:44.411233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N N
2020-03-23 14:30:44.411244: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: N N
2020-03-23 14:30:44.411375: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10312 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-03-23 14:30:44.411927: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10312 MB memory) -> physical GPU (device: 1, name: GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5)
WARNING:tensorflow:From /opt/shakey/imageclass/garbage_classify-master/eval.py:134: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
WARNING:tensorflow:From /opt/shakey/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2020-03-23 14:31:22.029904: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1
2020-03-23 14:31:22.030413: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-23 14:31:22.030436: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 1
2020-03-23 14:31:22.030449: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N N
2020-03-23 14:31:22.030459: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: N N
2020-03-23 14:31:22.030628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10312 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-03-23 14:31:22.030908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10312 MB memory) -> physical GPU (device: 1, name: GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5)
Traceback (most recent call last):
File "/opt/shakey/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1092, in _run
subfeed, allow_tensor=True, allow_operation=False)
File "/opt/shakey/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3478, in as_graph_element
return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
File "/opt/shakey/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3557, in _as_graph_element_locked
raise ValueError("Tensor %s is not an element of this graph." % obj)
ValueError: Tensor Tensor("input_1:0", shape=(?, 456, 456, 3), dtype=float32) is not an element of this graph.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "run.py", line 168, in
tf.app.run()
File "/opt/shakey/.local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "run.py", line 165, in main
eval_model(FLAGS)
File "/opt/shakey/imageclass/garbage_classify-master/eval.py", line 218, in eval_model
test_single_model(FLAGS)
File "/opt/shakey/imageclass/garbage_classify-master/eval.py", line 162, in test_single_model
pred_score = sess1.run([output_score], feed_dict={input_images: img})
File "/opt/shakey/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/opt/shakey/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1095, in _run
'Cannot interpret feed_dict key as Tensor: ' + e.args[0])
TypeError: Cannot interpret feed_dict key as Tensor: Tensor Tensor("input_1:0", shape=(?, 456, 456, 3), dtype=float32) is not an element of this graph.

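About mean_std.py

The result of running mean_std.py is not saved anywhere, so how does it affect the final recognition accuracy? I do not understand what mean_std.py is for. I would appreciate an explanation. Thank you!

src_v2 official baseline reaches 98.5% validation accuracy in 5 epochs?

Hello! Huawei Cloud did not release the test set after the competition ended. I trained the official baseline code (from src_v2.zip) on the garbage_classify_v2.zip dataset. The baseline was trained for only 5 epochs with a train:validation split of 0.75:0.25, and train_acc reached 99% while val_acc reached 98.5%. I do not know why this is and would like your advice!

Accuracy falls short of expectations

Training on the dataset from the link, I get an accuracy of 0.825 after 30 epochs, which is not as high as reported. The batch size is set to 4 because of memory limits, and nothing else was changed. Could you help find the cause?

Can it run on a single GPU?

Running garbage classification locally

Hello, I am running your run.py locally with the efficientnet-b5_notop.h5 weights file, input_size=228 (not 456, because of GPU limits), batch_size=8, epoch=30. The loss decreases very slowly; training acc reaches 0.95, but test acc stays around 70% and barely changes. What could be the reason?

Question about data augmentation

Hello, isn't data augmentation supposed to enlarge the original dataset through operations such as rotation and translation? I see that aug.py and data_gen.py do perform augmentation, but they directly replace the original image as it is read, so the total size of the dataset stays the same. I am a bit confused about this.

How do I fix this error when running mean_std.py?

Running mean_std.py raises an error: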
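Scripts of this kind usually compute per-channel mean and standard deviation over the training images, and the printed values are then copied by hand into the normalization step. Whether this repository uses them that way is not stated in the thread. A minimal sketch of the computation in plain Python, pooling over all pixels; `channel_mean_std` and the list-of-tuples image format are illustrative assumptions:

```python
import math


def channel_mean_std(images):
    """images: list of images, each a list of (r, g, b) pixels in [0, 1]."""
    n = 0
    sums = [0.0, 0.0, 0.0]
    sq_sums = [0.0, 0.0, 0.0]
    for img in images:
        for pixel in img:
            n += 1
            for c in range(3):
                sums[c] += pixel[c]
                sq_sums[c] += pixel[c] ** 2
    means = [s / n for s in sums]
    # Var[x] = E[x^2] - E[x]^2, clamped at 0 to absorb rounding error.
    stds = [math.sqrt(max(sq / n - m * m, 0.0)) for sq, m in zip(sq_sums, means)]
    return means, stds


if __name__ == "__main__":
    toy = [[(0.0, 0.5, 1.0), (1.0, 0.5, 1.0)]]
    print(channel_mean_std(toy))
```

Normalizing each input as (pixel - mean) / std with these values is what would tie such a script to training and inference.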
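On-the-fly augmentation keeps the number of samples per epoch unchanged, but because the random transform is re-drawn every time a sample is read, the model sees different variants of the same image across epochs. A minimal sketch of that behaviour with toy tuples standing in for images; `augmented_epochs` is a hypothetical helper, not the code in aug.py or data_gen.py:

```python
import random


def augmented_epochs(dataset, num_epochs, seed=0):
    """Yield one list per epoch; each sample is randomly flipped when read."""
    rng = random.Random(seed)
    for _ in range(num_epochs):
        yield [s[::-1] if rng.random() < 0.5 else s for s in dataset]


if __name__ == "__main__":
    data = [(1, 2, 3), (4, 5, 6)]
    for epoch, batch in enumerate(augmented_epochs(data, num_epochs=5)):
        print(epoch, batch)  # always 2 samples, but not always the same ones
```

The effective dataset is therefore larger than the number of files on disk, even though no extra images are ever written out.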
Traceback (most recent call last):
File "D:/PycharmProjects/garbage_classify-master/mean_std.py", line 31, in
means[i] += img[:, :, i].mean()
IndexError: invalid index to scalar variable.

Is anyone else working on this? Let's set up a WeChat group to discuss.

AttributeError: 'BaseSequence' object has no attribute 'shape'

AttributeError: 'BaseSequence' object has no attribute 'shape'
Is there a problem with this class?

Fatal Python error: Segmentation fault Thread 0x00007f9b32efe700 (most recent call first):

Fatal Python error: Segmentation fault

Thread 0x00007f9b32efe700 (most recent call first):

Tensor Tensor("input_1:0", shape=(?, 456, 456, 3), dtype=float32) is not an element of this graph.

'Cannot interpret feed_dict key as Tensor: ' + e.args[0])
TypeError: Cannot interpret feed_dict key as Tensor: Tensor Tensor("input_1:0", shape=(?, 456, 456, 3), dtype=float32) is not an element of this graph.
How can this be solved? Any guidance would be appreciated.

ModuleNotFoundError: No module named 'keras'

File "run.py", line 160, in main
from train import train_model
File "G:\garbage_classify-master\train.py", line 6, in
import keras.backend
ModuleNotFoundError: No module named 'keras'

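Has anyone else run into this? I have confirmed that keras is installed.

Does replacing batch normalization with group normalization (GroupNormalization) affect the pretrained weights?

Multi-GPU parallel training problem

Below is the loss from multi-GPU parallel training:

In the first epoch the loss and the corresponding acc are normal, but from the second epoch on something goes wrong. I suspect a problem when the parameters are merged?

Hello, customize_service imports from model_service.tfserving_model_service import TfServingBaseService, which is not in your source code

Hello, the customize_service file imports from model_service.tfserving_model_service import TfServingBaseService, which is not in your source code. Did you define this yourself? It is not in your project source.

efficientnet-b5_notop.h5 weights download

@saifeiwu, do you still have the efficientnet-b5_notop.h5 weights? I have been trying to download them for several days without success.

Loading efficientnet-b5_notop.h5

Do you still have this .h5 weights file? I have been trying for several days and could not download it. I need your help, thanks! QQ598770323

About accuracy and loss

There are 10 classes in total with a few hundred images each. val accuracy reached 1.0 in the third epoch and has not changed since. test accuracy rose over the first few epochs to 0.69 at epoch 8, then oscillated and declined, reaching 0.3 at epoch 50. The loss decreases slowly, and after epoch 10 it decreases extremely slowly and stays almost constant. What could be the reason?

About the .h5 files

Hello, the code references many .h5 files. Do all of them need to be downloaded? Thank you

And how should this part be handled? Does it need to be downloaded? Thank you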
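A frequent reason for this message when a package seems to be installed is that it was installed for a different interpreter or environment than the one running run.py. That is an assumption about this report, not something the thread confirms. A minimal sketch that prints which interpreter is running and whether it can see a module; `module_available` is a hypothetical helper:

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the top-level module `name` is importable here."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("keras importable:", module_available("keras"))
```

Running it with the same command used to launch run.py shows whether that interpreter is the one keras was installed into.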

BASE_WEIGHTS_PATH = (
'https://github.com/Callidior/keras-applications/'
'releases/download/efficientnet/')

WEIGHTS_HASHES = {
'efficientnet-b0': ('163292582f1c6eaca8e7dc7b51b01c61'
'5b0dbc0039699b4dcd0b975cc21533dc',
'c1421ad80a9fc67c2cc4000f666aa507'
'89ce39eedb4e06d531b0c593890ccff3'),
'efficientnet-b1': ('d0a71ddf51ef7a0ca425bab32b7fa7f1'
'6043ee598ecee73fc674d9560c8f09b0',
'75de265d03ac52fa74f2f510455ba64f'
'9c7c5fd96dc923cd4bfefa3d680c4b68'),
'efficientnet-b2': ('bb5451507a6418a574534aa76a91b106'
'f6b605f3b5dde0b21055694319853086',
'433b60584fafba1ea3de07443b74cfd3'
'2ce004a012020b07ef69e22ba8669333'),
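Each value in the table above is 64 hexadecimal digits, which is the length of a SHA-256 digest. Assuming the entries are SHA-256 hashes of the weight files (the snippet itself does not say), a manually downloaded file could be checked against the expected value as sketched below; both function names are hypothetical:

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.lower()
```

A mismatch would indicate a truncated or corrupted download rather than a problem in the training code.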

Error running run.py

Running run.py on Windows gives this error: OSError: Unable to open file (unable to open file: name = '/home/work/user-job-dir/src/efficientnet-b5_notop.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
The file cannot be found. Please help.

Error after training one epoch: TypeError: must be real number, not NoneType

Hello, I changed model_fn to my own model, but after training for one epoch I get this error:
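The path in the error is an absolute Linux path, which will not exist on a Windows machine unless the weights were placed there deliberately. A minimal sketch of looking for a local copy in a few candidate folders before loading; `find_weights` and the folder list are hypothetical and not part of the repository:

```python
import os


def find_weights(filename, search_dirs):
    """Return the first existing path for `filename`, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    here = os.path.dirname(os.path.abspath(__file__))
    path = find_weights("efficientnet-b5_notop.h5", [here, os.getcwd()])
    print("weights found at:", path)
```

If nothing is found, the weights file has to be downloaded and the path used by the training code pointed at it.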
Traceback (most recent call last):
File "run.py", line 154, in
tf.app.run()
File "/home/dell/anaconda3/envs/tensorflow-1.14-python-3.6/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/dell/anaconda3/envs/tensorflow-1.14-python-3.6/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/dell/anaconda3/envs/tensorflow-1.14-python-3.6/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "run.py", line 145, in main
train_model(FLAGS)
File "/data2/hlf/garbage_classify/train.py", line 277, in train_model
File "/home/dell/anaconda3/envs/tensorflow-1.14-python-3.6/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/dell/anaconda3/envs/tensorflow-1.14-python-3.6/lib/python3.6/site-packages/keras/engine/training.py", line 1732, in fit_generator
initial_epoch=initial_epoch)
File "/home/dell/anaconda3/envs/tensorflow-1.14-python-3.6/lib/python3.6/site-packages/keras/engine/training_generator.py", line 260, in fit_generator
callbacks.on_epoch_end(epoch, epoch_logs)
File "/home/dell/anaconda3/envs/tensorflow-1.14-python-3.6/lib/python3.6/site-packages/keras/callbacks/callbacks.py", line 152, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "/data2/hlf/garbage_classify/train.py", line 210, in on_epoch_end
TypeError: must be real number, not NoneType
I do not know what the problem is. Any advice would be appreciated!

def on_epoch_end(self, epoch, logs={}):
    self.losses.append(logs.get('loss'))
    self.val_losses.append(logs.get('val_loss'))
    save_path = os.path.join(self.FLAGS.train_local, 'weights_%03d_%.4f.h5' % (epoch,logs.get('val_acc'))) # line 210
    self.model.save_weights(save_path)
    if self.FLAGS.train_url.startswith('s3://'):
        save_url = os.path.join(self.FLAGS.train_url, 'weights_%03d_%.4f.h5' % (epoch, logs.get('val_acc')))
        shutil.copyfile(save_path, save_url)
    print('save weights file', save_path)
    if self.FLAGS.keep_weights_file_num > -1:
        weights_files = glob(os.path.join(self.FLAGS.train_local, '*.h5'))
        if len(weights_files) >= self.FLAGS.keep_weights_file_num:
            weights_files.sort(key=lambda file_name: os.stat(file_name).st_ctime, reverse=True)
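Formatting None with '%.4f' raises exactly this TypeError, so one plausible cause is that logs has no 'val_acc' entry at line 210. That could happen if the custom model reports the metric under another name such as 'val_accuracy', or if no validation data was passed. The thread does not confirm either. A minimal sketch of a more tolerant lookup; `get_val_accuracy` and `weights_filename` are hypothetical helpers:

```python
def get_val_accuracy(logs, keys=("val_acc", "val_accuracy"), default=0.0):
    """Return the first validation-accuracy value present in `logs`."""
    logs = logs or {}
    for key in keys:
        value = logs.get(key)
        if value is not None:
            return value
    return default


def weights_filename(epoch, logs):
    """Build the checkpoint name without failing on a missing metric."""
    return "weights_%03d_%.4f.h5" % (epoch, get_val_accuracy(logs))


if __name__ == "__main__":
    print(weights_filename(3, {"val_accuracy": 0.91234}))
    print(weights_filename(0, {}))
```

Printing logs.keys() once inside on_epoch_end is a quick way to see which metric names the model actually reports.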

After training, running the evaluation with python run.py --mode=eval --eval_pb_path='./model_snapshots/model' --test_data_url='./test_img' fails

2020-03-10 10:37:34.998582: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: N N
2020-03-10 10:37:34.998708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10312 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-03-10 10:37:34.998966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10312 MB memory) -> physical GPU (device: 1, name: GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5)
Traceback (most recent call last):
File "/opt/shakey/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1092, in _run
subfeed, allow_tensor=True, allow_operation=False)
File "/opt/shakey/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3478, in as_graph_element
return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
File "/opt/shakey/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3557, in _as_graph_element_locked
raise ValueError("Tensor %s is not an element of this graph." % obj)
ValueError: Tensor Tensor("input_1:0", shape=(?, 456, 456, 3), dtype=float32) is not an element of this graph.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "run.py", line 166, in
tf.app.run()
File "/opt/shakey/.local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "run.py", line 163, in main
eval_model(FLAGS)
File "/opt/shakey/imageclass/garbage_classify-master/eval.py", line 215, in eval_model
test_single_model(FLAGS)
File "/opt/shakey/imageclass/garbage_classify-master/eval.py", line 159, in test_single_model
pred_score = sess1.run([output_score], feed_dict={input_images: img})
File "/opt/shakey/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/opt/shakey/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1095, in _run
'Cannot interpret feed_dict key as Tensor: ' + e.args[0])
TypeError: Cannot interpret feed_dict key as Tensor: Tensor Tensor("input_1:0", shape=(?, 456, 456, 3), dtype=float32) is not an element of this graph.