Comments (9)
Is that the log file or stdout? Can you post stdout instead? Also, you can just copy & paste the text here; that's actually better than a screenshot. There should be another error which states what the actual problem is and where it happens. If you don't see such a thing, set `log_verbosity = 5` in the config file and run again.
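For reference, RETURNN config files in this setup are plain Python, so the suggested change is a single assignment (assuming a Python-style config like the `config_real` used here):

```python
# In the config file (e.g. config_real): raise logging to the most verbose level,
# so the log shows where the actual error happens.
log_verbosity = 5
```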
from returnn.
That's the log file. I'm going to post the stdout in a bit after I run go.sh again
Unhandled exception <type 'exceptions.AssertionError'> in thread <_MainThread(MainThread, started 140237591738112)>, proc 6301.
EXCEPTION
Traceback (most recent call last):
File "../../../rnn.py", line 468, in <module>
line: main(sys.argv)
locals:
main = <local> <function main at 0x7f8b64445b18>
sys = <local> <module 'sys' (built-in)>
sys.argv = <local> ['../../../rnn.py', 'config_real'], _[0]: {len = 15}
File "../../../rnn.py", line 457, in main
line: init(commandLineOptions=argv[1:])
locals:
init = <global> <function init at 0x7f8b644458c0>
commandLineOptions = <not found>
argv = <local> ['../../../rnn.py', 'config_real'], _[0]: {len = 15}
File "../../../rnn.py", line 314, in init
line: initData()
locals:
initData = <global> <function initData at 0x7f8b64445668>
File "../../../rnn.py", line 225, in initData
line: dev_data, extra_cache_bytes_dev = load_data(config, cache_byte_sizes[1], 'dev', chunking=chunking,
seq_ordering="sorted", shuffle_frames_of_nseqs=0)
locals:
dev_data = <global> None
extra_cache_bytes_dev = <not found>
load_data = <global> <function load_data at 0x7f8b644455f0>
config = <global> <Config.Config instance at 0x7f8b644b77e8>
cache_byte_sizes = <local> [11274289153, 11274289153, 11274289153]
chunking = <local> '0'
seq_ordering = <not found>
shuffle_frames_of_nseqs = <not found>
File "../../../rnn.py", line 206, in load_data
line: data = init_dataset_via_str(config_str, config=config, cache_byte_size=cache_byte_size, **kwargs)
locals:
data = <not found>
init_dataset_via_str = <global> <function init_dataset_via_str at 0x7f8b644b6938>
config_str = <local> 'features/raw/train_valid.h5', len = 27
config = <local> <Config.Config instance at 0x7f8b644b77e8>
cache_byte_size = <local> 11274289153
kwargs = <local> {'chunking': '0', 'shuffle_frames_of_nseqs': 0, 'name': 'dev', 'seq_ordering': 'sorted'}
File "/home/kapitan/Desktop/returnn/Dataset.py", line 743, in init_dataset_via_str
line: assert os.path.exists(f)
locals:
os = <global> <module 'os' from '/usr/lib/python2.7/os.pyc'>
os.path = <global> <module 'posixpath' from '/usr/lib/python2.7/posixpath.pyc'>
os.path.exists = <global> <function exists at 0x7f8b9bcf2ed8>
f = <local> 'features/raw/train_valid.h5', len = 27
AssertionError
Device gpuX proc, pid 6406: Parent seem to have died: recv_bytes EOFError
So, `features/raw/train_valid.h5` does not exist. Make sure it exists. Maybe you started it from the wrong directory / pwd?
I tried adding a train_valid.h5 into that folder. I also removed the assert lines in create_IAM_dataset.py. Here's the error now:
Unhandled exception <type 'exceptions.IOError'> in thread <_MainThread(MainThread, started 140321343878912)>, proc 7510.
EXCEPTION
Traceback (most recent call last):
File "../../../rnn.py", line 468, in <module>
line: main(sys.argv)
locals:
main = <local> <function main at 0x7f9ee4485b18>
sys = <local> <module 'sys' (built-in)>
sys.argv = <local> ['../../../rnn.py', 'config_real'], _[0]: {len = 15}
File "../../../rnn.py", line 457, in main
line: init(commandLineOptions=argv[1:])
locals:
init = <global> <function init at 0x7f9ee44858c0>
commandLineOptions = <not found>
argv = <local> ['../../../rnn.py', 'config_real'], _[0]: {len = 15}
File "../../../rnn.py", line 314, in init
line: initData()
locals:
initData = <global> <function initData at 0x7f9ee4485668>
File "../../../rnn.py", line 225, in initData
line: dev_data, extra_cache_bytes_dev = load_data(config, cache_byte_sizes[1], 'dev', chunking=chunking,
seq_ordering="sorted", shuffle_frames_of_nseqs=0)
locals:
dev_data = <global> None
extra_cache_bytes_dev = <not found>
load_data = <global> <function load_data at 0x7f9ee44855f0>
config = <global> <Config.Config instance at 0x7f9ee44f77e8>
cache_byte_sizes = <local> [11274289153, 11274289153, 11274289153]
chunking = <local> '0'
seq_ordering = <not found>
shuffle_frames_of_nseqs = <not found>
File "../../../rnn.py", line 206, in load_data
line: data = init_dataset_via_str(config_str, config=config, cache_byte_size=cache_byte_size, **kwargs)
locals:
data = <not found>
init_dataset_via_str = <global> <function init_dataset_via_str at 0x7f9ee44f6938>
config_str = <local> 'features/raw/train_valid.h5', len = 27
config = <local> <Config.Config instance at 0x7f9ee44f77e8>
cache_byte_size = <local> 11274289153
kwargs = <local> {'chunking': '0', 'shuffle_frames_of_nseqs': 0, 'name': 'dev', 'seq_ordering': 'sorted'}
File "/home/kapitan/Desktop/returnn/Dataset.py", line 744, in init_dataset_via_str
line: data.add_file(f)
locals:
data = <local> <HDFDataset.HDFDataset object at 0x7f9ee448bfd0>
data.add_file = <local> <bound method HDFDataset.add_file of <HDFDataset.HDFDataset object at 0x7f9ee448bfd0>>
f = <local> 'features/raw/train_valid.h5', len = 27
File "/home/kapitan/Desktop/returnn/HDFDataset.py", line 37, in add_file
line: fin = h5py.File(filename, "r")
locals:
fin = <not found>
h5py = <global> <module 'h5py' from '/usr/lib/python2.7/dist-packages/h5py/__init__.pyc'>
h5py.File = <global> <class 'h5py._hl.files.File'>
filename = <local> 'features/raw/train_valid.h5', len = 27
File "/usr/lib/python2.7/dist-packages/h5py/_hl/files.py", line 272, in __init__
line: fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
locals:
fid = <not found>
make_fid = <global> <function make_fid at 0x7f9ef0142668>
name = <local> 'features/raw/train_valid.h5', len = 27
mode = <local> 'r'
userblock_size = <local> None
fapl = <local> <h5py.h5p.PropFAID object at 0x7f9ee3c46cf8>
swmr = <local> False
File "/usr/lib/python2.7/dist-packages/h5py/_hl/files.py", line 92, in make_fid
line: fid = h5f.open(name, flags, fapl=fapl)
locals:
fid = <not found>
h5f = <global> <module 'h5py.h5f' from '/usr/lib/python2.7/dist-packages/h5py/h5f.x86_64-linux-gnu.so'>
h5f.open = <global> <cyfunction with_phil.<locals>.wrapper at 0x7f9ef01744d0>
name = <local> 'features/raw/train_valid.h5', len = 27
flags = <local> 0
fapl = <local> <h5py.h5p.PropFAID object at 0x7f9ee3c46cf8>
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/build/h5py-nQFNYZ/h5py-2.6.0/h5py/_objects.c:2577)
-- code not available --
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/build/h5py-nQFNYZ/h5py-2.6.0/h5py/_objects.c:2536)
-- code not available --
File "h5py/h5f.pyx", line 76, in h5py.h5f.open (/build/h5py-nQFNYZ/h5py-2.6.0/h5py/h5f.c:1811)
-- code not available --
IOError: Unable to open file (File signature not found)
It seems your `features/raw/train_valid.h5` file is broken. Maybe you need to download it again? Are you sure you downloaded the correct file? Or maybe you need to create it yourself. Have you read the Readme? Maybe ask @pvoigtlaender. I have seen that there is a script `create_IAM_dataset` which creates such a file. Have you run that script?
I read the Readme and followed the instructions there. As for the script that generates the train_valid.h5, I believe you are referring to this one?
```python
def convert_IAM_lines_train(base_path_imgs, tag, blacklist=[]):
    base_path_out = "features/" + tag + "/"
    mkdir_p(base_path_out)
    file_list_path = "lines.txt"
    char_list_path = "chars.txt"
    selection_list_path = "split/train.txt"
    out_file_name_train1 = base_path_out + "train.1.h5"
    out_file_name_train2 = base_path_out + "train.2.h5"
    out_file_name_train_valid = base_path_out + "train_valid.h5"
    print ("converting IAM_lines to", out_file_name_train1, "and", out_file_name_train2)
    train_list, train_valid_list = get_train_and_train_valid_lists(selection_list_path, blacklist, 0.9)
    len1 = len(train_list) / 2
    train_list1 = train_list[:len1]
    train_list2 = train_list[len1:]
    selections = [train_list1, train_list2, train_valid_list]
    out_file_names = [out_file_name_train1, out_file_name_train2, out_file_name_train_valid]
    convert(file_list_path, char_list_path, selections, out_file_names, pad_whitespace=True,
            dataset_prefix="trainset", base_path=base_path_imgs, compress=False)
```
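For context, `get_train_and_train_valid_lists` is not shown in the snippet; judging by the `0.9` argument, it presumably filters a blacklist and then does a 90/10 split of the selected lines. A hypothetical sketch of such a helper (not the actual implementation from create_IAM_dataset.py):

```python
def split_train_and_valid(line_ids, blacklist, train_fraction):
    """Drop blacklisted ids, then split the rest into a train part and a
    held-out train-valid part according to train_fraction."""
    kept = [i for i in line_ids if i not in blacklist]
    cut = int(len(kept) * train_fraction)
    return kept[:cut], kept[cut:]
```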
Can you please try running this script?
https://gist.github.com/cwig/315d212964542f7f1797d5fdd122891e
it worked :O
Thank you so much :)) I wonder what the problem with my setup was, though.