INFO:root:setting up ocr
INFO:root:ocr finished successfully
INFO:pd3f.parsr_wrapper:sending PDF to Parsr
INFO:pd3f.parsr_wrapper:got response from Parsr
INFO:pd3f.doc_info:media line width: 636.96
INFO:pd3f.doc_info:median line height: 20
INFO:pd3f.doc_info:median line space: 9.210000000000036
INFO:pd3f.doc_info:counter width: [(638.93, 21), (638.02, 19), (639.09, 19), (638.95, 17), (638.05, 16)]
INFO:pd3f.doc_info:counter height: [(20, 1861), (21, 602), (19, 395), (24.14, 31), (22, 21)]
INFO:pd3f.doc_info:counter lineheight: [(9.0, 487), (10.0, 454), (9.789999999999964, 158), (10.210000000000036, 138), (9.210000000000036, 127)]
ERROR:rq.worker:Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/rq/worker.py", line 936, in perform_job
rv = job.perform()
File "/usr/local/lib/python3.8/site-packages/rq/job.py", line 684, in perform
self._result = self._execute()
File "/usr/local/lib/python3.8/site-packages/rq/job.py", line 690, in _execute
return self.func(*self.args, **self.kwargs)
File "./app.py", line 273, in do_the_job
text, tables = extract(
File "/usr/local/lib/python3.8/site-packages/pd3f/export.py", line 53, in extract
e = Export(
File "/usr/local/lib/python3.8/site-packages/pd3f/export.py", line 171, in __init__
self.export()
File "/usr/local/lib/python3.8/site-packages/pd3f/export.py", line 239, in export
cleaned_header, cleaned_footer, new_footnotes = self.export_header_footer()
File "/usr/local/lib/python3.8/site-packages/pd3f/export.py", line 198, in export_header_footer
headers = remove_duplicates(headers, self.lang)
File "/usr/local/lib/python3.8/site-packages/pd3f/doc_info.py", line 136, in remove_duplicates
if single_score(only_text(r), lang) <= single_score(
File "/usr/local/lib/python3.8/site-packages/pd3f/dehyphen_wrapper.py", line 65, in single_score
scorer = get_scorer(lang)
File "/usr/local/lib/python3.8/site-packages/pd3f/dehyphen_wrapper.py", line 30, in get_scorer
scorer = FlairScorer(lang=lang)
File "/usr/local/lib/python3.8/site-packages/dehyphen/scorer.py", line 26, in __init__
self.lms = [FlairEmbeddings(x).lm for x in model_names]
File "/usr/local/lib/python3.8/site-packages/dehyphen/scorer.py", line 26, in <listcomp>
self.lms = [FlairEmbeddings(x).lm for x in model_names]
File "/usr/local/lib/python3.8/site-packages/flair/embeddings/token.py", line 567, in __init__
model = cached_path(base_path, cache_dir=cache_dir)
File "/usr/local/lib/python3.8/site-packages/flair/file_utils.py", line 90, in cached_path
return get_from_cache(url_or_filename, dataset_cache)
File "/usr/local/lib/python3.8/site-packages/flair/file_utils.py", line 166, in get_from_cache
raise IOError(
OSError: HEAD request failed for url https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/embeddings/lm-mix-german-forward-v0.2rc.pt with status code 301.