mqancienthistory / lat-epig

The Lat-Epig interface allows you to query the EDCS, save the search results as a TSV file, and plot them on a map of the Roman Empire, all without any prior knowledge of programming.

Home Page: https://mybinder.org/v2/gh/mqAncientHistory/Lat-Epig/HEAD?urlpath=notebooks/EpigraphyScraper.ipynb

License: GNU General Public License v3.0

Languages: Jupyter Notebook 3.99%, Python 1.68%, HTML 93.99%, Shell 0.11%, Dockerfile 0.06%, PowerShell 0.10%, Roff 0.06%
Topics: latin, epigraphy, web-scraper, jupyter, binder, docker

lat-epig's People

Contributors

denubis, dependabot[bot], ewansc, petrifiedvoices


lat-epig's Issues

Fix multiple dates in <b> tag

Example: the scraper scrapes the text of an inscription and places it into a single attribute together with comments and other text. These need to be separated, so that the text of the inscription is a standalone attribute and the comments are as well. This works for some inscriptions, but not for all. I am listing examples of problematic inscriptions below (there are 1,268 in total that I could find).

1. EDCS-03300852 - alternative or missing date

When multiple dates are available, the scraper scrapes the dating into the text of the inscription.

HTML:
<b>dating:</b> &nbsp; <b>a:&nbsp;</b> 276&nbsp; <b>to</b> 276;&nbsp;&nbsp;&nbsp; <b>b:&nbsp;</b> 276&nbsp; <b>to</b> 282&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

Currently scraped to the CSV:
Inscription attribute: to 276; b: 276 to 282 \n\n \n \nImp(eratori) / Floriano / P(io) F(elici) Aug(usto) / p(atri) p(atriae) / m(ilia) p(assuum) III // Imp(eratori) Caes(ari) / M(arco) Aur(elio) Probo / P(io) F(elici) Aug(usto) / m(ilia) p(assuum) III

Desired outcome:
Inscription attribute: Imp(eratori) / Floriano / P(io) F(elici) Aug(usto) / p(atri) p(atriae) / m(ilia) p(assuum) III // Imp(eratori) Caes(ari) / M(arco) Aur(elio) Probo / P(io) F(elici) Aug(usto) / m(ilia) p(assuum) III
Comments attribute: to 276; b: 276 to 282 (paste the text as is)
Dating from attribute: NA (take the 'a' version as primary; sometimes there might be 1-4 numbers instead of NA)
Dating to attribute: 276

Examples of other inscriptions with a similar problem: EDCS-32001032, EDCS-09700347, EDCS-24900108, EDCS-24900080
(1,072 inscriptions in total have a problematic date as a result of this HTML tag error)

Link to the CSVs with minimal examples (Git does not allow me to paste them here):
https://github.com/sdam-au/EDCS_ETL/tree/master/output
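
A minimal sketch of one possible fix, assuming the raw "dating:" fragment is available as a string. parse_dating is a hypothetical helper, not the current parse.py code; per the desired outcome above, the 'a' variant is taken as primary and the full raw string is preserved as a comment:

import re
from bs4 import BeautifulSoup

def parse_dating(fragment):
    """Split an EDCS 'dating:' fragment into (date_from, date_to, comment)."""
    # Collapse the <b> markup and &nbsp; entities into plain text.
    raw = BeautifulSoup(fragment, "html.parser").get_text(" ", strip=True)
    comment = raw.replace("dating:", "", 1).strip()
    # Take the first "<from> to <to>" pair, i.e. the 'a' variant.
    match = re.search(r"(\d{1,4})\s*to\s*(\d{1,4})", comment)
    date_from = match.group(1) if match else "NA"
    date_to = match.group(2) if match else "NA"
    return date_from, date_to, comment

fragment = ('<b>dating:</b> &nbsp; <b>a:&nbsp;</b> 276&nbsp; <b>to</b> 276;'
            '&nbsp; <b>b:&nbsp;</b> 276&nbsp; <b>to</b> 282')
print(parse_dating(fragment))  # ('276', '276', 'a: 276 to 276; b: 276 to 282')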

Fix commentary in the text of an inscription, <b> tag in HTML

EDCS-74800023 - missing text of an inscription; double usage of the <b> tag in HTML

Although the text is present on the website, the scraper did not scrape the text of the inscription, just the comments (or, I suspect, the comments overwrote the text of the inscription).

HTML:
P(ublius) Titius [3] / Primu[s sibi(?)] / et s[uis et(?)] / Primae [3] / matri Ma[3] / Primigenia[e 3] / in fr(onte) p(edes) X[3] / in ag(ro) p(edes) XI[3]

<b>comment:</b> <a href="http://www.aemiliaonline.it/reperti/stele/stele-di-publius-titus-primus" target="_blank">http://www.aemiliaonline.it/reperti/stele/stele-di-publius-titus-primus</a>

Currently scraped to the CSV:
Inscription attribute: http://www.aemiliaonline.it/reperti/stele/stele-di-publius-titus-primus \n\n http://www.aemiliaonline.it/reperti/stele/stele-di-publius-titus-primus

Desired outcome:
Inscription attribute: P(ublius) Titius [3] / Primu[s sibi(?)] / et s[uis et(?)] / Primae [3] / matri Ma[3] / Primigenia[e 3] / in fr(onte) p(edes) X[3] / in ag(ro) p(edes) XI[3]
Comments attribute: http://www.aemiliaonline.it/reperti/stele/stele-di-publius-titus-primus

Examples of other inscriptions with a similar problem: EDCS-27601424, EDCS-10300305, EDCS-75900072
(53 inscriptions in total, as a result of this HTML tag error)

Link to the CSVs with minimal examples (Git does not allow me to paste them here):
https://github.com/sdam-au/EDCS_ETL/tree/master/output
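
A minimal sketch of the separation, assuming the scraper sees the entry as one HTML string and that a literal <b>comment:</b> marker divides inscription text from commentary, as in the fragment quoted above:

from bs4 import BeautifulSoup

def split_text_and_comment(entry_html):
    """Everything before the <b>comment:</b> marker is the inscription;
    everything after it is the comment."""
    before, _, after = entry_html.partition("<b>comment:</b>")
    inscription = BeautifulSoup(before, "html.parser").get_text(" ", strip=True)
    comment = BeautifulSoup(after, "html.parser").get_text(" ", strip=True)
    return inscription, comment

entry = ('P(ublius) Titius [3] / Primu[s sibi(?)] / et s[uis et(?)] '
         '<b>comment:</b> <a href="http://www.aemiliaonline.it/reperti/stele/'
         'stele-di-publius-titus-primus" target="_blank">'
         'http://www.aemiliaonline.it/reperti/stele/stele-di-publius-titus-primus</a>')
inscription, comment = split_text_and_comment(entry)
print(inscription)  # the inscription text only
print(comment)      # the aemiliaonline.it link only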

Outdated Docker

Hello.

First, thanks for this great tool!

While deploying the tool locally using Docker (denubis/lat-epig-scraper), I ran into some issues caused by errors in the parsing of the cities CSV file, in the "Interactive Map Output" section. Replacing the script interactive_map.py in the Docker container with the one in this GitHub repo fixed those issues for me. It seems the commits of May 12, 2023 were not integrated into the Docker image.


Thanks!

Fix extra text in the text of inscriptions

EDCS-73700333 - extra biblio references in the text of inscription

Instead of scraping the inscription and the biblio references as separate attributes, the scraper scraped the text of the inscription together with the DOI info. The scraper plops the text of the comment into the same attribute as the text of the inscription, although it should go into the commentary.

HTML:
Iesus s(anc)t&lt;u=O&gt;(s) ego Iesus sum ego fui s(anc)t&lt;u=O&gt;(s) ego s(anc)t&lt;u=O&gt;(s) fui / s(anc)t&lt;u=O&gt;(s) Iesus fui s(anc)t&lt;u=O&gt;(s) Iesus est Iesus s(anc)t&lt;u=O&gt;(s) est / Ego [3] / s(anc)t&lt;u=O&gt;(s) Iesus fuit s(anc)t&lt;u=O&gt;(s) Iesus est Iesus fuit s(anc)t&lt;u=O&gt;(s) Iesus / s(anc)t&lt;u=O&gt;(s) fuit Iesus s(anc)t&lt;u=O&gt;(s) Iesus fuit s(anc)t&lt;u=O&gt;(s) Iesus fuit / s(anc)t&lt;u=O&gt;(s) Iesus est Iesus est alef s(anc)t&lt;u=O&gt;(s) Iesus Iesus / Iesus s(anc)t&lt;u=O&gt;(s) Iesus alef est |(omega) est s(anc)t&lt;u=O&gt;(s) Iesus / Iesus s(anc)t&lt;u=O&gt;(s) fui s(anc)t(us) Iesus fui s(anc)t&lt;u=O&gt;(s) I[esus] / s(anc)t&lt;u=O&gt;(s) Iesus s(anc)t&lt;u=O&gt;(s) Iesus Iesus [fuit] / s(anc)t&lt;u=O&gt;(s) Iesus Iesus est s(anc)t(us) Iesus fuit s(anc)t&lt;u=O&gt;(s) / Iesus s(anc)t&lt;u=O&gt;(s) fuit Iesus ego fui I[esus] / Iesus s(anc)t(us) fuit s(anc)t&lt;u=O&gt;(s) s(anc)t&lt;u=O&gt;(s) fuit s(anc)t&lt;u=O&gt;(s) I[esus] / Iesus s(anc)t(us) fuit s(anc)t&lt;u=O&gt;(s) Iesus fuit Iesus fuit s(anc)t&lt;u=O&gt;(s) / Iesus s(anc)t&lt;u=O&gt;(s) fui s(anc)t&lt;u=O&gt;(s) Iesus fuit fui s(anc)t&lt;u=O&gt;(s) Iesus

<b>comment:</b> DOI: <a href="https://doi.org/10.15581/012.26.004" target="_blank">10.15581/012.26.004</a>

Currently scraped to the CSV:
Inscription attribute: Iesus s(anc)t<u=O>(s) ego Iesus sum ego fui s(anc)t<u=O>(s) ego s(anc)t<u=O>(s) fui / s(anc)t<u=O>(s) Iesus fui s(anc)t<u=O>(s) Iesus est Iesus s(anc)t<u=O>(s) est / Ego [3] / s(anc)t<u=O>(s) Iesus fuit s(anc)t<u=O>(s) Iesus est Iesus fuit s(anc)t<u=O>(s) Iesus / s(anc)t<u=O>(s) fuit Iesus s(anc)t<u=O>(s) Iesus fuit s(anc)t<u=O>(s) Iesus fuit / s(anc)t<u=O>(s) Iesus est Iesus est alef s(anc)t<u=O>(s) Iesus Iesus / Iesus s(anc)t<u=O>(s) Iesus alef est |(omega) est s(anc)t<u=O>(s) Iesus / Iesus s(anc)t<u=O>(s) fui s(anc)t(us) Iesus fui s(anc)t<u=O>(s) I[esus] / s(anc)t<u=O>(s) Iesus s(anc)t<u=O>(s) Iesus Iesus [fuit] / s(anc)t<u=O>(s) Iesus Iesus est s(anc)t(us) Iesus fuit s(anc)t<u=O>(s) / Iesus s(anc)t<u=O>(s) fuit Iesus ego fui I[esus] / Iesus s(anc)t(us) fuit s(anc)t<u=O>(s) s(anc)t<u=O>(s) fuit s(anc)t<u=O>(s) I[esus] / Iesus s(anc)t(us) fuit s(anc)t<u=O>(s) Iesus fuit Iesus fuit s(anc)t<u=O>(s) / Iesus s(anc)t<u=O>(s) fui s(anc)t<u=O>(s) Iesus fuit fui s(anc)t<u=O>(s) Iesus\n\n10.15581/012.26.004

Desired outcome:
Inscription attribute: Iesus s(anc)t<u=O>(s) ego Iesus sum ego fui s(anc)t<u=O>(s) ego s(anc)t<u=O>(s) fui / s(anc)t<u=O>(s) Iesus fui s(anc)t<u=O>(s) Iesus est Iesus s(anc)t<u=O>(s) est / Ego [3] / s(anc)t<u=O>(s) Iesus fuit s(anc)t<u=O>(s) Iesus est Iesus fuit s(anc)t<u=O>(s) Iesus / s(anc)t<u=O>(s) fuit Iesus s(anc)t<u=O>(s) Iesus fuit s(anc)t<u=O>(s) Iesus fuit / s(anc)t<u=O>(s) Iesus est Iesus est alef s(anc)t<u=O>(s) Iesus Iesus / Iesus s(anc)t<u=O>(s) Iesus alef est |(omega) est s(anc)t<u=O>(s) Iesus / Iesus s(anc)t<u=O>(s) fui s(anc)t(us) Iesus fui s(anc)t<u=O>(s) I[esus] / s(anc)t<u=O>(s) Iesus s(anc)t<u=O>(s) Iesus Iesus [fuit] / s(anc)t<u=O>(s) Iesus Iesus est s(anc)t(us) Iesus fuit s(anc)t<u=O>(s) / Iesus s(anc)t<u=O>(s) fuit Iesus ego fui I[esus] / Iesus s(anc)t(us) fuit s(anc)t<u=O>(s) s(anc)t<u=O>(s) fuit s(anc)t<u=O>(s) I[esus] / Iesus s(anc)t(us) fuit s(anc)t<u=O>(s) Iesus fuit Iesus fuit s(anc)t<u=O>(s) / Iesus s(anc)t<u=O>(s) fui s(anc)t<u=O>(s) Iesus fuit fui s(anc)t<u=O>(s) Iesus
Comments attribute: 10.15581/012.26.004

Examples of other inscriptions with a similar problem: EDCS-75000138, EDCS-75000139, EDCS-44500182
(143 inscriptions in total, as a result of this HTML tag error)

Link to the CSVs with minimal examples (Git does not allow me to paste them here):
https://github.com/sdam-au/EDCS_ETL/tree/master/output
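
Until the parser separates these, the affected rows can at least be counted; a hedged sketch assuming a TSV output file with an inscription column (the file name is hypothetical):

import pandas as pd

df = pd.read_csv("output/scrape.tsv", sep="\t")  # hypothetical file name
# URLs and DOI prefixes should never appear inside the inscription text.
leaked = df["inscription"].fillna("").str.contains(r"https?://|\b10\.\d{4,}/", regex=True)
print(f"{leaked.sum()} of {len(df)} rows have comment/DOI text inside 'inscription'")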

Metadata in the CSV name

Based on my understanding of parse.py, line 262, the number of inscriptions added to the name of the scraped CSV is based on the count reported by the website rather than on the number of actually scraped inscriptions. Unfortunately, I have noticed that in large scrapes this number does not correspond, e.g. when I scraped all inscriptions from the entire province of Roma.

The website said there should be some 120,000 inscriptions, but the scraper only scraped some 90,000. Can you base the number of inscriptions in the CSV name on the number of actually scraped inscriptions rather than on what the website claims?
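
A minimal sketch of the suggested change, with hypothetical names (the real naming logic is around parse.py line 262; the pattern mirrors file names such as 2021-07-05-term1_tumulus-71.tsv seen elsewhere in this tracker):

import datetime

def output_filename(search_label, records):
    """Name the TSV after the number of rows actually scraped,
    not after the count the EDCS result page claims."""
    datestring = datetime.date.today().isoformat()
    return f"output/{datestring}-{search_label}-{len(records)}.tsv"

records = [{"EDCS-ID": "EDCS-03300852"}]  # rows that were really scraped
print(output_filename("term1_tumulus", records))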

Mapping the results not working in Jupyter - report

I have followed the Readme, plus had to install (via terminal) the following:

sudo apt build-dep python3-cartopy
pip3 install -r requirements.txt
pip3 install -r map_requirements.txt
pip3 install geopandas matplotlib geoplot frictionless

The map interface now loads, but no map is produced and the following error shows:

Generate New Maps!
Starting Map Generation
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
~/Github/EpigraphyScraperNotebook/.direnv/python-3.8.2/lib/python3.8/site-packages/frictionless/loader.py in read_byte_stream(self)
    112         try:
--> 113             byte_stream = self.read_byte_stream_create()
    114             byte_stream = self.read_byte_stream_infer_stats(byte_stream)

~/Github/EpigraphyScraperNotebook/.direnv/python-3.8.2/lib/python3.8/site-packages/frictionless/plugins/local.py in read_byte_stream_create(self)
     68             source = source.replace(scheme, "", 1)
---> 69         byte_stream = io.open(source, "rb")
     70         return byte_stream

FileNotFoundError: [Errno 2] No such file or directory: 'cities/Hanson2016_Cities_OxREP.csv'

During handling of the above exception, another exception occurred:

FrictionlessException                     Traceback (most recent call last)
~/Github/EpigraphyScraperNotebook/map_interface.py in map_on_button_clicked(b)
     14     def map_on_button_clicked(b):
     15         print("Starting Map Generation")
---> 16         make_map.main()
     17     map_button.on_click(map_on_button_clicked)

~/Github/EpigraphyScraperNotebook/make_map.py in main()
    173 def main():
    174 
--> 175   cities_rows = extract(CITIES_DATA)
    176   cities_dataframe = pandas.DataFrame(cities_rows)
    177 

~/Github/EpigraphyScraperNotebook/.direnv/python-3.8.2/lib/python3.8/site-packages/frictionless/extract/main.py in extract(source, source_type, process, stream, **options)
     49     # Extract source
     50     extract = getattr(module, "extract_%s" % source_type)
---> 51     return extract(source, process=process, stream=stream, **options)

~/Github/EpigraphyScraperNotebook/.direnv/python-3.8.2/lib/python3.8/site-packages/frictionless/extract/table.py in extract_table(source, scheme, format, hashing, encoding, compression, compression_path, control, dialect, query, schema, sync_schema, patch_schema, headers, infer_type, infer_names, infer_volume, infer_confidence, infer_float_numbers, infer_missing_values, onerror, lookup, process, stream, json)
    170     data = read_row_stream(table)
    171     data = (process(row) for row in data) if process else data
--> 172     return data if stream else list(data)
    173 
    174 

~/Github/EpigraphyScraperNotebook/.direnv/python-3.8.2/lib/python3.8/site-packages/frictionless/extract/table.py in read_row_stream(table)
    177 
    178 def read_row_stream(table):
--> 179     with table as table:
    180         for row in table.row_stream:
    181             yield row

~/Github/EpigraphyScraperNotebook/.direnv/python-3.8.2/lib/python3.8/site-packages/frictionless/table.py in __enter__(self)
    215     def __enter__(self):
    216         if self.closed:
--> 217             self.open()
    218         return self
    219 

~/Github/EpigraphyScraperNotebook/.direnv/python-3.8.2/lib/python3.8/site-packages/frictionless/table.py in open(self)
    399             self.__resource.stats = {"hash": "", "bytes": 0, "fields": 0, "rows": 0}
    400             self.__parser = system.create_parser(self.__resource)
--> 401             self.__parser.open()
    402             self.__read_infer_sample()
    403             self.__data_stream = self.__read_data_stream()

~/Github/EpigraphyScraperNotebook/.direnv/python-3.8.2/lib/python3.8/site-packages/frictionless/parser.py in open(self)
     70             raise FrictionlessException(error)
     71         try:
---> 72             self.__loader = self.read_loader()
     73             self.__data_stream = self.read_data_stream()
     74             return self

~/Github/EpigraphyScraperNotebook/.direnv/python-3.8.2/lib/python3.8/site-packages/frictionless/parser.py in read_loader(self)
    101         if self.needs_loader:
    102             loader = system.create_loader(self.resource)
--> 103             return loader.open()
    104 
    105     def read_data_stream(self):

~/Github/EpigraphyScraperNotebook/.direnv/python-3.8.2/lib/python3.8/site-packages/frictionless/loader.py in open(self)
     81             raise FrictionlessException(error)
     82         try:
---> 83             self.__byte_stream = self.read_byte_stream()
     84             return self
     85         except Exception:

~/Github/EpigraphyScraperNotebook/.direnv/python-3.8.2/lib/python3.8/site-packages/frictionless/loader.py in read_byte_stream(self)
    116         except IOError as exception:
    117             error = errors.SchemeError(note=str(exception))
--> 118             raise FrictionlessException(error)
    119         except config.COMPRESSION_EXCEPTIONS as exception:
    120             error = errors.CompressionError(note=str(exception))

FrictionlessException: [scheme-error] The data source could not be successfully loaded: [Errno 2] No such file or directory: 'cities/Hanson2016_Cities_OxREP.csv'
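
The path cities/Hanson2016_Cities_OxREP.csv is resolved against the current working directory, so the lookup fails whenever the notebook kernel is started elsewhere. A hedged sketch of a fix, anchoring the path to the script's own location (CITIES_DATA is the constant make_map.py passes to extract, per the traceback; the directory layout is assumed):

import os

# Resolve the cities CSV relative to make_map.py itself rather than
# relative to wherever the Jupyter kernel happens to be running.
CITIES_DATA = os.path.join(
    os.path.dirname(os.path.abspath(__file__)),
    "cities", "Hanson2016_Cities_OxREP.csv",
)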


E-Pig has issue?

Hi Brian,

Got this issue with the scraper: it seems unable to do a search. Screenshot below.

Ray

(screenshot: 2022-04-19, 13:53)

Can't access Docker desktop link

When trying to access Lat-Epig with Docker, I can't seem to access the Lat-Epig interface. I ran docker run -p 8888:8888 denubis/lat-epig-scraper:main as indicated, with success. Then, as instructed, when I tried to paste http://localhost:8888/notebooks/EpigraphyScraper.ipynb into my browser, I got an error message.
(screenshot: 2022-10-25, 15:39)

I also tried to follow the links provided by Docker, but neither of those links works for me.
(screenshot: 2022-10-25, 15:40)

Thank you for your help. I apologize for any inconvenience; I am pretty new to Digital Classics.

Best,
Maxime

Parsing languages within scraper

Parse.py, lines 192-202, used to extract notations of languages from the text of the inscription (they looked like "GR", "HEB", "IT") and save them as a separate value in the language attribute. Currently this code is commented out, but I would love to have it working again.
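
A hedged sketch of what re-enabling that could look like; this is not the original commented-out code, and the token list only covers the notations quoted above:

import re

LANG_TOKENS = re.compile(r"\b(GR|HEB|IT)\b")  # notations quoted in this issue

def split_language(inscription_text):
    """Return (cleaned_text, language) so 'language' becomes its own attribute."""
    langs = sorted(set(LANG_TOKENS.findall(inscription_text)))
    cleaned = LANG_TOKENS.sub("", inscription_text)
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    return cleaned, ",".join(langs)

print(split_language("GR [Greek text] / dis manibus"))  # ('[Greek text] / dis manibus', 'GR')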

Pivot map gen to use JSON

Since there is now data minimisation in filenames, the maps need to use the produced JSON instead of the TSV.
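
A minimal sketch of the pivot, assuming the scraper writes one JSON file per search containing a list of inscription records (the file pattern is hypothetical):

import glob
import pandas as pd

# Feed the map generation from the scraper's JSON output instead of the TSV.
for path in glob.glob("output/*.json"):
    dataframe = pd.read_json(path)  # one record per inscription
    print(f"{path}: {len(dataframe)} inscriptions")
    # hand `dataframe` to the existing makeMap() pipeline from here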

DUPLICATE

Build functions in the correct sequence, producing the conservative and interpretive versions of the text

Connection timeout on map export

Search terms: "Text 1- Carthag, AND, Text 2- Theveste"
Platform: Binder/voila
Error:


⠙ Making maps...2021-09-03T03:37:45.912817
{'operator': 'and', 'term2': 'Theveste', 'term1': 'Carthag'}
⠹ Making maps...Loaded data...
⠸ Making maps...Initialised plot...
Making maps...
---------------------------------------------------------------------------
TimeoutError                              Traceback (most recent call last)
/usr/lib/python3.8/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
   1353             try:
-> 1354                 h.request(req.get_method(), req.selector, req.data, headers,
   1355                           encode_chunked=req.has_header('Transfer-encoding'))

/usr/lib/python3.8/http/client.py in request(self, method, url, body, headers, encode_chunked)
   1251         """Send a complete request to the server."""
-> 1252         self._send_request(method, url, body, headers, encode_chunked)
   1253 

/usr/lib/python3.8/http/client.py in _send_request(self, method, url, body, headers, encode_chunked)
   1297             body = _encode(body, 'body')
-> 1298         self.endheaders(body, encode_chunked=encode_chunked)
   1299 

/usr/lib/python3.8/http/client.py in endheaders(self, message_body, encode_chunked)
   1246             raise CannotSendHeader()
-> 1247         self._send_output(message_body, encode_chunked=encode_chunked)
   1248 

/usr/lib/python3.8/http/client.py in _send_output(self, message_body, encode_chunked)
   1006         del self._buffer[:]
-> 1007         self.send(msg)
   1008 

/usr/lib/python3.8/http/client.py in send(self, data)
    946             if self.auto_open:
--> 947                 self.connect()
    948             else:

/usr/lib/python3.8/http/client.py in connect(self)
   1413 
-> 1414             super().connect()
   1415 

/usr/lib/python3.8/http/client.py in connect(self)
    917         """Connect to the host and port specified in __init__."""
--> 918         self.sock = self._create_connection(
    919             (self.host,self.port), self.timeout, self.source_address)

/usr/lib/python3.8/socket.py in create_connection(address, timeout, source_address)
    807         try:
--> 808             raise err
    809         finally:

/usr/lib/python3.8/socket.py in create_connection(address, timeout, source_address)
    795                 sock.bind(source_address)
--> 796             sock.connect(sa)
    797             # Break explicitly a reference cycle

TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

URLError                                  Traceback (most recent call last)
~/src/lat_epig/map_interface.py in map_on_button_clicked(b)
    220             #     searchterm=None
    221 
--> 222             make_map(data_file=map_data.value,
    223                      map_title_text=map_title_text,
    224                      province_shapefilename=map_shapefile.value,

~/.local/lib/python3.8/site-packages/yaspin/core.py in inner(*args, **kwargs)
    119         def inner(*args, **kwargs):
    120             with self:
--> 121                 return fn(*args, **kwargs)
    122 
    123         return inner

~/src/lat_epig/make_map.py in make_map(data_file, map_title_text, province_shapefilename, searchterm, basemap_multicolour, provinces, roads, cities, filetype, show_ids, append_inscriptions, dpi, map_dimensions, partial_provinces, map_inscription_markersize, map_greyscale, will_cite)
    302       red = '#000000'
    303     if basemap_multicolour:
--> 304       bounded_prov.plot(ax=ax, linewidth=1, alpha=0.1,  cmap=prism, zorder=1, label=province_shapefilename)
    305     else:
    306       bounded_prov.plot(ax=ax, linewidth=0.3, alpha=0.5, color=brown, linestyle='dashed', zorder=1, label=province_shapefilename)

~/.local/lib/python3.8/site-packages/geopandas/plotting.py in __call__(self, *args, **kwargs)
    923             kind = kwargs.pop("kind", "geo")
    924             if kind == "geo":
--> 925                 return plot_dataframe(data, *args, **kwargs)
    926             if kind in self._pandas_kinds:
    927                 # Access pandas plots

~/.local/lib/python3.8/site-packages/geopandas/plotting.py in plot_dataframe(df, column, cmap, color, ax, cax, categorical, legend, scheme, k, vmin, vmax, markersize, figsize, legend_kwds, categories, classification_kwds, missing_kwds, aspect, **style_kwds)
    687 
    688     if column is None:
--> 689         return plot_series(
    690             df.geometry,
    691             cmap=cmap,

~/.local/lib/python3.8/site-packages/geopandas/plotting.py in plot_series(s, cmap, color, ax, figsize, aspect, **style_kwds)
    465         )
    466 
--> 467     plt.draw()
    468     return ax
    469 

~/.local/lib/python3.8/site-packages/matplotlib/pyplot.py in draw()
    958     the current figure.
    959     """
--> 960     gcf().canvas.draw_idle()
    961 
    962 

~/.local/lib/python3.8/site-packages/matplotlib/backend_bases.py in draw_idle(self, *args, **kwargs)
   2053         if not self._is_idle_drawing:
   2054             with self._idle_draw_cntx():
-> 2055                 self.draw(*args, **kwargs)
   2056 
   2057     def get_width_height(self):

~/.local/lib/python3.8/site-packages/matplotlib/backends/backend_agg.py in draw(self)
    404              (self.toolbar._wait_cursor_for_draw_cm() if self.toolbar
    405               else nullcontext()):
--> 406             self.figure.draw(self.renderer)
    407             # A GUI class may be need to update a window using this draw, so
    408             # don't forget to call the superclass.

~/.local/lib/python3.8/site-packages/matplotlib/artist.py in draw_wrapper(artist, renderer, *args, **kwargs)
     72     @wraps(draw)
     73     def draw_wrapper(artist, renderer, *args, **kwargs):
---> 74         result = draw(artist, renderer, *args, **kwargs)
     75         if renderer._rasterizing:
     76             renderer.stop_rasterizing()

~/.local/lib/python3.8/site-packages/matplotlib/artist.py in draw_wrapper(artist, renderer, *args, **kwargs)
     49                 renderer.start_filter()
     50 
---> 51             return draw(artist, renderer, *args, **kwargs)
     52         finally:
     53             if artist.get_agg_filter() is not None:

~/.local/lib/python3.8/site-packages/matplotlib/figure.py in draw(self, renderer)
   2778 
   2779             self.patch.draw(renderer)
-> 2780             mimage._draw_list_compositing_images(
   2781                 renderer, self, artists, self.suppressComposite)
   2782 

~/.local/lib/python3.8/site-packages/matplotlib/image.py in _draw_list_compositing_images(renderer, parent, artists, suppress_composite)
    130     if not_composite or not has_images:
    131         for a in artists:
--> 132             a.draw(renderer)
    133     else:
    134         # Composite any adjacent images together

~/.local/lib/python3.8/site-packages/matplotlib/artist.py in draw_wrapper(artist, renderer, *args, **kwargs)
     49                 renderer.start_filter()
     50 
---> 51             return draw(artist, renderer, *args, **kwargs)
     52         finally:
     53             if artist.get_agg_filter() is not None:

~/.local/lib/python3.8/site-packages/cartopy/mpl/geoaxes.py in draw(self, renderer, **kwargs)
    515         self._done_img_factory = True
    516 
--> 517         return matplotlib.axes.Axes.draw(self, renderer=renderer, **kwargs)
    518 
    519     def _update_title_position(self, renderer):

~/.local/lib/python3.8/site-packages/matplotlib/artist.py in draw_wrapper(artist, renderer, *args, **kwargs)
     49                 renderer.start_filter()
     50 
---> 51             return draw(artist, renderer, *args, **kwargs)
     52         finally:
     53             if artist.get_agg_filter() is not None:

~/.local/lib/python3.8/site-packages/matplotlib/_api/deprecation.py in wrapper(*inner_args, **inner_kwargs)
    429                          else deprecation_addendum,
    430                 **kwargs)
--> 431         return func(*inner_args, **inner_kwargs)
    432 
    433     return wrapper

~/.local/lib/python3.8/site-packages/matplotlib/axes/_base.py in draw(self, renderer, inframe)
   2919             renderer.stop_rasterizing()
   2920 
-> 2921         mimage._draw_list_compositing_images(renderer, self, artists)
   2922 
   2923         renderer.close_group('axes')

~/.local/lib/python3.8/site-packages/matplotlib/image.py in _draw_list_compositing_images(renderer, parent, artists, suppress_composite)
    130     if not_composite or not has_images:
    131         for a in artists:
--> 132             a.draw(renderer)
    133     else:
    134         # Composite any adjacent images together

~/.local/lib/python3.8/site-packages/matplotlib/artist.py in draw_wrapper(artist, renderer, *args, **kwargs)
     49                 renderer.start_filter()
     50 
---> 51             return draw(artist, renderer, *args, **kwargs)
     52         finally:
     53             if artist.get_agg_filter() is not None:

~/.local/lib/python3.8/site-packages/cartopy/mpl/feature_artist.py in draw(self, renderer, *args, **kwargs)
    151         except ValueError:
    152             warnings.warn('Unable to determine extent. Defaulting to global.')
--> 153         geoms = self._feature.intersecting_geometries(extent)
    154 
    155         # Combine all the keyword args in priority order.

~/.local/lib/python3.8/site-packages/cartopy/feature/__init__.py in intersecting_geometries(self, extent)
    295         """
    296         self.scaler.scale_from_extent(extent)
--> 297         return super().intersecting_geometries(extent)
    298 
    299     def with_scale(self, new_scale):

~/.local/lib/python3.8/site-packages/cartopy/feature/__init__.py in intersecting_geometries(self, extent)
    104             extent_geom = sgeom.box(extent[0], extent[2],
    105                                     extent[1], extent[3])
--> 106             return (geom for geom in self.geometries() if
    107                     geom is not None and extent_geom.intersects(geom))
    108         else:

~/.local/lib/python3.8/site-packages/cartopy/feature/__init__.py in geometries(self)
    277         key = (self.name, self.category, self.scale)
    278         if key not in _NATURAL_EARTH_GEOM_CACHE:
--> 279             path = shapereader.natural_earth(resolution=self.scale,
    280                                              category=self.category,
    281                                              name=self.name)

~/.local/lib/python3.8/site-packages/cartopy/io/shapereader.py in natural_earth(resolution, category, name)
    280     format_dict = {'config': config, 'category': category,
    281                    'name': name, 'resolution': resolution}
--> 282     return ne_downloader.path(format_dict)
    283 
    284 

~/.local/lib/python3.8/site-packages/cartopy/io/__init__.py in path(self, format_dict)
    201         else:
    202             # we need to download the file
--> 203             result_path = self.acquire_resource(target_path, format_dict)
    204 
    205         return result_path

~/.local/lib/python3.8/site-packages/cartopy/io/shapereader.py in acquire_resource(self, target_path, format_dict)
    335         url = self.url(format_dict)
    336 
--> 337         shapefile_online = self._urlopen(url)
    338 
    339         zfh = ZipFile(io.BytesIO(shapefile_online.read()), 'r')

~/.local/lib/python3.8/site-packages/cartopy/io/__init__.py in _urlopen(self, url)
    240         """
    241         warnings.warn('Downloading: {}'.format(url), DownloadWarning)
--> 242         return urlopen(url)
    243 
    244     @staticmethod

/usr/lib/python3.8/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    220     else:
    221         opener = _opener
--> 222     return opener.open(url, data, timeout)
    223 
    224 def install_opener(opener):

/usr/lib/python3.8/urllib/request.py in open(self, fullurl, data, timeout)
    523 
    524         sys.audit('urllib.Request', req.full_url, req.data, req.headers, req.get_method())
--> 525         response = self._open(req, data)
    526 
    527         # post-process response

/usr/lib/python3.8/urllib/request.py in _open(self, req, data)
    540 
    541         protocol = req.type
--> 542         result = self._call_chain(self.handle_open, protocol, protocol +
    543                                   '_open', req)
    544         if result:

/usr/lib/python3.8/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    500         for handler in handlers:
    501             func = getattr(handler, meth_name)
--> 502             result = func(*args)
    503             if result is not None:
    504                 return result

/usr/lib/python3.8/urllib/request.py in https_open(self, req)
   1395 
   1396         def https_open(self, req):
-> 1397             return self.do_open(http.client.HTTPSConnection, req,
   1398                 context=self._context, check_hostname=self._check_hostname)
   1399 

/usr/lib/python3.8/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
   1355                           encode_chunked=req.has_header('Transfer-encoding'))
   1356             except OSError as err: # timeout error
-> 1357                 raise URLError(err)
   1358             r = h.getresponse()
   1359         except:

URLError: <urlopen error [Errno 110] Connection timed out>
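
The frames above show cartopy trying to download a Natural Earth shapefile at draw time (the DownloadWarning in cartopy/io), which times out on Binder. A hedged sketch of pre-fetching the data while the network is available, so that drawing never needs it (the resolution/category/name combination is illustrative):

import cartopy.io.shapereader as shapereader

# Downloads the shapefile into cartopy's data_dir on first call;
# later map renders read it from disk instead of the network.
path = shapereader.natural_earth(resolution="50m",
                                 category="physical",
                                 name="land")
print(path)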

Links to all partners

I have noticed that in parse.py, lines 58-78, you list some of the partners and their hyperlinks. However, I know EDCS is based on 37 partners in total, while the code in lines 58-78 only mentions 4 or 5. Could this be a potential problem, or will all the partners be scraped in the end?

Missing road shapefiles

ERROR message:

Generate New Maps!
Starting Map Generation
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/Github/EpigraphyScraperNotebook/map_interface.py in map_on_button_clicked(b)
     14     def map_on_button_clicked(b):
     15         print("Starting Map Generation")
---> 16         make_map.main()
     17     map_button.on_click(map_on_button_clicked)

~/Github/EpigraphyScraperNotebook/make_map.py in main()
    179   #https://cmdlinetips.com/2018/02/how-to-subset-pandas-dataframe-based-on-values-of-a-column/
    180 
--> 181   roads_3857 = geopandas.read_file(ROMAN_ROADS_SHP).to_crs(epsg=3857)
    182   provinces_3857 = geopandas.read_file(PROVINCES_SHP).to_crs(epsg=3857)
    183 

~/Github/EpigraphyScraperNotebook/.direnv/python-3.8.2/lib/python3.8/site-packages/geopandas/geodataframe.py in to_crs(self, crs, epsg, inplace)
    814         else:
    815             df = self.copy()
--> 816         geom = df.geometry.to_crs(crs=crs, epsg=epsg)
    817         df.geometry = geom
    818         df.crs = geom.crs

~/Github/EpigraphyScraperNotebook/.direnv/python-3.8.2/lib/python3.8/site-packages/geopandas/geoseries.py in to_crs(self, crs, epsg)
    541         transformer = Transformer.from_crs(self.crs, crs, always_xy=True)
    542 
--> 543         new_data = vectorized.transform(self.values.data, transformer.transform)
    544         return GeoSeries(
    545             GeometryArray(new_data), crs=crs, index=self.index, name=self.name

~/Github/EpigraphyScraperNotebook/.direnv/python-3.8.2/lib/python3.8/site-packages/geopandas/_vectorized.py in transform(data, func)
    888         for i in range(n):
    889             geom = data[i]
--> 890             result[i] = transform(func, geom)
    891 
    892         return result

~/Github/EpigraphyScraperNotebook/.direnv/python-3.8.2/lib/python3.8/site-packages/shapely/ops.py in transform(func, geom)
    225     also satisfy the requirements for `func`.
    226     """
--> 227     if geom.is_empty:
    228         return geom
    229     if geom.type in ('Point', 'LineString', 'LinearRing', 'Polygon'):

AttributeError: 'NoneType' object has no attribute 'is_empty'
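
The bottom frame shows shapely's transform receiving a None geometry, i.e. the roads shapefile loads but contains features with missing geometry. A hedged sketch of a guard before reprojecting (the constant name follows the traceback; the path is an assumption, the real one lives in make_map.py):

import geopandas

ROMAN_ROADS_SHP = "map_data/roman_roads.shp"  # hypothetical path

roads = geopandas.read_file(ROMAN_ROADS_SHP)
# Drop features whose geometry is None before to_crs(); a None geometry
# is what raises "'NoneType' object has no attribute 'is_empty'".
roads = roads[roads.geometry.notna()]
roads_3857 = roads.to_crs(epsg=3857)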

Size of dots bigger for small searches

If searching for a small sample of inscriptions, e.g. oracul, it is hard to find the dots on the maps. If they were bigger blobs they would be easier to see and to use for a publication etc.

Obviously, once a search returns 200 or so inscriptions, having bigger dots becomes problematic.
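
A hedged sketch of one way to do this, scaling the marker size down once the result set grows (the thresholds are arbitrary):

def marker_size(n_inscriptions, big=60, small=12, cutoff=200):
    """Big, easy-to-spot markers for small searches; small ones once the
    map gets crowded. Thresholds are illustrative, not tuned."""
    return big if n_inscriptions < cutoff else small

# e.g. points.plot(ax=ax, markersize=marker_size(len(points)))
print(marker_size(5), marker_size(500))  # 60 12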

Mapper creates map only from the first scrape

In the second or any subsequent scrape, the following error message is generated:
Starting Map Generation

No old maps to move from output_maps to old_maps.
Rendering: output/2021-07-05-term1_tumulus-71.tsv
Making maps...

NameError Traceback (most recent call last)
~/map_interface.py in map_on_button_clicked(b)
49
50 with out:
---> 51 make_map.main()
52 datestring=datetime.datetime.now().strftime("%Y%m%d")
53 output_filename=f"epigraphy_scraper_maps_output_{datestring}"

~/make_map.py in main()
226 for file in glob.glob(f"{DATA_DIR}/*.tsv"):
227 print(f"Rendering: {file}")
--> 228 makeMap(file, roads_3857, provinces_3857, cities_geodataframe_3857)
229 makeMap(file, roads_3857, provinces_3857, cities_geodataframe_3857, cities=False, roads=False)
230 shutil.move(file, f"already_mapped_data/{file}")

/usr/local/lib/python3.8/dist-packages/yaspin/core.py in inner(*args, **kwargs)
124 def inner(*args, **kwargs):
125 with self:
--> 126 return fn(*args, **kwargs)
127
128 return inner

~/make_map.py in makeMap(data_file, roads_3857, provinces_3857, cities_geodataframe_3857, provinces, roads, cities)
126 @yaspin(text="Making maps...")
127 def makeMap(data_file, roads_3857, provinces_3857, cities_geodataframe_3857, provinces=True, roads=True, cities=True):
--> 128 point_dataframe_3857 = makeDataframe(data_file)
129
130

~/make_map.py in makeDataframe(data_file, epsg)
106 # Handles multiline columns cleanly.
107 data_filename = os.path.basename(data_file)
--> 108 print(f"Making {data_filename}\n\troads: {roads}\n\tprovinces: {provinces}\n\tcities: {cities}\n")
109 import_rows = extract(data_file)
110 import_dataframe = pandas.DataFrame(import_rows)

NameError: name 'roads' is not defined

Starting Map Generation

No old maps to move from output_maps to old_maps.
Rendering: output/2021-07-05-term1_%+province_Aegyptus-1071.tsv
Making maps...

NameError Traceback (most recent call last)
~/map_interface.py in map_on_button_clicked(b)
49
50 with out:
---> 51 make_map.main()
52 datestring=datetime.datetime.now().strftime("%Y%m%d")
53 output_filename=f"epigraphy_scraper_maps_output_{datestring}"

~/make_map.py in main()
226 for file in glob.glob(f"{DATA_DIR}/*.tsv"):
227 print(f"Rendering: {file}")
--> 228 makeMap(file, roads_3857, provinces_3857, cities_geodataframe_3857)
229 makeMap(file, roads_3857, provinces_3857, cities_geodataframe_3857, cities=False, roads=False)
230 shutil.move(file, f"already_mapped_data/{file}")

/usr/local/lib/python3.8/dist-packages/yaspin/core.py in inner(*args, **kwargs)
124 def inner(*args, **kwargs):
125 with self:
--> 126 return fn(*args, **kwargs)
127
128 return inner

~/make_map.py in makeMap(data_file, roads_3857, provinces_3857, cities_geodataframe_3857, provinces, roads, cities)
126 @yaspin(text="Making maps...")
127 def makeMap(data_file, roads_3857, provinces_3857, cities_geodataframe_3857, provinces=True, roads=True, cities=True):
--> 128 point_dataframe_3857 = makeDataframe(data_file)
129
130

~/make_map.py in makeDataframe(data_file, epsg)
106 # Handles multiline columns cleanly.
107 data_filename = os.path.basename(data_file)
--> 108 print(f"Making {data_filename}\n\troads: {roads}\n\tprovinces: {provinces}\n\tcities: {cities}\n")
109 import_rows = extract(data_file)
110 import_dataframe = pandas.DataFrame(import_rows)

NameError: name 'roads' is not defined
And if I hit the refresh button:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
in
59
60 for file in glob.glob(f"{DATA_DIR}/*.tsv"):
---> 61 df = makeDataframe(file, epsg=4326)
62 #pprint(df)
63 map_xmin, map_ymin, map_xmax, map_ymax = df.total_bounds

in makeDataframe(data_file, epsg)
48 point_geodataframe['Links'] = point_geodataframe['Links'].apply(linkify)
49
---> 50 point_geodataframe['inscription'] = point_geodataframe['inscription'].apply(lambda x: textwrap.shorten(x, width=255))
51
52 point_geodataframe_3857 = point_geodataframe.to_crs(epsg=epsg)

/usr/local/lib/python3.8/dist-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
4136 else:
4137 values = self.astype(object)._values
-> 4138 mapped = lib.map_infer(values, f, convert=convert_dtype)
4139
4140 if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

in (x)
48 point_geodataframe['Links'] = point_geodataframe['Links'].apply(linkify)
49
---> 50 point_geodataframe['inscription'] = point_geodataframe['inscription'].apply(lambda x: textwrap.shorten(x, width=255))
51
52 point_geodataframe_3857 = point_geodataframe.to_crs(epsg=epsg)

/usr/lib/python3.8/textwrap.py in shorten(text, width, **kwargs)
404 """
405 w = TextWrapper(width=width, max_lines=1, **kwargs)
--> 406 return w.fill(' '.join(text.strip().split()))
407
408

AttributeError: 'NoneType' object has no attribute 'strip'
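
Two distinct bugs show in these tracebacks: makeDataframe prints roads/provinces/cities names that are not in its scope (the NameError), and textwrap.shorten is applied to a missing inscription (the AttributeError on None). A hedged sketch of both fixes; the function names follow the tracebacks, the rest is illustrative:

import textwrap
import pandas as pd

def shorten_inscription(text, width=255):
    """textwrap.shorten(None, ...) is what raises "'NoneType' object has
    no attribute 'strip'", so guard against missing text first."""
    return textwrap.shorten(text, width=width) if isinstance(text, str) and text.strip() else ""

df = pd.DataFrame({"inscription": ["Imp(eratori) / Floriano / P(io) F(elici) Aug(usto)", None]})
df["inscription"] = df["inscription"].apply(shorten_inscription)

def makeDataframe(data_file, epsg=3857, roads=True, provinces=True, cities=True):
    """Take the layer flags as parameters instead of reading them from
    globals, which is what raised "NameError: name 'roads' is not defined"."""
    print(f"Making {data_file}\n\troads: {roads}\n\tprovinces: {provinces}\n\tcities: {cities}")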

Mapper desired outcome

In order to create a publishable map, ADD (see the sketch below this list):

  • scale
  • north arrow
  • editable title of the map, which could be also switched off if needed
  • BW option
  • make roads thinner
  • make the inscription markers either dots or triangles, not circles
  • output formats: JPG/PNG/TIFF and/or HTML
  • set DPI for the output
  • add all attributes to the popup label in the HTML version

Datasets:
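
A hedged matplotlib sketch covering the first few items in the list above: scale, north arrow, editable title, output format and DPI. It assumes an axes already plotted in a metric CRS such as EPSG:3857 and uses the third-party matplotlib-scalebar package; positions and sizes are illustrative:

import matplotlib.pyplot as plt
from matplotlib_scalebar.scalebar import ScaleBar  # pip install matplotlib-scalebar

fig, ax = plt.subplots()
# plot roads/provinces/inscription points here, in EPSG:3857

# Scale bar: in EPSG:3857 one axis unit is one metre (only approximately,
# away from the equator).
ax.add_artist(ScaleBar(1, units="m", location="lower left"))

# North arrow: a simple annotation in axes coordinates.
ax.annotate("N", xy=(0.95, 0.95), xytext=(0.95, 0.85),
            xycoords="axes fraction", textcoords="axes fraction",
            ha="center", va="center", fontsize=14,
            arrowprops=dict(facecolor="black", width=4, headwidth=10))

map_title = "Inscriptions mentioning tumulus"  # editable; leave empty to switch off
if map_title:
    ax.set_title(map_title)

fig.savefig("map.png", dpi=300)  # PNG/JPG/TIFF chosen by extension; DPI settable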

Error loading binder on Chrome

Error loading mqAncientHistory/Lat-Epig/HEAD!

Received the above error message on Binder when following the link from GitHub.
Perhaps the same issue as #44 (comment)?

My browser: Chrome Version 100.0.4896.127 (Official Build) (64-bit)

Error from build log

"Getting requirements to build wheel: finished with status 'error'
ERROR: Command errored out with exit status 1:
command: /usr/bin/python3 /tmp/tmpbzajv9ap_in_process.py get_requires_for_build_wheel /tmp/tmpsz3eowpd
cwd: /tmp/pip-install-xjlso9i9/cartopy_95182bc0365149ae9cfa5f29895cd500
Complete output (1 lines):
Proj version 7.2.1 is installed, but cartopy requires at least version 8.0.0.
WARNING: Discarding https://files.pythonhosted.org/packages/f6/55/1e1c737dc9436b320deead73d1c455ddbb74b8b6992081863492f6f6378a/Cartopy-0.20.2.tar.gz#sha256=4d08c198ecaa50a6a6b109d0f14c070e813defc046a83ac5d7ab494f85599e35 (from https://pypi.org/simple/cartopy/) (requires-python:>=3.7). Command errored out with exit status 1: /usr/bin/python3 /tmp/tmpbzajv9ap_in_process.py get_requires_for_build_wheel /tmp/tmpsz3eowpd Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement Cartopy==0.20.2
ERROR: No matching distribution found for Cartopy==0.20.2

Removing intermediate container a847dbbee9a1
The command '/bin/sh -c python3 -m pip install --user --no-cache-dir -r requirements.txt' returned a non-zero code: 1
Built image, launching...
Failed to connect to event stream"

Full build log

blob:https://mybinder.org/abe308a7-13bf-419f-bad5-2cd1f06247cb

Changing Inscription Genus error

Changing the Inscription Genus means that the generated TSV will not open. A new window opens with the error:

"400 Bad Request
nginx/1.19.2"

Query:
Province:Dalmatia
Inscription Genus...: milites
