pnoll1 / osmand_map_creation
OSM data + open address data compiled for use in OSMAnd
License: GNU General Public License v3.0
I've been trying to figure out how to generate a map with the openaddresses au/countrywide address set (addresses are generated for the Australian Government but not licensed appropriately for OpenStreetMap). I'm not sure where to begin.
Looking at the processing.sh I'm presuming I would end up with a command similar to this?
cd /home/pat/projects/osmand_map_creation/osmand_osm/osm/;python3 processing.py au:act au:nsw au:vic au:qld au:tas au:nt au:sa au:wa --normal;cd /home/pat/projects/osmand_map_creation/;java -Djava.util.logging.config.file=logging.properties -Xms64M -Xmx22G -cp "./OsmAndMapCreator.jar:lib/OsmAnd-core.jar:./lib/*.jar" net.osmand.util.IndexBatchCreator batch.xml;cd /home/pat/projects/osmand_map_creation/osmand_osm;mv *.pbf osm/
https://results.openaddresses.io/sources/au/countrywide
https://download.geofabrik.de/australia-oceania/australia.html
There is no obf available at opensupermaps.com for Massachusetts...
The slice function overwrites the Python builtin of the same name. Consider renaming the function.
PR has multiple addresses with fields that violate OSM's 255-character limit. Add a filter to find and delete addresses over the limit.
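A minimal sketch of such a filter, assuming each address is available as a plain dict of field values. The project keeps addresses in a database, so the same check could equally be done in SQL; the function names here are hypothetical.

```python
OSM_MAX_TAG_LENGTH = 255  # OSM limits tag keys and values to 255 characters


def within_osm_limit(address: dict) -> bool:
    """Return True if every non-null field fits within the OSM length limit."""
    return all(
        len(str(value)) <= OSM_MAX_TAG_LENGTH
        for value in address.values()
        if value is not None
    )


def drop_oversized(addresses):
    """Split addresses into (kept, dropped) so dropped ones can be logged."""
    kept, dropped = [], []
    for address in addresses:
        (kept if within_osm_limit(address) else dropped).append(address)
    return kept, dropped
```

Returning the dropped addresses as well makes it possible to write them to the log instead of deleting them silently.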
lexington-fayette.vrt
causes failure of KY and OK
Data is written to the db, but - changes to _.
processing_2022-06-23T12:05:20.555361.log:
2022-06-25 10:04:12,929 writing osm file for ca/on/oxford-addresses-county.geojson
2022-06-25 10:04:15,981 writing osm file for ca/on/city_of_niagara_falls-addresses-city.geojson
2022-06-25 10:04:21,356 writing osm file for ca/on/city_of_vaughan-addresses-city.geojson
2022-06-25 10:04:23,988 writing osm file for ca/on/city_of_cambridge-addresses-city.geojson
2022-06-25 10:04:26,452 writing osm file for ca/on/northumberland-addresses-county.geojson
2022-06-25 10:04:26,699 pg2osm fileinfo failure: Geometry error: Invalid location. Usually this means a node was missing from the input data.
Multiple other failures in log like this for other areas
spokane table has 213266 records, output is 211223.
Expanded addresses can't be compared directly with non-expanded db addresses. ogr2osm output with the default translation errors out when importing to postgres: Osm2pgsql failed due to ERROR: XML parsing error at line 1, column 1: not well-formed (invalid token).
tx and fl fail with same errors in output:
Node ID twice in input. Maybe you are using a history or change file?
This command expects the input file to be ordered: First nodes in order of ID,
then ways in order of ID, then relations in order of ID.
Florida prints this twice, Texas prints three times.
Currently everything is logged, when errors and timestamps are almost always the useful parts. The logging module should make it easy to hide low-level info, making errors easier to find.
The ODbL link on the overview is out of date. I believe this is the new link for it: https://opendatacommons.org/licenses/odbl/
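One way this could look with the standard logging module is sketched below. The logger name "processing" and the verbose switch are assumptions, not part of the existing script.

```python
import logging


def configure_logging(log_path, verbose=False):
    """Write timestamped records to log_path; hide DEBUG/INFO unless verbose."""
    logger = logging.getLogger("processing")  # hypothetical logger name
    logger.setLevel(logging.DEBUG if verbose else logging.WARNING)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.FileHandler(log_path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    logger.propagate = False  # keep records out of the root logger
    return logger
```

With verbose left off, only warnings and errors reach the file, each with a timestamp and level, so failures are not buried under progress messages.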
Basic errors making it to master due to forgetting to test.
Codebase is designed with testing in mind so a basic implementation should be easy.
Could start with files with known outputs for a quick start and move to proper minimal cases later. Automating a snapshot of last month's us:ri run would check the case where the build should finish; us:pr could handle the case where the build should fail.
This would allow logging from ogr2osm giving more fine grained info when something goes wrong. Info is here: https://github.com/roelderickx/ogr2osm#as-a-library
OA data already included in downloads, need to check if state_expander and update_osm will work. Does geofabrik have extract for this?
This month's run may have errored out due to lack of space. The project currently takes ~615 GB. Buy a new NVMe drive and move personal files over to it.
PyPy gives a drop-in way to speed up Python code.
Dependency management is currently a mishmash of manual updates, pip, and updates tied to Debian Unstable.
Options to stabilize build environment:
Running --update-oa returns a 2.3K file. Running the curl command returns the same result.
Downloading using Firefox appears to work.
Opening the file with a text editor shows it's a 403 error. Most likely a user agent block; investigate the openaddresses docs and/or reach out to openaddresses for clarification.
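If the cause is indeed a user agent block (unconfirmed), a download that sends an explicit User-Agent might look like the sketch below. The user agent string and function names are hypothetical. Note that urllib.request.urlopen raises HTTPError on a 403, so an error page is not saved as if it were data.

```python
import urllib.request


def build_request(url, user_agent):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, dest, user_agent="osmand_map_creation-example/0.1"):
    """Save url to dest; an HTTP error status raises before dest is opened."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        while chunk := response.read(1 << 20):
            out.write(chunk)
```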
add md5 check for extract downloads
properly raise existing errors so script stops when incomplete/bad file detected
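A sketch of both points, assuming the download server publishes a checksum in md5sum format ("<hex digest>  <filename>") alongside each extract. The function names are hypothetical.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file without loading it all in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, md5_file_text):
    """Raise ValueError when path does not match the published checksum.

    md5_file_text is assumed to be in md5sum format: '<hex digest>  <name>'.
    """
    expected = md5_file_text.split()[0].lower()
    actual = md5_of(path)
    if actual != expected:
        raise ValueError(
            f"md5 mismatch for {path}: expected {expected}, got {actual}"
        )
```

Raising instead of logging means the script stops as soon as an incomplete or corrupt file is detected.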
Several of the latest download attempts have failed at partial completion. Need to add error handling; resume download if possible.
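Resuming could be sketched with an HTTP Range request, assuming the server supports ranges. That is not confirmed for any of the download hosts, so the code falls back to a full download unless the server answers 206. The function names are hypothetical.

```python
import os
import urllib.request


def resume_request(url, partial_path):
    """Build a request for only the bytes missing from partial_path."""
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    request = urllib.request.Request(url)
    if offset:
        request.add_header("Range", f"bytes={offset}-")
    return request, offset


def resume_download(url, partial_path):
    """Append to a partial file when the server honours the range request."""
    request, offset = resume_request(url, partial_path)
    # A fully downloaded file makes the range unsatisfiable (HTTP 416),
    # which urlopen surfaces as HTTPError.
    with urllib.request.urlopen(request) as response:
        # 206 means partial content; any other status restarts from zero.
        mode = "ab" if offset and response.status == 206 else "wb"
        with open(partial_path, mode) as out:
            while chunk := response.read(1 << 20):
                out.write(chunk)
```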
make hardcoded values configurable and breakout into separate file
log:
2022-01-25 00:24:03,968 writing osm file for us/pr/statewide-addresses-state.geojson
2022-01-25 00:24:08,055 us_pr_statewide_addresses_state is hashes only
2022-01-25 00:24:08,055 Expecting value: line 1 column 1 (char 0)
2022-01-25 00:24:29,071 osm update finished for us:pr
2022-01-25 00:24:30,318 us:pr Merge Finished
terminal error:
Traceback (most recent call last):
File "/home/pat/projects/osmand_map_creation/osmand_osm/osm/./processing.py", line 261, in prep_for_qa
stats = run('osmium fileinfo -ej {0}'.format(working_area.master_list[-1].path_osm), shell=True, capture_output=True ,check=True , encoding='utf8')
IndexError: list index out of range
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.9/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/usr/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "/home/pat/projects/osmand_map_creation/osmand_osm/osm/./processing.py", line 423, in run_all
stats, stats_area, stats_final = prep_for_qa(working_area)
File "/home/pat/projects/osmand_map_creation/osmand_osm/osm/./processing.py", line 263, in prep_for_qa
logging.error(e.stderr)
AttributeError: 'IndexError' object has no attribute 'stderr'
pg2osm appears to run twice when there is only one source: correctly the first time, then erroring out on the second, which removes the source from the master list. prep_for_qa then hits an unrecoverable error because it requires a nonempty list. The error is most likely in creating the master list.
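Independently of the master list bug, the second exception in the traceback comes from reading e.stderr on an IndexError. A sketch of a more defensive version follows; file_stats is a hypothetical helper, while master_list and path_osm are taken from the traceback.

```python
import logging
from subprocess import CalledProcessError, run


def file_stats(master_list):
    """Return osmium fileinfo JSON text for the last file, or None on failure."""
    if not master_list:
        # Guard the empty case instead of letting [-1] raise IndexError.
        logging.error("master list is empty; no file to inspect")
        return None
    try:
        result = run(
            ["osmium", "fileinfo", "-ej", master_list[-1].path_osm],
            capture_output=True, check=True, encoding="utf8",
        )
    except CalledProcessError as e:
        # Only CalledProcessError carries stderr, so catch it specifically.
        logging.error(e.stderr)
        return None
    return result.stdout
```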
Data for lots of chains. The locations are largely redundant, but opening hours would be a big nice to have. Websites and phone numbers may be useful as well.
Data is organized by scraper which relates to a chain or a local branch of a chain. This data will need to be split by state/province to match current merge methodology.
Is it possible to have one file that simply adds every single one in say USA?
So we don't have to download and import for every state.
When trying to load test files with \x1b in them:
ERROR 1: JSON parsing error: invalid string sequence (at offset 85)
Adding -skipfailures to ogr2ogr call seems to drop address with issue.
results.openaddresses.io and its CSV format are no longer updated. Need to update to the JSON format at batch.openaddresses.io.
Thoughts
us:pr contains addresses with no geometry. Need to investigate what happens to that data.
{"type":"Feature","properties":{"hash":"bb060ebcb2feae72","number":"359","street":"DEGETAU","unit":"","city":"SAN JUAN","district":"SANTURCE","region":"PR","postcode":"915","id":""},"geometry":null}
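To see how many such records a source contains, the features could be split before import. This sketch assumes the file holds one GeoJSON feature per line, as in the sample above; the function name is hypothetical.

```python
import json


def split_by_geometry(lines):
    """Split line-delimited GeoJSON features into (located, unlocated)."""
    located, unlocated = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        feature = json.loads(line)
        # "geometry": null parses to None, which is falsy.
        (located if feature.get("geometry") else unlocated).append(feature)
    return located, unlocated
```

Comparing the two counts against the database and the output file would show whether unlocated addresses are dropped or written with an invalid location.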
Script crashed during build. Adding another slice to MX will likely fix.
When the map creator calls were added, setup of the program should have been added to the script. Also need to add creation of the working directories and of batch.xml.
Is it possible to visibly hide all the numbers on the map?
Gets extremely cluttered, not seeing anything anywhere in settings
Florida:
Node ID twice in input. Maybe you are using a history or change file?
This command expects the input file to be ordered: First nodes in order of ID,
then ways in order of ID, then relations in order of ID.
Similar to #5 .
OA has tons of data that OSM doesn't for this region.
File is 4G.
Could rewrite history to remove history before basing off openaddresses. This would remove commits that included data files which are probably taking up most of the space. Upside is condensed history that may help with new contributors. Downside is that may create havoc for anyone that has built off the code.
Upstream has an issue open: osmandapp/OsmAnd-iOS#1766
filtering us_tx_dallas
Traceback (most recent call last):
File "/opt/ogr2osm/ogr2osm.py", line 792, in <module>
main()
File "/opt/ogr2osm/ogr2osm.py", line 783, in main
output()
File "/opt/ogr2osm/ogr2osm.py", line 517, in output
tag = etree.Element('tag', {'k':key, 'v':value})
File "src/lxml/etree.pyx", line 3007, in lxml.etree.Element
File "src/lxml/apihelpers.pxi", line 131, in lxml.etree._makeElement
File "src/lxml/apihelpers.pxi", line 119, in lxml.etree._makeElement
File "src/lxml/apihelpers.pxi", line 299, in lxml.etree._initNodeAttributes
File "src/lxml/apihelpers.pxi", line 310, in lxml.etree._addAttributeToNode
File "src/lxml/apihelpers.pxi", line 1493, in lxml.etree._utf8
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
need to identify and determine if filters can/should be added to catch
code from clean_file_names() and below didn't run
--normal and --all options don't work. Argparse namespace object doesn't seem to allow setting attributes after creation.
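In general, attributes on an argparse.Namespace can be assigned after parsing, so the cause may lie elsewhere in the script. A sketch of expanding a shortcut flag that way is below. Only --normal and --update-oa appear in these issues; --update-osm is a hypothetical flag, and which steps --normal should enable is an assumption.

```python
import argparse


def parse_args(argv):
    """Parse arguments and expand --normal into the individual step flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument("areas", nargs="+")
    parser.add_argument("--normal", action="store_true")
    parser.add_argument("--update-oa", action="store_true")
    parser.add_argument("--update-osm", action="store_true")  # hypothetical
    args = parser.parse_args(argv)
    if args.normal:
        # Plain attribute assignment works on a Namespace.
        args.update_oa = True
        args.update_osm = True
    return args
```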
OsmAnd map creator has intermittent issues with large files and/or files with large numbers of addresses added from openaddresses. A basic slicer that splits the country/subregion into a chosen number of parts, with proper names based on area, would save the manual work of creating configs to slice properly (based on the amount of data).
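A very basic slicer could cut a bounding box into equal-width strips, as sketched below. This is an illustration only: equal width does not balance the amount of data per slice, the numbered names stand in for the area-based names described above, and the function name is hypothetical.

```python
def slice_bbox(name, bbox, parts):
    """Split bbox (west, south, east, north) into `parts` equal-width strips.

    Returns a list of (slice_name, (west, south, east, north)) tuples.
    """
    west, south, east, north = bbox
    width = (east - west) / parts
    slices = []
    for i in range(parts):
        left = west + i * width
        # Pin the last strip to the exact eastern edge to avoid float drift.
        right = east if i == parts - 1 else left + width
        slices.append((f"{name}_{i + 1}", (left, south, right, north)))
    return slices
```

Each resulting box could then be handed to an extract tool, for example as the bounding box for osmium extract.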
currently use a shell file to batch builds for osmand_map_creator. Bring that into the python processing file for easier maintenance and flexibility.
batching config should be done in config file
iso country_iso subdivision(if applicable) is most likely. Need to find different way to communicate status to website or get rid of altogether.
Helps automation for clients, simplifies maintenance
Texas and Florida files were moved when they should have failed quality check for having multiple nodes with same id.
Expected behavior: File fails quality check, ready_to_move=False, file stays in working folder so good file from previous build is retained.
Hello,
I wanted to thank you for maintaining this project. It has allowed me to use OSMAnd+ as my primary navigator with no problems on my GrapheneOS phone free of google services. Do you think in the future there could be a potential F-Droid app that automatically adds or auto-updates downloaded maps without having to manually add and update them every time they are updated? Thanks again.
-rw-r--r-- 1 pat pat 73M May 22 08:29 us_ar_alpha.osm.pbf
-rw-r--r-- 1 pat pat 59M Jun 21 09:17 us_ar.osm.pbf
A smaller file typically indicates that OpenAddresses data had a regression and has fewer addresses than before. OSM data typically grows, so files should get bigger each month.
Adding a check after the pbf is built to compare it with last month's file in the osmand_osm/osm folder would be a good start and easy to implement. Add a warning to the log flagging that the file is likely worse than last month's.
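A minimal sketch of that check, assuming the new and previous files are both on disk when it runs. The function name and message wording are hypothetical.

```python
import logging
import os


def warn_if_smaller(new_path, previous_path):
    """Log a warning and return True when the new pbf is smaller than the old."""
    if not os.path.exists(previous_path):
        return False  # first build for this area, nothing to compare
    new_size = os.path.getsize(new_path)
    old_size = os.path.getsize(previous_path)
    if new_size < old_size:
        logging.warning(
            "%s is %d bytes, down from %d; likely worse than last month's file",
            new_path, new_size, old_size,
        )
        return True
    return False
```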
should write failed source to log file and skip source to continue processing
current workaround would be to move source or change .vrt file to another extension
Some sources have failed with ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters from xml writer.
Luna NM had \x1b. Statewide FL has \x02.
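One possible filter strips the C0 control characters that XML 1.0 does not allow, keeping tab, newline and carriage return. This is a sketch only: it covers the \x1b and \x02 cases above but not every string the XML writer might reject, and the function name is hypothetical.

```python
import re

# C0 control characters other than tab (\x09), newline (\x0a) and
# carriage return (\x0d) are not valid in XML 1.0.
_XML_INVALID = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_control_chars(value: str) -> str:
    """Remove control characters that the XML writer rejects."""
    return _XML_INVALID.sub("", value)
```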
Geofabrik provides no subregions for some countries eg mx. Using 3166-1 code as shortcut to build many 3166-2 areas is not currently planned.
Much of the work has already been done with switching to working_area object for 3166-2 support. Need to work through edge cases and find good test area.
Geofabrik's extracts limit which areas can be built easily. In some cases extracts don't follow country codes, e.g. the Dominican Republic and Haiti being combined. Some countries become unbuildable once addresses are added, e.g. Spain, or nearly unbuildable, e.g. Mexico.
An extract system that follows iso area codes would simplify adding new countries and remove reliance on Geofabrik. The osm planet file could be downloaded and split.
With data from the useful links section, it should be possible to build with osmium extract.
Useful links:
https://github.com/datasets/country-codes
https://github.com/datasets/geo-countries
osmium merge errors on files that aren't ordered. OA address files are sorted backwards since that's how ogr2osm outputs.
Need to run osmium sort on address files or try to invert ogr2osm output so ids increase
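The first option could be wired into the Python script roughly as follows. The function names and file paths are hypothetical; the command itself is a plain osmium sort call with an output file.

```python
from subprocess import run


def sort_command(unsorted_path, sorted_path):
    """Build the osmium sort call that orders objects by type and ID."""
    return ["osmium", "sort", unsorted_path, "-o", sorted_path, "--overwrite"]


def sort_address_file(unsorted_path, sorted_path):
    """Run osmium sort; check=True raises CalledProcessError on failure."""
    run(sort_command(unsorted_path, sorted_path), check=True)
```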
Mitigates possible issues resulting from incomplete or invalid downloads. Probably seen in #17
Download Navi is a possible recommended download client and supports md5 or sha256.