Create a module (perhaps simply named `models`?) for quick and easy vectorization of data for machine learning.
Allow users to simply pass a list of `(lon, lat)` tuples or `shapely.geometry.Point` objects in the WGS84 CRS and get a NumPy array back.
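A minimal sketch of the input-normalization step such a module could use, assuming NumPy is available. The function name `to_lonlat_array` is hypothetical. Rather than importing shapely, it duck-types on the `.x`/`.y` attributes that `shapely.geometry.Point` exposes (longitude and latitude for WGS84 points), and it rejects values outside the WGS84 degree ranges, which also catches accidentally swapped `(lat, lon)` pairs whose longitude exceeds ±90°.

```python
from typing import Any, Iterable

import numpy as np


def to_lonlat_array(points: Iterable[Any]) -> np.ndarray:
    """Convert (lon, lat) pairs or Point-like objects to an (n, 2) float array.

    Objects with .x/.y attributes (e.g. shapely.geometry.Point) are read as
    (x=lon, y=lat); everything else is unpacked as a (lon, lat) pair.
    Coordinates must be WGS84 degrees (EPSG:4326).
    """
    coords = []
    for p in points:
        if hasattr(p, "x") and hasattr(p, "y"):
            coords.append((float(p.x), float(p.y)))
        else:
            lon, lat = p
            coords.append((float(lon), float(lat)))
    arr = np.asarray(coords, dtype=float).reshape(-1, 2)
    # Reject anything outside the valid WGS84 degree ranges (NaN fails too).
    lon_ok = np.all(np.abs(arr[:, 0]) <= 180.0)
    lat_ok = np.all(np.abs(arr[:, 1]) <= 90.0)
    if not (lon_ok and lat_ok):
        raise ValueError("expected WGS84 (lon, lat) coordinates in degrees")
    return arr
```

For example, `to_lonlat_array([(21.01, 52.23), (-0.13, 51.51)])` returns an array of shape `(2, 2)`, which every model in the module could then consume.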
No GIS knowledge should be required: users just enrich their data with predefined models. Depending on the model, the module may download an existing model from a server or run a predefined pipeline in the background, reporting progress with a progress bar and logging.
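One possible shape for this "download or build" behaviour, sketched with the standard library only. `load_model`, the `download` callable, the pickle-based cache layout and the logger name `models` are all hypothetical; progress is reported as `[i/n]` log lines, where a real implementation might use a progress-bar library such as tqdm instead.

```python
import logging
import pickle
from pathlib import Path
from typing import Any, Callable, Optional, Sequence

logger = logging.getLogger("models")  # hypothetical package logger name


def load_model(
    name: str,
    cache_dir: Path,
    download: Optional[Callable[[Path], None]] = None,
    pipeline: Sequence[Callable[[Any], Any]] = (),
) -> Any:
    """Return model `name` from cache, else download it, else build it locally.

    `download` is a hypothetical callable that writes a pickled model to the
    given path; `pipeline` is a list of steps, each receiving the previous
    step's output (None for the first step).
    """
    path = Path(cache_dir) / f"{name}.pkl"
    if path.exists():
        logger.info("loading cached model %r from %s", name, path)
        with path.open("rb") as fh:
            return pickle.load(fh)

    path.parent.mkdir(parents=True, exist_ok=True)
    if download is not None:
        try:
            logger.info("downloading model %r", name)
            download(path)
            with path.open("rb") as fh:
                return pickle.load(fh)
        except OSError as exc:
            logger.warning("download failed (%s); running local pipeline", exc)

    state: Any = None
    for i, step in enumerate(pipeline, start=1):
        # Plain-text progress; a real implementation might draw a progress bar.
        logger.info("[%d/%d] %s", i, len(pipeline), getattr(step, "__name__", step))
        state = step(state)

    # Cache the built model so the next call skips both download and pipeline.
    with path.open("wb") as fh:
        pickle.dump(state, fh)
    return state
```

Calling it twice with the same `cache_dir` runs the download or pipeline once and then loads the cached result.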
Examples of existing APIs to follow:
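word2vec (gensim):

```python
from gensim.models import Word2Vec

# all_words: an iterable of tokenized sentences, e.g. [["canada", "vancouver"], ["toronto"]]
word2vec = Word2Vec(all_words, min_count=2)
```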
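The Word2Vec call trains the model in its constructor and drops rare tokens via `min_count`. A hypothetical geographic analogue is sketched below: fixed-size grid cells play the role of "words", cells seen fewer than `min_count` times are discarded, and `transform` one-hot encodes points against the remaining cells. The class name `GridOneHot`, the `cell_deg` parameter and the `transform` method are illustrative assumptions, and one-hot grid binning is only a simple stand-in, not a proposal for the actual model.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

import numpy as np

Cell = Tuple[int, int]


class GridOneHot:
    """Hypothetical Word2Vec-style model: fitting happens in the constructor."""

    def __init__(self, points: Iterable[Tuple[float, float]],
                 cell_deg: float = 1.0, min_count: int = 2):
        self.cell_deg = cell_deg
        counts = Counter(self._cell(lon, lat) for lon, lat in points)
        # Keep only cells seen at least min_count times, like Word2Vec's vocabulary.
        kept = sorted(c for c, n in counts.items() if n >= min_count)
        self.vocab: Dict[Cell, int] = {cell: i for i, cell in enumerate(kept)}

    def _cell(self, lon: float, lat: float) -> Cell:
        return (math.floor(lon / self.cell_deg), math.floor(lat / self.cell_deg))

    def transform(self, points: Iterable[Tuple[float, float]]) -> np.ndarray:
        """Return an (n, len(vocab)) one-hot array; unseen cells map to all zeros."""
        pts: List[Tuple[float, float]] = list(points)
        out = np.zeros((len(pts), len(self.vocab)), dtype=np.float32)
        for row, (lon, lat) in enumerate(pts):
            idx = self.vocab.get(self._cell(lon, lat))
            if idx is not None:
                out[row, idx] = 1.0
        return out


if __name__ == "__main__":
    train = [(21.01, 52.23), (21.05, 52.20), (-0.13, 51.51)]
    model = GridOneHot(train, cell_deg=1.0, min_count=2)
    print(model.transform([(21.3, 52.9), (-0.2, 51.4)]).tolist())  # [[1.0], [0.0]]
```

Here the single training point near (-0.13, 51.51) falls below `min_count`, so its cell is dropped and later points in it encode to all zeros.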
embeddings (the `embeddings` package):

```python
from embeddings import GloveEmbedding, FastTextEmbedding, KazumaCharEmbedding, ConcatEmbedding

g = GloveEmbedding('common_crawl_840', d_emb=300, show_progress=True)
f = FastTextEmbedding()
k = KazumaCharEmbedding()
c = ConcatEmbedding([g, f, k])
for w in ['canada', 'vancouver', 'toronto']:
    print('embedding {}'.format(w))
    print(g.emb(w))
    print(f.emb(w))
    print(k.emb(w))
    print(c.emb(w))
```
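The `embeddings` example exposes one `.emb()` method per model plus a combinator. A hypothetical geographic counterpart could look like the sketch below: each class maps a `(lon, lat)` point to a small NumPy vector, and `GeoConcatEmbedding` concatenates their outputs in the same way `ConcatEmbedding` does above. The class names are assumptions; the unit-sphere encoding is a common way to keep points on either side of the antimeridian close together, which raw longitude does not do.

```python
import math
from typing import Sequence, Tuple

import numpy as np


class ScaledLonLatEmbedding:
    """Hypothetical 2-d embedding: lon/lat rescaled to [-1, 1]."""

    def emb(self, point: Tuple[float, float]) -> np.ndarray:
        lon, lat = point
        return np.array([lon / 180.0, lat / 90.0])


class UnitSphereEmbedding:
    """Hypothetical 3-d embedding: the point as (x, y, z) on the unit sphere."""

    def emb(self, point: Tuple[float, float]) -> np.ndarray:
        lon, lat = map(math.radians, point)
        return np.array([
            math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat),
        ])


class GeoConcatEmbedding:
    """Concatenate the vectors produced by several point embeddings."""

    def __init__(self, embeddings: Sequence):
        self.embeddings = list(embeddings)

    def emb(self, point: Tuple[float, float]) -> np.ndarray:
        return np.concatenate([e.emb(point) for e in self.embeddings])


if __name__ == "__main__":
    c = GeoConcatEmbedding([ScaledLonLatEmbedding(), UnitSphereEmbedding()])
    for p in [(21.01, 52.23), (-0.13, 51.51)]:
        print('embedding {}'.format(p))
        print(c.emb(p))  # 2 scaled values followed by 3 unit-sphere coordinates
```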