gitarchiveutils's People

Contributors

liushuaikobe

Forkers

pokerg

gitarchiveutils's Issues

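Counting the Payload of an Event

Some Events represent several local commits that the user pushed to the remote in one go.
The number of commits in the Payload should therefore be taken into account.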
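
As a minimal sketch of that idea, the function below counts commits per event instead of counting each event once. It assumes the event is a dict in the GitHub Archive style, where a `PushEvent` payload carries a `size` field and possibly a `commits` or `shas` list; these field names and the helper `commit_count` are assumptions, not taken from the project code.

```python
def commit_count(event):
    """Return how many commits an event represents (0 for non-push events)."""
    if event.get("type") != "PushEvent":
        return 0
    payload = event.get("payload") or {}
    size = payload.get("size")
    if isinstance(size, int) and size >= 0:
        # Assumed field: number of commits contained in the push.
        return size
    # Fall back to the length of a commit list if one is present.
    commits = payload.get("commits") or payload.get("shas") or []
    return len(commits)
```

Summing `commit_count(e)` over a day's events would then weight a push of five commits five times rather than once.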

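Rate limits when calling the GeoNames service

The current approach: when an exception occurs, switch to another account, sleep for one minute, then try to connect again.
However, the data being processed by the batch of coroutines that hit the exception may be lost. This may be related to the internals of the requests library, and it definitely needs further work.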
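
One possible way to keep failed items from being lost is to return them explicitly so the caller can re-queue them. The sketch below is hypothetical: `fetch` stands in for whatever function performs the lookup, `usernames` for the list of accounts, and `retriable` for the exception types worth retrying (for example `requests.exceptions.ConnectionError`). It runs sequentially for clarity rather than in coroutines.

```python
import time


def fetch_with_rotation(item, fetch, usernames, retriable=(IOError,),
                        wait=60, sleep=time.sleep):
    """Try each account in turn; return (True, result) or (False, None)."""
    for i, username in enumerate(usernames):
        try:
            return True, fetch(item, username)
        except retriable:
            # Pause before switching to the next account, if there is one.
            if i < len(usernames) - 1:
                sleep(wait)
    return False, None


def process_batch(items, fetch, usernames, **kwargs):
    """Return (results, failed); failed items can be re-queued later."""
    results, failed = {}, []
    for item in items:
        ok, value = fetch_with_rotation(item, fetch, usernames, **kwargs)
        if ok:
            results[item] = value
        else:
            failed.append(item)
    return results, failed
```

The `failed` list makes the loss visible: those place names can be written to a file or pushed back onto a queue instead of disappearing with the coroutine.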

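Extract the data-reading code into its own module as well

There are the following cases:

  • Reading during the normal daily task run
  • Exceptions
    • The download failed, so a directory of record files is specified manually
    • Normalization raised an exception, so records are re-read from the database manually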
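
The cases above could be served by a single entry point that hides where the records come from. This is only a sketch under assumptions: the mode names, the one-JSON-object-per-line `*.json` file layout, and the `db_query` callable are all hypothetical and not taken from the project.

```python
import json
import os


def read_from_directory(directory):
    """Yield one record per non-empty line from every *.json file."""
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(directory, name)) as fh:
            for line in fh:
                line = line.strip()
                if line:
                    yield json.loads(line)


def read_records(mode, directory=None, db_query=None):
    """Return an iterator of records for the given (hypothetical) mode."""
    if mode in ("daily", "directory"):
        # Daily runs and manual recovery differ only in the directory used.
        return read_from_directory(directory)
    if mode == "database":
        # db_query is a caller-supplied function returning stored records.
        return iter(db_query())
    raise ValueError("unknown mode: %r" % (mode,))
```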

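During geo conversion, request each place name only once within a batch of coroutines

  • Improve the place-name cache; it could be persisted to the DB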
  • ConnectionError: HTTPConnectionPool(host='api.geonames.org', port=80): Max retries exceeded with url: /searchJSON?username=liushuaikobe&q=K%C3%B6ln%2C+Germany&maxRows=5 (Caused by <class 'socket.error'>: [Errno 54] Connection reset by peer)
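
A minimal sketch of the deduplication idea follows. `lookup` is a hypothetical stand-in for the function that queries the geocoding service, and `cache` is a plain dict here; in the project it could be Redis or a DB table. The loop is sequential for clarity, but the same "unique names first" step applies before spawning coroutines.

```python
def resolve_batch(names, lookup, cache=None):
    """Resolve place names, calling lookup at most once per distinct name."""
    cache = {} if cache is None else cache
    pending, seen = [], set()
    for name in names:
        # Skip names already cached or already scheduled in this batch.
        if name not in cache and name not in seen:
            seen.add(name)
            pending.append(name)
    for name in pending:
        cache[name] = lookup(name)
    # Results come back in the same order as the input names.
    return [cache[name] for name in names]
```

Passing the same `cache` dict to successive batches also avoids repeating lookups across batches.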

V2.0: error in catching Requests exceptions

Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/gevent/greenlet.py", line 390, in run
    result = self._run(*self.args, **self.kwargs)
  File "/Users/liushuai/git/PythonProject/GitArchiveUtils/daily-task/normalize.py", line 153, in search
    r = requests.get('http://api.geonames.org/searchJSON', params=params)
  File "/Library/Python/2.7/site-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/sessions.py", line 357, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Python/2.7/site-packages/requests/sessions.py", line 460, in send
    r = adapter.send(request, **kwargs)
  File "/Library/Python/2.7/site-packages/requests/adapters.py", line 354, in send
    raise ConnectionError(e)
ConnectionError: HTTPConnectionPool(host='api.geonames.org', port=80): Max retries exceeded with url: /searchJSON?username=liushuaikobe&q=The+Moon.+Also+London.&maxRows=1 (Caused by <class 'socket.error'>: [Errno 60] Operation timed out)
<Greenlet at 0x101adb730: <bound method Normalizer.search of <normalize.Normalizer object at 0x101afc390>>('The Moon. Also London.')> failed with ConnectionError

Updating users' personal information

Now:

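When processing a record, if the record's actor is already in the DB, the record's actor_attributes entry is simply deleted, and the user's information is never updated.

A way to update user information still needs to be worked out.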
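
One possible approach, offered only as a sketch, is to compare the incoming actor_attributes with the stored document and write back just the fields that changed. The helper name `changed_fields` and the rule of ignoring empty values are assumptions.

```python
def changed_fields(stored, incoming):
    """Return the incoming fields whose values differ from the stored ones.

    Empty values (None or "") are ignored so that a sparse record does not
    erase information that is already stored.
    """
    return {
        key: value
        for key, value in incoming.items()
        if value not in (None, "") and stored.get(key) != value
    }
```

An empty result means the stored user is already up to date and no write is needed.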

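Record Redis cache hit statistics

During place-name resolution, record statistics such as the total number of requests and the number of cache hits.
Store this information in Redis.
It can be archived by day and stored in another database.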
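
The counters could be kept in one Redis hash per day, which also makes daily archiving straightforward. In this sketch `client` is any object with a redis-py style `hincrby(name, key, amount)` method; the key format `geo:stats:<date>`, the field names, and the dict-like `cache` are assumptions.

```python
import datetime


def stats_key(day=None):
    """Build the per-day hash key, e.g. geo:stats:2013-10-29."""
    day = day or datetime.date.today()
    return "geo:stats:%s" % day.isoformat()


def cached_lookup(name, client, cache, lookup, day=None):
    """Resolve a place name while counting requests and cache hits."""
    key = stats_key(day)
    client.hincrby(key, "requests", 1)
    if name in cache:
        client.hincrby(key, "hits", 1)
        return cache[name]
    # Cache miss: call the real service and remember the answer.
    cache[name] = lookup(name)
    return cache[name]
```

The hit rate for a day is then `hits / requests` from that day's hash, and yesterday's hash can be copied to the archive database and deleted.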

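A series of problems with MySQL under high concurrency

Take a problem like this one: some people recommend ON DUPLICATE KEY UPDATE, but the first version already used that approach and ran into Deadlock errors very frequently.
Another solution is to check in the application whether the record exists.
Perhaps because I am not familiar with MySQL and do not understand its locking mechanism, I decided to drop MySQL and use mongoDB.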
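
For comparison, the MongoDB counterpart of "insert or update" is an upsert. The sketch below assumes a PyMongo-style collection object exposing `update_one(filter, update, upsert=...)` and a `login` field that identifies the actor; the function name and document layout are hypothetical.

```python
def upsert_actor(collection, login, attributes):
    """Insert the actor if missing, otherwise update its attributes."""
    return collection.update_one(
        {"login": login},             # match on the assumed unique field
        {"$set": dict(attributes)},   # only touch the supplied fields
        upsert=True,
    )
```

This replaces the "check whether the record exists, then insert or update" sequence in application code with a single call to the database.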
