liushuaikobe / gitarchiveutils
Utils for handling and parsing the data from http://www.githubarchive.org/
License: GNU General Public License v2.0
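In some events, the user made several commits locally before pushing them to the remote.
The commit count in the payload should therefore be taken into account, rather than counting each event as a single commit.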
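A minimal sketch of counting commits per push, assuming each archive line is one JSON event whose `payload` carries a `size` counter or a `commits`/`shas` list. These field names are assumptions about the archive format, and `count_commits` and `total_commits` are hypothetical helpers, not functions from this repository.

```python
import json


def count_commits(event):
    """Return the number of commits carried by one event (0 for non-push events)."""
    if event.get("type") != "PushEvent":
        return 0
    payload = event.get("payload") or {}
    # Prefer the explicit counter; fall back to the length of the commit list.
    size = payload.get("size")
    if isinstance(size, int):
        return size
    commits = payload.get("commits") or payload.get("shas") or []
    return len(commits)


def total_commits(lines):
    """Sum commits over an iterable of JSON-encoded events, one per line."""
    return sum(count_commits(json.loads(line)) for line in lines if line.strip())
```

Counting this way means a push of three local commits contributes three to the total instead of one.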
The value is read from config again when the next file is processed, so this defect can be ignored for now.
The following cases occur:
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/gevent/greenlet.py", line 390, in run
result = self._run(*self.args, **self.kwargs)
File "/Users/liushuai/git/PythonProject/GitArchiveUtils/daily-task/normalize.py", line 153, in search
r = requests.get('http://api.geonames.org/searchJSON', params=params)
File "/Library/Python/2.7/site-packages/requests/api.py", line 55, in get
return request('get', url, **kwargs)
File "/Library/Python/2.7/site-packages/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "/Library/Python/2.7/site-packages/requests/sessions.py", line 357, in request
resp = self.send(prep, **send_kwargs)
File "/Library/Python/2.7/site-packages/requests/sessions.py", line 460, in send
r = adapter.send(request, **kwargs)
File "/Library/Python/2.7/site-packages/requests/adapters.py", line 354, in send
raise ConnectionError(e)
ConnectionError: HTTPConnectionPool(host='api.geonames.org', port=80): Max retries exceeded with url: /searchJSON?username=liushuaikobe&q=The+Moon.+Also+London.&maxRows=1 (Caused by <class 'socket.error'>: [Errno 60] Operation timed out)
<Greenlet at 0x101adb730: <bound method Normalizer.search of <normalize.Normalizer object at 0x101afc390>>('The Moon. Also London.')> failed with ConnectionError
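One possible way to keep a timeout like the one above from killing the greenlet is to wrap the request in a small retry helper. The sketch below is an assumption, not the project's code: `call_with_retry` is a hypothetical name, and the backoff values are arbitrary.

```python
import time


def call_with_retry(func, attempts=3, base_delay=1.0,
                    exceptions=(Exception,), sleep=time.sleep):
    """Call func(); on a listed exception wait and retry.

    Returns None if every attempt fails, so the caller can skip the record.
    """
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                return None  # give up quietly instead of raising
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))


# With requests installed, the geonames call could be wrapped roughly like this:
#   call_with_retry(
#       lambda: requests.get('http://api.geonames.org/searchJSON',
#                            params=params, timeout=10),
#       exceptions=(requests.exceptions.ConnectionError,
#                   requests.exceptions.Timeout))
```

Passing an explicit `timeout` to `requests.get` also bounds how long a single attempt can block.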
Values that contain a single quote conflict with the single quotes in the SQL statement.
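The usual remedy for quote conflicts is to pass values as query parameters instead of building the SQL string by hand. The sketch below uses the standard-library `sqlite3` module as a stand-in so it runs anywhere. The table, the column names, and `save_location` are hypothetical. MySQL drivers such as MySQLdb use `%s` placeholders rather than `?`.

```python
import sqlite3


def save_location(conn, login, location):
    """Insert one row, letting the driver escape the values."""
    # Placeholders keep quotes in the data out of the SQL text.
    conn.execute(
        "INSERT INTO actor (login, location) VALUES (?, ?)",
        (login, location),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor (login TEXT PRIMARY KEY, location TEXT)")

# A location containing a single quote is stored unchanged.
save_location(conn, "example_user", "Xi'an, China")
row = conn.execute(
    "SELECT location FROM actor WHERE login = ?", ("example_user",)
).fetchone()
```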
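When a record is processed and its actor is already in the DB, the record's actor_attributes item is simply deleted, so the user's information is never updated.
A way to update user information still needs to be worked out.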
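One possible approach, sketched here under assumptions, is to compare the incoming `actor_attributes` with what is stored and write back only the fields that changed. The dict-based store and both function names are hypothetical, chosen only to illustrate the idea.

```python
def changed_fields(stored, incoming):
    """Return incoming attributes whose non-empty values differ from the stored ones."""
    return {
        key: value
        for key, value in incoming.items()
        if value not in (None, "") and stored.get(key) != value
    }


def apply_actor_update(store, login, actor_attributes):
    """Create or refresh one actor in a dict-like store; return the fields that changed."""
    current = store.setdefault(login, {})
    diff = changed_fields(current, actor_attributes)
    current.update(diff)
    return diff
```

An empty result means nothing needs to be written, which keeps the common case of an unchanged actor cheap.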
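During place-name resolution, record statistics such as the total number of requests and the number of cache hits.
Save this information to Redis.
It can be archived by day and stored in a separate database.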
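A rough sketch of such counters, keyed by day so that each day's totals can be archived separately. The key prefix, field names, and class names are assumptions. `InMemoryHashStore` is a stand-in that mimics the `hincrby` and `hgetall` commands, so the example runs without a Redis server.

```python
import datetime


class InMemoryHashStore(object):
    """Tiny stand-in exposing the two Redis hash commands used below."""

    def __init__(self):
        self.data = {}

    def hincrby(self, name, key, amount=1):
        bucket = self.data.setdefault(name, {})
        bucket[key] = bucket.get(key, 0) + amount
        return bucket[key]

    def hgetall(self, name):
        return dict(self.data.get(name, {}))


class GeoStats(object):
    """Per-day request and cache-hit counters for place-name lookups."""

    def __init__(self, client, prefix="geo:stats"):
        self.client = client
        self.prefix = prefix

    def _key(self, day=None):
        day = day or datetime.date.today()
        return "%s:%s" % (self.prefix, day.isoformat())

    def record(self, cache_hit, day=None):
        """Count one lookup, and one cache hit if it was served from cache."""
        key = self._key(day)
        self.client.hincrby(key, "requests", 1)
        if cache_hit:
            self.client.hincrby(key, "cache_hits", 1)

    def snapshot(self, day=None):
        """Return the counters stored for one day."""
        return self.client.hgetall(self._key(day))
```

With redis-py, a `redis.StrictRedis` client could be passed in place of the stand-in. In that case `hgetall` returns byte strings unless the client is created with `decode_responses=True`.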
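For a problem like this, some people suggest using on duplicate key update, but the first version used exactly that approach and hit Deadlock errors very frequently.
Another solution is to check in the program whether the record already exists.
Probably because I am not familiar with MySQL and do not understand its locking mechanism, I decided to drop MySQL and use mongoDB.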
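The "check in the program" alternative could look roughly like the sketch below. It uses the standard-library `sqlite3` module as a stand-in for the real database. The `actor` table and `upsert_actor` are hypothetical. Note that a select-then-write sequence is not atomic, so with concurrent writers it would still need a transaction or a unique-key error handler.

```python
import sqlite3


def upsert_actor(conn, login, location):
    """Update the row if login exists, otherwise insert it.

    Returns 'updated' or 'inserted' so the caller can keep statistics.
    """
    exists = conn.execute(
        "SELECT 1 FROM actor WHERE login = ?", (login,)
    ).fetchone() is not None
    if exists:
        conn.execute(
            "UPDATE actor SET location = ? WHERE login = ?", (location, login)
        )
    else:
        conn.execute(
            "INSERT INTO actor (login, location) VALUES (?, ?)", (login, location)
        )
    conn.commit()
    return "updated" if exists else "inserted"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor (login TEXT PRIMARY KEY, location TEXT)")
first = upsert_actor(conn, "example_user", "Harbin")
second = upsert_actor(conn, "example_user", "Beijing")
```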