@sebbalex found out that we have orgs in manual-reuse.yml that are already in the onboarding list. That explains the locking of the repos: the crawler is trying to update the same local copy concurrently.
We need to:
- Remove the duplicate orgs
- Check for duplicates before starting the crawl process, for good measure.
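A minimal sketch of the pre-crawl duplicate check, assuming each org is identified by a URL string (e.g. "https://github.com/org-a") and that case and a trailing slash are not significant. The helpers `normalizeOrg` and `dedupOrgs` are hypothetical names, not the crawler's actual code; the point is to merge the lists, keep the first occurrence of each org, and return what was dropped so it can be logged.

```go
package main

import "strings"

// normalizeOrg is a hypothetical helper: it assumes orgs are URLs and treats
// surrounding whitespace, letter case and a trailing slash as insignificant.
func normalizeOrg(org string) string {
	return strings.TrimSuffix(strings.ToLower(strings.TrimSpace(org)), "/")
}

// dedupOrgs merges several org lists (e.g. the onboarding list and the
// entries of manual-reuse.yml), keeps the first occurrence of each org in
// order, and returns the dropped duplicates so they can be logged before
// the crawl starts.
func dedupOrgs(lists ...[]string) (unique []string, dups []string) {
	seen := make(map[string]bool)
	for _, list := range lists {
		for _, org := range list {
			key := normalizeOrg(org)
			if seen[key] {
				dups = append(dups, org)
				continue
			}
			seen[key] = true
			unique = append(unique, org)
		}
	}
	return unique, dups
}
```

Doing this once up front means two entries can never map to the same local clone, so the crawler never schedules two updates of the same copy.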
from publiccode-crawler.
That was caused by git not being able to connect to the repo: Failed to connect to github.com port 443: Connection timed out.
I think we are DoS-ing ourselves by launching a goroutine for every repo and getting starved of network bandwidth.
@sebbalex IMO we can close this one. #189 and #191 solved the root cause and #188 now provides the complete error message.
@sebbalex What errors are we getting lately?
The issue is still there: today we got 6 repositories in error. Details:
time="2020-10-06T00:11:04Z" level=error msg="[UniversitaDellaCalabria/uniAuth] error while cloning: cannot git pull the repository: exit status 128: fatal: Unable to create '/var/crawler/data/repos/api.github.com/UniversitaDellaCalabria/uniAuth/gitClone/.git/index.lock': File exists.\n\nAnother git process seems to be running in this repository, e.g.\nan editor opened by 'git commit'. Please make sure all processes\nare terminated then try again. If it still fails, a git process\nmay have crashed in this repository earlier:\nremove the file manually to continue.\n"
@sebbalex What's the timestamp of the lockfile?
As we discussed offline, I could not find any lock files on the disk. We need to dig deeper.