Comments (8)
#189 dramatically decreases the frequency of this happening.
Keeping this open though, because the root issue is not resolved.
from publiccode-crawler.
`pcvalidator -export` expands the URLs only if `-remote-base-url` is passed. This is documented in the source but not to the user.
@sebbalex is there a chance the crawler could run it with no or empty `RemoteBaseURL`?
> @sebbalex is there a chance the crawler could run it with no or empty RemoteBaseURL?
RemoteBaseURL is needed to enforce absolute and relative URL validation:
If left empty, absolute URLs will not be validated and no remote validation of files with relative paths will be performed.
Furthermore, there was no evidence of this in the past, and since no changes were made in our codebase, this confuses me.
> RemoteBaseURL is needed to enforce absolute and relative URL validation:
Sorry I wasn't clear. I was trying to say that the crawler being run with an empty RemoteBaseURL for some exotic reasons could explain what we are seeing. I'm just as puzzled as you. 🤔
In the latest run I noticed these timeout problems. I think they are related to the URL expansion issue we have here.
time="2020-09-24T08:35:11Z" level=error msg="Error parsing publiccode.yml: logo: HTTP GET failed for https://raw.githubusercontent.com/AgID/rndt-joomla-template/master/documentation/images/logo-rndt.png: Get https://raw.githubusercontent.com/AgID/rndt-joomla-template/master/documentation/images/logo-rndt.png: dial tcp 151.101.36.133:443: i/o timeout"
time="2020-09-24T08:35:12Z" level=error msg="Error parsing publiccode.yml: logo: HTTP GET failed for https://raw.githubusercontent.com/AgID/rndt-catalogue/master/documentation/images/logo-rndt.png: Get https://raw.githubusercontent.com/AgID/rndt-catalogue/master/documentation/images/logo-rndt.png: dial tcp 151.101.36.133:443: i/o timeout"
time="2020-09-24T08:35:13Z" level=error msg="Error parsing publiccode.yml: logo: HTTP GET failed for https://raw.githubusercontent.com/italia/18app/master/src/Italia.DiciottoApp.iOS/Assets.xcassets/AppIcon.appiconset/Icon120.png: Get https://raw.githubusercontent.com/italia/18app/master/src/Italia.DiciottoApp.iOS/Assets.xcassets/AppIcon.appiconset/Icon120.png: dial tcp 151.101.36.133:443: i/o timeout"
time="2020-09-24T08:35:13Z" level=error msg="Error parsing publiccode.yml: description/it/screenshots: HTTP GET failed for https://raw.githubusercontent.com/consiglionazionaledellericerche/cool-jconon/master/docs/screenshot/responsive_it.png: Get https://raw.githubusercontent.com/consiglionazionaledellericerche/cool-jconon/master/docs/screenshot/responsive_it.png: dial tcp 151.101.36.133:443: i/o timeout\ndescription/en/screenshots: HTTP GET failed for https://raw.githubusercontent.com/consiglionazionaledellericerche/cool-jconon/master/docs/screenshot/home_en.png: Get https://raw.githubusercontent.com/consiglionazionaledellericerche/cool-jconon/master/docs/screenshot/home_en.png: dial tcp 151.101.36.133:443: i/o timeout"
time="2020-09-24T08:35:14Z" level=error msg="Error parsing publiccode.yml: description/it/screenshots: HTTP GET failed for https://raw.githubusercontent.com/vvfosprojects/sovvf/master/doc/images/dashboard.jpg: Get https://raw.githubusercontent.com/vvfosprojects/sovvf/master/doc/images/dashboard.jpg: dial tcp 151.101.36.133:443: i/o timeout"
time="2020-09-24T08:35:14Z" level=error msg="Error parsing publiccode.yml: description/it/screenshots: HTTP GET failed for https://raw.githubusercontent.com/IstitutoCentraleCatalogoUnicoBiblio/Nuovo-Opac-di-Polo-SBN/master/screenshots/nuovo_opac.png: Get https://raw.githubusercontent.com/IstitutoCentraleCatalogoUnicoBiblio/Nuovo-Opac-di-Polo-SBN/master/screenshots/nuovo_opac.png: dial tcp 151.101.36.133:443: i/o timeout"
> #189 dramatically decreases the frequency of this happening.
> Keeping this open though, because the root issue is not resolved.
We could consider the root cause to be the number of concurrent processes and close this, wdyt @bfabio?
@sebbalex I'm not convinced: there must be something wrong in the code that doesn't handle git failures correctly and still resolves the URL as relative. Most (all?) of the failures were caused by concurrency, but the crawler should have stopped processing the repo as soon as they happened.
This doesn't apply anymore.
After #302 the crawler doesn't touch publiccode.yml's contents; API consumers are now in charge of doing the expansion if they need it.