Comments (2)
Note in particular that having launch-this-crawl tasks and update-this-access-service tasks tied together is not a great idea, e.g. on DEV, where we want to try one without running the other.
from ukwa-services.
I've separated out the crawl launcher, and I think that's good enough for now. Note that using things like External Task etc. doesn't really help, because it means that e.g. the launcher can't run if the other one didn't. Instead, it makes sense for them to be separate and for the launch source files to be atomically updated.
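As a rough illustration of what "atomically updated" could mean here, the sketch below writes the new content to a temporary file in the same directory and then renames it over the target, so a separately scheduled launcher would see either the old file or the new one, never a half-written one. The function name `atomic_write` and the file name used in the usage note are hypothetical, and this is not necessarily how ukwa-services does it.

```python
import os
import tempfile


def atomic_write(path: str, content: str) -> None:
    """Replace the file at `path` with `content` in a single rename step."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the final rename
    # stays on one filesystem, which is what makes os.replace atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # readers see the old or the new file only
    except BaseException:
        # On any failure, remove the partial temp file and leave the target as it was.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

For example, the task that refreshes crawl inputs might call `atomic_write("seeds.txt", new_seeds)` (a hypothetical file name), while the launcher simply reads whatever `seeds.txt` currently holds, with no dependency between the two tasks.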
Related Issues (20)
- Add Airflow DAG to pull Nominet data and upload to HDFS
- Make OutbackCDX console output easier to control
- Unclear canonical URLs
- DAG to fix up Twitter URLs
- Update NPLD Access Stack deployment configuration
- Update Airflow to 2.5.3
- Use rclone for DC move-to-hdfs
- Ensure BETA Wayback ACLs can be easily kept up to date
- Add OpenTelemetry tracing to track and understand what's happening during web stack activity
- Deploy updated API including Collections
- Deploy new URL search UI for main website
- Update translations ahead of LD Access final deployment
- Crawl maintenance tools for re-crawls
- Be more systematic about excluding web archives from crawl activity
- Update warc-servers to reboot from time to time
- Update Cambridge IP addresses
- Make jobs that rely on TrackDB last-modified dates more robust
- Deploy Browsertrix Cloud
- Integrate the results from Browsertrix Cloud
- Add support for keeping the Geo-IP database updated for Domain Crawls