Comments (4)
- All the old URLs are dead and gone now? (for desktop)
Of the 1.3M desktop URLs we've been testing for ~6 months, 1M of them are still in the corpus. We considered including the remaining 300k URLs for consistency across crawls but opted not to for simplicity.
(we = myself, @igrigorik @pmeenan @paulcalvano)
- Presumably you're doing the same thing for mobile?
Yes. We plan to roll that out next month.
- @brendankenny said there weren't a lot of HTTPS URLs in the old HA URL list. What's the balance in this new URL list?
This list is 60% HTTPS:
SELECT
  SUM(IF(STARTS_WITH(url, 'https'), 1, 0)) / COUNT(0) AS pct_https
FROM
  `httparchive.urls.2018_12_15_desktop`
- We thought the 1.5M run was barely finishing in under 2 weeks. Is there new WPT capacity being added to handle this load?
Yeah, we used to run each page 3 times on desktop, 3 times on mobile, and once on Lighthouse. As of December 1 we run each page once in each of the 3 settings. That frees up the capacity to increase the desktop corpus. To afford the mobile increase, we're going to reduce the crawl frequency from every 15 days to monthly. I wrote up a short summary / sanity check in this doc.
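As a rough illustration of the capacity reasoning in the last answer, here is a minimal Python sketch of the arithmetic. It is not the sanity check from the linked doc. It assumes every URL is tested in all three settings, that WPT capacity stays constant, and that "monthly" is about 30 days; the names `loads_per_url` and `corpus_headroom` are made up for this example.

```python
from fractions import Fraction

# Runs per page in each setting, as described in the answer above.
OLD_RUNS = {"desktop": 3, "mobile": 3, "lighthouse": 1}
NEW_RUNS = {"desktop": 1, "mobile": 1, "lighthouse": 1}

# Crawl interval in days. Assumption: "monthly" is approximated as 30 days.
OLD_INTERVAL_DAYS = 15
NEW_INTERVAL_DAYS = 30


def loads_per_url(runs):
    """Page loads spent on one URL, assuming it is tested in every setting."""
    return sum(runs.values())


def corpus_headroom(old_runs, new_runs, old_days, new_days):
    """Factor by which the URL count could grow at a constant page-load rate."""
    run_factor = Fraction(loads_per_url(old_runs), loads_per_url(new_runs))
    time_factor = Fraction(new_days, old_days)
    return run_factor * time_factor


# Effect of fewer runs alone, then combined with the longer crawl interval.
run_only = corpus_headroom(OLD_RUNS, NEW_RUNS, OLD_INTERVAL_DAYS, OLD_INTERVAL_DAYS)
combined = corpus_headroom(OLD_RUNS, NEW_RUNS, OLD_INTERVAL_DAYS, NEW_INTERVAL_DAYS)
print(f"fewer runs only: {float(run_only):.2f}x")
print(f"fewer runs + monthly crawl: {float(combined):.2f}x")
```

Under these assumptions, dropping from 7 page loads per URL to 3 gives a bit more than 2x headroom, and doubling the crawl interval doubles that again. The real desktop and mobile corpora differ in size, so this is only an order-of-magnitude check.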
from httparchive.org.
mysql> select count(0) from urlsdev;
+----------+
| count(0) |
+----------+
|  3880557 |
+----------+
1 row in set (0.00 sec)
mysql> select count(0) from urlsmobile;
+----------+
| count(0) |
+----------+
|  1294054 |
+----------+
1 row in set (0.44 sec)
Random questions:
- All the old URLs are dead and gone now? (for desktop)
- Presumably you're doing the same thing for mobile?
- @brendankenny said there weren't a lot of HTTPS URLs in the old HA URL list. What's the balance in this new URL list?
- We thought the 1.5M run was barely finishing in under 2 weeks. Is there new WPT capacity being added to handle this load?
Sounds great! Thanks for the answers.