gapple / drupalreleasedate
System for tracking the Drupal core issue queue
Home Page: http://drupalreleasedate.com
License: MIT License
There are frequently intermittent test failures due to tests that rely on random results, and the necessity of averaging over a large number of samples increases test times.
The graph at https://drupalreleasedate.com/chart/estimates shows some fairly wacky estimates made before 12th Dec, which means the scale has to be adjusted to fit them in. If those estimates can be hidden then the remainder of the graph will be more informative.
Data on Drupal.org is now available as JSON through an API:
This would circumvent some of the fragility of parsing the full XHTML pages of the issue queue, and it includes paging information, so it's possible to go back to two requests per count (first page + last page), instead of iterating through every page to find the end.
By default the responses include a lot of information on each issue (e.g. full issue description, attached files, comment ids...), but this can be limited by providing full=0 as a parameter.
It looks like the parameters use internal field names and IDs rather than the nice names that the front end uses, so the configuration will need to be updated or mapped (e.g. priorities => field_issue_priority).
The info page also doesn't indicate how to specify multiple parameters (e.g. a set of statuses).
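The two-request counting idea could be sketched as follows. This is a Python illustration, not the site's PHP implementation; the shape of the pager's "last" link and the per-page size are assumptions based on the API info page, and both helper names are made up.

```python
from urllib.parse import urlencode

API_BASE = "https://www.drupal.org/api-d7/node.json"

def build_issue_query(page=0, **filters):
    """Build an issue-queue query URL against the JSON API.

    full=0 trims each result, omitting issue descriptions, attached
    files, comment ids, etc.  Filter keys must use the internal field
    names (e.g. field_issue_priority), not the front end's nice names.
    """
    params = {"type": "project_issue", "full": 0, "page": page}
    params.update(filters)
    return API_BASE + "?" + urlencode(params)

def count_from_pager(first_page, fetch_page):
    """Derive the total issue count from two requests instead of paging.

    first_page is the decoded JSON of page 0; the pager's "last" link
    (assumed to carry a page= parameter) gives the final page number, so
    one more request via fetch_page(index) yields the remainder.  Pages
    before the last are assumed to be full.
    """
    per_page = len(first_page["list"])
    if "last" not in first_page:
        return per_page  # only one page of results
    last_index = int(first_page["last"].rsplit("page=", 1)[1])
    return last_index * per_page + len(fetch_page(last_index)["list"])
```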
Google Charts is somewhat inflexible, and there is additional information that might be helpful to place on the charts which it simply isn't capable of displaying.
Probably minor, but it also doesn't seem to cache in the browser well, with the dynamic loader, and charts can't be rendered when developing offline.
Right now PHP's time limit is relied on to prevent run-away estimates, but this results in there being no indication of the difference between an estimate still being processed and one that has timed out. If the simulation keeps track of its own time limitation, it could raise an exception when it is reached so that the caller can handle it.
It could also then be possible to pass the partial estimate data within the exception for storage and later analysis.
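A sketch of how the simulation could track its own budget and surface partial data, assuming nothing about the real estimator beyond what's described above (all names here are hypothetical):

```python
import time

class EstimateTimeoutError(Exception):
    """Raised when the simulation exceeds its own time budget.

    Carries the iterations completed so far, so the caller can store
    the partial estimate for later analysis instead of losing it.
    """
    def __init__(self, partial_results, elapsed):
        super().__init__("estimate timed out after %.1fs" % elapsed)
        self.partial_results = partial_results
        self.elapsed = elapsed

def run_simulation(iterate, iterations, time_limit, clock=time.monotonic):
    """Run up to `iterations` calls of iterate(), honouring a time limit.

    Tracking the limit here, rather than relying on the runtime's hard
    timeout, lets the caller tell "still processing" apart from "timed
    out" and handle each case explicitly.
    """
    start = clock()
    results = []
    for _ in range(iterations):
        if clock() - start > time_limit:
            raise EstimateTimeoutError(results, clock() - start)
        results.append(iterate())
    return results
```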
xjm posted some historical numbers for D8 issue counts, and even provided the created / fixed dates for issues.
http://xjm.drupalgardens.com/blog/technical-debt-drupal-8-or-when-will-it-be-ready
This data could be used to fill in samples before collection was started
Currently if an estimate iteration reaches 10 times the current issue count it will fail. This is fine for now while the issue count is still near its all-time peak, but could become increasingly problematic as the issue count decreases and allows a smaller range.
Capping iterations to an individual run time or maximum estimate length (e.g. 50 years) could prevent a few iterations from monopolizing the calculation time, and provide the opportunity for additional iterations to be run.
There are two possibilities for these capped iterations:
I think it makes sense to include these iterations in the median calculation (shifting it to a later date). Failed runs due to hitting the issue threshold are currently not included in the median, but by the same logic, including them would also make sense.
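A capped iteration along these lines might look like the following sketch (hypothetical names; the 50-year cap is the figure suggested above):

```python
import random
import statistics

MAX_ESTIMATE_DAYS = 50 * 365  # cap a single run at roughly 50 years

def run_iteration(issues, sample_change, max_days=MAX_ESTIMATE_DAYS):
    """One Monte Carlo run: apply randomly drawn historical changes until
    the issue count reaches zero, or the run hits the length cap.

    A capped run returns the cap instead of being discarded, so it still
    contributes to the median (shifting it to a later date) rather than
    monopolizing calculation time.
    """
    days = 0
    while issues > 0:
        if days >= max_days:
            return max_days  # counted as "at least this long"
        issues += sample_change()
        days += 1
    return days

def estimate_median(issues, sample_change, runs=101):
    return statistics.median(
        run_iteration(issues, sample_change) for _ in range(runs))
```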
Currently a code update is required to track a new minor version, so when the new minor version branch is created and issues automatically migrated there is a bit of a gap in the data.
Probably best to fetch info on available versions from the API (also see #19) before collecting stats.
Old minor releases still have a few issues assigned (e.g. 8.0 currently has 4 issues, versus ~2800 for 8.4), but are not supported. Tracking could be stopped on these branches once they are EOL.
@xjmdrupal
Update your #Drupal.org issue queue bookmarks; there's a new "Plan" category! https://www.drupal.org/node/1815826 (serial ID 5) /cc @gappleca
I've updated the site config to fetch the issue count for the new category, but the estimator and charts don't make use of the new value yet.
Right now caching is set to static values, even though we generally know the exact time that data will change. This results in caches being less effective and precise than possible.
Given that we know the interval between sample fetches and estimate generations, we can set the max-age to the interval between the current time and the time new data is expected to be available (or slightly less).
For example, a freshly generated response could be served with:

max-age: 20700

while the same response served later through a cache would carry:

max-age: 20700; Age: 19800

so it will only be cached for a further 15 minutes.

If a new sample isn't available when expected, the max-age can be set to a moderate value until the new data is available. This works well for samples (where the new values will be available in a matter of minutes) but less well for estimates (which can take hours).
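The interval-based max-age could be computed along these lines (Python sketch; the six-hour fetch interval, one-minute margin, and fallback TTL are invented values for illustration, not the site's actual configuration):

```python
FETCH_INTERVAL = 6 * 60 * 60   # assumed interval between sample fetches
FALLBACK_MAX_AGE = 300         # moderate TTL while a new sample is overdue

def max_age_for(now, last_sample_time, margin=60):
    """Expire the cache just before new data is expected.

    Instead of a static TTL, max-age is set to the time remaining until
    the next expected sample, minus a small margin.  If the sample is
    already overdue, fall back to a short TTL so clients re-check soon.
    """
    remaining = last_sample_time + FETCH_INTERVAL - now - margin
    return remaining if remaining > 0 else FALLBACK_MAX_AGE
```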
Estimates throw a bit of a kink in this method though, since a placeholder value is stored based on the last sample it was generated from, and the actual estimate may not be available for a few hours afterwards (and there's no indication of the difference between an incomplete estimate and a script timeout).
Samples also have a very short window where the estimates row is stored but not all of the related estimate_values rows, but this should be solvable by utilizing a transaction.
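The transactional fix could look like this sketch, using SQLite for illustration (the estimates / estimate_values table and column names are assumed from the text above, not taken from the actual schema):

```python
import sqlite3

def store_estimate(conn, estimate_date, values):
    """Insert the estimates row and its estimate_values rows atomically.

    Wrapping both inserts in one transaction closes the window in which
    the estimates row is visible but its related estimate_values rows
    are not yet stored.
    """
    with conn:  # commits on success, rolls back on any exception
        cur = conn.execute(
            "INSERT INTO estimates (estimate_date) VALUES (?)",
            (estimate_date,))
        conn.executemany(
            "INSERT INTO estimate_values (estimate_id, key, value)"
            " VALUES (?, ?, ?)",
            [(cur.lastrowid, k, v) for k, v in values.items()])
```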
With the current linear probability for picking samples, recent (consistent) changes in momentum will be under-represented in the estimated date.
The simple solution is to provide a static weighting function to the sample selection that increases the probability of picking more recent samples over past samples, but would have to be balanced to not introduce too much volatility with momentum changes.
It would probably be beneficial for the weighting to be time-aware, due to the changing periods between samples over time.
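One possible time-aware weighting is an exponential decay on sample age, sketched below (the half-life is a tuning knob, not a value from the project):

```python
import random

def pick_sample(samples, now, half_life_days=90):
    """Pick a historical sample, biased toward recent ones.

    The weight depends on each sample's age rather than its index, so
    the uneven spacing between samples over the project's history does
    not skew the selection.  A longer half-life dampens volatility from
    short-term momentum changes; a shorter one tracks them more closely.
    """
    weights = [0.5 ** ((now - s["time"]) / (half_life_days * 86400))
               for s in samples]
    return random.choices(samples, weights=weights, k=1)[0]
```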
The Drupal.Org API should have sufficient information to keep the homepage updated with the latest available release.
This query gets the information down to all 8.0.x releases available:
https://www.drupal.org/api-d7/node.json?type=project_release&field_release_project=3060&field_release_version_major=8&field_release_version_minor=0&field_release_build_type=static
Might be able to get away with just sorting based on creation date, otherwise it could be done based on the version fields (since we may not be able to trust the order within the array of results).
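Sorting on the version fields could look like the sketch below (the field_release_version_* property names follow the query above and the api-d7 release schema, but treat them as assumptions):

```python
def latest_release(releases):
    """Pick the newest release from a list of project_release nodes.

    Ordering by the version fields is safer than trusting the order of
    the results array, and safer than creation date (a patch release
    for an old minor can be created after a newer minor's first
    release).  Empty fields are treated as zero.
    """
    def version_key(node):
        return tuple(
            int(node.get(field) or 0)
            for field in ("field_release_version_major",
                          "field_release_version_minor",
                          "field_release_version_patch"))
    return max(releases, key=version_key)
```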
Sometimes it looks like there's not much activity, numbers going slightly up and down.
If it was possible to also show "closed/opened today|week|month", this would be more informative.
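Computing those figures from the stored samples is straightforward; a sketch, assuming samples are (timestamp, count) pairs sorted by time:

```python
def net_change(samples, now, window):
    """Net issue-count change over a trailing window (day, week, month).

    Compares the newest sample in the window against the oldest one in
    it; returns None when fewer than two samples fall in the window.
    """
    recent = [(t, c) for t, c in samples if t >= now - window]
    if len(recent) < 2:
        return None
    return recent[-1][1] - recent[0][1]
```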
Due to #7, the site wasn't picking up counts for ~10 days without being noticed.
Upon fetch failure the site should send an email or tweet so that status can be reviewed and fixed more rapidly.
As per the documentation at https://www.drupal.org/core/dev-cycle, core releases only happen within defined release windows. At the moment the calculation does not account for these windows, and can show estimated days that fall outside them (for the critical estimate etc.).
Some newer classes started with PSR code style, but many classes need to be completely or partially updated.
Looks like there is an error parsing the issue count page, preventing counts from being updated.
It looks like the vocabulary structure has changed for the Drupal.org upgrade, e.g. 1 -> 400, bug -> 1.
Ideally the change could be detected and handled automatically, which could be as simple as checking the generator meta tag, since the D7 site currently includes <meta name="Generator" content="Drupal 7 (http://drupal.org)" />
https://drupal.org/node/2085755
https://drupal.org/project/d7qa
Using the console won't tie up an Apache process for an extended period of time and would allow running at a lower priority, avoiding tying up resources for more time-sensitive tasks.
The best way is probably to pull the code out of the controller and make it callable from both the controller and a console command, but the console should be able to call the controller directly too.
Followup to #34, to enable removing 'unsafe-inline' from the content security policy.
There are other metrics that may be informative to track, even if they aren't factored into the release estimation, such as beta blockers or the core initiatives.
A samples_values table that stores key-value pairs for configured criteria could support this.

The actual size of data transfer is still reasonably small (especially with gzip on the JSON responses; if I'm reading the headers right, all samples are 220kb of data and compress down to ~18kb), but with the new parameters available on the data endpoints it should be possible to store information locally, and retrieve only newer values.
We made huge progress with the critical issue count, as the stats clearly show (down from 130 in Nov to 70 now).
However, the current graph does not really show that.
It might have been a good way to show beta blockers and the others on the same graph before, but I think it is just confusing and misleading now.
Suggestion: display two separate linear graphs, one for majors and one for criticals.
The Monte Carlo estimation is currently ineffective, due to the recent rise in issue count overwhelming the overall downward trend.
Using a simpler estimation method can provide a value until the Monte Carlo method is able to produce an estimation and confidence interval again.
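One such simpler method is a least-squares linear fit extrapolated to zero issues, sketched here (purely illustrative; not necessarily the method the site would adopt):

```python
def linear_estimate(samples):
    """Fallback: extrapolate the linear trend of (timestamp, count)
    samples to the time the count reaches zero.

    Returns the projected zero-crossing timestamp, or None when the
    fitted trend is flat or upward and no release date can be
    extrapolated (the same situation where Monte Carlo struggles).
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    cov = sum((t - mean_t) * (c - mean_c) for t, c in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var  # issues per unit time
    if slope >= 0:
        return None
    intercept = mean_c - slope * mean_t
    return -intercept / slope  # time at which the count crosses zero
```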
Fetching samples and generating an estimate should be their own commands, rather than options to a cron command.
You could add tracking that shows how many days are left until release (e.g. as of Aug 5th, with an estimated release of Sep 1, there are 26 days left), so you can see whether the estimate is approaching or slipping.
The /data/samples.json and /samples/estimates.json API endpoints could benefit from accepting from and to parameters to restrict the returned data.
This could particularly help the samples chart, which is reasonably slow to render with the large number of current samples (~1000).
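The server side of such parameters could be as small as this sketch (endpoint internals assumed; "when" is a hypothetical timestamp key):

```python
def filter_samples(samples, from_ts=None, to_ts=None):
    """Restrict samples to an optional [from, to] timestamp range.

    Both bounds are inclusive and optional, so existing clients that
    pass neither parameter keep receiving the full data set.
    """
    return [s for s in samples
            if (from_ts is None or s["when"] >= from_ts)
            and (to_ts is None or s["when"] <= to_ts)]
```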
...such as Drupal.org ;)
I'm in the process of revamping https://drupal.org/community-initiatives/drupal-core to be more focused on getting the release out the door and would love to stick a big-ass graph like this right front-and center (credited to you, of course).
I tried naively copy/pasting the JS from drupalreleasedate.com but it won't work because Drupal.org gives errors about the mixed mode (https vs. http) that happens when trying to grab http://drupalreleasedate.com/data/samples.json from there.
Any ideas on how I could make this happen? Thanks so much for putting this together! :D
https://twitter.com/nicholasruunu/status/545610509593378817
Internet Explorer seems unable to parse the ISO8601 format dates
Now that the estimate calculation can be run through the console (#12), it's feasible to parallelize it to maximize use of available processing capacity but run it at a lower priority to prevent affecting time-dependent tasks (such as serving the site itself).
Now that core is using Semantic Versioning branches, some issues are getting moved to 8.1.x-dev. Since the 8.x issue filter still includes these issues, it isn't an accurate representation of the issues holding up the 8.0.0 release.
This currently doesn't affect the critical issue count (:relieved:), but other counts will continue to diverge as issues are triaged to later minor releases.
So little content there. Wouldn't this easily fit the Homepage?
And the GET INVOLVED part deserves more attention...