
Drupal Release Date

System for tracking issue counts against the next version of Drupal core, and estimating a release date based on collected samples.

Access the site at http://drupalreleasedate.com/


Data API

Public JSON feeds are provided for access to all of the site's data.

For more information on available endpoints and the response format, view the API documentation.

Installation and Setup

  1. Install dependencies with Composer by running composer install in the root directory of the project
  2. Copy config/default.config.php to config/config.php, and adjust as needed
  3. Run bin/console install to set up the database
  4. Configure Apache to serve web as the document root

Running Tests

PHPUnit tests can be run with vendor/bin/phpunit

Contributors

gapple, luenemam

Issues

Dynamic HTTP Cache expiry on data responses

Right now caching is set to static values, even though we generally know the exact time that data will change. This results in caches being less effective and precise than possible.
For example:

  • Sample data is generated at 06:00
  • Request is sent at 06:15, and response is set to one hour expiry
  • Request is sent at 07:30
    • Previous response has expired, so cannot be re-used even though no new data is available
  • Request is sent at 11:55, and response is set to one hour expiry
  • Sample data is generated at 12:00
  • Request is sent at 12:15
    • Previous response is still cached, so new data is available but not returned.

Given that we know the interval between sample fetches and estimate generations, we can set the max-age to the interval between the current time and the time new data is expected to be available (or slightly less).
For example:

  • Sample data is generated at 06:00
  • Request is sent at 06:15
    • Response is cached and set to expire in 5:45 (max-age: 20700)
  • Request is sent at 11:45
    • Response hits cache and is returned with max-age: 20700; Age: 19800, so will only be cached for a further 15 minutes
  • Sample data is generated at 12:00
  • Request is sent at 12:15
    • Previous entry has expired, so a new response is sent with a long expiry period

If a new sample isn't available when expected, the max-age can be set to a moderate value until the new data appears. This works well for samples (where new values arrive within minutes) but less so for estimates (which can take hours).

Estimates throw a bit of a kink in this method though, since a placeholder value is stored based on the last sample it was generated from, and the actual estimate may not be available for a few hours afterwards (and there's no indication of the difference between an incomplete estimate and a script timeout).
Samples also have a very short window where the estimates row is stored but not all the related estimate_values rows, but this should be solvable by utilizing a transaction.
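The expiry logic described above can be sketched as follows (a minimal illustration in Python, though the project itself is PHP; the six-hour interval and the fallback value are assumptions taken from the example):

```python
from datetime import datetime, timedelta

# Interval between sample fetches; assumed from the 06:00 / 12:00 example above.
SAMPLE_INTERVAL = timedelta(hours=6)
# Moderate fallback when new data is overdue (hypothetical value).
FALLBACK_MAX_AGE = 300

def max_age(last_sample: datetime, now: datetime) -> int:
    """Seconds until the next sample is expected, for the Cache-Control max-age."""
    next_expected = last_sample + SAMPLE_INTERVAL
    remaining = (next_expected - now).total_seconds()
    if remaining <= 0:
        # New sample is overdue; cache briefly until it appears.
        return FALLBACK_MAX_AGE
    return int(remaining)
```

With the 06:00 sample and a request at 06:15, this yields the 20700-second max-age from the example; an origin cache can then serve the response with an Age header so downstream caches expire at the same moment.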

Automatically manage versions

Currently a code update is required to track a new minor version, so when the new minor version branch is created and issues automatically migrated there is a bit of a gap in the data.

Probably best to fetch info on available versions from the API (also see #19) before collecting stats.

Old minor releases still have a few issues assigned (e.g. 8.0 currently has 4 issues, versus ~2800 for 8.4), but are not supported. Tracking could be stopped on these branches once they are EOL.

Use local storage for data

The actual size of data transfer is still reasonably small (especially with gzip on the json responses; if I'm reading the headers right, all samples are 220kb of data and compress down to ~18kb), but with the new parameters available on the data endpoints it should be possible to store information locally, and retrieve only newer values.
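The merge step this implies — keep what's stored locally, fetch only newer values, combine — might look like the following sketch (Python standing in for the client-side code; the sample shape with a unique 'when' key is an assumption):

```python
def merge_samples(cached, fetched):
    """Merge newly fetched samples into the locally stored list,
    deduplicating on the 'when' timestamp (shape assumed)."""
    by_when = {s["when"]: s for s in cached}
    for s in fetched:
        by_when[s["when"]] = s  # newer data wins on conflict
    return sorted(by_when.values(), key=lambda s: s["when"])

def newest_timestamp(cached):
    """Value to send as a 'from' parameter so only newer samples are returned."""
    return max((s["when"] for s in cached), default=None)
```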

Send notification when fetching counts fails

Due to #7, the site wasn't picking up counts for ~10 days without being noticed.

Upon fetch failure the site should send an email or tweet so that status can be reviewed and fixed more rapidly.

Use Drupal.org API

Data on Drupal.org is now available as JSON through an API.

This would circumvent some of the fragility of parsing the full XHTML pages of the issue queue, and it includes paging information, so it's possible to go back to two requests per count (first page + last page), instead of iterating through every page to find the end.

By default the responses include a lot of information on each issue (e.g. full issue description, attached files, comment ids...) but this can be limited by providing full=0 as a parameter.

It looks like the parameters use internal field names and IDs rather than the nice names that the front end uses, so the configuration will need to be updated or mapped (e.g. priorities => field_issue_priority).
The info page also doesn't indicate how to specify multiple parameters (e.g. a set of statuses).

e.g. https://www.drupal.org/api-d7/node.json?type=project_issue&field_project=3060&field_issue_status=8&field_issue_version=8.0.x-dev&field_issue_priority=200&full=0
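Building that query programmatically might look like this (a Python sketch; the field names are taken from the sample URL above, and the multi-value parameter syntax is still an open question):

```python
from urllib.parse import urlencode

API_BASE = "https://www.drupal.org/api-d7/node.json"

def issue_count_url(status: int, priority: int, version: str = "8.0.x-dev") -> str:
    """Build a project_issue count query like the example above."""
    params = {
        "type": "project_issue",
        "field_project": 3060,          # Drupal core
        "field_issue_status": status,
        "field_issue_version": version,
        "field_issue_priority": priority,
        "full": 0,                      # omit issue bodies, files, comment ids
    }
    return API_BASE + "?" + urlencode(params)
```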

Update homepage with latest RC available

The Drupal.Org API should have sufficient information to keep the homepage updated with the latest available release.

This query gets the information down to all 8.0.x releases available:
https://www.drupal.org/api-d7/node.json?type=project_release&field_release_project=3060&field_release_version_major=8&field_release_version_minor=0&field_release_build_type=static

Might be able to get away with just sorting based on creation date, otherwise it could be done based on the version fields (since we may not be able to trust the order within the array of results).
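Sorting on the version fields rather than result order might be sketched like this (Python illustration; the field names for patch and extra are assumptions modeled on the query above):

```python
def latest_release(releases):
    """Pick the newest release from api-d7 project_release results,
    sorting on version fields rather than trusting the result order."""
    def key(node):
        return (
            int(node.get("field_release_version_major", 0)),
            int(node.get("field_release_version_minor", 0)),
            int(node.get("field_release_version_patch", 0) or 0),
            # Pre-release extras like "rc1" sort before a final release
            # (empty extra is mapped to "~", which sorts after letters).
            node.get("field_release_version_extra") or "~",
        )
    return max(releases, key=key)
```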

Add date restriction parameters to API requests

The /data/samples.json and /data/estimates.json API endpoints could benefit from accepting from and to parameters to restrict the returned data.

This could particularly help the samples chart, which is reasonably slow to render with the large number of current samples (~1000).
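The server-side filtering for such parameters is simple (a Python sketch; the sample shape with a 'when' key is an assumption):

```python
def filter_samples(samples, frm=None, to=None):
    """Restrict samples to an optional [frm, to] range, as proposed for
    the from/to query parameters. Either bound may be omitted."""
    result = []
    for sample in samples:
        when = sample["when"]
        if frm is not None and when < frm:
            continue
        if to is not None and when > to:
            continue
        result.append(sample)
    return result
```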

Prevent run-away estimate iterations

Capping iterations to an individual run time or maximum estimate length (e.g. 50 years) could prevent a few iterations from monopolizing the calculation time, and provide the opportunity for additional iterations to be run.

There are two possibilities for these capped iterations:

  • Record as the capped date (or maybe the iteration's current date for timed-out runs)
  • Treat as errors, in the way that hitting the issue count threshold is currently done.

I think it makes sense to include these iterations in the median calculation (shifting it to a later date). Failed runs that hit the issue count threshold are currently excluded from the median, but by this logic including them makes sense as well.
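A single capped iteration might look like this sketch (Python illustration of the idea, not the project's PHP implementation; the sample shape and the 50-year cap follow the discussion above):

```python
from datetime import date, timedelta

MAX_ESTIMATE = timedelta(days=50 * 365)  # maximum estimate length from the text
ISSUE_CAP_FACTOR = 10                    # existing issue count threshold

def run_iteration(samples, start_count, start_date, rng=None):
    """One estimate iteration: repeatedly apply a randomly chosen historical
    change until the count reaches zero or a cap is hit.
    samples: list of (days_elapsed, count_delta) pairs (shape assumed)."""
    import random
    rng = rng or random
    count, current = start_count, start_date
    while count > 0:
        days, delta = rng.choice(samples)
        count += delta
        current += timedelta(days=days)
        if count > start_count * ISSUE_CAP_FACTOR:
            return None                       # treated as an error today
        if current - start_date > MAX_ESTIMATE:
            return start_date + MAX_ESTIMATE  # record as the capped date
    return current
```

Recording the capped date (rather than discarding the run) is what lets these iterations participate in the median.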

Allow tracking additional criteria, not just Category & Priority

There are other metrics that may be informative to track, even if they aren't factored into the release estimation, such as beta blockers or the core initiatives.

  1. The current samples table doesn't support additional criteria well, since it requires a new column for each tracked value; it should be normalized with a samples_values table that stores key-value pairs for the configured criteria
  2. The current code makes some assumptions based on only Category and Priority changing between fetching values, and should be changed to isolate all parameters between each request.
  3. Configuration is currently in static class variables, and should be moved outside of the class and injected instead.
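The proposed normalization in point 1 could look like the following sketch (SQLite standing in for the real database; table and column names beyond samples_values are assumptions):

```python
import sqlite3

# Key-value pairs per sample instead of one column per tracked criterion.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE samples (
        id INTEGER PRIMARY KEY,
        version TEXT NOT NULL,
        sampled_at TEXT NOT NULL
    );
    CREATE TABLE samples_values (
        sample_id INTEGER NOT NULL REFERENCES samples (id),
        name TEXT NOT NULL,       -- e.g. 'critical_bugs', 'beta_blockers'
        value INTEGER,
        PRIMARY KEY (sample_id, name)
    );
""")
conn.execute("INSERT INTO samples VALUES (1, '8.0', '2015-01-01 06:00')")
conn.executemany(
    "INSERT INTO samples_values VALUES (?, ?, ?)",
    [(1, "critical_bugs", 70), (1, "beta_blockers", 12)],
)
row = conn.execute(
    "SELECT value FROM samples_values"
    " WHERE sample_id = 1 AND name = 'critical_bugs'"
).fetchone()
```

Adding a new tracked criterion then becomes an insert rather than a schema change.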

Allow for embedding chart on other websites

...such as Drupal.org ;)

I'm in the process of revamping https://drupal.org/community-initiatives/drupal-core to be more focused on getting the release out the door and would love to stick a big-ass graph like this right front-and center (credited to you, of course).

I tried naively copy/pasting the JS from drupalreleasedate.com but it won't work because Drupal.org gives mixed content errors (https vs. http) when trying to grab http://drupalreleasedate.com/data/samples.json from there.

Any ideas on how I could make this happen? Thanks so much for putting this together! :D

Estimate Weighting

With the current linear probability for picking samples, recent (consistent) changes to momentum will be under-represented in the estimate date.

The simple solution is to provide a static weighting function for sample selection that increases the probability of picking recent samples over older ones, but it would have to be balanced so as not to introduce too much volatility when momentum changes.
It would probably be beneficial for the weighting to be time-aware, due to the changing periods between samples over time.
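One time-aware static weighting is an exponential half-life (a Python sketch; the 90-day half-life is an assumption that would need the balancing discussed above):

```python
import random

def recency_weights(timestamps, half_life_days=90.0):
    """Time-aware weighting: a sample's weight halves for every
    half_life_days it lies before the newest sample."""
    newest = max(timestamps)
    return [0.5 ** ((newest - t) / half_life_days) for t in timestamps]

def pick_sample(samples, rng=random):
    """samples: list of (timestamp_in_days, delta) pairs (shape assumed)."""
    weights = recency_weights([t for t, _ in samples])
    return rng.choices(samples, weights=weights, k=1)[0]
```

Because the weight depends on elapsed time rather than sample index, the changing periods between samples are handled automatically.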

Update to be aware of core's change to semantic versioning

Now that core is using Semantic Versioning branches, some issues are getting moved to 8.1.x-dev. Since the 8.x issue filter includes these issues still, it isn't an accurate representation of the issues holding up the 8.0.0 release.
This currently doesn't affect the critical issue count (😌), but other counts will continue to diverge as issues are triaged to later minor releases.

  • Update database fields
  • Update issue count fetching
  • Update estimate generation
  • Update API responses

Improve increasing issue count threshold

Currently if an estimate iteration reaches 10 times the current issue count it will fail. This is fine for now while the issue count is still near its all-time peak, but could become increasingly problematic as the issue count decreases and allows a smaller range.

Multiple estimation methods

The Monte Carlo estimation is currently ineffective, due to the recent rise in issue count overwhelming the overall downward trend.

Using a simpler estimation method can provide a value until the Monte Carlo method is able to produce an estimation and confidence interval again.
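A least-squares linear fit is one such simpler method (a Python sketch, not the project's implementation): fit count = a + b·day and project the day the line reaches zero.

```python
def linear_estimate(samples):
    """Fallback linear-regression estimate over (day, count) pairs.
    Returns the projected zero-crossing day, or None when the trend is
    flat or rising, mirroring the Monte Carlo method's failure mode."""
    n = len(samples)
    sx = sum(d for d, _ in samples)
    sy = sum(c for _, c in samples)
    sxx = sum(d * d for d, _ in samples)
    sxy = sum(d * c for d, c in samples)
    denom = n * sxx - sx * sx
    if denom == 0:
        return None
    b = (n * sxy - sx * sy) / denom   # slope (issues per day)
    a = (sy - b * sx) / n             # intercept
    if b >= 0:
        return None                   # no downward trend: cannot estimate
    return -a / b                     # day on which the fitted line hits zero
```

Unlike the Monte Carlo method it yields no confidence interval, but it degrades more gracefully when recent counts rise.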

Track estimate time limit internally

Right now PHP's time limit is relied on to prevent run-away estimates, but this results in there being no indication of the difference between an estimate still being processed and one that has timed out. If the simulation keeps track of its own time limitation, it could raise an exception when it is reached so that the caller can handle it.

It could also then be possible to pass the partial estimate data within the exception for storage and later analysis.
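The shape of that interface might look like this (a Python sketch of the idea; the function names are hypothetical):

```python
import time

class EstimateTimeout(Exception):
    """Raised when the simulation exceeds its own time budget; carries
    the partial estimate data for storage and later analysis."""
    def __init__(self, partial):
        super().__init__("estimate time limit reached")
        self.partial = partial

def run_estimate(iterations, time_limit, run_iteration):
    """Track the time limit internally instead of relying on the
    language runtime to kill the process."""
    deadline = time.monotonic() + time_limit
    results = []
    for _ in range(iterations):
        if time.monotonic() > deadline:
            raise EstimateTimeout(results)
        results.append(run_iteration())
    return results
```

The caller can then distinguish "still processing" from "timed out", and store whatever iterations completed.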

Fetching is broken again

Looks like there is an error parsing the issue count page, preventing counts from being updated.

Consistent coding style

Some newer classes started with PSR code style, but many classes need to be completely or partially updated.

Add tracking 'days-to' to statistics

Tracking could be added to show how many days are left until the estimated release (e.g. as of Aug 5th, an estimated Sep 1 release is 26 days away), making it possible to see whether the estimate is approaching or slipping.
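The statistic itself is a one-line date difference (Python sketch):

```python
from datetime import date

def days_to_release(today, estimate):
    """Days remaining until the estimated release date
    (negative once the estimate has slipped into the past)."""
    return (estimate - today).days
```

Tracking this value over time, rather than just the estimate date, is what makes approach or delay visible.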

Allow running fetch / update tasks through console

Using the console won't tie up an Apache process for an extended period of time and would allow running at a lower priority, avoiding tying up resources for more time-sensitive tasks.

The best way is probably to pull the code out of the controller and make it callable from both the controller and a console command, though the console could also call the controller directly.

Add stats for closed and opened

Sometimes it looks like there's not much activity, with numbers going slightly up and down.
If it were possible to also show "closed/opened today|week|month", this would be more informative.

Calculation does not account for Drupal Core release windows.

As per documentation

https://www.drupal.org/core/dev-cycle

  • Release windows for Drupal core are on the first and third Wednesdays of each month. For stable releases, the first Wednesday is a bugfix release window and the third Wednesday is a security release window.
  • Once we reach zero critical issues, the first release candidate is tagged on the next of these release windows (whether bugfix or security) to sync up the Drupal 8 release schedule with that of Drupal 6 and 7. (If the next release window is less than 72 hours away, the following one will be used instead.)

At the moment the calculation does not account for these release windows, and can show estimated days that fall outside them.
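Snapping an estimate to the policy quoted above could be sketched as follows (Python illustration; "at least 72 hours away" is read as the window date being three or more days after the estimate):

```python
from datetime import date, timedelta

WEDNESDAY = 2  # date.weekday(): Monday is 0

def nth_wednesday(year, month, n):
    """Date of the n-th Wednesday of a month."""
    first = date(year, month, 1)
    offset = (WEDNESDAY - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))

def next_release_window(after):
    """First release window (first or third Wednesday of a month) at
    least 72 hours after the given date, per the quoted policy."""
    earliest = after + timedelta(days=3)
    y, m = after.year, after.month
    while True:
        for n in (1, 3):
            window = nth_wednesday(y, m, n)
            if window >= earliest:
                return window
        m += 1
        if m > 12:
            y, m = y + 1, 1
```

Passing the raw estimated date through next_release_window would keep the displayed date inside a release window.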

Replace logarithmic issue graph with separate linear graph for majors and criticals.

We made huge progress with the critical issue count, as the stats clearly show (down from 130 in Nov to 70 now).

However, the current graph does not really show that.

It might have been a good way to show beta blockers and the others on the same graph before, but I think it is just confusing and misleading now.

Suggestion: display two separate linear graphs, one for majors and one for criticals.

Parallelize estimate calculation

Now that the estimate calculation can be run through the console (#12), it's feasible to parallelize it to maximize use of available processing capacity but run it at a lower priority to prevent affecting time-dependent tasks (such as serving the site itself).

http://pthreads.org/
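Splitting the iterations into per-worker batches might look like this (a Python multiprocessing sketch of the idea; the project itself is PHP, where pthreads would fill this role, and the batch body here is a placeholder):

```python
from multiprocessing import Pool
import os
import random

def run_batch(args):
    """Run a batch of estimate iterations in one worker process.
    Placeholder body: each iteration yields a simulated day count."""
    seed, iterations = args
    rng = random.Random(seed)
    return [rng.randint(100, 1000) for _ in range(iterations)]

def parallel_estimate(total_iterations, workers=None):
    """Fan iterations out across worker processes and flatten the results."""
    workers = workers or os.cpu_count() or 1
    per_worker = total_iterations // workers
    with Pool(workers) as pool:
        batches = pool.map(run_batch, [(i, per_worker) for i in range(workers)])
    return [r for batch in batches for r in batch]
```

Worker processes can additionally be reniced so the time-sensitive web-serving tasks keep priority.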
