
Drupal Release Date

System for tracking issue counts against the next version of Drupal core, and estimating a release date based on collected samples.

Access the site at http://drupalreleasedate.com/


Data API

Public JSON feeds are provided for access to all of the site's data.

For more information on available endpoints and the response format, view the API documentation.

Installation and Setup

  1. Install dependencies with Composer by running composer install in the root directory of the project
  2. Copy config/default.config.php to config/config.php, and adjust as needed
  3. Run bin/console install to set up the database
  4. Configure Apache to serve web as the document root

Running Tests

PHPUnit tests can be run with vendor/bin/phpunit

Contributors

gapple, luenemam

Issues

Dynamic HTTP Cache expiry on data responses

Right now caching is set to static values, even though we generally know the exact time that data will change. This results in caches being less effective and precise than possible.
For example:

  • Sample data is generated at 06:00
  • Request is sent at 06:15, and response is set to one hour expiry
  • Request is sent at 07:30
    • Previous response has expired, so cannot be re-used even though no new data is available
  • Request is sent at 11:55, and response is set to one hour expiry
  • Sample data is generated at 12:00
  • Request is sent at 12:15
    • Previous response is still cached, so new data is available but not returned.

Given that we know the interval between sample fetches and estimate generations, we can set the max-age to the interval between the current time and the time new data is expected to be available (or slightly less).
For example:

  • Sample data is generated at 06:00
  • Request is sent at 06:15
    • Response is cached and set to expire in 5:45 (max-age: 20700)
  • Request is sent at 11:45
    • Response hits cache and is returned with max-age: 20700; Age: 19800, so will only be cached for a further 15 minutes
  • Sample data is generated at 12:00
  • Request is sent at 12:15
    • Previous entry has expired, so a new response is sent with a long expiry period

If a new sample isn't available when expected, the max-age can be set to a moderate value until the new data appears. This works well for samples (where new values arrive within minutes) but less so for estimates (which can take hours).

Estimates throw a bit of a kink in this method though, since a placeholder value is stored based on the last sample it was generated from, and the actual estimate may not be available for a few hours afterwards (and there's no indication of the difference between an incomplete estimate and a script timeout).
Samples also have a very short window where the estimates row is stored but not all the related estimate_values rows, but this should be solvable by utilizing a transaction.
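The expiry logic described above can be sketched as follows (a minimal illustration in Python, though the project itself is PHP; the six-hour interval and the fallback value are assumptions taken from the example):

```python
from datetime import datetime, timedelta

# Interval between sample fetches; assumed from the 06:00 / 12:00 example above.
SAMPLE_INTERVAL = timedelta(hours=6)
# Moderate fallback when new data is overdue (hypothetical value).
FALLBACK_MAX_AGE = 300

def max_age(last_sample: datetime, now: datetime) -> int:
    """Seconds until the next sample is expected, for the Cache-Control max-age."""
    next_expected = last_sample + SAMPLE_INTERVAL
    remaining = (next_expected - now).total_seconds()
    if remaining <= 0:
        # New sample is overdue; cache briefly until it appears.
        return FALLBACK_MAX_AGE
    return int(remaining)
```

With the 06:00 sample and a request at 06:15, this yields the 20700-second max-age from the example; an origin cache can then serve the response with an Age header so downstream caches expire at the same moment.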

Automatically manage versions

Currently a code update is required to track a new minor version, so when the new minor version branch is created and issues automatically migrated there is a bit of a gap in the data.

Probably best to fetch info on available versions from the API (also see #19) before collecting stats.

Old minor releases still have a few issues assigned (e.g. 8.0 currently has 4 issues, versus ~2800 for 8.4), but are not supported. Tracking could be stopped on these branches once they are EOL.

Use local storage for data

The actual size of data transfer is still reasonably small (especially with gzip on the json responses; if I'm reading the headers right, all samples are 220kb of data and compress down to ~18kb), but with the new parameters available on the data endpoints it should be possible to store information locally, and retrieve only newer values.
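The merge step this implies — keep what's stored locally, fetch only newer values, combine — might look like the following sketch (Python standing in for the client-side code; the sample shape with a unique 'when' key is an assumption):

```python
def merge_samples(cached, fetched):
    """Merge newly fetched samples into the locally stored list,
    deduplicating on the 'when' timestamp (shape assumed)."""
    by_when = {s["when"]: s for s in cached}
    for s in fetched:
        by_when[s["when"]] = s  # newer data wins on conflict
    return sorted(by_when.values(), key=lambda s: s["when"])

def newest_timestamp(cached):
    """Value to send as a 'from' parameter so only newer samples are returned."""
    return max((s["when"] for s in cached), default=None)
```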

Send notification when fetching counts fails

Due to #7, the site wasn't picking up counts for ~10 days without being noticed.

Upon fetch failure the site should send an email or tweet so that status can be reviewed and fixed more rapidly.

Use Drupal.org API

Data on Drupal.org is now available as JSON through an API.

This would circumvent some of the fragility of parsing the full XHTML pages of the issue queue, and it includes paging information, so it's possible to go back to two requests per count (first page + last page), instead of iterating through every page to find the end.

By default the responses include a lot of information on each issue (e.g. full issue description, attached files, comment ids...) but this can be limited by providing full=0 as a parameter.

It looks like the parameters use internal field names and IDs rather than the nice names that the front end uses, so the configuration will need to be updated or mapped (e.g. priorities => field_issue_priority).
The info page also doesn't indicate how to specify multiple parameters (e.g. a set of statuses).

e.g. https://www.drupal.org/api-d7/node.json?type=project_issue&field_project=3060&field_issue_status=8&field_issue_version=8.0.x-dev&field_issue_priority=200&full=0
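Building that query programmatically might look like this (a Python sketch; the field names are taken from the sample URL above, and the multi-value parameter syntax is still an open question):

```python
from urllib.parse import urlencode

API_BASE = "https://www.drupal.org/api-d7/node.json"

def issue_count_url(status: int, priority: int, version: str = "8.0.x-dev") -> str:
    """Build a project_issue count query like the example above."""
    params = {
        "type": "project_issue",
        "field_project": 3060,          # Drupal core
        "field_issue_status": status,
        "field_issue_version": version,
        "field_issue_priority": priority,
        "full": 0,                      # omit issue bodies, files, comment ids
    }
    return API_BASE + "?" + urlencode(params)
```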

Update homepage with latest RC available

The Drupal.Org API should have sufficient information to keep the homepage updated with the latest available release.

This query gets the information down to all 8.0.x releases available:
https://www.drupal.org/api-d7/node.json?type=project_release&field_release_project=3060&field_release_version_major=8&field_release_version_minor=0&field_release_build_type=static

Might be able to get away with just sorting based on creation date, otherwise it could be done based on the version fields (since we may not be able to trust the order within the array of results).
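Sorting on the version fields rather than result order might be sketched like this (Python illustration; the field names for patch and extra are assumptions modeled on the query above):

```python
def latest_release(releases):
    """Pick the newest release from api-d7 project_release results,
    sorting on version fields rather than trusting the result order."""
    def key(node):
        return (
            int(node.get("field_release_version_major", 0)),
            int(node.get("field_release_version_minor", 0)),
            int(node.get("field_release_version_patch", 0) or 0),
            # Pre-release extras like "rc1" sort before a final release
            # (empty extra is mapped to "~", which sorts after letters).
            node.get("field_release_version_extra") or "~",
        )
    return max(releases, key=key)
```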

Add date restriction parameters to API requests

The /data/samples.json and /data/estimates.json API endpoints could benefit from accepting from and to parameters to restrict the returned data.

This could particularly help the samples chart, which is reasonably slow to render with the large number of current samples (~1000).
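The server-side filtering for such parameters is simple (a Python sketch; the sample shape with a 'when' key is an assumption):

```python
def filter_samples(samples, frm=None, to=None):
    """Restrict samples to an optional [frm, to] range, as proposed for
    the from/to query parameters. Either bound may be omitted."""
    result = []
    for sample in samples:
        when = sample["when"]
        if frm is not None and when < frm:
            continue
        if to is not None and when > to:
            continue
        result.append(sample)
    return result
```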

Prevent run-away estimate iterations

Capping iterations to an individual run time or maximum estimate length (e.g. 50 years) could prevent a few iterations from monopolizing the calculation time, and provide the opportunity for additional iterations to be run.

There are two possibilities for these capped iterations:

  • Record as the capped date (or maybe the iteration's current date for timed-out runs)
  • Treat as errors, in the way that hitting the issue count threshold is currently done.

I think it makes sense to include these iterations in the median calculation (shifting it to a later date). Failed runs that hit the issue count threshold are currently excluded from the median, but by this logic including them makes sense as well.
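A single capped iteration might look like this sketch (Python illustration of the idea, not the project's PHP implementation; the sample shape and the 50-year cap follow the discussion above):

```python
from datetime import date, timedelta

MAX_ESTIMATE = timedelta(days=50 * 365)  # maximum estimate length from the text
ISSUE_CAP_FACTOR = 10                    # existing issue count threshold

def run_iteration(samples, start_count, start_date, rng=None):
    """One estimate iteration: repeatedly apply a randomly chosen historical
    change until the count reaches zero or a cap is hit.
    samples: list of (days_elapsed, count_delta) pairs (shape assumed)."""
    import random
    rng = rng or random
    count, current = start_count, start_date
    while count > 0:
        days, delta = rng.choice(samples)
        count += delta
        current += timedelta(days=days)
        if count > start_count * ISSUE_CAP_FACTOR:
            return None                       # treated as an error today
        if current - start_date > MAX_ESTIMATE:
            return start_date + MAX_ESTIMATE  # record as the capped date
    return current
```

Recording the capped date (rather than discarding the run) is what lets these iterations participate in the median.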

Allow tracking additional criteria, not just Category & Priority

There are other metrics that may be informative to track, even if they aren't factored into the release estimation, such as beta blockers or the core initiatives.

  1. The current samples table doesn't support additional criteria well, since it requires a new column for each tracked value; it should be normalized with a samples_values table that stores key-value pairs for the configured criteria
  2. The current code makes some assumptions based on only Category and Priority changing between fetching values, and should be changed to isolate all parameters between each request.
  3. Configuration is currently in static class variables, and should be moved outside of the class and injected instead.
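The proposed normalization in point 1 could look like the following sketch (SQLite standing in for the real database; table and column names beyond samples_values are assumptions):

```python
import sqlite3

# Key-value pairs per sample instead of one column per tracked criterion.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE samples (
        id INTEGER PRIMARY KEY,
        version TEXT NOT NULL,
        sampled_at TEXT NOT NULL
    );
    CREATE TABLE samples_values (
        sample_id INTEGER NOT NULL REFERENCES samples (id),
        name TEXT NOT NULL,       -- e.g. 'critical_bugs', 'beta_blockers'
        value INTEGER,
        PRIMARY KEY (sample_id, name)
    );
""")
conn.execute("INSERT INTO samples VALUES (1, '8.0', '2015-01-01 06:00')")
conn.executemany(
    "INSERT INTO samples_values VALUES (?, ?, ?)",
    [(1, "critical_bugs", 70), (1, "beta_blockers", 12)],
)
row = conn.execute(
    "SELECT value FROM samples_values"
    " WHERE sample_id = 1 AND name = 'critical_bugs'"
).fetchone()
```

Adding a new tracked criterion then becomes an insert rather than a schema change.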

Allow for embedding chart on other websites

...such as Drupal.org ;)

I'm in the process of revamping https://drupal.org/community-initiatives/drupal-core to be more focused on getting the release out the door and would love to stick a big-ass graph like this right front-and center (credited to you, of course).

I tried naively copy/pasting the JS from drupalreleasedate.com but it won't work because Drupal.org gives mixed content errors (https vs. http) when trying to grab http://drupalreleasedate.com/data/samples.json from there.

Any ideas on how I could make this happen? Thanks so much for putting this together! :D

Estimate Weighting

With the current linear probability for picking samples, recent (consistent) changes to momentum will be under-represented in the estimate date.

The simple solution is to provide a static weighting function for sample selection that increases the probability of picking recent samples over older ones, but it would have to be balanced so as not to introduce too much volatility when momentum changes.
It would probably be beneficial for the weighting to be time-aware, due to the changing periods between samples over time.
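One time-aware static weighting is an exponential half-life (a Python sketch; the 90-day half-life is an assumption that would need the balancing discussed above):

```python
import random

def recency_weights(timestamps, half_life_days=90.0):
    """Time-aware weighting: a sample's weight halves for every
    half_life_days it lies before the newest sample."""
    newest = max(timestamps)
    return [0.5 ** ((newest - t) / half_life_days) for t in timestamps]

def pick_sample(samples, rng=random):
    """samples: list of (timestamp_in_days, delta) pairs (shape assumed)."""
    weights = recency_weights([t for t, _ in samples])
    return rng.choices(samples, weights=weights, k=1)[0]
```

Because the weight depends on elapsed time rather than sample index, the changing periods between samples are handled automatically.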

Update to be aware of core's change to semantic versioning

Now that core is using Semantic Versioning branches, some issues are getting moved to 8.1.x-dev. Since the 8.x issue filter includes these issues still, it isn't an accurate representation of the issues holding up the 8.0.0 release.
This currently doesn't affect the critical issue count (😌), but other counts will continue to diverge as issues are triaged to later minor releases.

  • Update database fields
  • Update issue count fetching
  • Update estimate generation
  • Update API responses

Improve increasing issue count threshold

Currently if an estimate iteration reaches 10 times the current issue count it will fail. This is fine for now while the issue count is still near its all-time peak, but could become increasingly problematic as the issue count decreases and allows a smaller range.

Multiple estimation methods

The Monte Carlo estimation is currently ineffective, due to the recent rise in issue count overwhelming the overall downward trend.

Using a simpler estimation method can provide a value until the Monte Carlo method is able to produce an estimation and confidence interval again.
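A least-squares linear fit is one such simpler method (a Python sketch, not the project's implementation): fit count = a + b·day and project the day the line reaches zero.

```python
def linear_estimate(samples):
    """Fallback linear-regression estimate over (day, count) pairs.
    Returns the projected zero-crossing day, or None when the trend is
    flat or rising, mirroring the Monte Carlo method's failure mode."""
    n = len(samples)
    sx = sum(d for d, _ in samples)
    sy = sum(c for _, c in samples)
    sxx = sum(d * d for d, _ in samples)
    sxy = sum(d * c for d, c in samples)
    denom = n * sxx - sx * sx
    if denom == 0:
        return None
    b = (n * sxy - sx * sy) / denom   # slope (issues per day)
    a = (sy - b * sx) / n             # intercept
    if b >= 0:
        return None                   # no downward trend: cannot estimate
    return -a / b                     # day on which the fitted line hits zero
```

Unlike the Monte Carlo method it yields no confidence interval, but it degrades more gracefully when recent counts rise.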

Track estimate time limit internally

Right now PHP's time limit is relied on to prevent run-away estimates, but this results in there being no indication of the difference between an estimate still being processed and one that has timed out. If the simulation keeps track of its own time limitation, it could raise an exception when it is reached so that the caller can handle it.

It could also then be possible to pass the partial estimate data within the exception for storage and later analysis.
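The shape of that interface might look like this (a Python sketch of the idea; the function names are hypothetical):

```python
import time

class EstimateTimeout(Exception):
    """Raised when the simulation exceeds its own time budget; carries
    the partial estimate data for storage and later analysis."""
    def __init__(self, partial):
        super().__init__("estimate time limit reached")
        self.partial = partial

def run_estimate(iterations, time_limit, run_iteration):
    """Track the time limit internally instead of relying on the
    language runtime to kill the process."""
    deadline = time.monotonic() + time_limit
    results = []
    for _ in range(iterations):
        if time.monotonic() > deadline:
            raise EstimateTimeout(results)
        results.append(run_iteration())
    return results
```

The caller can then distinguish "still processing" from "timed out", and store whatever iterations completed.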

Fetching is broken again

Looks like there is an error parsing the issue count page, preventing counts from being updated.

Consistent coding style

Some newer classes started with PSR code style, but many classes need to be completely or partially updated.

Add tracking 'days-to' to statistics

Tracking could be added to show how many days are left until the estimated release (e.g. as of Aug 5th, an estimated Sep 1 release is 26 days away), making it possible to see whether the estimate is approaching or slipping.
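The statistic itself is a one-line date difference (Python sketch):

```python
from datetime import date

def days_to_release(today, estimate):
    """Days remaining until the estimated release date
    (negative once the estimate has slipped into the past)."""
    return (estimate - today).days
```

Tracking this value over time, rather than just the estimate date, is what makes approach or delay visible.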

Allow running fetch / update tasks through console

Using the console won't tie up an Apache process for an extended period of time and would allow running at a lower priority, avoiding tying up resources for more time-sensitive tasks.

The best way is probably to pull the code out of the controller and make it callable from both the controller and a console command, though the console could also call the controller directly.

Add stats for closed and opened

Sometimes it looks like there's not much activity, with numbers going slightly up and down.
If it were possible to also show "closed/opened today|week|month", this would be more informative.

Calculation does not account for Drupal Core release windows.

As per documentation

https://www.drupal.org/core/dev-cycle

  • Release windows for Drupal core are on the first and third Wednesdays of each month. For stable releases, the first Wednesday is a bugfix release window and the third Wednesday is a security release window.
  • Once we reach zero critical issues, the first release candidate is tagged on the next of these release windows (whether bugfix or security) to sync up the Drupal 8 release schedule with that of Drupal 6 and 7. (If the next release window is less than 72 hours away, the following one will be used instead.)

At the moment the calculation does not account for these release windows, and can show estimated days that fall outside them.
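Snapping an estimate to the policy quoted above could be sketched as follows (Python illustration; "at least 72 hours away" is read as the window date being three or more days after the estimate):

```python
from datetime import date, timedelta

WEDNESDAY = 2  # date.weekday(): Monday is 0

def nth_wednesday(year, month, n):
    """Date of the n-th Wednesday of a month."""
    first = date(year, month, 1)
    offset = (WEDNESDAY - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))

def next_release_window(after):
    """First release window (first or third Wednesday of a month) at
    least 72 hours after the given date, per the quoted policy."""
    earliest = after + timedelta(days=3)
    y, m = after.year, after.month
    while True:
        for n in (1, 3):
            window = nth_wednesday(y, m, n)
            if window >= earliest:
                return window
        m += 1
        if m > 12:
            y, m = y + 1, 1
```

Passing the raw estimated date through next_release_window would keep the displayed date inside a release window.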

Replace logarithmic issue graph with separate linear graph for majors and criticals.

We made huge progress with the critical issue count, as the stats clearly show (down from 130 in Nov to 70 now).

However, the current graph does not really show that.

It might have been a good way to show beta blockers and the others on the same graph before, but I think it is just confusing and misleading now.

Suggestion: display two separate linear graphs, one for majors and one for criticals.

Parallelize estimate calculation

Now that the estimate calculation can be run through the console (#12), it's feasible to parallelize it to maximize use of available processing capacity but run it at a lower priority to prevent affecting time-dependent tasks (such as serving the site itself).

http://pthreads.org/
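Splitting the iterations into per-worker batches might look like this (a Python multiprocessing sketch of the idea; the project itself is PHP, where pthreads would fill this role, and the batch body here is a placeholder):

```python
from multiprocessing import Pool
import os
import random

def run_batch(args):
    """Run a batch of estimate iterations in one worker process.
    Placeholder body: each iteration yields a simulated day count."""
    seed, iterations = args
    rng = random.Random(seed)
    return [rng.randint(100, 1000) for _ in range(iterations)]

def parallel_estimate(total_iterations, workers=None):
    """Fan iterations out across worker processes and flatten the results."""
    workers = workers or os.cpu_count() or 1
    per_worker = total_iterations // workers
    with Pool(workers) as pool:
        batches = pool.map(run_batch, [(i, per_worker) for i in range(workers)])
    return [r for batch in batches for r in batch]
```

Worker processes can additionally be reniced so the time-sensitive web-serving tasks keep priority.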
