archiver_toolkit's Introduction

How to archive content from mozilla.org

1. Get commit access to the necessary source trees in SVN

  • If you are not already a Mozilla committer, you will need to go through the standard commit process: http://www.mozilla.org/hacking/committer/
  • Then you can file a bug to get access to the website-archive tree. Here is a bug template: http://mzl.la/1bcLR7d
  • You will need your access to be approved (in the bug) by someone with authority over the content. Jennifer Bertsch, Chris More, or David Boswell can all vouch for you.

If you intend to also delete this content from its original location (see the last step), file a bug to get access to the mozilla.org/mozilla.com SVN trees. Here is a bug template: http://mzl.la/17JeXr4

2. Clone the archiver repository

If you're reading this README, you're in the right place. Clone this repo:

git clone https://github.com/mozilla/archiver_toolkit.git

3. Check out the website-archive SVN tree

In the archiver_toolkit directory you cloned in step 2, check out the SVN folder that contains archived www.mozilla.org content:

svn checkout https://svn.mozilla.org/projects/website-archive.mozilla.org/www.mozilla.org

4. Put some new content into the SVN tree

This step adds new content to the website-archive SVN tree. Use wget to spider a subdirectory of www.mozilla.org. The command below mirrors the subdirectory along with its page requisites and rewrites links for local browsing. It contains two parameters you must adjust:

  • www.mozilla.org/devpreview_releasenotes should be adjusted to www.mozilla.org/some_descriptive_folder_name

  • http://www.mozilla.org/projects/devpreview/releasenotes/ should be adjusted to http://www.mozilla.org/the_path_to_the_folder_you_are_archiving

      wget -e robots=off -w 1 --mirror -p --adjust-extension --no-parent --convert-links --no-host-directories \
      -P www.mozilla.org/devpreview_releasenotes \
      http://www.mozilla.org/projects/devpreview/releasenotes/
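
Because only two values change between runs, the command can be assembled programmatically. The following is a minimal sketch, not part of archiver_toolkit; the function name `build_wget_command` is hypothetical, and the flags are copied unchanged from the command above.

```python
import shlex
from typing import List


def build_wget_command(folder_name: str, source_url: str) -> List[str]:
    """Return the wget argument list above with the two adjustable values filled in."""
    return [
        "wget",
        "-e", "robots=off",
        "-w", "1",
        "--mirror",
        "-p",
        "--adjust-extension",
        "--no-parent",
        "--convert-links",
        "--no-host-directories",
        # First adjustable value: the descriptive folder under www.mozilla.org/.
        "-P", f"www.mozilla.org/{folder_name}",
        # Second adjustable value: the URL of the folder being archived.
        source_url,
    ]


command = build_wget_command(
    "devpreview_releasenotes",
    "http://www.mozilla.org/projects/devpreview/releasenotes/",
)
# Print a copy-pasteable command line instead of running it.
print(shlex.join(command))
```

To run the mirror directly from Python, the returned list can be passed to `subprocess.run(command, check=True)`.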
    

5. Process the HTML you just retrieved

We make a handful of minor changes to archived content: we remove the search tool, since it is not guaranteed to work forever, and we add a message at the top of each page explaining that the content is archived. The archiver_toolkit repository contains a Python command-line tool that processes these files. To run it:

  1. (optional) Set up a virtualenv to isolate this repository's libraries from your system libraries:

     virtualenv venv && . venv/bin/activate
    
  2. Install required libraries:

     pip install -r requirements.txt
    
  3. Run the script. Pass it the path to the new content:

     python process_files.py www.mozilla.org/some_descriptive_folder_name/
    

6. Review the processed code

Open the .html files in your local copy of the SVN tree you have downloaded and modified. In Firefox, use the "File->Open" menu to find them and browse them. Some things to look for:

  • Does the archival message appear at the top of every page?
  • Do links to other local content work?
  • Do links to remote content work?
  • Do the pages look like their counterparts on the live server (visible images, working layout, etc.)?
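
The first two checks in this list can be partly automated before the manual review in Firefox. The sketch below is not part of archiver_toolkit. `ARCHIVE_MARKER` is a hypothetical placeholder for a phrase from the real archival message, and the script assumes wget has already converted local links to relative paths.

```python
from html.parser import HTMLParser
from pathlib import Path
from typing import Dict, List
from urllib.parse import unquote, urlparse

# Hypothetical marker: replace with a phrase that appears in the real archival message.
ARCHIVE_MARKER = "archived"


class LinkCollector(HTMLParser):
    """Collect href and src attribute values from one HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def check_page(page: Path) -> List[str]:
    """Return human-readable problems found in one .html file."""
    text = page.read_text(encoding="utf-8", errors="replace")
    problems = []
    if ARCHIVE_MARKER not in text:
        problems.append("archival message marker not found")
    collector = LinkCollector()
    collector.feed(text)
    for link in collector.links:
        parsed = urlparse(link)
        # Skip remote links, mailto:/javascript: links, and pure #fragment links.
        if parsed.scheme or parsed.netloc or not parsed.path:
            continue
        # Relative links are resolved against the directory of the current page.
        target = (page.parent / unquote(parsed.path)).resolve()
        if not target.exists():
            problems.append(f"broken local link: {link}")
    return problems


def check_tree(root: Path) -> Dict[str, List[str]]:
    """Map each .html file under root to its problems, omitting clean files."""
    report = {}
    for page in sorted(root.rglob("*.html")):
        problems = check_page(page)
        if problems:
            report[str(page)] = problems
    return report
```

Calling `check_tree(Path("www.mozilla.org/some_descriptive_folder_name"))` returns an empty dictionary when every page contains the marker and every local link resolves. Remote links and visual layout still need to be checked by hand.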

7. Commit your changes

Now that the content is ready to be archived, commit it to the website-archive SVN tree:

svn add www.mozilla.org/some_descriptive_folder_name
svn commit www.mozilla.org/some_descriptive_folder_name

The changes will be automatically deployed in about 15 minutes.

8. Verify the deploy

Once the automatic deploy from step 7 happens (usually within 15 minutes), visit the new URL and make sure things work as expected. In the example above, the URL will be:

http://website-archive.mozilla.org/www.mozilla.org/devpreview_releasenotes/projects/devpreview/releasenotes/
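
The archive URL is determined by the two values chosen in step 4: the folder passed to `-P` and the path of the original URL, which wget keeps beneath that folder. A minimal sketch of the mapping follows; the function name `archive_url` is hypothetical, and the base URL is taken from the example above.

```python
from urllib.parse import urlparse

# Base of the archive site, as it appears in the example URL above.
ARCHIVE_BASE = "http://website-archive.mozilla.org/www.mozilla.org"


def archive_url(folder_name: str, source_url: str) -> str:
    """Return the archive URL for content mirrored with -P www.mozilla.org/<folder_name>."""
    # --no-host-directories drops the host name, so only the original URL path
    # is kept beneath the folder passed to -P.
    path = urlparse(source_url).path
    return f"{ARCHIVE_BASE}/{folder_name}{path}"


print(archive_url(
    "devpreview_releasenotes",
    "http://www.mozilla.org/projects/devpreview/releasenotes/",
))
```

For the example values, this prints the same URL shown above.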

9. Redirect the original content to the archives

Requests for the archived content at www.mozilla.org should now be redirected to the archival site. This requires changes to the global.conf file in the Bedrock code repository (see "How to Contribute"). If you are not comfortable with this step, you can open a new bug for this change.

  1. Clone the Bedrock code repository

  2. Change the etc/httpd/global.conf file -- add a valid RewriteRule (see Apache documentation). In the example above, this would be...

     RewriteRule ^/projects/devpreview/releasenotes(.*)$ http://website-archive.mozilla.org/www.mozilla.org/devpreview_releasenotes/projects/devpreview/releasenotes$1 [L,R=301]
    
  3. Commit your changes and submit a pull request.
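
The RewriteRule in item 2 follows a fixed pattern, so it can be generated from the archive folder name and the original path. This is a sketch under that assumption, not a Bedrock utility. The function name `rewrite_rule` is hypothetical, and only dots are escaped in the pattern, so paths containing other regex metacharacters need manual review.

```python
# Base of the archive site, as it appears in the example rule above.
ARCHIVE_BASE = "http://website-archive.mozilla.org/www.mozilla.org"


def rewrite_rule(folder_name: str, original_path: str) -> str:
    """Return an Apache RewriteRule line in the same shape as the example above."""
    # Normalize to a single leading slash and no trailing slash.
    path = "/" + original_path.strip("/")
    # Escape literal dots so they do not match arbitrary characters in the pattern.
    pattern = path.replace(".", r"\.")
    return (
        f"RewriteRule ^{pattern}(.*)$ "
        f"{ARCHIVE_BASE}/{folder_name}{path}$1 [L,R=301]"
    )


print(rewrite_rule("devpreview_releasenotes", "/projects/devpreview/releasenotes/"))
```

For the example values, the output is identical to the rule shown in item 2.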

10. Verify the redirect

Once the new RewriteRule is in production, requests to the original URL should redirect to the archive URL. Do not move on to the next step until they do.
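
The redirect can be checked without a browser by issuing a single request and inspecting the response instead of following it. The sketch below uses only the Python standard library; the function names are hypothetical. It assumes the server answers the original URL directly with the 301 from the RewriteRule, so an intermediate redirect (for example, HTTP to HTTPS) would make the strict comparison fail.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlparse


def fetch_redirect(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send one HEAD request without following redirects; return (status, Location)."""
    parsed = urlparse(url)
    if parsed.scheme == "https":
        conn = http.client.HTTPSConnection(parsed.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parsed.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parsed.path or "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def redirects_to(status: int, location: Optional[str], expected_url: str) -> bool:
    """True when the response is a permanent redirect to exactly expected_url."""
    return status == 301 and location == expected_url


def check_redirect(original_url: str, expected_url: str) -> bool:
    """Fetch original_url once and compare the redirect with expected_url."""
    status, location = fetch_redirect(original_url)
    return redirects_to(status, location, expected_url)
```

Calling `check_redirect` with the original URL and the archive URL from step 8 returns `True` once the rule is live for that path.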

11. Delete the original content from the www.mozilla.org SVN tree

The final step is to remove the old content from the www.mozilla.org SVN tree. If you are not comfortable with this step, you can open a new bug for this change.

  1. Check out the relevant SVN tree (.org, .com).

  2. Remove the folder you've archived.

  3. Commit your changes.

  4. Work with the Web Productions team to get your changes merged onto the staging and production systems.

archiver_toolkit's People

Contributors

hoosteeno, jgmize, sancus

archiver_toolkit's Issues

Wiki changes

FYI: The following changes were made to this repository's wiki:

  • defacing spam has been removed

  • the wiki has been disabled, as it was not used

These changes were made as the result of a recent automated defacement of publicly writable wikis.

CODE_OF_CONDUCT.md file missing

As of January 1 2019, Mozilla requires that all GitHub projects include this CODE_OF_CONDUCT.md file in the project root. The file has two parts:

  1. Required Text - All text under the headings Community Participation Guidelines and How to Report is required and should not be altered.
  2. Optional Text - The Project Specific Etiquette heading provides a space to speak more specifically about ways people can work effectively and inclusively together. Some examples of those can be found on the Firefox Debugger project, and Common Voice. (The optional part is commented out in the raw template file, and will not be visible until you modify and uncomment that part.)

If you have any questions about this file, or Code of Conduct policies and procedures, please reach out to [email protected].

(Message COC001)
