

mpvd (Mozilla Products Vulnerability Dataset)

Update

  • The following information is up-to-date as of March 14, 2024.
  • The data will be updated periodically to include new vulnerabilities disclosed in Mozilla security advisories for product updates.

Disclaimer

  • The dataset is not 100% complete due to unavailable data or other issues encountered while scraping and parsing.
  • Noisy data might be present among the vulnerable and fixed source code files. This noise comes from files that might be considered irrelevant, such as those with certain file extensions or with certain keywords in the filename (e.g., test, readme). Although steps have been taken to filter these files out of the dataset, some may remain.

Summary

This repository contains a dataset of source code files related to known vulnerabilities in Mozilla products. Security advisories are available for the following products: Firefox, Firefox ESR, Firefox for iOS, Firefox OS, Mozilla VPN, Thunderbird, Thunderbird ESR, and SeaMonkey.

Additional vulnerability data from the security advisories, along with the associated Bug IDs, is also included.

Cite

If you use this data in your work, please cite it:

Napier, K. Mozilla products vulnerability dataset [Data set]. https://github.com/krn65/mpvd

@misc{Napier_Mozilla_products_vulnerability,
  author = {Napier, Kollin},
  title = {{Mozilla products vulnerability dataset}},
  url = {https://github.com/krn65/mpvd}
}

Descriptions

The product_data folder contains a .csv file for each product with the following data from the available security advisories: version (the product version that fixed the vulnerability), CVE ID(s), advisory (Mozilla's advisory ID), title, reporter, impact (as defined by Mozilla), description, and Bug ID(s). The rows are sorted in descending order, starting with the most recent version.

The product_bug_ids folder contains a .txt file for each product that lists the Bug IDs found in that product's security advisories. The file all_bug_ids-unique.txt combines the Bug IDs from all products into a single sorted list with duplicates removed.
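The combined list can be rebuilt from the per-product files with a set union followed by a sort. This is a minimal sketch, not the script used to produce the dataset: `merge_bug_ids` is a hypothetical helper, and it assumes each .txt file holds one numeric Bug ID per line and that the combined list is sorted numerically (the README only says "sorted").

```python
from pathlib import Path
from typing import Iterable, List, Union

PathLike = Union[str, Path]


def merge_bug_ids(txt_files: Iterable[PathLike]) -> List[int]:
    """Combine per-product Bug ID lists into one sorted list without duplicates.

    Assumptions (not confirmed by the README): each file holds one numeric
    Bug ID per line, blank lines may appear, and sorting is numeric.
    """
    unique_ids = set()
    for path in txt_files:
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line:  # skip blank lines
                unique_ids.add(int(line))
    return sorted(unique_ids)
```

For example, `merge_bug_ids(p for p in Path("product_bug_ids").glob("*.txt") if p.name != "all_bug_ids-unique.txt")` would rebuild a combined list from the per-product files while skipping the combined file itself, in case it sits in the same folder.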

Only source code files with the following file extensions are scraped and included in the dataset: .js, .java, .c, .cc, .cpp, .c++, .cp, .cxx, .h, .hh, .hpp, .py.
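The extension rule above can be expressed as a small filter. This is a sketch rather than the scraper's actual code: `is_source_file` and `SOURCE_EXTENSIONS` are hypothetical names, and the case-sensitive comparison is an assumption, since the README does not say how upper-case extensions such as `.C` were handled.

```python
from pathlib import PurePath

# Extensions listed in the README. Matching is case-sensitive here, which is
# an assumption: the README does not say how e.g. ".C" or ".JS" were treated.
SOURCE_EXTENSIONS = frozenset({
    ".js", ".java", ".c", ".cc", ".cpp", ".c++", ".cp", ".cxx",
    ".h", ".hh", ".hpp", ".py",
})


def is_source_file(path: str) -> bool:
    """Return True if the file's final extension is in SOURCE_EXTENSIONS."""
    # PurePath.suffix is the text from the last "." in the final path
    # component, so "foo.c++" yields ".c++" and "Makefile" yields "".
    return PurePath(path).suffix in SOURCE_EXTENSIONS
```

Using a frozenset keeps each lookup constant-time and makes the list easy to compare against the one above.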

The source_code-vulnerable.7z file contains a folder of "vulnerable" (older) versions of source code files for all products before a vulnerability was fixed. The source_code-fixed.7z file contains a folder of "fixed" (newer) versions of source code files for all products after a vulnerability was fixed.

  • Note: A scraped "vulnerable" source code file is not necessarily the state of the file that originally introduced the vulnerability. It is the last revision of the file in which the vulnerability was still present (i.e., the parent of the revision that fixed the vulnerability).

Each source code file is named bug_id-revision_id-status-original_filename.extension. The bug_id refers to the Bugzilla entry. The revision_id refers to the commit (Phabricator) or revision (Mercurial) ID of the changes associated with that Bugzilla entry. The status indicates whether the file is the vulnerable (old) or fixed (new) version. The original_filename is the name of the file that was changed; when the file was written, backslashes (\) in the name were replaced with underscores (_). The extension is the file extension.
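The naming scheme above can be split back into its fields. This is a minimal sketch under stated assumptions: `parse_dataset_filename` and `DatasetFileName` are hypothetical names, and it assumes bug_id, revision_id, and status never contain hyphens, so only the first three hyphens act as separators while original_filename may contain more. Because backslashes were turned into underscores, the parsed original_filename cannot reliably be converted back into a path.

```python
from typing import NamedTuple


class DatasetFileName(NamedTuple):
    bug_id: str
    revision_id: str
    status: str
    original_filename: str
    extension: str


def parse_dataset_filename(name: str) -> DatasetFileName:
    """Split bug_id-revision_id-status-original_filename.extension.

    Assumption: bug_id, revision_id, and status contain no hyphens, so only
    the first three hyphens are separators; original_filename may contain more.
    """
    parts = name.split("-", 3)
    if len(parts) != 4:
        raise ValueError(f"unexpected dataset filename: {name!r}")
    bug_id, revision_id, status, rest = parts
    # The extension is whatever follows the last "." (e.g. "cpp" or "c++").
    stem, dot, extension = rest.rpartition(".")
    if not dot or not stem:
        raise ValueError(f"missing extension in: {name!r}")
    return DatasetFileName(bug_id, revision_id, status, stem, extension)
```

With a made-up name, `parse_dataset_filename("1234567-abcdef012345-fixed-dom_base_nsGlobalWindow-inner.cpp")` yields original_filename `dom_base_nsGlobalWindow-inner` and extension `cpp`, keeping the hyphen inside the original name intact.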

Both .7z files contain 8,440 source code files from all Mozilla products, covering a variety of file extensions. The downloaded source code files represent only 2,657 of the 3,466 unique Bug IDs; as stated earlier, this gap is due to unavailable data or other issues encountered while scraping or parsing the associated product security advisory data. Only source code files that have content (are not empty) in both the fixed and vulnerable versions are included in the dataset. For example, if a source code file was created or deleted between the parent and current revisions, that file is ignored. Only Bugzilla entries that are public, have a status of Closed, and have available attachments (a table of revisions) are considered.
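After extracting both archives, the vulnerable and fixed copies can be matched up again and the "both versions must have content" rule checked. This is a sketch, not part of the dataset tooling: `pair_files` and `_pair_key` are hypothetical helpers, the two directory arguments are wherever you extracted the archives, and it assumes both copies of a pair share the same bug_id, revision_id, and original_filename (the README does not confirm whether the vulnerable copy is labeled with the fixing revision or its parent; if it is the parent, this key will not match).

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Optional, Tuple

PairKey = Tuple[str, str, str]


def _pair_key(name: str) -> Optional[PairKey]:
    """Key a filename by (bug_id, revision_id, original_filename.extension).

    Assumes the first three hyphens are the only separators before
    original_filename; names that do not fit the pattern return None.
    """
    parts = name.split("-", 3)
    if len(parts) != 4:
        return None
    bug_id, revision_id, _status, rest = parts
    return bug_id, revision_id, rest


def pair_files(vulnerable_dir, fixed_dir) -> Dict[PairKey, Tuple[Path, Path]]:
    """Match vulnerable and fixed copies of the same file.

    Empty files and files without a counterpart are dropped, mirroring the
    rule that both versions must have content.
    """
    found = defaultdict(dict)
    for status, folder in (("vulnerable", vulnerable_dir), ("fixed", fixed_dir)):
        for path in Path(folder).iterdir():
            if not path.is_file() or path.stat().st_size == 0:
                continue  # skip subfolders and empty files
            key = _pair_key(path.name)
            if key is not None:
                found[key][status] = path
    return {
        key: (versions["vulnerable"], versions["fixed"])
        for key, versions in found.items()
        if len(versions) == 2
    }
```

Since the dataset should already contain only complete, non-empty pairs, a result with fewer pairs than expected would point at files that break the assumed naming or pairing convention.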
