search-terms-sanitization

Code for evaluating and implementing search terms sanitization.

Working in this repo

Making commits

Open a PR and get one passing review before merging.

Directory structure

This repo's directory structure is minimal for now. We'll add more structure as we go.

  • .circleci: CircleCI configuration
  • nightly-job: code for the sanitization job that runs nightly
  • assets: public data like US Census surnames
  • non_sensitive: analyses that do not involve sensitive search data
  • suggest_search_tools: reusable Python code for the research team

Set-up

  1. Request access to the [email protected] service account. This documentation describes how.
  2. Create a GCP-hosted notebook environment and clone this repo into it. This video tutorial demonstrates how.
  3. Optional: To use the code in the suggest_search_tools/ directory as a Python library, pip install it:
    cd search-terms-sanitization/  # make sure you're in the search-terms-sanitization/ directory
    pip install -e .               # -e installs in editable (develop) mode
    This step is required if you plan to run the notebooks in non_sensitive/.
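
After the editable install, a quick check like the one below can confirm that the package is visible to the notebook's Python environment. This is a minimal sketch: the import name suggest_search_tools is an assumption based on the directory name, and is_installed is a hypothetical helper, not part of this repo.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumed import name, taken from the directory name in this repo.
    name = "suggest_search_tools"
    if is_installed(name):
        print(f"{name}: importable")
    else:
        print(f"{name}: not found; try `pip install -e .` from the repo root")
```

If the package is reported as not found inside a notebook, the notebook kernel may be using a different environment from the one where pip ran.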

Outputs

The nightly sanitization job writes data to the following tables:

  • sanitized search terms: moz-fx-data-shared-prod.search_terms_derived.merino_log_sanitized_v3
  • the job metadata table: moz-fx-data-shared-prod.search_terms_derived.sanitization_job_metadata
  • the job metadata languages table: moz-fx-data-shared-prod.search_terms.sanitization_job_languages
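
To peek at one of these outputs from a notebook, something like the following may work. It is a sketch under assumptions: the table identifiers above look like BigQuery `project.dataset.table` names, so it uses the google-cloud-bigquery client, and it assumes your environment already has credentials with read access. build_preview_query and run_preview are hypothetical helper names, and no column names are assumed.

```python
SANITIZED_TABLE = (
    "moz-fx-data-shared-prod.search_terms_derived.merino_log_sanitized_v3"
)


def build_preview_query(table: str, limit: int = 10) -> str:
    """Build a SQL string that selects a handful of rows from `table`."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"SELECT * FROM `{table}` LIMIT {int(limit)}"


def run_preview(table: str = SANITIZED_TABLE, limit: int = 10) -> list:
    """Run the preview query and return the rows as plain dicts.

    Assumes google-cloud-bigquery is installed and default credentials
    are available; neither is guaranteed by this repo's set-up steps.
    """
    from google.cloud import bigquery  # imported lazily: optional dependency

    client = bigquery.Client()
    rows = client.query(build_preview_query(table, limit)).result()
    return [dict(row.items()) for row in rows]


if __name__ == "__main__":
    print(build_preview_query(SANITIZED_TABLE, limit=5))
```

Keeping the query builder separate from the client call makes it possible to inspect the SQL without touching any data.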

Related artifacts

search-terms-sanitization's People

Contributors

chelseatroy, dzeber, quiiver, rebecca-burwei

search-terms-sanitization's Issues

Add open source software license

This Mozilla repository has been identified as lacking a license. Consistent with Mozilla's Licensing Policy, an open source license should be applied to the code in this repository.

Please add an appropriate LICENSE.md file to the root directory of the project. In general, Mozilla's licensing policies are as follows:

  • Client-side products created by Mozilla employees or contributors should use the Mozilla Public License, Version 2.0 (MPL).

  • Server-side products or utilities that support Mozilla products may use either the MPL or the Apache License 2.0 (Apache 2.0).

In special cases, another license might be appropriate. If the repository is a fork of another repository, it must apply the license of the original. Similarly, another license might be appropriate to match that of a broader project (for example, Rust crates that Firefox depends on are published under an Apache 2.0 / MIT dual license, as that is the dual license used by the Rust programming language and projects).

Please ensure that any license added to the LICENSE.md file matches other licensing information in the repository (for example, it should match any license indicated in a setup.py or package.json file).

Mozilla staff can access more information in our Software Licensing Runbook – search for “Licensing Runbook” in Confluence to find it.

If you have any questions, you can contact Daniel Nazer, who can be reached at dnazer on Mozilla email or Slack.

OPENLIC-2023-01

Recent CI runs are failing

Recently, CircleCI build-and-push-image jobs have been failing, e.g. this one.

The "Initialize gcloud CLI" step fails with ERROR: gcloud crashed (ValueError): No key could be detected.
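
One way to narrow this down is to sanity-check the service account key that the CI job passes to gcloud. An error of this kind can appear when the key material is empty or its private key is not in PEM form, but whether that is the cause here is not confirmed. The sketch below is an assumption-laden diagnostic: find_key_problems is a hypothetical helper, and GCLOUD_SERVICE_KEY is a placeholder for whatever environment variable the CI configuration actually uses.

```python
import json
import os

# PEM headers that a service account private key is expected to start with.
PEM_MARKERS = (
    "-----BEGIN PRIVATE KEY-----",
    "-----BEGIN RSA PRIVATE KEY-----",
)


def find_key_problems(raw: str) -> list[str]:
    """Return a list of human-readable problems found in a key JSON string."""
    if not raw or not raw.strip():
        return ["key is empty"]
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"key is not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["key JSON is not an object"]

    problems = []
    if data.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    private_key = data.get("private_key", "")
    if not isinstance(private_key, str) or not any(
        marker in private_key for marker in PEM_MARKERS
    ):
        problems.append('"private_key" has no PEM header')
    return problems


if __name__ == "__main__":
    # Placeholder variable name; substitute the one used in the CI config.
    raw_key = os.environ.get("GCLOUD_SERVICE_KEY", "")
    for problem in find_key_problems(raw_key) or ["no obvious problems found"]:
        print(problem)
```

The check only reports structural problems and never prints the key itself, so it should be safe to run in a CI log.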
