
search-gov's Introduction

Search-gov Info


Contributing to search-gov

Read our contributing guidelines.

Dependencies

Ruby

Use RVM to install the version of Ruby specified in .ruby-version.

NodeJS

Use NVM to install the version of NodeJS specified in the .nvmrc.

Docker

Docker can be used to: 1) run just the required services (MySQL, Elasticsearch, etc.) while running the search-gov application on your local machine, and/or 2) run the entire search-gov application in a Docker container. Please refer to searchgov-services for detailed instructions on centralized configuration for the services.

When running in a Docker container (option 2 above), the search-gov application is configured to run on port 3100. Required dependencies (Ruby, NodeJS, package manager, packages, gems, JavaScript dependencies) are installed using Docker. However, other data or configuration may need to be set up manually, which can be done in the running container using bash.

To perform any operations on the search-gov application running in a Docker container using bash, run the command below from searchgov-services:

$ docker compose run search-gov bash

For example, to set up the database in Docker:

$ docker compose run search-gov bash
$ bin/rails db:setup

The Elasticsearch service provided by searchgov-services is configured to run on the default port, 9200. To use a different host (with or without a port) or set of hosts, set the ES_HOSTS environment variable. For example, use the following command to run the specs using Elasticsearch running on localhost:9207:

ES_HOSTS=localhost:9207 bundle exec rspec spec

Verify that Elasticsearch 7.17.x is running on the expected port (port 9200 by default):

$ curl localhost:9200
{
  "name" : "002410188f61",
  "cluster_name" : "es7-docker-cluster",
  "cluster_uuid" : "l3cAhBd4Sqa3B4SkpUilPQ",
  "version" : {
    "number" : "7.17.7",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "78dcaaa8cee33438b91eca7f5c7f56a70fec9e80",
    "build_date" : "2022-10-17T15:29:54.167373105Z",
    "build_snapshot" : false,
    "lucene_version" : "8.11.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Package Manager

We recommend using Homebrew for local package installation on a Mac.

Packages

Use the package manager of your choice to install the packages shown in the examples below:

Example of installation on Mac using Homebrew:

$ brew install gcc
$ brew install protobuf
$ brew install java
$ brew install imagemagick
$ brew install [email protected]
$ brew install v8

Example of installation on Linux:

$ apt-get install protobuf-compiler
$ apt-get install libprotobuf-dev
$ apt-get install imagemagick
$ apt-get install default-jre
$ apt-get install default-mysql-client

Gems

Use Bundler 2.3.8 to install the required gems:

$ gem install bundler -v 2.3.8
$ bundle install

Refer to the wiki to troubleshoot gem installation errors.

JavaScript dependencies

Use Yarn to install the required JavaScript dependencies:

$ npm install --global yarn
$ yarn install

Service credentials; how we protect secrets

The app does its best to avoid interacting with most remote services during the test phase through heavy use of the VCR gem.

Run this command to get a valid secrets.yml file that will work for running existing specs:

$ cp config/secrets.yml.dev config/secrets.yml

If you find that you need to run specs that interact with a remote service, you'll need to put valid credentials into your secrets.yml file.

Anything listed in the secret_keys entry of that file will automatically be masked by VCR in newly-recorded cassettes.
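
For illustration, a hedged sketch of how that masking is typically wired up via VCR's filter_sensitive_data hook; the app's actual configuration lives in its spec support files and may differ (the file path and the structure of secret_keys here are assumptions):

# spec/support/vcr.rb (hypothetical location)
VCR.configure do |config|
  config.cassette_library_dir = 'spec/vcr_cassettes'
  config.hook_into :webmock

  # Assumption: secret_keys is a list of key names from config/secrets.yml
  secrets = Rails.application.secrets
  Array(secrets[:secret_keys]).each do |key|
    config.filter_sensitive_data("<#{key.to_s.upcase}>") { secrets[key.to_sym] }
  end
end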

Data

Elasticsearch Indexes

You can create the USASearch-related indexes like this:

$ rake usasearch:elasticsearch:create_indexes

You can index all the records from ActiveRecord-backed indexes like this:

$ rake usasearch:elasticsearch:index_all[FeaturedCollection+BoostedContent]

If you want it to run in parallel using Resque workers, call it like this:

$ rake usasearch:elasticsearch:resque_index_all[FeaturedCollection+BoostedContent]

Note that indexing everything uses whatever index/mapping/setting is in place. If you need to change the Elasticsearch schema first, you can 'recreate' or 'migrate' the index:

Recreate an index (for development/test environments)

⚠️ The recreate_index task should only be used in development or test environments, as it deletes and then recreates the index from scratch:

$ rake usasearch:elasticsearch:recreate_index[FeaturedCollection]

Migrate an index (safe for production use)

In production, if you are changing a schema and want to migrate the index without having it be unavailable while the new index is being populated, do this:

$ rake usasearch:elasticsearch:migrate[FeaturedCollection]

Same thing, but using Resque to index in parallel:

$ rake usasearch:elasticsearch:resque_migrate[FeaturedCollection]

MySQL Database

Create and set up your development and test databases:

$ rails db:setup
$ rails db:test:prepare

Tests

Make sure the unit, functional, and integration tests run:

# Run the RSpec tests
$ rspec spec/

# Run the Cucumber integration tests
$ cucumber features/

# Run the JavaScript tests
$ yarn test

Optionally, to only run Cucumber accessibility tests:

$ cucumber features/ --tags @a11y

The above will call the axe step defined in features/support/hooks.rb for any scenario tagged with the @a11y tag (but not @a11y_wip, as those scenarios are expected to fail).

Code Coverage

We require 100% code coverage. After running the tests (both RSpec & Cucumber), open coverage/index.html in your favorite browser to view the report. You can click around on the files that have < 100% coverage to see what lines weren't exercised.
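
For example, on a Mac:

$ open coverage/index.html   # use xdg-open on Linux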

Circle CI

We use CircleCI for continuous integration. Build artifacts, such as logs, are available in the 'Artifacts' tab of each CircleCI build.

Code Quality

We use Rubocop for static code analysis. Settings specific to search-gov are configured via .rubocop.yml. Settings that can be shared among all Search.gov repos should be configured via the searchgov_style gem.

Running the app

Search

To run test searches, you will need a working Bing API key. You can request one from Bing, or ask a friendly coworker.

  1. Add the Bing web_subscription_id to config/secrets.yml:

     bing_v7:
       web_subscription_id: *****

  2. Start your local development environment:

     bin/dev

  3. Test searches should return results:

  • Web results
  • News results
  • Video results

Creating a new local admin account

Login.gov is used for authentication.

To create a new local admin account we will need to:

  1. Create an account on Login's sandbox environment.
  2. Get the Login sandbox private key from a team member.
  3. Add an admin user to your local app.

1. Login sandbox

Create an account on Login's sandbox environment. This will need to be a valid email address that you can get emails at. You'll receive a validation email to set a password and secondary authentication method.

2. Get the Login sandbox private key

Ask your team members for the current config/logindotgov.pem file. This private key will let your local app complete the handshake with the Login sandbox servers. After adding the PEM file, start or restart your local Rails server.

3. Add a new admin user to your local app

Open the Rails console and add a new user with the matching email:

u = User.where(email: '[email protected]').first_or_initialize
u.assign_attributes(contact_name: 'admin',
                    first_name: 'search',
                    last_name: 'admin',
                    default_affiliate: Affiliate.find_by_name('usagov'),
                    is_affiliate: true,
                    organization_name: 'GSA')

u.approval_status = 'approved'
u.is_affiliate_admin = true
u.save!

You should now be able to log in to your local instance of search.gov.

Admin

Your user account should have admin privileges set. Now go here and poke around.

http://localhost:3000/admin

Asynchronous tasks

Several long-running tasks have been moved to the background for processing via Resque.

  1. Visit the resque-web Sinatra app at http://localhost:3000/admin/resque to inspect queues, workers, etc.

  2. In your admin center, create a type-ahead suggestion (SAYT) "delete me". Now create a SAYT filter on the word "delete".

  3. Look in the Resque web queue to see the job enqueued.

  4. Start a Resque worker to run the job:

    $ QUEUE=* VERBOSE=true rake environment resque:work

  5. You should see log lines indicating that a Resque worker has processed an ApplySaytFilters job:

resque-workers_1 | *** Running before_fork hooks with [(Job{primary_low} | ApplySaytFilters | [])]

At this point, you should see the queue empty in Resque web, and the suggestion "delete me" should be gone from the sayt_suggestions table.

Queue names & priorities

Each Resque job runs in the context of a queue named 'primary' with priorities assigned at job creation time using the resque-priority gem. We have queues named :primary_low, :primary, and :primary_high. When creating a new background job model, consider the priorities of the existing jobs to determine where your jobs should go. Things like fetching and indexing all Odie documents will take days and should run as low priority, but fetching and indexing a single URL uploaded by an affiliate should be high priority. When in doubt, just use Resque.enqueue() instead of Resque.enqueue_with_priority() to put it on the normal priority queue.
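
For illustration, a minimal sketch of the two enqueueing styles, assuming resque-priority's priority-first argument order (job class names other than ApplySaytFilters are hypothetical):

# Normal priority (the :primary queue):
Resque.enqueue(ApplySaytFilters)

# Explicit priorities via resque-priority (hypothetical job names):
Resque.enqueue_with_priority(:high, FetchAffiliateUploadedUrl, url_id)
Resque.enqueue_with_priority(:low, IndexAllOdieDocuments)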

(Note: newer jobs inherit from ActiveJob, using the resque queue adapter. We are in the process of migrating the older jobs to ActiveJob.)
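
A minimal sketch of that wiring, assuming the standard Rails option for selecting the ActiveJob backend:

# config/application.rb
config.active_job.queue_adapter = :resque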

Scheduled jobs

We use the resque-scheduler gem to schedule delayed jobs. Use ActiveJob's :wait or :wait_until options to enqueue delayed jobs, or schedule them in config/resque_schedule.yml.

Example:

  1. In the Rails console, schedule a delayed job:

    > SitemapMonitorJob.set(wait: 5.minutes).perform_later

  2. Run the resque-scheduler rake task:

    $ rake resque:scheduler

  3. Check the 'Delayed' tab in Resque web to see your job.

Additional developer resources

Production

Precompile assets

bin/rails assets:precompile

search-gov's People

Contributors

cmdrkeene, danswick, dawnpm, ddzz, dependabot[bot], dgsearchdev, eriksarnold, ethanewing, greggersh, halahab, hhz, irrationaldesign, jamesisaacs, jamesmadhoc, jayvirdy, jmax-fearless, joshuakfarrar, krbhavith, lorawoodford, loren, lsamuels-fearless, mothonmars, nickmarden, noremmie, ondrae, peggles2, rohitcolinrao, stevenbarragan, tmhammer, yogesh27


search-gov's Issues

Upgrade custom indices to use Elasticsearch 5.6.x

See #3 for background. Once we can separate ES connections for Kibana/Logstash/analytics from ES connections for custom indices, we would like to upgrade our ES server version for custom indices to use ES 5.6.x.

Refactor SERP view code for consistency

We've completed a number of stories recently to add missing sections to various SERP layouts. Our view code is unnecessarily scattered and inconsistent. This story is intended to do two things:

  1. identify the sections/modules that should appear on all SERPs (index, i14y, docs, blended)
  2. consolidate those sections into shared partials, so that the layouts will be consistent

SERP sections:

  • SERP alerts
  • spelling suggestion
  • matching_site_limits
  • best bets
  • medical topics
  • jobs
  • tweets
  • video news items (recent)
  • news items (recent)

if search results are present: (Note: we might reconsider this logic. Why only show the following data if there are search results?)

  • results
  • offer_other_web_results
  • federal register docs
  • news items (older than 5 days, less than 4 months)
  • related searches
  • pagination

This story needs some fleshing out upon review of the various SERPs.

jobs searches should use the USAJOBS api instead of our jobs_api

As a developer
I would like the search-gov code to use the USAJOBS API instead of our custom jobs_api
because the USAJOBS API is the official API for that data
and there is no longer a reason for us to maintain a separate API for that.

The jobs_api repo predates the USAJOBS repo, and was only created because there was not an official API.

It's entirely possible that we've added features to jobs_api that are not present in the USAJOBS API. The first step of this story should be to see whether the USAJOBS API meets the needs of the job searches in usasearch, and determine whether we need to nix any functionality.

Additional References:
https://search.gov/manual/govbox-jobs.html
https://search.gov/developer/jobs.html
https://github.com/GSA/search-gov/blob/master/lib/jobs.rb

Using a Collection's custom time scope should not send a searcher back to Page 1 results

This is a case we missed when working on #30. See #47.

Steps:

  • in the site admin center (/sites/<site id>), go to Content (in the left nav menu) > Collections > Add Collection
  • create a new document collection (The URL prefix can be any directory such as http://www.treasury.gov/about/)
  • In the left navigation panel, click on "Display" (/sites/<site id>/display/edit)
  • Under "Faceted Navigation", flip the switch to "on" for your collection and hit save
  • In the left navigation panel, click on "Preview" to visit the SERP page. You should see your collection displayed as a facet tab.
  • From a collection facet, run a search using a custom date range selected in the "Refine your search" options

Expected Results:

  • the original collection should be active

Actual Results:

  • the main SERP facet is active

Example in production:
https://search.usa.gov/search/docs?utf8=%E2%9C%93&affiliate=treas_searchgovreview_mct&sort_by=&dc=7922&query=money

Super Admins should be able to add and view sitemaps for searchgov domains

As a super admin
I would like to be able to add and view sitemaps for a domain on the admin/searchgov_domains page
because sites don't always put their sitemaps in expected places (like https://foo.gov/sitemap.xml) and they don't always list their sitemap locations in their robots.txt file.

Google allows webmasters to submit a sitemap URL in the Google Search Console, so some customers expect that functionality.

To support this, we should add a "Show" link for each searchgov domain on the admin/searchgov_domains page. (See the admin/affiliates page and code for how that should look & function). The "show" view for each searchgov domain should display the following info for that record:

  • domain
  • status
  • urls_count
  • unfetched_urls_count
  • created_at
  • updated_at

It should also include a sub-table that displays the sitemaps. For now, the display should only include the URL and created_at/updated_at columns. The super admin should be able to add a new sitemap url. An example of how that might work is the "Excluded domains" section for Affiliates (admin/affiliates > Edit > Settings):
I'm actually not sure if you can put an editable table within the "Show" view of an active scaffold record; it may need to be in an "Edit" view, but either way we need to ensure that the other searchgov domain values are not editable.

ActiveScaffold is the gem used for the super admin views. Its documentation leaves something to be desired, but our own super admin code can be very helpful for reverse-engineering, and there are some how-to's out there on the web.

Because the automagic nature of the active scaffold code makes it difficult to test, this should be tested with a cucumber feature in features/admin.feature.

fix flaky cucumber test: "Add/edit/remove Collection"

The "Add/edit/remove Collection" cucumber test in features/admin_center_manage_content.feature is failing intermittently on CircleCI. Example:
https://circleci.com/gh/GSA/search-gov/250

(::) failed steps (::)

expected to find text "You have removed News and Blog from this site" in "Skip to Main Content Search.gov Select a site Select a site Add Site Send an Idea Need Help? [email protected] Display Name: agency site (agency.gov) Set as my default site Send me today's snapshot as a daily email Stop filtering bot traffic Dashboard Analytics Content Display Preview Activate Manage Content Content Overview Domains Collections Best Bets: Text Best Bets: Graphics Routed Queries RSS YouTube Twitter Flickr CollectionsAdd Collection News and Blog Preview Edit" (RSpec::Expectations::ExpectationNotMetError)
./features/step_definitions/web_steps.rb:112:in `/^(?:|I )should see "([^"]*)"$/'
./features/support/hooks.rb:8:in `block (2 levels) in <top (required)>'
./features/support/hooks.rb:7:in `block in <top (required)>'
features/admin_center_manage_content.feature:263:in `Then I should see "You have removed News and Blog from this site"'

Failing Scenarios:
cucumber features/admin_center_manage_content.feature:229 # Scenario: Add/edit/remove Collection

IIRC I did some investigating of the root cause of this error and it's related to the confirmation modal (something like "are you sure you want to remove this collection?") not being discoverable in the DOM by the Capybara driver...because it simply didn't exist. I think the root cause is that the "click" on the "Delete" button doesn't register with the headless browser.

I have since lost the captured screenshot that proves this, but it's easy to reproduce using the capybara-screenshot gem.

Searchers can click to see more jobs w/o getting a 404

USAJobs updated their search engine and the query syntax changed. We currently send users to the link below:

https://www.usajobs.gov/JobSearch/Search/GetResults?organizationid=GS&PostingChannelID=USASearch&ApplicantEligibility=all


The new syntax for the above should be:

https://www.usajobs.gov/Search/?a=gs&hp=public

For agencies with multiple organization codes, the syntax has changed from:
organizationid=GS;HI
To:
a=GS&a=HI

So we need to update the endpoint we're sending people to, and change the parameters that get built in this URL within our jobs module on the serps.
organizationid -> a
ApplicantEligibility -> hp

Acceptance criteria:
[ ] clicking on the link to see more job postings on USAJobs should take me to a list of the relevant agency's job postings that are open to the public on USAJobs.

References:
https://search.gov/manual/govbox-jobs.html

Recover gracefully from failures to send email

As a user, I would like my account not to disappear if my welcome email can't be sent
because the account should still be valid.

A user created an account on 11/16, and the user record was assigned an id, then rolled back, apparently due to a Mandrill failure.

In addition, we should consider sending the verification emails in a background job.

The `fetch` link in Super Admin should enqueue a background job

As a Search.gov developer
I would like all URL fetching to happen via background jobs
because then I can control when urls are fetched (such as in cases where we don't want any fetching to happen during a reindex)
and I can control where fetching happens (because only certain hosts are configured to talk to our Tika servers, and our customers have whitelisted certain hosts).

This involves updating the fetch link for Searchgov urls in the admin/searchgov_domains page to enqueue a background job. Steps:
[ ] create a SearchgovUrlFetcherJob that:

  • uses the searchgov queue
  • takes a single searchgov_url record as a parameter
  • calls searchgov_url.fetch on that record

[ ] update the fetch method in app/controllers/admin/searchgov_urls_controller.rb to enqueue a SearchgovUrlFetcherJob for that record instead of record.fetch:
https://github.com/GSA/search-gov/blob/master/app/controllers/admin/searchgov_urls_controller.rb#L20

[ ] display a flash confirmation message: "Your URL has been added to the fetching queue."

For asynchronous processes, we use ActiveJob with Resque as a backend. To test this locally, you'll need to ensure Redis and Resque are running on your machine:
https://github.com/GSA/search-gov#asynchronous-tasks
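
A minimal sketch of the requested job, following the checklist above (the ApplicationJob base class, queue name, and keyword-argument style are assumptions; the real implementation may differ):

class SearchgovUrlFetcherJob < ApplicationJob
  queue_as :searchgov

  def perform(searchgov_url:)
    searchgov_url.fetch
  end
end

# Illustrative usage from the controller's fetch action:
# SearchgovUrlFetcherJob.perform_later(searchgov_url: record)
# flash[:notice] = 'Your URL has been added to the fetching queue.'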

[#159246352] routed query keywords & descriptions should have a length validation

https://www.pivotaltracker.com/story/show/159246352

As a site admin
I would like my routed query keywords and description fields to be truncated to 255 characters
so that I don't try to enter values that are too long.

Steps:

  • in the admin center, go to Content > Routed Queries
  • Click "Add Routed Query"
  • Attempt to create a routed query with a description or keyword longer than 255 char

Expected Results:

  • the field is automatically limited to 255 characters (we can use the HTML maxlength attribute for this, e.g. `f.text_field :keyword, maxlength: 255`)

Actual Results:

  • a MySQL error is raised, e.g.:
Data too long for column 'keyword' at row 1: INSERT INTO `routed_query_keywords` (`keyword`, `routed_query_id`, `created_at`, `updated_at`) VALUES ('school safety roles, school safety leaders, school safety partners, federal school safety, federal commission on school safety, school administrator, principal, vice principal, counselor, nurse, faculty, school staff emergency manager, emergency managers, school safety staff law enforcement, first responders, public health, medical, mental health', 412, '2018-07-23 16:26:54', '2018-07-23 16:26:54')

https://usasearch.airbrake.io/projects/8947/groups/2265420575443727016?tab=overview

Done criteria:
[ ] The above bug is fixed via the maxlength limitation on the HTML form field
[ ] A length validation is added at the model level for Routed Query keywords and descriptions
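
A hedged sketch of the model-level validation called for above (assuming an ApplicationRecord base class; class and attribute names follow the issue text and the real models may differ):

class RoutedQueryKeyword < ApplicationRecord
  validates :keyword, length: { maximum: 255 }
end

class RoutedQuery < ApplicationRecord
  validates :description, length: { maximum: 255 }
end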

Separate ELK ES client config from custom index ES client config

Right now the ES module is designed in such a way that all accesses to Elasticsearch go to the same cluster endpoint(s).

We would like to be able to send requests related to analytics to one ES endpoint and requests related to our custom document indices (i.e. anything using Indexable) to another ES endpoint. This separation would allow us to upgrade our Elasticsearch usage from ES 1.x to ES 5.x or 6.x independently: first for analytics, then for our custom indices; or vice versa.
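
For illustration only, a hedged sketch of what that separation could look like; the constant names and the ANALYTICS_ES_HOSTS variable are hypothetical, not the app's actual configuration:

require 'elasticsearch'

# Hypothetical: analytics (Kibana/Logstash) traffic and custom-index (Indexable)
# traffic each get their own client, configured via separate environment variables.
ANALYTICS_ES = Elasticsearch::Client.new(
  hosts: ENV.fetch('ANALYTICS_ES_HOSTS', 'localhost:9200')
)
CUSTOM_INDEX_ES = Elasticsearch::Client.new(
  hosts: ENV.fetch('ES_HOSTS', 'localhost:9200')
)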

Webrat should be removed from our repo

As a developer, I would like webrat to be removed from our repo, so that we are not dependent on a gem that is no longer supported.

The webrat gem is not being actively developed, and it has mostly been replaced by Capybara. We should remove it from our gems and update our tests to use Capybara instead.

font-awesome-grunticon-rails is pointing to a branch that does not exist.

It looks like font-awesome-grunticon-rails is pointing to a branch that does not exist.

Removing the branch name appears to resolve the issue.

eddie@TR9HDJ-2AZFH05 04:49:04 ~/Code/search-gov |master ✓| →  bundle install -V
Running `bundle install --verbose` with bundler 1.16.6
Warning: the running version of Bundler (1.16.6) is older than the version that created the lockfile (1.17.2). We suggest you upgrade to the latest version of Bundler by running `gem install bundler`.
Found no changes, using resolution from the lockfile
Bundler::PathError: The path `/Users/eddie/.rvm/gems/ruby-2.3.8@searchgov-rails42/bundler/gems/font-awesome-grunticon-rails-8ad9734a65f7` does not exist.
/Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/source/path.rb:198:in `load_spec_files'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/source/git.rb:200:in `load_spec_files'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/source/path.rb:100:in `local_specs'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/source/git.rb:167:in `specs'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/lazy_specification.rb:76:in `__materialize__'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/spec_set.rb:88:in `block in materialize'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/spec_set.rb:85:in `map!'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/spec_set.rb:85:in `materialize'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/definition.rb:204:in `missing_specs'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/definition.rb:209:in `missing_specs?'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/installer.rb:284:in `resolve_if_needed'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/installer.rb:83:in `block in run'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/process_lock.rb:12:in `block in lock'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/process_lock.rb:9:in `open'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/process_lock.rb:9:in `lock'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/installer.rb:72:in `run'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/installer.rb:25:in `install'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/cli/install.rb:65:in `run'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/cli.rb:224:in `block in install'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/settings.rb:136:in `temporary'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/cli.rb:223:in `install'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/vendor/thor/lib/thor/invocation.rb:126:in `invoke_command'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/vendor/thor/lib/thor.rb:387:in `dispatch'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/cli.rb:27:in `dispatch'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/vendor/thor/lib/thor/base.rb:466:in `start'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/cli.rb:18:in `start'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/bin/bundle:30:in `block in <main>'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/lib/ruby/site_ruby/2.3.0/bundler/friendly_errors.rb:124:in `with_friendly_errors'
  /Users/eddie/.rvm/rubies/ruby-2.3.8/bin/bundle:22:in `<main>'
The definition is missing dependencies, failed to resolve & materialize locally (https://github.com/gsa/font-awesome-grunticon-rails (at 8ad9734@8ad9734) is not yet checked out. Run `bundle install` first.)
Found no changes, using resolution from the lockfile

Remove Bing V2 code

Since we no longer use the Bing V2 API, we should remove all of the related code: engines, specs, and feature tests.

SearchgovCrawler should output links with the correct scheme

As a searchgov developer
I would like the SearchgovCrawler to output URLs with the correct scheme
because I don't want to clutter up our database with http links that will just be redirected to https.

HTML links have the correct scheme, because the crawler follows redirections for those links. The crawler does not follow PDF and other doc links, so those are currently output with whatever scheme the URL had in the original link.

Before testing:

  • add a depth_limit to the crawler options in app/models/searchgov_crawler.rb:
    @medusa_opts = {
     ...
      user_agent: user_agent,
      depth_limit: 2
    }

That will prevent you from crawling the entire bja.gov domain. Be sure to remove that line when you are done testing.

Steps:

  • crawl www.bja.gov:
    $ rake searchgov:crawl[www.bja.gov]

  • review the output file (the file path will be included in the task output)

Expected Results:

  • output file includes URLs with https links

Actual results:

  • output file includes:
    http://www.bja.gov/Publications/PERF-Compstat.pdf

Changing a Collection's time scope should not send a searcher back to Page 1 results

As a searcher,
When I change my Collections time scoping from the default "Any Time" to another time option,
I should not start seeing results from page 1.

Given that I'm on a Collection,
I should continue to see results only from within the Collection I was on
And the results should be filtered and sorted based on my specifications.

Background:
Reported by a customer on Feb 7 2018. The customer noticed that dc= was getting stripped away when she changed the time settings on the Consumer Information Collection of ftc_dev:

https://search.usa.gov/search/docs?affiliate=ftc_prod&dc=5047&query=amazon.com


Additional notes from @MothOnMars:

The time filter links should be using the /search/docs route, instead of /search:
Example:

link is: https://search.usa.gov/search?affiliate=ftc_prod&query=amazon.com&sort_by=r&tbs=m
link should be: https://search.usa.gov/search/docs?affiliate=ftc_prod&dc=5047&query=amazon.com&sort_by=r&tbs=m

Unfortunately, even when the correct link is used, it appears that the time filter and sort parameters are not used. The same results are returned for the following two searches, whereas the second search should filter out any results older than a month:

https://search.usa.gov/search/docs?affiliate=ftc_prod&dc=5047&query=amazon.com
https://search.usa.gov/search/docs?affiliate=ftc_prod&dc=5047&query=amazon.com&sort_by=r&tbs=m

So there are actually two bugs here:

  • one in our construction of the filter links for docs
  • one preventing the time filter params from being used in collection searches.

create sitemaps model

We need a standard ActiveRecord model for a new Sitemap class. A sitemap should:

  • belong to :searchgov_domain (which can have many sitemaps)
  • have a non-null, varchar(2000) url database column
    • the url should be unique
    • the url should be read-only
  • have a nullable varchar(255) last_crawl_status column
  • have a nullable datetime last_crawled_at column
  • determine the searchgov_domain it belongs to at the time of creation (see SearchgovUrl#set_searchgov_domain)
  • mix in the Fetchable module, which includes additional validations

The spec should include the line it_should_behave_like 'a record with a fetchable url' to include the default Fetchable tests. This class has a lot in common with the other Fetchable classes: IndexedDocument and SearchgovUrl. Those classes and their specs may be helpful. Try to use Shoulda's simple one-line matchers for standard tests such as validations, etc.
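
A minimal sketch of the requested model, based only on the requirements above (assuming an ApplicationRecord base class; the set_searchgov_domain body is hypothetical):

class Sitemap < ApplicationRecord
  include Fetchable

  belongs_to :searchgov_domain
  attr_readonly :url

  validates :url, uniqueness: true

  before_validation :set_searchgov_domain, on: :create

  private

  # Hypothetical lookup mirroring SearchgovUrl#set_searchgov_domain
  def set_searchgov_domain
    self.searchgov_domain ||= SearchgovDomain.find_by(domain: URI(url).host)
  end
end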

create SearchgovDomainDestroyerJob

As a searchgov developer
I would like searchgov domains and their related searchgov_url records to be destroyed in a background job
because the url record destruction should happen in batches to avoid using too much memory
and because all activity related to the searchgov I14y index should happen on the same server.

Requirements:

  • create a SearchgovDomainDestroyerJob that requires a SearchgovDomain record as a keyword argument: `SearchgovDomainDestroyerJob.perform_later(searchgov_domain: searchgov_domain)`
  • it_behaves_like 'a searchgov job'
  • it destroys the SearchgovUrl records in batches:
searchgov_domain.searchgov_urls.find_each do |url|
  url.destroy!
end
  • it destroys the SearchgovDomain record
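
A minimal sketch of the job described above (assuming an ApplicationJob base class and a :searchgov queue; the real implementation may differ):

class SearchgovDomainDestroyerJob < ApplicationJob
  queue_as :searchgov

  def perform(searchgov_domain:)
    # find_each loads the URL records in batches to limit memory use
    searchgov_domain.searchgov_urls.find_each(&:destroy!)
    searchgov_domain.destroy!
  end
end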

Upgrade search-gov to Rails 5.x

It's time to say farewell to the fours and hello to the fives, in order to keep the codebase current with security patches and the latest Rails features.

The search-gov app currently uses Rails 4.2.10, and we should upgrade either to the latest 5.1.x release (5.1.5 at the time of writing), or 5.2.x if it is released prior to working on this story.

Upgrade to jQuery3

Our jQuery implementations are currently a bit long in the tooth.

It would be great if both of these uses were upgraded to the latest-and-greatest jQuery 3 without any loss of functionality or the introduction of any Javascript errors in the SERP pages of Search.gov customers who embed the SAYT library in their own pages.

SearchgovCrawler should not output links from external domains

https://www.pivotaltracker.com/story/show/158878384

As a searchgov developer
I would like the SearchgovCrawler not to output URLs from external domains
because I don't want to clutter up our database and ES indices with junk I don't need indexed.

Our crawler should output a list of URLs from just the crawled domain. However, if a URL from the original domain is redirected to a different domain, the URL for the new domain will be included in the output. We do not want that to happen.

Before testing:

  • add a depth_limit to the crawler options in app/models/searchgov_crawler.rb:
    @medusa_opts = {
     ...
      user_agent: user_agent,
      depth_limit: 2
    }

That will prevent you from crawling the entire bja.gov domain. Be sure to remove that line when you are done testing.

Steps:

  • crawl www.bja.gov:
    $ rake searchgov:crawl[www.bja.gov,,,0]

  • review the output file (the file path will be included in the task output)

Expected Results:

  • the output file includes only URLs from the crawled domain (www.bja.gov)

Actual results:

  • the output file also includes URLs from external domains that crawled links redirect to

Tabindex values should equal 0

We were alerted by a customer that our Rails SERP is failing their accessibility scanner because we have tabindex values greater than 0. Upon investigation, we've learned that tabindex=0 is best practice, that any other values are discouraged, and that our Rails SERP doesn't have tabindex=0 anywhere.

All verticals of the Rails SERPs should be checked and their tabindex values updated to 0.

search_consumer SERPs can be ignored; see #34.

configure rspec to run specs in random order

Currently, our specs run in the same order each time, which can result in unexpected failures when tests are run out of order. This prevents us from running our tests in parallel (#55), and generally slows down development when unexpected errors occur.

This behavior can be seen by running rspec spec/ --order rand.

The default spec_helper.rb file includes information on configuring random ordering:

  # Run specs in random order to surface order dependencies. If you find an
  # order dependency and want to debug it, you can fix the order by providing
  # the seed, which is printed after each run.
  #     --seed 1234
  config.order = :random

  # Seed global randomization in this process using the `--seed` CLI option.
  # Setting this allows you to use `--seed` to deterministically reproduce
  # test failures related to randomization by passing the same `--seed` value
  # as the one that triggered the failure.
  Kernel.srand config.seed

We should:

  • add those configuration options & notes to our spec_helper.rb to ensure specs are run in random order
  • fix any specs that are order-dependent

(Note: One reason I've found for some order-dependency is fixtures that are loaded in one test, and then used by subsequent tests that don't explicitly load those fixtures.)

RssFeedUrl#last_crawl_status value should be truncated before validation

We're seeing a lot of errors in our logs for last_crawl_status error messages that are too long for that column in the database:

Mysql2::Error: Data too long for column 'last_crawl_status' at row 1: UPDATE `rss_feed_urls` SET `last_crawl_status` = 'Mysql2::Error: Data too long for column \'last_crawl_status\' at row 1: UPDATE `rss_feed_urls` SET `last_crawl_status` = \'redirection forbidden: http://www.technology.wv.gov/_layouts/feed.aspx?xsl=1&web=%2Fnews&page=37e69109-afb9-4fb4-b956-bd41f735669e&wp=91e42068-5876-4e0f-8606-2161b12e7530 -> http://technology.wv.gov/_layouts/feed.aspx?xsl=1&web=%2Fnews&page=37e69109-afb9-4fb4-b956-bd41f735669e&wp=91e42068-5876-4e0f-8606-2161b12e7530\', `updated_at` = \'2018-04-17 18:02:33\' WHERE `rss_feed_urls`.`id` = 18790', `updated_at` = '2018-04-17 18:02:33' WHERE `rss_feed_urls`.`id` = 18790

We should truncate that value to 255 characters before validation.

Bug steps:
Attempt to save or create an rss_feed_url record with a last_crawl_status longer than 255 characters

too_long_status = 'x' * 500
RssFeedUrl.create!(rss_feed_owner_type: 'Affiliate',
  url: 'https://www.usa.gov/rss/updates.xml',
  last_crawl_status: too_long_status)

Expected Results:

  • last_crawl_status is truncated to 255 characters, and the record is successfully saved to the database

Actual Results:

  • the above MySQL error is raised
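
A hedged sketch of the requested truncation, assuming an ActiveRecord callback is acceptable (the real fix may differ):

class RssFeedUrl < ApplicationRecord
  before_validation :truncate_last_crawl_status

  private

  def truncate_last_crawl_status
    self.last_crawl_status = last_crawl_status[0, 255] if last_crawl_status
  end
end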

Typeahead suggestions should be readable by a screen reader

Reported by a customer: screen readers are not able to read or use the typeahead search suggestions in the dropdown list as we present it via JavaScript. The USA.gov team reviewed it and proposed the following fix. (They warned us that some assistive tech scanners would continue to flag it as an issue, but it's the only thing they could identify that would let screen readers access the options.)

From David Stenger:
https://haltersweb.github.io/Accessibility/autocomplete.html This example shows how it should be coded and how to navigate the field with AT.
