fbopen's Introduction

FBOpen

FBOpen is an open API server, data import tools, and sample apps to help small businesses search for opportunities to work with the U.S. government.

The project began as an attempt to make it easier to search the content of FBO.gov, the U.S. government's system of record for opportunities to do business with the government. We downloaded the (XML) data from FBO's weekly data dump of opportunity listings and loaded it into an Elasticsearch search server. Then we used a primitive crawler to download listings' attachments and load them into Elasticsearch -- something Elasticsearch makes easy thanks to its Mapper Attachments Type plugin (https://github.com/elasticsearch/elasticsearch-mapper-attachments).

Underneath the Google-style query page (/sample-www), we built a simple REST API (really a thin layer over Elasticsearch's API) so you can build your own query tools.
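As a rough illustration of what a query tool on top of the API might do, here is a minimal Python sketch that only builds a request URL. The endpoint and the q and api_key parameters are taken from the issue reports for this project; the function name build_query_url is hypothetical, and since the hosted service has been retired the URL should be treated as a placeholder for your own deployment.

```python
from urllib.parse import urlencode

# Endpoint as quoted in this project's issue reports; the hosted service
# has been phased out, so point this at your own API server instead.
BASE_URL = "https://api.data.gov/gsa/fbopen/v0/opps"


def build_query_url(query, api_key, base_url=BASE_URL, **extra):
    """Return a search URL; extra keyword arguments become query parameters."""
    params = {"q": query}
    params.update(extra)
    params["api_key"] = api_key
    return base_url + "?" + urlencode(params)


url = build_query_url("software development", "DEMO_KEY")
```

The sketch deliberately stops at building the URL; sending the request and parsing the JSON response are left to whichever HTTP client you prefer.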

Then someone realized we didn't have to limit this server to FBO data. There's a second sample data loader that can be used to load data nightly from grants.gov, and the API allows you to post opportunities, too. Many more data loaders are on their way.

As of 2014-03-12, the project was live at https://fbopen.gsa.gov. However, the service was phased out in early 2017.

Quickstart (OSX-only) (experimental)

This gets you a minimum viable setup:

$ cd fbopen
$ FBOPEN_ROOT=~/your/root/to/fbopen ./initial-dev-setup.sh

To clean out any new files created from that script, as well as uninstall Elasticsearch, you can run:

$ ./initial-dev-uninstall.sh

How to get started (manually)

  • Clone this repo.
  • This repo has an external dependency on another git repo, which needs to be populated at first, so cd to the repo and run: git submodule update --init --recursive.
  • Then install Elasticsearch. FBOpen requires at least version 1.2.
  • Get the API server up and running. See the README.md in /api.
  • Load data into the search index using the import tools in /loaders -- or roll your own, or use the API's POST /v0/opp to post opportunities one at a time (POST functionality is temporarily disabled).
  • To run a simple query web page, try the sample app in /sample-www.
    • A quick and easy way to access this page at localhost, provided you have Python installed, is to cd to the /sample-www directory and run: python -m SimpleHTTPServer (on Python 3, the equivalent is python3 -m http.server). By default, you'll then be able to access the client at http://localhost:8000

Examples

  • You can add an FBOpen query to any HTML page by just copying and pasting a snippet of JavaScript with our FBOpen Widget Maker.
  • The BusinessUSA PIF team coded up a sample form specifically geared toward submitting or tweaking SBIR solicitations in FBOpen. The relevant code can be found here: https://github.com/GSA-OCSIT/hyabusa. Hyabusa is a test-bed Rails 4 app, and includes several other mini-applications, so look for the SbirSolicitationsController and related views.
  • One of the BusinessUSA PIFs also coded up a sample site to showcase how SBIR.gov could function if FBOpen were the backend data source for the solicitation listings. That repo, as of this writing very much a work in progress, can be found here: https://github.com/arowla/sbiropen. This is a Python app built with the Flask microframework.

Caveat

This project is brand new and very incomplete. No guarantees of data completeness or functionality are implied or should be assumed. There is lots to do!

Who

FBOpen is a joint project of 18F, the Presidential Innovation Fellowship, and the GSA Integrated Award Environment.

Public domain

This project is in the worldwide public domain. As stated in CONTRIBUTING:

This project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication.

All contributions to this project will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.

fbopen's People

Contributors

aaronsnow, adelevie, annalee, arowla, edsu, gbinal, jroo, kaitlin, konklone, leahbannon, meiqimichelle, noahkunin

fbopen's Issues

Don't require an API Key

Usability would be seriously improved if you didn't require an API key for requests under maybe 1,000/day per IP. As it stands, I can't do anything entirely client side right now which sucks. Plus I don't want to have to sign up just to see if I would want to use it.

OPTIONS request to api.data.gov takes a long time

When loading a search at fbopen.gsa.gov, the query performs:

  • OPTIONS HTTP request to url
  • GET HTTP request to url

These two occur in series and the UI doesn't render the search results until the GET request is fulfilled.

In testing (from the west coast), the OPTIONS request can take a long time (anecdotally, 650ms).

Two questions:

  • Why does api.data.gov take such a long time to respond to the OPTIONS request?
  • Is the OPTIONS request necessary given that the API documentation states that the GET request is supported?

Fix highlighting (esearch)

Highlighting is returning just a truncated snippet of the field where there are matches. This can be configured to return longer or more snippets.

We might also be able to return snippets from matching attachments...? That would be cool, to help people figure out why a given result is in the set.

Term Dictionary

Is there a reference document for all the returned records? There are some strange things that I get, like:

FBO_CLASSCOD
FBO_NAICS
solnbr
solnbr_ci
FBO_ARCHDATE_dt
FBO_SETASIDE
_version_
score
highlights

What do these mean?

Improve character encoding in JSON

Using the following JSON calls in Python, you get a lot of strangely escaped text data:

import json
import requests

data = requests.get("https://api.data.gov/gsa/fbopen/v0/opps?q=software+development&api_key=JOE_API_KEY")
a = json.loads(data.text)  # requests has already decoded the body to text
print(json.dumps(a, indent=4))

Newline characters and HTML-escaped characters are still present in the JSON text, despite adjusting for the proper encoding.
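Until the API cleans this up server-side, a client could post-process each text field. The following is a minimal sketch of one possible workaround, not a fix to the API itself; the function name clean_text is hypothetical, and it assumes the field has already been parsed out of the JSON response as a Python string.

```python
import html


def clean_text(value):
    """Unescape HTML entities, then collapse newlines and repeated spaces."""
    unescaped = html.unescape(value)
    # str.split() with no arguments splits on any whitespace run,
    # so embedded newlines and tabs are folded into single spaces.
    return " ".join(unescaped.split())


sample = "R&amp;D services\nfor &quot;software&quot;  development"
cleaned = clean_text(sample)
```

Collapsing whitespace loses paragraph breaks, so a client that wants to preserve them would need a gentler rule.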

Scrape grants.gov attachments

We'll need to figure out how to do this, as there doesn't currently exist a scraper (@aaronsnow am I missing one?). But it should be fairly easy to fit into the new attachment retrieval framework (esearch branch).

Finish implementing query params

The esearch branch does not yet support all of the query parameters that are supported in master. We need to finish converting them all to Elasticsearch-ese.

Make field names more sanely named, consistent with FBOpen frontend

This might seem nitpicky, but I think FBOpen is a good opportunity to get away from the crazily abbreviated fields of fbo.gov (e.g., CLASSCOD, solnbr, offadd). Renaming these to "class_code", "solicitation_number" and "office_address" would make them more readable to people who haven't already used fbo.gov a bunch. It seems like someone had this same thought when they used "Opportunity Number" on the FBOpen frontend site instead of "solnbr".
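One way to picture the proposal is as a rename table applied to each record. The sketch below is only an illustration: the three mappings are the ones suggested in this issue, while the function name rename_fields and the sample record are hypothetical and do not reflect how the loaders or the API actually store fields.

```python
# Mappings proposed in this issue; any other field is passed through unchanged.
FIELD_RENAMES = {
    "CLASSCOD": "class_code",
    "solnbr": "solicitation_number",
    "offadd": "office_address",
}


def rename_fields(record, renames=FIELD_RENAMES):
    """Return a copy of record with abbreviated keys replaced by readable ones."""
    return {renames.get(key, key): value for key, value in record.items()}


renamed = rename_fields({"solnbr": "ABC-123", "score": 1.5})
```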

Weird facet results

Trying to do a search with Lucene query syntax, e.g.:

https://api.data.gov/gsa/fbopen/v0/opps?q=data_type:opp+AND+agency:NASA&api_key=

Gets me a super strange facet list result with what looks like an undocumented return.

Strip HTML

Not sure how I feel about this, but there could be value in stripping HTML from the results, especially since I've noticed some inline CSS that could mess things up as well as tags like <highlight> which I guess someone thought was HTML at some point?
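If stripping turns out to be worthwhile, it could be done with the HTML parser in Python's standard library. This is a minimal sketch of the idea under that assumption, not the project's implementation; the names _TextExtractor and strip_html are hypothetical. Unknown tags such as <highlight> are dropped like any other tag, and the contents of <style> and <script> elements are discarded.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text content while ignoring tags, attributes, and CSS/JS bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # greater than zero while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join the pieces, then collapse runs of whitespace.
        return " ".join("".join(self._chunks).split())


def strip_html(markup):
    """Return the visible text of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.text()


plain = strip_html('<p style="color:red">Web <highlight>design</highlight> services</p>')
```

Note that removing <highlight> tags also removes the hit highlighting, so a client that wants to show matches would have to convert those tags rather than drop them.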

Load AWARD Notices

We still need to figure out how we are going to treat awards. My initial thought is to have them be a different doc type in Elasticsearch.

But this will likely allow us to do some cool things, such as link awards to usaspending.gov. (See #16.)

Feature idea: usaspending.gov / itdashboard.gov integration

Not sure if this is the right place to request/suggest new features but linking out to usaspending.gov and/or itdashboard.gov or even better showing basic stats/figures from those sites on this new site would seem beneficial.

I'm a developer so let me know if this sounds like a good idea and point me in the right direction of where to begin work in this repo if possible.

Bury NAICS filtering until we get it (and related, better solutions) working

NAICS filtering is definitely not working right.

via @cjoh:

https://fbopen.gsa.gov/?q=web+design&data_source=&naics=541430&parent_only=&p=

Definitely not the results I'd hoped for. I added 541430 to an empty NAICS parameter up top. I think one thing you may want to do with NAICS is search for the original NAICS code the client asked for first. If that returns no results, I'd search within the 3 digit realm of that NAICS code (in this case 541). And if that returns no results, then don't return any results. If a NAICS code parameter is sent in, it really should limit scope.

One thing I learned from working at Jeeves is that it's better to return no results than it is bad results. No results means, to a user, change your query. Bad results means "slog through this stuff."

I modified this query to have a wildcard query string and return everything from this NAICS code (Graphic Design Services: http://www.naics.com/naics-code-description/?code=541430).

https://fbopen.gsa.gov/?q=*:*&data_source=&naics=541430&parent_only=&p=

This returns some super obvious false positives, such as "ACCUMULATOR, HYDRAUL (GAMBLE)", with a NAICS code of:

NAICS Code:
336 -- Transportation Equipment Manufacturing/336413 -- Other Aircraft Parts and Auxiliary Equipment Manufacturing

Obviously, this needs to be fixed or, if it can't be fixed, disabled. I think our first line of attack in the short term should just be to not add that parameter to the query string.
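The fallback @cjoh describes (exact code first, then the three-digit prefix, otherwise nothing) can be sketched over an in-memory list. This only illustrates the rule; it is not how the Elasticsearch query is built, and the record key "naics" and the function name filter_by_naics are hypothetical.

```python
def filter_by_naics(opportunities, naics_code):
    """Exact NAICS match first; else three-digit prefix match; else no results."""
    exact = [opp for opp in opportunities if opp.get("naics") == naics_code]
    if exact:
        return exact
    prefix = naics_code[:3]
    # May be empty: a NAICS filter should narrow the results, never widen them.
    return [opp for opp in opportunities if opp.get("naics", "").startswith(prefix)]


opps = [
    {"title": "Logo design", "naics": "541430"},
    {"title": "Custom software", "naics": "541511"},
    {"title": "ACCUMULATOR, HYDRAUL (GAMBLE)", "naics": "336413"},
]
exact_hits = filter_by_naics(opps, "541430")
prefix_hits = filter_by_naics(opps, "541990")
no_hits = filter_by_naics(opps, "111110")
```

With this rule the hydraulic accumulator listing can never appear under a 541430 filter, which is the false positive reported above.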

Surface the searches people are running

Per @cjoh:

expose search data so you can say to COs "more people are searching for 'Web Design' than they are 'Information Discovery Portal System Artistic Services'" in a scalable way. At the same time, it may help some younger companies like ours learn the language of government, and fine-tune our searches.

iOS devices: search appears to hang

Even on fast WiFi, an iPhone 5s doesn't appear to be loading search results; there is continuous network activity but no results are loaded/displayed. (iOS Simulator on a MBP does not suffer this problem.)

Secure API key exchanges

The documentation implies that POST functionality is coming, which means that API tokens will soon give upgraded privileges. API tokens should not be passed as GET parameters in the URL.

Recommend passing the API token in the header only in an authentication call that returns a session token, then subsequently passing only the session token.
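The first half of this recommendation, keeping the key out of the URL, can be sketched with the standard library. The X-Api-Key header name is the one api.data.gov documents for header-based keys as far as I know, so treat it as an assumption to verify; the function name is hypothetical, and the session-token exchange is not shown because no such endpoint is described here.

```python
from urllib.request import Request


def build_authenticated_request(url, api_key):
    """Attach the API key as a request header so it never appears in the URL."""
    # Assumption: the gateway accepts the key in an X-Api-Key header.
    return Request(url, headers={"X-Api-Key": api_key})


request = build_authenticated_request(
    "https://api.data.gov/gsa/fbopen/v0/opps?q=software", "YOUR_KEY"
)
```

Keeping the key in a header means it is less likely to end up in browser history, proxy logs, or shared links, which is the concern this issue raises.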

API Parameter: limit

As an API user, I would like the ability to specify a limit on the number of search results. For example, I'd like to get 200 results rather than just the default number. The upper limit should probably be capped at 400 or 500 results.

Example API Call: Search for software opportunities, return up to 200 results

GET http://api.data.gov/gsa/fbopen/v0/opps?q=software&limit=200&api_key=YOUR_KEY
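On the server side, the cap could be a small clamp applied before the query is run. This is a sketch of the requested behavior only: the function name clamp_limit is hypothetical, the default of 10 is a placeholder because the API's actual default page size is not stated here, and 500 is the upper bound suggested above.

```python
def clamp_limit(requested, default=10, maximum=500):
    """Return a safe result limit: the default if absent, else within 1..maximum."""
    if requested is None:
        return default
    # Query-string values arrive as text, so convert before comparing.
    return max(1, min(int(requested), maximum))
```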

Create Vagrant / Puppet configuration

Create a simple way to get a working dev environment up using vagrant with puppet provisioning over virtualbox. Will use 32-bit virtualbox image (Ubuntu 12.04 LTS) for universal support.

SSL by default

api.data.gov supports SSL; it would be awesome if your docs presented that as the default.

Add "Loading" when waiting for search results

If you're on a bandwidth constrained or latency constrained device or network, it can take a while for the results to come back from api.data.gov.

Propose adding a Loading message and spinner in the results area that is shown until the search results return and can be rendered.

Page number does not clear on new searches

When a user navigates to a higher page of results and then runs another search, the p=# parameter does not reset. So, if the new search has fewer pages of results, the results page is blank.

Example

First Search

Go to page 8 of result set for term "data"
https://fbopen.gsa.gov/?q=data&show_noncompeted=on&data_source=grants.gov&naics=&parent_only=&p=8

Second Search

Do a new search for "statistics"
https://fbopen.gsa.gov/?q=statistics&show_noncompeted=on&data_source=grants.gov&naics=&parent_only=&p=8

Solution

remove the # for p=

https://fbopen.gsa.gov/?q=statistics&show_noncompeted=on&data_source=grants.gov&naics=&parent_only=&p=
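The fix amounts to dropping the page number whenever the query text changes. The sample site itself is written in JavaScript, so the following Python sketch only illustrates the rule; the function name build_search_url and its arguments are hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, previous_query=None, page=None, **filters):
    """Build a search URL, keeping the page number only if the query is unchanged."""
    if previous_query is not None and query != previous_query:
        page = None  # a new search always starts from the first page
    params = {"q": query}
    params.update(filters)
    params["p"] = "" if page is None else page
    return base_url + "?" + urlencode(params)


new_search = build_search_url(
    "https://fbopen.gsa.gov/", "statistics",
    previous_query="data", page=8, data_source="grants.gov",
)
same_search = build_search_url(
    "https://fbopen.gsa.gov/", "data",
    previous_query="data", page=8, data_source="grants.gov",
)
```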

website: filter by full and/or partial NAICS code

This is really two features:

  1. Filter by full NAICS code or by first two or three digits of NAICS code (e.g., 541_)
  2. When filtering by a full 6-digit NAICS code (e.g., naics=541430), if the query returns no results, fall back to searching for results that match the first three digits (e.g., 541_).

Suggested by @cjoh

Note that this is already supported by the API, just not the website. E.g.:

Exact NAICS match on 541712:
http://api.data.gov/gsa/fbopen/v0/opps?q=software+development&fq=FBO_NAICS:541712&api_key=DEMO_KEY

Three-digit NAICS match on 541_:
http://api.data.gov/gsa/fbopen/v0/opps?q=software+development&fq=FBO_NAICS:541_&api_key=DEMO_KEY

Match any NAICS code in the range 541500-541800:
http://api.data.gov/gsa/fbopen/v0/opps?q=software+development&fq=FBO_NAICS:{541500%20TO%20541800}&api_key=DEMO_KEY

Etc.

Add API parameter: agency

I want the ability to limit query results from the API to specific agencies (e.g. Department of the Air Force).

Right now, I specify the query string in quotes

GET http://api.data.gov/gsa/fbopen/v0/opps?q=%22department+of+the+air+force%22&api_key=SOME_KEY

Instead, it might be better to query with all or part of the agency name. The example below queries for solicitations put out by the Air Force that contain the word "database":

GET http://api.data.gov/gsa/fbopen/v0/opps?q=database&agency=air+force&api_key=SOME_KEY
