bulk-data's People

Contributors

jnyjny, jonquandt, llaplant, oghaffari, smatsushima1, wbushey

bulk-data's Issues

Pre-Introduced XML Bill Text Associated with S. 448 (Introduced on 2/27/17)

Hello, the XML/HTML version of S. 448 is the pre-introduced bill text, which is causing some problems: https://www.congress.gov/bill/115th-congress/senate-bill/448/text

It looks like the xml file in the bulk data repository is still the pre-introduced text, even though the bill was introduced on 2/27/17: https://www.gpo.gov/fdsys/bulkdata/BILLS/115/1/s/BILLS-115s448is.xml

The PDF version of the bill text has the correct number and introduction date associated with it: https://www.congress.gov/115/bills/s448/BILLS-115s448is.pdf

Is there any way you could update the bill text XML to the introduced text file and not the pre-introduced file? Thank you!

CFR Issues

Hello! My name is Rachel Johns, and I'm a product manager at Fastcase. We have been looking through the CFR XML files and have noticed a few issues. I was hoping you could help us with these.

Issue 1: Sections Incorrectly Grouped

For Title 47, sections 90.603-90.1338 all appear to be incorrectly nested under 90.601. Here is how they appear on Fastcase using the bulk data:

https://fc7.fastcase.com/outline/US/402?docUid=164798869

[Screenshot]

Also, 90.623-90.1338 are all incorrectly lumped under 90.621:

[Screenshot]

The source on the ECFR website lists all of these sections after 90.621 correctly:

https://gov.ecfr.io/cgi-bin/text-idx?SID=8bdc0140de420c937355f33e1a46d46d&mc=true&node=pt47.5.90&rgn=div5
[Screenshot]

Section 90.621 as it appears in Fastcase:

[Screenshot]

Issue 2: All Images are missing

As far as we can tell, all of the images are missing on this source. Examples are below.

Title 49 Source:

[Screenshot]

Title 49 as it appears in Fastcase using the bulk data:

[Screenshot]

The same problem occurs with math equations:

Source:

[Screenshot]

The markup we used, taken from the bulk data:

<P>(g) Calculate the adhesion utilized at each axle as a function of braking ratio using the following equations:</P><MATH>
<MID>ER02FE95.014</MID>
</MATH>
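In this snippet the <MATH> element carries only an identifier (ER02FE95.014) inside <MID>, not the equation itself, which suggests the equation is delivered as a separate graphic that the XML does not include. A minimal Python sketch for listing such identifiers so the affected sections can at least be flagged, assuming the identifiers always appear as <MID> children of <MATH> as shown here (the wrapping <SECTION> element is added only so the sample parses as one document):

```python
import xml.etree.ElementTree as ET


def math_image_ids(xml_text):
    """Return the identifiers in <MID> elements nested inside <MATH> elements."""
    root = ET.fromstring(xml_text)
    ids = []
    for math in root.iter("MATH"):
        for mid in math.iter("MID"):
            if mid.text and mid.text.strip():
                ids.append(mid.text.strip())
    return ids


# The <SECTION> wrapper is only there so the two sibling elements parse as one document.
sample = """<SECTION>
<P>(g) Calculate the adhesion utilized at each axle as a function of braking ratio using the following equations:</P><MATH>
<MID>ER02FE95.014</MID>
</MATH>
</SECTION>"""

print(math_image_ids(sample))  # ['ER02FE95.014']
```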

Could you please let us know if these issues are possible to resolve?

Thank you!

ECFR Title 7 - Part 1773 - Subparts Duplicated?

Relatively new at this, so apologies for any phrasing mishaps.

I just downloaded ECFR Title 7 from https://www.govinfo.gov/bulkdata/ECFR/title-7/ECFR-title7.xml and have a question about Part 1773 - Policy on Audits of RUS Borrowers and Grantees. The section headers begin at 1773.1 and increase up to 1773.49, as expected. However, after the closing </DIV6> tag of Subpart E, there is another <DIV6> tag containing a second Subpart A, and the sections then increase from 1773.1 to 1773.49 again. (See the example below.)

Is this expected behavior? If so, is there other markup information I haven't considered that determines which set of subparts is correct?

<DIV8 N="§ 1773.49" NODE="7:12.1.1.1.4.5.1.10" TYPE="SECTION">
<HEAD>§ 1773.49   OMB Control Number.</HEAD>
<P>The information collection requirements in this part are approved by the Office of Management and Budget (OMB) and assigned the OMB Control Number 0572-0095.
</P>
</DIV8>
</DIV6>

<DIV6 N="A" NODE="7:12.1.1.1.4.6" TYPE="SUBPART">
<HEAD>Subpart A - General Provisions</HEAD>
<DIV8 N="§ 1773.1" NODE="7:12.1.1.1.4.6.1.1" TYPE="SECTION">
<HEAD>§ 1773.1   General.</HEAD>
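One way to find every repeated section number before deciding which copy to keep is to group the DIV8 sections by their N attribute and report each copy's NODE value, since the NODE paths differ between the two Subpart A blocks (the second one sits under 7:12.1.1.1.4.6). A minimal Python sketch; the DIV5 wrapper and the first NODE value in the sample are invented for illustration:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def duplicate_sections(part_xml):
    """Map each repeated section number (DIV8 N attribute) to the NODE values of its copies."""
    root = ET.fromstring(part_xml)
    nodes = defaultdict(list)
    for div in root.iter("DIV8"):
        if div.get("TYPE") == "SECTION":
            nodes[div.get("N")].append(div.get("NODE"))
    return {n: ids for n, ids in nodes.items() if len(ids) > 1}


# Hypothetical, trimmed-down sample: the DIV5 wrapper and the first NODE value are invented.
sample = """<DIV5 N="1773" TYPE="PART">
<DIV6 N="A" TYPE="SUBPART">
<DIV8 N="§ 1773.1" NODE="7:12.1.1.1.4.1.1.1" TYPE="SECTION"/>
</DIV6>
<DIV6 N="A" NODE="7:12.1.1.1.4.6" TYPE="SUBPART">
<DIV8 N="§ 1773.1" NODE="7:12.1.1.1.4.6.1.1" TYPE="SECTION"/>
</DIV6>
</DIV5>"""

print(duplicate_sections(sample))
```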

XSL error in billres.xsl isQuoteMustClose template with Saxon 9.9

Hi,

When trying to use the Saxon 9.9 XSLT processor to transform a bill (e.g. https://www.govinfo.gov/content/pkg/BILLS-116hr823rh/xml/BILLS-116hr823rh.xml) using billres.xsl from https://www.govinfo.gov/bulkdata/BILLS/resources, I am getting the following error:

[XPTY0004] A sequence of more than one item is not allowed as the first argument of fn:name() (<subsection>, <subsection>) 
    at .../billres-details.xsl:18833:73

The following change fixes this issue:

-					following-sibling::* and (name(self::*) = name(following-sibling::*)
+					following-sibling::* and (name(self::*) = following-sibling::*/name()

Specifically, if more than one sibling follows the element, the /name() step applies the name function to each of them, and the = comparison is then true if any of the resulting names matches. In other words, this checks whether any following sibling has the same name as the current element.

If the intended behaviour is to only check if the first sibling has the same name as the current element then the following change would be needed:

-					following-sibling::* and (name(self::*) = name(following-sibling::*)
+					following-sibling::* and (name(self::*) = name(following-sibling::*[1])

Kind regards,
Reece
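The two proposed fixes differ because of XPath's general comparison: `name(self::*) = following-sibling::*/name()` is true if any following sibling matches, while `name(self::*) = name(following-sibling::*[1])` looks only at the next one. A small Python model of both readings using ElementTree, assuming un-namespaced elements so that an element's tag equals its XPath name(); this illustrates the semantics and is not a replacement for the XSLT fix:

```python
import xml.etree.ElementTree as ET


def following_siblings(parent, child):
    """Elements after `child` among `parent`'s children, like following-sibling::*."""
    kids = list(parent)
    return kids[kids.index(child) + 1:]


def any_following_same_name(parent, child):
    # Models name(self::*) = following-sibling::*/name():
    # a general comparison is true if ANY item in the sequence matches.
    return any(s.tag == child.tag for s in following_siblings(parent, child))


def next_sibling_same_name(parent, child):
    # Models name(self::*) = name(following-sibling::*[1]): only the next sibling counts.
    sibs = following_siblings(parent, child)
    return bool(sibs) and sibs[0].tag == child.tag


section = ET.fromstring("<section><subsection/><paragraph/><subsection/></section>")
first = section[0]
print(any_following_same_name(section, first))  # True
print(next_sibling_same_name(section, first))   # False
```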

XML files available on Govinfo.gov, not on Congress.gov

Hello, I'm seeing a similar issue to this ticket: #33

There are 92 new bill versions that have PDF versions on the congress.gov bill pages but no XML version there, even though the XML is available in the bulk data repository on govinfo.gov. Is this the best place to flag the issue and have the XML versions of the text added to the congress.gov bill pages? Thank you! The bills/versions causing issues are listed below.

HR5347 (IH)
HR5399 (IH)
HR5401 (IH)
HR5406 (IH)
HR5407 (IH)
HR5421 (IH)
HR5422 (IH)
HR5424 (IH)
HR5425 (IH)
HR5426 (IH)
S1189 (RS)
S1310 (RS)
S2368 (RS)
S2556 (RS)
S2683 (RS)
S2688 (RS)
S2695 (RS)
S2702 (RS)
S2714 (RS)
S2977 (RS)
S3076 (RS)
HR2744 (RS)
SRES450 (ATS)
S1262 (RS)
S1890 (RS)
S2365 (RS)
S2393 (RS)
S2660 (RS)
HR4183 (RFS)
HR4719 (RFS)
SRES460 (ATS)
SRES461 (ATS)
S893 (RS)
HR4018 (PCS)
SRES456 (ATS)
S1228 (RS)
S2927 (RS)
S2799 (RS)
S2997 (RS)
SRES457 (ATS)
S2108 (RS)
S2399 (RS)
SRES459 (ATS)
SCONRES31 (ATS)
SRES152 (RS)
SRES297 (RS)
SRES343 (RS)
SRES371 (RS)
SRES375 (RS)
SRES395 (RS)
SRES447 (RS)
S876 (RS)
S1611 (RS)
S1739 (RS)
S1830 (RS)
S2425 (RS)
S2508 (RS)
S2547 (RS)
S2657 (RS)
S2668 (RS)
S3051 (RS)
HR4229 (PCS)
SRES260 (RS)
HRES767 (EH)
S1029 (ES)
S1309 (ES)
S1434 (ES)
S1822 (ES)
S1608 (ES)
S2096 (ES)
HRES773 (EH)
S3147 (ES)
HRES772 (EH)
HRES772 (RH)
HCONRES81 (ENR)
HCONRES82 (ENR)
HR1158 (ENR)
HR1865 (ENR)
HR3196 (ENR)
HR5130 (RH)
HR5146 (RH)
HR5430 (RH)
S1790 (ENR)
HR5392 (IH)
HR5393 (IH)
HR5397 (IH)
HR5398 (IH)
HR5415 (IH)
HRES768 (IH)
HRES769 (IH)
HR1865 (EAH)
HRES770 (LTH)
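To check these entries against govinfo.gov programmatically, each one can be turned into a package ID following the pattern of the URL in the billres.xsl issue above (https://www.govinfo.gov/content/pkg/BILLS-116hr823rh/xml/BILLS-116hr823rh.xml). A minimal Python sketch; the issue does not say which Congress these bills belong to, so the congress=116 used below is an assumption:

```python
import re

# Matches entries such as "HR5347 (IH)" or "SCONRES31 (ATS)".
ENTRY = re.compile(r"^([A-Z]+)(\d+)\s*\(([A-Z]+)\)$")


def package_id(entry, congress):
    """Turn "HR5347 (IH)" into "BILLS-<congress>hr5347ih"."""
    m = ENTRY.match(entry.strip())
    if m is None:
        raise ValueError(f"unrecognized entry: {entry!r}")
    bill_type, number, version = m.groups()
    return f"BILLS-{congress}{bill_type.lower()}{number}{version.lower()}"


def xml_url(entry, congress):
    """govinfo.gov XML location, following the .../content/pkg/<id>/xml/<id>.xml pattern."""
    pid = package_id(entry, congress)
    return f"https://www.govinfo.gov/content/pkg/{pid}/xml/{pid}.xml"


# congress=116 is an assumption; the issue does not state the Congress.
print(xml_url("HR5347 (IH)", 116))
```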

Possible to get nominations as bulk data?

Currently I am scraping congress.gov for nomination information. It seems like nomination status could be published as XML bulk data in the same vein as bill status. Is that possible?

Misplaced Appendix in CFRs

I am looking to parse the CFRs and break them down into a tree-like structure using the bulk CFR data.

I came across various instances where the provided XML structure doesn't match the one presented on govinfo/app.

For example, the 2019 Title 2 XML (see the attached screenshot, "title-2-xml-appendix-issue"):

In this XML, all of the Appendix nodes fall under Subpart F - Audit Requirements, but in the govinfo/app source they fall under Part 200 - UNIFORM ADMINISTRATIVE REQUIREMENTS, COST PRINCIPLES, AND AUDIT REQUIREMENTS FOR FEDERAL AWARDS.

Here's an example Appendix on govinfo/app for your reference.

I have noticed this issue throughout all the titles.

Is this a bug?
How do I parse this correctly?

Thanks.
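Until the nesting is fixed in the data, one workaround is to attach each appendix to its nearest enclosing part rather than to its immediate parent. A minimal Python sketch; the element names PART, SUBPART, APPENDIX and HD are assumptions based on the annual-edition CFR markup and should be checked against the actual file:

```python
import xml.etree.ElementTree as ET


def appendix_owners(xml_text):
    """Map each APPENDIX heading to the heading of its nearest enclosing PART."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child -> parent map once.
    parent = {child: p for p in root.iter() for child in p}
    owners = {}
    for app in root.iter("APPENDIX"):
        node = parent.get(app)
        # Walk upward past any SUBPART (or other wrapper) until a PART is found.
        while node is not None and node.tag != "PART":
            node = parent.get(node)
        owners[app.findtext("HD")] = node.findtext("HD") if node is not None else None
    return owners


# Simplified sample mirroring the reported nesting (appendix inside Subpart F).
sample = """<PART><HD SOURCE="HED">PART 200 - UNIFORM ADMINISTRATIVE REQUIREMENTS</HD>
<SUBPART><HD SOURCE="HED">Subpart F - Audit Requirements</HD>
<APPENDIX><HD SOURCE="HED">Appendix I to Part 200</HD></APPENDIX>
</SUBPART>
</PART>"""

print(appendix_owners(sample))
```

When building the tree, the appendix can then be placed under the part returned here instead of under the subpart it happens to sit in.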

Improve / enhance links to USCode in cases where currently unavailable

When looking at the following bulkdata file:

https://www.gpo.gov/fdsys/bulkdata/BILLS/115/1/hr/BILLS-115hr77ih.xml (view source)

It contains two different references to U.S. Code titles/sections, but only one has an external link and associated XML tags.

"Section 706 of title 5, United States Code, is amended—" has no link, and no associated xml tag. However, "(44 U.S.C. 3501 note)" does have a link and an associated XML tag.

I would suggest to the Clerk of the House and the Secretary of the Senate that all references to U.S. Code elements receive links; this would be useful to both developers and laypeople.
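In the meantime, a consumer can flag citations that were left unlinked. A rough Python sketch, assuming linked citations are wrapped in an <external-xref> element (an assumption; the issue only says one reference has "an associated XML tag") and covering only the two citation shapes quoted above; the attributes in the sample are likewise assumed:

```python
import re
import xml.etree.ElementTree as ET

# The two citation shapes quoted in this issue; real bill text uses more forms than these.
PATTERNS = [
    re.compile(r"[Ss]ection \d+[a-z]? of title \d+, United States Code"),
    re.compile(r"\d+ U\.S\.C\. \d+[a-z]?"),
]


def unlinked_usc_citations(xml_text, link_tag="external-xref"):
    """Return U.S. Code citations whose text is not inside a `link_tag` element."""
    root = ET.fromstring(xml_text)
    found = []
    for el in root.iter():
        if el.tag == link_tag:
            continue  # text inside a link element is already linked
        # Text owned directly by `el`: its own .text plus the .tail after each child.
        chunks = [el.text or ""] + [child.tail or "" for child in el]
        for chunk in chunks:
            for pattern in PATTERNS:
                found.extend(m.group(0) for m in pattern.finditer(chunk))
    return found


# Hypothetical sample; the external-xref attributes are assumptions about the bill DTD.
sample = """<section>
<text>Section 706 of title 5, United States Code, is amended</text>
<text>(<external-xref legal-doc="usc" parsable-cite="usc/44/3501">44 U.S.C. 3501</external-xref> note)</text>
</section>"""

print(unlinked_usc_citations(sample))  # ['Section 706 of title 5, United States Code']
```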

Bills Earlier Than 113 Congress

The Congressional Bills, Bill Status and Bill Summaries only go back to 2013 (the 113th Congress). Are there any plans to add earlier archives of legislation, including their amendments and votes?

Possible to publish (additional) XSL files?

I see the 'cfr.xml' XSL file that converts the CFR XML into "fancy" HTML. Is there also an XSL file that converts the CFR XML into the simple 'pre' text that you get here, and if so, where is it? Or could it be posted here or in the CFR resources?
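If no such stylesheet is published, a rough stand-in is to walk the XML and emit the text of block-level elements as wrapped paragraphs. A minimal Python sketch; which tags count as blocks (BLOCK_TAGS) is a hypothetical choice, and the output will not match GPO's own text rendering:

```python
import textwrap
import xml.etree.ElementTree as ET

# Hypothetical choice of which CFR elements start a new paragraph in the text output.
BLOCK_TAGS = {"HD", "SECTNO", "SUBJECT", "RESERVED", "P", "CITA"}


def cfr_to_text(xml_text, width=72):
    """Flatten CFR XML into wrapped plain-text paragraphs, one per block element."""
    root = ET.fromstring(xml_text)
    lines = []
    for el in root.iter():
        if el.tag in BLOCK_TAGS:
            # Collapse all internal whitespace, including newlines in the source XML.
            text = " ".join("".join(el.itertext()).split())
            if text:
                lines.extend(textwrap.wrap(text, width=width))
                lines.append("")
    return "\n".join(lines).rstrip() + "\n"


sample = "<SECTION><SECTNO>52.209-8 </SECTNO><RESERVED>[Reserved]</RESERVED></SECTION>"
print(cfr_to_text(sample), end="")
```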

CFR 2019 Title-3 chapters misplaced

I am looking to parse the CFRs and break them down into a tree-like structure using the bulk CFR data.

2019 Title 3 XML, vol 7:
In all the other title XML files, all of the chapter/subchapter/section data is inside the <TITLE> element, but in Title 3 it is misplaced. The <TITLE> element contains only the following:

<TITLE>
    <LRH>Title 3—The President</LRH>
    <RRH>Proclamations</RRH>
</TITLE>

All other titles (e.g., Title 1):

<TITLE>
    <LRH>1 CFR Ch. I (1-1-19 Edition)</LRH>
    <RRH>Admin. Comm. of the Federal Register</RRH>
    <CFRTITLE>
        <TITLEHD>
            <PRTPAGE P="1"/>
            <HD SOURCE="HED">Title 1—General Provisions</HD>
        </TITLEHD>
        <CFRTOC>...</CFRTOC>
    </CFRTITLE>
    <CHAPTER>...</CHAPTER>
    <CHAPTER>...</CHAPTER>
    <CHAPTER>...</CHAPTER>
    <CHAPTER>...</CHAPTER>
    <CHAPTER>...</CHAPTER>
    <CHAPTER>...</CHAPTER>
</TITLE>

Here are the relevant documents on govinfo/app for reference.

How do I parse this correctly?

Thanks.
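A quick way to see where the Title 3 content actually lives is to print the element skeleton of the file a couple of levels deep. A minimal Python sketch; the CFRDOC root and the CHAPTER placement in the sample are made up for the demonstration:

```python
import xml.etree.ElementTree as ET


def outline(xml_text, max_depth=2):
    """Return the element tags of a document, indented by depth, down to max_depth."""
    root = ET.fromstring(xml_text)
    lines = []

    def walk(el, depth):
        lines.append("  " * depth + el.tag)
        if depth < max_depth:
            for child in el:
                walk(child, depth + 1)

    walk(root, 0)
    return lines


# Hypothetical skeleton; the CFRDOC root and the CHAPTER placement are made up.
sample = """<CFRDOC>
<TITLE><LRH>Title 3</LRH><RRH>Proclamations</RRH></TITLE>
<CHAPTER><HD>...</HD></CHAPTER>
</CFRDOC>"""

print("\n".join(outline(sample, max_depth=1)))
```

If the volume does use CHAPTER elements, iterating with root.iter("CHAPTER") finds them regardless of which element they sit under.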

Bulk Data XML Pages Loading Very Slow

We have a number of applications that rely on the content of the bulk data XML pages containing U.S. congressional legislation information. One app in particular normally takes about 20 minutes to run, and did so on the afternoon of Sunday, October 20. Sunday night it took about 2 hours, on Monday, October 21 (yesterday) it took about 4 hours, and today it has been running for more than six hours without completing. Are there any known performance problems with the USGPO servers? If not, can someone please share some insight into why the exact same code that used to finish in about 20 minutes now takes 20 or more times as long to fetch your bulk data XML? Thanks!
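This does not explain a server-side slowdown, but two client-side measures can limit the damage: a timeout so that a stalled request fails instead of hanging, and conditional requests so that unchanged files are not downloaded again. A minimal Python sketch using only the standard library; it assumes the server honors If-Modified-Since, which the issue does not confirm, and the User-Agent string is made up:

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def build_request(url, last_fetched=None):
    """GET request; adds If-Modified-Since when `last_fetched` (a Unix timestamp) is given."""
    headers = {"User-Agent": "bulk-data-sync-example/0.1"}  # hypothetical client name
    if last_fetched is not None:
        headers["If-Modified-Since"] = formatdate(last_fetched, usegmt=True)
    return urllib.request.Request(url, headers=headers)


def fetch_if_changed(url, last_fetched=None, timeout=60):
    """Return the response body, or None if the server replies 304 Not Modified."""
    request = build_request(url, last_fetched)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib surfaces 304 as an HTTPError
            return None
        raise
```

Here `timeout` applies to each blocking socket operation rather than to the whole download, and 60 seconds is an arbitrary value. Storing the time of each successful fetch and passing it back as `last_fetched` on the next run means only changed files are transferred, if the server supports it.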

Question on knowing the order of revisions of a given HR bill

While browsing: https://www.gpo.gov/fdsys/bulkdata/BILLS/115/1/hr

I see the following entries (subset):

BILLS-115hr70eh.xml 06-Jan-2017 03:32 0.04 M
BILLS-115hr70ih.xml 04-Jan-2017 05:23 0.04 M
BILLS-115hr70rfs.xml 07-Jan-2017 05:00 0.04 M

While in the UI I can guess from the timestamps which version is newest, there does not appear to be a programmatic way to do so. The suffixes (eh, ih, rfs) seem arbitrary, and the user guide for BILLS does not indicate what the suffixes mean or whether there's any way to tell the order of the revisions.
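The suffixes are bill-version codes. The H.R. 1020 listing in a later issue on this page shows one path (Introduced in House, Engrossed in House, Referred in Senate, Reported in Senate, Engrossed Amendment Senate, Enrolled Bill), and the timestamps above agree with it for this bill (ih, then eh, then rfs). A minimal Python sketch that sorts file names along that path, with rh (Reported in House) added; the order is a heuristic for House bills, not an official ordering, and codes outside the path sort last:

```python
import re

# Version codes along one common path for a House bill (heuristic, not official).
VERSION_NAMES = {
    "ih": "Introduced in House",
    "rh": "Reported in House",
    "eh": "Engrossed in House",
    "rfs": "Referred in Senate",
    "rs": "Reported in Senate",
    "eas": "Engrossed Amendment Senate",
    "enr": "Enrolled Bill",
}
HOUSE_BILL_PATH = list(VERSION_NAMES)  # dicts keep insertion order

FILENAME = re.compile(r"^BILLS-(\d+)([a-z]+)(\d+)([a-z]+)\.xml$")


def version_rank(filename):
    m = FILENAME.match(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    code = m.group(4)
    # Codes outside the path (e.g. Senate-originated versions) sort last.
    return HOUSE_BILL_PATH.index(code) if code in HOUSE_BILL_PATH else len(HOUSE_BILL_PATH)


files = ["BILLS-115hr70eh.xml", "BILLS-115hr70ih.xml", "BILLS-115hr70rfs.xml"]
print(sorted(files, key=version_rank))
```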

I suppose I could tell by timestamp, but, as mentioned above, the timestamps are only visible in the website UI. If I download the zip file, BILLS-115-1-hr.zip, all of the files have the exact same timestamp.

Even stranger, it is now 11 PM EST on Tuesday, January 10th, but all of the files in the BILLS-115-1-hr.zip file that I just downloaded have a timestamp of January 11th, 2017, 3:44 AM, which is in the future for me. So I imagine the server or build machine that generates the zips is set many hours ahead (England?) or has an incorrect system clock.

It might be reasonable to have your build machine preserve timestamps of the files when generating the zips.

I now see that <dc:date>2017-01-05</dc:date> is in the XML; however, some files have that tag empty (see view-source:https://www.gpo.gov/fdsys/bulkdata/BILLS/115/1/hr/BILLS-115hr70eh.xml). Is it fair to assume that an empty tag means 'initial revision'? Perhaps this should be documented.
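Reading that date programmatically requires the Dublin Core namespace. A minimal Python sketch, assuming the standard namespace URI http://purl.org/dc/elements/1.1/ is the one bound to the dc: prefix in these files; it returns None when every <dc:date> is empty or absent, which leaves open the question of what an empty date means:

```python
import xml.etree.ElementTree as ET

# Assumed Dublin Core namespace URI bound to the dc: prefix.
DC_DATE = "{http://purl.org/dc/elements/1.1/}date"


def dc_date(xml_text):
    """Return the first non-empty <dc:date> value, or None if all are empty or absent."""
    root = ET.fromstring(xml_text)
    for el in root.iter(DC_DATE):
        if el.text and el.text.strip():
            return el.text.strip()
    return None


with_date = '<bill xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:date>2017-01-05</dc:date></bill>'
empty = '<bill xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:date/></bill>'
print(dc_date(with_date), dc_date(empty))  # 2017-01-05 None
```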

Final question: once the amendment process begins, is a new revision generated for each new amendment? Or is one new revision/XML file generated per day, regardless of the number of amendments approved?

Missing bills

I noticed today that the bulk data seems to be missing all bills introduced in the House yesterday, September 28th (34 at the time of writing). I can't find any of them on the BILLSTATUS page, even though their actions show up on congress.gov.

I haven't seen this kind of upload delay before. Is this typical?

House Clerk's "User Guide and Data Dictionary" is outdated

I'm trying to parse House and Senate member bios from the House Clerk's and Senate Secretary's Member Data files (in XML format), and I noticed that the House Clerk's User Guide is outdated. For instance, there are now 20 House committees instead of 19 (strangely, the Natural Resources committee is the one missing from the User Guide), and there are 6 Joint Committees instead of 5.

Who do I talk to about getting this updated?

Link here: http://clerk.house.gov/member_info/MemberData_UserGuide.pdf

Updated Bill Status for H.R. 5515 (NDAA)

Good morning,

The bill actions for NDAA being signed into law are up to date on the congress.gov bill page: https://www.congress.gov/bill/115th-congress/house-bill/5515?q=%7B%22search%22%3A%5B%22hr5515%22%5D%7D&r=1

But the latest file available for actions in the bulk data repository hasn't been updated since August 2: https://www.govinfo.gov/bulkdata/BILLSTATUS/115/hr/BILLSTATUS-115hr5515.xml

Do you know when this will be made available, and what the expected delay is between congress.gov updates and bulk data updates? Thank you!
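While waiting, one way to detect stale files is to compare the update timestamp inside the BILLSTATUS XML with the current date. A minimal Python sketch, assuming the file has a billStatus/bill/updateDate element whose first ten characters are a YYYY-MM-DD date; the sample values are hypothetical:

```python
import xml.etree.ElementTree as ET
from datetime import date


def days_since_update(xml_text, today):
    """Days between the BILLSTATUS updateDate and `today`, or None if the element is missing."""
    root = ET.fromstring(xml_text)
    stamp = root.findtext("./bill/updateDate")  # assumed location of the timestamp
    if not stamp or not stamp.strip():
        return None
    updated = date.fromisoformat(stamp.strip()[:10])  # keep only the YYYY-MM-DD part
    return (today - updated).days


# Hypothetical sample values.
sample = "<billStatus><bill><updateDate>2018-08-02T04:10:20Z</updateDate></bill></billStatus>"
print(days_since_update(sample, date(2018, 8, 14)))  # 12
```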

BillStatus XML: House Amendment to another House Amendment

In https://www.gpo.gov/fdsys/bulkdata/BILLSTATUS/113/hr/BILLSTATUS-113hr152.xml (HR 152, 113th Congress), the xml for House Amendment 4 (HAMDT 4) merges with Senate Amendment 4 (SAMDT 4).

According to https://www.congress.gov/bill/113th-congress/house-bill/152/amendments?pageSort=asc, HAMDT 4 is an amendment to HAMDT 3.

The XML for the amendment has the correct description and purpose for HAMDT 4, but the data below it, still before the closing amendment tag for HAMDT 4, belongs to SAMDT 4.

Site maintenance

We will be performing maintenance activities from 10am to 2pm today. As a result, sitemaps users may notice that values for some existing sitemap entries could change by up to an hour.

We apologize for any inconvenience this may cause.

Pretty-format XML and standardize alphabetic order of tag attributes

Hi again,

I was playing around with bulk data last night. I downloaded two copies of the same HR bill. My goal was to see how the bill changed over time. After downloading both copies, I noticed substantial differences when using a diff-viewer, even in areas of the xml where no substantive change in the content had occurred.

This was primarily due to (1) whitespace differences and (2) the order of attributes being reversed.

A simple example is for BILLS-115hr71eh.xml and BILLS-115hr70rfs.xml . Even after ignoring whitespace changes, the diff still included the following:

  •  <attestation-date date="20170104" chamber="House">Passed the House of Representatives January 4, 2017.</attestation-date>
  •  <attestation-date chamber="House" date="20170104">Passed the House of Representatives January 4, 2017.</attestation-date>

While an XML Parser might not care about the difference, if one wanted to build (say) a git-tracker of various bills and their revisions, attributes swapping order would become annoying and make diffs difficult to read and full of false positives.

It would be nice if both Clerk of the House and the Secretary of the Senate could agree to use a properly-indented pretty-format such as cat file | xmllint --format - in order to standardize indentation for human readability...

And it would be even better if the order of attributes could be guaranteed to be alphabetical so as to minimize the diffs.

I understand that order of attributes is not significant in XML, and any parser will immediately understand even with swapped attribute order. However, for diff-ing versions of the same bill, it will make the diff-viewer much cleaner and easier to spot what's changed if the order of attributes could be guaranteed to be in alphabetical order.
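On the consumer side, much the same effect can be approximated before diffing: Canonical XML (C14N 2.0) writes attributes in sorted order, and re-indenting afterwards gives one element per line. A minimal Python sketch using only the standard library (ElementTree.canonicalize needs Python 3.8+, ElementTree.indent needs 3.9+):

```python
import xml.etree.ElementTree as ET


def normalize_for_diff(xml_text):
    """Sort attributes and re-indent so two versions of a file can be compared line by line."""
    # C14N 2.0 serializes attributes in sorted order; strip_text trims whitespace around text.
    canonical = ET.canonicalize(xml_data=xml_text, strip_text=True)
    root = ET.fromstring(canonical)
    ET.indent(root, space="  ")  # Python 3.9+
    # Python 3.8+ keeps attributes in the order they were parsed, i.e. the sorted order.
    return ET.tostring(root, encoding="unicode")


a = '<bill><attestation-date date="20170104" chamber="House">Passed the House of Representatives January 4, 2017.</attestation-date></bill>'
b = '<bill>\n    <attestation-date chamber="House" date="20170104">Passed the House of Representatives January 4, 2017.</attestation-date>\n</bill>'
print(normalize_for_diff(a) == normalize_for_diff(b))  # True
```

strip_text=True also trims spaces next to inline elements, so the output is only suitable for comparing versions, not as a replacement for the original file. Files with a DOCTYPE or named entities may need extra handling before they parse.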

trying to grab FR images

Hi there, I am trying to grab FR images using the image address available when right-clicking the images.

Using https://www.federalregister.gov/documents/2018/12/06/2018-26365/airworthiness-directives-the-boeing-company-airplanes as an example, I get a URL of https://s3.amazonaws.com/images.federalregister.gov/ER06DE18.000/original.png?1543928713, but the image is very dark, and when I try to save it to my computer it looks completely black. Are the images available to download somewhere?

COVID -19 Regulations Missing

Hi Team,

We can see the COVID-19 regulations on the eCFR website. The links are below for reference.

https://gov.ecfr.io/cgi-bin/searchECFR?ob=r&idno=&q1=coronavirus&rgn1=Section&op2=and&q2=&rgn2=Section&op3=and&q3=&rgn3=Section&SID=70fd0c6e7b4e455aee7e767246b099cf&mc=true
https://gov.ecfr.io/cgi-bin/searchECFR?ob=r&idno=&q1=COVID&r=&SID=2a7338cb81f747e3aab8e38266bb4d1f&mc=true

But the same data is not reflected in the bulk data.
Could you please let us know the earliest date we can expect an update for Titles 29, 17, 9, 42, and 44?
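To check whether a downloaded bulk-data title already contains these changes, one rough test is to search its sections for the same terms used in the eCFR queries above. A minimal Python sketch, assuming the eCFR DIV8 TYPE="SECTION" layout shown in the Title 7 issue on this page; the sample section numbers are made up:

```python
import re
import xml.etree.ElementTree as ET


def sections_mentioning(xml_text, terms=("coronavirus", "COVID")):
    """Return the N attribute of every DIV8 section whose text contains any of `terms`."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    root = ET.fromstring(xml_text)
    return [
        div.get("N")
        for div in root.iter("DIV8")
        if div.get("TYPE") == "SECTION" and pattern.search("".join(div.itertext()))
    ]


# Hypothetical sample; the section numbers are made up.
sample = """<DIV5 TYPE="PART">
<DIV8 N="§ 1.1" TYPE="SECTION"><P>Temporary measures related to COVID-19.</P></DIV8>
<DIV8 N="§ 1.2" TYPE="SECTION"><P>General provisions.</P></DIV8>
</DIV5>"""

print(sections_mentioning(sample))  # ['§ 1.1']
```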

XML Markup for FAR 52.204-11 not properly marked as reserved

Hello team GPO! I believe I've found an error in the markup for 48 CFR § 52.204-11 (FAR 52.204-11). Although it is a reserved clause, it is marked up as a standard (non-reserved) clause.

Depending on how you have things organized, it may be easier to access as Volume 2 of title 48 as published in the 2014 CFR.

Steps to reproduce

  1. wget http://www.gpo.gov/fdsys/pkg/CFR-2014-title48-vol2/xml/CFR-2014-title48-vol2-chap1-subchapH.xml
  2. Navigate to line 2336 (or search for the 2nd occurrence of 52.204-11)

Expected

The markup for a reserved section should be:

<SECTION>
  <SECTNO>52.204-11</SECTNO>
  <RESERVED>[Reserved]</RESERVED>
</SECTION>

For example, here's § 52.209-8:

<SECTION>
  <SECTNO>52.209-8 </SECTNO>
  <RESERVED>[Reserved]</RESERVED>
</SECTION>

Got

The actual markup for § 52.204-11 is:

<SECTION>
  <SECTNO>52.204-11</SECTNO>
  <SUBJECT>[Reserved]</SUBJECT>
</SECTION>

Notice the 2nd tag in the section is <SUBJECT> not <RESERVED>, subject being used exclusively by non-reserved clauses.

If you look at § 52.204-12 (the next section and a non-reserved section), you'll see <SUBJECT> indicates the title of non-reserved clauses:

<SECTION>
  <SECTNO>52.204-12</SECTNO>
  <SUBJECT>Data Universal Numbering System Number Maintenance.</SUBJECT>
  <P>As prescribed in 4.607(c), insert the following clause:</P>
  <EXTRACT>...</EXTRACT>
  <HD SOURCE="HD3">(End of clause)</HD>
  <CITA>[77 FR 67919, Nov. 20, 2012]</CITA>
</SECTION>

Use case

If the context is helpful, I noticed the mistake when updating a FAR-parsing Ruby gem (so_far_so_good) to the 2014 CFR (the error does not exist in the 2013 version).

To verify my own parsing, I had a test confirming that no non-reserved clause (detected by the absence of the <RESERVED> element) had a subject of [RESERVED], which § 52.204-11 did.
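A Python version of that check, for anyone running the same validation outside Ruby; this is a sketch of the test as described, not the gem's actual code, and the SUBCHAP wrapper in the sample is hypothetical:

```python
import xml.etree.ElementTree as ET


def mislabeled_reserved(xml_text):
    """SECTNOs of sections without a <RESERVED> child whose <SUBJECT> reads "[Reserved]"."""
    root = ET.fromstring(xml_text)
    flagged = []
    for section in root.iter("SECTION"):
        subject = (section.findtext("SUBJECT") or "").strip()
        if section.find("RESERVED") is None and subject.lower() == "[reserved]":
            flagged.append((section.findtext("SECTNO") or "").strip())
    return flagged


# The three sections quoted above, with the clause body of 52.204-12 trimmed.
sample = """<SUBCHAP>
<SECTION><SECTNO>52.204-11</SECTNO><SUBJECT>[Reserved]</SUBJECT></SECTION>
<SECTION><SECTNO>52.204-12</SECTNO><SUBJECT>Data Universal Numbering System Number Maintenance.</SUBJECT></SECTION>
<SECTION><SECTNO>52.209-8 </SECTNO><RESERVED>[Reserved]</RESERVED></SECTION>
</SUBCHAP>"""

print(mislabeled_reserved(sample))  # ['52.204-11']
```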

Thanks for an awesome service! 🇺🇸 📰 🏢

Misplaced Parts 703-745 in CFR Title 12

I am looking to parse the CFRs and break them down into a tree-like structure using the bulk CFR data.

For example, the 2019 Title 12 XML, vol 7 (see the two attached screenshots from 2020-04-27):

In this XML, Parts 703-745 all sit under Part 702, apparently inside its appendix.
Here are the relevant documents on govinfo/app for reference.

How do I parse this correctly?

Thanks.
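A quick way to list every part that ended up nested inside another part, so it can be re-attached to the outer part's parent, is to walk up the ancestor chain with a parent map. A minimal Python sketch; the element names PART, APPENDIX and HD are assumptions based on the annual-edition CFR markup, and the sample is simplified:

```python
import xml.etree.ElementTree as ET


def parts_nested_in_parts(xml_text):
    """Return (inner PART heading, enclosing PART heading) for every PART inside another PART."""
    root = ET.fromstring(xml_text)
    parent = {child: p for p in root.iter() for child in p}
    nested = []
    for part in root.iter("PART"):
        node = parent.get(part)
        while node is not None:
            if node.tag == "PART":
                nested.append((part.findtext("HD"), node.findtext("HD")))
                break
            node = parent.get(node)
    return nested


# Simplified sample mirroring the reported layout (Part 703 inside an appendix of Part 702).
sample = """<CHAPTER>
<PART><HD>PART 702</HD>
<APPENDIX><PART><HD>PART 703</HD></PART></APPENDIX>
</PART>
</CHAPTER>"""

print(parts_nested_in_parts(sample))  # [('PART 703', 'PART 702')]
```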

XML Files Don't Contain All Versions of a Bill

I've noticed that many of the BILLSTATUS and BILLSUM XML files don't show all versions of a bill.

For example, as of 10-Aug-2016, the collection result https://www.gpo.gov/fdsys/browse/collection.action?collectionCode=BILLS&browsePath=114%2Fhr%2F%5B1000+-+1099%5D&isCollapsed=false&leafLevelBrowse=false&isDocumentResults=true&ycord=0

shows 6 versions of H.R. 1020.
H.R. 1020 (Introduced in House)
H.R. 1020 (Engrossed in House)
H.R. 1020 (Referred in Senate)
H.R. 1020 (Reported in Senate)
H.R. 1020 (Engrossed Amendment Senate)
H.R. 1020 (Enrolled Bill)

However, the file BILLSUM-114hr1020.xml (attached) only shows 5 versions of the bill.
BILLSUM-114hr1020.zip

How can we get all versions of a bill in the related XML files?

Timing of Updates

At what time have the bulk updates been pushed to the site each morning? They seem to be arriving progressively later.
