Comments (8)

jonkatz6 commented on June 26, 2024

Using the URL above, the library did not work.
Using the URL https://www.sec.gov/Archives/edgar/data/0001365135/000155837021005716/wu-20210331x10q.htm worked, but no DocumentPeriodEndDate was found.

Using the URL of the XBRL file itself, https://www.sec.gov/Archives/edgar/data/1365135/000155837021005716/wu-20210331x10q_htm.xml, did find the DocumentPeriodEndDate:

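from xbrl.cache import HttpCache
from xbrl.instance import XbrlInstance, XbrlParser

# Assumed setup: xbrlParser was not defined in the original snippet, and './cache' is a placeholder directory.
xbrlParser = XbrlParser(HttpCache('./cache'))
inst: XbrlInstance = xbrlParser.parse_instance('https://www.sec.gov/Archives/edgar/data/1365135/000155837021005716/wu-20210331x10q_htm.xml')
for i in inst.facts:
    if i.concept.name == 'DocumentPeriodEndDate':
        print(i.concept)
        print(i.value)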
out:
DocumentPeriodEndDate
2021-03-31

Sorry if you were looking for any information as to why, but I hope this helps.

from py-xbrl.

mrx23dot commented on June 26, 2024

Interesting, the XML version extracts the DocumentPeriodEndDate:
https://www.sec.gov/Archives/edgar/data/1365135/000155837021005716/wu-20210331x10q_htm.xml

But the original iXBRL .htm doesn't:
https://www.sec.gov/Archives/edgar/data/1365135/000155837021005716/wu-20210331x10q.htm

Sounds like a parsing error.
manusimidt told me we should prefer the iXBRL .htm (the original filing) over the SEC-extracted XML.

manusimidt commented on June 26, 2024

Yes, looks like a parsing error. The iXBRL Instance Document certainly contains the DocumentPeriodEndDate fact.

<ix:nonNumeric format="ixt:datemonthdayyearen" 
  contextRef="Duration_1_1_2021_To_3_31_2021_Pi9QpSqr-0e0RF1F9GgqSg" 
  name="dei:DocumentPeriodEndDate" 
  id="Narr_VydiiUz0MUOCCrLq-p-Mpw">
  <b style="font-weight:bold;">
    March 31, 2021
  </b>
</ix:nonNumeric>

I think the fact is not parsed because it contains additional HTML elements (the bold tag).
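
A minimal sketch of this behaviour, assuming the parser reads the fact through the standard library's xml.etree.ElementTree. The namespace declaration and the trimmed attribute list are simplifications so the snippet parses on its own; they are not copied from the filing.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the fact above (attributes trimmed, namespace declared inline).
SNIPPET = """<ix:nonNumeric xmlns:ix="http://www.xbrl.org/2013/inlineXBRL"
    name="dei:DocumentPeriodEndDate">
  <b style="font-weight:bold;">
    March 31, 2021
  </b>
</ix:nonNumeric>"""

fact_elem = ET.fromstring(SNIPPET)

# Element.text only holds the text before the first child element,
# which here is just the whitespace in front of <b>.
direct_text = fact_elem.text
is_skipped = direct_text is None or len(direct_text.strip()) == 0

print(repr(direct_text))          # whitespace only
print(is_skipped)                 # the fact would be treated as empty
print(fact_elem[0].text.strip())  # the date lives on the <b> child
```

So an "is the text empty?" check on the outer element alone would skip this fact even though the date is present one level down.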

manusimidt commented on June 26, 2024

Yes, this is the issue:

py-xbrl/xbrl/instance.py

Lines 416 to 417 in a2aca03

if fact_elem.text is None or len(fact_elem.text.strip()) == 0:
    continue

I will implement a fix.

mrx23dot commented on June 26, 2024

In bs4 we have these:
.text is recursive (what we want); there should be an equivalent for it in etree
.string only covers a single item (it wouldn't go into the bold tag)

manusimidt commented on June 26, 2024

Yes, that's correct.
With bs4 it is really easy to extract the text recursively for a given element.
I could not find an equivalent for it in etree. Please let me know if you find a solution.

I am currently implementing a function that extracts the text recursively, but I don't know if that is the best way of doing it.
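
One candidate in the standard library is Element.itertext(), which yields the text of an element and all of its descendants in document order. A minimal sketch, where element_text is a hypothetical helper name for illustration and not part of py-xbrl:

```python
import xml.etree.ElementTree as ET


def element_text(elem):
    """Hypothetical helper: join all nested text and collapse whitespace."""
    raw = "".join(elem.itertext())  # text of elem plus every descendant
    return " ".join(raw.split())    # normalise newlines and indentation


# Simplified stand-in for the fact from the filing (assumed markup).
fact_elem = ET.fromstring(
    '<ix:nonNumeric xmlns:ix="http://www.xbrl.org/2013/inlineXBRL">\n'
    '  <b style="font-weight:bold;">\n'
    "    March 31, 2021\n"
    "  </b>\n"
    "</ix:nonNumeric>"
)

print(element_text(fact_elem))
```

Collapsing whitespace is a design choice for this sketch; whether the parser should do that before applying the ixt date format is an open question.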

mrx23dot commented on June 26, 2024

The docs say:
xml.etree.ElementTree.tostring(element, encoding="us-ascii", method="xml", *, short_empty_elements=True)

"Generates a string representation of an XML element, including all subelements."
-> So it should be recursive too; it might need some playing with the parameters.

(short_empty_elements was added in Python 3.4.)
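
The parameter that seems to matter here is method: with the default method="xml" the markup is kept, while method="text" drops the tags and leaves only the nested text. A small sketch on a simplified version of the fact (the namespace declaration is added only so the snippet parses on its own):

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring(
    '<ix:nonNumeric xmlns:ix="http://www.xbrl.org/2013/inlineXBRL" '
    'name="dei:DocumentPeriodEndDate"><b>March 31, 2021</b></ix:nonNumeric>'
)

# encoding="unicode" returns a str instead of bytes.
as_xml = ET.tostring(elem, encoding="unicode")                    # markup included
text_only = ET.tostring(elem, encoding="unicode", method="text")  # text only

print(as_xml)
print(text_only)
```

One caveat: tostring also appends the element's tail text (whatever follows its closing tag inside the parent), so for a fact embedded in a larger page the result may contain trailing text that itertext() would not return.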

gety9 commented on June 26, 2024

@mrx23dot @jonkatz6

Guys, do you still have the parser working for iXBRL URLs?

I am using py-2.0.7 and
https://www.sec.gov/Archives/edgar/data/1365135/000155837021005716/wu-20210331x10q_htm.xml (XBRL) works.

https://www.sec.gov/ix?doc=/Archives/edgar/data/1365135/000155837021005716/wu-20210331x10q.htm (iXBRL URL with the ? and = characters) doesn't work at all:

PermissionError: [Errno 1] Operation not permitted

And https://www.sec.gov/Archives/edgar/data/1365135/000155837021005716/wu-20210331x10q.htm (clean iXBRL URL) does not work either:

ParseError: not well-formed (invalid token): line 9, column 1106

It worked for you at least partially (without DocumentPeriodEndDate), so I wonder what the reason is.
