amiclimate's Introduction

amiclimate

NLP and semantic software and material for managing climate knowledge

Python code for accessing and transforming key climate documents. A refactoring of the (bloated) pyamihtml repository. It will initially have functionality for downloading and parsing:

  • IPCC reports
  • IPCC glossary
  • UNFCCC reports (COP, etc.)

This requires bespoke library routines, which are available in the new amilib repository.

This repository will NOT have the complete IPCC or UNFCCC corpus, but will have small exemplars.

tests

Applications are developed as tests, and then factored into command

amiclimate's People

Contributors

petermr

amiclimate's Issues

Errors in amiclimate

windows 11
python version - 3.12.3, pytest-8.2.0, pluggy-1.5.0

===================================================== short test summary info ======================================================
FAILED test/test_un.py::TestIPCC::test_ipcc_syr_lr_toc - AssertionError: assert 'Chapter 3: M...ng-term goals' == 'SYR Longer Report'
FAILED test/test_un.py::TestIPCC::test_ipcc_syr_lr_toc_full - IndexError: list index out of range
FAILED test/test_un.py::TestIPCC::test_symbol_indir - AssertionError: found 57 elements in C:\Users\asus\Desktop\Semantic\amiclimate\temp\queries\methane_norefs2.html, expected [60, ...
FAILED test/test_un.py::TestIPCC::test_symbolic_xpaths - AssertionError: found 10 elements in C:\Users\asus\Desktop\Semantic\amiclimate\temp\queries\methane_refs1.html
FAILED test/test_un.py::TestUNFCCC::test_make_nested_divs - AssertionError: assert False
================================= 5 failed, 50 passed, 19 skipped, 1 warning in 776.23s (0:12:56) ==================================

Errors Encountered While Running Pytest on amiclimate

PS C:\Users\hp\amiclimate> pytest
======================================================================= test session starts ========================================================================
platform win32 -- Python 3.12.3, pytest-8.2.2, pluggy-1.5.0
rootdir: C:\Users\hp\amiclimate
collected 84 items

test\test_ipcc.py ..F.s
DevTools listening on ws://127.0.0.1:58046/devtools/browser/9fe9d268-4582-4dfc-9bb5-84c9f7b2790d

DevTools listening on ws://127.0.0.1:58067/devtools/browser/b02a4459-0905-40d8-ba0b-0b7dcb6cc37c
.
DevTools listening on ws://127.0.0.1:58088/devtools/browser/15709526-53cf-40b8-af21-92aca2d54f83

DevTools listening on ws://127.0.0.1:58120/devtools/browser/fda36b2a-8940-4f9d-b9e4-e75d4d450cca

DevTools listening on ws://127.0.0.1:58140/devtools/browser/ebd3e1d8-3d85-437c-95c1-eb8762f3dd00
[34916:11072:0621/104922.847:ERROR:ssl_client_socket_impl.cc(879)] handshake failed; returned -1, SSL error code 1, net_error -3
F......sss..
DevTools listening on ws://127.0.0.1:58174/devtools/browser/418e999e-7056-402c-872e-c0b7b6697dbf

DevTools listening on ws://127.0.0.1:58195/devtools/browser/0b5aa3f9-9e7a-4d05-9403-4a144be81660

DevTools listening on ws://127.0.0.1:58215/devtools/browser/1fe8c61f-3cf8-4ae6-8b3f-d8294f8585f9
.
DevTools listening on ws://127.0.0.1:58236/devtools/browser/d953d593-6b86-47e0-926e-e8d67116756a
.......ss....FF....s.............. [ 63%]
test\test_misc.py ..x...F.. [ 73%]
test\test_unfccc.py .ss.s....s.ssssssssss. [100%]

============================================================================= FAILURES =============================================================================
________________________________________________________________ TestIPCC.test_add_ipcc_hyperlinks _________________________________________________________________

self = <test.test_ipcc.TestIPCC testMethod=test_add_ipcc_hyperlinks>

def test_add_ipcc_hyperlinks(self):
    """resolves dumb links (e.g.
    {WGII SPM D.5.3; WGIII SPM D.1.1}) into hyperllinks
    target relies on SYR being sibling of WGIII, etc)
    The actual markup of the links is horrible. Sometime in spans, sometimes in naked text()
    nodes. Somes the nodes are labelled "refs", sometimes not. The safest way is to try to
    locate the actual text and find the relevant node.
    """

    syr_lr_content = Path(Resources.TEST_RESOURCES_DIR, IPCC_DIR, CLEANED_CONTENT, SYR,
                          SYR_LR, HTML_WITH_IDS_HTML)
    lr_html = ET.parse(str(syr_lr_content), HTMLParser())
    para_with_ids = lr_html.xpath("//p[@id]")
  assert len(para_with_ids) == 206

E assert 1163 == 206
E + where 1163 = len([<Element p at 0x173c1224370>, <Element p at 0x173c12243c0>, <Element p at 0x173c1224410>, <Element p at 0x173c1224460>, <Element p at 0x173c12244b0>, <Element p at 0x173c1224500>, ...])

test\test_ipcc.py:1318: AssertionError
____________________________________________________________ TestIPCC.test_cmdline_download_wg_reports _____________________________________________________________

self = <test.test_ipcc.TestIPCC testMethod=test_cmdline_download_wg_reports>

def test_cmdline_download_wg_reports(self):
    """download WG reports
    output in petermr/semanticClimate
    FAILS TO DOWNLOAD
    """

    inurl = f"{AR6_URL}/"
    outdir = Path(f"{TEMP_DIR}/debug/")
    wg = "wg1"
    chapters = ["chapter-1", SPM, TS]
    # assert indir.exists()
    FileLib.force_mkdir(outdir)
    assert outdir.exists()
    args = [
        "IPCC",
        "--indir", inurl,
        "--outdir", str(outdir),
        "--informat", GATSBY,
        "--chapter", chapters,
        "--report", wg,
        "--operation", IPCCArgs.DOWNLOAD,
        "--kwords", "chapter:chapter",  # for test
        "--debug",
    ]

    wgdir = Path(outdir, wg)
    FileLib.delete_directory_contents(wgdir, delete_directory=True)
    assert not wgdir.exists(), f"{wgdir} should have been deleted"

    ami_climate = AMIClimate()
    ami_climate.run_command(args)
    assert wgdir.exists(), f"{wgdir} should have been created"

    chap1_dir = Path(wgdir, chapters[0])
    assert chap1_dir.exists()
    raw_gatsby_file = Path(chap1_dir, f"{GATSBY_RAW}.html")
    assert raw_gatsby_file.exists()
    htmlx = HtmlUtil.parse_html_lxml(raw_gatsby_file)
    title = htmlx.xpath("/html/head/title")
    assert title
    txt = title[0].text
    print(f"title {txt}")
  FileLib.assert_exist_size(raw_gatsby_file, minsize=300000, abort=True)

test\test_ipcc.py:657:


..\AppData\Roaming\Python\Python312\site-packages\amilib\file_lib.py:438: in assert_exist_size
raise e


cls = <class 'amilib.file_lib.FileLib'>, file = WindowsPath('C:/Users/hp/amiclimate/temp/debug/wg1/chapter-1/gatsby_raw.html'), minsize = 300000, abort = True
debug = True

@classmethod
def assert_exist_size(cls, file, minsize, abort=True, debug=True):
    """asserts a file exists and is of sufficient size
    :param file: file or path
    :param minsize: minimum size
    :param abort: throw exception if fails (not sure what this does)
    :param debug: output filename
    """
    path = Path(file)
    if debug:
        print(f"checking {file}")
    try:
        assert path.exists(), f"file {path} must exist"
      assert (s := path.stat().st_size) > minsize, f"file {file} size = {s} must be above {minsize}"

E AssertionError: file C:\Users\hp\amiclimate\temp\debug\wg1\chapter-1\gatsby_raw.html size = 252498 must be above 300000

..\AppData\Roaming\Python\Python312\site-packages\amilib\file_lib.py:435: AssertionError
----------------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------------
command: ['IPCC', '--indir', 'https://www.ipcc.ch/report/ar6//', '--outdir', 'C:\Users\hp\amiclimate\temp\debug', '--informat', 'gatsby', '--chapter', ['chapter-1', 'summary-for-policymakers', 'technical-summary'], '--report', 'wg1', '--operation', 'download', '--kwords', 'chapter:chapter', '--debug']
running parse_and_process1 in util?
arg_dict: {'version': False, 'command': 'IPCC', 'debug': True, 'indir': 'https://www.ipcc.ch/report/ar6//', 'outdir': 'C:\Users\hp\amiclimate\temp\debug', 'kwords': 'chapter:chapter', 'chapter': ['chapter-1', 'summary-for-policymakers', 'technical-summary'], 'informat': 'gatsby', 'operation': 'download', 'report': 'wg1'}
home C:\Users\hp
outdir C:\Users\hp\amiclimate\temp\debug output None
input: https://www.ipcc.ch/report/ar6//
debug: True
report: ['wg1']
chapter: ['chapter-1', 'summary-for-policymakers', 'technical-summary']
outdir: C:\Users\hp\amiclimate\temp\debug
output: None
kwargs: {'chapter': 'chapter'}
query: None
xpath: None
indir https://www.ipcc.ch/report/ar6//, input None, output None, outdir C:\Users\hp\amiclimate\temp\debug
chap ['chapter-1', 'summary-for-policymakers', 'technical-summary'], report ['wg1']
downloading from: https://www.ipcc.ch/report/ar6///wg1/
web publisher assumed to be <class 'climate.ipcc.IPCCGatsby'>
Fetching page source from URL: https://www.ipcc.ch/report/ar6///wg1/chapter/chapter-1
no xpath_list specified
no output html
elements in lxml_root: 40
writing C:\Users\hp\amiclimate\temp\debug\wg1\chapter-1\gatsby_raw.html
Quitting the driver...
DONE
//div[contains(@class, 'col-12')] removes 0 elems
//div[@data-gatsby-image-wrapper]/div[@aria-hidden='true'] removes 0 elems
downloading from: https://www.ipcc.ch/report/ar6///wg1/
web publisher assumed to be <class 'climate.ipcc.IPCCGatsby'>
Fetching page source from URL: https://www.ipcc.ch/report/ar6///wg1/chapter/summary-for-policymakers
no xpath_list specified
no output html
elements in lxml_root: 40
writing C:\Users\hp\amiclimate\temp\debug\wg1\summary-for-policymakers\gatsby_raw.html
Quitting the driver...
DONE
//div[contains(@class, 'col-12')] removes 0 elems
//div[@data-gatsby-image-wrapper]/div[@aria-hidden='true'] removes 0 elems
downloading from: https://www.ipcc.ch/report/ar6///wg1/
web publisher assumed to be <class 'climate.ipcc.IPCCGatsby'>
Fetching page source from URL: https://www.ipcc.ch/report/ar6///wg1/chapter/technical-summary
no xpath_list specified
no output html
elements in lxml_root: 40
writing C:\Users\hp\amiclimate\temp\debug\wg1\technical-summary\gatsby_raw.html
Quitting the driver...
DONE
//div[contains(@class, 'col-12')] removes 0 elems
//div[@data-gatsby-image-wrapper]/div[@aria-hidden='true'] removes 0 elems
argstr: None
title Chapter 1: Framing, Context and Methods | Climate Change 2021: The Physical Science Basis
checking C:\Users\hp\amiclimate\temp\debug\wg1\chapter-1\gatsby_raw.html
------------------------------------------------------------------------ Captured log call -------------------------------------------------------------------------
WARNING root:ami_args.py:120 ********** args for parse_and_process1 {'version': False, 'command': 'IPCC', 'debug': [True], 'input': None, 'indir': ['https://www.ipcc.ch/report/ar6//'], 'output': None, 'outdir': ['C:\Users\hp\amiclimate\temp\debug'], 'kwords': ['chapter:chapter'], 'chapter': [['chapter-1', 'summary-for-policymakers', 'technical-summary']], 'informat': ['gatsby'], 'operation': ['download'], 'query': None, 'report': ['wg1'], 'xpath': None}
WARNING C:\Users\hp\amiclimate\climate\ipcc.py:ipcc.py:91 no inputs given
WARNING amilib.ami_args:ami_args.py:203 kwargs_dict {'chapter': 'chapter'}
WARNING C:\Users\hp\amiclimate\climate\ipcc.py:ipcc.py:708 Unknown operation download
__________________________________________________________________ TestIPCC.test_ipcc_syr_lr_toc ___________________________________________________________________

self = <test.test_ipcc.TestIPCC testMethod=test_ipcc_syr_lr_toc>

def test_ipcc_syr_lr_toc(self):
    """analyses contents for IPCC syr longer report
    """
    """
        <!-- TOC (from UNFCCC)-->
        <div class="toc">

            <div>
                <span>Decision</span><span>Page</span></a>
            </div>

            <nav role="doc-toc">
                <ul>
                    <li>
                        <a href="../Decision_1_CMA_3/split.html"><span class="descres-code">1/CMA.3</span><span
                                class="descres-title">Glasgow Climate Pact</span></a>
                    </li>
                   ...
                </ul>
            </nav>
        </div>
    """
    report = 'longer-report'
    syr_lr_content = Path(Resources.TEST_RESOURCES_DIR, IPCC_DIR, CLEANED_CONTENT, SYR,
                          SYR_LR, HTML_WITH_IDS_HTML)
    logger.warning(f"SYR file is: {syr_lr_content}")
    print(f"SYR file is: {syr_lr_content}")
    assert syr_lr_content.exists()
    lr_html = ET.parse(str(syr_lr_content), HTMLParser())
    assert lr_html is not None
    body = HtmlLib.get_body(lr_html)
    header_h1 = body.xpath("div//h1")[0]
    assert header_h1 is not None
    header_h1_text = header_h1.text
    toc_title = "SYR Longer Report"
  assert header_h1_text == toc_title

E AssertionError: assert 'Chapter 3: M...ng-term goals' == 'SYR Longer Report'
E
E - SYR Longer Report
E + Chapter 3: Mitigation pathways compatible with long-term goals

C:\Users\hp\amiclimate\test\test_ipcc.py:1240: AssertionError
----------------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------------
SYR file is: C:\Users\hp\amiclimate\test\resources\ipcc\cleaned_content\syr\longer-report\html_with_ids.html
------------------------------------------------------------------------ Captured log call -------------------------------------------------------------------------
WARNING test.test_ipcc:test_ipcc.py:1230 SYR file is: C:\Users\hp\amiclimate\test\resources\ipcc\cleaned_content\syr\longer-report\html_with_ids.html
________________________________________________________________ TestIPCC.test_ipcc_syr_lr_toc_full ________________________________________________________________

self = <test.test_ipcc.TestIPCC testMethod=test_ipcc_syr_lr_toc_full>

def test_ipcc_syr_lr_toc_full(self):
    """creates toc recursively for IPCC syr longer report
    """
    filename = HTML_WITH_IDS_HTML
    syr_lr_content = Path(Resources.TEST_RESOURCES_DIR, IPCC_DIR, CLEANED_CONTENT, SYR,
                          SYR_LR, filename)
    lr_html = ET.parse(str(syr_lr_content), HTMLParser())
    body = HtmlLib.get_body(lr_html)
    publisher = IPCCGatsby()
    toc_html, ul = publisher.make_header_and_nav_ul(body)
    level = 0
  publisher.analyse_containers(body, level, ul, filename=filename)

C:\Users\hp\amiclimate\test\test_ipcc.py:1278:


C:\Users\hp\amiclimate\climate\ipcc.py:1377: in analyse_containers
self.add_container_infp_to_tree(debug, filename, h_container, level, texts, ul)
C:\Users\hp\amiclimate\climate\ipcc.py:1395: in add_container_infp_to_tree
self.analyse_containers(h_container, level + 1, ul1, filename=filename)
C:\Users\hp\amiclimate\climate\ipcc.py:1377: in analyse_containers
self.add_container_infp_to_tree(debug, filename, h_container, level, texts, ul)
C:\Users\hp\amiclimate\climate\ipcc.py:1395: in add_container_infp_to_tree
self.analyse_containers(h_container, level + 1, ul1, filename=filename)
C:\Users\hp\amiclimate\climate\ipcc.py:1377: in analyse_containers
self.add_container_infp_to_tree(debug, filename, h_container, level, texts, ul)
C:\Users\hp\amiclimate\climate\ipcc.py:1395: in add_container_infp_to_tree
self.analyse_containers(h_container, level + 1, ul1, filename=filename)
C:\Users\hp\amiclimate\climate\ipcc.py:1377: in analyse_containers
self.add_container_infp_to_tree(debug, filename, h_container, level, texts, ul)
C:\Users\hp\amiclimate\climate\ipcc.py:1395: in add_container_infp_to_tree
self.analyse_containers(h_container, level + 1, ul1, filename=filename)


self = <climate.ipcc.IPCCGatsby object at 0x00000173C01C4B30>, container = <Element div at 0x173c12e5270>, level = 4, ul = <Element ul at 0x173c34b8280>
filename = 'html_with_ids.html', debug = False

def analyse_containers(self, container, level, ul, filename=None, debug=False):
    """Part of ToC making"""
  container_xpath = f".//div[contains(@class,'{self.container_levels[level]}')]"

E IndexError: list index out of range

C:\Users\hp\amiclimate\climate\ipcc.py:1372: IndexError
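
Comment on the IndexError above: the traceback shows analyse_containers recursing to level = 4 and then indexing self.container_levels[level], which suggests the HTML nests deeper than the list of level names. One possible guard is sketched below. This is only an illustration: the CONTAINER_LEVELS values, the walk_containers function and the dict-based tree are hypothetical stand-ins, not the code in climate/ipcc.py.

```python
# Hypothetical level names; the real list lives in climate/ipcc.py and may differ.
CONTAINER_LEVELS = ("h1-container", "h2-container", "h3-container", "h4-container")


def walk_containers(node, level=0, levels=CONTAINER_LEVELS):
    """Collect (level, title) pairs, stopping when nesting exceeds the known levels."""
    if level >= len(levels):
        # Deeper than any known container class: stop instead of raising IndexError.
        return []
    found = []
    for child in node.get("children", []):
        found.append((level, child["title"]))
        found.extend(walk_containers(child, level + 1, levels))
    return found


# Build a chain nested six deep, i.e. deeper than the four known levels.
tree = {"children": []}
cursor = tree
for depth in range(6):
    child = {"title": f"section-{depth}", "children": []}
    cursor["children"].append(child)
    cursor = child

result = walk_containers(tree)
```

With the guard, the two deepest sections are silently skipped rather than crashing the run. Whether skipping, logging, or extending the level list is the right behaviour is a decision for the maintainers.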
_________________________________________________________________ MiscTest.test_large_nested_dicts _________________________________________________________________

self = <test.test_misc.MiscTest testMethod=test_large_nested_dicts>

def test_large_nested_dicts(self):
    indir = Path(Resources.TEST_IPCC_DIR, "wg3", "Chapter08", "html")
    infile1 = Path(indir, "page_1.json")
    dict1 = MiscUtil.load_json_from_file(str(infile1))
    dict1_copy = MiscUtil.load_json_from_file(str(infile1))
    assert dict1 == ApproxNestedMapping.nested_approx(dict1_copy)

    infile1_approx = Path(indir, "page_1approx.json")
    dict1_approx = MiscUtil.load_json_from_file(str(infile1_approx))
  assert dict1_approx == ApproxNestedMapping.nested_approx(dict1)

E AssertionError: assert {'annots': []..., 792.0], ...} == approx({'page...'annots': []})
E
E (pytest_assertion plugin: representation of details failed: C:\Users\hp\AppData\Roaming\Python\Python312\site-packages\_pytest\python_api.py:264: TypeError: unsupported operand type(s) for -: 'list' and 'list'.
E Probably an object has a faulty repr.)

C:\Users\hp\amiclimate\test\test_misc.py:191: AssertionError
===================================================================== short test summary info ======================================================================
FAILED test/test_ipcc.py::TestIPCC::test_add_ipcc_hyperlinks - assert 1163 == 206
FAILED test/test_ipcc.py::TestIPCC::test_cmdline_download_wg_reports - AssertionError: file C:\Users\hp\amiclimate\temp\debug\wg1\chapter-1\gatsby_raw.html size = 252498 must be above 300000
FAILED test/test_ipcc.py::TestIPCC::test_ipcc_syr_lr_toc - AssertionError: assert 'Chapter 3: M...ng-term goals' == 'SYR Longer Report'
FAILED test/test_ipcc.py::TestIPCC::test_ipcc_syr_lr_toc_full - IndexError: list index out of range
FAILED test/test_misc.py::MiscTest::test_large_nested_dicts - AssertionError: assert {'annots': []..., 792.0], ...} == approx({'page...'annots': []})
================================================= 5 failed, 57 passed, 21 skipped, 1 xfailed in 403.33s (0:06:43) ==================================================
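
Comment on test_large_nested_dicts: the TypeError (unsupported operand type(s) for -: 'list' and 'list') indicates that the approximate comparison reached nested lists it could not subtract. A recursive comparison that only applies a tolerance at the numeric leaves avoids that. The sketch below uses only the standard library; nested_isclose is a hypothetical helper, not ApproxNestedMapping, and the sample dictionaries (including the "bbox" key) are made up for illustration.

```python
import math
from collections.abc import Mapping, Sequence


def nested_isclose(a, b, rel_tol=1e-6, abs_tol=1e-9):
    """Recursively compare nested dicts/lists, using a tolerance for numbers."""
    # bool is a subclass of int, so compare booleans exactly first.
    if isinstance(a, bool) or isinstance(b, bool):
        return a == b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
    if isinstance(a, Mapping) and isinstance(b, Mapping):
        return a.keys() == b.keys() and all(
            nested_isclose(a[k], b[k], rel_tol, abs_tol) for k in a
        )
    # Strings are sequences too, so handle them before the generic sequence case.
    if isinstance(a, str) or isinstance(b, str):
        return a == b
    if isinstance(a, Sequence) and isinstance(b, Sequence):
        return len(a) == len(b) and all(
            nested_isclose(x, y, rel_tol, abs_tol) for x, y in zip(a, b)
        )
    return a == b


# Made-up sample data, loosely shaped like a page dictionary.
page = {"annots": [], "bbox": [0.0, 0.0, 612.0, 792.0]}
nearly = {"annots": [], "bbox": [0.0, 0.0, 612.0000001, 792.0]}
different = {"annots": [], "bbox": [0.0, 0.0, 612.0, 793.0]}
```

Here nested_isclose(page, nearly) is true and nested_isclose(page, different) is false. The tolerances are arbitrary defaults and would need tuning for the real JSON.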

ERROR: TestUNFCCC.test_explicit_conversion_pipeline_IMPORTANT_CORPUS

Python version: 3.10.11
windows 11; 64 bit
using cmd

______________________________________________ TestUNFCCC.test_explicit_conversion_pipeline_IMPORTANT_CORPUS ______________________________________________

self = <test.test_un.TestUNFCCC testMethod=test_explicit_conversion_pipeline_IMPORTANT_CORPUS>

@unittest.skipUnless(AmiAnyTest.run_long(), "run occasionally")
def test_explicit_conversion_pipeline_IMPORTANT_CORPUS(self):
    """reads a corpus of 12 sessions and generates split.html for each
    See test_explicit_conversion_pipeline_IMPORTANT_DEFINITIVE(self): which is run for each session document
    """
    sub_top = "unfcccdocuments1"
    in_dir = Path(UNFCCC_DIR, sub_top)
    top_out_dir = Path(UNFCCC_TEMP_DIR, sub_top)

    session_files = FileLib.posix_glob(str(in_dir) + "/*")
    session_dirs = [d for d in session_files if Path(d).is_dir()]
    print(f">session_dirs {session_dirs}")
    assert len(session_dirs) >= 1

    maxsession = 5  # otyherwise runs for ever
    for session_dir in session_dirs[:maxsession]:
      UNFCCC.run_pipeline_on_unfccc_session(
            in_dir,
            session_dir,
            top_out_dir=top_out_dir
        )

C:\Users\priya\amiclimate\test\test_un.py:1771:


cls = <class 'climate.un.UNFCCC'>, in_dir = WindowsPath('C:/Users/priya/amiclimate/test/resources/unfccc/unfcccdocuments1')
session_dir = PurePosixPath('C:\Users\priya\amiclimate\test\resources\unfccc\unfcccdocuments1\CP_20')
in_sub_dir = WindowsPath('C:/Users/priya/amiclimate/test/resources/unfccc/unfcccdocuments1/CP_20')
top_out_dir = WindowsPath('C:/Users/priya/amiclimate/temp/unfccc/unfcccdocuments1'), file_splitter = "span[@class='Decision']"
targets = ['decision', 'paris', 'wmo', 'temperature'], directory_maker = <class 'climate.un.UNFCCC'>
markup_dict = {'Decision': {'class': 'Decision', 'components': ['', ('Decision', '\d+'), '/', ('type', {'CP|CMA|CMP'}), '\.', ('se...'class': 'para', 'example': ['26. '], 'idgen': {'parent': 'Decision', 'separator': ['_', '__']}, 'level': 2, ...}, ...}
inline_dict = {'adaptation_fund': {'href_template': 'https://unfccc.int/Adaptation-Fund', 'regex': '([Tt]he )?Adaptation Fund'}, 'ar...regex': '([Tt]he )?Conference of the Parties'}, 'date': {'class': 'date', 'example': '2019', 'regex': '20\d\d'}, ...}
param_dict = {'box_as_line_height': 1, 'footer_height': 50, 'footnote_top_line_xrange': [50, 300], 'header_bottom_line_xrange': [20, 700], ...}
styles = ['span.temperature {border: purple solid 0.5px;}', '.chapter {border: blue solid 0.8px; font-weight: bold; background:...eeeeee; opacity: 0.7}', '.subsubpara {border: blue dashed 0.2px; margin: 2px; background: #dddddd; opacity: 0.3}', ...]

@classmethod
def run_pipeline_on_unfccc_session(
        cls,
        in_dir,
        session_dir,
        in_sub_dir=None,
        top_out_dir=None,
        file_splitter=None,
        targets=None,
        directory_maker=None,
        markup_dict=None,
        inline_dict=None,
        param_dict=None,
        styles=None
):
    """
    directory structure is messy
    """

    session = Path(session_dir).stem
    if in_sub_dir is None:
        in_sub_dir = Path(in_dir, session)
    pdf_list = FileLib.posix_glob(str(in_sub_dir) + "/*.pdf")
    print(f"pdfs in session {session} => {pdf_list}")
    if not pdf_list:
        print(f"****no PDFs in {in_sub_dir}")
    subsession_list = [Path(pdf).stem for pdf in pdf_list]
    print(f"subsession_list {subsession_list}")
    if not top_out_dir:
        print(f"must give top_out_dir")
        return
    out_sub_dir = Path(top_out_dir, session)
    skip_assert = True
    if not file_splitter:
        file_splitter = "span[@class='Decision']"  # TODO move to dictionary
    if not targets:
        targets = ["decision", "paris", "wmo", "temperature"]
    if not directory_maker:
        directory_maker = UNFCCC
    if not markup_dict:
        markup_dict = MARKUP_DICT
    if not inline_dict:
        inline_dict = INLINE_DICT
    if not param_dict:
        param_dict = UNFCCC_DICT
    if not styles:
        styles = STYLES
    for subsession in subsession_list:
      HtmlPipeline.stateless_pipeline(
            file_splitter=file_splitter, in_dir=in_dir, in_sub_dir=in_sub_dir, instem=subsession,
            out_sub_dir=out_sub_dir,
            top_out_dir=top_out_dir,
            page_json_dir=Path(top_out_dir, "json"),
            directory_maker=directory_maker,
            markup_dict=markup_dict,
            inline_dict=inline_dict,
            param_dict=param_dict,
            targets=targets,
            styles=styles,
            force_make_pdf=True)

E NameError: name 'HtmlPipeline' is not defined

C:\Users\priya\amiclimate\climate\un.py:621: NameError
------------------------------------------------------------------ Captured stdout call -------------------------------------------------------------------

session_dirs [PurePosixPath('C:\Users\priya\amiclimate\test\resources\unfccc\unfcccdocuments1\CP_20')]
pdfs in session CP_20 => [PurePosixPath('C:\Users\priya\amiclimate\test\resources\unfccc\unfcccdocuments1\CP_20\1_CP_20.pdf'), PurePosixPath('C:\Users\priya\amiclimate\test\resources\unfccc\unfcccdocuments1\CP_20\2_12_CP_20.pdf')]
subsession_list ['1_CP_20', '2_12_CP_20']

Strange a[@id] in Captions

The captions in SYR/LongerReport (and maybe elsewhere) have a large number of a[@id="_idIndexMarkerddddd"] with no obvious purpose.

<p class="Caption" lang="en-US">
<span class="CharOverride-3">Figure 2.1: The causal chain from <a id="_idIndexMarker11529">
</a> <a id="_idIndexMarker11530">
</a> emissions to resulting <a id="_idIndexMarker11531">
</a> <a id="_idIndexMarker11532">
</a> <a id="_idIndexMarker11533">
</a> warming of the <a id="_idIndexMarker11534">
</a> climate system. </span>Emissions of GHG have increased rapidly over recent decades <span class="CharOverride-3"> (panel (a))</span>. Global net<a id="_idIndexMarker11535">
</a> <a id="_idIndexMarker11536">
</a> anthropogenic GHG<a id="_idIndexMarker11537">
</a> <a id="_idIndexMarker11538">
</a> emissions include CO<span class="Subscript-body CharOverride-23">2</span> from fossil fuel combustion and industrial processes (CO<span class="CharOverride-25">2</span>-FFI) (dark green); net CO<span class="CharOverride-25">2</span> from land use, land-use change and forestry (CO <span class="Subscript-body _idGenCharOverride-1">2</span>-LULUCF) (green); CH<span class="Subscript-body CharOverride-23">4</span>; N<span class="Subscript-body CharOverride-23">2</span>O; and fluorinated gases (HFCs, PFCs, SF<span class="CharOverride-25">6</span>, NF<span class="CharOverride-25">3</span>) (light blue). These <a id="_idIndexMarker11539">
</a> <a id="_idIndexMarker11540">
</a> emissions have led to increases in the atmospheric concentrations of several GHGs including the three major well-mixed GHGs CO<span class="Subscript-body CharOverride-23">2</span>, CH<span class="Subscript-body CharOverride-23">4</span> and N<span class="Subscript-body CharOverride-23">2</span>O <span class="CharOverride-3"> (panel (b) </span> , annual<a id="_idIndexMarker11541">
</a> values). To indicate their relative importance each subpanel’s vertical extent for CO <span class="Subscript-body _idGenCharOverride-1">2</span>, CH<span class="Subscript-body _idGenCharOverride-1">4</span> and N<span class="Subscript-body _idGenCharOverride-1">2</span>O is scaled to match the assessed individual direct effect (and, in the case of CH<span class="Subscript-body CharOverride-23">4</span> indirect effect via atmospheric chemistry<a id="_idIndexMarker11542">
</a> <a id="_idIndexMarker11543">
</a> impacts on tropospheric ozone) of historical<a id="_idIndexMarker11544">
</a> <a id="_idIndexMarker11545">
</a> emissions on<a id="_idIndexMarker11546">
</a> <a id="_idIndexMarker11547">
</a> temperature change from 1850–1900 to 2010–2019. This estimate arises from an assessment of<a id="_idIndexMarker11548">
</a> <a id="_idIndexMarker11549">
</a> effective <a id="_idIndexMarker11550">
</a> <a id="_idIndexMarker11551">
</a> radiative forcing and <a id="_idIndexMarker11552">
</a> climate sensitivity. The global surface <a id="_idIndexMarker11553">
</a> <a id="_idIndexMarker11554">
</a> temperature (shown as annual anomalies from a 1850–1900<a id="_idIndexMarker11555">
</a> baseline) has increased by around 1.1°C since 1850–1900<span class="CharOverride-3"> (panel (c))</span>. The vertical bar on the right shows the estimated <a id="_idIndexMarker11556">
</a> <a id="_idIndexMarker11557">
</a> temperature <span class="CharOverride-4"> (very likely range) </span> during the warmest multi-century period in at least the last 100,000 years, which occurred around 6500 years ago during the current interglacial period (Holocene). Prior to that, the next most recent warm period was about 125,000 years ago, when the assessed multi-century<a id="_idIndexMarker11558">
</a> <a id="_idIndexMarker11559">
</a> temperature range [0.5°C to 1.5°C] overlaps the observations of the most recent decade. These past warm periods were caused by slow (multi-millennial) orbital variations. Formal detection and <a id="_idIndexMarker11560">
</a> attribution studies synthesise information from<a id="_idIndexMarker11561">
</a> climate models and observations and show that the best estimate is that all the <a id="_idIndexMarker11562">
</a> warming observed between 1850–1900 and 2010–2019 is caused by humans <span class="CharOverride-3"> (panel (d))</span>. The panel shows<a id="_idIndexMarker11563">
</a> <a id="_idIndexMarker11564">
</a> temperature change attributed to: total human influence; its decomposition into changes in GHG concentrations and other human drivers (aerosols, ozone and land-use change (land-use reflectance)); solar and volcanic drivers; and internal<a id="_idIndexMarker11565">
</a> climate variability. Whiskers show <span class="CharOverride-4">likely </span> ranges. <span class="CharOverride-4">{<span class="refs">
<span class="CharOverride-31">WGI SPM A.2.2, WGI Figure SPM.1, WGI Figure SPM.2, WGI TS2.2, WGI 2.1; WGIII Figure SPM.1, WGIII A.III.II.2.5.1</span>
</span>}</span>
</p>

I think they can be deleted.
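
If the markers are indeed unused, one way to drop them is a regular expression over the serialized HTML. This is a minimal sketch using only the standard library; strip_index_markers is a hypothetical helper, and it assumes every marker looks like the ones in the caption above (an a element with a single _idIndexMarker id and no content other than whitespace).

```python
import re

# An <a> whose only attribute is an _idIndexMarker id and whose content is
# empty or whitespace. Surrounding spaces are left alone so words stay separated.
INDEX_MARKER = re.compile(r'<a id="_idIndexMarker\d+">\s*</a>')


def strip_index_markers(html: str) -> str:
    """Remove empty index-marker anchors from an HTML string."""
    return INDEX_MARKER.sub("", html)


sample = (
    'Figure 2.1: The causal chain from <a id="_idIndexMarker11529">\n'
    '</a> <a id="_idIndexMarker11530">\n'
    "</a> emissions to resulting warming. Global net"
    '<a id="_idIndexMarker11535">\n</a> anthropogenic GHG emissions.'
)
cleaned = strip_index_markers(sample)
```

The leftover runs of spaces are harmless because HTML collapses whitespace when rendering. In lxml-based code, an XPath such as //a[starts-with(@id, '_idIndexMarker')] would be the more robust selector, provided each element's tail text is preserved on removal. It is also worth checking first that nothing links to these ids.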

Error in running pytest

git pull and git checkout test1_pmr ran successfully, but running pytest has produced the following error for the past 2 days:

C:\Users\asus\Desktop\Semantic\amiclimate> pytest
======================================================= test session starts ========================================================
platform win32 -- Python 3.12.3, pytest-8.2.0, pluggy-1.5.0
rootdir: C:\Users\asus\Desktop\Semantic\amiclimate
collected 0 items / 1 error

============================================================== ERRORS ==============================================================
_________________________________________________ ERROR collecting test/test_un.py _________________________________________________
ImportError while importing test module 'C:\Users\asus\Desktop\Semantic\amiclimate\test\test_un.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
..\..\Internship\Lib\importlib\__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
test\test_un.py:24: in <module>
    from climate.amix import AMIClimate, REPO_DIR
climate\amix.py:21: in <module>
    from climate.ipcc import IPCCArgs
climate\ipcc.py:14: in <module>
    from amilib.util import AbstractArgs, Util
E ImportError: cannot import name 'AbstractArgs' from 'amilib.util' (C:\Users\asus\Desktop\Internship\Lib\site-packages\amilib\util.py)
===================================================== short test summary info ======================================================
ERROR test/test_un.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
========================================================= 1 error in 7.83s =========================================================
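
An ImportError like this often means the installed amilib does not match what the branch expects, although that is a guess from the traceback alone. A quick diagnostic is to print the installed version and check whether the module exposes the name. The sketch below uses only the standard library; describe_import is a hypothetical helper and is demonstrated on the json module so that it runs anywhere.

```python
import importlib
from importlib import metadata


def describe_import(package, module, name):
    """Report the installed version of `package` and whether `module` exposes `name`."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # Not installed as a distribution (or a stdlib module, as in the demo below).
        version = None
    try:
        mod = importlib.import_module(module)
    except ImportError:
        return {"version": version, "module_found": False, "has_name": False}
    return {"version": version, "module_found": True, "has_name": hasattr(mod, name)}


# For this issue the call would be:
#   describe_import("amilib", "amilib.util", "AbstractArgs")
report = describe_import("json", "json", "dumps")
missing = describe_import("no_such_pkg_xyz", "no_such_pkg_xyz", "anything")
```

If has_name comes back False for amilib.util, comparing the reported version with the one the branch was developed against would be the next step.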

[AssertionError] amiclimate pytest fail

Branch test1_pmr - 6 tests failed

System: Windows 11, Python 3.12.3, amilib 0.1.4

Note: the test session took around 13 minutes to complete, which leaves room for improvement.


C:\Users\User\Desktop\sciCli\amiclimate>pytest
=================================================================== test session starts ===================================================================
platform win32 -- Python 3.12.3, pytest-8.2.1, pluggy-1.5.0
rootdir: C:\Users\User\Desktop\sciCli\amiclimate
collected 74 items

test\test_un.py ....s
DevTools listening on ws://127.0.0.1:52700/devtools/browser/530bc3ab-8e9e-47b2-934a-c3abeef9a0bd

DevTools listening on ws://127.0.0.1:52728/devtools/browser/07863b7e-36ea-4d91-8133-b214d06c622d

DevTools listening on ws://127.0.0.1:52751/devtools/browser/e4f7c27c-fe42-4f4c-93ae-c287db50397b

DevTools listening on ws://127.0.0.1:52772/devtools/browser/4a8f4f78-fc19-4dfb-ab65-47764787edf9
......sss..
DevTools listening on ws://127.0.0.1:52801/devtools/browser/b82f4e76-d6da-4903-8605-46cfc4a40838

DevTools listening on ws://127.0.0.1:52820/devtools/browser/430b7e76-e0ae-4bd4-91b7-a9e5fb7e9316

DevTools listening on ws://127.0.0.1:52841/devtools/browser/95baafb9-9c1b-4d7f-870f-6e813f77065a

DevTools listening on ws://127.0.0.1:52864/devtools/browser/88888071-a617-44eb-880a-515913ab09bb

DevTools listening on ws://127.0.0.1:52887/devtools/browser/9dab2476-813a-4b73-bd9e-81ddf73e4596

DevTools listening on ws://127.0.0.1:52906/devtools/browser/3535dee2-c75f-4d2b-95d4-ea2539528aea

DevTools listening on ws://127.0.0.1:52926/devtools/browser/f95a31f7-44f5-43da-bb97-2fe7dc037a0c

DevTools listening on ws://127.0.0.1:52945/devtools/browser/c4982276-5204-403a-8bdb-6dc2f51fd652

DevTools listening on ws://127.0.0.1:52965/devtools/browser/3aa7d266-93c9-4aea-83cb-531375108343
.
DevTools listening on ws://127.0.0.1:52984/devtools/browser/16420088-7498-41f8-a7a1-9ad293e7e586
F............FF................FF..ss.s....s.Fsssssssss..                                                           [100%]

======================================================================== FAILURES =========================================================================
______________________________________________ TestIPCC.test_download_wg_chapter_spm_ts_using_dict_IMPORTANT ______________________________________________

self = <test.test_un.TestIPCC testMethod=test_download_wg_chapter_spm_ts_using_dict_IMPORTANT>

    def test_download_wg_chapter_spm_ts_using_dict_IMPORTANT(self):
        """downlaods all parts of WG reports
        writes:
        gatsby_raw.html
        gatsby_raw.html
        de_gatsby.html
        para_list

        """
        reports = [
            IP_WG1,
            # IP_WG2,
            # IP_WG3,
        ]
        chapters = [
            # SPM,
            # TS,
            "chapter-1",
            # "chapter-2",
            # "chapter-3",
            # "chapter-4",
            # "chapter-5",
            # "chapter-6",
            # "chapter-7",
            # "chapter-8",
            # "chapter-9",
            # "chapter-10",
            # "chapter-11",
            # "chapter-12",
            # "chapter-13",
            # "chapter-14",
            # "chapter-15",
            # "chapter-16",
            # "chapter-17",
            # "chapter-18",
            # "chapter-19",
        ]
        # ipcc_dict = IPCC_DICT.get_ipcc_dict()
        # ar6_url = ipcc_dict.get()
        web_publisher = IPCCGatsby()
        for report in reports:
            wg_url = f"{AR6_URL}{report}/"
            print(f"report: {report}")
            for chap in chapters:
                print(f"chapter: {chap}")
                outdir = Path(TEMP_DIR, report, chap)
                IPCC.download_save_chapter(report, chap, wg_url, outdir=TEMP_DIR, sleep=1)
                raw_outfile = Path(outdir, f"{GATSBY_RAW}.html")
                FileLib.assert_exist_size(raw_outfile, minsize=20000, abort=False)

                gatsby_file = Path(outdir, f"{GATSBY_RAW}.html")
                html_elem = web_publisher.remove_unnecessary_markup(gatsby_file)
                assert html_elem is not None, f"{gatsby_file} should not give None html"
                body = HtmlLib.get_body(html_elem)
>               elems = body.xpath(".//*")
E               AttributeError: 'NoneType' object has no attribute 'xpath'

test\test_un.py:651: AttributeError
------------------------------------------------------------------ Captured stdout call -------------------------------------------------------------------
report: wg1
chapter: chapter-1
web publisher assumed to be <class 'climate.ipcc.IPCCGatsby'>
Fetching page source from URL: https://www.ipcc.ch/report/ar6/wg1/chapter/chapter-1
no xpath_list specified
no output html
elements in lxml_root: 40
writing C:\Users\User\Desktop\sciCli\amiclimate\temp\wg1\chapter-1\gatsby_raw.html
Quitting the driver...
DONE
//div[contains(@class, 'col-12')] removes 0 elems
//div[@data-gatsby-image-wrapper]/div[@aria-hidden='true'] removes 0 elems
______________________________________________________________ TestIPCC.test_ipcc_syr_lr_toc ______________________________________________________________

self = <test.test_un.TestIPCC testMethod=test_ipcc_syr_lr_toc>

    def test_ipcc_syr_lr_toc(self):
        """analyses contents for IPCC syr longer report
        """
        """
            <!-- TOC (from UNFCCC)-->
            <div class="toc">

                <div>
                    <span>Decision</span><span>Page</span></a>
                </div>

                <nav role="doc-toc">
                    <ul>
                        <li>
                            <a href="../Decision_1_CMA_3/split.html"><span class="descres-code">1/CMA.3</span><span
                                    class="descres-title">Glasgow Climate Pact</span></a>
                        </li>
                       ...
                    </ul>
                </nav>
            </div>
        """
        report = 'longer-report'
        syr_lr_content = Path(Resources.TEST_RESOURCES_DIR, IPCC_DIR, CLEANED_CONTENT, SYR,
                              SYR_LR, HTML_WITH_IDS_HTML)
        assert syr_lr_content.exists()
        lr_html = ET.parse(str(syr_lr_content), HTMLParser())
        assert lr_html is not None
        body = HtmlLib.get_body(lr_html)
        header_h1 = body.xpath("div//h1")[0]
        assert header_h1 is not None
        header_h1_text = header_h1.text
        toc_title = "SYR Longer Report"
>       assert header_h1_text == toc_title
E       AssertionError: assert 'Chapter 3: M...ng-term goals' == 'SYR Longer Report'
E
E         - SYR Longer Report
E         + Chapter 3: Mitigation pathways compatible with long-term goals

C:\Users\User\Desktop\sciCli\amiclimate\test\test_un.py:1224: AssertionError
___________________________________________________________ TestIPCC.test_ipcc_syr_lr_toc_full ____________________________________________________________

self = <test.test_un.TestIPCC testMethod=test_ipcc_syr_lr_toc_full>

    def test_ipcc_syr_lr_toc_full(self):
        """creates toc recursively for IPCC syr longer report
        """
        filename = HTML_WITH_IDS_HTML
        syr_lr_content = Path(Resources.TEST_RESOURCES_DIR, IPCC_DIR, CLEANED_CONTENT, SYR,
                              SYR_LR, filename)
        lr_html = ET.parse(str(syr_lr_content), HTMLParser())
        body = HtmlLib.get_body(lr_html)
        publisher = IPCCGatsby()
        toc_html, ul = publisher.make_header_and_nav_ul(body)
        level = 0
>       publisher.analyse_containers(body, level, ul, filename=filename)

C:\Users\User\Desktop\sciCli\amiclimate\test\test_un.py:1262:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
C:\Users\User\Desktop\sciCli\amiclimate\climate\ipcc.py:1347: in analyse_containers
    self.add_container_infp_to_tree(debug, filename, h_container, level, texts, ul)
C:\Users\User\Desktop\sciCli\amiclimate\climate\ipcc.py:1365: in add_container_infp_to_tree
    self.analyse_containers(h_container, level + 1, ul1, filename=filename)
C:\Users\User\Desktop\sciCli\amiclimate\climate\ipcc.py:1347: in analyse_containers
    self.add_container_infp_to_tree(debug, filename, h_container, level, texts, ul)
C:\Users\User\Desktop\sciCli\amiclimate\climate\ipcc.py:1365: in add_container_infp_to_tree
    self.analyse_containers(h_container, level + 1, ul1, filename=filename)
C:\Users\User\Desktop\sciCli\amiclimate\climate\ipcc.py:1347: in analyse_containers
    self.add_container_infp_to_tree(debug, filename, h_container, level, texts, ul)
C:\Users\User\Desktop\sciCli\amiclimate\climate\ipcc.py:1365: in add_container_infp_to_tree
    self.analyse_containers(h_container, level + 1, ul1, filename=filename)
C:\Users\User\Desktop\sciCli\amiclimate\climate\ipcc.py:1347: in analyse_containers
    self.add_container_infp_to_tree(debug, filename, h_container, level, texts, ul)
C:\Users\User\Desktop\sciCli\amiclimate\climate\ipcc.py:1365: in add_container_infp_to_tree
    self.analyse_containers(h_container, level + 1, ul1, filename=filename)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <climate.ipcc.IPCCGatsby object at 0x0000018EC5678BF0>, container = <Element div at 0x18ece457110>, level = 4, ul = <Element ul at 0x18ef1aaa3c0>
filename = 'html_with_ids.html', debug = False

    def analyse_containers(self, container, level, ul, filename=None, debug=False):
        """Part of ToC making"""
>       container_xpath = f".//div[contains(@class,'{self.container_levels[level]}')]"
E       IndexError: list index out of range

C:\Users\User\Desktop\sciCli\amiclimate\climate\ipcc.py:1342: IndexError
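
Comment on the IndexError above: the traceback shows the recursion reaching level = 4, while levels 0 to 3 got past the same line, so self.container_levels appears to hold four entries. A minimal sketch of a bounds check follows. The level names and the helper container_xpath_for_level are hypothetical, since the real list in climate/ipcc.py is not shown in the log.

```python
# Hypothetical level names; the actual container_levels values are not in the traceback.
CONTAINER_LEVELS = ["h1-container", "h2-container", "h3-container", "h4-container"]


def container_xpath_for_level(level, levels=CONTAINER_LEVELS):
    """Return the XPath for a nesting level, or None when the level is deeper than the list."""
    if level < 0 or level >= len(levels):
        # Deeper than any known container class: the caller should stop recursing.
        return None
    return f".//div[contains(@class,'{levels[level]}')]"


# A caller such as analyse_containers could return early when None comes back.
for level in range(6):
    print(level, container_xpath_for_level(level))
```

Whether stopping silently at the deepest known level is the right behaviour for the ToC, or whether the list should simply gain more levels, is a design decision for the maintainers.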
_______________________________________________________________ TestIPCC.test_symbol_indir ________________________________________________________________

self = <test.test_un.TestIPCC testMethod=test_symbol_indir>

    def test_symbol_indir(self):

        infile = "**/html_with_ids.html"
        outdir = f"{Path(Resources.TEMP_DIR, 'queries')}"
        output = f"{Path(outdir, 'methane_norefs2')}.html"
        query = "methane"

        AMIClimate().run_command(
            ['IPCC', '--indir', "_IPCC_REPORTS", '--input', "_HTML_IDS", '--query', "methane", '--outdir', "_QUERY_OUT",
             "--output", output, '--xpath',
             "_NOREFS"])
>       self.check_output_tree(output, expected=[60,300], xpath=".//a[@href]")

C:\Users\User\Desktop\sciCli\amiclimate\test\test_un.py:1053:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <test.test_un.TestIPCC testMethod=test_symbol_indir>, output = 'C:\\Users\\User\\Desktop\\sciCli\\amiclimate\\temp\\queries\\methane_norefs2.html'
expected = [60, 300], xpath = './/a[@href]'

    def check_output_tree(self, output, expected=None, xpath=None):
        assert xpath, f"must give xpath"
        assert output, f"output cannot be None"
        html_tree = ET.parse(output)
        assert html_tree is not None, f"html_tree is None"
        if expected:
            pp = len(html_tree.xpath(xpath))
            if type(expected) is list and len(expected) ==  2:
>               assert expected[0] <= pp <= expected[1], f"found {pp} elements in {output}, expected {expected}"
E               AssertionError: found 57 elements in C:\Users\User\Desktop\sciCli\amiclimate\temp\queries\methane_norefs2.html, expected [60, 300]
E               assert 60 <= 57

C:\Users\User\Desktop\sciCli\amiclimate\test\test_un.py:1317: AssertionError
------------------------------------------------------------------ Captured stdout call -------------------------------------------------------------------
command: ['IPCC', '--indir', '_IPCC_REPORTS', '--input', '_HTML_IDS', '--query', 'methane', '--outdir', '_QUERY_OUT', '--output', 'C:\\Users\\User\\Desktop\\sciCli\\amiclimate\\temp\\queries\\methane_norefs2.html', '--xpath', '_NOREFS']
running parse_and_process1 in util?
arg_dict: {'version': False, 'command': 'IPCC', 'debug': False, 'input': '_HTML_IDS', 'indir': '_IPCC_REPORTS', 'output': 'C:\\Users\\User\\Desktop\\sciCli\\amiclimate\\temp\\queries\\methane_norefs2.html', 'outdir': '_QUERY_OUT', 'informat': 'PDF', 'query': 'methane', 'xpath': '_NOREFS'}
home C:\Users\User
outdir C:\Users\User\Desktop\sciCli\amiclimate\temp\queries output C:\Users\User\Desktop\sciCli\amiclimate\temp\queries\methane_norefs2.html
empty list from C:/Users/User/Desktop/sciCli/amiclimate/temp/queries/C:\Users\User\Desktop\sciCli\amiclimate\temp\queries\methane_norefs2.html
inputs: 33 > [PurePosixPath('C:/Users/User/Desktop/sciCli/amiclimate/test/resources/ipcc/cleaned_content\\sr15\\Chapter01\\html_with_ids.html'), PurePosixPath('C:/Users/User/Desktop/sciCli/amiclimate/test/resources/ipcc/cleaned_content\\sr15\\Chapter02\\html_with_ids.html'), PurePosixPath('C:/Users/User/Desktop/sciCli/amiclimate/test/resources/ipcc/cleaned_content\\sr15\\Chapter03\\html_with_ids.html')]...
debug: True
report: None
chapter: None
outdir: C:\Users\User\Desktop\sciCli\amiclimate\temp\queries
output: C:\Users\User\Desktop\sciCli\amiclimate\temp\queries\methane_norefs2.html
kwargs: {}
query: ['methane']
xpath: //p[@id and not(ancestor::*[@id='references'])]
wrote: C:\Users\User\Desktop\sciCli\amiclimate\temp\queries\methane_norefs2.html
argstr: None
-------------------------------------------------------------------- Captured log call --------------------------------------------------------------------
WARNING  amilib.html_args:html_args.py:24 creating HTML Args
WARNING  amilib.pdf_args:pdf_args.py:81 creating PDFArgs
WARNING  root:ami_args.py:120 ********** args for parse_and_process1 {'version': False, 'command': 'IPCC', 'debug': [False], 'input': ['_HTML_IDS'], 'indir': ['_IPCC_REPORTS'], 'output': ['C:\\Users\\User\\Desktop\\sciCli\\amiclimate\\temp\\queries\\methane_norefs2.html'], 'outdir': ['_QUERY_OUT'], 'kwords': None, 'chapter': None, 'informat': 'PDF', 'operation': None, 'query': ['methane'], 'report': None, 'xpath': ['_NOREFS']}
______________________________________________________________ TestIPCC.test_symbolic_xpaths ______________________________________________________________

self = <test.test_un.TestIPCC testMethod=test_symbolic_xpaths>

    def test_symbolic_xpaths(self):

        infile = str(
            Path(Resources.TEST_RESOURCES_DIR, 'ipcc', 'cleaned_content', 'wg1', 'Chapter02', 'html_with_ids.html'))
        outdir = f"{Path(Resources.TEMP_DIR, 'queries')}"
        query = "methane"

        output = f"{Path(outdir, 'methane_refs1')}.html"
        AMIClimate().run_command(
            ['IPCC', '--input', infile, '--query', query, '--output', output, '--xpath', "_REFS"])
>       self.check_output_tree(output, expected=7, xpath=".//a[@href]")

C:\Users\User\Desktop\sciCli\amiclimate\test\test_un.py:1034:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <test.test_un.TestIPCC testMethod=test_symbolic_xpaths>, output = 'C:\\Users\\User\\Desktop\\sciCli\\amiclimate\\temp\\queries\\methane_refs1.html'
expected = 7, xpath = './/a[@href]'

    def check_output_tree(self, output, expected=None, xpath=None):
        assert xpath, f"must give xpath"
        assert output, f"output cannot be None"
        html_tree = ET.parse(output)
        assert html_tree is not None, f"html_tree is None"
        if expected:
            pp = len(html_tree.xpath(xpath))
            if type(expected) is list and len(expected) ==  2:
                assert expected[0] <= pp <= expected[1], f"found {pp} elements in {output}, expected {expected}"
            else:
>               assert pp == expected, f"found {pp} elements in {output}"
E               AssertionError: found 10 elements in C:\Users\User\Desktop\sciCli\amiclimate\temp\queries\methane_refs1.html
E               assert 10 == 7

C:\Users\User\Desktop\sciCli\amiclimate\test\test_un.py:1319: AssertionError
------------------------------------------------------------------ Captured stdout call -------------------------------------------------------------------
command: ['IPCC', '--input', 'C:\\Users\\User\\Desktop\\sciCli\\amiclimate\\test\\resources\\ipcc\\cleaned_content\\wg1\\Chapter02\\html_with_ids.html', '--query', 'methane', '--output', 'C:\\Users\\User\\Desktop\\sciCli\\amiclimate\\temp\\queries\\methane_refs1.html', '--xpath', '_REFS']
running parse_and_process1 in util?
arg_dict: {'version': False, 'command': 'IPCC', 'debug': False, 'input': 'C:\\Users\\User\\Desktop\\sciCli\\amiclimate\\test\\resources\\ipcc\\cleaned_content\\wg1\\Chapter02\\html_with_ids.html', 'output': 'C:\\Users\\User\\Desktop\\sciCli\\amiclimate\\temp\\queries\\methane_refs1.html', 'informat': 'PDF', 'query': 'methane', 'xpath': '_REFS'}
home C:\Users\User
outdir None output C:\Users\User\Desktop\sciCli\amiclimate\temp\queries\methane_refs1.html
inputs: 1 > [PurePosixPath('C:\\Users\\User\\Desktop\\sciCli\\amiclimate\\test\\resources\\ipcc\\cleaned_content\\wg1\\Chapter02\\html_with_ids.html')]...
debug: True
report: None
chapter: None
outdir: None
output: C:\Users\User\Desktop\sciCli\amiclimate\temp\queries\methane_refs1.html
kwargs: {}
query: ['methane']
xpath: //p[@id and ancestor::*[@id='references']]
wrote: C:\Users\User\Desktop\sciCli\amiclimate\temp\queries\methane_refs1.html
argstr: None
-------------------------------------------------------------------- Captured log call --------------------------------------------------------------------
WARNING  amilib.html_args:html_args.py:24 creating HTML Args
WARNING  amilib.pdf_args:pdf_args.py:81 creating PDFArgs
WARNING  root:ami_args.py:120 ********** args for parse_and_process1 {'version': False, 'command': 'IPCC', 'debug': [False], 'input': ['C:\\Users\\User\\Desktop\\sciCli\\amiclimate\\test\\resources\\ipcc\\cleaned_content\\wg1\\Chapter02\\html_with_ids.html'], 'indir': None, 'output': ['C:\\Users\\User\\Desktop\\sciCli\\amiclimate\\temp\\queries\\methane_refs1.html'], 'outdir': None, 'kwords': None, 'chapter': None, 'informat': 'PDF', 'operation': None, 'query': ['methane'], 'report': None, 'xpath': ['_REFS']}
____________________________________________________________ TestUNFCCC.test_make_nested_divs _____________________________________________________________

self = <test.test_un.TestUNFCCC testMethod=test_make_nested_divs>

    def test_make_nested_divs(self):
        """IMPORTANT not finished"""
        """initial div files are 'flat' - all divs are siblings, Use parents in markup_dict to assemble
        """
        input_dir = Path(UNFCCC_DIR, "unfcccdocuments1", "CMA_3")
        infile = Path(input_dir, "1_4_CMA_3_section", f"normalized.sections.html")
>       assert str(infile).endswith(
            "test/resources/unfccc/unfcccdocuments1/CMA_3/1_4_CMA_3_section/normalized.sections.html")
E       AssertionError: assert False
E        +  where False = <built-in method endswith of str object at 0x0000018E85109E70>('test/resources/unfccc/unfcccdocuments1/CMA_3/1_4_CMA_3_section/normalized.sections.html')
E        +    where <built-in method endswith of str object at 0x0000018E85109E70> = 'C:\\Users\\User\\Desktop\\sciCli\\amiclimate\\test\\resources\\unfccc\\unfcccdocuments1\\CMA_3\\1_4_CMA_3_section\\normalized.sections.html'.endswith
E        +      where 'C:\\Users\\User\\Desktop\\sciCli\\amiclimate\\test\\resources\\unfccc\\unfcccdocuments1\\CMA_3\\1_4_CMA_3_section\\normalized.sections.html' = str(WindowsPath('C:/Users/User/Desktop/sciCli/amiclimate/test/resources/unfccc/unfcccdocuments1/CMA_3/1_4_CMA_3_section/normalized.sections.html'))

C:\Users\User\Desktop\sciCli\amiclimate\test\test_un.py:1593: AssertionError
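
Comment on the test_make_nested_divs failure above: on Windows, str(Path) uses backslashes, so a suffix written with forward slashes never matches. A minimal sketch of a portable comparison using Path.as_posix() follows. The helper name path_endswith is an assumption for illustration, and PureWindowsPath is used only so the example behaves the same on any operating system.

```python
from pathlib import PureWindowsPath


def path_endswith(path, posix_suffix):
    """Compare the tail of a path using forward slashes, whatever the platform separator is."""
    return path.as_posix().endswith(posix_suffix)


# Path and suffix copied from the failing assertion in the log.
infile = PureWindowsPath(
    r"C:\Users\User\Desktop\sciCli\amiclimate\test\resources\unfccc"
    r"\unfcccdocuments1\CMA_3\1_4_CMA_3_section\normalized.sections.html"
)
suffix = "test/resources/unfccc/unfcccdocuments1/CMA_3/1_4_CMA_3_section/normalized.sections.html"

print(str(infile).endswith(suffix))   # the check as written in the test
print(path_endswith(infile, suffix))  # the same check on the POSIX form
```

In the test itself the same idea would be to call infile.as_posix() instead of str(infile) before endswith, assuming infile stays a pathlib Path.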
================================================================= short test summary info =================================================================
FAILED test/test_un.py::TestIPCC::test_download_wg_chapter_spm_ts_using_dict_IMPORTANT - AttributeError: 'NoneType' object has no attribute 'xpath'
FAILED test/test_un.py::TestIPCC::test_ipcc_syr_lr_toc - AssertionError: assert 'Chapter 3: M...ng-term goals' == 'SYR Longer Report'
FAILED test/test_un.py::TestIPCC::test_ipcc_syr_lr_toc_full - IndexError: list index out of range
FAILED test/test_un.py::TestIPCC::test_symbol_indir - AssertionError: found 57 elements in C:\Users\User\Desktop\sciCli\amiclimate\temp\queries\methane_norefs2.html, expected [60, 300]
FAILED test/test_un.py::TestIPCC::test_symbolic_xpaths - AssertionError: found 10 elements in C:\Users\User\Desktop\sciCli\amiclimate\temp\queries\methane_refs1.html
FAILED test/test_un.py::TestUNFCCC::test_make_nested_divs - AssertionError: assert False
================================================== 6 failed, 51 passed, 17 skipped in 828.86s (0:13:48) ===================================================
