Following an upgrade to Pytest 8, we are seeing a change in the way duplicate items ar


Issue with duplicates handling in Pytest 8

nonatomiclabs commented on May 26, 2024

Issue with duplicates handling in Pytest 8


Comments (4)

bluetech commented on May 26, 2024

Thanks for the bug report @nonatomiclabs, I agree the new behavior is buggy. I will take a look.

The duplicate handling is a bit of a headache and as you said it was quite broken also before #11646.


bluetech commented on May 26, 2024

I started looking at this, but it's surprisingly tricky to come up with a self-consistent, intuitive and reasonably backward-compatible logic for the duplicate handling, if you think about it. Though I'll keep trying :)


ericYuan17 commented on May 26, 2024

Hello! I've been trying to pick at this issue report for a few days now, and I think I understand the recursive "loop" that's called to navigate through directories. However, I'm having a bit of difficulty understanding how ihooks work. Is there a place I could read about this?


bluetech commented on May 26, 2024

First, some explanations of the details that are relevant to duplicate handling, then some thoughts on how it should work.

Collection arguments

The collection arguments are the inputs that pytest starts collecting from. These are usually the positional command line arguments but can also be testpaths and others - doesn't matter.

A collection arg has two parts - the path and (optionally) the parts within the file, i.e. the names that follow `::`, as in top1/test_1.py::test_3.
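To make the two parts concrete, here is a minimal sketch of splitting an arg into its path and its in-file parts. The function name `split_collection_arg` is hypothetical and is not pytest's own resolver; it also ignores edge cases such as `::` appearing inside parametrization brackets.

```python
def split_collection_arg(arg):
    """Split 'path::part1::part2' into (path, [part1, part2]).

    Hypothetical helper for illustration only; pytest's real argument
    resolution also validates the path and handles more edge cases.
    """
    path, *parts = arg.split("::")
    return path, parts


# A bare path has no in-file parts.
print(split_collection_arg("top1"))
# A path followed by the names to select within the file.
print(split_collection_arg("top1/test_1.py::TestIt::test_2"))
```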

All of the collection args are given to Session.collect() which collects them and yields the initial set of nodes.

`Session.collect()`

Suppose the collection arguments are a/aa/aaa.py, a/ab/. Then Session.collect() will produce two nodes (the order is important):

0: <Module a/aa/aaa.py>
1: <Dir a/ab>

But remember that nodes form a tree, so to get the full picture we need to look at the parents of the nodes:

<Dir a>
  <Dir a/aa>
    <Module a/aa/aaa.py> (0)
  <Dir a/ab> (1)

Note that the <Dir a> parent is the same for both collection args.
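The sharing of `<Dir a>` can be sketched with a toy model. The `Node` class and `initial_nodes` function below are assumptions made up for illustration, not pytest's classes or its actual caching logic; the only point is that a per-path cache makes both initial nodes hang off the same parent object.

```python
from pathlib import PurePosixPath


class Node:
    """Toy stand-in for a collector node; not pytest's real Node class."""

    def __init__(self, path, parent):
        self.path = path
        self.parent = parent


def initial_nodes(args):
    cache = {}  # path -> Node, so a shared ancestor is created only once

    def get_node(path):
        if path not in cache:
            # Stop at the top of a relative path, or at the filesystem root.
            is_top = path.parent in (PurePosixPath("."), path)
            parent = None if is_top else get_node(path.parent)
            cache[path] = Node(path, parent)
        return cache[path]

    # One initial node per arg, in the order the args were given.
    return [get_node(PurePosixPath(arg)) for arg in args]


module, directory = initial_nodes(["a/aa/aaa.py", "a/ab/"])
# Both initial nodes share the very same <Dir a> object.
assert module.parent.parent is directory.parent
```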

`genitems`

After Session.collect() takes the collection arguments and returns the initial nodes, the function genitems takes each node and recursively expands it, calling collect() on each collector node and yielding the item nodes (the leaves, i.e. the tests).

So it can look something like this:

genitems(<Module a/aa/aaa.py>)
  <Module a/aa/aaa.py>.collect() -> <Function test_it>, <Class TestCls>
  genitems(<Function test_it>)
    yield <Function test_it>
  genitems(<Class TestCls>)
    <Class TestCls>.collect() -> <Function test_meth1>, <Function test_meth2>
    genitems(<Function test_meth1>)
      yield <Function test_meth1>
    genitems(<Function test_meth2>)
      yield <Function test_meth2>
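The same recursion, written as a simplified sketch. `Item` and `Collector` here are toy classes defined only for this example; the real genitems also deals with hooks, collection reports and duplicate handling, all of which are left out.

```python
class Item:
    """Toy leaf node (a test)."""

    def __init__(self, name):
        self.name = name


class Collector:
    """Toy collector node whose collect() returns its direct children."""

    def __init__(self, name, children):
        self.name = name
        self._children = children

    def collect(self):
        return list(self._children)


def genitems(node):
    if isinstance(node, Item):
        # Leaves are yielded as-is.
        yield node
    else:
        # Collectors are expanded and each child is visited recursively.
        for child in node.collect():
            yield from genitems(child)


module = Collector(
    "a/aa/aaa.py",
    [
        Item("test_it"),
        Collector("TestCls", [Item("test_meth1"), Item("test_meth2")]),
    ],
)
print([item.name for item in genitems(module)])
```

The items come out depth-first, in the order collect() returns them, which is why the order of the initial nodes matters.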

`keep-duplicates` flag

Pytest has a --keep-duplicates flag (off by default). It is documented, but its behavior is mostly unspecified.

I think we can ignore whatever it does currently and make it mean what we want.

How should duplicates work?

I think first we need to decide on the semantics. I quickly wrote a test case with some scenarios; it is incomplete but can be a basis for discussion. It includes roughly the behavior that I think it should have, but it will definitely change with more consideration (and there are some more interesting cases I should add).

Test cases

from pytest import Pytester


def test_duplicate_handling(pytester: Pytester) -> None:
    pytester.makepyfile(
        **{
            "top1/__init__.py": "",
            "top1/test_1.py": (
                """
                def test_1(): pass

                class TestIt:
                    def test_2(self): pass

                def test_3(): pass
                """
            ),
            "top1/test_2.py": (
                """
                def test_1(): pass
                """
            ),
            "top2/__init__.py": "",
            "top2/test_1.py": (
                """
                def test_1(): pass
                """
            ),
        },
    )

    result = pytester.runpytest_inprocess("--collect-only", ".")
    result.stdout.fnmatch_lines(
        [
            "<Dir *>",
            "  <Package top1>",
            "    <Module test_1.py>",
            "      <Function test_1>",
            "      <Class TestIt>",
            "        <Function test_2>",
            "      <Function test_3>",
            "    <Module test_2.py>",
            "      <Function test_1>",
            "  <Package top2>",
            "    <Module test_1.py>",
            "      <Function test_1>",
            "",
        ],
        consecutive=True,
    )

    result = pytester.runpytest_inprocess("--collect-only", "top2", "top1")
    result.stdout.fnmatch_lines(
        [
            "<Dir *>",
            "  <Package top2>",
            "    <Module test_1.py>",
            "      <Function test_1>",
            "  <Package top1>",
            "    <Module test_1.py>",
            "      <Function test_1>",
            "      <Class TestIt>",
            "        <Function test_2>",
            "      <Function test_3>",
            "    <Module test_2.py>",
            "      <Function test_1>",
            "",
        ],
        consecutive=True,
    )

    result = pytester.runpytest_inprocess("--collect-only", "top1", "top1/test_2.py")
    result.stdout.fnmatch_lines(
        [
            "<Dir *>",
            "  <Package top1>",
            "    <Module test_1.py>",
            "      <Function test_1>",
            "      <Class TestIt>",
            "        <Function test_2>",
            "      <Function test_3>",
            "    <Module test_2.py>",
            "      <Function test_1>",
            "    <Module test_2.py>",
            "      <Function test_1>",
            "",
        ],
        consecutive=True,
    )

    result = pytester.runpytest_inprocess("--collect-only", "top1/test_2.py", "top1")
    result.stdout.fnmatch_lines(
        [
            "<Dir *>",
            "  <Package top1>",
            "    <Module test_2.py>",
            "      <Function test_1>",
            "    <Module test_1.py>",
            "      <Function test_1>",
            "      <Class TestIt>",
            "        <Function test_2>",
            "      <Function test_3>",
            "",
        ],
        consecutive=True,
    )

    result = pytester.runpytest_inprocess(
        "--collect-only", "--keep-duplicates", "top1/test_2.py", "top1"
    )
    result.stdout.fnmatch_lines(
        [
            "<Dir *>",
            "  <Package top1>",
            "    <Module test_2.py>",
            "      <Function test_1>",
            "    <Module test_1.py>",
            "      <Function test_1>",
            "      <Class TestIt>",
            "        <Function test_2>",
            "      <Function test_3>",
            "    <Module test_2.py>",
            "      <Function test_1>",
            "",
        ],
        consecutive=True,
    )

    result = pytester.runpytest_inprocess(
        "--collect-only", "top1/test_2.py", "top1/test_2.py"
    )
    result.stdout.fnmatch_lines(
        [
            "<Dir *>",
            "  <Package top1>",
            "    <Module test_2.py>",
            "      <Function test_1>",
            "    <Module test_2.py>",
            "      <Function test_1>",
            "",
        ],
        consecutive=True,
    )

    result = pytester.runpytest_inprocess("--collect-only", "top2/", "top2/")
    result.stdout.fnmatch_lines(
        [
            "<Dir *>",
            "  <Package top2>",
            "    <Module test_1.py>",
            "      <Function test_1>",
            "  <Package top2>",
            "    <Module test_1.py>",
            "      <Function test_1>",
            "",
        ],
        consecutive=True,
    )

    result = pytester.runpytest_inprocess(
        "--collect-only", "top2/", "top2/", "top2/test_1.py"
    )
    result.stdout.fnmatch_lines(
        [
            "<Dir *>",
            "  <Package top2>",
            "    <Module test_1.py>",
            "      <Function test_1>",
            "  <Package top2>",
            "    <Module test_1.py>",
            "      <Function test_1>",
            "    <Module test_1.py>",
            "      <Function test_1>",
            "",
        ],
        consecutive=True,
    )

    result = pytester.runpytest_inprocess(
        "--collect-only", "top1/test_1.py", "top1/test_1.py::test_3"
    )
    result.stdout.fnmatch_lines(
        [
            "<Dir *>",
            "  <Package top1>",
            "    <Module test_1.py>",
            "      <Function test_1>",
            "      <Class TestIt>",
            "        <Function test_2>",
            "      <Function test_3>",
            "      <Function test_3>",
            "",
        ],
        consecutive=True,
    )

    result = pytester.runpytest_inprocess(
        "--collect-only", "top1/test_1.py::test_3", "top1/test_1.py"
    )
    result.stdout.fnmatch_lines(
        [
            "<Dir *>",
            "  <Package top1>",
            "    <Module test_1.py>",
            "      <Function test_3>",
            "      <Function test_1>",
            "      <Class TestIt>",
            "        <Function test_2>",
            "",
        ],
        consecutive=True,
    )

    result = pytester.runpytest_inprocess(
        "--collect-only",
        "--keep-duplicates",
        "top1/test_1.py::test_3",
        "top1/test_1.py",
    )
    result.stdout.fnmatch_lines(
        [
            "<Dir *>",
            "  <Package top1>",
            "    <Module test_1.py>",
            "      <Function test_3>",
            "      <Function test_1>",
            "      <Class TestIt>",
            "        <Function test_2>",
            "      <Function test_3>",
            "",
        ],
        consecutive=True,
    )

Here are some guidelines I based my expected outcomes on (these are debatable):

The order of the collection args matters and should be respected as much as possible
A collection arg specified explicitly (i.e. not a descendant of another arg) should always be duplicated
Otherwise whether or not to duplicate should depend on --keep-duplicates
Emitting multiple items with the same nodeid is fine, but emitting the same Item object itself multiple times should never happen
Collectors should be shared as much as possible, unless an explicitly specified arg requires duplication
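One possible reading of these guidelines, as a toy model rather than a proposed implementation: every arg is expanded in order and is itself always emitted, but while expanding it, a descendant that an earlier arg already named explicitly is skipped unless keep_duplicates is set. The `TREE` dict, `expand` and `select_items` are all hypothetical names; the model only tracks item ids and ignores collector sharing.

```python
# Toy tree: collector id -> ordered child ids. Ids missing from the dict are items.
TREE = {
    "top1": ["top1/test_1.py", "top1/test_2.py"],
    "top1/test_1.py": [
        "top1/test_1.py::test_1",
        "top1/test_1.py::TestIt",
        "top1/test_1.py::test_3",
    ],
    "top1/test_1.py::TestIt": ["top1/test_1.py::TestIt::test_2"],
    "top1/test_2.py": ["top1/test_2.py::test_1"],
}


def expand(node, earlier_args, keep_duplicates, is_arg=True):
    # An explicit arg is always emitted. A descendant reached while expanding
    # it is skipped if an earlier arg already named that exact node.
    if not is_arg and not keep_duplicates and node in earlier_args:
        return
    children = TREE.get(node)
    if children is None:
        yield node  # an item (leaf)
        return
    for child in children:
        yield from expand(child, earlier_args, keep_duplicates, is_arg=False)


def select_items(args, keep_duplicates=False):
    items = []
    for index, arg in enumerate(args):
        # Only args that come before this one count as "earlier".
        items.extend(expand(arg, set(args[:index]), keep_duplicates))
    return items


print(select_items(["top1/test_2.py", "top1"]))
print(select_items(["top1/test_2.py", "top1"], keep_duplicates=True))
```

Under this reading, `top1 top1/test_2.py` yields `test_2.py::test_1` twice (the second arg is explicit), while `top1/test_2.py top1` yields it once unless `keep_duplicates` is set, which matches the corresponding scenarios in the test case above.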
