
zyte-common-items's People

Contributors

burnzz, gallaecio, guillermo-bondonno, kishan3, kmike, pyexplorer, rromaniuc, vmruiz, wrar

zyte-common-items's Issues

Fix coverage detection

Currently the tests don't collect coverage data and print a warning about it. Running pytest directly works, while running it through tox doesn't. I suspect the cause is the path problem currently hidden by PY_IGNORE_IMPORTMISMATCH=1 (coverage collection should depend on file paths), but I don't know how to fix it.

mypy 1.5.0 and Item.__slots__

mypy 1.5.0 reports additional errors:

zyte_common_items/base.py:53: error: Trying to assign name "_unknown_fields_dict" that is not in "__slots__" of type "zyte_common_items.base.Item"  [misc]
zyte_common_items/base.py:77: error: Trying to assign name "_unknown_fields_dict" that is not in "__slots__" of type "zyte_common_items.base.Item"  [misc]

Item doesn't define __slots__, but its base class _ItemBase does (and it includes _unknown_fields_dict). _ItemBase is a normal class, while Item is decorated with @attrs.define. There are some entries about attrs classes and __slots__ in the mypy 1.5.0 changelog, but I can't tell whether this is a false positive or a newly detected problem in our code. python/mypy#15639 is likely the change that causes this.

Also related:

https://mypy.readthedocs.io/en/stable/class_basics.html#slots

https://www.attrs.org/en/stable/examples.html#slots

https://www.attrs.org/en/stable/glossary.html#term-slotted-classes
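To make the runtime side of this concrete, here is a minimal sketch in plain Python, without attrs. ItemBase and DemoItem are hypothetical stand-ins for _ItemBase and Item, and the sketch assumes that the slotted subclass lists only its own fields in __slots__. It shows only that such an assignment is valid at runtime because the slot is inherited; it does not settle whether the mypy report is a false positive.

```python
class ItemBase:
    # Hypothetical stand-in for _ItemBase: reserves a slot for unknown fields.
    __slots__ = ("_unknown_fields_dict",)


class DemoItem(ItemBase):
    # Hypothetical stand-in for Item. Assumption: the subclass's own
    # __slots__ names only its own fields, not the inherited slot.
    __slots__ = ("name",)

    def __init__(self, name):
        self.name = name
        # Not listed in DemoItem.__slots__, but inherited from ItemBase.
        self._unknown_fields_dict = {}


item = DemoItem("example")
```

A checker that inspects only DemoItem.__slots__ would not see _unknown_fields_dict, even though the inherited slot makes the assignment legal.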

A collection of processors to clean-up field values

Stemming from the discussion in #15.

We need a library of functions that can be used to preprocess the data values placed into the item. Ideally, these functions should be usable with web-poet (https://github.com/scrapinghub/web-poet) fields. For example:

from web_poet import ItemPage, field
from zyte_common_items import Product
from zyte_common_items.processors import clean_str


class ProductPage(ItemPage[Product]):
    @field
    @clean_str
    def name(self):
        return "  some value \n\n\t"

page = ProductPage()
assert page.name == "some value"
assert page.name == page.to_item().name

A guideline should also be added to web-poet to properly document the processors found in zyte-common-items.
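As a rough illustration of what such a processor could look like, here is a sketch of a clean_str decorator written without web-poet. The implementation, the whitespace-collapsing behaviour, and the DemoPage class are all assumptions for illustration; a plain property stands in for the @field decorator.

```python
import functools
import re


def clean_str(method):
    """Hypothetical processor: collapse whitespace runs and strip the ends."""

    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        value = method(*args, **kwargs)
        if not isinstance(value, str):
            # Leave non-string values (e.g. None) untouched.
            return value
        return re.sub(r"\s+", " ", value).strip()

    return wrapper


class DemoPage:
    # property is used here only as a stand-in for web-poet's @field.
    @property
    @clean_str
    def name(self):
        return "  some value \n\n\t"


page = DemoPage()
```

Writing the processor as a plain function decorator keeps it independent of web-poet, so it could be tested without building a page object.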

Support dicts in converters

zyte_common_items.util.convert_to_class() expects an attrs class instance, so returning a dict from metadata() is no longer supported.
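One way dict support could look is sketched below. It uses standard-library dataclasses instead of attrs so that it runs on its own, and to_instance and DemoMetadata are hypothetical names, not the library's API. Silently dropping unknown keys is also an assumption about the desired behaviour.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Optional


@dataclass
class DemoMetadata:
    # Hypothetical target class, used only for this sketch.
    dateDownloaded: Optional[str] = None
    probability: Optional[float] = None


def to_instance(value, new_cls):
    """Hypothetical converter accepting a dict or a dataclass instance."""
    if isinstance(value, new_cls):
        return value
    if isinstance(value, dict):
        data = value
    elif is_dataclass(value) and not isinstance(value, type):
        data = {f.name: getattr(value, f.name) for f in fields(value)}
    else:
        raise TypeError(f"Cannot convert {type(value).__name__} to {new_cls.__name__}")
    # Keep only the keys that the target class declares.
    known = {f.name for f in fields(new_cls)}
    return new_cls(**{k: v for k, v in data.items() if k in known})


converted = to_instance({"probability": 0.9, "extra": 1}, DemoMetadata)
```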

Combine and simplify item decorators.

Currently, items are declared like:

import attrs
from zyte_common_items.util import export

@export
@attrs.define(slots=True, kw_only=True)
class SomeItem:
    ...

It would be nice if we could combine the two decorators into a single one to make the code more concise.

Use default black options

Currently we override the line length to be 120. I'm not sure what the reason for this is, and I propose using the black defaults.

HasMetadata doesn't work in some subclasses

This fails:

def test_metadata_with_returns():
    class MyProduct(Product):
        pass

    class CustomProductPage(ProductPage, Returns[MyProduct]):
        pass

    url = ResponseUrl("https://example.com")
    html = b""
    page = CustomProductPage(response=HttpResponse(url=url, body=html))
    metadata = page.metadata

    @field
    def metadata(self) -> MetadataT:
>       value = self.metadata_cls()
E       TypeError: 'NoneType' object is not callable

zyte_common_items/pages.py:67: TypeError

For this CustomProductPage, zyte_common_items.pages._get_metadata_class cannot find any HasMetadata base and returns None.
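One plausible cause, stated here as an assumption and not confirmed against the actual implementation, is that the lookup inspects only the direct generic bases of the class. A subclass that adds its own generic base such as Returns[...] gets its own __orig_bases__, which no longer mentions HasMetadata. The sketch below uses stand-in classes and a hypothetical get_metadata_class that walks the whole MRO instead.

```python
from typing import Generic, Optional, TypeVar, get_args, get_origin

MetadataT = TypeVar("MetadataT")
ItemT = TypeVar("ItemT")


class HasMetadata(Generic[MetadataT]):
    """Stand-in for the real HasMetadata mixin."""


class Returns(Generic[ItemT]):
    """Stand-in for web_poet.Returns."""


class DemoMetadata:
    pass


class DemoItem:
    pass


class DemoProductPage(HasMetadata[DemoMetadata]):
    pass


class CustomPage(DemoProductPage, Returns[DemoItem]):
    pass


def get_metadata_class(cls: type) -> Optional[type]:
    """Hypothetical lookup: search every class in the MRO, not only cls."""
    for klass in cls.__mro__:
        # Read each class's own __orig_bases__, ignoring inherited values.
        for base in klass.__dict__.get("__orig_bases__", ()):
            if get_origin(base) is HasMetadata:
                args = get_args(base)
                if args and isinstance(args[0], type):
                    return args[0]
    return None
```

In this sketch, CustomPage.__orig_bases__ contains only DemoProductPage and Returns[DemoItem], so a lookup limited to direct bases would miss the HasMetadata[DemoMetadata] declared one level up.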
