favicon's People

Contributors

alexjacobson95, scottwernervt

favicon's Issues

Add request timeout

I'm scraping a big list of URLs, and for each website I use favicon to extract images that I then store in a database. A GET request is performed only when a new website is added to the list:

icons = favicon.get("http://" + value)
return icons[0].url

The problem is that some laggy websites may delay the page load for minutes, and all I want is to limit the time spent on each request.
For example:


if images not downloaded after 5 seconds:
    return None

Thanks.
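
One possible way to get the "give up after 5 seconds" behaviour from the outside is to run the lookup in a worker thread and stop waiting once a deadline passes. This is only a sketch: `call_with_deadline` is a hypothetical helper, not part of favicon, and the slow call is abandoned rather than cancelled, so its thread keeps running in the background until it finishes.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_deadline(func, *args, timeout=5.0, default=None):
    """Run func(*args) in a worker thread.

    Returns the result, or `default` if no result arrives within
    `timeout` seconds. Exceptions raised by func propagate to the caller.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        return default
    finally:
        # Do not block on the slow call; the worker thread is left to finish
        # on its own.
        executor.shutdown(wait=False)
```

With this in place, something like `icons = call_with_deadline(favicon.get, "http://" + value, timeout=5.0)` would yield `None` for sites that take too long, so the caller has to check for `None` before reading `icons[0].url`.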

Set Icon's format using Headers if there's no file extension

Occasionally I encounter favicons that don't have a file extension. e.g. https://secure.gravatar.com/blavatar/bd4bda4207561b6998f10dec44b570f04ff4072b20f89162d525b186dfca3e49?s=32

Getting this results in a list of Icon objects like this, with an empty format:

Icon(
    url='https://secure.gravatar.com/blavatar/bd4bda4207561b6998f10dec44b570f04ff4072b20f89162d525b186dfca3e49?s=32',
    width=16,
    height=16,
    format=''
)

In a situation like this could/should favicon use the response headers from requests to determine the format instead? For example, doing:

response = requests.get("https://secure.gravatar.com/blavatar/bd4bda4207561b6998f10dec44b570f04ff4072b20f89162d525b186dfca3e49?s=32")

then response.headers includes:

'Content-Type': 'image/jpeg',
'Content-Disposition': 'inline; filename="bd4bda4207561b6998f10dec44b570f04ff4072b20f89162d525b186dfca3e49.jpeg"'

Perhaps fall back to using one of those to determine the likely file extension? At the moment, from outside favicon, it's impossible to get this data without manually using requests again myself.

(Is this project still maintained?)
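
A rough sketch of the fallback described above, assuming the response headers are available as a dict-like object. `format_from_headers` and the MIME table are hypothetical and not part of favicon; the table is deliberately small and would need extending for real use.

```python
import os
from email.message import Message

# Hypothetical mapping from common image MIME types to a short format name.
_MIME_TO_FORMAT = {
    "image/x-icon": "ico",
    "image/vnd.microsoft.icon": "ico",
    "image/png": "png",
    "image/jpeg": "jpg",
    "image/gif": "gif",
    "image/svg+xml": "svg",
}


def format_from_headers(headers):
    """Guess an icon format from Content-Type, then Content-Disposition.

    Returns an empty string when neither header helps.
    """
    content_type = headers.get("Content-Type", "")
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in _MIME_TO_FORMAT:
        return _MIME_TO_FORMAT[mime]

    disposition = headers.get("Content-Disposition")
    if disposition:
        # Let the stdlib email parser extract the filename parameter.
        msg = Message()
        msg["Content-Disposition"] = disposition
        filename = msg.get_filename()
        if filename:
            return os.path.splitext(filename)[1].lstrip(".").lower()
    return ""
```

Note that a plain dict is case-sensitive, while the headers object returned by requests is not, so the exact key spelling only matters when testing with ordinary dicts.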

Error with URL https://www.commercecentric.com causes no valid icons to be returned

Though there are some valid icons, none are returned. Instead, this error is thrown:

Traceback (most recent call last):
  File "/src/logic/website_scraping.py", line 312, in getWebsiteScrapedDataForURL
    potentialIcons = favicon.get(
  File "/.venv/lib/python3.8/site-packages/favicon/favicon.py", line 66, in get
    link_icons = tags(response.url, response.text)
  File "/.venv/lib/python3.8/site-packages/favicon/favicon.py", line 142, in tags
    width, height = dimensions(tag)
  File "/.venv/lib/python3.8/site-packages/favicon/favicon.py", line 176, in dimensions
    width, height = re.split(r'[x\xd7]', size[0])
ValueError: not enough values to unpack (expected 2, got 1)

size is an array with contents: ['32/32'].
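
For illustration, a more tolerant way to read a sizes value could look like the sketch below. `parse_sizes` is a hypothetical stand-in, not the library's `dimensions` function; it accepts any single non-digit separator and falls back to (0, 0), the same values favicon already reports for icons of unknown size.

```python
import re


def parse_sizes(sizes):
    """Return (width, height) from a sizes attribute value.

    Accepts separators such as 'x', the multiplication sign, or '/'.
    Returns (0, 0) when no width/height pair can be found.
    """
    match = re.search(r"(\d+)\s*[^\d\s]\s*(\d+)", sizes or "")
    if not match:
        return 0, 0
    return int(match.group(1)), int(match.group(2))
```

Malformed values such as "x", "any", or an empty string also end up as (0, 0) instead of raising ValueError.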

Use given website for icon url scheme

favicon assumes the https scheme when building an icon URL from a protocol-relative link such as <link href="//static.openr.co/main/favicons/favicon.ico" rel="shortcut icon" type="image/vnd.microsoft.icon">. Downloading the icon with requests then fails with requests.exceptions.SSLError. Instead of defaulting to https, we should take the scheme from the website URL that was passed in.
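
A minimal sketch of that idea using the standard library: `urljoin` resolves a protocol-relative href against the page URL and so inherits the page's scheme. `resolve_icon_href` is a hypothetical helper name, and the page URL in the usage note is a placeholder.

```python
from urllib.parse import urljoin


def resolve_icon_href(page_url, href):
    """Resolve an icon href against the page it was found on.

    Protocol-relative hrefs ("//host/path") take the scheme of page_url.
    """
    return urljoin(page_url, href.strip())
```

For example, `resolve_icon_href("http://example.com/", "//static.openr.co/main/favicons/favicon.ico")` gives an `http://static.openr.co/...` URL, while the same href found on an https page would resolve to https.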

Add extra icon discovery methods

Add new icon discovery methods:

  • manifest.json and browserconfig.xml
  • Microsoft <meta name='msapplication-TileImage' content='icon.png'> (#15)
  • base64 icons data:image/png;base64,...
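
As a sketch of the first item, icons could be read from a web app manifest roughly like this. Only manifest.json is covered here, not browserconfig.xml; `icons_from_manifest` is a hypothetical helper, and it assumes the manifest text has already been downloaded.

```python
import json
from urllib.parse import urljoin


def icons_from_manifest(manifest_url, manifest_text):
    """Extract icon entries from a web app manifest.

    Each icon's src is resolved relative to the manifest URL.
    Entries without a src are skipped.
    """
    data = json.loads(manifest_text)
    found = []
    for entry in data.get("icons", []):
        src = entry.get("src")
        if not src:
            continue
        found.append({
            "url": urljoin(manifest_url, src),
            "sizes": entry.get("sizes", ""),
            "type": entry.get("type", ""),
        })
    return found
```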

Pass html string as param

Hi. Thanks for your module.
It would be great if the get method could accept not only a URL but also a preloaded HTML string as an optional parameter.
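
To illustrate what such an entry point might do, here is a standalone sketch that extracts icon links from an HTML string that was already fetched. It uses only the standard library rather than favicon's own parsing, both names are hypothetical, and it only looks at <link> tags whose rel contains the "icon" keyword.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _IconLinkParser(HTMLParser):
    """Collect href values of <link> tags whose rel includes 'icon'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        href = (attributes.get("href") or "").strip()
        # Skip links that are not icons or that have an empty href.
        if "icon" in rel and href:
            self.hrefs.append(href)


def icon_urls_from_html(html, base_url):
    """Return absolute icon URLs found in an already-downloaded HTML string."""
    parser = _IconLinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.hrefs]
```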

Incorrect location of favicon.ico

The library currently tries to find the default favicon.ico file in the wrong place.

According to https://html.spec.whatwg.org/multipage/links.html#rel-icon:

"In the absence of a link with the icon keyword . . . Let request be a new request whose url is the URL record obtained by resolving the URL "/favicon.ico" against the Document object's URL".

In other words the favicon.ico is stored at the site root.

Example:
Original URL: https://github.com/scottwernervt/favicon/
Correct Favicon URL: github.com/favicon.ico
Incorrect Favicon URL: github.com/scottwernervt/favicon/favicon.ico

Right now the library searches the incorrect url.
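
The resolution step quoted from the spec can be sketched with `urljoin`, which resolves the absolute path "/favicon.ico" against the document URL and therefore drops the document's own path. `default_favicon_url` is a hypothetical helper name.

```python
from urllib.parse import urljoin


def default_favicon_url(document_url):
    """Resolve "/favicon.ico" against the document URL (site root)."""
    return urljoin(document_url, "/favicon.ico")
```

For the example above, `default_favicon_url("https://github.com/scottwernervt/favicon/")` resolves to `https://github.com/favicon.ico`.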

Handle poor html values in links

The favicon for http://www.iposcoop.com/ has a tab \t in the filename. We should handle poor HTML formatting by performing strip() on the filename.

Icon(url='https://www.iposcoop.com/wp-content/uploads/2014/02/favicon.ico\t', width=0, height=0, format='ico\t'),
Icon(url='https://www.iposcoop.com/favicon.ico', width=0, height=0, format='ico'),
Icon(url='https://www.iposcoop.com/wp-content/themes/flatsome/apple-touch-icon-precomposed.png', width=0, height=0, format='png')
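
A small sketch of the clean-up, assuming the format is derived from the file extension in the URL path. `clean_icon_url` is a hypothetical helper; strip() only removes leading and trailing whitespace, so whitespace inside the URL would need separate handling.

```python
import os
from urllib.parse import urlsplit


def clean_icon_url(raw_url):
    """Strip surrounding whitespace and derive the format from the path.

    Returns a (url, format) tuple; format is '' when there is no extension.
    """
    url = raw_url.strip()
    extension = os.path.splitext(urlsplit(url).path)[1]
    return url, extension.lstrip(".").lower()
```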

ValueError when making an int from empty width/height

I don't know which site/favicon my code was trying to fetch when the final line in favicon.py generated:

File "/webapps/oohdir/code/venv/lib/python3.10/site-packages/favicon/favicon.py", line 66, in get
link_icons = tags(response.url, response.text)
File "/webapps/oohdir/code/venv/lib/python3.10/site-packages/favicon/favicon.py", line 142, in tags
width, height = dimensions(tag)
File "/webapps/oohdir/code/venv/lib/python3.10/site-packages/favicon/favicon.py", line 188, in dimensions
return int(width), int(height)
ValueError: invalid literal for int() with base 10: ''

But I've replicated the error for my tests with an HTML page that has an element like:

<link rel="icon" type="image/jpeg" sizes="x" href="/favicon.jpg" />

That sizes attribute results in the code trying to make a width/height from "" and generating the ValueError.

'NoneType' object has no attribute 'strip'

Hi, thanks for this awesome egg :)

I think that I found a bug when a website contains this meta: <link href="" rel="shortcut icon"/> .

This causes tag.get('href') or tag.get('content') at favicon.py to return None instead of a string.

You may want to skip tags with an empty href/content when parsing 😊

Thank you in advance for your time and I wish you a great week :)

AttributeError: 'NoneType' object has no attribute 'lower'

The following tag <meta content="en-US" data-rh="true" itemprop="inLanguage"/> causes an exception because it does not have a name or property attribute.

Traceback (most recent call last):
  File "/opt/pycharm-professional/helpers/pydev/pydevd.py", line 1664, in <module>
    main()
  File "/opt/pycharm-professional/helpers/pydev/pydevd.py", line 1658, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/opt/pycharm-professional/helpers/pydev/pydevd.py", line 1068, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/opt/pycharm-professional/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/swerner/development/projects/favicon/debug.py", line 3, in <module>
    fav_icons = favicon.get('https://www.nytimes.com/')
  File "/home/swerner/development/projects/favicon/src/favicon/favicon.py", line 69, in get
    link_icons = tags(response.url, response.text)
  File "/home/swerner/development/projects/favicon/src/favicon/favicon.py", line 125, in tags
    if meta_type.lower() == name.lower():
AttributeError: 'NoneType' object has no attribute 'lower'
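
One way to guard the comparison, shown as a sketch with a plain dict standing in for the tag's attributes. `meta_matches` is a hypothetical helper and not the code in favicon.py.

```python
def meta_matches(attrs, wanted):
    """Case-insensitively compare a meta tag's name/property to `wanted`.

    Tags with neither attribute simply do not match instead of raising.
    """
    meta_type = attrs.get("name") or attrs.get("property") or ""
    return meta_type.lower() == wanted.lower()
```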

UserWarning: No parser was explicitly specified

UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
  
  The code that caused this warning is on line 8 of the file /home/swerner/development/projects/favicon/tests/test_favicon.py. To get rid of this warning, pass the additional argument 'features="html.parser"' to the BeautifulSoup constructor.
  
    s = BeautifulSoup('')

-- Docs: https://docs.pytest.org/en/latest/warnings.html
