
play-scraper's People

Contributors

ayush-mandowara-bst, danieliu, karn, masroore, parisbs, ratson


play-scraper's Issues

IndexError: list index out of range while getting collections

Describe the bug
While requesting the NEW_FREE collection, I get an IndexError exception:

File ".../python2.7/site-packages/play_scraper/api.py", line 41, in collection
return s.collection(collection, category, **kwargs)
File ".../python2.7/site-packages/play_scraper/scraper.py", line 134, in collection
for app_card in soup.select('div[data-uitype="500"]')]
File ".../python2.7/site-packages/play_scraper/utils.py", line 358, in parse_card_info
developer_id = dev_soup.attrs['href'].split('=')[1]
IndexError: list index out of range

To Reproduce
Run play_scraper.collection(collection='NEW_FREE', results=100, page=0, gl='us')

Expected behavior
Should receive a list of app metadata from the specified chart in Google Play.

Screenshots
Not applicable

Desktop (please complete the following information):

  • OS: [Linux Ubuntu 12.04.5 LTS, Precise Pangolin, MacOS Mojave 10.14.3]
  • Python Version: [2.7.3, 2.7.10]
  • play_scraper Version: [0.5.2: 2019-01-19]

Additional context
It seems like the scraper is getting an unexpected response from Google Play and cannot parse it correctly.
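A defensive rewrite of the failing line, offered only as a sketch (parse_card_info and the dev_soup anchor tag are taken from the traceback above; this is not the library's own fix): it returns None instead of raising when the card's href has no id= query parameter.

def safe_developer_id(dev_soup):
    """Return the developer id from a card's <a> tag, or None when the href
    has no 'id=' part (the case that raises the IndexError above)."""
    href = dev_soup.attrs.get('href', '')
    parts = href.split('=')
    return parts[1] if len(parts) > 1 else None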

Outdated GPlay class name + uncaught exception

Describe the bug
The scraper crashes due to an outdated CSS class name: the stale selector returns None, and the exception raised later in the parsing process is not caught, so the scraper crashes.

Traceback (most recent call last):
File "", line 1, in
File "site-packages/play_scraper/api.py", line 22, in details
return s.details(app_id)
File "site-packages/play_scraper/scraper.py", line 83, in details
app_json = parse_app_details(soup)
File "site-packages/play_scraper/utils.py", line 312, in parse_app_details
soup.select_one('.xyOfqd'))
File "site-packages/play_scraper/utils.py", line 138, in parse_additional_info
section_titles_divs = [x for x in soup.select('div.hAyfc div.BgcNfc')]
AttributeError: 'NoneType' object has no attribute 'select'

To Reproduce
Request details of any package name.

Expected behavior

  1. The scraper returns the additional_info_data variable in utils.py/parse_app_details(soup) as an empty dictionary, or
  2. The scraper checks whether the soup passed to utils.py/parse_additional_info(soup) is not None.
  Which option to take is up to the author.

Screenshots
Not applicable

Desktop (please complete the following information):

  • OS: [macOS 10.14.3]
  • Python Version [2.7.10]
  • play_scraper Version [0.5.3]

Additional context
While obtaining details about an app today (2019-03-26), I noticed that the tool started crashing at utils.py, line 312 (... soup.select_one('.xyOfqd'))).

  1. After some investigation I found out that the class name xyOfqd is no longer used by Google Play; the class name IxB2fe is used instead.
  2. Guarding against additional_info_data in utils.py/parse_app_details(soup) being set from a None selector result would avoid accessing a NoneType object and prevent the AttributeError at utils.py, line 138.
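A minimal sketch of option 2 above, wrapping the existing parser rather than replacing it (the import path is taken from the traceback; this is not the library's own fix):

from play_scraper.utils import parse_additional_info

def parse_additional_info_safe(section_soup):
    """Return an empty dict when the 'additional info' node is missing,
    e.g. when select_one('.xyOfqd') / select_one('.IxB2fe') finds nothing."""
    if section_soup is None:
        return {}
    return parse_additional_info(section_soup)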

Error running simple example. Help please.

import play_scraper
print play_scraper.details('com.android.chrome')
File "<ipython-input-8-60b6c1359646>", line 2
    print play_scraper.details('com.android.chrome')
                     ^
SyntaxError: invalid syntax

Running on Mac OS with python:
3.6.5 |Anaconda, Inc.| (default, Apr 26 2018, 08:42:37)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
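The traceback above comes from running a Python 2 print statement under Python 3 (Anaconda 3.6.5). Under Python 3, print is a function and needs parentheses, as in the README's example:

import play_scraper

# Python 3: print is a function, so the call must be parenthesized.
print(play_scraper.details('com.android.chrome'))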

play-scraper always returns an empty array for screenshots

Describe the bug
The screenshots attribute is always an empty array, and description and description_html are always None.

To Reproduce
location = "kr"
lang="ko"
ajson = play_scraper.details("com.dena.a12026418",gl=location,hl=lang)
print(ajson)
--------------- return -------------------
{'title': 'Pokémon Masters', 'icon': 'https://lh3.googleusercontent.com/Qow956nxep_gy5lWMRXd7hTX-SUE-m8Un4etpm6o1A3AAjFvesAq-YyM1Fy9qjr1uZBe', 'screenshots': [], 'video': 'https://www.youtube.com/embed/FV2ISpwZRck', 'category': ['GAME_ROLE_PLAYING'], 'score': '3.8', 'histogram': {}, 'reviews': 0, 'description': None, 'description_html': None, 'recent_changes': None, 'editors_choice': False, 'price': '0', 'free': True, 'iap': False, 'developer_id': '5614074995304947897', 'updated': None, 'size': None, 'installs': None, 'current_version': None, 'required_android_version': None, 'content_rating': None, 'iap_range': None, 'interactive_elements': None, 'developer': None, 'developer_email': None, 'developer_url': None, 'developer_address': None, 'app_id': 'com.dena.a12026418', 'url': 'https://play.google.com/store/apps/details?id=com.dena.a12026418'}

On this page, the screenshots and the description HTML are present:
https://play.google.com/store/apps/details?id=com.dena.a12026418&hl=ko&gl=kr

Expected behavior
The screenshots array, description, and description_html should be populated.

Desktop (please complete the following information):

  • OS: [e.g. Windows 10]
  • Python Version [e.g. 3.7.3]
  • play_scraper Version [0.5.5]

Country Codes Do Not Match Google's Own Country Code List

Describe the bug
Any gl= value must match the GL_COUNTRY_CODES list in play_scraper/constants.py, but that list does not match Google's own country code list at https://developers.google.com/public-data/docs/canonical/countries_csv. This leads to 404 errors for countries listed in GL_COUNTRY_CODES and to valid country codes being rejected.

To Reproduce

  1. Run any play_scraper call with gl="kp", which is valid according to play-scraper but not according to Google:
    requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://play.google.com/store/apps/developer?id=&gl=kp&hl=en

  2. Running play_scraper with gl='im' results in an error from play-scraper:

ValueError: im is not a valid geolocation country code.

However https://play.google.com/store/apps/developer?id=&gl=im results in valid content from Google.

Expected behavior
A better-maintained GL_COUNTRY_CODES list, or an option to override or skip the internal validation.
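Until the list is updated, a hedged way to probe whether the live store accepts a given gl code is to query Google Play directly with plain requests, bypassing the library's validation entirely; a sketch (the endpoint and its behaviour may change):

import requests

def gl_code_is_accepted(gl, hl='en'):
    """Heuristic check: does the Play Store apps front page respond for this
    gl code? Any 2xx response is treated as 'accepted'."""
    resp = requests.get(
        'https://play.google.com/store/apps',
        params={'gl': gl, 'hl': hl},
        timeout=10,
    )
    return resp.ok

# Compare the two codes discussed above.
print(gl_code_is_accepted('im'))
print(gl_code_is_accepted('kp'))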

Change in class id: the solution

Google has changed the class id for the additional info section.
Please change line 310 in utils.py from
soup.select_one('.xyOfqd'))
to
soup.select_one('.IxB2fe'))
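A quick check that the replacement selector is present on a live details page, as a sketch only (these class names are obfuscated and may change again at any time):

import requests
from bs4 import BeautifulSoup

html = requests.get(
    'https://play.google.com/store/apps/details',
    params={'id': 'com.android.chrome', 'hl': 'en', 'gl': 'us'},
    timeout=10,
).text
soup = BeautifulSoup(html, 'html.parser')
print(soup.select_one('.IxB2fe') is not None)  # new class name
print(soup.select_one('.xyOfqd') is not None)  # old class name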

Tests are failing

Describe the bug
Tests are failing. There are 404 and 405 errors, which seem to result in assertion errors.

To Reproduce

$ python3 setup.py test

Expected behavior
Passing of all tests.

Desktop (please complete the following information):

  • OS: Fedora 31
  • Python Version: Python 3.7.5 (default, Dec 15 2019, 17:54:26)
  • play_scraper Version 0.6.0

Additional context

Output
Executing(%check): /bin/sh -e /var/tmp/rpm-tmp.HGuyQ3
+ umask 022
+ cd /home/fab/rpmbuild/BUILD
+ cd play-scraper-0.6.0
+ /usr/bin/python3 setup.py test
running test
Searching for requests-futures>=0.9.7
Reading https://pypi.org/simple/requests-futures/
Downloading https://files.pythonhosted.org/packages/47/c4/fd48d1ac5110a5457c71ac7cc4caa93da10a80b8de71112430e439bdee22/requests-futures-1.0.0.tar.gz#sha256=35547502bf1958044716a03a2f47092a89efe8f9789ab0c4c528d9c9c30bc148
Best match: requests-futures 1.0.0
Processing requests-futures-1.0.0.tar.gz
Writing /tmp/easy_install-87wce9d5/requests-futures-1.0.0/setup.cfg
Running requests-futures-1.0.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-87wce9d5/requests-futures-1.0.0/egg-dist-tmp-kjhu22lj
creating /home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/.eggs/requests_futures-1.0.0-py3.7.egg
Extracting requests_futures-1.0.0-py3.7.egg to /home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/.eggs

Installed /home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/.eggs/requests_futures-1.0.0-py3.7.egg
running egg_info
writing play_scraper.egg-info/PKG-INFO
writing dependency_links to play_scraper.egg-info/dependency_links.txt
writing requirements to play_scraper.egg-info/requires.txt
writing top-level names to play_scraper.egg-info/top_level.txt
reading manifest file 'play_scraper.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'play_scraper.egg-info/SOURCES.txt'
running build_ext
test_categories_ok (tests.test_scraper.CategoryTest) ... ok
test_different_language_and_country (tests.test_scraper.CategoryTest) ... ok
test_default_num_results (tests.test_scraper.CollectionTest) ... ERROR
test_detailed_collection (tests.test_scraper.CollectionTest) ... FAIL
test_detailed_collection_different_language (tests.test_scraper.CollectionTest) ... FAIL
test_family_with_age_collection (tests.test_scraper.CollectionTest) ... ERROR
test_invalid_category_id (tests.test_scraper.CollectionTest) ... ok
test_invalid_collection_id (tests.test_scraper.CollectionTest) ... ok
test_invalid_num_results_over_120 (tests.test_scraper.CollectionTest) ... ok
test_invalid_page_x_results_over_500 (tests.test_scraper.CollectionTest) ... ok
test_non_detailed_collection (tests.test_scraper.CollectionTest) ... FAIL
test_non_detailed_different_language_and_country (tests.test_scraper.CollectionTest) ... ERROR
test_promotion_collection_id (tests.test_scraper.CollectionTest) ... FAIL
test_app_with_no_developer (tests.test_scraper.DetailsTest) ... ERROR
test_fetching_app_in_spanish (tests.test_scraper.DetailsTest) ... ok
test_fetching_app_with_all_details (tests.test_scraper.DetailsTest) ... ok
test_developer_parameter_float_invalid (tests.test_scraper.DeveloperTest) ... ok
test_developer_parameter_int_invalid (tests.test_scraper.DeveloperTest) ... ok
test_developer_parameter_long_invalid (tests.test_scraper.DeveloperTest) ... ok
test_developer_parameter_string_digits_invalid (tests.test_scraper.DeveloperTest) ... ok
test_different_language_and_country (tests.test_scraper.DeveloperTest) ... ERROR
test_fetch_developer_apps_detailed (tests.test_scraper.DeveloperTest) ... ERROR
test_fetching_developer_default_results (tests.test_scraper.DeveloperTest) ... ERROR
test_maximum_results (tests.test_scraper.DeveloperTest) ... ERROR
test_over_max_results_fetches_five (tests.test_scraper.DeveloperTest) ... ERROR
test_page_out_of_range (tests.test_scraper.DeveloperTest) ... ok
test_init_with_defaults (tests.test_scraper.PlayScraperTest) ... ok
test_init_with_language_and_geolocation (tests.test_scraper.PlayScraperTest) ... ok
test_invalid_geolocation_code_raises (tests.test_scraper.PlayScraperTest) ... ok
test_invalid_language_code_raises (tests.test_scraper.PlayScraperTest) ... ok
test_basic_search (tests.test_scraper.SearchTest) ... ok
test_different_language_and_country (tests.test_scraper.SearchTest) ... ok
test_page_out_of_range_not_between_0_and_12 (tests.test_scraper.SearchTest) ... ok
test_search_with_app_detailed (tests.test_scraper.SearchTest) ... /home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=12, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47640), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=9, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 50764), raddr=('216.58.215.238', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 50760), raddr=('216.58.215.238', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=8, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47632), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47626), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47628), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=10, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 50768), raddr=('216.58.215.238', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47624), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47630), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=11, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47636), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
ok
test_different_language_and_country (tests.test_scraper.SimilarTest) ... ok
test_similar_ok (tests.test_scraper.SimilarTest) ... ok
test_similar_with_app_detailed (tests.test_scraper.SimilarTest) ... /home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47664), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=10, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47680), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47666), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=9, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 50796), raddr=('216.58.215.238', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 50794), raddr=('216.58.215.238', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=8, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 50798), raddr=('216.58.215.238', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47662), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=11, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47678), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 47674), raddr=('172.217.168.46', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py:71: ResourceWarning: unclosed <ssl.SSLSocket fd=12, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.0.193', 50802), raddr=('216.58.215.238', 443)>
return multi_futures_app_request(app_ids, params=self.params)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
ok
test_different_language_and_country (tests.test_scraper.SuggestionTest) ... ok
test_empty_query (tests.test_scraper.SuggestionTest) ... ok
test_query_suggestions (tests.test_scraper.SuggestionTest) ... ok
test_list_url_both_args (tests.test_utils.TestBuildListUrl) ... ok
test_list_url_no_args (tests.test_utils.TestBuildListUrl) ... ok
test_list_url_only_category (tests.test_utils.TestBuildListUrl) ... ok
test_list_url_only_collection (tests.test_utils.TestBuildListUrl) ... ok
test_building_app_url (tests.test_utils.TestBuildUrl) ... ok
test_building_multiple_word_dev_name (tests.test_utils.TestBuildUrl) ... ok
test_building_simple_dev_name (tests.test_utils.TestBuildUrl) ... ok
test_default_post_data (tests.test_utils.TestGeneratePostData) ... ok
test_first_page_data (tests.test_utils.TestGeneratePostData) ... ok
test_only_num_results (tests.test_utils.TestGeneratePostData) ... ok
test_page_token (tests.test_utils.TestGeneratePostData) ... ok
test_request_with_params (tests.test_utils.TestSendRequest) ... ok
test_send_normal_request (tests.test_utils.TestSendRequest) ... ok

======================================================================
ERROR: test_default_num_results (tests.test_scraper.CollectionTest)

Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 223, in test_default_num_results
self.assertTrue(all(key in apps[0] for key in BASIC_KEYS))
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 223, in
self.assertTrue(all(key in apps[0] for key in BASIC_KEYS))
IndexError: list index out of range

======================================================================
ERROR: test_family_with_age_collection (tests.test_scraper.CollectionTest)

Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 231, in test_family_with_age_collection
age='SIX_EIGHT')
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 132, in collection
response = send_request('POST', url, data, self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://play.google.com/store/apps/category/FAMILY/collection/topselling_free?hl=en&gl=us&age=AGE_RANGE2

======================================================================
ERROR: test_non_detailed_different_language_and_country (tests.test_scraper.CollectionTest)

Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 191, in test_non_detailed_different_language_and_country
apps = s.collection('TOP_PAID', 'LIFESTYLE', results=5)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 132, in collection
response = send_request('POST', url, data, self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://play.google.com/store/apps/category/LIFESTYLE/collection/topselling_paid?hl=da&gl=dk

======================================================================
ERROR: test_app_with_no_developer (tests.test_scraper.DetailsTest)

Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 82, in details
response = send_request('GET', url, params=self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://play.google.com/store/apps/details?id=org.selfie.beauty.camera.pro&hl=en&gl=us

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 156, in test_app_with_no_developer
app_data = self.s.details('org.selfie.beauty.camera.pro')
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 86, in details
app=app_id, error=e))
ValueError: Invalid application ID: org.selfie.beauty.camera.pro. 404 Client Error: Not Found for url: https://play.google.com/store/apps/details?id=org.selfie.beauty.camera.pro&hl=en&gl=us

======================================================================
ERROR: test_different_language_and_country (tests.test_scraper.DeveloperTest)

Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 302, in test_different_language_and_country
apps = s.developer('Google LLC', results=5)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 165, in developer
response = send_request('POST', url, data, self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 405 Client Error: Method Not Allowed for url: https://play.google.com/store/apps/developer?id=Google+LLC&hl=da&gl=dk

======================================================================
ERROR: test_fetch_developer_apps_detailed (tests.test_scraper.DeveloperTest)

Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 294, in test_fetch_developer_apps_detailed
apps = self.s.developer('Disney', results=3, detailed=True)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 165, in developer
response = send_request('POST', url, data, self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 405 Client Error: Method Not Allowed for url: https://play.google.com/store/apps/developer?id=Disney&hl=en&gl=us

======================================================================
ERROR: test_fetching_developer_default_results (tests.test_scraper.DeveloperTest)

Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 272, in test_fetching_developer_default_results
apps = self.s.developer('Disney')
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 165, in developer
response = send_request('POST', url, data, self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 405 Client Error: Method Not Allowed for url: https://play.google.com/store/apps/developer?id=Disney&hl=en&gl=us

======================================================================
ERROR: test_maximum_results (tests.test_scraper.DeveloperTest)

Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 280, in test_maximum_results
apps = self.s.developer('Google LLC', results=120)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 165, in developer
response = send_request('POST', url, data, self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 405 Client Error: Method Not Allowed for url: https://play.google.com/store/apps/developer?id=Google+LLC&hl=en&gl=us

======================================================================
ERROR: test_over_max_results_fetches_five (tests.test_scraper.DeveloperTest)

Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 287, in test_over_max_results_fetches_five
apps = self.s.developer('Google LLC', results=121)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/scraper.py", line 165, in developer
response = send_request('POST', url, data, self.params)
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/play_scraper/utils.py", line 121, in send_request
response.raise_for_status()
File "/usr/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 405 Client Error: Method Not Allowed for url: https://play.google.com/store/apps/developer?id=Google+LLC&hl=en&gl=us

======================================================================
FAIL: test_detailed_collection (tests.test_scraper.CollectionTest)

Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 200, in test_detailed_collection
self.assertEqual(1, len(apps))
AssertionError: 1 != 0

======================================================================
FAIL: test_detailed_collection_different_language (tests.test_scraper.CollectionTest)

Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 210, in test_detailed_collection_different_language
self.assertEqual(1, len(apps))
AssertionError: 1 != 0

======================================================================
FAIL: test_non_detailed_collection (tests.test_scraper.CollectionTest)

Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 178, in test_non_detailed_collection
self.assertEqual(2, len(apps))
AssertionError: 2 != 0

======================================================================
FAIL: test_promotion_collection_id (tests.test_scraper.CollectionTest)

Traceback (most recent call last):
File "/home/fab/Documents/repos/rpmbuild/BUILD/play-scraper-0.6.0/tests/test_scraper.py", line 241, in test_promotion_collection_id
self.assertEqual(2, len(apps))
AssertionError: 2 != 0


Ran 53 tests in 37.547s

FAILED (failures=4, errors=9)
Test failed: <unittest.runner.TextTestResult run=53 errors=9 failures=4>
error: Test failed: <unittest.runner.TextTestResult run=53 errors=9 failures=4>
error: Bad exit status from /var/tmp/rpm-tmp.HGuyQ3 (%check)

'NoneType' object has no attribute 'attrs'

It gives an error on line 240:
icon = (soup.select_one('.dQrBL img.ujDFqe')
Replacing it with icon = (soup.select_one('.XSyT2c img.T75of') as suggested in Issue 42 did not resolve the error.
Please suggest what should be done to resolve this issue.

Links are dead after a recent Google update

Describe the bug
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://play.google.com/store/apps/category/GAME_RACING/collection/movers_shakers?gl=us&hl=en

To Reproduce

import play_scraper
print(play_scraper.collection(
... collection='TRENDING',
... category='GAME_RACING',
... results=5,
... page=1))

Expected behavior
list of trending games

Desktop (please complete the following information):

  • OS: Windows
  • Python Version : 2.7
  • play_scraper Version : latest

Improve requirements.txt

Hello @danieliu,

First, thank you for that great module.
Then, could you replace the == pins in requirements.txt with >=? Unless your module needs these exact versions, I think it should be compatible with newer ones.

Cheers !

Exception Thrown by play-scraper.details Method

Describe the bug
Attempting to run the example described in the README:
print(play_scraper.details('com.android.chrome'))
raises an AttributeError.

To Reproduce

import play_scraper
print(play_scraper.details('com.android.chrome'))

Screenshots
(screenshot attached in the original issue)

Desktop (please complete the following information):

  • OS: macOS 10.14.1
  • Python Version: 3.6.5
  • play_scraper Version: 0.5.3

Additional context
Seems related to Issue #2

Proxy support

Is your feature request related to a problem? Please describe.
play-scraper does not support proxies, so a program that uses play-scraper from behind a proxy will fail.

Describe the solution you'd like
proxy support in play-scraper
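As a stopgap while the library has no proxies argument: requests, which play-scraper uses under the hood, honours the standard HTTP_PROXY / HTTPS_PROXY environment variables, so they can be set before the first call. A sketch; the proxy address below is a placeholder.

import os
import play_scraper

# Placeholder proxy address; replace with the real one.
os.environ['HTTP_PROXY'] = 'http://proxy.example.com:8080'
os.environ['HTTPS_PROXY'] = 'http://proxy.example.com:8080'

print(play_scraper.details('com.android.chrome'))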

Specifying different languages breaks selectors used to find various app details

Passing in a different hl causes the alt attributes and the titles in the additional details section to change.

Selectors should be generalized to use obfuscated classes or other attributes that do not change between languages.

The additional details section HTML isn't specific enough for a clear way to differentiate between subsections and correctly parse the data out, unfortunately.

Search Pagination

Hi!
The library is great so far and it is helping me a lot in one of my projects. I just have one question. In the search function there is now a limit of 12 on the number of pages, and I have noticed that this is related to the PAGE_TOKENS in the settings. My question is: how can it be increased to retrieve an arbitrary number of result pages?

Thanks!

add Privacy URL

Is your feature request related to a problem? Please describe.
The developer details have no privacy URL, so it would be great if one could be added.

Describe the solution you'd like
To add privacy URL to developer detail.

I don't know whether the code below works or not; it is just a suggestion.

Thanks for reading.

developer_privacy = value_div.select('div')[-2].contents[0]
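A hedged alternative that does not depend on div positions: scan the parsed details page for a link whose text mentions a privacy policy. value_div, the [-2] offset above, and this selector are all assumptions about markup that changes frequently.

def extract_privacy_url(soup):
    """Best-effort: return the href of the first anchor whose text mentions
    'privacy', or None if no such link is found."""
    for anchor in soup.select('a[href]'):
        if 'privacy' in anchor.get_text(strip=True).lower():
            return anchor['href']
    return None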

categories returns empty result

Describe the bug
Calling categories() returns an empty result.

To Reproduce

import play_scraper
play_scraper.__version__
'0.5.5'
play_scraper.categories()
{}

Desktop (please complete the following information):

  • Ubuntu 16
  • Python 3.5.2
  • play_scraper Version 0.5.5

Wrong selector beautiful soup

Describe the bug
The scraper crashes because a CSS selector is not found.

To Reproduce
play_scraper.details('whatever_id')

'NoneType' object has no attribute 'attrs'

Expected behavior
The scraper should not crash when a selector is not found.

Desktop

  • OS: Win 7
  • Python Version 3.7
  • play_scraper Version 0.5.4

The issue is on utils.py line 241
soup.select_one('.dQrBL img.ujDFqe')

screenshots Problem

com.Rain.Teslagrad

for egs in gPlayBilgi['screenshots']:
    print("Ekran:", egs)

Output:
data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==
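That value appears to be a 1x1 transparent GIF that the store page uses as a lazy-load placeholder, so the real screenshot URLs never reach the scraped attribute. A small sketch that at least filters such placeholders out (it cannot recover URLs the page only provides to JavaScript):

import play_scraper

def real_screenshots(urls):
    """Drop base64 'data:' placeholder images from a scraped screenshot list."""
    return [u for u in urls if u and not u.startswith('data:')]

details = play_scraper.details('com.Rain.Teslagrad')
for egs in real_screenshots(details['screenshots']):
    print("Ekran:", egs)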

output format

Hi, thanks for this cool scraper.
I just want to mention that the output data is not in clean JSON format; when I used Python's JSON parsing library, it raised an error.

Thanks!

get reviews

I would like to know if I could extract all reviews of an app.
Thanks

Outdated css class name for GPlay

Describe the bug
Google probably made an update yesterday, causing the details method to stop working:

play_scraper/utils.py", line 240, in parse_app_details
    icon = (soup.select_one('.dQrBL img.ujDFqe')
AttributeError: 'NoneType' object has no attribute 'attrs'

To Reproduce
Call the details method with any package name.

Desktop (please complete the following information):

  • play-scraper==0.5.4

Getting similar apps returns empty list

Describe the bug
Getting similar apps is broken; it does not return any apps. This also fails in the unit tests with AssertionError: 0 not greater than 0

To Reproduce
Easiest way is to run the unit tests.

Expected behavior
The tests do not fail and similar apps are returned when using the method.

Desktop (please complete the following information):

  • OS: macOS 10.14
  • Python Version 3.7
  • play_scraper Version 0.5.5

AttributeError: 'NoneType' object has no attribute 'strip'

Traceback (most recent call last):
  File "test_desc.py", line 2, in <module>
    print (play_scraper.details('com.z2p.devops.mathpuzzel'))
  File "/home/vmstation/.local/lib/python2.7/site-packages/play_scraper/api.py", line 22, in details
    return s.details(app_id)
  File "/home/vmstation/.local/lib/python2.7/site-packages/play_scraper/scraper.py", line 341, in details
    app_json = self._parse_app_details(soup)
  File "/home/vmstation/.local/lib/python2.7/site-packages/play_scraper/scraper.py", line 254, in _parse_app_details
    soup.select_one('.xyOfqd'))
  File "/home/vmstation/.local/lib/python2.7/site-packages/play_scraper/scraper.py", line 168, in _parse_additional_info
    'developer_address': developer_address.strip()}
AttributeError: 'NoneType' object has no attribute 'strip'

Missing apps are not skipped

Describe the bug
In rare situations, an app will be listed as a result in the search function while the app has actually been (temporarily) removed from the Play store. When using the detailed=True argument, the package throws an error once the missing app is scraped, as it tries to access the actual app page.

To Reproduce
Steps to reproduce the behavior, e.g. the full example code, not just a snippet of where the error occurs!

 $ print(play_scraper.search('CAUTI', gl='nl', detailed='True', page=6))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/api.py", line 79, in search
    return s.search(query, page, detailed)
  File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/scraper.py", line 224, in search
    apps = self._parse_multiple_apps(response)
  File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/scraper.py", line 71, in _parse_multiple_apps
    return multi_futures_app_request(app_ids, params=self.params)
  File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/utils.py", line 531, in multi_futures_app_request
    result = response.result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "(blahblah)/.env/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "(blahblah)/.env/lib/python3.6/site-packages/requests/sessions.py", line 653, in send
    r = dispatch_hook('response', hooks, r, **kwargs)
  File "(blahblah)/.env/lib/python3.6/site-packages/requests/hooks.py", line 31, in dispatch_hook
    _hook_data = hook(hook_data, **kwargs)
  File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/utils.py", line 504, in parse_app_details_response_hook
    details = parse_app_details(soup)
  File "(blahblah)/.env/lib/python3.6/site-packages/play_scraper/utils.py", line 239, in parse_app_details
    title = soup.select_one('h1[itemprop="name"] span').text
AttributeError: 'NoneType' object has no attribute 'text'

In the original usecase (function that iterated over the pages using celery) the following error was thrown as well:

[2019-10-28 10:48:41,362: ERROR/ForkPoolWorker-1] Error occurred fetching uk.incrediblesoftware.mpcmachine.demo: 404 Client Error: Not Found for url: https://play.google.com/store/apps/details?id=uk.incrediblesoftware.mpcmachine.demo&hl=en&gl=nl&q=CAUTI&c=apps

From this I tried to check out the actual play store page for uk.incrediblesoftware.mpcmachine.demo; which as expected, throws an HTTP 404 error.

Expected behavior
I hoped the package would log the 404 error, skip that app, and still return the remaining results. I can catch errors in my own code to prevent problems, but that way an entire page of apps is still excluded from the results.
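A per-app wrapper along those lines, sketched against the public API (it assumes non-detailed search results carry an 'app_id' key, as the detail dicts elsewhere in these issues do): one missing app is logged and skipped instead of aborting the whole page.

import logging

import play_scraper
import requests

def details_or_none(app_id, **kwargs):
    """Fetch one app's details; log and return None if the store page is gone."""
    try:
        return play_scraper.details(app_id, **kwargs)
    except (requests.exceptions.HTTPError, ValueError, AttributeError) as exc:
        logging.warning("Skipping %s: %s", app_id, exc)
        return None

# Non-detailed search first, then per-app detail fetches.
basic = play_scraper.search('CAUTI', gl='nl', page=6)
detailed = [d for d in (details_or_none(app['app_id'], gl='nl') for app in basic) if d]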

Desktop (please complete the following information):

  • OS: Windows 10 - Running WSL Ubuntu 18.04
  • Python Version 3.6.8
  • play_scraper Version 0.6.0

feature request: module for collecting reviews

Hi,

this is already a pretty nice package, but one addition would make it even better: an option for collecting information about the reviews of users:

  • date of review
  • stars of review
  • thumbs up of review
  • text of review
  • name of reviewer (might be good to hash it for maintaining anonymity)

There is a lot of stuff that could be done with reviewer information, for instance constructing relations between apps and their users (reviewers), or examining whether a fixed core of users is producing a lot of positive / negative reviews for some content.

Getting Errors

Hi,

This is what I am running:

import play_scraper
import csv
import json
data=( play_scraper.search("Disney", page=1, detailed=True))
print(data.count(data))

The same happens when running:
print play_scraper.details('com.android.chrome')

Getting errors:

    data=( play_scraper.search("Disney", page=1, detailed=True))
  File "C:\Python27\lib\site-packages\play_scraper\api.py", line 79, in search
    return s.search(query, **kwargs)
  File "C:\Python27\lib\site-packages\play_scraper\scraper.py", line 415, in search
    apps = self._parse_multiple_apps(response)
  File "C:\Python27\lib\site-packages\play_scraper\scraper.py", line 273, in _parse_multiple_apps
    apps.append(self._parse_app_details(soup))
  File "C:\Python27\lib\site-packages\play_scraper\scraper.py", line 155, in _parse_app_details
    updated = additional_info.select_one('div[itemprop="datePublished"]')
AttributeError: 'NoneType' object has no attribute 'select_one'

Thanks,

"developer" from the details() method is not working

Hi,
I was already in contact with Daniel, and he told me to post this issue here.
I am using:

  • Windows 10 operating system
  • Python 3.7.0
  • play-scraper 0.2.4

The rest seems to work fine, at least the developer() method does.

Code:
app_id = "com.igg.android.lordsmobile"
details = play_scraper.details(app_id)
print(details["developer"])

It keeps giving me Google Commerce Ltd for every app_id I put in.
'Google Commerce Ltd' can be found at the bottom of each app page.

Offered By
Google Commerce Ltd

Maybe this has something to do with it?

What's new / "recent_changes" returns None

Describe the bug
What's New / "recent_changes" returns None even though it exists.

To Reproduce

import play_scraper as ps

data = ps.details('com.supercell.clashofclans')
print(data['recent_changes'])

Expected behavior
Should return the value for 'what's new' when it exists.

Screenshots
Not required

Desktop (please complete the following information):

  • OS: [Windows 10]
  • Python Version [3.7.2]
  • play_scraper Version [0.5.2]

Additional context
The issue can be resolved by changing the code to:
recent_changes = changes_soup.text

play_scraper always returns None for description and description_html

Describe the bug
details() always returns None for description and description_html:

{'title': 'WiFi Map — Free Passwords & Hotspots', 'icon': 'https://lh3.googleusercontent.com/SnhTxeJgxWHS7AlwEl_QNa2lpwCNFsL3Dqrzl-jMqvwFzYMIZW5V3IHUWFU3bKe6N8Kg', 'screenshots': [], 'video': 'https://www.youtube.com/embed/yl95ZgDTYp0', 'category': ['TRAVEL_AND_LOCAL'], 'score': '4.3', 'histogram': {5: None, 4: None, 3: None, 2: None, 1: None}, 'reviews': 700308, 'description': None, 'description_html': None, 'recent_changes': None, 'editors_choice': False, 'price': '0', 'free': True, 'iap': True, 'developer_id': '8565181914239008089', 'updated': 'April 20, 2019', 'size': '38M', 'installs': '50,000,000+', 'current_version': '4.1.17', 'required_android_version': '4.3 and up', 'content_rating': ['Everyone'], 'iap_range': ('$0.99', '$19.99'), 'interactive_elements': ['Users Interact, Digital Purchases'], 'developer': 'WiFi Map LLC', 'developer_email': '[email protected]', 'developer_url': 'https://www.wifimap.io/', 'developer_address': '25 Broadway, 9th Floor\nNew York, NY 10004', 'app_id': 'io.wifimap.wifimap', 'url': 'https://play.google.com/store/apps/details?id=io.wifimap.wifimap'}

To Reproduce

>>> import play_scraper
>>> play_scraper.details('io.wifimap.wifimap')

Expected behavior
fetch description and description_html correctly.

Screenshots
N/A

Desktop (please complete the following information):

  • OS: macOS 10.14.4
  • Python Version 3.7.3
  • play_scraper Version 0.5.5

Additional context
Add any other context about the problem here.

Double encoding in search.

Description:
If the search query contains special characters, it gets encoded twice: once by the quote_plus call and a second time when the request parameters are URL-encoded. This is easily solved by removing quote_plus from the parameters, that is, having:

        self.params.update({
            'q': query,
            'c': 'apps',
        })

instead of:

        self.params.update({
            'q': quote_plus(query),
            'c': 'apps',
        })
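The doubling can be reproduced with the standard library alone: requests URL-encodes the params itself, so pre-encoding the query with quote_plus is equivalent to encoding it twice.

from urllib.parse import quote_plus

query = 'https://help.instagram.com/'
once = quote_plus(query)   # https%3A%2F%2Fhelp.instagram.com%2F  (what the store expects)
twice = quote_plus(once)   # https%253A%252F%252Fhelp.instagram.com%252F  (what gets sent)
print(once)
print(twice)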

To Reproduce
The input is the developer website of Instagram: https://help.instagram.com/
When running res = play_scraper.search("https://help.instagram.com/", detailed=True)
play_scraper issues the following query:
/store/search?q=https%253A%252F%252Fhelp.instagram.com%252F&c=apps&gl=us&hl=en
This is not right. If we open that URL in a browser, an already-encoded query is written automatically into the search box:

(screenshot of the Play Store search box, dated 2020-01-27, attached in the original issue)

Expected behavior
If we manually put https://help.instagram.com/ in the searchbox, the url will be:
https://play.google.com/store/search?q=https%3A%2F%2Fhelp.instagram.com%2F&c=apps&hl=en&gl=us

If we use the code without quote_plus, the URL that is requested is exactly the desired one.

Desktop (please complete the following information):

  • play_scraper Version 0.6.0

AttributeError: 'NoneType' object has no attribute 'select'

I tried running the sample code from the readme:

import play_scraper
print(play_scraper.details('com.android.chrome'))

I get the following output:

Traceback (most recent call last):
  File "bug.py", line 2, in <module>
    print(play_scraper.details('com.android.chrome'))
  File "/usr/local/lib/python3.7/site-packages/play_scraper/api.py", line 22, in details
    return s.details(app_id)
  File "/usr/local/lib/python3.7/site-packages/play_scraper/scraper.py", line 83, in details
    app_json = parse_app_details(soup)
  File "/usr/local/lib/python3.7/site-packages/play_scraper/utils.py", line 312, in parse_app_details
    soup.select_one('.xyOfqd'))
  File "/usr/local/lib/python3.7/site-packages/play_scraper/utils.py", line 138, in parse_additional_info
    section_titles_divs = [x for x in soup.select('div.hAyfc div.BgcNfc')]
AttributeError: 'NoneType' object has no attribute 'select'

Desktop (please complete the following information):

  • OS: macOS 10.14.2
  • Python Version 3.7.2
  • play_scraper Version 0.5.1

RecursionError: maximum recursion depth exceeded while calling a Python object

Tried running a call in the Python shell and got RecursionError: maximum recursion depth exceeded while calling a Python object. See the trace below:

>>> print(play_scraper.details('com.android.chrome'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/play_scraper/api.py", line 22, in details
    return s.details(app_id)
  File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/play_scraper/scraper.py", line 292, in details
    response = send_request('GET', url)
  File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/play_scraper/utils.py", line 120, in send_request
    verify=verify)
  File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/urllib3/connectionpool.py", line 346, in _make_request
    self._validate_conn(conn)
  File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/urllib3/connectionpool.py", line 850, in _validate_conn
    conn.connect()
  File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/urllib3/connection.py", line 314, in connect
    cert_reqs=resolve_cert_reqs(self.cert_reqs),
  File "/home/shreyas/Projects/python-virtualenv/python3-workspace/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 269, in create_urllib3_context
    context.options |= options
  File "/usr/lib/python3.6/ssl.py", line 465, in options
    super(SSLContext, SSLContext).options.__set__(self, value)
  File "/usr/lib/python3.6/ssl.py", line 465, in options
    super(SSLContext, SSLContext).options.__set__(self, value)
  File "/usr/lib/python3.6/ssl.py", line 465, in options
    super(SSLContext, SSLContext).options.__set__(self, value)
  [Previous line repeated 323 more times]
RecursionError: maximum recursion depth exceeded while calling a Python object

save data to csv file

This is a really great project, thank you so much.
As I am very new to Python, I am unable to save the output to a CSV file for filtering based on the number of installations.
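A minimal sketch of one way to do it with the standard csv module (the collection name and the field names are assumptions based on the detailed result keys shown elsewhere in these issues):

import csv
import play_scraper

apps = play_scraper.collection(collection='NEW_FREE', results=20, detailed=True)

fields = ['app_id', 'title', 'installs', 'score', 'price']
with open('apps.csv', 'w', newline='', encoding='utf-8') as fh:
    writer = csv.DictWriter(fh, fieldnames=fields, extrasaction='ignore')
    writer.writeheader()
    for app in apps:
        writer.writerow(app)

The resulting apps.csv can then be sorted or filtered on the installs column in any spreadsheet tool.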

Does it support Python 3.7?

When I invoke 'python3 -m pip install play_scraper', it says:
command "/Library/Frameworks/Python.framework/Versions/3.7/bin/python3 -u -c "import setuptools, tokenize;__file__='/private/tmp/pip-install-cj268d40/lxml/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/tmp/pip-record-mzwfg42a/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/tmp/pip-install-cj268d40/lxml/

Got : AttributeError: 'module' object has no attribute '_base'

I got this error after installing play_scraper and trying to import it.
Please help.

thanks

[root@CT114 ~]# python
Python 2.7.5 (default, Aug  4 2017, 00:39:18)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import play_scraper
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/play_scraper/__init__.py", line 13, in <module>
    from play_scraper.api import (
  File "/usr/lib/python2.7/site-packages/play_scraper/api.py", line 11, in <module>
    from play_scraper import scraper
  File "/usr/lib/python2.7/site-packages/play_scraper/scraper.py", line 15, in <module>
    from bs4 import BeautifulSoup, SoupStrainer
  File "/usr/lib/python2.7/site-packages/bs4/__init__.py", line 30, in <module>
    from .builder import builder_registry, ParserRejectedMarkup
  File "/usr/lib/python2.7/site-packages/bs4/builder/__init__.py", line 314, in <module>
    from . import _html5lib
  File "/usr/lib/python2.7/site-packages/bs4/builder/_html5lib.py", line 70, in <module>
    class TreeBuilderForHtml5lib(html5lib.treebuilders._base.TreeBuilder):
AttributeError: 'module' object has no attribute '_base'

suddenly fires error 'App not found (404)' on request

Describe the bug
I implemented this module in my service and it worked well.
Recently, the module has been raising an 'App not found (404)' error for 70~80% of requests.
The strange thing is that the request sometimes succeeds.
The error occurs only on category pages.

To Reproduce
import play_scraper
print(play_scraper.collection(
    collection='TOP_GROSSING',
    category='GAME',
    gl='jp',
    results=10,
    page=0))

Error message
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://play.google.com/store/apps/category/GAME/collection/topgrossing?hl=en&gl=jp

Desktop (please complete the following information):

  • OS: macOS 10.14.6
  • Python Version 3.7.4
  • play_scraper Version 0.5.5

AttributeError: 'NoneType' object has no attribute 'attrs'

File "/home/huma/humascraper/HumaScraper/spiders/store/store.py", line 229, in processGooglePlay data = android.fetchFromGooglePlay(self.fakeName if self.fakeName else response.meta["packageName"]) File "/home/huma/humascraper/HumaScraper/common/util/android.py", line 243, in fetchFromGooglePlay gplay_info = play_scraper.details(package_name) File "/home/huma/humascraper/.env/lib/python3.5/site-packages/play_scraper/api.py", line 22, in details return s.details(app_id) File "/home/huma/humascraper/.env/lib/python3.5/site-packages/play_scraper/scraper.py", line 83, in details app_json = parse_app_details(soup) File "/home/huma/humascraper/.env/lib/python3.5/site-packages/play_scraper/utils.py", line 240, in parse_app_details icon = (soup.select_one('.dQrBL img.ujDFqe') AttributeError: 'NoneType' object has no attribute 'attrs'

I used play-scraper 0.5.4, and I think Google changed the Google Play HTML pages.

Developer field not working for play_scraper.collection() with gl<>'us' and detailed=True

Edit: not a problem with the library; the webpage itself is wrong when using a non-default gl, so I guess there is nothing you can do about it (by the way, thanks for this very useful library). The workaround I used: going through the list of scraped apps a second time using gl='us'.
When using play_scraper.collection() with gl != 'us' (e.g. gl='fr') and detailed=True, the developer field is always 'Google Commerce Ltd'.
The developer field is correct when using the default gl.

server error for developer request

Hi!

Thanks for this package! I experience a problem with the most recent version:

Describe the bug
When using the code from the Github readme for requesting developer info, a 502 server error is raised.

To Reproduce

import play_scraper
print(play_scraper.developer('Disney', results=5))
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-23-86155d48969c> in <module>()
----> 1 dev = play_scraper.developer('Disney', results = 5)

~\Anaconda3\lib\site-packages\play_scraper\api.py in developer(developer, hl, gl, **kwargs)
     53     """
     54     s = scraper.PlayScraper(hl, gl)
---> 55     return s.developer(developer, **kwargs)
     56 
     57 

~\Anaconda3\lib\site-packages\play_scraper\scraper.py in developer(self, developer, results, page, detailed)
    158         url = build_url('developer', developer)
    159         data = generate_post_data(results, 0, pagtok)
--> 160         response = send_request('POST', url, data, self.params)
    161 
    162         if detailed:

~\Anaconda3\lib\site-packages\play_scraper\utils.py in send_request(method, url, data, params, headers, timeout, verify, allow_redirects)
    119             allow_redirects=allow_redirects)
    120         if not response.status_code == requests.codes.ok:
--> 121             response.raise_for_status()
    122     except requests.exceptions.RequestException as e:
    123         log.error(e)

~\Anaconda3\lib\site-packages\requests\models.py in raise_for_status(self)
    938 
    939         if http_error_msg:
--> 940             raise HTTPError(http_error_msg, response=self)
    941 
    942     def close(self):

HTTPError: 502 Server Error: Bad Gateway for url: https://play.google.com/store/apps/developer?id=Disney&hl=en&gl=us

Desktop (please complete the following information):

  • Windows 10
  • Python Version 3.6.
  • play_scraper Version e.g. 0.5.2

Additional context
I think the problem is due to the gl parameter, might also be related to #21.

Items in resulting dict are BeautifulSoup entities

Items in the resulting dict are BeautifulSoup entities, not primitives as a user might expect after looking at the examples.
I'm using this library with multiprocessing, and I found that the results cannot be pickled by multiprocessing's map (RecursionError). I propose converting the values in the resulting dict into true primitives.
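A sketch of such a conversion done on the caller's side, before handing results to multiprocessing: it stringifies any BeautifulSoup node it finds and leaves ints, bools, and plain strings alone.

import play_scraper
from bs4.element import NavigableString, Tag

def to_primitives(value):
    """Recursively replace BeautifulSoup nodes with plain strings so the
    result dict can be pickled."""
    if isinstance(value, dict):
        return {k: to_primitives(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_primitives(v) for v in value]
    if isinstance(value, (Tag, NavigableString)):
        return str(value)
    return value

clean = to_primitives(play_scraper.details('com.android.chrome'))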

problem installing with install-option

Hello,
trying to install the package:
pip install play-scraper --install-option="--prefix=/airflow"

is causing this error:

Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-install-3jPoyb/play-scraper/setup.py", line 10, in
with open('README.md', 'r', 'utf-8') as f:
File "/usr/lib64/python2.7/codecs.py", line 881, in open
file = builtin.open(filename, mode, buffering)
IOError: [Errno 2] No such file or directory: 'README.md'

How can we solve this?

P.S. installing without the option works fine

thanks,
Lorenzo
