Google-analytics-and-search-console

These scripts download data about your websites from Google Analytics and the Google Search Console.

Usage

NewDownloads.py

usage: NewDownloads.py [-h] [-t {image,video,web}] [-d DIMENSIONS] [-n NAME]
                       [-g GOOGLEACCOUNT]
                       start_date end_date

positional arguments:
  start_date            start date in format yyyy-mm-dd or 'yesterday'
                        '7DaysAgo'
  end_date              start date in format yyyy-mm-dd or 'today'

optional arguments:
  -h, --help            show this help message and exit
  -t {image,video,web}, --type {image,video,web}
                        Search types for the returned data, default is web
  -d DIMENSIONS, --dimensions DIMENSIONS
                        The dimensions are the left hand side of the table,
                        default is page. Options are date, query, page,
                        country, device. Combine two by specifying -d
                        page,query
  -n NAME, --name NAME  File name for final output, default is search-console-
                        + the current date. You do NOT need to add file
                        extension
  -g GOOGLEACCOUNT, --googleaccount GOOGLEACCOUNT
                        Name of a google account; does not have to literally
                        be the account name but becomes a token to access that
                        particular set of secrets. Client secrets will have to
                        be in this a file that is this string concatenated
                        with client_secret.json. OR if this is the name of a
                        text file then every line in the text file is
                        processed as one user and all results appended
                        together into a file file

GACombined2.py

This script downloads from Google Analytics, but ONLY from views which are marked as starred/favourite.

usage: GACombined2.py [-h] [-f FILTERS] [-d DIMENSIONS] [-m METRICS] [-n NAME] [-t [TEST]] [-g GOOGLEACCOUNT]
                      start_date end_date

positional arguments:
  start_date            start date in format yyyy-mm-dd or 'yesterday' '7DaysAgo'
  end_date              start date in format yyyy-mm-dd or 'today'

optional arguments:
  -h, --help            show this help message and exit
  -f FILTERS, --filters FILTERS
                        Filter, default is 'ga:pageviews>2'
  -d DIMENSIONS, --dimensions DIMENSIONS
                        The dimensions are the left hand side of the table, default is pagePath. YOU HAVE TO
                        ADD 'ga:' before your dimension
  -m METRICS, --metrics METRICS
                        The metrics are the things on the left, default is pageviews. YOU HAVE TO ADD 'ga:'
                        before your metric
  -n NAME, --name NAME  File name for final output, default is analytics- + the current date. You do NOT need
                        to add file extension.
  -t [TEST], --test [TEST]
                        Test option which makes the script output only n results, default is 3.
  -g GOOGLEACCOUNT, --googleaccount GOOGLEACCOUNT
                        Name of a google account; does not have to literally be the account name but becomes
                        a token to access that particular set of secrets. Client secrets will have to be in
                        this a file that is this string concatenated with client_secret.json. OR if this is
                        the name of a text file then every line in the text file is processed as one user and
                        all results appended together into a file file

pip commands (copy and paste these into the terminal):

pip install argparse datetime win_unicode_console google-api-python-client pandas openpyxl progress oauth2client httplib2 urllib3

You need an OAuth2 account, and clients_secrets.json must be in the same folder as the script. See https://developers.google.com/webmaster-tools/search-console-api-original/v3/quickstart/quickstart-python

If you are using multiple google accounts then for every google account "[email protected]" create a secrets file called [email protected]_secrets.json

For detailed instructions see the file: google-client-secrets-instructions.md

google-analytics-and-search-console's People

Contributors

edwardthelegend, raymondlowe

google-analytics-and-search-console's Issues

Allow from multiple different user IDs

Currently the scripts only allow a single user ID (Google account). On the first run they prompt for the user and the permission, and after that there is no way to switch to a different user ID.

It would be better if you could specify on the command line which Google account to use for this run. It would be even better if you could pass the filename of a text file that contains a list of Google accounts, and have the command run once for each of them. Of course, the first run for a new Google account will have to prompt for permission, but after that it should save the credentials.
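
A minimal sketch of how such an argument could be resolved into a list of accounts. The function names `resolve_accounts` and `secrets_filename` are hypothetical, and the rule "an existing text file means one account per line" is an assumption based on the description above, not the scripts' actual code.

```python
from pathlib import Path
from typing import List


def resolve_accounts(google_account: str) -> List[str]:
    """Return the account tokens to process for one command-line value.

    Assumption: if the value names an existing file, every non-blank line
    is one account; otherwise the value itself is the only account.
    """
    path = Path(google_account)
    if path.is_file():
        lines = path.read_text(encoding="utf-8").splitlines()
        return [line.strip() for line in lines if line.strip()]
    return [google_account]


def secrets_filename(account: str, suffix: str = "client_secret.json") -> str:
    """Hypothetical naming rule: the account token concatenated with a suffix."""
    return f"{account}{suffix}"
```

The caller would then loop over the returned list, run the download once per account, and append the results together.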

TimeoutError: [WinError 10060] A connection attempt failed

Traceback (most recent call last):
  File "\\dell\business\GitHub\Google-analytics-and-search-console\NewDownloads.py", line 92, in <module>
    results = service.searchanalytics().query(
  File "C:\Users\raymo\AppData\Local\Programs\Python\Python38\lib\site-packages\googleapiclient\_helpers.py", line 131, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "C:\Users\raymo\AppData\Local\Programs\Python\Python38\lib\site-packages\googleapiclient\http.py", line 922, in execute
    resp, content = _retry_request(
  File "C:\Users\raymo\AppData\Local\Programs\Python\Python38\lib\site-packages\googleapiclient\http.py", line 221, in _retry_request
    raise exception
  File "C:\Users\raymo\AppData\Local\Programs\Python\Python38\lib\site-packages\googleapiclient\http.py", line 190, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "C:\Users\raymo\AppData\Local\Programs\Python\Python38\lib\site-packages\oauth2client\transport.py", line 173, in new_request
    resp, content = request(orig_request_method, uri, method, body,
  File "C:\Users\raymo\AppData\Local\Programs\Python\Python38\lib\site-packages\oauth2client\transport.py", line 280, in request
    return http_callable(uri, method=method, body=body, headers=headers,
  File "C:\Users\raymo\AppData\Local\Programs\Python\Python38\lib\site-packages\httplib2\__init__.py", line 1985, in request
    (response, content) = self._request(
  File "C:\Users\raymo\AppData\Local\Programs\Python\Python38\lib\site-packages\httplib2\__init__.py", line 1650, in _request
    (response, content) = self._conn_request(
  File "C:\Users\raymo\AppData\Local\Programs\Python\Python38\lib\site-packages\httplib2\__init__.py", line 1589, in _conn_request
    response = conn.getresponse()
  File "C:\Users\raymo\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 1347, in getresponse
    response.begin()
  File "C:\Users\raymo\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "C:\Users\raymo\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "C:\Users\raymo\AppData\Local\Programs\Python\Python38\lib\socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "C:\Users\raymo\AppData\Local\Programs\Python\Python38\lib\ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "C:\Users\raymo\AppData\Local\Programs\Python\Python38\lib\ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
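
One possible mitigation is to retry the request when it times out. The helper below is a generic sketch, not code from the scripts: `call_with_retries` and its parameters are hypothetical names, and it only catches `TimeoutError`, the exception shown in the traceback above. The `execute()` method in google-api-python-client also accepts a `num_retries` argument, which may be a simpler option.

```python
import time


def call_with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on TimeoutError wait and try again, up to `attempts` times.

    The delay doubles after each failure (base_delay, 2x, 4x, ...).
    `sleep` is injectable so the helper can be tested without waiting.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except TimeoutError:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))
```

The query call would be wrapped in a function or lambda and passed as `func`.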

KeyError: "None of [Index

python GACombined2.py -f "ga:pageviews>0" -d ga:page -m ga:adsenseRevenue,ga:pageviews 2019-07-01 2019-07-30
Processing |████████████████████████████████| 112/112
Traceback (most recent call last):
  File "GACombined2.py", line 170, in <module>
    combinedDF[splitMetrics] = combinedDF[splitMetrics].apply(pd.to_numeric)
  File "C:\Users\raymo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\frame.py", line 2934, in __getitem__
    raise_missing=True)
  File "C:\Users\raymo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\indexing.py", line 1354, in _convert_to_indexer
    return self._get_listlike_indexer(obj, axis, **kwargs)[1]
  File "C:\Users\raymo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\indexing.py", line 1161, in _get_listlike_indexer
    raise_missing=raise_missing)
  File "C:\Users\raymo\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\indexing.py", line 1246, in _validate_read_indexer
    key=key, axis=self.obj._get_axis_name(axis)))
KeyError: "None of [Index(['ga:adsenseRevenue', 'ga:pageviews'], dtype='object')] are in the [columns]"

"key1" is a reserved word

Under some circumstances, the output xlsx file has columns named "key1", "key2", etc. This prevents using DAX in Excel pivot tables, as key* is a reserved word. Change the names to something else, like "keys-1".
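
A small sketch of the rename, assuming the column names are available as a list of strings before the file is written. `rename_key_columns` is a hypothetical helper, and the "keys-N" pattern is the one suggested above.

```python
import re

# Matches exactly "key" followed by digits, e.g. "key1" but not "keyword".
_KEY_COLUMN = re.compile(r"^key(\d+)$")


def rename_key_columns(columns):
    """Rename 'key1', 'key2', ... to 'keys-1', 'keys-2', ...; leave others alone."""
    return [_KEY_COLUMN.sub(r"keys-\1", name) for name in columns]
```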

Split the get services function into a module

Both scripts, the GSC and the GA one, use the same get services function based on the sample code from Google.

Split this function out into a separate .py file that can be imported as a module.

Therefore upgrades to this function need only to be done once.

Debug levels

Implement a -d --debug switch that takes a number such as 0, 1, or 3 to make the scripts produce more or less debugging output.
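
A sketch of how a numeric debug switch could map onto the standard `logging` levels. The usage text above already assigns -d to --dimensions in both scripts, so this example registers only the long form --debug; that choice, the level mapping, and the names `build_parser` and `level_for` are all assumptions.

```python
import argparse
import logging


def build_parser():
    """Parser with a numeric --debug option (0 = quiet, higher = more detail)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--debug", type=int, default=0,
                        help="debug level: 0 warnings only, 1 info, 2 or more debug")
    return parser


def level_for(debug):
    """Map the numeric debug level onto a logging level (assumed mapping)."""
    if debug <= 0:
        return logging.WARNING
    if debug == 1:
        return logging.INFO
    return logging.DEBUG
```

A script would then call `logging.basicConfig(level=level_for(args.debug))` once after parsing its arguments.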

"options" sheet for GSC downloader

The NewDownloads.py script, which downloads from Google Search Console, needs to output an "options" sheet that details the options used for that report.

The GA script already does this, so the functionality just needs to be transferred.
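
As an illustration only, the rows of such a sheet could be derived directly from the parsed command-line arguments. `options_rows` is a hypothetical helper; how the GA script actually builds its sheet is not shown here.

```python
import argparse


def options_rows(args):
    """Turn parsed arguments into sorted (option, value) rows for an 'options' sheet."""
    return [(name, str(value)) for name, value in sorted(vars(args).items())]
```

Each row could then be written to a second worksheet next to the report data.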

Number fields are text in xlsx

When opening the xlsx file I find that the number columns are all "Number stored as text". Although this is easy to fix by hand, it has to be done every time, so it would be better if the code did it.
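
One way to approach this is to convert numeric-looking strings before the data is written. The helper below is a sketch under that assumption; `to_number` is a hypothetical name and is not part of the scripts.

```python
def to_number(value):
    """Return an int or float if `value` is a numeric-looking string, else return it unchanged."""
    if not isinstance(value, str):
        return value
    text = value.strip()
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return value
```

Applied to every cell of the metric columns, this leaves paths and queries as text while clicks, impressions and similar values become real numbers.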

Automatically make more informative filenames

Make the automatically generated filenames of the .xlsx reports more informative by including the options given on the command line in the filename.

instead of:

analytics-2020-02-25-13-16-41.xlsx

automatically create a filename with more useful information

analytics-generated-2020-02-25-13-16-41-date-range-2020-01-01-2020-01-31-days-31-filtered-page-views-gt-2-dimension-pagePath-metric-adsenseRevenue.xlsx

This should probably be optional, as not every run will want all those details.
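
A sketch of how such a filename could be assembled. `build_filename` and its parameters are hypothetical, it expects real `date` objects rather than values like 'yesterday', and its filter slug ("pageviews-gt-2") is simpler than the wording in the example above.

```python
from datetime import date, datetime


def build_filename(start, end, filters, dimension, metric, now, prefix="analytics"):
    """Build a descriptive report filename from the report options."""
    days = (end - start).days + 1  # inclusive date range

    def slug(text):
        # Drop the 'ga:' prefix and spell out comparison operators.
        return text.replace("ga:", "").replace(">", "-gt-").replace("<", "-lt-")

    parts = [
        prefix, "generated", now.strftime("%Y-%m-%d-%H-%M-%S"),
        "date-range", start.isoformat(), end.isoformat(),
        "days", str(days),
        "filtered", slug(filters),
        "dimension", slug(dimension),
        "metric", slug(metric),
    ]
    return "-".join(parts) + ".xlsx"
```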

KeyError: 'siteEntry'

Traceback (most recent call last):
  File "\\dell\business\GitHub\Google-analytics-and-search-console\NewDownloads.py", line 77, in <module>
    bar = IncrementalBar('Processing',max=len(profiles['siteEntry']))
KeyError: 'siteEntry'
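
One plausible cause is that the response for an account contains no 'siteEntry' key at all, for example when the account has no sites. Under that assumption, a defensive lookup avoids the crash. `site_entries` is a hypothetical helper, and `profiles` is assumed to be a plain dict as in the traceback above.

```python
def site_entries(profiles):
    """Return the list under 'siteEntry', or an empty list if the key is absent."""
    return profiles.get("siteEntry", [])
```

The progress bar could then use `len(site_entries(profiles))`, and the loop would simply do nothing for an account without sites.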

Multiple metric doesn't work

GACombined2.py -m adsenseRevenue,ga:adsenseAdUnitsViewed yesterday today

gives an error

ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements
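
Note that the first metric in the command above lacks the 'ga:' prefix that the help text requires. Whether that causes the length mismatch is not established here, but normalising the list up front removes one variable. `normalize_metrics` is a hypothetical helper.

```python
def normalize_metrics(metrics, prefix="ga:"):
    """Split a comma-separated metric string and add the prefix where it is missing."""
    names = [m.strip() for m in metrics.split(",") if m.strip()]
    return [m if m.startswith(prefix) else prefix + m for m in names]
```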

Automatically add commonly used columns for GSC output xls

Every time I use the GSC output I create some additional analysis columns. It would save a little time if the script did this automatically:

  • For reports showing "siteUrl" add "root domain" column - "example.com" instead of "https://www.example.com"
  • For reports showing a "key" which is a url show the stripped url - "example.com/path/filename.html" instead of "https://www.example.com/path/filename.html"
  • For reports showing a "query" add "number of words" - "this is a query" = 4

These should be additional columns.
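
The three derived values could be computed along these lines. The function names are hypothetical, the sketch only handles http(s) URLs, and "root domain" is approximated by removing a leading "www." rather than consulting a public suffix list.

```python
from urllib.parse import urlsplit


def _host_without_www(url):
    """Return the host part of a URL with a leading 'www.' removed."""
    host = urlsplit(url).netloc
    return host[4:] if host.startswith("www.") else host


def root_domain(site_url):
    """'https://www.example.com' -> 'example.com' (approximation, see lead-in)."""
    return _host_without_www(site_url)


def stripped_url(url):
    """Drop the scheme and a leading 'www.', keep the path (query string is dropped)."""
    return _host_without_www(url) + urlsplit(url).path


def word_count(query):
    """Number of whitespace-separated words in a search query."""
    return len(query.split())
```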

Error sometimes with multiple metrics

Only sometimes. Maybe if a site doesn't have required fields? Probably ok just to skip that item

GACombined2.py -d ga:source -m ga:sessions,ga:goalCompletionsAll

Traceback (most recent call last):
  File "\\dell\business\GitHub\Google-analytics-and-search-console\GACombined2.py", line 187, in <module>
    combinedDF[splitMetrics] = combinedDF[splitMetrics].apply(pd.to_numeric)
  File "C:\Users\raymo\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\frame.py", line 2912, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
  File "C:\Users\raymo\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexing.py", line 1254, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "C:\Users\raymo\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexing.py", line 1298, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['ga:sessions', 'ga:goalCompletionsAll'], dtype='object')] are in the [columns]"
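
Following the "just skip that item" idea, the script could check which requested metric columns actually exist before indexing. `present_metrics` is a hypothetical helper that works on plain lists of names; in the script, `columns` would presumably come from the combined DataFrame's column index.

```python
def present_metrics(columns, requested):
    """Return the requested metric names that exist in `columns`, in request order."""
    available = set(columns)
    return [name for name in requested if name in available]
```

If the returned list is empty, the numeric conversion can be skipped; otherwise it is applied only to the columns that are present.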

Test-only switch

-t n / --test n where n defaults to 3

To speed up development, have a switch that only processes the first n items.

For every loop, such as the loop through profiles or the loop through accounts, exit the loop after n iterations.
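
A sketch of the switch using `argparse` and `itertools.islice`. The names `build_parser` and `limited` are hypothetical; the optional value with a default of 3 follows the description above.

```python
import argparse
from itertools import islice


def build_parser():
    """Parser with -t/--test: absent -> None, bare flag -> 3, '-t 5' -> 5."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-t", "--test", nargs="?", type=int, const=3, default=None,
                        help="only process the first n items of each loop")
    return parser


def limited(items, n):
    """Yield at most n items, or everything when n is None (test mode off)."""
    return items if n is None else islice(items, n)
```

Each loop would then iterate over `limited(profiles, args.test)` instead of `profiles`.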

Needs installation instructions

The README should specify which versions of Python and which packages are required, preferably as a list of pip commands that can be copied and pasted to set everything up.
