Google Raw Search Estimator

Provides an estimate of how many people in the United States search for a given keyword each year. The estimate is calculated by comparing the keyword's relative search frequency with that of several US standardized tests (GMAT, GRE, LSAT, MCAT, SAT) and scaling by each test's search population (the number of people who take the exam each year).

Relative search frequency data is from Google Trends via pytrends.

Rationale & Approach

I thought of this project when I discovered Google Trends while studying for the GRE. While useful, Google Trends only provides the relative search frequency for a keyword (relative to itself, or to the other keywords it is being compared with). This means that if you are doing market research for, say, a new product idea such as "dog sweaters", you can only see how search patterns have changed, not how many people are actually searching. I noticed that I had been searching for the GRE relatively often, and thought that, given the number of people who take the GRE and the relative search frequency of "GRE" versus "dog sweaters", you might be able to get a rough estimate of how many people are searching for "dog sweaters".

Example:

People Interested in the GRE = Annual Test Takers = 547,677 people

Google Trends Relative Interest ("GRE") = 75

Google Trends Relative Interest ("dog sweaters") = 3

(People Interested in Dog Sweaters)/(People Interested in the GRE) = (Google Trends Relative Interest ("dog sweaters"))/(Google Trends Relative Interest ("GRE"))

People Interested in Dog Sweaters = (3/75) * (547,677 people)

People Interested in Dog Sweaters = 21,907.08 people
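The proportional calculation above can be written as a small Python function. This is a minimal sketch, not the repository's `raw_search_estimation`: the name `estimate_search_population` and its parameters are hypothetical, and it assumes both Google Trends values come from the same comparison.

```python
def estimate_search_population(keyword_interest, reference_interest, reference_population):
    """Scale a known reference population by the ratio of Google Trends interest values.

    Hypothetical helper for illustration. It assumes both interest values come from
    the same Google Trends comparison and that both groups search equally often.
    """
    if reference_interest <= 0:
        raise ValueError("reference_interest must be positive")
    return keyword_interest / reference_interest * reference_population


# Values from the GRE / "dog sweaters" example above
estimate = estimate_search_population(
    keyword_interest=3,            # relative interest in "dog sweaters"
    reference_interest=75,         # relative interest in "GRE"
    reference_population=547_677,  # annual GRE test takers
)
print(f"People interested in dog sweaters: {estimate:,.2f}")  # 21,907.08
```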

Extending this idea to several standardized tests (GMAT, GRE, LSAT, MCAT, SAT) and the number of people known to take each one annually, I built a simple Python tool to estimate how many people search for a given keyword.
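Extending the calculation to several reference exams could look like the sketch below. The README does not say how the tool combines its references, so averaging the per-exam estimates with a plain mean is an assumption here, as are the function name and input layout. Each exam's interest value is paired with the keyword's interest value from the same Google Trends comparison, because Trends scales every comparison separately.

```python
from statistics import mean


def estimate_from_references(references):
    """Combine single-exam estimates into one number (hypothetical helper).

    `references` maps an exam name to a tuple
    (annual_test_takers, exam_interest, keyword_interest), where both interest
    values come from the same Google Trends comparison.
    Returns (mean_estimate, per_exam_estimates).
    """
    per_exam = {
        exam: keyword_interest / exam_interest * takers
        for exam, (takers, exam_interest, keyword_interest) in references.items()
        if exam_interest > 0  # an exam with zero interest gives no usable ratio
    }
    if not per_exam:
        raise ValueError("no usable reference exams")
    return mean(per_exam.values()), per_exam


# Only the GRE figures appear in this README; add other exams' data in the same shape
average, per_exam = estimate_from_references({"GRE": (547_677, 75, 3)})
print(f"{average:,.2f}", per_exam)
```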

Example

On September 17th, 2020:

# raw_search_estimation() is defined in this repository; import it from the project source first
search_population = raw_search_estimation("unemployment")
print("Estimated Search Population (Annually): " + str(search_population))

Estimated Search Population (Annually): 21460222.81721452

The estimate of roughly 21.5 million people for the year is close to the peak of 20.5 million people unemployed in May 2020, during the COVID-19 crisis, as reported by the Pew Research Center. A single matching data point does not validate the method, but it suggests the tool may be useful for estimating how many people search for a given term, with possible applications in academic and market research.
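The closeness of the two figures can be quantified with a quick calculation. Note that the comparison sets an annual search population against a point-in-time count of unemployed people, so it is a sanity check rather than a like-for-like test.

```python
estimate = 21_460_222.81721452    # tool output on September 17th, 2020 (from the example above)
pew_peak_unemployed = 20_500_000  # Pew Research Center figure for May 2020, as cited above

relative_difference = (estimate - pew_peak_unemployed) / pew_peak_unemployed
print(f"Estimate: {estimate / 1e6:.1f} million")                    # Estimate: 21.5 million
print(f"Difference from reported peak: {relative_difference:.1%}")  # prints 4.7%
```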

Issues & Considerations

"All models are wrong, but some are useful"

Generally, the two variables that contribute to search volume for a given keyword are 1) the number of people who search (population size) and 2) how frequently they search (interest level).

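This approach is inherently limited because it does not account for the interest level of the people searching: it implicitly assumes that people searching for the keyword search about as often as test takers search for their exam.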

The similarity between the estimated number of people searching for "unemployment" this year and the number of people actually unemployed in the United States may simply be a coincidence. On the other hand, one could speculate that the estimate was accurate because the reference queries (the standardized tests) and the query "unemployment" had similarly high interest levels.
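The interest-level problem can be made concrete with a toy model in which search volume is population times searches per person. Google Trends reflects only volume, so if keyword searchers search less often than test takers, the ratio method undercounts them by the ratio of the two search frequencies. All numbers below are hypothetical.

```python
def ratio_estimate(keyword_volume, reference_volume, reference_population):
    # Google Trends only reflects total search volume, so the estimator sees volumes, not people
    return keyword_volume / reference_volume * reference_population


# Hypothetical numbers for illustration only
reference_population = 1_000   # test takers
reference_searches_each = 10   # searches per test taker per year
keyword_population = 1_000     # true number of people searching the keyword
keyword_searches_each = 2      # they search less often than test takers

estimate = ratio_estimate(
    keyword_volume=keyword_population * keyword_searches_each,
    reference_volume=reference_population * reference_searches_each,
    reference_population=reference_population,
)
print(estimate)  # 200.0: off by reference_searches_each / keyword_searches_each = 5x
```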

Future Work

  • Returning a range of values instead of a single value for the estimate
  • Exploring ways to account for and quantify interest levels, possibly by finding additional reference data
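For the first item, one simple option is to report the lowest and highest estimate implied by the individual reference exams rather than a single combined value. This is only a sketch of one possible approach; the helper name and the example numbers are hypothetical.

```python
def estimate_range(per_exam_estimates):
    """Return (low, high) across per-exam estimates (hypothetical helper).

    `per_exam_estimates` maps a reference exam to the search population
    that exam alone implies for the keyword.
    """
    values = list(per_exam_estimates.values())
    if not values:
        raise ValueError("need at least one per-exam estimate")
    return min(values), max(values)


# Hypothetical per-exam estimates, for illustration only
low, high = estimate_range({"EXAM_A": 18_000.0, "EXAM_B": 21_900.0, "EXAM_C": 25_500.0})
print(f"Estimated search population: {low:,.0f} to {high:,.0f}")  # 18,000 to 25,500
```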

Contributors

hansen-han