Comments (4)
It looks as though something unusual is happening as a result of the inverted commas. If you have a look at the Google ngram viewer page linked to below, you'll see the same result as the ngramr code generates.
Note that this chart is case sensitive, so it will not include the variants with the i's capitalised.
I'm not sure exactly what is happening with the ngram chart you've created directly in the Google viewer, but I note that warning messages are displayed, including:
Replaced "international order" with " international order " to match how we processed the books.
Also, the frequencies in the inverted comma chart are far lower (by two orders of magnitude) than in the chart without inverted commas, so it looks as though it's missing a lot of cases. I would therefore suggest that the results you are getting from the ngramr code are in fact more accurate. To ensure that you get a case-insensitive search, you can use the parameters case_ins=TRUE and aggregate=TRUE (without the latter, the case variants are reported separately, so 'international institutions' and 'International institutions', for example, each get their own series).
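To make the idea behind aggregation concrete, here is a minimal base R sketch of summing case variants within each year. The data frame, its column names and its frequencies are invented purely for illustration; this is not real Ngram data and is not necessarily how ngramr or Google combine the variants internally.

```r
# Toy frequencies, invented for illustration only (not real Ngram data).
toy <- data.frame(
  Year      = c(2000, 2000, 2001, 2001),
  Phrase    = c("international institutions", "International institutions",
                "international institutions", "International institutions"),
  Frequency = c(4e-06, 1e-06, 5e-06, 2e-06)
)

# Fold the case variants onto a single lower-case key ...
toy$Key <- tolower(toy$Phrase)

# ... then sum the frequencies for each year and key.
combined <- aggregate(Frequency ~ Year + Key, data = toy, FUN = sum)
print(combined)
```

Without the summing step, the two spellings would remain as separate rows for each year, which is roughly what you see when the variants are not aggregated.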
As an aside, the sample code is a little long-winded and you can instead use something like this:
ggram(c("international order", "international institutions", "international regimes"),
      year_start=1900, year_end=2019, smoothing=2, case_ins=TRUE, aggregate=TRUE)
or
data_long <- ngram(c("international order", "international institutions", "international regimes"),
                   year_start=1900, year_end=2019, smoothing=2, case_ins=TRUE, aggregate=TRUE)
ggram(data_long)
While that doesn't completely clarify what is going on, with any luck this is enough to keep you going. Let me know how you go.
from ngramr.
Let me take a look and get back to you...
from ngramr.
Ah, fantastic, thanks so much. I had no idea that the ngram interface was so fragile. Noted for future reference, and thanks for a really cool and easy-to-use package (easier than the ngram viewer itself, it turns out...).
from ngramr.
No problem. Happy to help! I hadn't realised this particular peculiarity myself. I'm also conscious that the fragility of the interface can translate to fragility of the package since it just scrapes calls to the web page.
This comparison highlights more clearly the difference between searches with and without inverted commas:
from ngramr.