s-bose7 / ngrams-viewer

Exploring the history of word usage in English texts with a weighted popularity history plot.

Java 76.84% CSS 4.17% HTML 9.91% JavaScript 9.08%

n-grams popularity-analysis text-corpus

ngrams-viewer's Issues

TimeSeries API

A TimeSeries is a special-purpose extension of the existing TreeMap class in which the key type parameter is always Integer and the value type parameter is always Double. Each key corresponds to a year, and each value is a numerical data point for that year.

For example, the following code would create a TimeSeries and associate the year 1992 with the value 3.6 and 1993 with 9.2.

TimeSeries ts = new TimeSeries();
ts.put(1992, 3.6);
ts.put(1993, 9.2);

The TimeSeries class adds a few utility methods to the TreeMap class that it extends:

plus(TimeSeries ts) returns the year-wise sum of this TimeSeries and the given ts. If neither TimeSeries contains any years, return an empty TimeSeries. If one TimeSeries contains a year that the other doesn't, the returned TimeSeries should store the value from the TimeSeries that contains that year.

public TimeSeries plus(TimeSeries ts);
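
One way the plus contract above could be met is sketched below. This is a minimal illustration, not the repository's actual implementation: it assumes TimeSeries extends TreeMap<Integer, Double>, and it relies on Map.merge so that a year present in only one series keeps its value without any zero being filled in.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: a TimeSeries with no instance variables and a plus method. */
class TimeSeries extends TreeMap<Integer, Double> {

    /** Returns a new TimeSeries holding the year-wise sum of this series and ts. */
    public TimeSeries plus(TimeSeries ts) {
        TimeSeries sum = new TimeSeries();
        // Copy this series first, so years found only here keep their value.
        sum.putAll(this);
        // merge() adds to an existing year, or stores the value if the year is new.
        for (Map.Entry<Integer, Double> entry : ts.entrySet()) {
            sum.merge(entry.getKey(), entry.getValue(), Double::sum);
        }
        return sum;
    }
}
```

Because the result is built in a fresh object, neither operand is modified, and adding two empty series simply yields an empty series.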

dividedBy(TimeSeries ts) returns the year-wise quotient: for each year in this TimeSeries, its value divided by the value for the same year in ts. It should return a new TimeSeries and must not modify this TimeSeries. If ts is missing a year that exists in this TimeSeries, throw an IllegalArgumentException. If ts has a year that is not in this TimeSeries, ignore it.

public TimeSeries dividedBy(TimeSeries ts);
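
The dividedBy rules can be sketched in the same spirit. Again this is only an assumed shape for the class (TimeSeries extending TreeMap<Integer, Double>), not the project's confirmed code; the exception message is illustrative.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: a TimeSeries with no instance variables and a dividedBy method. */
class TimeSeries extends TreeMap<Integer, Double> {

    /** Returns a new TimeSeries of this series divided year by year by ts. */
    public TimeSeries dividedBy(TimeSeries ts) {
        TimeSeries quotient = new TimeSeries();
        // Only the years of this series matter; extra years in ts are ignored.
        for (Map.Entry<Integer, Double> entry : entrySet()) {
            Integer year = entry.getKey();
            if (!ts.containsKey(year)) {
                // The divisor must cover every year of this series.
                throw new IllegalArgumentException("Divisor has no value for year " + year);
            }
            quotient.put(year, entry.getValue() / ts.get(year));
        }
        return quotient;
    }
}
```

Throwing on a missing year, rather than substituting a default, keeps the sketch consistent with the requirement that no zero is filled in for unavailable data.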

Note:

TimeSeries objects should have no instance variables.
You may assume that the dividedBy operation never divides by zero.
Several methods require that you compare the data of two TimeSeries. Do not write any code that fills in a zero when a year or value is unavailable.

Related to #1

Allowing users to visualize the relative historical popularity of words

Build a version of this tool that handles only 1grams, that is, individual words. The Google Ngrams Viewer, by contrast, also handles 2grams and 3grams, so it can visualise phrases as well as words.
Use a small subset (around 300 megabytes) of the full 1grams dataset. The whole dataset can be found here.

Implement NgramMap service

NgramMap API

The NGramMap class will provide various convenient methods for interacting with Google’s NGrams dataset.

Input File Formats:

The NGram dataset comes in two different file types. The first type is a “words file”. Each line of a words file provides tab-separated information about the history of a particular word in English during a given year. For example:

Word        Year    Occurrence     Sources
airport     2007    175702         32788
airport     2008    173294         31271
request     2005    646179         81592
request     2006    677820         86967
request     2007    697645         92342
request     2008    795265         125775
wandered    2005    83769          32682
wandered    2006    87688          34647
wandered    2007    108634         40101
wandered    2008    171015         64395
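
As a rough illustration of how such lines might be read, the sketch below turns words-file lines into a per-word history of year to occurrence count. The class and method names (WordsFileSketch, parse) are hypothetical, plain TreeMap stands in for TimeSeries to keep the example self-contained, and it assumes every input line is a data line with four tab-separated fields (a header row, if the real file has one, would need to be skipped first).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: hypothetical helper for reading words-file lines. */
class WordsFileSketch {

    /** Builds word -> (year -> occurrence count) from tab-separated lines. */
    static Map<String, TreeMap<Integer, Double>> parse(List<String> lines) {
        Map<String, TreeMap<Integer, Double>> history = new HashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // tolerate blank lines
            }
            // Fields: word, year, occurrence, sources (sources is unused here).
            String[] fields = line.split("\t");
            String word = fields[0];
            int year = Integer.parseInt(fields[1].trim());
            double occurrences = Double.parseDouble(fields[2].trim());
            history.computeIfAbsent(word, k -> new TreeMap<>()).put(year, occurrences);
        }
        return history;
    }
}
```

Keeping one sorted map per word means the years for any word come back in ascending order, which suits plotting a history.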

The other type of file is a “counts file”. Each line of a counts file provides comma-separated information about the total corpus of data available for each calendar year. For example:

Year, Total words, Total pages, Sources 
1470,    984,         10,         1
1472,    117652,      902,        2
1475,    328918,      1162,       1
1476,    20502,       186,        2
1477,    376341,      2479,       2
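
A counts file could be read along similar lines. The sketch below keeps only the year and the total word count; the names CountsFileSketch and parseTotals are hypothetical, and it assumes every input line is a data line (no header row) whose fields may carry padding spaces after the commas.

```java
import java.util.List;
import java.util.TreeMap;

/** Sketch only: hypothetical helper for reading counts-file lines. */
class CountsFileSketch {

    /** Builds year -> total words from comma-separated lines. */
    static TreeMap<Integer, Double> parseTotals(List<String> lines) {
        TreeMap<Integer, Double> totals = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // tolerate blank lines
            }
            // Fields: year, total words, total pages, sources.
            String[] fields = line.split(",");
            int year = Integer.parseInt(fields[0].trim());
            double totalWords = Double.parseDouble(fields[1].trim());
            totals.put(year, totalWords);
        }
        return totals;
    }
}
```

The resulting year-to-total map has the same shape as a TimeSeries, so it could serve as the divisor when a word's yearly counts are normalised by corpus size.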

Related to #1

Implement HistoryHandler

Related to #1

Implement HistoryTextHandler

Implement TimeSeries service
