ngrams-viewer's Issues

Implement TimeSeries service

TimeSeries API

A TimeSeries is a special-purpose extension of the existing TreeMap class in which the key type parameter is always Integer and the value type parameter is always Double. Each key corresponds to a year, and each value is a numerical data point for that year.

For example, the following code would create a TimeSeries and associate the year 1992 with the value 3.6 and 1993 with 9.2.

TimeSeries ts = new TimeSeries();
ts.put(1992, 3.6);
ts.put(1993, 9.2);

The TimeSeries class adds the following utility methods to the TreeMap class it extends:

plus(TimeSeries ts) returns the year-wise sum of this TimeSeries and the given ts. If neither TimeSeries contains any years, return an empty TimeSeries. If one TimeSeries contains a year that the other doesn't, the returned TimeSeries should store the value from the TimeSeries that contains that year.

public TimeSeries plus(TimeSeries ts);

dividedBy(TimeSeries ts) returns, for each year in this TimeSeries, the value for that year divided by the value for the same year in ts. It should return a new TimeSeries and must not modify this TimeSeries. If ts is missing a year that exists in this TimeSeries, throw an IllegalArgumentException. If ts has a year that is not in this TimeSeries, ignore it.

public TimeSeries dividedBy(TimeSeries ts);

Note:

  • TimeSeries objects should have no instance variables.
  • You may assume that the dividedBy operation never divides by zero.
  • Several methods require that you compare the data of two TimeSeries. Do not write any code that fills in a zero when a year or value is unavailable.
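
The rules above can be sketched as follows. This is one possible shape for the class, not a reference implementation: it assumes Java 8 or later (for Map.merge), and the exception message text is made up for illustration. The class is declared without the public modifier only so that the snippet compiles from any file name.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch only: TimeSeries declares no instance variables of its own;
// all data lives in the inherited TreeMap.
class TimeSeries extends TreeMap<Integer, Double> {

    // Year-wise sum. A year present in only one series keeps that series' value.
    public TimeSeries plus(TimeSeries ts) {
        TimeSeries sum = new TimeSeries();
        sum.putAll(this);
        for (Map.Entry<Integer, Double> e : ts.entrySet()) {
            // merge stores the value if the year is new, otherwise adds to it,
            // so no zero is ever filled in for a missing year.
            sum.merge(e.getKey(), e.getValue(), Double::sum);
        }
        return sum;
    }

    // Year-wise quotient over the years of this series; this series is not modified.
    public TimeSeries dividedBy(TimeSeries ts) {
        TimeSeries quotient = new TimeSeries();
        for (Map.Entry<Integer, Double> e : this.entrySet()) {
            Double divisor = ts.get(e.getKey());
            if (divisor == null) {
                // Hypothetical message; the issue only specifies the exception type.
                throw new IllegalArgumentException("Missing year: " + e.getKey());
            }
            quotient.put(e.getKey(), e.getValue() / divisor);
        }
        // Years that exist only in ts are never visited, so they are ignored.
        return quotient;
    }
}
```

Both methods build a fresh TimeSeries, so neither operand changes. An empty result falls out naturally when both inputs are empty.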

Related to #1

Implement NGramMap service

NGramMap API

The NGramMap class will provide various convenient methods for interacting with Google’s NGrams dataset.

Input File Formats:

The NGram dataset comes in two different file types. The first type is a “words file”. Each line of a words file provides tab-separated information about the history of a particular word in English during a given year. For example:

Word        Year    Occurrence     Sources
airport     2007    175702         32788
airport     2008    173294         31271
request     2005    646179         81592
request     2006    677820         86967
request     2007    697645         92342
request     2008    795265         125775
wandered    2005    83769          32682
wandered    2006    87688          34647
wandered    2007    108634         40101
wandered    2008    171015         64395
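
As a rough illustration of reading this format, the sketch below groups words-file lines into a per-word history of year to occurrence count. The class name WordsFileSketch and the method load are hypothetical. It also assumes that the data lines carry no header row and that fields are separated by a single tab character; the column header in the listing above is treated as a display aid only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the NGramMap API described in this issue.
class WordsFileSketch {

    // Maps each word to its (year -> occurrence count) history.
    static Map<String, TreeMap<Integer, Double>> load(List<String> lines) {
        Map<String, TreeMap<Integer, Double>> history = new HashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // tolerate blank lines
            }
            // Assumed column order: word, year, occurrences, sources
            String[] fields = line.split("\t");
            String word = fields[0];
            int year = Integer.parseInt(fields[1].trim());
            double occurrences = Double.parseDouble(fields[2].trim());
            // The sources column (fields[3]) is not needed for this sketch.
            history.computeIfAbsent(word, w -> new TreeMap<>()).put(year, occurrences);
        }
        return history;
    }
}
```

Because each per-word history is keyed by year and holds Double values, it has the same shape as a TimeSeries and could be swapped for one.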

The other type of file is a “counts file”. Each line of a counts file provides comma-separated information about the total corpus of data available for each calendar year. For example:

Year, Total words, Total pages, Sources 
1470,    984,         10,         1
1472,    117652,      902,        2
1475,    328918,      1162,       1
1476,    20502,       186,        2
1477,    376341,      2479,       2
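
A counts file can be read in a similar way. The sketch below keeps only the year and the total word count. The names CountsFileSketch and load are hypothetical, and the code assumes the data lines have no header row. Fields are trimmed because the listing above shows padding after the commas, which may or may not be present in the real files.

```java
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper, not part of the NGramMap API described in this issue.
class CountsFileSketch {

    // Maps each year to the total number of words recorded for that year.
    static TreeMap<Integer, Double> load(List<String> lines) {
        TreeMap<Integer, Double> totals = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // tolerate blank lines
            }
            // Assumed column order: year, total words, total pages, sources
            String[] fields = line.split(",");
            int year = Integer.parseInt(fields[0].trim());
            double totalWords = Double.parseDouble(fields[1].trim());
            totals.put(year, totalWords);
        }
        return totals;
    }
}
```

Storing the totals as year to Double means a word's history can later be divided by them year by year, which is the case dividedBy is designed for.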

Related to #1
