autocomplete's Introduction

Welcome to AutoComplete! AutoComplete aims to suggest completions for a user's input.

Design Details

  • This project uses the Kaggle "All the news" dataset as raw data. The original dataset includes columns such as title, content, year of publication, and URL. The program extracts the title and content columns to build text corpora, which serve as the input to the MapReduce job.

  • After processing, the final output is saved to a MySQL database. The program then uses this data to provide an autocomplete web service, as shown in the demo.

  • How does autocomplete work? Let I denote the user input and F(i) the i-th phrase that can follow I. By comparing how often I F(0), I F(1), ..., I F(n) appear in the corpus, we can find which phrase is most likely to follow I. For example, suppose a recent news corpus contains "president obama", "president trump travel ban", "president trump immigration", and "president trump said". If the user types "president", the next word is more likely to be "trump" (3 occurrences) than "obama" (1 occurrence).
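The counting idea above can be sketched in a few lines of Python. This is a minimal single-machine illustration, not the project's MapReduce implementation: it restricts F(i) to a single following word, assumes lowercasing plus whitespace tokenization, and the function name `rank_following_words` is hypothetical.

```python
from collections import Counter


def rank_following_words(corpus, user_input):
    """Rank the words that directly follow `user_input` by how often they appear.

    Hypothetical helper: lowercasing and whitespace splitting are assumptions,
    not the project's actual tokenizer.
    """
    prefix = user_input.lower().split()
    if not prefix:
        return []
    k = len(prefix)
    counts = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        # Slide a window of k words over the sentence; whenever it matches the
        # input I, the word right after it is one candidate F(i).
        for start in range(len(words) - k):
            if words[start:start + k] == prefix:
                counts[words[start + k]] += 1
    # Highest count first: the most likely next word leads the list.
    return counts.most_common()


corpus = [
    "president obama",
    "president trump travel ban",
    "president trump immigration",
    "president trump said",
]
print(rank_following_words(corpus, "president"))  # [('trump', 3), ('obama', 1)]
```

Counting over a sliding window keeps the idea visible; at the scale of the full news dataset, the same counts are what the Hadoop jobs compute in a distributed way.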

Run Code and Deploy

Database Configurations

Requirements:

Your table should look like this:

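+----------------+---------------+------+-----+---------+-------+
| Field          | Type          | Null | Key | Default | Extra |
+----------------+---------------+------+-----+---------+-------+
| starting_words | varchar(3000) | YES  |     | NULL    |       |
| following_word | varchar(3000) | YES  |     | NULL    |       |
| word_count     | int(11)       | YES  |     | NULL    |       |
+----------------+---------------+------+-----+---------+-------+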
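To illustrate how rows in this table might be queried for suggestions, here is a sketch that uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs without a server. The table name `output`, the function `suggest`, and the sample rows are hypothetical, and the demo's real query may differ. With PyMySQL against the actual MySQL database, the placeholders would be `%s` instead of `?`.

```python
import sqlite3


def create_table(conn):
    # Mirrors the schema above. "output" is a hypothetical table name;
    # SQLite accepts MySQL-style type names such as VARCHAR(3000) and INT(11).
    conn.execute(
        "CREATE TABLE output ("
        "starting_words VARCHAR(3000), "
        "following_word VARCHAR(3000), "
        "word_count INT(11))"
    )


def suggest(conn, user_input, top_k=5):
    """Return up to top_k completions, most frequent following word first."""
    rows = conn.execute(
        "SELECT following_word FROM output "
        "WHERE starting_words = ? "
        "ORDER BY word_count DESC LIMIT ?",
        (user_input.lower(), top_k),
    )
    return [f"{user_input} {word}" for (word,) in rows]


conn = sqlite3.connect(":memory:")
create_table(conn)
conn.executemany(
    "INSERT INTO output VALUES (?, ?, ?)",
    [("president", "trump", 3), ("president", "obama", 1)],
)
print(suggest(conn, "president"))  # ['president trump', 'president obama']
```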

You need to configure the database-related fields in /demo/config.py and model/ModelDriver.java.
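As an illustration only, the database fields in /demo/config.py might be assembled into a Flask-SQLAlchemy connection string as below. `SQLALCHEMY_DATABASE_URI` is the configuration key Flask-SQLAlchemy reads, and the `mysql+pymysql` scheme selects the PyMySQL driver from the package list; every other name and value is a hypothetical placeholder, and the project's actual config file may be organized differently. The same host, port, and database name presumably also go into the JDBC URL that model/ModelDriver.java passes to Hadoop's `DBConfiguration.configureDB`.

```python
from urllib.parse import quote_plus

# Hypothetical placeholder values: replace them with your own MySQL settings.
DB_USER = "root"
DB_PASSWORD = "p@ssword"
DB_HOST = "localhost"
DB_PORT = 3306
DB_NAME = "autocomplete"

# Flask-SQLAlchemy reads the connection string from this key; "mysql+pymysql"
# tells SQLAlchemy to connect through the PyMySQL driver. The password is
# URL-encoded so characters such as "@" do not break the URI.
SQLALCHEMY_DATABASE_URI = (
    f"mysql+pymysql://{DB_USER}:{quote_plus(DB_PASSWORD)}"
    f"@{DB_HOST}:{DB_PORT}/{DB_NAME}"
)

# The same settings as a JDBC URL, the form a Java/Hadoop job would use.
JDBC_URL = f"jdbc:mysql://{DB_HOST}:{DB_PORT}/{DB_NAME}"

print(SQLALCHEMY_DATABASE_URI)
# mysql+pymysql://root:p%40ssword@localhost:3306/autocomplete
```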

Demo

To run the demo code, make sure you have installed Python 3 and the following packages:

flask==1.0.2
pymysql==0.9.2
flask_sqlalchemy==2.3.2
flask_bootstrap==3.3.7.1

Alternatively, you can run the following command to install packages:

$ pip install -r demo/requirements.txt

After all required packages have been installed, run the demo code:

$ python3 demo/run.py

Hadoop MapReduce

Requirements:

$ hadoop com.sun.tools.javac.Main [PATH_TO_MODEL_FOLDER]/*.java
$ jar cf model.jar *.class    
$ hadoop jar [PATH_TO_MODEL_FOLDER]/model.jar ModelDriver [PATH_TO_INPUT_FOLDER] [PATH_TO_OUTPUT_FOLDER] [N] [threshold] [topK]
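Judging from the class names in the compile log in the issue below, the job chains two MapReduce stages: `NGramBuilder` (a `TokenizerMapper` and an `IntSumReducer` that count n-grams) and `LanguageModel` (a `StartFollowMapper` and a `ProbabilityReducer` that write `DBOutput` rows). The Python sketch below simulates both stages in memory to show what `[N]`, `[threshold]`, and `[topK]` plausibly control. These semantics are assumptions, not read from the Java source: phrases of 2 to N words are counted, phrases seen fewer than `threshold` times are dropped, and at most `topK` following words are kept per starting phrase. The function names are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(line):
    # Assumption: lowercase and keep alphabetic runs only.
    return re.findall(r"[a-z]+", line.lower())


def build_ngrams(lines, n):
    """Stage 1 (like NGramBuilder): count every 2..n-word phrase."""
    counts = Counter()
    for line in lines:
        words = tokenize(line)
        for start in range(len(words)):
            # Map step: emit each phrase beginning at `start`, up to n words long.
            for size in range(2, n + 1):
                if start + size > len(words):
                    break
                counts[" ".join(words[start:start + size])] += 1
    # The Counter plays the role of the summing reducer.
    return counts


def build_language_model(ngram_counts, threshold, top_k):
    """Stage 2 (like LanguageModel): turn n-gram counts into table rows."""
    followers = defaultdict(list)
    for ngram, count in ngram_counts.items():
        if count < threshold:  # assumed meaning of [threshold]
            continue
        *starting, following = ngram.split(" ")
        # Map step: key = starting words, value = (following word, count).
        followers[" ".join(starting)].append((following, count))
    rows = []
    for starting_words, candidates in followers.items():
        # Reduce step: keep the top_k most frequent following words.
        candidates.sort(key=lambda pair: (-pair[1], pair[0]))
        for following_word, word_count in candidates[:top_k]:
            rows.append((starting_words, following_word, word_count))
    return rows


corpus = [
    "president obama",
    "president trump travel ban",
    "president trump immigration",
    "president trump said",
]
counts = build_ngrams(corpus, n=3)
print(build_language_model(counts, threshold=2, top_k=5))
# [('president', 'trump', 3)]
```

Each output tuple is (starting_words, following_word, word_count), matching the columns of the MySQL table described above.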

References and Resources

autocomplete's People

Contributors

hk-mp5a3

autocomplete's Issues

Error While running hadoop com.sun.tools.javac.Main model/*.java

`Vigneshwaran, [09.02.21 16:56]
vignesh@vignesh-Vostro-3558:~/Downloads/AutoComplete-master/model$ /home/vignesh/Downloads/hadoop/bin/hadoop com.sun.tools.javac.Main ./*.java
./DBOutput.java:9: error: package org.apache.hadoop.mapreduce.lib.db does not exist
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
^
./DBOutput.java:23: error: cannot find symbol
public class DBOutput implements Writable, DBWritable {
^
symbol: class DBWritable
./LanguageModel.java:8: error: package org.apache.hadoop.mapreduce does not exist
import org.apache.hadoop.mapreduce.Mapper;
^
./LanguageModel.java:9: error: package org.apache.hadoop.mapreduce does not exist
import org.apache.hadoop.mapreduce.Reducer;
^
./LanguageModel.java:34: error: cannot find symbol
extends Mapper < LongWritable, Text, Text, Text > {
^
symbol: class Mapper
location: class LanguageModel
./LanguageModel.java:56: error: cannot find symbol
public void setup(Context context) {
^
symbol: class Context
location: class StartFollowMapper
./LanguageModel.java:61: error: cannot find symbol
public void map(LongWritable key, Text value, Context context)
^
symbol: class Context
location: class StartFollowMapper
./LanguageModel.java:117: error: cannot find symbol
extends Reducer < Text, Text, DBOutput, NullWritable > {
^
symbol: class Reducer
location: class LanguageModel
./LanguageModel.java:129: error: cannot find symbol
public void setup(Context context) {
^
symbol: class Context
location: class ProbabilityReducer
./LanguageModel.java:135: error: cannot find symbol
Context context
^
symbol: class Context
location: class ProbabilityReducer
./ModelDriver.java:8: error: package org.apache.hadoop.mapreduce does not exist
import org.apache.hadoop.mapreduce.Job;
^
./ModelDriver.java:9: error: package org.apache.hadoop.mapreduce.lib.db does not exist
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
^
./ModelDriver.java:10: error: package org.apache.hadoop.mapreduce.lib.db does not exist
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
^
./ModelDriver.java:11: error: package org.apache.hadoop.mapreduce.lib.input does not exist
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
^
./ModelDriver.java:12: error: package org.apache.hadoop.mapreduce.lib.output does not exist
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
^
./NGramBuilder.java:11: error: package org.apache.hadoop.mapreduce does not exist
import org.apache.hadoop.mapreduce.Mapper;
^
./NGramBuilder.java:12: error: package org.apache.hadoop.mapreduce does not exist
import org.apache.hadoop.mapreduce.Reducer;
^
./NGramBuilder.java:36: error: cannot find symbol
extends Mapper < Object, Text, Text, IntWritable > {
^
symbol: class Mapper
location: class NGramBuilder
./NGramBuilder.java:197: error: cannot find symbol
public void setup(Context context) {
^
symbol: class Context
location: class TokenizerMapper
./NGramBuilder.java:202: error: cannot find symbol
public void map(Object key, Text value, Context context)
^
symbol: class Context
location: class TokenizerMapper
./NGramBuilder.java:260: error: cannot find symbol
extends Reducer < Text, IntWritable, Text, IntWritable > {
^
symbol: class Reducer
location: class NGramBuilder
./NGramBuilder.java:265: error: cannot find symbol
Context context
^
symbol: class Context
location: class IntSumReducer
./DBOutput.java:67: error: method does not override or implement a method from a supertype
@Override
^
./DBOutput.java:74: error: method does not override or implement a method from a supertype
@Override
^
./LanguageModel.java:55: error: method does not override or implement a method from a supertype
@Override
^
./LanguageModel.java:128: error: method does not override or implement a method from a supertype
@Override
^
./ModelDriver.java:36: error: cannot find symbol
Job ngramBuilderJob = Job.getInstance(conf1, "N-Gram Builder");
^
symbol: class Job
location: class ModelDriver
./ModelDriver.java:36: error: cannot find symbol
Job ngramBuilderJob = Job.getInstance(conf1, "N-Gram Builder");
^
symbol: variable Job
location: class ModelDriver
./ModelDriver.java:45: error: cannot find symbol
TextInputFormat.setInputPaths(ngramBuilderJob, new Path(args[0]));
^
symbol: variable TextInputFormat
location: class ModelDriver
./ModelDriver.java:46: error: cannot find symbol
TextOutputFormat.setOutputPath(ngramBuilderJob, new Path(args[1]));
^
symbol: variable TextOutputFormat
location: class ModelDriver
./ModelDriver.java:57: error: cannot find symbol
DBConfiguration.configureDB(conf2,
^
symbol: variable DBConfiguration
location: class ModelDriver
./ModelDriver.java:63: error: cannot find symbol
Job languageModelJob = Job.getInstance(conf2, "Language Model");
^
symbol: class Job
location: class ModelDriver
./ModelDriver.java:63: error: cannot find symbol
Job languageModelJob = Job.getInstance(conf2, "Language Model");
^
symbol: variable Job
location: class ModelDriver
./ModelDriver.java:76: error: cannot find symbol
languageModelJob.setInputFormatClass(TextInputFormat.class);
^
symbol: class TextInputFormat
location: class ModelDriver
./ModelDriver.java:77: error: cannot find symbol
languageModelJob.setOutputFormatClass(DBOutputFormat.class);
^
symbol: class DBOutputFormat
location: class ModelDriver
./ModelDriver.java:79: error: cannot find symbol
DBOutputFormat.setOutput(
^
symbol: variable DBOutputFormat
location: class ModelDriver
./ModelDriver.java:88: error: cannot find symbol
TextInputFormat.setInputPaths(languageModelJob, new Path(args[1]));
^
symbol: variable TextInputFormat
location: class ModelDriver
./NGramBuilder.java:196: error: method does not override or implement a method from a supertype
@Override
^
38 errors`
