The task has three parts:
- data collection
- data exploration/algorithm development
- prediction
In teams, collect at least 300 pages across 3 categories using the Wikipedia API and load these pages into a Postgres database.
You must build a Python script that (see the sketch after this list):
- will be run via a command line argument
  - e.g. `./download #ARGS#`
- can take a filename from which it will read categories
  - e.g. `./download categories.yml`
  - here `categories.yml` would look like:

    ```yaml
    categories:
      - Machine_learning
      - Business_software
    ```
- can take a category as an argument
  - e.g. `./download Machine_learning`
- loads the returned pages into our shared Postgres database
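Here is a minimal sketch of such a download script, assuming the MediaWiki `categorymembers` and `extracts` endpoints and the `requests`, `PyYAML`, and `psycopg2` packages; the connection string and the `pages` table layout are illustrative assumptions, not part of the assignment:

```python
#!/usr/bin/env python
"""Hypothetical download script: pulls category members from the MediaWiki
API and loads their plain text into Postgres. DSN and schema are assumed."""
import sys
import psycopg2
import requests
import yaml

API = "https://en.wikipedia.org/w/api.php"
DSN = "dbname=wiki user=shared"  # assumed shared-database connection string

def category_pages(category):
    """Yield (title, plain_text) for every page in a Wikipedia category."""
    listing = requests.get(API, params={
        "action": "query", "format": "json",
        "list": "categorymembers", "cmtype": "page",  # pages only, no subcats
        "cmtitle": f"Category:{category}", "cmlimit": "500",
    }).json()
    for member in listing["query"]["categorymembers"]:
        body = requests.get(API, params={
            "action": "query", "format": "json",
            "prop": "extracts", "explaintext": 1,
            "titles": member["title"],
        }).json()
        page = next(iter(body["query"]["pages"].values()))
        yield member["title"], page.get("extract", "")

def main():
    arg = sys.argv[1]
    if arg.endswith((".yml", ".yaml")):   # filename mode
        with open(arg) as f:
            categories = yaml.safe_load(f)["categories"]
    else:                                 # single-category mode
        categories = [arg]
    conn = psycopg2.connect(DSN)
    cur = conn.cursor()
    cur.execute("""CREATE TABLE IF NOT EXISTS pages
                   (title TEXT PRIMARY KEY, category TEXT, body TEXT)""")
    for cat in categories:
        for title, text in category_pages(cat):
            cur.execute("INSERT INTO pages VALUES (%s, %s, %s)"
                        " ON CONFLICT (title) DO NOTHING", (title, cat, text))
    conn.commit()

if __name__ == "__main__":
    main()
```

Both invocation styles from the list then work: `./download categories.yml` reads categories from the YAML file, while `./download Machine_learning` treats the argument as a single category.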
Individually, perform a search over the data we collected.
You must build a Python script that (see the sketch after this list):
- returns a text snippet from each of the top five related articles to a search query
  - a query could be any string of words
  - e.g. `./search top principal component analysis`
- returns the full text from the top related article with related words colored in red
  - e.g. `./search full principal component analysis`
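One reasonable way to rank articles is TF-IDF with cosine similarity. The sketch below assumes scikit-learn, reuses the hypothetical `pages` table and DSN from the download sketch, and uses ANSI escape codes to stand in for "colored in red" (they only render in a terminal):

```python
#!/usr/bin/env python
"""Hypothetical search script: ranks stored pages against a query by
TF-IDF cosine similarity. Table and DSN carry over as assumptions."""
import re
import sys
import psycopg2
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

RED = "\033[31m{}\033[0m"  # ANSI escape for red terminal text

def main():
    mode, query = sys.argv[1], " ".join(sys.argv[2:])
    cur = psycopg2.connect("dbname=wiki user=shared").cursor()  # assumed DSN
    cur.execute("SELECT title, body FROM pages")
    titles, bodies = zip(*cur.fetchall())

    # Score every article against the query and sort best-first.
    vectorizer = TfidfVectorizer(stop_words="english")
    doc_matrix = vectorizer.fit_transform(bodies)
    scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    ranked = scores.argsort()[::-1]

    if mode == "top":
        for i in ranked[:5]:  # a snippet from each of the top five articles
            print(f"{titles[i]}: {bodies[i][:200]}...")
    elif mode == "full":
        best = bodies[ranked[0]]
        # Color each query word red wherever it appears, case-insensitively.
        for word in query.split():
            best = re.sub(f"(?i)({re.escape(word)})", RED.format(r"\1"), best)
        print(titles[ranked[0]], best, sep="\n\n")

if __name__ == "__main__":
    main()
```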
Build a predictive model over your data. When a new article comes along, you must be able to predict the category into which that article should fall.
This section will have two scripts (a combined sketch follows the list):
- a training script, `./train-model`, that will train a predictive model over your dataset
- a prediction script that takes as argument an article from Wikipedia
  - e.g.

    ```
    $ ./predict Random_forest
    Predicted Category: Machine_learning
    Confidence: 0.9
    ```
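A minimal sketch of both scripts in a single file, assuming a TF-IDF plus multinomial naive Bayes pipeline from scikit-learn; the model path, DSN, and table layout carry over from the earlier sketches as assumptions:

```python
#!/usr/bin/env python
"""Hypothetical train/predict sketch: train-model fits a text classifier
on the shared table; predict fetches one article and reports the most
probable category. In practice these would be two separate scripts."""
import pickle
import sys
import psycopg2
import requests
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

MODEL_PATH = "model.pkl"  # assumed location for the pickled model

def train():
    cur = psycopg2.connect("dbname=wiki user=shared").cursor()  # assumed DSN
    cur.execute("SELECT body, category FROM pages")
    bodies, categories = zip(*cur.fetchall())
    model = make_pipeline(TfidfVectorizer(stop_words="english"),
                          MultinomialNB())
    model.fit(bodies, categories)
    with open(MODEL_PATH, "wb") as f:
        pickle.dump(model, f)

def predict(title):
    # Fetch plain text for the new article from the MediaWiki API.
    r = requests.get("https://en.wikipedia.org/w/api.php", params={
        "action": "query", "format": "json",
        "prop": "extracts", "explaintext": 1, "titles": title,
    }).json()
    text = next(iter(r["query"]["pages"].values())).get("extract", "")
    with open(MODEL_PATH, "rb") as f:
        model = pickle.load(f)
    probs = model.predict_proba([text])[0]
    best = probs.argmax()
    print(f"Predicted Category: {model.classes_[best]}")
    print(f"Confidence: {probs[best]:.2f}")

if __name__ == "__main__":
    # Dispatch on the invoked name so one file can stand in for both scripts.
    train() if sys.argv[0].endswith("train-model") else predict(sys.argv[1])
```

Naive Bayes is only one reasonable baseline; any classifier exposing `predict_proba` would supply the confidence figure in the same way.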