

Delbot™

It understands your voice commands, searches news and knowledge sources, and summarizes and reads out content to you.

Check out the demo [video](https://youtu.be/iVmj1gHOF0w).

Chatbots Magazine featured my [Delbot article](https://chatbotsmagazine.com/delbot-nlp-python-bot-1a46d865e38b) in The Top 100 articles on Chatbots Magazine.

How to Run

  1. Install the required packages.
  2. Open a command prompt and navigate to the root folder of the project.
  3. Enter python app.py in the command prompt to launch the web service.
  4. Go to http://localhost:5000 (or whichever IP and port you specified).

Roadmap

  1. Statistical model to determine category such as who, why, what, and when of knowledge questions.
  2. Headlines-only news request.
  3. Better UI!

Index

  1. Introduction
  2. Overview
    1. News
    2. Knowledge
  3. How It Works
    1. News Queries
      1. Parts of speech and tags
      2. Noun chunks
      3. Adpositions? Did you mean prepositions?
      4. Implementation
    2. Knowledge Queries
      1. Parts of speech and tags
      2. Noun chunks
      3. Auxiliary verbs (or their absence)
      4. Implementation
  4. Summarization
  5. Libraries
  6. Web App
  7. Limitations
  8. Conclusion and Future Work
  9. Demo
  10. References and Links

Introduction

Bots remain a hot topic. Everyone is talking about them.

How about building one from scratch? The simple one we will build today will understand and answer questions like:

  • What is the latest news on Star Wars in the New York Times?
  • Who is Donald Trump?
  • Read me the latest on Brexit.
  • What are RDF triples?
  • Who was Joan of Arc?
  • Give me news about the UK government from the Guardian.

Our goal is to code a bot from the ground up and use natural language processing (NLP) while doing so.

In addition, our bot will be voice-enabled and web-based if you complete the web app section as well. The best part is we do not need to do anything fancy for speech recognition and synthesis: we will use a built-in capability of modern web browsers.

Overview

At a high level, we want to be able to understand two broad types of queries. The flowchart below shows the overall flow.

Delbot flow diagram

News

We might ask for news. E.g.:

What is the latest on Fantastic Beasts in the Guardian?

The bot will query the API of the requested news source (New York Times if none is specified) and summarize the results:

[...] Comparing the first Harry Potter film (2001’s Harry Potter and the Philosopher’s Stone) with the last (2011’s Harry Potter and the Deathly Hallows Part Two) is somewhat akin to comparing Bambi with Reservoir Dogs. We first meet him in 1920s New York – almost 60 years before Harry is even born – where he is [...]
(source: https://www.theguardian.com/books/2016/nov/25/jk-rowling-fantastic-beasts-screenplay)

Knowledge

We might ask a knowledge question. E.g.:

What are RDF triples?

And the bot will answer:

A semantic triple, or simply triple, is the atomic data entity in the Resource Description Framework. This format enables knowledge to be represented in a machine-readable way. Particularly, every part of an RDF triple is individually addressable via unique URIs — for example, the second statement above might be represented in RDF as http://example.name#BobSmith12 http://xmlns.com/foaf/0.1/knows http://example.name#JohnDoe34.
(source: https://en.wikipedia.org/wiki/Semantic_triple)

How It Works

We define a simple rule to categorize inputs: if the query contains either of the words news or latest, it is a news query. Otherwise, it is a knowledge query.
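This rule fits in a few lines of Python. A minimal sketch (the function name is illustrative; the project applies the rule inside the QueryAnalyzer class):

```python
# Minimal sketch of the categorization rule described above: a query
# containing "news" or "latest" is a news query, anything else is a
# knowledge query. The function name is illustrative.
def categorize(query):
    words = query.lower().split()
    if "news" in words or "latest" in words:
        return "news"
    return "knowledge"

print(categorize("Read me the latest on Brexit."))  # news
print(categorize("Who was Joan of Arc?"))           # knowledge
```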

The predict function of the QueryAnalyzer class is the main entry point for our bot. It performs the above categorization. It calls other functions to

  1. Extract the query and, if applicable, the source from the input
  2. Make necessary API calls
  3. Summarize lengthy content

Finally, it returns the output and a flag indicating if there was any error.
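Putting it together, the predict flow can be sketched as below. The handler stubs are hypothetical stand-ins for the real news and knowledge pipelines; only the overall shape (categorize, dispatch, return output plus error flag) mirrors the description above.

```python
# Hypothetical sketch of the predict flow. The two handlers are stubs
# standing in for the real news and knowledge pipelines.
def handle_news(query):
    return "news summary for: " + query

def handle_knowledge(query):
    return "knowledge answer for: " + query

def predict(text):
    try:
        words = text.lower().split()
        if "news" in words or "latest" in words:
            return handle_news(text), False
        return handle_knowledge(text), False
    except Exception:
        # The error flag tells the caller something went wrong.
        return "Sorry, something went wrong.", True

print(predict("Read me the latest on Brexit."))
```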

News Queries

We assume the input takes one of the following forms.

What is the latest news on Star Wars in the New York Times?
Read me the latest on Brexit.
Give me news about Marvel Cinematic Universe movies in 2017 from the Guardian.

Parts of speech and tags

Token    POS    TAG
Give     VERB   VB
me       PRON   PRP
the      DET    DT
latest   ADJ    JJS
news     NOUN   NN
on       ADP    IN
Donald   PROPN  NNP
Trump    PROPN  NNP
from     ADP    IN
the      DET    DT
New      PROPN  NNP
York     PROPN  NNP
Times    PROPN  NNP
.        PUNCT  .

Noun chunks

  1. the latest news
  2. Donald Trump
  3. the New York Times

Adpositions? Did you mean prepositions?

There is a pattern in sentences structured as above, and prepositions are key.

The topic of search lies between the first and the last preposition. The requested source comes after the last preposition, at the end of the sentence; in other words, the last noun chunk is the source.

If a source is not specified, as in the second example, everything after the first preposition is assumed to be the topic of search.

Adpositions, simply put, are prepositions and postpositions.

In a head-initial language like English, adpositions usually precede the noun phrase, e.g. characters from the Marvel Cinematic Universe. In a head-final language like Gujarati, adpositions follow the noun phrase; these are postpositions. E.g. માર્વેલ ચલચિત્ર જગતના પાત્રો, which translates word by word to: Marvel Cinematic Universe of characters.
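The preposition rule can be sketched with (token, POS) pairs like those spaCy produces. The function below is a hypothetical simplification for illustration, not the project's actual extraction logic:

```python
# Hypothetical sketch of the preposition rule: the topic sits between the
# first and last adposition (ADP); the source, if any, follows the last one.
def extract_topic_and_source(tagged_tokens):
    adp = [i for i, (_, pos) in enumerate(tagged_tokens) if pos == "ADP"]
    if not adp:
        return None, None
    first, last = adp[0], adp[-1]
    if first == last:
        # Only one preposition: everything after it is the topic, no source.
        topic = [tok for tok, pos in tagged_tokens[first + 1:] if pos != "PUNCT"]
        return " ".join(topic), None
    topic = [tok for tok, _ in tagged_tokens[first + 1:last]]
    source = [tok for tok, pos in tagged_tokens[last + 1:] if pos != "PUNCT"]
    return " ".join(topic), " ".join(source)

tokens = [("Give", "VERB"), ("me", "PRON"), ("the", "DET"), ("latest", "ADJ"),
          ("news", "NOUN"), ("on", "ADP"), ("Donald", "PROPN"), ("Trump", "PROPN"),
          ("from", "ADP"), ("the", "DET"), ("New", "PROPN"), ("York", "PROPN"),
          ("Times", "PROPN"), (".", "PUNCT")]
print(extract_topic_and_source(tokens))  # ('Donald Trump', 'the New York Times')
```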

Implementation

We invoke get_news_tokens from the QueryExtractor class, which extracts the source and the query from the input. Internally, it calls _split_text to extract noun chunks, parts of speech, and the fully parsed text from the input. We lemmatize terms in the query.

Next, we invoke the get_news function with the query on one of the Aggregator classes in media_aggregator.py, chosen based on the source. This returns a list of news articles sent as a response by the news API. We currently support The Guardian API and The New York Times API.

Finally, we pick the first item (by default) from the response list and summarize it using the shorten_news function.

Knowledge Queries

We assume the input takes one of the following forms.

John Deere
Joan of Arc
Who is Donald Trump?
Who was JRR Tolkien?
What is subject predicate object?
Tell me about particle physics.

Parts of speech and tags

Example 1

Token   POS    TAG
What    NOUN   WP
is      VERB   VBZ
an      DET    DT
RDF     PROPN  NNP
triple  NOUN   NN
?       PUNCT  .

Example 2

Token     POS    TAG
Tell      VERB   VB
me        PRON   PRP
about     ADP    IN
he        PRON   PRP
-         PUNCT  HYPH
man       NOUN   NN
and       CONJ   CC
the       DET    DT
masters   NOUN   NNS
of        ADP    IN
the       DET    DT
universe  NOUN   NN
.         PUNCT  .

Noun chunks

Example 1

  1. What
  2. an RDF triple

Example 2

  1. me
  2. he-man
  3. the masters
  4. the universe

Auxiliary verbs (or their absence)

If we find an auxiliary verb, we treat everything after its first occurrence as the query. Thus, in Example 1, the query is RDF triple.

Otherwise, we treat all noun chunks after the first as the query. Thus, in Example 2, the query is he-man the masters the universe.
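This rule can be sketched as follows. The tag set used to detect auxiliaries and the list of dropped words are simplified assumptions for illustration, not the project's exact logic:

```python
# Simplified sketch of the auxiliary-verb rule. Treating VBZ/VBP/VBD as
# auxiliaries and the dropped-word list are assumptions for illustration.
def extract_knowledge_query(tokens, tags, noun_chunks):
    aux_tags = {"VBZ", "VBP", "VBD"}
    drop = {"a", "an", "the", "?", "."}
    for i, tag in enumerate(tags):
        if tag in aux_tags:
            # Everything after the first auxiliary verb is the query.
            kept = [t for t in tokens[i + 1:] if t.lower() not in drop]
            return " ".join(kept)
    # No auxiliary verb: join all noun chunks after the first.
    return " ".join(noun_chunks[1:])

# Example 1: "What is an RDF triple?"
print(extract_knowledge_query(
    ["What", "is", "an", "RDF", "triple", "?"],
    ["WP", "VBZ", "DT", "NNP", "NN", "."],
    ["What", "an RDF triple"]))  # RDF triple

# Example 2: "Tell me about he-man and the masters of the universe."
print(extract_knowledge_query(
    ["Tell", "me", "about", "he-man", "and", "the", "masters", "of", "the", "universe", "."],
    ["VB", "PRP", "IN", "NN", "CC", "DT", "NNS", "IN", "DT", "NN", "."],
    ["me", "he-man", "the masters", "the universe"]))  # he-man the masters the universe
```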

Implementation

We invoke get_knowledge_tokens from the QueryExtractor class, which extracts the query.

We pass this to the get_gkg function, which queries the Wikipedia API through the wikipedia Python package and returns a 5-sentence summary of the top result.

Summarization

I used the FrequencySummarizer class from Text summarization with NLTK. Alternatively, you could use sumy.
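The idea behind frequency-based summarization can be sketched in pure Python. This is a toy version of the approach (no stemming, a tiny ad-hoc stop-word set), not the FrequencySummarizer code itself:

```python
# Toy frequency-based summarizer: score each sentence by the total
# frequency of its words, then keep the top-n sentences in original order.
from collections import Counter

def summarize(text, n=2):
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    stop = {"the", "a", "an", "is", "of", "and", "to", "in"}
    freq = Counter(w for s in sentences
                   for w in s.lower().split() if w not in stop)
    ranked = sorted(sentences,
                    key=lambda s: sum(freq[w] for w in s.lower().split()),
                    reverse=True)
    top = set(ranked[:n])
    return ". ".join(s for s in sentences if s in top) + "."

text = "Spacy parses text. Spacy finds noun chunks. Cats sleep."
print(summarize(text, n=2))  # Spacy parses text. Spacy finds noun chunks.
```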

Libraries

In addition to the packages re, bs4, requests, operator, collections, heapq, string and nltk, we will use the following.

  1. spaCy: Set it up as described in the spaCy installation docs. spaCy helps us do some quick NLP. We could use NLTK, but spaCy gets you going faster, which is why we use it in this project.

  2. Wikipedia: This helps query the Wikipedia API. You can read the docs of the wikipedia Python package here.

  3. Summarizer: The one I used was borrowed from The Glowing Python blog written by JustGlowing. It summarizes lengthy content. Alternatively, you could use sumy.

  4. Flask-RESTful, Flask (Optional): These are for building a web app and operationalizing our bot through a RESTful web service.

Web App (Optional)

We add a cool webpage from which you can fire off voice queries and have the browser read out the response content. We make use of the Web Speech API for this.

Web Service

We get our Flask-based REST web service up and running in under 20 lines of code. The QueryService class handles requests.

As of now, we only need one service call to send input from our web app to our bot. This is done through the post function of the QueryService class. post, in turn, calls the predict function, which is the main entry point as mentioned above.
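A minimal sketch of this idea using plain Flask is shown below. The project itself uses Flask-RESTful's QueryService; the route name and the stubbed predict function here are illustrative, not the project's actual code:

```python
# Minimal sketch of the web service idea using plain Flask. The project
# itself uses a Flask-RESTful QueryService; the /query route and the
# stubbed predict() below are illustrative, not the project's actual code.
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(query):
    # Stub standing in for the bot's predict: returns (output, error_flag).
    return "You asked: " + query, False

@app.route("/query", methods=["POST"])
def query():
    text = request.get_json(force=True).get("query", "")
    output, error = predict(text)
    return jsonify({"output": output, "error": error})

if __name__ == "__main__":
    app.run(port=5000)
```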

Web Site

I built a basic webpage to demonstrate the bot. It uses the Web Speech API to receive voice input and read out content. You can find the index.html file in the templates folder. Make sure you have installed all the required packages and libraries, and that the web service is up and running before you open the website.

Limitations

Our simple bot understands a limited range of requests. It cannot understand requests such as the following.

  1. Knowledge requests with a different structure
    Explain to me what bootstrap aggregation is.
    Tell me something about computational neuroscience.

  2. News requests with a different structure
    What does the New York Times say about Roger Federer's latest match?
    What's happening in the world of tennis?

  3. Knowledge requests of other types
    How is cheese made?
    Where was JK Rowling born?
    Can we build a sky city on Venus?
    When did the French Revolution take place?
    Why does Jupiter have The Great Red Spot?

  4. Follow-up questions and context
    Explain to me what bootstrap aggregation is.
    and then: How does it relate to random forests?

Understanding what "it" refers to in the follow-up question falls under what is known as anaphora resolution. It is all a part of understanding context. Different words mean different things in different contexts. While humans have a nuanced understanding of these, it is significantly more difficult to teach machines the same.

Conclusion and Future Work

We achieved our goal of building a bot based on some rules we defined. We also made use of some NLP techniques. Finally, we deployed our bot onto a web application. However, our bot is limited in the kinds of queries it can understand and answer. Why is its scope of understanding so narrow?

In general, making computers really understand language is an AI-hard problem. There is a field known as NLU (Natural Language Understanding) within NLP dedicated to this.

We could implement a machine learning-based solution so our bot could potentially understand a much wider range of requests.

References and Links

  1. Alphabetical list of part-of-speech tags used in the Penn Treebank Project
  2. Stanford typed dependencies manual
  3. Wikipedia articles
    1. Head-directionality parameter
    2. AI-hard
    3. NLU (Natural Language Understanding)
    4. anaphora resolution
    5. prepositions and postpositions
    6. head-initial
  4. Web Speech API
  5. Text summarization with NLTK
  6. New York Times Developer API
  7. The Guardian Open Platform
  8. Quora thread: What makes natural language processing difficult?

Please make sure to read the terms of use of the APIs used here.



delbot's People

Contributors

shaildeliwala

delbot's Issues

murmurhash.mrmr does not export expected C function hash128_x86

I tried to run app.py, but I had a problem importing QueryService, which depends on spaCy. It says:

(.delbot_env) mike@mike-thinks:~/Programming/delbot$ python2 app.py 
Traceback (most recent call last):
  File "app.py", line 20, in <module>
    from resources.query_service import QueryService
    ...
        from .optimizers import Adam, SGD, linear_decay
      File "optimizers.pyx", line 13, in init thinc.neural.optimizers
      File "ops.pyx", line 1, in init thinc.neural.ops
    ImportError: murmurhash.mrmr does not export expected C function hash128_x86

Maybe it is because we're missing the requirements?

(delbot_env) mike@mike-thinks:~/Programming/delbot$ python2 app.py 
Traceback (most recent call last):
  File "app.py", line 20, in <module>
    from resources.query_service import QueryService
  File "/home/mike/Programming/delbot/resources/query_service.py", line 22, in <module>
    import query_extractor as _qe
  File "/home/mike/Programming/delbot/query_extractor.py", line 20, in <module>
    import spacy as _s
  File "/usr/local/lib/python2.7/dist-packages/spacy/__init__.py", line 4, in <module>
    from .cli.info import info as cli_info
  File "/usr/local/lib/python2.7/dist-packages/spacy/cli/__init__.py", line 1, in <module>
    from .download import download
  File "/usr/local/lib/python2.7/dist-packages/spacy/cli/download.py", line 10, in <module>
    from .link import link
  File "/usr/local/lib/python2.7/dist-packages/spacy/cli/link.py", line 8, in <module>
    from ..compat import symlink_to, path2str
  File "/usr/local/lib/python2.7/dist-packages/spacy/compat.py", line 9, in <module>
    from thinc.neural.util import copy_array
  File "/usr/local/lib/python2.7/dist-packages/thinc/neural/__init__.py", line 1, in <module>
    from ._classes.model import Model
  File "/usr/local/lib/python2.7/dist-packages/thinc/neural/_classes/model.py", line 12, in <module>
    from ..train import Trainer
  File "/usr/local/lib/python2.7/dist-packages/thinc/neural/train.py", line 3, in <module>
    from .optimizers import Adam, SGD, linear_decay
  File "optimizers.pyx", line 13, in init thinc.neural.optimizers
  File "ops.pyx", line 1, in init thinc.neural.ops
ImportError: murmurhash.mrmr does not export expected C function hash128_x86

versions:

OS : Linux 16.04

Python: 2.7.12

I got the same error as above when trying to get information about spaCy:

(.delbot_env) mike@mike-thinks:~/Programming/delbot$ python2 -m spacy info
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 163, in _run_module_as_main
    mod_name, _Error)
  File "/usr/lib/python2.7/runpy.py", line 111, in _get_module_details
    __import__(mod_name)  # Do not catch exceptions initializing package
  File "/usr/local/lib/python2.7/dist-packages/spacy/__init__.py", line 4, in <module>
    from .cli.info import info as cli_info
  File "/usr/local/lib/python2.7/dist-packages/spacy/cli/__init__.py", line 1, in <module>
    from .download import download
  File "/usr/local/lib/python2.7/dist-packages/spacy/cli/download.py", line 10, in <module>
    from .link import link
  File "/usr/local/lib/python2.7/dist-packages/spacy/cli/link.py", line 8, in <module>
    from ..compat import symlink_to, path2str
  File "/usr/local/lib/python2.7/dist-packages/spacy/compat.py", line 9, in <module>
    from thinc.neural.util import copy_array
  File "/usr/local/lib/python2.7/dist-packages/thinc/neural/__init__.py", line 1, in <module>
    from ._classes.model import Model
  File "/usr/local/lib/python2.7/dist-packages/thinc/neural/_classes/model.py", line 12, in <module>
    from ..train import Trainer
  File "/usr/local/lib/python2.7/dist-packages/thinc/neural/train.py", line 3, in <module>
    from .optimizers import Adam, SGD, linear_decay
  File "optimizers.pyx", line 13, in init thinc.neural.optimizers
  File "ops.pyx", line 1, in init thinc.neural.ops
ImportError: murmurhash.mrmr does not export expected C function hash128_x86

can't install resources.queryservices

Traceback (most recent call last):
  File "app.py", line 20, in <module>
    from resources.query_service import QueryService
  File "C:\Users\RAMESH\Desktop\project\delbot-master\delbot-master\resources\query_service.py", line 22, in <module>
    import query_extractor as _qe
  File "C:\Users\RAMESH\Desktop\project\delbot-master\delbot-master\query_extractor.py", line 25, in <module>
    _nlp = _s.load('en')
  File "C:\Users\RAMESH\Anaconda3\envs\py2\lib\site-packages\spacy\__init__.py", line 21, in load
    return util.load_model(name, **overrides)
  File "C:\Users\RAMESH\Anaconda3\envs\py2\lib\site-packages\spacy\util.py", line 119, in load_model
    raise IOError(Errors.E050.format(name=name))
IOError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
