calpoly-csai / api
Official API for the NIMBUS Voice Assistant, accessible via the HTTP REST protocol.
Home Page: https://nimbus.api.calpolycsai.com/
License: GNU General Public License v3.0
Add API to store scraped faculty/professors following the model of #53.
Scraped faculty/professors are saved in the database.
save_faculty() function in database_wrapper to build a SQLAlchemy Faculty object and save it to the database.

Create an SQLAlchemy object to store QuestionAnswerPairs
A file in the Entities folder that defines the QuestionAnswerPairs object.
database_wrapper.py that allows save_qa_pair and get_all_qa_pairs.
Create a NimbusDatabase function to handle statistical/aggregation questions like the following...
How many sections of CSC 480 are offered this quarter?
How many teachers are interested in Artificial Intelligence?
Examples of aggregations
Commit code to the NimbusMySQLAlchemy
class in database_wrapper.py
that can generally answer any aggregation question
>>> from database_wrapper import NimbusMySQLAlchemy
>>> db = NimbusMySQLAlchemy()
initialized database session
initialized NimbusMySQLAlchemy
NimbusMySQLAlchemy closed
>>> db.session.query(db.Courses).count()
178
>>> from database_wrapper import NimbusMySQLAlchemy
>>> db = NimbusMySQLAlchemy()
initialized database session
initialized NimbusMySQLAlchemy
NimbusMySQLAlchemy closed
>>> db.session.query(db.Courses).distinct().count()
178
What is the department, courseNum, units of any course with the most units?
>>> db.session.query(db.Courses.dept).add_column(db.Courses.courseNum).add_column(db.Courses.units).distinct().order_by(db.Courses.units.desc()).first()
('CPE', 494, '6')
What is the department, courseNum, units of any course with the least units?
>>> db.session.query(db.Courses.dept).add_column(db.Courses.courseNum).add_column(db.Courses.units).distinct().order_by(db.Courses.units.asc()).first()
('CPE', 100, '1')
>>> db.session.query(db.Sections.section_name).add_column(db.Sections.instructor).filter(db.Sections.section_name.contains("480")).all()
[('CSC 480_06', 'Kauffman, Daniel Alexander')]
>>> db.session.query(db.Sections.section_name).add_column(db.Sections.instructor).filter(db.Sections.section_name.contains("480")).count()
1
➜ /nimbus git:(mf-pipenv) ✗ vim nimbus.py
➜ /nimbus git:(mf-pipenv) ✗ python nimbus.py
initialized database session
initialized NimbusMySQLAlchemy
[1] 959 segmentation fault python nimbus.py
➜ /nimbus git:(mf-pipenv) ✗ cat nimbus.py | head
from QA import create_qa_mapping, generate_fact_QA
from nimbus_nlp.NIMBUS_NLP import NIMBUS_NLP
➜ /nimbus git:(mf-pipenv) ✗ cat nimbus.py | head
import faulthandler; faulthandler.enable()
from QA import create_qa_mapping, generate_fact_QA
python -Xfaulthandler nimbus.py
No segfault
➜ /nimbus git:(mf-pipenv) ✗ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
VERSION_CODENAME=stretch
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
➜ /nimbus git:(mf-pipenv) ✗ uname -r
4.19.76-linuxkit
➜ /nimbus git:(mf-pipenv) ✗ cat nimbus_nlp/NIMBUS_NLP.py | head
import os
import json
from google.api_core.client_options import ClientOptions
from google.cloud import automl_v1
# Temporary import for the classifier
from nimbus_nlp.question_classifier import QuestionClassifier
➜ /nimbus git:(mf-pipenv) ✗ cat nimbus_nlp/question_classifier.py | head
import re
import nltk
import spacy
import numpy as np
import sklearn.neighbors
import pandas as pd
import json
from nimbus_nlp.save_and_load_model import save_model, load_latest_model, PROJECT_DIR
# TODO: move the functionality in this module into class(es), so that it can be more easily used as a dependency
Full Stacktrace
➜ /nimbus git:(mf-pipenv) ✗ python -Xfaulthandler nimbus.py
initialized database session
initialized NimbusMySQLAlchemy
Fatal Python error: Segmentation fault
Current thread 0x00007fcd99ff3400 (most recent call first):
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 922 in create_module
File "<frozen importlib._bootstrap>", line 571 in module_from_spec
File "<frozen importlib._bootstrap>", line 658 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1023 in _handle_fromlist
File "/usr/local/lib/python3.6/site-packages/grpc/__init__.py", line 23 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "/usr/local/lib/python3.6/site-packages/google/api_core/gapic_v1/config.py", line 23 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1023 in _handle_fromlist
File "/usr/local/lib/python3.6/site-packages/google/api_core/gapic_v1/__init__.py", line 16 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 941 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "/usr/local/lib/python3.6/site-packages/google/cloud/automl_v1/gapic/auto_ml_client.py", line 25 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1023 in _handle_fromlist
File "/usr/local/lib/python3.6/site-packages/google/cloud/automl_v1/__init__.py", line 23 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1023 in _handle_fromlist
File "/nimbus/nimbus_nlp/NIMBUS_NLP.py", line 6 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "nimbus.py", line 4 in <module>
[1] 825 segmentation fault python -Xfaulthandler nimbus.py
Add API to store scraped clubs following the model of #53.
Scraped clubs are saved in the database.
save_club() function in database_wrapper to build a SQLAlchemy Club object and save it to the database.

Get to MVP with some confidence that our API endpoints work
1 test case per endpoint (@app.route(...)) within flask_api
Add API to store scraped RasperryPiLogs following the model of #53.
Useful comment: #53 (comment)
Also see #47, which used @app.route to make a REST endpoint that the RaspberryPi can talk to via a simple import requests.
Scraped RasperryPiLogs are saved in the database.
save_rpi_logs() function in database_wrapper to build a SQLAlchemy RasperryPiLogs object and save it to the database.

Make functionality which will give the days and times for a course during a given quarter.
Returns a list of tuples of the days the course happens on and the times it will happen.
Add API to store scraped schedules following the model of #53.
Scraped schedules are saved in the database.
save_schedule() function in database_wrapper to build a SQLAlchemy Schedules object and save it to the database.

What's an entity?
What's a relation?
The API will be receiving audio from a bunch of devices with different sampling rates. We need to resample the audio so that the MFCCs are in a consistent format.
Function: resample_audio()
Description: Resample the audio file to adhere to the Nimbus audio sampling standard.
That's the convention for Python
Extra Details
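A minimal sketch of what resample_audio() could look like, using plain linear interpolation via NumPy. The target rate and the exact signature are assumptions here, not the project's actual Nimbus audio standard:

```python
import numpy as np

# Assumed Nimbus standard rate; the real value would live in project config
TARGET_SAMPLE_RATE = 16000

def resample_audio(samples: np.ndarray, source_rate: int,
                   target_rate: int = TARGET_SAMPLE_RATE) -> np.ndarray:
    """Resample a 1-D audio signal to target_rate via linear interpolation."""
    if source_rate == target_rate:
        return samples
    duration = len(samples) / source_rate
    n_target = int(round(duration * target_rate))
    # Time stamps of the original and resampled sample points
    src_times = np.arange(len(samples)) / source_rate
    dst_times = np.arange(n_target) / target_rate
    return np.interp(dst_times, src_times, samples)
```

For production audio, a band-limited resampler (e.g. scipy or librosa) would be preferable to plain interpolation, but this captures the shape of the function.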
Deploy REST API with an endpoint for Question/Answering pipeline
Create a class for wrapping questions with the required functions/behavior to answer them.
The QA class will accept two functions as part of the constructor: db_query() and format_answer(). db_query will take a dictionary of extracted variables and return a dictionary of relevant data from the database. format_answer will take both dictionaries and create a properly formatted answer string. The QA class will also contain an answer() method which takes an extracted variables dictionary and uses the supplied functions to return a formatted answer string.
An object-oriented model where QA functions as a base class that question/answer mapping classes are extended from was also considered. This functionally-inspired model was chosen to allow flexible reuse of both database access and answer formatting code.
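The constructor contract described above can be sketched as follows; the course-lookup usage at the bottom is hypothetical, standing in for a real db_query backed by NimbusDatabase:

```python
class QA:
    """Wraps one question type with the functions needed to answer it.

    db_query(extracted) -> dict of relevant data from the database
    format_answer(extracted, data) -> formatted answer string
    """

    def __init__(self, db_query, format_answer):
        self.db_query = db_query
        self.format_answer = format_answer

    def answer(self, extracted_vars):
        # Fetch relevant data, then format it into an answer string
        data = self.db_query(extracted_vars)
        return self.format_answer(extracted_vars, data)


# Hypothetical usage: answering "What is {course}?"
course_qa = QA(
    db_query=lambda ev: {"courseName": f"{ev['course']}. Fundamentals of Computer Science."},
    format_answer=lambda ev, d: f"{ev['course']} is {d['courseName']}",
)
```

Because db_query and format_answer are plain functions, the same database accessor can be reused across several QA instances with different formatters, which is the flexibility argument made above.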
Support progress toward the Implement QA pipeline by creating an endpoint for raw string questions either in request params
or request body
Commit code that does either option 1 or option 2 from below by Jan 19 2020
A URL somewhat similar to this one: http://calpoly-nimbus.com/ask/what%20is%20meaning%20life?
should result in a webpage (HTTP response) containing some kind of answer (e.g “42”)
http://google.com/search?q=what+is+meaning+life?
cURL command needed
endpoint: /ask
body:
{
"question": "what is meaning life?"
}
cURL
or Postman
or fancy web app code)

## Objective
have a function in database_wrapper.py to get all the fields for a particular club from the SQL database
## Key Result
Commit code to implement the "get_club_properties" function
Describe the bug
Some columns from the Sections, Professors, and Courses tables seem to be better suited to each other. For example, the alias and contact info columns from Sections would be better in Professors, and the type column would be better in Courses.
The current organization doesn't allow us to answer a question like "Is CSC 480 a lab?" and introduces potentially unwanted behavior (like fetching a phone number from a section).
Support progress toward the Report Test Cases... milestone by adding continuous integration of our QA pipeline changes
Additional context
During our meeting with Dr. Khosmood on Jan 17, 2020, @foaad suggested that we create a test suite of standard questions that we run on each iteration of our question-answering pipeline to make sure our code works.
spacy==2.2.2
nltk==3.4.5
google-cloud-automl==0.10.0
sklearn==0.0
google-cloud==0.34.0
google-cloud-speech==1.3.2
Describe the bug
Cursor not aligned on phrases/tokenization page on Firefox V72.0.2 on Windows 10
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Cursor is aligned inside of box
Screenshots
If applicable, add screenshots to help explain your problem.
Desktop (please complete the following information):
Smartphone (please complete the following information):
N/A
Additional context
Was not observed in Chrome.
Need to be able to link the professors and courses. Currently, in the database there is no expression of a particular professor teaching a particular course.
Pick one of the strategies below to link courses and professors
Ways to go about this could be:
We're going to convert newly updated audio files into MFCCs. These will be stored in the text section of the SQL server. Take the existing MFCC script and add it to the API.
have a function to get a list of the courses which a professor teaches
Commit code to implement the "get_courses_professor_teaches" function
Need to link the Professor and Courses tables first
make functionality which will get the known classes/ office hours for a professor
returns a list of tuples in form (event, start time, end time)
Right now we hit the /new_data/wakeword for both wake word and non wake word recordings - this is not ideal since the URL is not intuitive. Should either be broken up into multiple endpoints (probably better), or renamed to something like audiodata.
Add API to store scraped Calendar following the model of #53.
Scraped Calendar entries are saved in the database.
save_calendar() function in database_wrapper to build a SQLAlchemy Calendar object and save it to the database.

Add API to store scraped courses following the model of #53.
Scraped courses are saved in the database.
save_course() function in database_wrapper to build a SQLAlchemy Courses object and save it to the database.

Make functionality which will get the prerequisites for a course.
Returns a list of prerequisites for a given course.
Support progress toward the Implement Audio-File-Upload.... by saving the metadata of the files
Commit code that saves the wakeword audio file metadata into the database
metadata = {
"isWakeWord": True,
"firstName": "john",
"lastName": "doe",
"gender": "m",
"noiseLevel": "q",
"location": "here",
"tone": "serious-but-not-really",
"timestamp": 1577077883,
"username": "guest",
"filename": "ww_q_serious-but-not-really_here_m_doe_john_1577077883_guest.wav"
}
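The filename in the example appears to be derived from the other metadata fields. A minimal sketch of that mapping, with the field order inferred from the example filename (the "nw" prefix for non-wake-word recordings is a guess, not confirmed by the source):

```python
def build_filename(metadata: dict) -> str:
    """Build the audio filename from wake-word metadata.

    Field order is inferred from the example filename:
    ww_q_serious-but-not-really_here_m_doe_john_1577077883_guest.wav
    Treat this as a sketch, not the project's actual formatter.
    """
    prefix = "ww" if metadata["isWakeWord"] else "nw"  # "nw" is an assumption
    parts = [
        prefix,
        metadata["noiseLevel"],
        metadata["tone"],
        metadata["location"],
        metadata["gender"],
        metadata["lastName"],
        metadata["firstName"],
        str(metadata["timestamp"]),
        metadata["username"],
    ]
    return "_".join(parts) + ".wav"
```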
csai-scraping uses pipenv, and it is simply intuitive.
Let's set that up here too
It should make issues like #82 less common and easier to resolve/communicate too
Changes needed
Here are 2 nice cheat sheets
Another guide
Pipfiles are toml files
The get_qa_pair function is vital for
Create this function within the NimbusMySQLAlchemy class.
See other methods of that class, which are named get_..., for reference on how to create this function.
See the SQLAlchemy documentation on session.query.
Be able to identify a club by its name, not just its id
Commit change to the working dev database that has a Name column in the Clubs table
Write a function save_audiofile()
which will send the audio blob to Google Drive storage.
Drive API Resources:
Support the Report Test Coverage... milestone by making writing test cases easier for us
Setup GitHub Actions for automatic type-checking of our codebase on push
and on PR
Good Code Talk at PyCon 2018 by Carl Meyer about how Instagram avoids Python bugs by using pyre for type checking
The official Python type checker is MyPy but somehow Pyre is "faster for large codebases"
Support progress toward the Deploy REST API... by implementing conversation sessions.
Generate a session_token for each conversation and include that session_token in the response body of each request (see the requests package in Python).
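A minimal sketch of the server-side bookkeeping for conversation sessions, under the assumptions that an in-memory dict stands in for real persistence and that the token is minted with the standard library (the real implementation would persist sessions and plug into the Flask handlers):

```python
import secrets

# In-memory session store; a deployed version would persist this (e.g. Redis or the DB)
SESSIONS: dict = {}

def get_or_create_session(session_token: str = None) -> str:
    """Return an existing session_token, or mint a new one for a new conversation."""
    if session_token in SESSIONS:
        return session_token
    token = secrets.token_urlsafe(16)
    SESSIONS[token] = {"history": []}
    return token

def respond(answer: str, session_token: str = None) -> dict:
    """Attach the session_token to every response body, per the issue."""
    token = get_or_create_session(session_token)
    SESSIONS[token]["history"].append(answer)
    return {"answer": answer, "session_token": token}
```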
Additional context
Describe the bug
When running Postman tests for "/new_data/wakeword" in flask_api.py, it encounters an error where the name key is not found for form[key]. Screenshot shows the Postman tests and the results.
(Code reference: https://github.com/calpoly-csai/api/blob/0d99d7ef62bb899d22ce721af9531c2ed661d9ac/modules/formatters.py#L22)
Also encountered "Status 500 Server Overload" when running the Postman tests with the "https://calpoly-csai-nimbus.herokuapp.com/..." URL.
Add API to store scraped Location following the model of #53.
Scraped Location entries are saved in the database.
save_location() function in database_wrapper to build a SQLAlchemy Location object and save it to the database.

The get_property_from_entity function is effective at finding/filtering for data that contains some string, but it gets a lot of false positives.
The issue may be that
find_entity_that_contains(entity_string)
Courses table in MySQL:

CREATE TABLE `Courses` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`dept` varchar(5) DEFAULT NULL,
`courseNum` int(11) DEFAULT NULL,
`termsOffered` set('F','W','SP','SU','TBD') DEFAULT NULL,
`units` varchar(5) DEFAULT NULL,
`courseName` varchar(255) DEFAULT NULL,
`raw_concurrent_text` text,
`raw_recommended_text` text,
`raw_prerequisites_text` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=179 DEFAULT CHARSET=latin1;
INSERT INTO `Courses` (`id`, `dept`, `courseNum`, `termsOffered`, `units`, `courseName`, `raw_prerequisites_text`, `raw_concurrent_text`, `raw_recommended_text`)
VALUES
(2, 'CPE', 101, 'F,W,SP', '4', 'CPE 101. Fundamentals of Computer Science.', 'Appropriate Math Placement Level; or MATH 117 with a grade of C- or better; or MATH 118 with a grade of C- or better; or consent of instructor.', 'NA_CONCURRENT_TEXT', 'NA_RECOMMENDED_TEXT');
INSERT INTO `Courses` (`id`, `dept`, `courseNum`, `termsOffered`, `units`, `courseName`, `raw_concurrent_text`, `raw_recommended_text`, `raw_prerequisites_text`)
VALUES
(18, 'CPE', 333, 'SP', '4', 'CPE 333. Computer Hardware Architecture and Design.', 'NA_CONCURRENT_TEXT', 'NA_RECOMMENDED_TEXT', 'CPE 101, CPE 233.'),
(75, 'CSC', 209, 'TBD', '1', 'CSC 209. Problem Solving with Computers.', 'NA_CONCURRENT_TEXT', 'NA_RECOMMENDED_TEXT', 'CSC/CPE 101 or CSC/CPE 108 with a grade of C- or better, or consent of instructor.');
>>> from database_wrapper import NimbusMySQLAlchemy
>>> import Entity
>>> db = NimbusMySQLAlchemy()
initialized database session
initialized NimbusMySQLAlchemy
>>>
>>> # now let's try to answer the question "What is CPE 101?"
>>>
>>> db.get_property_from_entity(
... prop="courseName",
... entity=Entity.Courses.Courses,
... entity_string="CPE 101"
... )
['CPE 101. Fundamentals of Computer Science.', 'CPE 333. Computer Hardware Architecture and Design.', 'CSC 209. Problem Solving with Computers.']
>>>
When calling get_property_from_entity('courseName', Courses, 'CPE 101')
the result perhaps should be only ['CPE 101. Fundamentals of Computer Science.']
However, the results CPE 333... and CSC 209... would be correct responses for the question "What are the prerequisites for CPE 101?"
CPE 333... and CSC 209... seem to correspond to the following SQL.
CPE 101... is found by

Deploy to Heroku with Google Drive secrets and GCP auth.json, because there are quite a few environment variables needed to make the system work.
auth.json for the nimbus-nlp
yaml file for the Google Drive
config.json for NimbusDatabase
The Nimbus API system is deployed and running in the cloud.
auth.json, in the event that auth.json cannot be merged into config.json
config_SAMPLE.json
secrets in the deployment yml of the .github/workflows folder

Support progress toward the Deploy...endpoint QA pipeline... by making a function to reduce code repetition (DRY principle) and simplify queries such as select fieldName, someField from Entity join Entity_has_RelatedEntity
Commit code for the `get_property_from_related_entities`
Describe alternatives you've considered
We could write functions like get_professor_details
instead, because we have a finite number of entities such that we could enumerate them in one file, and any repeated code could be refactored in the future. It’s almost too complicated and not conducive to an MVP to overly generalize our code before we even write simple examples of said code.
Through the process of writing this GitHub issue, I’ve convinced myself that this issue should not be worked on until after MVP v1.0 is shipped.
Integrate QA.py with NimbusDatabase and flask_api.py
Somehow, the magic of Nimbus will just work. 😄
I'm opening this issue for us to discuss.
Comments are welcome!
NIMBUS_NLP looks like this (Lines 17 to 44 in b5782d8)
Question_Classifier (Lines 115 to 117 in b5782d8)
QA.py looks like this
NimbusDatabase has this function (Lines 352 to 394 in 58b0268)
flask_api.py has the /ask endpoint (Lines 43 to 71 in 58b0268)
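A minimal sketch of what an /ask handler could look like, supporting the question either in the request params or the JSON request body as discussed earlier. The placeholder "42" answer and the exact parameter names are assumptions; the real handler would call the QA pipeline:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/ask", methods=["GET", "POST"])
def ask():
    # Option 1: question in the request params (?question=...)
    # Option 2: question in the JSON request body ({"question": "..."})
    if request.method == "GET":
        question = request.args.get("question", "")
    else:
        question = (request.get_json(silent=True) or {}).get("question", "")
    if not question:
        return jsonify({"error": "no question supplied"}), 400
    # Placeholder answer until the QA pipeline is wired in
    return jsonify({"answer": "42", "question": question})
```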
Make a Python function to answer the question "Who is interested in Artificial Intelligence?"
select firstName, lastName from Professors WHERE researchInterests LIKE "%Artificial Intelligence%";
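The SQL above can be wrapped in a small Python function. This sketch uses an in-memory SQLite table as a stand-in for the real Professors table; the schema and the sample rows are illustrative only:

```python
import sqlite3

def professors_interested_in(conn: sqlite3.Connection, interest: str) -> list:
    """Return (firstName, lastName) tuples for professors whose
    researchInterests contain the given string."""
    cur = conn.execute(
        "SELECT firstName, lastName FROM Professors "
        "WHERE researchInterests LIKE ?",
        (f"%{interest}%",),  # parameterized to avoid SQL injection
    )
    return cur.fetchall()

# Illustrative in-memory stand-in for the real database
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Professors (firstName TEXT, lastName TEXT, researchInterests TEXT)"
)
conn.executemany(
    "INSERT INTO Professors VALUES (?, ?, ?)",
    [
        ("Ada", "Lovelace", "Artificial Intelligence, NLP"),
        ("Jane", "Doe", "Computer Graphics"),
    ],
)
```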
make functionality which will get the average polyrating for a given professor
returns a tuple of the average polyrating and number of polyratings, or just average polyrating
Describe the bug
sqlalchemy.exc.DataError: (mysql.connector.errors.DataError) 1406 (22001): Data too long for column 'contact_phone' at row 1
[SQL: INSERT INTO `Clubs` (club_name, types, `desc`, contact_email, contact_email_2, contact_person, contact_phone, box, advisor, affiliation) VALUES (%(club_name)s, %(types)s, %(desc)s, %(contact_email)s, %(contact_email_2)s, %(contact_person)s, %(contact_phone)s, %(box)s, %(advisor)s, %(affiliation)s)]
[parameters: {'club_name': 'test_club', 'types': 'Academic, Special Interest', 'desc': 'description', 'contact_email': '[email protected]', 'contact_email_2': '[email protected]', 'contact_person': 'Test Person', 'contact_phone': 15552223232, 'box': 89, 'advisor': 'Test Person', 'affiliation': None}]
(Background on this error at: http://sqlalche.me/e/9h9h)
========================================= 1 failed, 1 passed in 5.70s =========================================
➜ tests git:(mf-patch-and-test) ✗
Description: Creates a string filename that adheres to the Nimbus formatting standard.
Support progress toward the Deploy REST API ... milestone by automating the deployment step of our software development life cycle
Describe alternatives you've considered
The alternative is manual deployment, which would be inefficient.
Additional context
N/A
Write a generic function that can take in an Entity, a data dictionary, and insert into/update the database
Be able to insert/update any Entity into the database, as long as the data dictionary keys perfectly match the Entity attribute names (except for the PK). The method should also validate the data for us.
This will save a ton of code that, while not necessarily redundant, follows the same general logic. Note: this only works for the Entities that don't need additional data transforms from the data dictionary (ex: will not work for audio data metadata, unless the data dictionary is transformed before passing it in)
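A sketch of the generic insert/update function under these assumptions: the session exposes SQLAlchemy's add() interface, and validation is limited to checking that the data dictionary keys match the entity's attribute names (the real version would also validate types against the column definitions):

```python
def save_entity(session, entity_cls, data: dict, pk: str = "id"):
    """Insert an entity built from a data dictionary.

    Keys in `data` must exactly match the entity's attribute names,
    except for the primary key. Sketch only, not the project's API.
    """
    # Validate: every key must be a real attribute of the entity class
    allowed = {a for a in vars(entity_cls) if not a.startswith("_")} - {pk}
    unknown = set(data) - allowed
    if unknown:
        raise ValueError(f"unknown fields for {entity_cls.__name__}: {unknown}")

    obj = entity_cls()
    for key, value in data.items():
        setattr(obj, key, value)
    session.add(obj)
    return obj
```

As the note above says, entities whose data dictionaries need a transform first (like audio metadata) would have to be preprocessed before calling this.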
make functionality which will get professors who have a given research interests
returns a list or tuple of professors with the given interest
create a function to get the research interests of a given professor
returns a list or tuple of research interests