calpoly-csai / api
Official API for the NIMBUS Voice Assistant, accessible via the HTTP REST protocol.
Home Page: https://nimbus.api.calpolycsai.com/
License: GNU General Public License v3.0
Add API to store scraped faculty/professors following the model of #53.
Scraped faculty/professors are saved in the database.
save_faculty() function in database_wrapper to build a SQLAlchemy Faculty object and save it to the database.

Create an SQLAlchemy object to store QuestionAnswerPairs
A file in the Entities folder that defines the QuestionAnswerPairs object.
database_wrapper.py that allows save_qa_pair and get_all_qa_pairs.
Create a NimbusDatabase function to handle statistical/aggregation questions like the following...
How many sections of CSC 480 are offered this quarter?
How many teachers are interested in Artificial Intelligence?
Examples of aggregations
Commit code to the NimbusMySQLAlchemy
class in database_wrapper.py
that can generally answer any aggregation question
>>> from database_wrapper import NimbusMySQLAlchemy
>>> db = NimbusMySQLAlchemy()
initialized database session
initialized NimbusMySQLAlchemy
NimbusMySQLAlchemy closed
>>> db.session.query(db.Courses).count()
178
>>> from database_wrapper import NimbusMySQLAlchemy
>>> db = NimbusMySQLAlchemy()
initialized database session
initialized NimbusMySQLAlchemy
NimbusMySQLAlchemy closed
>>> db.session.query(db.Courses).distinct().count()
178
What is the department, courseNum, units of any course with the most units?
>>> db.session.query(db.Courses.dept).add_column(db.Courses.courseNum).add_column(db.Courses.units).distinct().order_by(db.Courses.units.desc()).first()
('CPE', 494, '6')
What is the department, courseNum, units of any course with the least units?
>>> db.session.query(db.Courses.dept).add_column(db.Courses.courseNum).add_column(db.Courses.units).distinct().order_by(db.Courses.units.asc()).first()
('CPE', 100, '1')
>>> db.session.query(db.Sections.section_name).add_column(db.Sections.instructor).filter(db.Sections.section_name.contains("480")).all()
[('CSC 480_06', 'Kauffman, Daniel Alexander')]
>>> db.session.query(db.Sections.section_name).add_column(db.Sections.instructor).filter(db.Sections.section_name.contains("480")).count()
1
➜ /nimbus git:(mf-pipenv) ✗ vim nimbus.py
➜ /nimbus git:(mf-pipenv) ✗ python nimbus.py
initialized database session
initialized NimbusMySQLAlchemy
[1] 959 segmentation fault python nimbus.py
➜ /nimbus git:(mf-pipenv) ✗ cat nimbus.py | head
from QA import create_qa_mapping, generate_fact_QA
from nimbus_nlp.NIMBUS_NLP import NIMBUS_NLP
➜ /nimbus git:(mf-pipenv) ✗ cat nimbus.py | head
import faulthandler; faulthandler.enable()
from QA import create_qa_mapping, generate_fact_QA
python -Xfaulthandler nimbus.py
No segfault
➜ /nimbus git:(mf-pipenv) ✗ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
VERSION_CODENAME=stretch
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
➜ /nimbus git:(mf-pipenv) ✗ uname -r
4.19.76-linuxkit
➜ /nimbus git:(mf-pipenv) ✗ cat nimbus_nlp/NIMBUS_NLP.py | head
import os
import json
from google.api_core.client_options import ClientOptions
from google.cloud import automl_v1
# Temporary import for the classifier
from nimbus_nlp.question_classifier import QuestionClassifier
➜ /nimbus git:(mf-pipenv) ✗ cat nimbus_nlp/question_classifier.py | head
import re
import nltk
import spacy
import numpy as np
import sklearn.neighbors
import pandas as pd
import json
from nimbus_nlp.save_and_load_model import save_model, load_latest_model, PROJECT_DIR
# TODO: move the functionality in this module into class(es), so that it can be more easily used as a dependency
Full Stacktrace
➜ /nimbus git:(mf-pipenv) ✗ python -Xfaulthandler nimbus.py
initialized database session
initialized NimbusMySQLAlchemy
Fatal Python error: Segmentation fault
Current thread 0x00007fcd99ff3400 (most recent call first):
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 922 in create_module
File "<frozen importlib._bootstrap>", line 571 in module_from_spec
File "<frozen importlib._bootstrap>", line 658 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1023 in _handle_fromlist
File "/usr/local/lib/python3.6/site-packages/grpc/__init__.py", line 23 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "/usr/local/lib/python3.6/site-packages/google/api_core/gapic_v1/config.py", line 23 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1023 in _handle_fromlist
File "/usr/local/lib/python3.6/site-packages/google/api_core/gapic_v1/__init__.py", line 16 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 941 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "/usr/local/lib/python3.6/site-packages/google/cloud/automl_v1/gapic/auto_ml_client.py", line 25 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1023 in _handle_fromlist
File "/usr/local/lib/python3.6/site-packages/google/cloud/automl_v1/__init__.py", line 23 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1023 in _handle_fromlist
File "/nimbus/nimbus_nlp/NIMBUS_NLP.py", line 6 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 678 in exec_module
File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 971 in _find_and_load
File "nimbus.py", line 4 in <module>
[1] 825 segmentation fault python -Xfaulthandler nimbus.py
Add API to store scraped clubs following the model of #53.
Scraped clubs are saved in the database.
save_club() function in database_wrapper to build a SQLAlchemy Club object and save it to the database.

Get to MVP with some confidence that our API endpoints work
1 test case per endpoint (@app.route(...)) within flask_api
Add API to store scraped RasperryPiLogs following the model of #53.
Useful comment: #53 (comment)
Also see #47, which used @app.route to make a REST endpoint that the RaspberryPi can talk to via a simple import requests.
Scraped RasperryPiLogs are saved in the database.
save_rpi_logs() function in database_wrapper to build a SQLAlchemy RasperryPiLogs object and save it to the database.

Make functionality which will give the days and times for a course during a given quarter.
Returns a list of tuples of the days the course happens on and the times it will happen.
Add API to store scraped schedules following the model of #53.
Scraped schedules are saved in the database.
save_schedule() function in database_wrapper to build a SQLAlchemy Schedules object and save it to the database.

What's an entity?
What's a relation?
The API will be receiving audio from a bunch of devices with different sampling rates. We need to resample the audio so that the MFCCs are in a consistent format.
Function: resample_audio()
Description: Resample the audio file to adhere to the Nimbus audio sampling standard.
That's the convention for Python
Extra Details
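A minimal sketch of what resample_audio() could look like, using plain linear interpolation via NumPy. The target rate and the exact signature are assumptions here, not the project's actual Nimbus audio standard:

```python
import numpy as np

# Assumed Nimbus standard rate; the real value would live in project config
TARGET_SAMPLE_RATE = 16000

def resample_audio(samples: np.ndarray, source_rate: int,
                   target_rate: int = TARGET_SAMPLE_RATE) -> np.ndarray:
    """Resample a 1-D audio signal to target_rate via linear interpolation."""
    if source_rate == target_rate:
        return samples
    duration = len(samples) / source_rate
    n_target = int(round(duration * target_rate))
    # Time stamps of the original and resampled sample points
    src_times = np.arange(len(samples)) / source_rate
    dst_times = np.arange(n_target) / target_rate
    return np.interp(dst_times, src_times, samples)
```

For production audio, a band-limited resampler (e.g. scipy or librosa) would be preferable to plain interpolation, but this captures the shape of the function.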
Deploy REST API with an endpoint for Question/Answering pipeline
Create a class for wrapping questions with the required functions/behavior to answer them.
The QA class will accept two functions as part of the constructor: db_query() and format_answer(). db_query will take a dictionary of extracted variables and return a dictionary of relevant data from the database. format_answer will take both dictionaries and create a properly formatted answer string. The QA class will also contain an answer() method which takes an extracted variables dictionary and uses the supplied functions to return a formatted answer string.
An object-oriented model where QA functions as a base class that question/answer mapping classes are extended from was also considered. This functionally-inspired model was chosen to allow flexible reuse of both database access and answer formatting code.
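The constructor contract described above can be sketched as follows; the course-lookup usage at the bottom is hypothetical, standing in for a real db_query backed by NimbusDatabase:

```python
class QA:
    """Wraps one question type with the functions needed to answer it.

    db_query(extracted) -> dict of relevant data from the database
    format_answer(extracted, data) -> formatted answer string
    """

    def __init__(self, db_query, format_answer):
        self.db_query = db_query
        self.format_answer = format_answer

    def answer(self, extracted_vars):
        # Fetch relevant data, then format it into an answer string
        data = self.db_query(extracted_vars)
        return self.format_answer(extracted_vars, data)


# Hypothetical usage: answering "What is {course}?"
course_qa = QA(
    db_query=lambda ev: {"courseName": f"{ev['course']}. Fundamentals of Computer Science."},
    format_answer=lambda ev, d: f"{ev['course']} is {d['courseName']}",
)
```

Because db_query and format_answer are plain functions, the same database accessor can be reused across several QA instances with different formatters, which is the flexibility argument made above.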
Support progress toward the Implement QA pipeline by creating an endpoint for raw string questions either in request params
or request body
Commit code that does either option 1 or option 2 from below by Jan 19 2020
A URL somewhat similar to this one: http://calpoly-nimbus.com/ask/what%20is%20meaning%20life?
should result in a webpage (HTTP response) containing some kind of answer (e.g “42”)
http://google.com/search?q=what+is+meaning+life?
cURL command needed
endpoint: /ask
body:
{
"question": "what is meaning life?"
}
cURL
or Postman
or fancy web app code)

## Objective
have a function in database_wrapper.py to get all the fields for a particular club from the SQL database
## Key Result
Commit code to implement the "get_club_properties" function
Describe the bug
Some columns from the Sections, Professors, and Courses tables seem to be better suited to each other. For example, the alias and contact info columns from Sections would be better in Professors, and the type column would be better in Courses.
The current organization doesn't allow us to answer a question like "Is CSC 480 a lab?" and introduces potentially unwanted behavior (like fetching a phone number from a section).
Support progress toward the Report Test Cases... milestone by adding continuous integration of our QA pipeline changes
Additional context
During our meeting with Dr. Khosmood on Jan 17, 2020, @foaad suggested that we create a test suite of standard questions that we run on each iteration of our question-answering pipeline to make sure our code works.
spacy==2.2.2
nltk==3.4.5
google-cloud-automl==0.10.0
sklearn==0.0
google-cloud==0.34.0
google-cloud-speech==1.3.2
Describe the bug
Cursor not aligned on phrases/tokenization page on Firefox V72.0.2 on Windows 10
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Cursor is aligned inside of box
Screenshots
If applicable, add screenshots to help explain your problem.
Desktop (please complete the following information):
Smartphone (please complete the following information):
N/A
Additional context
Was not observed in Chrome.
Need to be able to link the professors and courses. Currently, in the database there is no expression of a particular professor teaching a particular course.
Pick one of the strategies below to link courses and professors
Ways to go about this could be:
We're going to convert newly updated audio files into MFCCs. These will be stored in the text section of the SQL server. Take the existing MFCC script and add it to the API.
have a function to get a list of the courses which a professor teaches
Commit code to implement the "get_courses_professor_teaches" function
Need to link the Professor and Courses tables first
make functionality which will get the known classes/ office hours for a professor
returns a list of tuples in form (event, start time, end time)
Right now we hit the /new_data/wakeword for both wake word and non wake word recordings - this is not ideal since the URL is not intuitive. Should either be broken up into multiple endpoints (probably better), or renamed to something like audiodata.
Add API to store scraped Calendar following the model of #53.
Scraped Calendar entries are saved in the database.
save_calendar() function in database_wrapper to build a SQLAlchemy Calendar object and save it to the database.

Add API to store scraped courses following the model of #53.
Scraped courses are saved in the database.
save_course() function in database_wrapper to build a SQLAlchemy Courses object and save it to the database.

Make functionality which will get the prerequisites for a course.
Returns a list of prerequisites for a given course.
Support progress toward the Implement Audio-File-Upload.... by saving the metadata of the files
Commit code that saves the wakeword audio file metadata into the database
metadata = {
"isWakeWord": True,
"firstName": "john",
"lastName": "doe",
"gender": "m",
"noiseLevel": "q",
"location": "here",
"tone": "serious-but-not-really",
"timestamp": 1577077883,
"username": "guest",
"filename": "ww_q_serious-but-not-really_here_m_doe_john_1577077883_guest.wav"
}
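The filename in the example appears to be derived from the other metadata fields. A minimal sketch of that mapping, with the field order inferred from the example filename (the "nw" prefix for non-wake-word recordings is a guess, not confirmed by the source):

```python
def build_filename(metadata: dict) -> str:
    """Build the audio filename from wake-word metadata.

    Field order is inferred from the example filename:
    ww_q_serious-but-not-really_here_m_doe_john_1577077883_guest.wav
    Treat this as a sketch, not the project's actual formatter.
    """
    prefix = "ww" if metadata["isWakeWord"] else "nw"  # "nw" is an assumption
    parts = [
        prefix,
        metadata["noiseLevel"],
        metadata["tone"],
        metadata["location"],
        metadata["gender"],
        metadata["lastName"],
        metadata["firstName"],
        str(metadata["timestamp"]),
        metadata["username"],
    ]
    return "_".join(parts) + ".wav"
```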
csai-scraping uses pipenv, and it is simply intuitive.
Let's set that up here too
It should make issues like #82 less common and easier to resolve/communicate too
Changes needed
Here are 2 nice cheat sheets
Another guide
Pipfiles are toml files
The get_qa_pair function is vital for
Create this function within the NimbusMySQLAlchemy class.
See other methods of that class, which are named get_..., for reference on how to create this function.
See the SQLAlchemy documentation on session.query.
Be able to identify a club by its name, not just its id
Commit change to the working dev database that has a Name column in the Clubs table
Write a function save_audiofile()
which will send the audio blob to Google Drive storage.
Drive API Resources:
Support the Report Test Coverage... milestone by making writing test cases easier for us
Setup GitHub Actions for automatic type-checking of our codebase on push
and on PR
Good Code Talk at PyCon 2018 by Carl Meyer about how Instagram avoids Python bugs by using pyre for type checking
The official Python type checker is MyPy but somehow Pyre is "faster for large codebases"
Support progress toward the Deploy REST API... by implementing conversation sessions.
Generate a session_token for each conversation and include that session_token in the response body of each request (see the requests package in Python).
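A minimal sketch of the server-side bookkeeping for conversation sessions, under the assumptions that an in-memory dict stands in for real persistence and that the token is minted with the standard library (the real implementation would persist sessions and plug into the Flask handlers):

```python
import secrets

# In-memory session store; a deployed version would persist this (e.g. Redis or the DB)
SESSIONS: dict = {}

def get_or_create_session(session_token: str = None) -> str:
    """Return an existing session_token, or mint a new one for a new conversation."""
    if session_token in SESSIONS:
        return session_token
    token = secrets.token_urlsafe(16)
    SESSIONS[token] = {"history": []}
    return token

def respond(answer: str, session_token: str = None) -> dict:
    """Attach the session_token to every response body, per the issue."""
    token = get_or_create_session(session_token)
    SESSIONS[token]["history"].append(answer)
    return {"answer": answer, "session_token": token}
```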
Additional context
Describe the bug
When running Postman tests for "/new_data/wakeword" in flask_api.py, it encounters an error where the name key is not found for form[key]. Screenshot shows the Postman tests and the results.
(Code reference: https://github.com/calpoly-csai/api/blob/0d99d7ef62bb899d22ce721af9531c2ed661d9ac/modules/formatters.py#L22)
Also encountered "Status 500 Server Overload" when running the Postman tests with the "https://calpoly-csai-nimbus.herokuapp.com/..." URL.
Add API to store scraped Location following the model of #53.
Scraped Location entries are saved in the database.
save_location() function in database_wrapper to build a SQLAlchemy Location object and save it to the database.

The get_property_from_entity function is effective at finding/filtering for data that contains some string, but it gets a lot of false positives.
The issue may be that
find_entity_that_contains(entity_string)
Courses table in MySQL:

CREATE TABLE `Courses` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`dept` varchar(5) DEFAULT NULL,
`courseNum` int(11) DEFAULT NULL,
`termsOffered` set('F','W','SP','SU','TBD') DEFAULT NULL,
`units` varchar(5) DEFAULT NULL,
`courseName` varchar(255) DEFAULT NULL,
`raw_concurrent_text` text,
`raw_recommended_text` text,
`raw_prerequisites_text` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=179 DEFAULT CHARSET=latin1;
INSERT INTO `Courses` (`id`, `dept`, `courseNum`, `termsOffered`, `units`, `courseName`, `raw_prerequisites_text`, `raw_concurrent_text`, `raw_recommended_text`)
VALUES
(2, 'CPE', 101, 'F,W,SP', '4', 'CPE 101. Fundamentals of Computer Science.', 'Appropriate Math Placement Level; or MATH 117 with a grade of C- or better; or MATH 118 with a grade of C- or better; or consent of instructor.', 'NA_CONCURRENT_TEXT', 'NA_RECOMMENDED_TEXT');
INSERT INTO `Courses` (`id`, `dept`, `courseNum`, `termsOffered`, `units`, `courseName`, `raw_concurrent_text`, `raw_recommended_text`, `raw_prerequisites_text`)
VALUES
(18, 'CPE', 333, 'SP', '4', 'CPE 333. Computer Hardware Architecture and Design.', 'NA_CONCURRENT_TEXT', 'NA_RECOMMENDED_TEXT', 'CPE 101, CPE 233.'),
(75, 'CSC', 209, 'TBD', '1', 'CSC 209. Problem Solving with Computers.', 'NA_CONCURRENT_TEXT', 'NA_RECOMMENDED_TEXT', 'CSC/CPE 101 or CSC/CPE 108 with a grade of C- or better, or consent of instructor.');
>>> from database_wrapper import NimbusMySQLAlchemy
>>> import Entity
>>> db = NimbusMySQLAlchemy()
initialized database session
initialized NimbusMySQLAlchemy
>>>
>>> # now let's try to answer the question "What is CPE 101?"
>>>
>>> db.get_property_from_entity(
... prop="courseName",
... entity=Entity.Courses.Courses,
... entity_string="CPE 101"
... )
['CPE 101. Fundamentals of Computer Science.', 'CPE 333. Computer Hardware Architecture and Design.', 'CSC 209. Problem Solving with Computers.']
>>>
When calling get_property_from_entity('courseName', Courses, 'CPE 101')
the result perhaps should be only ['CPE 101. Fundamentals of Computer Science.']
However, the results CPE 333... and CSC 209... would be correct responses for the question "What are the prerequisites for CPE 101?"
CPE 333... and CSC 209... seem to correspond to the following SQL.
CPE 101... is found by

Deploy to Heroku with Google Drive secrets and GCP auth.json, because there are quite a few environment variables needed to make the system work.
auth.json for the nimbus-nlp
yaml file for the Google Drive
config.json for NimbusDatabase
The Nimbus API system is deployed and running in the cloud.
auth.json, in the event that auth.json cannot be merged into config.json
config_SAMPLE.json
secrets in the deployment yml of the .github/workflows folder

Support progress toward the Deploy...endpoint QA pipeline... by making a function to reduce code repetition (DRY principle) and simplify queries such as select fieldName, someField from Entity join Entity_has_RelatedEntity
Commit code for the `get_property_from_related_entities`
Describe alternatives you've considered
We could write functions like get_professor_details
instead, because we have a finite number of entities such that we could enumerate them in one file, and any repeated code could be refactored in the future. It’s almost too complicated and not conducive to an MVP to overly generalize our code before we even write simple examples of said code.
Through the process of writing this GitHub issue, I’ve convinced myself that this issue should not be worked on until after MVP v1.0 is shipped.
Integrate QA.py with NimbusDatabase and flask_api.py
Somehow, the magic of Nimbus will just work. 😄
I'm opening this issue for us to discuss.
Comments are welcome!
NIMBUS_NLP looks like this (Lines 17 to 44 in b5782d8)
Question_Classifier (Lines 115 to 117 in b5782d8)
QA.py looks like this
NimbusDatabase has this function (Lines 352 to 394 in 58b0268)
flask_api.py has the /ask endpoint (Lines 43 to 71 in 58b0268)
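A minimal sketch of what an /ask handler could look like, supporting the question either in the request params or the JSON request body as discussed earlier. The placeholder "42" answer and the exact parameter names are assumptions; the real handler would call the QA pipeline:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/ask", methods=["GET", "POST"])
def ask():
    # Option 1: question in the request params (?question=...)
    # Option 2: question in the JSON request body ({"question": "..."})
    if request.method == "GET":
        question = request.args.get("question", "")
    else:
        question = (request.get_json(silent=True) or {}).get("question", "")
    if not question:
        return jsonify({"error": "no question supplied"}), 400
    # Placeholder answer until the QA pipeline is wired in
    return jsonify({"answer": "42", "question": question})
```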
Make a Python function to answer the question "Who is interested in Artificial Intelligence?"
select firstName, lastName from Professors WHERE researchInterests LIKE "%Artificial Intelligence%";
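The SQL above can be wrapped in a small Python function. This sketch uses an in-memory SQLite table as a stand-in for the real Professors table; the schema and the sample rows are illustrative only:

```python
import sqlite3

def professors_interested_in(conn: sqlite3.Connection, interest: str) -> list:
    """Return (firstName, lastName) tuples for professors whose
    researchInterests contain the given string."""
    cur = conn.execute(
        "SELECT firstName, lastName FROM Professors "
        "WHERE researchInterests LIKE ?",
        (f"%{interest}%",),  # parameterized to avoid SQL injection
    )
    return cur.fetchall()

# Illustrative in-memory stand-in for the real database
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Professors (firstName TEXT, lastName TEXT, researchInterests TEXT)"
)
conn.executemany(
    "INSERT INTO Professors VALUES (?, ?, ?)",
    [
        ("Ada", "Lovelace", "Artificial Intelligence, NLP"),
        ("Jane", "Doe", "Computer Graphics"),
    ],
)
```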
make functionality which will get the average polyrating for a given professor
returns a tuple of the average polyrating and number of polyratings, or just average polyrating
Describe the bug
sqlalchemy.exc.DataError: (mysql.connector.errors.DataError) 1406 (22001): Data too long for column 'contact_phone' at row 1
[SQL: INSERT INTO `Clubs` (club_name, types, `desc`, contact_email, contact_email_2, contact_person, contact_phone, box, advisor, affiliation) VALUES (%(club_name)s, %(types)s, %(desc)s, %(contact_email)s, %(contact_email_2)s, %(contact_person)s, %(contact_phone)s, %(box)s, %(advisor)s, %(affiliation)s)]
[parameters: {'club_name': 'test_club', 'types': 'Academic, Special Interest', 'desc': 'description', 'contact_email': '[email protected]', 'contact_email_2': '[email protected]', 'contact_person': 'Test Person', 'contact_phone': 15552223232, 'box': 89, 'advisor': 'Test Person', 'affiliation': None}]
(Background on this error at: http://sqlalche.me/e/9h9h)
========================================= 1 failed, 1 passed in 5.70s =========================================
➜ tests git:(mf-patch-and-test) ✗
Description: Creates a string filename that adheres to the Nimbus formatting standard.
Support progress toward the Deploy REST API ... milestone by automating the deployment step of our software development life cycle
Describe alternatives you've considered
The alternative is manual deployment, which would be inefficient.
Additional context
N/A
Write a generic function that can take in an Entity, a data dictionary, and insert into/update the database
Be able to insert/update any Entity into the database, as long as the data dictionary keys perfectly match the Entity attribute names (except for the PK). The method should also validate the data for us.
This will save a ton of code that, while not necessarily redundant, follows the same general logic. Note: this only works for the Entities that don't need additional data transforms from the data dictionary (ex: will not work for audio data metadata, unless the data dictionary is transformed before passing it in)
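A sketch of the generic insert/update function under these assumptions: the session exposes SQLAlchemy's add() interface, and validation is limited to checking that the data dictionary keys match the entity's attribute names (the real version would also validate types against the column definitions):

```python
def save_entity(session, entity_cls, data: dict, pk: str = "id"):
    """Insert an entity built from a data dictionary.

    Keys in `data` must exactly match the entity's attribute names,
    except for the primary key. Sketch only, not the project's API.
    """
    # Validate: every key must be a real attribute of the entity class
    allowed = {a for a in vars(entity_cls) if not a.startswith("_")} - {pk}
    unknown = set(data) - allowed
    if unknown:
        raise ValueError(f"unknown fields for {entity_cls.__name__}: {unknown}")

    obj = entity_cls()
    for key, value in data.items():
        setattr(obj, key, value)
    session.add(obj)
    return obj
```

As the note above says, entities whose data dictionaries need a transform first (like audio metadata) would have to be preprocessed before calling this.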
make functionality which will get professors who have a given research interests
returns a list or tuple of professors with the given interest
create a function to get the research interests of a given professor
returns a list or tuple of research interests