deepcite's People

Contributors

chyan2, chyan214, connorjoleary, deadmau6, dependabot[bot], dillonoleary, juliajyh, noah122, panzey, vinay1337

deepcite's Issues

Don't grab DB Pass from Env

This is a security concern because anyone can see this info. We should grab the value from Google secrets instead.

When requesting a website, should check for status code

Is your feature request related to a problem? Please describe.
When you parse the results of nested websites, the code currently doesn't take into account whether the response was something like a 404. This means we may receive an error page but treat it as if the website was returned as expected.

Describe the solution you'd like
When requesting a website, look for a status 200, and if it is something else, don't parse that page.
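A minimal sketch of that check, using only the standard library. The function names `fetch_with_urllib` and `page_to_parse` are hypothetical and not taken from the DeepCite code; the real scraper may use a different HTTP client.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple

# Hypothetical type: a fetcher returns (status_code, body_text) for a URL.
Fetcher = Callable[[str], Tuple[int, str]]


def fetch_with_urllib(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Fetch a page with the standard library and report its status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx; return the code instead of the error page.
        return err.code, ""


def page_to_parse(url: str, fetch: Fetcher = fetch_with_urllib) -> Optional[str]:
    """Return the page body only when the response status is 200."""
    status, body = fetch(url)
    if status != 200:
        return None  # skip error pages rather than parsing them as sources
    return body
```

Passing the fetcher as a parameter keeps the status check testable without network access; a caller would skip any child link for which `page_to_parse` returns `None`.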

Describe alternatives you've considered
Give special results for specific codes (404, 403, ...).

Additional context
https://www.reddit.com/r/todayilearned/comments/nz6hl7/til_the_banana_plant_is_a_herb_distantly_related/

Node_modules in extension

Hi, I don't know if you are looking for any help with this project, since it appears to be for school, but I'd love to help, maybe even after the course is done.

Anyway, I thought you might want to know that extension\node_modules\ exists, but extension\.gitignore excludes this folder. If you want, you can have a global .gitignore: to ignore all node_modules directories, add node_modules/, or to ignore just a specific directory, add test-server/node_modules. See the gitignore pattern format documentation.

' breaks db commit

Describe the bug
An apostrophe (') breaks the ability to submit a claim to Postgres.

To Reproduce
https://www.reddit.com/r/todayilearned/comments/otef3u/til_leonardo_da_vinci_wrote_all_the_branches_of_a/

Additional context
Traceback (most recent call last):
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/pg8000/core.py", line 350, in execute
    self._c.execute_unnamed(
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/pg8000/core.py", line 1296, in execute_unnamed
    self.handle_messages(cursor)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/pg8000/core.py", line 1463, in handle_messages
    raise self.error
pg8000.exceptions.ProgrammingError: {'S': 'ERROR', 'V': 'ERROR', 'C': '42601', 'M': 'syntax error at or near "s"', 'P': '603', 'F': 'scan.l', 'L': '1149', 'R': 'scanner_yyerror'}
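A syntax error at or near "s" is what you would expect if the claim text is spliced into the SQL string, so that an apostrophe ends the literal early. That is an assumption; the insert code is not shown in this issue. Under that assumption, a bound-parameter insert avoids the problem. The sketch below uses the standard library's sqlite3 as a stand-in for Postgres so that it runs anywhere, and the table and column names are hypothetical.

```python
import sqlite3


def save_claim(conn: sqlite3.Connection, claim: str, link: str) -> None:
    """Insert a claim using placeholders instead of string formatting."""
    # The driver binds the values, so an apostrophe in the claim is just data.
    conn.execute("INSERT INTO claims (claim, link) VALUES (?, ?)", (claim, link))
    conn.commit()


# In-memory database standing in for the real Postgres instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (claim TEXT, link TEXT)")

save_claim(conn, "Leonardo's notes mention tree branches", "https://example.com/post")
stored = conn.execute("SELECT claim FROM claims").fetchone()[0]
```

With SQLAlchemy the same idea is usually expressed with `text()` and named bind parameters (for example `:claim`) rather than `?` placeholders.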

Edge cases in tokenizer.py predict

If the sentence is very short, then it may not add legit paragraphs to the predict object. Also, if there are multiple sentences, the one with a link associated would be a better thing to return.

Automatically run and popup info when right click cite

When you right click and cite some selection it only populates the popup. Instead it should pop up the popup automatically, input the text, and hit submit.

It does not look like you can automatically trigger the popup from the background script, though, so it may be better to convert the popup to an iframe, similar to how Google Keep operates.

Source: https://stackoverflow.com/questions/17851700/how-to-open-the-default-popup-from-context-menu-in-a-chrome-extension

Improve error handling

Right now this response is not sent to the user, nor is any other error:
except Exception as e:
    if check_instance(e):
        full_pre_json['error'] = str(e)
    else:
        link = html_link('https://github.com/connorjoleary/DeepCite/issues')
        full_pre_json['error'] = str('Error 500: Internal Server Error ' + str(e) + "." + \
            new_indention("Please add your error to " + link + " with the corresponding claim and link."))

If we do want to show users errors, we need to improve what they say.

Larger tree page

Is your feature request related to a problem? Please describe.
When a website yields a lot of possible sources, the tree it returns gets squished at the bottom.

Describe the solution you'd like
A dynamic page size which allows you to scroll left and right

Add option to display tree

We should already have an endpoint setup as well as code to display a tree. They just need to be matched up and they need a button in the UI to show tree.

Corner cases in web scraper

  • dynamic JS
  • 404 error
  • multiple citations in sentence in wikipedia
  • claim unable to be found in webpage of link.
  • scraper goes to wrong link within a sentence.

Dockerize

Should probably break this up into sub-tasks

Update the README

When installing and running the project, I noticed that some of the README instructions are a little lacking. For example, the install step for word2vec is unclear: the backend only requires the pretrained model, not the complete code, so we can easily expand those instructions with a few extra bullet points on how to download the pretrained model. I would also like to add a short instruction on how to install on Firefox. Could I take this issue?

Configure from environment

There are quite a few global variables that can be established as an environment variable. There should be a single module for the backend that will validate and source global variables for the backend to use.
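One possible shape for such a module is sketched below. The variable names (`DEEPCITE_DB_HOST`, `DEEPCITE_DB_PORT`, `DEEPCITE_MAX_TREE_HEIGHT`) and their defaults are hypothetical placeholders, not the backend's actual globals.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Validated configuration values (field names are illustrative)."""
    db_host: str
    db_port: int
    max_tree_height: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read and validate settings from the environment, failing early."""
    env = os.environ if env is None else env

    # Required values: refuse to start if they are absent or empty.
    db_host = env.get("DEEPCITE_DB_HOST", "")
    if not db_host:
        raise ValueError("DEEPCITE_DB_HOST must be set")

    # Optional values: fall back to defaults, but reject non-integers.
    try:
        db_port = int(env.get("DEEPCITE_DB_PORT", "5432"))
        max_tree_height = int(env.get("DEEPCITE_MAX_TREE_HEIGHT", "3"))
    except ValueError as err:
        raise ValueError("port and tree height must be integers") from err

    return Settings(db_host=db_host, db_port=db_port, max_tree_height=max_tree_height)
```

The rest of the backend would import the result of `load_settings()` instead of reading `os.environ` directly, so validation happens in one place.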

List index out of range

Describe the bug
The reddit post https://www.reddit.com/r/todayilearned/comments/np87ef/til_of_the_golden_spruce_a_gorgeous_one_of_a_kind/
gives this error

Traceback (most recent call last):
  File "/app/main.py", line 28, in deep_cite
    tree = Tree(link, claim)
  File "/app/tree.py", line 24, in __init__
    self.tree_root = Claim(url, claim)
  File "/app/claim.py", line 57, in __init__
    self.parse_child()
  File "/app/claim.py", line 266, in parse_child
    self.create_children(ref2text, scores)
  File "/app/claim.py", line 146, in create_children
    self.child.append(Claim(ref2text[words], words, scores[i], (self.height +1), self)) # does ref2text allow for multiple links
  File "/app/claim.py", line 57, in __init__
    self.parse_child()
  File "/app/claim.py", line 266, in parse_child
    self.create_children(ref2text, scores)
  File "/app/claim.py", line 165, in create_children
    self.child.append(Claim(ref2text[ref_key], words, scores[i], (self.height +1), self))
  File "/app/claim.py", line 57, in __init__
    self.parse_child()
  File "/app/claim.py", line 187, in parse_child
    citation = wiki(self.href, self.parent.href)
  File "/app/wiki_scraper.py", line 26, in wiki
    target_link = links[linkdict[link] + 1]
IndexError: list index out of range
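The failing line indexes one position past the matched link, which overflows when the matched link is the last one on the page. A guarded lookup might look like the sketch below. The helper name `link_after` is hypothetical, and the types of `links` and `linkdict` are assumptions inferred from the traceback, not from the actual `wiki_scraper.py`.

```python
from typing import Dict, List, Optional


def link_after(links: List[str], linkdict: Dict[str, int], link: str) -> Optional[str]:
    """Return the link following `link`, or None when there is no such entry."""
    index = linkdict.get(link)
    if index is None:
        return None  # the link was never indexed on this page
    next_index = index + 1
    if not 0 <= next_index < len(links):
        return None  # matched link is the last one; nothing follows it
    return links[next_index]
```

The caller would then treat `None` as "no citation found" instead of letting the exception abort the whole tree.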

Check DB for duplicate run

Is your feature request related to a problem? Please describe.
If a user runs the same text and link as a run already stored in the DB, it should return the stored result.

Describe the solution you'd like
Check against version number and text for a match. The text match should be done after trimming and can be a percent string match.
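A rough sketch of that match using `difflib` from the standard library. The function names, the 0.95 threshold, and the choice to require an exact link match are all assumptions for illustration; "trimming" is interpreted here as stripping and collapsing whitespace.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Trim the text and collapse internal runs of whitespace."""
    return " ".join(text.split())


def is_duplicate_run(
    new_claim: str,
    new_link: str,
    new_version: str,
    stored_claim: str,
    stored_link: str,
    stored_version: str,
    threshold: float = 0.95,  # hypothetical cutoff for the percent match
) -> bool:
    """Decide whether a stored run can be reused for a new request."""
    # Results from a different model version or link are never reused.
    if new_version != stored_version or new_link.strip() != stored_link.strip():
        return False
    ratio = SequenceMatcher(None, normalize(new_claim), normalize(stored_claim)).ratio()
    return ratio >= threshold
```

In practice the comparison would run against candidate rows already filtered by link and version in the database query, so only a handful of claims need the string comparison.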

Popup Error Messages are HTML Instead of Text

When we receive an error message from the server to display in the popup extension, we should be receiving strictly text. Instead, we are receiving formatted HTML. This forces us to use .innerHTML instead of .value or .innerText, opening the gates for potential cross-site scripting attacks. We should only receive text to display to the user so we don't have to use this.

Steps to reproduce:

  1. type invalid claim and link
  2. hit the 'Cite' button

Expected results:

  • The error modal displays, and the text is unformatted.

Actual results:

  • The error modal displays, along with specific formatting inside of the error message field textbox.
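One way the server side could return text only is sketched below. The function name `error_payload` and the separate `report_url` field are hypothetical, and the `expected` flag stands in for whatever check the backend already performs on the exception.

```python
from typing import Dict

# Issue tracker URL, sent as its own field instead of an embedded HTML link.
ISSUES_URL = "https://github.com/connorjoleary/DeepCite/issues"


def error_payload(exc: Exception, expected: bool) -> Dict[str, str]:
    """Build a JSON-ready error body that contains no markup."""
    if expected:
        # Known, user-facing errors are passed through as plain text.
        return {"error": str(exc)}
    return {
        "error": (
            f"Error 500: Internal Server Error {exc}. "
            "Please add your error to the issue tracker with the "
            "corresponding claim and link."
        ),
        "report_url": ISSUES_URL,
    }
```

With a payload like this, the popup can assign `error` through `.innerText` and build the link element itself from `report_url`, so no server-supplied string is ever interpreted as HTML.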

Allow lambda to be run locally

In order for people to test their code in its entirety, the lambda must be allowed to be run locally and interact with the model and extension which can already be run locally.

Add Support for Pipenv

Hi, I was just wondering if we can put in a small feature request for those of us that use pipenv. pipenv just generates some extra files like a Pipfile and Pipfile.lock. So I think adding support for pipenv would be as simple as adding these files to the .gitignore so that way pipenv users don't have to worry about accidentally pushing these files. Also can I take this issue?

Text fragment

Create Ability to see Model Changes

In order to reduce prices drastically, we need to use a smaller model. We should set up (in a jupyter notebook probably) the ability to compare how the new model would perform vs the old one based on the runs already submitted by users.

Fix error and note error in RDS

https://www.reddit.com/r/todayilearned/comments/gzwlp6/til_6_years_after_resigning_nixon_testified_on returns
"error": "Unable to obtain infomation from the website."
but this error is not put into the status code and the original website and claim are not saved in RDS

claim: 6 years after resigning, Nixon testified on behalf of former FBI assistant director Mark Felt at Felt's own trial, and gave money to Felt's defense fund. In 2005 Felt revealed he had been "Deep Throat", Bob Woodward's source while breaking the Watergate scandal that led to Nixon's resignation

Fix lambda timeout

The API Gateway on AWS has a timeout of 30 seconds, but our calls may take longer than that.

Convert to serverless

Instead of calling a flask endpoint, the lambda should call another lambda with EFS, call a sagemaker endpoint, or call another api that finds the similarity of two sentences (word2vec)

Use user id instead of Ip address

Is your feature request related to a problem? Please describe.
An IP address is too personal to track for what we need it for.

Describe the solution you'd like
Store an ID and sync it between Chrome accounts:
chrome.storage.sync.set({ userId: id }) or something like that

Describe alternatives you've considered
None

Additional context
None

Setup CI

Should probably break this up into sub-tasks
