connorjoleary / deepcite
Traversing links to find the deep source of information
License: GNU General Public License v3.0
This is a security concern because anyone can see this info. We should grab the value from Google Secret Manager instead.
Is your feature request related to a problem? Please describe.
When parsing the results of nested websites, the code currently doesn't check whether the response was something like a 404. This means we may receive an error page but treat it as if the website was returned as expected.
Describe the solution you'd like
When requesting a website, look for a status 200, and if it is something else, don't parse that page.
Describe alternatives you've considered
Give special results for specific codes (404, 403, ...).
Additional context
https://www.reddit.com/r/todayilearned/comments/nz6hl7/til_the_banana_plant_is_a_herb_distantly_related/
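A minimal sketch of the status check described above, using only the standard library (function names are hypothetical, not the actual DeepCite code):

```python
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

def fetch_parseable(url: str):
    """Return page HTML only for a clean 200 response; None otherwise."""
    try:
        resp = urlopen(url, timeout=10)
    except (HTTPError, URLError):
        return None
    if resp.getcode() != 200:
        return None
    return resp.read().decode("utf-8", errors="replace")

def describe_status(code: int) -> str:
    """The alternative considered: special-case a few specific codes."""
    special = {403: "access forbidden", 404: "page not found"}
    if code == 200:
        return "ok"
    return special.get(code, "skipped")
```

A caller would simply skip parsing whenever `fetch_parseable` returns None.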
Hi, I don't know if you are looking for any help with this project since it appears to be for school, but I'd love to help, maybe even after the course is done.
Anyway, I thought you might want to know that extension\node_modules\ exists in the repo even though extension\.gitignore excludes this folder. If you want, you can have a global .gitignore. To ignore all node_modules folders you can use the pattern node_modules/, or to ignore just a specific directory you could use test-server/node_modules (see the .gitignore pattern format).
Describe the bug
Apostrophes (') mess up the ability to submit a claim to Postgres.
To Reproduce
https://www.reddit.com/r/todayilearned/comments/otef3u/til_leonardo_da_vinci_wrote_all_the_branches_of_a/
Additional context
Traceback (most recent call last):
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/pg8000/core.py", line 350, in execute
    self._c.execute_unnamed(
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/pg8000/core.py", line 1296, in execute_unnamed
    self.handle_messages(cursor)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/pg8000/core.py", line 1463, in handle_messages
    raise self.error
pg8000.exceptions.ProgrammingError: {'S': 'ERROR', 'V': 'ERROR', 'C': '42601', 'M': 'syntax error at or near "s"', 'P': '603', 'F': 'scan.l', 'L': '1149', 'R': 'scanner_yyerror'}
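The syntax error at or near "s" comes from an apostrophe in the claim being spliced directly into the SQL string. Bound parameters avoid this. A minimal sqlite3 sketch of the same idea (the backend uses SQLAlchemy/pg8000, but the principle is identical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (body TEXT)")

claim = "the banana plant is a herb, it's true"
# The ? placeholder lets the driver escape the apostrophe instead of
# breaking the statement at the "s" after the quote.
conn.execute("INSERT INTO claims (body) VALUES (?)", (claim,))
row = conn.execute("SELECT body FROM claims").fetchone()
```

With SQLAlchemy the equivalent is passing a parameter dict to `execute()` rather than formatting the claim into the statement string.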
If the sentence is very short, it may not add legitimate paragraphs to the predict object. Also, when there are multiple sentences, the one with an associated link would be a better one to return.
When you right-click and cite a selection, it only populates the popup. Instead it should open the popup automatically, input the text, and hit submit.
It does not look like you can trigger the popup automatically from the background script, though, so it may be better to convert the popup to an iframe, similar to how Google Keep operates.
Right now this response is not sent to the user, nor is any other error:
    except Exception as e:
        if check_instance(e):
            full_pre_json['error'] = str(e)
        else:
            link = html_link('https://github.com/connorjoleary/DeepCite/issues')
            full_pre_json['error'] = ('Error 500: Internal Server Error ' + str(e) + "."
                                      + new_indention("Please add your error to " + link
                                                      + " with the corresponding claim and link."))
If we do want to show users errors, we need to improve what they say.
Is your feature request related to a problem? Please describe.
When a website yields a lot of possible sources, the tree it returns gets squished at the bottom.
Describe the solution you'd like
A dynamic page size which allows you to scroll left and right
We should already have an endpoint set up, as well as code to display a tree. They just need to be matched up, and the UI needs a button to show the tree.
Describe the bug
It didn't even pick up the Wiki page, why?
To Reproduce
https://www.reddit.com/r/todayilearned/comments/nolm6e/til_the_words_female_and_male_are_etymologically/
Most useful on today I learned
Possibly just look at current website
Should probably break this up into sub-tasks
When installing and running I noticed that some of the readme instructions are a little lacking. For example, the install for word2vec is a little unclear: the backend only requires the pretrained model, not the complete code, and we could easily expand those instructions with a few extra bullet points on how to download the pretrained model. I would also like to add a short instruction on how to install on Firefox. Could I take this issue?
There are quite a few global variables that could be established as environment variables. There should be a single module for the backend that validates and sources those variables for the backend to use.
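A sketch of such a module, with hypothetical variable names (the project's actual globals would replace them):

```python
import os

REQUIRED_VARS = ("DB_URL", "MODEL_PATH")  # hypothetical names

def load_config() -> dict:
    """Validate that every required variable is set, then return them all.

    Failing fast at startup beats a mysterious crash deep in a request.
    """
    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: os.environ[name] for name in REQUIRED_VARS}
```

The rest of the backend would import this module instead of reading os.environ directly.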
Describe the bug
The reddit post https://www.reddit.com/r/todayilearned/comments/np87ef/til_of_the_golden_spruce_a_gorgeous_one_of_a_kind/ gives this error:
Traceback (most recent call last):
  File "/app/main.py", line 28, in deep_cite
    tree = Tree(link, claim)
  File "/app/tree.py", line 24, in __init__
    self.tree_root = Claim(url, claim)
  File "/app/claim.py", line 57, in __init__
    self.parse_child()
  File "/app/claim.py", line 266, in parse_child
    self.create_children(ref2text, scores)
  File "/app/claim.py", line 146, in create_children
    self.child.append(Claim(ref2text[words], words, scores[i], (self.height +1), self))  # does ref2text allow for multiple links
  File "/app/claim.py", line 57, in __init__
    self.parse_child()
  File "/app/claim.py", line 266, in parse_child
    self.create_children(ref2text, scores)
  File "/app/claim.py", line 165, in create_children
    self.child.append(Claim(ref2text[ref_key], words, scores[i], (self.height +1), self))
  File "/app/claim.py", line 57, in __init__
    self.parse_child()
  File "/app/claim.py", line 187, in parse_child
    citation = wiki(self.href, self.parent.href)
  File "/app/wiki_scraper.py", line 26, in wiki
    target_link = links[linkdict[link] + 1]
IndexError: list index out of range
Either print the error or print that there were no sources found
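The IndexError above comes from links[linkdict[link] + 1] walking past the end of the list. A guarded lookup (a hypothetical sketch, not the actual wiki_scraper code) would let the caller report "no sources found" instead of crashing:

```python
def next_link(links, index):
    """Return the link after `index`, or None when there is no next link."""
    if 0 <= index + 1 < len(links):
        return links[index + 1]
    return None
```

The caller can then branch on None to print the "no sources found" message.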
When looking at the extensions page (chrome://extensions/), there is nothing for DeepCite.
Something like a check box on the extension that passes a flag to the lambda
Is your feature request related to a problem? Please describe.
If a user submits the same text and link as one already stored in the db, it should return the previously computed result.
Describe the solution you'd like
Check against version number and text for a match. The text match should be done after trimming and can be a percent string match.
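A sketch of the match described above, using difflib for the percent string match (the function name and the threshold value are guesses):

```python
from difflib import SequenceMatcher

def is_cached_match(new_text, stored_text, new_version, stored_version,
                    threshold=0.95):
    """Reuse a stored result only for the same version and near-identical text.

    Text is trimmed first, then compared as a similarity ratio in [0, 1].
    """
    if new_version != stored_version:
        return False
    ratio = SequenceMatcher(None, new_text.strip(), stored_text.strip()).ratio()
    return ratio >= threshold
```

On a hit, the backend would return the stored tree instead of re-running the model.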
When we receive an error message from the server to display in the popup extension, we should be receiving strictly text. Instead, we are receiving formatted HTML. This forces us to use .innerHTML instead of .value or .innerText, opening the gates for potential cross-site scripting attacks. We should only receive text to display to the user so we don't have to use this.
In order for people to test their code in its entirety, the lambda must be runnable locally and able to interact with the model and extension, which can already be run locally.
This can occur in the lambda or the ECR instance (probably cheaper in the lambda, though). Either way, cutting out compute time should lower costs.
Backend error, soon to be fixed
Hi, I was just wondering if we can put in a small feature request for those of us that use pipenv. pipenv just generates some extra files, like a Pipfile and Pipfile.lock, so I think adding support for pipenv would be as simple as adding these files to the .gitignore, so that pipenv users don't have to worry about accidentally pushing them. Also, can I take this issue?
Describe the bug
The text fragment link is not selecting the text
To Reproduce
Steps to reproduce the behavior:
Making multiple requests at once causes an error.
Not sure if this is still a relevant issue now that we use gunicorn.
In order to reduce prices drastically, we need to use a smaller model. We should set up (in a jupyter notebook probably) the ability to compare how the new model would perform vs the old one based on the runs already submitted by users.
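One simple way to start that comparison in a notebook (a sketch; the right metric is an open question):

```python
def score_agreement(old_scores, new_scores):
    """Mean absolute difference between two models' similarity scores
    on the same set of stored runs (lower means closer to the old model)."""
    if len(old_scores) != len(new_scores):
        raise ValueError("score lists must cover the same runs")
    diffs = [abs(a - b) for a, b in zip(old_scores, new_scores)]
    return sum(diffs) / len(diffs)
```

The stored user runs would supply the inputs; each candidate model is re-scored on them and compared against the current model's recorded scores.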
Add tests and run them with travis ci.
For versioning I think we should follow semver standards.
https://www.reddit.com/r/todayilearned/comments/gzwlp6/til_6_years_after_resigning_nixon_testified_on
returns
"error": "Unable to obtain infomation from the website."
but this error is not reflected in the HTTP status code, and the original website and claim are not saved in RDS.
claim: 6 years after resigning, Nixon testified on behalf of former FBI assistant director Mark Felt at Felt's own trial, and gave money to Felt's defense fund. In 2005 Felt revealed he had been "Deep Throat", Bob Woodward's source while breaking the Watergate scandal that led to Nixon's resignation
Describe the bug
The true source is not given as an option.
To Reproduce
Steps to reproduce the behavior:
https://www.reddit.com/r/todayilearned/comments/n9evzh/til_theres_roughly_100_firefighter_arsonists/
full quote with that website as link
Expected behavior
There is literally a quote in the source, how did deepcite miss this?
The API Gateway on AWS has a timeout of 30 seconds, but our calls may take longer than that.
Describe the bug
The repeated url is being shown even when it shouldn't be
To Reproduce
https://www.reddit.com/r/todayilearned/comments/otqoog/til_of_research_during_1950s_allmale_combat/
Expected behavior
The url should be suppressed only when it appears in the direct path to the root, not when it appears on another branch, since the claims of the nodes above should influence the next nodes found.
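A sketch of the check this implies (the Node fields are hypothetical; the real Claim class differs):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    href: str
    parent: Optional["Node"] = None

def in_path_to_root(node: Node, url: str) -> bool:
    """True only when `url` already appears on the direct path to the root,
    so repeats on sibling branches stay allowed."""
    current = node
    while current is not None:
        if current.href == url:
            return True
        current = current.parent
    return False
```

During tree expansion, a candidate child url would be skipped only when this returns True.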
Instead of calling a Flask endpoint, the lambda should call another lambda with EFS, call a SageMaker endpoint, or call another API that finds the similarity of two sentences (word2vec).
Is your feature request related to a problem? Please describe.
This program should be able to parse PDFs.
Describe the solution you'd like
It should read the text of the PDF and find the links in it, much as Beautiful Soup does for HTML.
Additional context
Example:
https://www.reddit.com/r/todayilearned/comments/n8gf4n/til_that_in_1759_arthur_guinness_signed_a_9000/
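Extracting the text would need a PDF library such as pypdf; once the text is out, pulling bare links could be as simple as this sketch (regex and function name are assumptions):

```python
import re

# Matches http(s) URLs up to the first whitespace or closing bracket.
URL_RE = re.compile(r"https?://[^\s)>\]]+")

def links_from_text(text: str):
    """Find bare URLs in text already extracted from a PDF page."""
    return URL_RE.findall(text)
```

Unlike HTML, PDF text has no anchor tags, so this only catches URLs written out literally; link annotations embedded in the PDF would need the library's own API.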
Is your feature request related to a problem? Please describe.
An IP address is too personal a thing to track for what we need it for.
Describe the solution you'd like
Store an id and sync it between Chrome accounts.
chrome.storage.sync.set({"userId": ...}) or something like that.
Describe alternatives you've considered
None
Additional context
None
Should probably break this up into sub-tasks