droppdf's People
Forkers
fchasen futurepress genuinebuildmonkey hainguyen007 gyuri-lajos deepakkrmnnit zyscqfd youngmany t-abraham gabrielgrant gabrielmacedo surendrasoniblr langnote volt4ire xolotl dltj mduan vitaly-z hbcbh1999
droppdf's Issues
Fix share linking from an annotation
The hyp.is share link no longer takes the user back to docdrop, but to YouTube. I assume this is because of the rel=canonical reference. Should we also add a rel=canonical for the docdrop page before the YouTube reference? Try an experiment.
Paul said:
It looks like the hyp.is share link picks up the second rel=canonical. If I put the "real" link before the YouTube link, it continues to pick up the YouTube link; if I put it after, it appears to pick up the page link. I do get an error ("The following error was encountered while trying to retrieve the URL: http://localhost:8000/video/QOrOYUxzX3o/"), which I assume comes from running a local server instead of a real domain.
Yes, let's change the order to "after" then.
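To check the ordering before and after the change, a small script can list the rel=canonical links in the order they appear on a page. This is only a sketch using the standard library; `canonical_links` is a made-up helper name, and the assumption that the last link wins comes from Paul's observation above, not from anything documented.

```python
from html.parser import HTMLParser


class CanonicalCollector(HTMLParser):
    """Collects href values of <link rel="canonical"> tags in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        rel = (attr_map.get("rel") or "").lower()
        if tag == "link" and rel == "canonical" and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])


def canonical_links(html_text):
    """Hypothetical helper: return every canonical href found, first to last."""
    collector = CanonicalCollector()
    collector.feed(html_text)
    return collector.hrefs


page = """
<head>
  <link rel="canonical" href="https://www.youtube.com/watch?v=QOrOYUxzX3o">
  <link rel="canonical" href="https://docdrop.org/video/QOrOYUxzX3o/">
</head>
"""
links = canonical_links(page)
# Per the observation above, the share link appears to follow the last entry.
print(links[-1])
```

If the docdrop URL is not the last entry, the page still has the old ordering.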
Set up staging server for new functionality
Set up a new server, perhaps "demo.docdrop.org" for new functionality under review.
Add quotes around search term at top and bottom of transcript
For instance:
Beginning of search for "officers" (5 matches)
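A rough sketch of how the quoted label could be built, in case it helps. The function name `search_banner` and its parameters are invented for illustration; the real transcript code may build this string elsewhere (possibly in JavaScript).

```python
def search_banner(term, match_count, position="Beginning"):
    """Hypothetical helper: build the label shown above/below the transcript."""
    noun = "match" if match_count == 1 else "matches"
    # The search term is wrapped in double quotes, as requested in this issue.
    return f'{position} of search for "{term}" ({match_count} {noun})'


print(search_banner("officers", 5))
```

The same helper could produce the bottom label by passing a different `position` value.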
Allow selection across transcript text block on YT annotator without timestamp being included in selection
Provide daily metrics on docdrop -- to slack?
Live site is being run in Debug mode
docdrop should refer to Hypothesis without the dot
Docdrop links on https://docdrop.org/fingerprinter/ confusing some LMS users
When you refingerprint a PDF on https://docdrop.org/fingerprinter/ you see the filename, a download link, a link to the file hosted at docdrop.org, and the fingerprint. The link to the hosted file can cause confusion in two different ways for LMS app users.
- Currently in the LMS app, if the URL you're using has the non-LMS app embedded, the two apps "compete", with the non-LMS app appearing much of the time. Better explanation here: hypothesis/lms#433.
- LMS users don't always read over the documentation (and don't always know there are two distinct apps) and mistake the docdrop link for the LMS environment.
Possible solutions:
- Get rid of the docdrop link on the page once refingerprinting is complete
- Add explainer text so folks clicking the docdrop link know that that link is not appropriate for use in the LMS.
Implement Wasabi
Enable selecting and annotating text across `div.sub` elements
Side-by-side annotation sidebar next to transcript
LaTeX in the annotations
Some LaTeX commands do not work in the annotations:
- $a+b=c$ is not rendered (but the alternative \(a+b=c\) works fine)
- \(\mathbb R\) does not work
- \( \# \) does not work
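Until single-dollar math is supported, one possible workaround for the first case is to rewrite `$...$` into the `\(...\)` form that already renders. The sketch below is an assumption about how that could be done outside the annotator; `dollars_to_parens` is a hypothetical name, and it does nothing for the `\mathbb` and `\#` cases.

```python
import re

# A single "$" that is neither escaped ("\$") nor part of "$$", then the
# shortest possible body, then a closing single "$" with the same constraints.
_INLINE_DOLLAR = re.compile(r"(?<![\\$])\$(?!\$)(.+?)(?<![\\$])\$(?!\$)")


def dollars_to_parens(text):
    """Hypothetical helper: rewrite $...$ inline math as \\(...\\)."""
    # A function replacement avoids backslash-escape handling in the template.
    return _INLINE_DOLLAR.sub(lambda m: r"\(" + m.group(1) + r"\)", text)


print(dollars_to_parens("$a+b=c$"))
```

Display math written as `$$...$$` and escaped dollar signs such as `\$5` are left untouched by this pattern.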
Limit size of pdf to upload?
Hi, I've just encountered the error "Request timedout after 120000 seconds" when trying to upload a 66.6MB PDF file to docdrop in Chrome, even though the upload took less than 30 minutes.
Screenshot:
I'm not sure whether I exceeded a file size limit or whether there is some other reason.
OS: win 10
browser: chrome v.97
Thanks
Upgrade PDFjs to recent version
Per @robertknight:
I did look into this a while back but it was a bit of a pain in docdrop because there isn't a clean separation of PDF.js from the local modifications that have been made to it. What would make this much easier is to create tools to update PDF.js in that project and then actually use them to update the copy in the repository. Example of how this was done for the h+PDF repo here: hypothesis/pdf.js-hypothes.is#17
Add button to [Select PDF] instead of "click to select file" text.
Currently the OCR and fingerprinter service has text that says "or click to select file", please change that to one of our usual buttons that says "Select PDF". I think it's a little more intuitive.
Cannot annotate video with Greek captions
As reported in an annotation on the landing page, this Greek video fails to annotate, although it does seem to have captions, with the typical message:
Our video annotation capability works for YouTube videos that have either human or machine-generated transcripts.
The video you’ve selected does not have one or is not a YouTube video. Please either choose another video or contact the video creator to ask them to request a transcript be generated for this video.
Add the appropriate metadata so that docdrop youtube pages unfurl w/ thumbnails and text in Slack and elsewhere
Right now if you add a docdrop youtube link to a slack thread, it won't unfurl properly. Let's look at why and see if we can get a thumbnail from the video to unfurl at a minimum.
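As a starting point, here is a sketch of the Open Graph tags the video page template could emit. It assumes Slack reads `og:` properties from the page head, and it assumes the widely used `img.youtube.com/vi/<id>/hqdefault.jpg` thumbnail URL pattern; `unfurl_meta_tags` and its arguments are hypothetical names, not existing docdrop code.

```python
from html import escape


def unfurl_meta_tags(video_id, title, page_url):
    """Hypothetical helper: build Open Graph <meta> tags for a video page."""
    # Assumed thumbnail URL pattern; verify it before relying on it.
    thumbnail = f"https://img.youtube.com/vi/{video_id}/hqdefault.jpg"
    properties = {
        "og:title": title,
        "og:url": page_url,
        "og:image": thumbnail,
    }
    # Escape values so titles containing quotes or ampersands stay valid HTML.
    return "\n".join(
        f'<meta property="{name}" content="{escape(value, quote=True)}">'
        for name, value in properties.items()
    )


print(unfurl_meta_tags("uLVMa88rVmg", "Example title",
                       "https://docdrop.org/video/uLVMa88rVmg/"))
```

The output would go inside the page's head element; the title value would presumably come from whatever the YouTube lookup already returns.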
500 error for youtube video
https://docdrop.org/video/uLVMa88rVmg/
https://www.youtube.com/watch?v=uLVMa88rVmg
Not sure why this is breaking things (it seems someone requested that debug mode be turned off 🙄 ), but maybe could be some difference in the youtube API response format due to this video being a recording of a live-stream?
Create smoother transitions as transcript moves from block to block
Introduce a slight js transition to more smoothly scroll the transcript as the video plays.
Complete OCR service
Highest priority:
- Move to docdrop.org/ocr
- Change title to "DocDrop | OCR"
- Handle errors gracefully, do not spin forever.
- Expand limit beyond 12MB?
- Do not show docdrop link in OCR results
We can initially soft-launch after the above issues are fixed ^^
Then:
- When the PDF is dropped initially on the target, display whether or not a selectable text layer is detected.
- Allow two modes of operation (perhaps through a pulldown menu): force re-OCR of the entire document, or OCR only un-OCR'd pages. These should be possible based on the comments here: https://www.reddit.com/r/Python/comments/8qvzc8/critique_cli_tool_for_document_to_text_conversion/e0n73bl/?utm_source=reddit&utm_medium=web2x&context=3
- Ensure that this OCR is also implemented on Docdrop homepage
- Show a progress meter like we do on the docdrop homepage
- Investigate autorotation of pages (whole document or page by page?)
- Clean up implementation, [ELABORATE]
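The two modes and the autorotation item could map onto command-line flags roughly as below. This sketch assumes the service shells out to the OCRmyPDF CLI (the project named in the footer text elsewhere in this list) and uses its `--force-ocr`, `--skip-text`, and `--rotate-pages` options; `build_ocr_command` and the mode names are invented for illustration.

```python
def build_ocr_command(src, dst, mode="skip", autorotate=False):
    """Hypothetical helper: assemble an ocrmypdf command line as a list."""
    mode_flags = {
        "force": "--force-ocr",  # re-OCR every page, replacing existing text
        "skip": "--skip-text",   # OCR only pages with no text layer
    }
    if mode not in mode_flags:
        raise ValueError(f"unknown OCR mode: {mode!r}")
    command = ["ocrmypdf", mode_flags[mode]]
    if autorotate:
        command.append("--rotate-pages")
    command += [src, dst]
    return command


print(build_ocr_command("in.pdf", "out.pdf", mode="force", autorotate=True))
```

Running the list through `subprocess.run(command, check=True, timeout=...)` and catching the resulting exceptions would be one way to surface failures instead of spinning forever; the timeout value is left open here.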
Research options for better OCR than tesseract
Tesseract does a poor job at OCR. Most importantly, when using PDFjs to view, often the spacing between words is not properly detected. We need a solution that tends to get the spacing correct.
Host documents using ipfs
Just thought it might be interesting if the PDFs were stored on ipfs, and docdrop was simply "pinning" the files (ie. hosting them). But anyone else could choose to pin them as well, and tie the annotations to the content-addressable hash of the file. Might be interesting to see docdrop be an even thinner layer :)
Disclaimer: haven't implemented anything like this with ipfs, but from my understanding, it should be possible
Implementing video annotation support for Panopto
Panopto.com is a leading video content and "lecture capture" solution for educational institutions. An example of a content page is here: https://na-biz-dev.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=1a90ada5-db37-42fe-b65f-b1ff775fd3bb
Educational institutions often use panopto to record lectures so that students can later replay them, with the spoken lecture (on camera) and the slides synced together. Videos are automatically transcribed.
We'll want to see if we can demonstrate a panopto lecture with Hypothesis annotation overlaid. This will most likely necessitate us deconstructing the elements of the captured content and reassembling them in a format that lends itself to our approach.
We need to 1) determine whether such a "deconstruction" is in fact possible, and
2) sketch out a page layout that will work, allowing both the slides (most important) and the speaker video (less important) to be visible while the transcript is front and center, as it is for our current YT prototype.
Documentation of pieces is here:
https://www.partners.panopto.com/integrate
https://support.panopto.com/s/
https://support.panopto.com/s/article/api-0
https://www.partners.panopto.com/
We also have our own separate instance of panopto that we can develop against, credentials provided separately.
File picker fails intermittently
Using both drag & drop, and click to select, seeing intermittent failures:
- Drag & drop won't activate (and PDF opens in browser)
- Clicking to select file does not open file selector
No pattern for failure discernible in limited testing. Tested in Chrome & FF.
Create /privacy page
Google Drive plugins require a /privacy page in order to clear their approval. We need a basic page which mostly pulls from the Hypothesis page as its source.
ePub loading doesn't work (probably due to escaped symbols in bookUrl)
GET https://s3.us-central-1.wasabisys.com/docdrop-annotations-prod/russell_problemsphilosophy_en_2004-qcyf4.epub?AWSAccessKeyId=...&Signature=...=&Expires=...
fails with 400 Bad Request. Replacing `&amp;` with `&` in the URL fixes the issue. It appears to be caused by escaping of book_url in the epub.html template. I haven't tried running the server locally, but in theory the following change should fix the issue:
<script>
+ {% autoescape off %}
window.bookUrl = "{{ book_url }}";
+ {% endautoescape %}
</script>
https://stackoverflow.com/questions/18345867/how-to-stop-django-template-code-from-escaping
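For anyone who wants to see the effect in isolation, the snippet below reproduces the same kind of escaping with Python's standard `html` module, on the assumption that the template's autoescaping treats `&` the same way. The URL and its query values are placeholders, not the real signed URL.

```python
from html import escape, unescape

# Placeholder signed URL; the host and parameter values are made up.
signed_url = (
    "https://example.invalid/book.epub"
    "?AWSAccessKeyId=KEY&Signature=SIG&Expires=123"
)

# HTML escaping turns each "&" into "&amp;", which breaks the query string
# when the value is used as a URL from JavaScript.
escaped = escape(signed_url)
print(escaped)

# Undoing the escaping restores the original URL.
restored = unescape(escaped)
print(restored == signed_url)
```

Turning autoescape off means the value is inserted into the script verbatim, so it may be worth considering Django's `escapejs` filter as an alternative for values placed inside a JavaScript string.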
Provide a way to access and explore PDF examples needing OCR
Through Wasabi?
Add support for .ppt .pptx
Create standalone OCR page at docdrop.org/ocr
Enable w/ existing tesseract engine.
Adjust absolute bottom on transcript window
The bottom of the transcript window doesn't yet float at the bottom of the viewport. Can we tweak this a bit?
Resubmit Drive plugin to Google
ePUB viewer does not pass from first page
Hello there. (Thanks for this great tool!!)
I have just started playing around with docdrop, and when attempting to use it with the epub format I found that the files load OK. Still, the viewer does not work after that; most importantly, clicking on the next button does nothing, and sometimes the content completely disappears, only to come back after a page reload. I have opened the documents in local readers and they work fine.
I'm attaching the browser console error hoping it helps to debug the issue (I tried both on Firefox and Chromium with the same result).
Add <- Enter button at right edge of YouTube URL entry field at docdrop.org
For mobile, sometimes it's difficult to submit a URL. Add an enter button you can tap.
When a direct link is taken via an annotation, cue the video to that point
If you follow this link:
https://hyp.is/MKXJlg9aEeurAy8NWpahjQ/docdrop.org/video/XqxwwuUdsp4/
It doesn't cue the video to that point. Could it?
Upgrade to Python 3
App wrongly says not logged in
When using drag and drop to access local PDFs in Safari/OSX, the app will suddenly stop allowing annotation, allegedly because the user is not logged in (although this is clearly not the case). The only remedy seems to be to reload the PDF (which opens normally about 8 pages lower down). This seems to happen more frequently as a session goes along.
Enable Google Analytics
Add footer to OCR and Fingerprinter
Under header:
Drag and drop an image PDF to add text to it.
This service will detect whether a PDF has selectable text and will OCR (Optical Character Recognition) that PDF if not. It will allow you to force-override any pre-existing OCR text if detected. Learn more about OCR here.
Footer:
This is an experimental public service developed to understand the range of PDFs that users need to OCR, and the quality of results that can be delivered across those diverse examples. Your feedback (on twitter: @dwhly) is essential to its optimum development. The service makes use of a leading open source project called OCRmyPDF, which in turn uses Tesseract. In combination these provide the best quality OCR we are aware of, free or paid. This service is provided for free in the same spirit that the open source maintainers have provided their code for free. The code behind this service is available here: [INSERT LINK]
Last updated: [Date -- pulled from last time code updated]
Fix direct linking to PDFs in DocDrop
Swap pdf fingerprint
Allow user to use the fingerprint of one pdf on a different pdf.