droppdf's People

Contributors

alexeygolovin5587, dependabot[bot], dwhly-proj, fchasen, genuinebuildmonkey, sammartin7787, volt4ire


droppdf's Issues

Fix share linking from an annotation

The hyp.is share link no longer takes the user back to docdrop, but to YouTube (YT). I assume this is because of the rel=canonical reference. Should we also add a rel=canonical to the docdrop page before the YT reference? Try an experiment.

Paul said:

It looks like the hyp.is share link picks up the second rel=canonical. If I put the "real" link before the YouTube link, it continues to pick up the YouTube link; if I put it after, it appears to pick up the page link (although I do get an error, "The following error was encountered while trying to retrieve the URL: http://localhost:8000/video/QOrOYUxzX3o/", which I assume is probably from running a local server instead of a real domain).

Yes, let's change the order to "after" then.
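
The ordering experiment above can be reproduced offline with a small sketch. It only lists the rel=canonical links of a page in document order and takes the last one, which matches what Paul observed; it does not claim to be how Hypothesis actually resolves the link. The two URLs in the sample page are hypothetical.

```python
from html.parser import HTMLParser


class CanonicalCollector(HTMLParser):
    """Collect href values of <link rel="canonical"> tags in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def canonical_links(html_text):
    """Return every canonical href found in html_text, first to last."""
    parser = CanonicalCollector()
    parser.feed(html_text)
    return parser.hrefs


# Hypothetical page head: YouTube link first, docdrop page link second ("after").
page = """
<head>
  <link rel="canonical" href="https://www.youtube.com/watch?v=QOrOYUxzX3o">
  <link rel="canonical" href="https://docdrop.org/video/QOrOYUxzX3o/">
</head>
"""

links = canonical_links(page)
last_link = links[-1]  # assumption: the share link follows the last canonical
```

Running this against a rendered video page is a quick way to confirm that the docdrop link really is emitted after the YouTube one.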

docdrop should refer to Hypothesis without the dot

Hypothesis does not use the dot when writing out its name. The dot is used only in the actual hypothes.is domain (e.g., in a URL) and in the graphic wordmark. "Hypothes.is" should be changed to "Hypothesis" on the docdrop homepage (and anywhere else the name is used on the site).

Docdrop links on https://docdrop.org/fingerprinter/ confusing some LMS users

When you refingerprint a PDF on https://docdrop.org/fingerprinter/ you see the filename, a download link, a link to the file hosted at docdrop.org, and the fingerprint. The link to the hosted file can cause confusion in two different ways for LMS app users.

  1. Currently, in the LMS app, if the URL you're using has the non-LMS app embedded, the two apps "compete", with the non-LMS app appearing much of the time. Better explanation here: hypothesis/lms#433.

  2. LMS users don't always read over the documentation (and don't always know there are two distinct apps) and mistake the docdrop link for the LMS environment.

Possible solutions:

  • Get rid of the docdrop link on the page once refingerprinting is complete
  • Add explainer text so folks clicking the docdrop link know that that link is not appropriate for use in the LMS.

LaTeX in the annotations

Some LaTeX commands do not work in the annotations:

  • $a+b=c$ not rendered (but another option \(a+b=c\) works fine)

  • \(\mathbb R\) does not work

  • \( \# \) does not work

Limit size of pdf to upload?

Hi, I've just encountered the error "Request timedout after 120000 seconds" when trying to upload a 66.6MB PDF file to docdrop in Chrome, although the upload took less than 30 minutes.

I'm not sure whether I exceeded the file size limit or whether there is some other reason.

OS: win 10
browser: chrome v.97

Thanks

Upgrade PDFjs to recent version

Per @robertknight:

I did look into this a while back but it was a bit of a pain in docdrop because there isn't a clean separation of PDF.js from the local modifications that have been made to it. What would make this much easier is to create tools to update PDF.js in that project and then actually use them to update the copy in the repository. Example of how this was done for the h+PDF repo here: hypothesis/pdf.js-hypothes.is#17

Cannot annotate video with Greek captions

As reported in an annotation on the landing page, this Greek video fails to annotate, although it does seem to have captions, with the typical message:


Our video annotation capability works for YouTube videos that have either human or machine-generated transcripts.

The video you’ve selected does not have one or is not a YouTube video. Please either choose another video or contact the video creator to ask them to request a transcript be generated for this video.

Complete OCR service

Highest priority:

  • Move to docdrop.org/ocr
  • Change title to "DocDrop | OCR"
  • Handle errors gracefully, do not spin forever.
  • Expand limit beyond 12MB?
  • Do not show docdrop link in OCR results

We can initially soft-launch after the above issues are fixed ^^

Then:

Research options for better OCR than tesseract

Tesseract does a poor job at OCR. Most importantly, when using PDFjs to view, often the spacing between words is not properly detected. We need a solution that tends to get the spacing correct.

Host documents using ipfs

Just thought it might be interesting if the PDFs were stored on ipfs, and docdrop was simply "pinning" the files (ie. hosting them). But anyone else could choose to pin them as well, and tie the annotations to the content-addressable hash of the file. Might be interesting to see docdrop be an even thinner layer :)

Disclaimer: haven't implemented anything like this with ipfs, but from my understanding, it should be possible
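
As a rough illustration of tying annotations to a content-addressable hash, here is a minimal sketch. It uses a plain SHA-256 digest as a stand-in for a content address; real IPFS identifiers are built differently (chunking and multihash encoding), and the store and function names below are hypothetical.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Return a hex SHA-256 digest, used here as a stand-in content address."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical in-memory store: annotations keyed by the document's content hash.
annotations_by_hash = {}


def add_annotation(pdf_bytes: bytes, note: str) -> str:
    """Attach a note to whatever document has exactly these bytes."""
    key = content_key(pdf_bytes)
    annotations_by_hash.setdefault(key, []).append(note)
    return key


pdf_a = b"%PDF-1.4 example bytes"
pdf_copy = bytes(pdf_a)  # the same content, as if pinned by someone else

key1 = add_annotation(pdf_a, "first note")
key2 = add_annotation(pdf_copy, "second note")
```

Because the key depends only on the bytes, both copies land under the same entry regardless of where the file is hosted, which is the property the proposal relies on.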

Implementing video annotation support for Panopto

Panopto.com is a leading video content and "lecture capture" solution for educational institutions. An example of a content page is here: https://na-biz-dev.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=1a90ada5-db37-42fe-b65f-b1ff775fd3bb

Educational institutions often use panopto to record lectures so that students can later replay them, with the spoken lecture (on camera) and the slides synced together. Videos are automatically transcribed.

We'll want to see if we can demonstrate a panopto lecture with Hypothesis annotation overlaid. This will most likely necessitate us deconstructing the elements of the captured content and reassembling them in a format that lends itself to our approach.

We need to 1) determine whether such a "deconstruction" is in fact possible, and
2) sketch out a page layout that will work, keeping both the slides (most important) and the speaker video (less important) visible while the transcript is front and center, as it is for our current YT prototype.

Documentation of pieces is here:
https://www.partners.panopto.com/integrate
https://support.panopto.com/s/
https://support.panopto.com/s/article/api-0
https://www.partners.panopto.com/

We also have our own separate instance of panopto that we can develop against, credentials provided separately.

File picker fails intermittently

Using both drag & drop, and click to select, seeing intermittent failures:

  • Drag & drop won't activate (and PDF opens in browser)
  • Clicking to select file does not open file selector

No pattern for failure discernible in limited testing. Tested in Chrome & FF.

Create /privacy page

Google Drive plugins require a /privacy page in order to clear their approval. We need a basic page which mostly pulls from the Hypothesis page as its source.

ePub loading doesn't work (probably due to escaped symbols in bookUrl)

GET https://s3.us-central-1.wasabisys.com/docdrop-annotations-prod/russell_problemsphilosophy_en_2004-qcyf4.epub?AWSAccessKeyId=...&Signature=...=&Expires=... fails with 400 Bad Request. Replacing "&amp;" with "&" in the URL fixes the issue. It appears to be caused by the escaping of book_url in the epub.html template. I haven't tried running the server locally, but in theory the following change should fix the issue:

   <script>
+  {% autoescape off %}
     window.bookUrl = "{{ book_url }}";
+  {% endautoescape %}
   </script>

https://stackoverflow.com/questions/18345867/how-to-stop-django-template-code-from-escaping
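
The effect described in this issue can be shown without running the server. The sketch below uses Python's standard html.escape as a stand-in for template autoescaping (both turn "&" into "&amp;"); the URL and its query values are placeholders, not the real signed URL.

```python
import html

# Hypothetical signed URL; the query values are placeholders.
book_url = "https://example.com/book.epub?AWSAccessKeyId=KEY&Signature=SIG&Expires=123"

# What an autoescaping template would write into the page.
escaped = html.escape(book_url)

# Undoing the escaping recovers the URL the storage server expects.
restored = html.unescape(escaped)
```

Inside a script block the browser does not decode "&amp;", so the escaped form is what gets requested, which is consistent with the 400 response. Django also ships an escapejs filter meant for values placed inside JavaScript string literals; it may be a narrower alternative to switching autoescape off, though that has not been tested here.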

ePUB viewer does not advance past the first page

Hello there. (Thanks for this great tool!!)

I have just started playing around with docdrop, and when attempting to use it with the epub format I found that the files load OK. Still, the viewer does not work after that; most importantly, clicking the next button does nothing, and sometimes the content disappears completely, only to come back after a page reload. I have opened the documents in local readers and they work fine.

I'm attaching the browser console error hoping it helps to debug the issue (I tried both on Firefox and Chromium with the same result).

docdrop.org-1672949714192.log

App wrongly says not logged in

When using drag and drop to access local PDFs in Safari/OSX, the app will suddenly stop allowing annotation, allegedly because the user is not logged in (although this is clearly not the case). The only remedy seems to be to reload the PDF (which opens normally about 8 pages lower down). This seems to happen more frequently as a session goes along.

Add footer to OCR and Fingerprinter

Under header:

Drag and drop an image PDF to add text to it.

This service will detect whether a PDF has selectable text and will OCR (Optical Character Recognition) that PDF if not. It will allow you to force-override any pre-existing OCR text if detected. Learn more about OCR here.

Footer:
This is an experimental public service developed to understand the range of PDFs that users need to OCR, and the quality of results that can be delivered across those diverse examples. Your feedback (on twitter: @dwhly) is essential to its optimum development. The service makes use of a leading open source project called OCRmyPDF, which in turn uses Tesseract. In combination these provide the best quality OCR we are aware of, free or paid. This service is provided for free in the same spirit that the open source maintainers have provided their code for free. The code behind this service is available here: [INSERT LINK]

Last updated: [Date -- pulled from last time code updated]
