
amphi's People

Contributors

lbrndnr, sansavenir

Forkers

sansavenir

amphi's Issues

Feed

There should be a feed of relevant/new papers that could interest the user. No idea how to implement this yet. The ranking of the feed should probably take the following data points into account:

  • publication date
  • number of likes
  • topic (keywords, ccs)
  • number of reads

This means that first we have to implement things like:

  • post likes
  • mechanism to see different feeds (e.g. all, new, new in AI)
  • user history to count the number of reads of a paper
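Once likes and read counts exist, the data points above could be combined into a single ranking score. A minimal sketch, assuming hypothetical field names (`publishedAt`, `likes`, `reads`, `keywords`) and placeholder weights — nothing here is a settled design:

```javascript
// Hypothetical feed-ranking sketch: combine recency, engagement, and topic
// match into one score. Weights and field names are assumptions.
function feedScore(post, userTopics, now = Date.now()) {
  const ageDays = (now - post.publishedAt) / 86400000;
  const recency = 1 / (1 + ageDays); // newer papers rank higher
  // log damping so a handful of viral papers doesn't dominate the feed
  const engagement = Math.log1p(post.likes) + 0.5 * Math.log1p(post.reads);
  const topicMatch = post.keywords.filter(k => userTopics.has(k)).length;
  return 2 * recency + engagement + topicMatch;
}

// Sorting a feed would then just be:
function rankFeed(posts, userTopics) {
  return [...posts].sort(
    (a, b) => feedScore(b, userTopics) - feedScore(a, userTopics)
  );
}
```

The "different feeds" idea (all, new, new in AI) would only change which posts go into `rankFeed`, not the scoring itself.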

PDF Reader

It should be super comfy to read a PDF, read the comments alongside it, and write new ones.

  • Load/render the PDF in a memory-efficient way (release pages that are not being displayed, for example)
  • Display comment threads in some way
  • Make it possible to highlight text and comment that section
  • Display cited papers
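The "release pages that are not being displayed" point could look something like the sketch below: keep a small window of pages around the visible one rendered and release the rest. `renderPage`/`releasePage` here are stand-ins for the actual pdf.js calls (roughly `page.render(...)` and `page.cleanup()`); the cache itself is an assumption, not an implementation:

```javascript
// Viewport-based page cache sketch: only pages within `windowSize` of the
// currently visible page stay rendered; everything else is released.
class PageCache {
  constructor(renderPage, releasePage, windowSize = 2) {
    this.renderPage = renderPage;   // stand-in for pdf.js rendering
    this.releasePage = releasePage; // stand-in for page.cleanup()
    this.windowSize = windowSize;
    this.rendered = new Map();      // page number -> rendered page handle
  }

  async showPage(n, pageCount) {
    const lo = Math.max(1, n - this.windowSize);
    const hi = Math.min(pageCount, n + this.windowSize);
    // Release pages that scrolled out of the window.
    for (const [num, page] of this.rendered) {
      if (num < lo || num > hi) {
        this.releasePage(page);
        this.rendered.delete(num);
      }
    }
    // Render the missing pages inside the window.
    for (let num = lo; num <= hi; num++) {
      if (!this.rendered.has(num)) {
        this.rendered.set(num, await this.renderPage(num));
      }
    }
    return this.rendered.get(n);
  }
}
```

Comment threads and highlights would then only need to be loaded for the pages currently in the cache.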

Profile page

A page that shows the user's profile: liked posts/comments, written posts/comments, publications, and maybe collaborators.

Database Setup

Eventually, it should be possible to do a fuzzy text search over the contents of all the papers in the db. As far as I understand, this is a good use case for NoSQL. However, NoSQL databases are relatively slow at relating data to one another, so loading all the publications of one specific author might be slow.
I'm not sure how to tackle this. Maybe it's possible to use postgres for data like comments/users and mongodb for the papers?
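Worth noting that postgres itself does fuzzy matching via the `pg_trgm` extension (trigram similarity), so fuzzy search alone may not force a NoSQL store. A toy sketch of the trigram idea, purely to illustrate the matching principle — real search would of course happen inside the database, not in application code:

```javascript
// Toy trigram similarity, the idea behind postgres's pg_trgm extension.
// Illustrative only; padding and scoring differ in the real extension.
function trigrams(s) {
  const padded = `  ${s.toLowerCase()} `;
  const out = new Set();
  for (let i = 0; i + 3 <= padded.length; i++) {
    out.add(padded.slice(i, i + 3));
  }
  return out;
}

// Jaccard similarity over trigram sets: 1 for identical strings,
// near 0 for unrelated ones, and tolerant of small typos in between.
function similarity(a, b) {
  const ta = trigrams(a);
  const tb = trigrams(b);
  let shared = 0;
  for (const t of ta) if (tb.has(t)) shared++;
  return shared / (ta.size + tb.size - shared);
}
```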

Infrastructure

Once we have a basic web crawler and website running, we should deploy them to some service. Dunno which service suits our needs best. Maybe we're good with just getting a database service first and running the website locally only. I also heard of fly.io, which might be cool.

Web Crawler

The crawler should have the following functionalities:

  • Fetch new articles and crawl through the arxiv database (later other providers like pubmed)
  • Extract text, author (name, email, affiliation), publication date, citations, keywords, ccs
  • Save all related DOIs in order to avoid duplicates in the db (providers use different DOIs for the "same" paper)
  • Save the entries in the database
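The DOI dedup step could be sketched as below. DOI names are case-insensitive (per the DOI handbook) and often appear as `doi.org` URLs, so normalizing first matters; the alias-index scheme itself is an assumption, not a settled design:

```javascript
// Sketch of DOI-based dedup: normalize every DOI, then map each known
// alias to a single paper id so a re-crawled paper is recognized.
function normalizeDoi(doi) {
  return doi
    .trim()
    .replace(/^https?:\/\/(dx\.)?doi\.org\//i, "") // strip URL prefix
    .toLowerCase(); // DOI names are case-insensitive
}

class DoiIndex {
  constructor() {
    this.byDoi = new Map(); // normalized DOI -> paper id
  }

  // Return the existing paper id if any of these DOIs is known, else null.
  findExisting(dois) {
    for (const d of dois) {
      const id = this.byDoi.get(normalizeDoi(d));
      if (id !== undefined) return id;
    }
    return null;
  }

  // Register all DOIs of a paper under one canonical paper id.
  register(paperId, dois) {
    for (const d of dois) this.byDoi.set(normalizeDoi(d), paperId);
  }
}
```

In the real crawler this map would live in the database (e.g. a unique index on the normalized DOI column) rather than in memory.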

I implemented a basic web crawler in js so that we can use pdf.js, which seemed to make reading the pdf easier than pdfplumber, for example. Getting a clean copy of the text content is quite difficult, but might not be necessary.
