arxiv-sanity-preserver's Introduction

I like deep neural nets.

arxiv-sanity-preserver's People

Contributors

andland, carlini, cwgreene, ecprice, edoput, gokceneraslan, hans, helges, karpathy, kingtaurus, lucidrains, martinthoma, matttrent, mlaneuville, moredread, openai-sys-okta-integration, sudoankit, tricao

arxiv-sanity-preserver's Issues

cs.IR

Those papers don't appear in arxiv-sanity:

I guess it's because they are listed under cs.IR, which isn't indexed by arxiv-sanity.

This is a bit strange, as those papers could just as well have been published under stat.ML or cs.CL.

Do you think cs.IR could be added to arxiv-sanity?

This issue is similar to #39 which seems to be fixed.
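Both this request and the cs.AI one below come down to which categories the fetch query asks for. A minimal sketch of building an arXiv API query URL from a category list; the category list and function names are hypothetical and this is not the project's fetch_papers.py, although `search_query`, `sortBy`, `sortOrder`, `start` and `max_results` are parameters of the public arXiv API.

```python
from urllib.parse import urlencode

# Hypothetical category list: an ML-style set plus the requested cs.IR and cs.AI.
CATEGORIES = ["cs.CV", "cs.LG", "cs.CL", "cs.NE", "stat.ML", "cs.IR", "cs.AI"]


def build_search_query(categories):
    """Combine categories with the arXiv API's OR operator, e.g. 'cat:cs.IR OR cat:cs.AI'."""
    return " OR ".join(f"cat:{cat}" for cat in categories)


def build_query_url(categories, start=0, max_results=100):
    """Return an export.arxiv.org query URL for the newest papers in the given categories."""
    params = {
        "search_query": build_search_query(categories),
        "sortBy": "lastUpdatedDate",
        "sortOrder": "descending",
        "start": start,
        "max_results": max_results,
    }
    # urlencode percent-escapes ':' and turns spaces into '+'
    return "http://export.arxiv.org/api/query?" + urlencode(params)
```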

Top Hype (Last Year and All Time)

Let's add Top Hype papers for Last Year and All Time.

It would be interesting to see how social media reacted to papers written last year.
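One way to get both views is to count tweet mentions per paper inside a time window. A minimal sketch, assuming mentions are available as hypothetical `(paper_id, tweet_time)` pairs; it says nothing about how arxiv-sanity actually stores its Twitter data.

```python
from collections import Counter
from datetime import datetime, timedelta

LAST_YEAR = timedelta(days=365)


def top_hype(mentions, window=None, now=None, k=10):
    """Rank papers by number of tweet mentions.

    mentions: iterable of (paper_id, tweet_time) pairs (hypothetical format).
    window: a timedelta such as LAST_YEAR, or None for all time.
    """
    now = now or datetime.now()
    cutoff = None if window is None else now - window
    counts = Counter(pid for pid, when in mentions if cutoff is None or when >= cutoff)
    return counts.most_common(k)
```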

Feature request: add/replace abstract with extracted conclusion

The conclusion section is often more compact and to the point than the abstract.

It would be great to have two summary tabs, "abstract" and "conclusion" (where available), for each paper in the list.

A general preference for which summary ("abstract" or "conclusion") is shown by default would also be nice.
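Since arxiv-sanity already keeps the extracted text of each PDF, a conclusion could plausibly be pulled out with a heading heuristic. A rough sketch, assuming section headings sit on their own line in the extracted text; the heading patterns and function name are guesses and will miss papers with unusual layouts.

```python
import re

# Assumed heading patterns; real PDF-to-text output is messier than this.
CONCLUSION_HEADING = re.compile(
    r"^\s*(?:\d+\.?\s*)?(?:conclusions?|concluding remarks)\s*$", re.IGNORECASE | re.MULTILINE
)
NEXT_SECTION = re.compile(
    r"^\s*(?:references|bibliography|acknowledge?ments?|appendix)\b", re.IGNORECASE | re.MULTILINE
)


def extract_conclusion(text, max_chars=2000):
    """Return the text between the last conclusion heading and the next end-matter heading, or None."""
    headings = list(CONCLUSION_HEADING.finditer(text))
    if not headings:
        return None
    body = text[headings[-1].end():]
    end = NEXT_SECTION.search(body)
    if end:
        body = body[: end.start()]
    return body.strip()[:max_chars] or None
```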

cs.AI

This recent DeepMind paper doesn't appear in arxiv-sanity. I believe it's because it's listed under cs.AI, which isn't indexed by arxiv-sanity.

This is a bit strange as it seems relevant to the other categories included in arxiv-sanity. This particular paper could have just as easily been posted in stat.ML or cs.LG.

Issue installing on my local machine

Hi

I would like to add an RSS feed to the most recent papers tab. I was trying to set it up on my local machine following the instructions in the README, but it failed when I ran analyze.py:

C:\Users\<user>\Desktop\arxiv-sanity-preserver>python analyze.py
Traceback (most recent call last):
  File "analyze.py", line 29, in <module>
    txt = f.read()
  File "C:\Users\<user>\AppData\Local\Continuum\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2705: character maps to <undefined>

Any idea how to fix it?

Ravi
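The traceback shows `open()` falling back to the Windows locale codec (cp1252), which has no character for byte 0x9d; that byte occurs in UTF-8 text, for example as the last byte of a right double quotation mark (E2 80 9D). A likely fix, assuming the text files are UTF-8, is to pass the encoding explicitly wherever analyze.py opens them. `read_text` is a hypothetical helper, not a function in the project.

```python
def read_text(path):
    """Read a text file as UTF-8 regardless of the OS locale.

    errors="replace" swaps any undecodable byte for U+FFFD instead of raising,
    trading a few garbled characters for a run that does not crash.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return f.read()
```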

Feature request: mobile view

Currently it's very difficult to view the website on mobile. It would be nice to have a mobile view.

Great project, btw.

Blocked by arxiv?

I'm doing a study of the texts of ML papers and am using these scripts to acquire them. After running download_pdfs.py (with about 20000 candidate papers acquired using fetch_papers.py), I was seemingly blocked by arXiv after downloading 1201 papers.
Has anyone experienced this sort of rate-limiting, not during fetch_papers but during download_pdfs? I can't access arxiv at all (including for my regular research), and am wondering whether this sort of blocking goes away after a little while, or if I need to start worrying.
[screenshot, 2017-05-22 12:55:39 PM]
Thanks!
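arXiv's exact threshold isn't stated here, but the usual precaution is to download sequentially with a pause between requests and to cap how much one run fetches. A minimal sketch; `fetch` stands in for whatever actually performs the HTTP request, and the 5-second delay is an assumed value, not a documented arXiv limit.

```python
import time


def download_throttled(urls, fetch, delay_s=5.0, max_papers=None, sleep=time.sleep):
    """Download urls one at a time with a fixed pause between requests.

    fetch: callable taking a URL and returning the downloaded bytes (hypothetical hook).
    max_papers: optional cap on how many files one run may fetch.
    sleep: injectable for testing; defaults to time.sleep.
    """
    results = {}
    for i, url in enumerate(urls):
        if max_papers is not None and i >= max_papers:
            break
        if i > 0:
            sleep(delay_s)  # never send back-to-back requests
        results[url] = fetch(url)
    return results
```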

System for sharing libraries

This is a great tool, and I was thinking about extending it with paper archives beyond arXiv. But if everyone set up their own version of arxiv-sanity, the benefit of having access to other users' libraries for the recommendation system disappears.

Would you consider a system for exporting the library database? I guess something as simple as providing a regular static dump of DOI sets corresponding to anonymized libraries would be a good start. Then every admin of an arxiv-sanity setup (or other tools for that matter) can benefit from the user base of others.
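A static dump of that kind could be as simple as a JSON object mapping pseudonymous user keys to sorted sets of paper identifiers. A sketch under that assumption; the input format and function name are hypothetical, and a salted hash is pseudonymization rather than true anonymization, since small libraries can still be distinctive.

```python
import hashlib
import json


def export_libraries(libraries, salt):
    """Serialize libraries as JSON with user ids replaced by salted hashes.

    libraries: dict mapping user id -> iterable of paper identifiers (arXiv ids or DOIs).
    salt: a secret kept by the site admin so hashes can't be recomputed from known ids.
    """
    dump = {}
    for user, papers in libraries.items():
        key = hashlib.sha256(f"{salt}:{user}".encode("utf-8")).hexdigest()[:16]
        dump[key] = sorted(set(papers))
    return json.dumps(dump, sort_keys=True)
```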

Feature Request: Expanding to all of arXiv

How feasible would it be to expand to all categories in arXiv?

In #33, you mentioned that it's important to keep communities small so that "top papers" stay relevant. Couldn't this still be maintained by having users specify, as part of their account, which subcategories they work in? Top papers for a user could then use some sort of cross-category normalization to account for communities of different sizes. Maybe we could also crowdsource a clustering of categories into research areas and offer those as presets (as is currently done for ML).

Would love to see this platform become widely adopted!
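One concrete reading of "cross-category normalization": score each paper by its share of the saves within its own category, then merge the categories a user follows. A sketch with hypothetical inputs (a flat list of saves and a paper-to-category map); real data would also need a primary-category choice for cross-listed papers.

```python
from collections import Counter


def normalized_top(saves, paper_category, user_categories, k=10):
    """Rank papers by their fraction of saves within their own category.

    saves: list with one paper id per library save.
    paper_category: dict paper id -> primary category.
    user_categories: set of categories the user said they work in.
    """
    per_paper = Counter(saves)
    per_category = Counter(paper_category[pid] for pid in saves)
    scores = {
        pid: count / per_category[paper_category[pid]]
        for pid, count in per_paper.items()
        if paper_category[pid] in user_categories
    }
    # highest share first; ties broken by id for a stable order
    return sorted(scores, key=lambda pid: (-scores[pid], pid))[:k]
```

With this scoring, a paper saved 3 times in a category with 4 saves in total outranks one saved 50 times out of 100.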

Legal?

The page https://arxiv.org/help/robots
states: "Robots Beware: Indiscriminate automated downloads from this site are not permitted."
This means your code does exactly what arXiv explicitly forbids.

Feature Requests/Suggestions

Some feature requests/suggestions:

  • The ability to add/rename/delete folders in my library.
  • Add personal notes to papers.
    • Similarly, allow user discussions on specific papers (or links to e.g. the reddit discussions)
  • Add links to paper code/GitHub (similar to GitXiv) and other related links (e.g. YouTube demo)

P.S. This is awesome... Many thanks!

Feature Request: Alert for Customized Search Query

Something like Google Alerts for search results, or citation notifications in Google Scholar: users could set an alert on a search query, and whenever a new submission matches that query, they would get a notification or email.
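The matching half of such an alert can be sketched as a check run over each batch of new submissions; sending the email is a separate concern and is left out. The all-terms-must-appear rule, the dict shapes and the function names are assumptions, not existing arxiv-sanity features.

```python
import re


def tokens(text):
    """Lowercased alphanumeric words of text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matching_alerts(new_papers, alerts):
    """Map each user to the ids of new papers that match their saved query.

    new_papers: list of dicts with "id", "title" and "abstract" keys.
    alerts: dict mapping user -> saved query string.
    A paper matches when every query word appears in its title or abstract.
    """
    paper_words = [(p["id"], tokens(p["title"] + " " + p["abstract"])) for p in new_papers]
    hits = {}
    for user, query in alerts.items():
        terms = tokens(query)
        hits[user] = [pid for pid, words in paper_words if terms and terms <= words]
    return hits
```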

Convert does not work on Mac OS

Amazing work here!

Unfortunately I can't generate the images:

convert: unable to open image `pdf/********.pdf[0-7]`

But [0-7] should not be part of the file name; it is the range of pages I want. Any hints?
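As the report says, `[0-7]` is ImageMagick's syntax for selecting pages (frames) of the input, so convert is reading it as intended; one common cause of this error is that ImageMagick cannot find Ghostscript, which it relies on to read PDFs. A hedged sketch that checks for both tools and builds the command as an argument list (which also keeps shells such as zsh from treating the brackets as a glob); the function names are hypothetical.

```python
import shutil


def missing_tools(tools=("convert", "gs")):
    """Return the external programs that are not on PATH (ImageMagick needs gs for PDFs)."""
    return [tool for tool in tools if shutil.which(tool) is None]


def thumbnail_command(pdf_path, out_path, pages="0-7"):
    """Build a convert invocation that reads only the given page range of the PDF.

    The "[0-7]" suffix is ImageMagick's frame selection, not part of the file name.
    Run it with subprocess.run(cmd, check=True); no shell is involved.
    """
    return ["convert", f"{pdf_path}[{pages}]", out_path]
```

On macOS with Homebrew, Ghostscript is available as the `ghostscript` formula.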

Hosting "fork" for physics categories

I've started indexing some of the physics categories. My plan is to cover all of them, but I've started with physics.* and astro-ph.* for now.

The site is currently hosted at http://physics.arxiv-sanity.nolife.de/

I'd be willing to host it long term if you want to focus on the categories already covered. Alternatively, I could forward the PDFs, thumbnails and extracted texts to you, if you want to incorporate them into your site. What is your plan at the moment?

How do you want to handle domain names for forks: as a subdomain, or should I register a different one?

Newest papers?

Right now the most recent paper on arxiv-sanity is from April 11th, while arXiv has had several new papers since then.
Is there a problem with the refresh?

Password reset

Hi Andrej,

Thanks a lot for your wonderful work, and especially your efforts to further democratize AI.

Quick question: is there any way to reset the password? I looked at the code and at http://www.arxiv-sanity.com/ but didn't find anything for that.

Thanks,
Rasool

Use file list instead of database in `analyze.py`

Would you consider a PR that makes analyze.py use the list of txt files instead of querying the database? This would make it easier to use the script in other contexts. In addition, the script already skips files whose text doesn't exist. The behaviour should only differ if some txt files were manually placed in the folder for some reason.
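Listing the text directory directly is straightforward. A sketch, assuming the extracted files are named `<paper id>.txt` or `<paper id>.pdf.txt` inside one folder; the naming convention and function name are assumptions to check against the project's own layout.

```python
from pathlib import Path


def list_txt_files(txt_dir):
    """Map paper id -> Path for every .txt file in txt_dir.

    Assumes files are named "<id>.txt" or "<id>.pdf.txt"; other files are ignored.
    """
    found = {}
    for path in sorted(Path(txt_dir).glob("*.txt")):
        paper_id = path.name[: -len(".txt")]
        if paper_id.endswith(".pdf"):
            paper_id = paper_id[: -len(".pdf")]
        found[paper_id] = path
    return found
```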

UI bug with overlapping "Fork me on github" banner

At certain screen widths the "Fork me on github" banner overlays the paper's PDF button.

Normal case:
[screenshot, 2017-03-06 8:18:37 PM]

Problematic case:
[screenshot, 2017-03-06 8:18:45 PM]

I'd be happy to help fix this if there's interest and no UI rework is already under way.

how to sign up?

Sorry for the silly question, but I couldn't find a sign-up option. How can I create an account and log in?

Thanks

Would it make sense to add LSA/LDA vec to TF IDF representation?

I was wondering whether using a topic vector (LSA/LDA based, or even paragraph2vec...) in addition to tf-idf would improve results.
The topic-vector score would be added to the tf-idf score with a low weight, so shared words with high tf-idf weight would still dominate, but the topic would be taken into account and could affect the document order.

What do you think?
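The proposal is a weighted sum of two similarities. A minimal sketch, assuming both the tf-idf vectors and the topic vectors (for example from LSA, i.e. a truncated SVD of the tf-idf matrix) are already computed as dense lists of floats; the 0.2 weight is an arbitrary placeholder that would need tuning.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def blended_score(tfidf_q, tfidf_d, topic_q, topic_d, topic_weight=0.2):
    """tf-idf similarity plus a down-weighted topic similarity.

    With a small topic_weight, word overlap dominates and the topic term mostly
    reorders documents whose tf-idf scores are close.
    """
    return cosine(tfidf_q, tfidf_d) + topic_weight * cosine(topic_q, topic_d)
```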

arxiv paper ios app

I'm writing an iOS app named RSarXiv that recommends arXiv papers based on user behavior.
Maybe you could try it.
It would be a great honor if you could give me some advice.
You can find the app by searching for "rsarxiv" in the App Store.
Thanks a lot!

library not showing my saved papers.

I have saved 74 papers. Yes that is a bit much, but not that much. I thought it would help the recommendation algorithm, and also store papers that looked interesting that I might want to read in the future.

Now arxiv-sanity refuses to show all of my papers. The papers I saved most recently do not appear in the list at all, while others do appear but are near the bottom of the list. Scrolling down hits "You hit the limit of number of papers to show is one result. [sic]"

  1. I would at least like to see the papers I saved most recently at the top. Ordering by time saved makes it much easier to find things I saved. It looks like they are instead ordered by the date of the paper itself.
  2. I would like to see all of my saved papers, so that I can use arxiv-sanity as a store for papers I might want to read. I understand limits might need to exist to prevent abuse, but storing the ids of fewer than 100 papers shouldn't be that problematic.

I'm wondering whether I now have to go through, unsave every paper and go back to using bookmarks or something. But I really like the convenience of arxiv-sanity, and the ability to take advantage of its recommendation algorithm.
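Both requests amount to sorting by when a paper was saved rather than by its publication date, and making the cap optional. A sketch, assuming each library entry carries a save timestamp; the pair format and function name are hypothetical.

```python
def library_view(saved, limit=None):
    """Order a user's library newest-save-first.

    saved: list of (paper_id, saved_at) pairs; saved_at is any comparable timestamp.
    limit: maximum number of ids to return, or None to return the whole library.
    """
    newest_first = sorted(saved, key=lambda entry: entry[1], reverse=True)
    ids = [paper_id for paper_id, _ in newest_first]
    return ids if limit is None else ids[:limit]
```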

Feature suggestion: search by full text similarity

First of all: I love the web app! I had actually built something similar (PubVis) when a reviewer made me aware of the arxiv sanity preserver. One feature I had implemented, and which I think your setup could easily support as well, is search by full-text similarity. The idea is that when you start drafting a paper, you want to make absolutely sure you didn't miss any essential references. Instead of running multiple keyword searches, you can just paste your existing abstract (plus other text); it is transformed into a tf-idf vector, which is then used to find related papers by computing its cosine similarity to the existing papers.
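The described pipeline (vectorize the pasted text with the corpus's tf-idf weights, then rank papers by cosine similarity) can be sketched in pure Python. This is an illustration, not PubVis or arxiv-sanity code; the idf formula is the smoothed variant scikit-learn's TfidfVectorizer uses by default, and a real deployment would reuse the site's fitted vectorizer and precomputed document vectors.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs):
    """Smoothed idf: ln((1 + n) / (1 + df)) + 1 for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(text, idf):
    """L2-normalized tf-idf vector as a dict; words unseen in the corpus are ignored."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def most_similar(query, docs, k=5):
    """Indices of the k docs with the highest cosine similarity to the pasted query text."""
    idf = build_idf(docs)
    q = tfidf(query, idf)
    scores = []
    for i, doc in enumerate(docs):  # a real index would precompute these vectors once
        v = tfidf(doc, idf)
        scores.append((sum(w * v.get(t, 0.0) for t, w in q.items()), i))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores[:k]]
```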

Can not start webserver. No such table in database.

Hi, sorry for my poor bug report; I'm new to GitHub and such.
I'm trying to use your program with two topics in the astrophysics domain.
Everything was processed fine until the web server tried to read some tables.

~/arxiv-sanity-preserver ❯❯❯ ./venv/bin/python serve.py --prod
/$HOME/arxiv-sanity-preserver/venv/lib/python2.7/site-packages/flask_limiter/extension.py:124: UserWarning: Use of the default get_ipaddr function is discouraged. Please refer to https://flask-limiter.readthedocs.org/#rate-limit-domain for the recommended configuration
UserWarning
Namespace(num_results=200, port=5000, prod=True)
loading db.p...
loading tfidf_meta.p...
loading sim_dict.p...
loading user_sim.p...
precomputing papers date sorted...
computing top papers...
Traceback (most recent call last):
  File "serve.py", line 415, in <module>
    top_counts = get_popular()
  File "serve.py", line 409, in get_popular
    libs = sqldb.execute('''select * from library''').fetchall()
sqlite3.OperationalError: no such table: library
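The traceback means serve.py opened the SQLite database but the `library` table was never created, which usually points at a skipped schema-initialization step during setup. A small diagnostic sketch using only the standard `sqlite3` module; the `user` table name is an assumption (only `library` appears in the traceback), and the path you pass is whatever your setup uses.

```python
import sqlite3


def missing_tables(db_path, required=("user", "library")):
    """Return the names in `required` that are not tables in the SQLite database.

    sqlite3.connect() creates an empty database file if db_path does not exist,
    so a wrong path produces exactly this kind of "no such table" error.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    finally:
        conn.close()
    present = {name for (name,) in rows}
    return [table for table in required if table not in present]
```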

Analyzing uses too much memory

Hi,

I'm not sure if this is normal, but analyzing a corpus of 800MB (ca. 16000 articles) runs out of memory on my machine with 8GB of RAM + 2GB of swap. Can someone with a background in data analysis judge if this is expected?

This might be the main obstacle to scaling the database to the physics section of arXiv, as I have only run the analysis on a small portion of it (less than a year for most sections, and not all relevant categories).

I'll try to profile the memory usage, but I hope the attempt isn't futile. :p
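Without profiling this is a guess, but one plausible culprit at this scale is a dense N×N similarity matrix: 16,000 × 16,000 float64 entries alone take about 2 GB (16,000 × 16,000 × 8 bytes). If that is what happens, computing each paper's nearest neighbours in row blocks keeps only a block × N slice in memory (about 131 MB for 1024 rows). A NumPy sketch, assuming a dense float array of L2-normalized document vectors (sparse tf-idf rows would need `.toarray()` per block); the function name and block size are hypothetical.

```python
import numpy as np


def topk_similar_blocked(X, k=5, block=1024):
    """Indices of the k most similar other rows for every row of X, most similar first.

    X: dense float (n, d) array with L2-normalized rows, so X @ X.T is cosine similarity.
    Only a (block, n) slice of similarities exists at any time. Requires k < n.
    """
    n = X.shape[0]
    out = np.empty((n, k), dtype=np.int64)
    for start in range(0, n, block):
        stop = min(start + block, n)
        sims = X[start:stop] @ X.T  # (stop - start, n) similarities
        sims[np.arange(stop - start), np.arange(start, stop)] = -np.inf  # drop self-matches
        cand = np.argpartition(-sims, k - 1, axis=1)[:, :k]  # unordered top k
        cand_sims = np.take_along_axis(sims, cand, axis=1)
        order = np.argsort(-cand_sims, axis=1)  # sort the k candidates
        out[start:stop] = np.take_along_axis(cand, order, axis=1)
    return out
```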

Add SSL/TLS certificate and use secure cookies

Thanks for making this 💪

I think the site would benefit from having security improved. Unfortunately, people have a tendency to re-use passwords, and as of now, the password and the session cookie can be intercepted on the same network and in man-in-the-middle attacks.

Perhaps you can use Certbot (Let's Encrypt) for this?
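Certbot takes care of the certificate at the web-server level; the cookie side is a Flask setting. Assuming serve.py's Flask app object is called `app` (Flask is evident from the flask_limiter warning in another issue here), these standard Flask config keys restrict the session cookie. Note that with `SESSION_COOKIE_SECURE` enabled, logins stop working over plain HTTP.

```python
# Standard Flask config keys; apply with app.config.update(SECURE_SESSION_CONFIG)
SECURE_SESSION_CONFIG = {
    "SESSION_COOKIE_SECURE": True,     # send the session cookie only over HTTPS
    "SESSION_COOKIE_HTTPONLY": True,   # keep it out of reach of page JavaScript (Flask's default)
    "SESSION_COOKIE_SAMESITE": "Lax",  # don't attach it to most cross-site requests (Flask >= 1.0)
}
```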

Would it be useful to have an extension to this project where you can see the ancestors and predecessors of a research paper?

For instance, I'm reading this paper and I see it referred to ideas from previously published papers. I want to put this paper as a child of those research papers and maintain a tree so that I can keep track of the ideas from the paper in a systematic manner to aid my research. In other words, I want to visualize the path of knowledge that flows from one research paper to another.
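The structure described here is a citation graph walked in one direction or the other. A sketch, assuming each paper's references have already been extracted into a dict (arxiv-sanity doesn't provide this; the dict format and function names are hypothetical). Real citation graphs are DAGs rather than trees, since a paper has many parents, so the walk tracks visited papers.

```python
from collections import defaultdict, deque


def reachable(start, edges):
    """Breadth-first walk over a citation graph; returns every paper reachable from start.

    With edges = {paper: [papers it cites]} this yields the paper's ancestors.
    """
    seen, queue, order = {start}, deque([start]), []
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def invert(edges):
    """Turn {paper: [cited papers]} into {paper: [papers citing it]} to walk descendants."""
    cited_by = defaultdict(list)
    for paper, refs in edges.items():
        for ref in refs:
            cited_by[ref].append(paper)
    return dict(cited_by)
```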

More than 20 papers in "Most recent"

Papers often arrive in batches, or sometimes I can't check them for a few days. It would be nice to be able to see a chronological list that spans maybe a week.
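A date window instead of a fixed count would cover both cases. A sketch, assuming each paper record has a hypothetical `published` datetime field; the 7-day default mirrors the "maybe a week" suggestion.

```python
from datetime import datetime, timedelta


def recent_papers(papers, days=7, now=None):
    """Papers published within the last `days` days, newest first."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)
    recent = [p for p in papers if p["published"] >= cutoff]
    return sorted(recent, key=lambda p: p["published"], reverse=True)
```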

Switch to Twitter Stream

Hi @karpathy,

A few weeks ago, I forked your code to add Twitter trends. I ended up with a different architecture (I wanted something more robust): I use Postgres and SQLAlchemy to record the Twitter stream.

I just open-sourced it, so feel free to use it! It's pretty straightforward if you have a Postgres database.
=> https://github.com/BenderV/twitter_stream/tree/arxiv

I also have the same SQLAlchemy setup for arXiv (authors/papers/tags) if you are interested; it just needs a little work before I open-source it.

Stemming, ELK

I notice you've started accepting pull requests. I'm adding stemming now; shall I prepare it for you to merge when it's done?

I also put the data into a dockerized Elasticsearch/Kibana setup (similar to @rsarxiv's suggestion). With just a few lines of code you get a nice Kibana GUI for exploration, and I found some interesting insights there. Are you interested in this as well? It does need good disks, preferably SSDs, for indexing.
