
pubmex's Introduction

Pubmex

(tested on macOS and Linux)


pubmex.py is a script that gets a fancy paper title for a PDF from a given DOI or PMID (it can also be combined with the macOS Finder).

Format of the title:

first author . last author - (title ("dotted") or your custom title) . PMID . journal . year . pdf
e.g.
  Kelley.Scott.The.evolution.biology.shift.towards.engineering.prediction-generating.tools.away.traditional.research.practice.EMBORep.2008.pdf
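A minimal sketch of how such a name could be assembled from a PubMed summary record (the AuthorList, LastAuthor, Title, Source and PubDate fields that appear in the debug output in the issues below). The stop-word list and the punctuation handling are assumptions for illustration, not pubmex's actual rules, and the PMID is left out, as in the examples.

```python
import re

# Hypothetical stop-word list, for illustration only; pubmex's own list may differ.
STOP_WORDS = {"a", "an", "and", "at", "by", "for", "from", "in", "of", "on", "the", "to", "with"}


def dotted_title(title: str) -> str:
    """Drop stop words (but never the first word) and join the rest with dots."""
    words = [w.strip(",:;()[]").rstrip(".") for w in title.split()]
    kept = [w for i, w in enumerate(words) if w and (i == 0 or w.lower() not in STOP_WORDS)]
    out = ""
    for word in kept:
        # A suspended hyphen such as "Short-" is glued straight onto the next word.
        if out and not out.endswith("-"):
            out += "."
        out += word
    return out


def surname(author: str) -> str:
    # PubMed summaries list authors as "Surname Initials", e.g. "Ziv O".
    return author.split()[0]


def build_filename(summary: dict) -> str:
    first = surname(summary["AuthorList"][0])
    last = surname(summary["LastAuthor"])
    journal = re.sub(r"[\s/.]", "", summary["Source"])  # "Mol Cell" -> "MolCell"
    year = summary["PubDate"][:4]                        # "2020 Dec 17" -> "2020"
    return ".".join([first, last, dotted_title(summary["Title"]), journal, year]) + ".pdf"


if __name__ == "__main__":
    summary = {
        "AuthorList": ["Ziv O", "Price J", "Shalamova L", "Kamenova T",
                       "Goodfellow I", "Weber F", "Miska EA"],
        "LastAuthor": "Miska EA",
        "Title": "The Short- and Long-Range RNA-RNA Interactome of SARS-CoV-2.",
        "Source": "Mol Cell",
        "PubDate": "2020 Dec 17",
    }
    print(build_filename(summary))
    # Ziv.Miska.The.Short-Long-Range.RNA-RNA.Interactome.SARS-CoV-2.MolCell.2020.pdf
```

With this record the sketch reproduces the name pubmex prints for the same paper in the issue further down; for other titles the assumed stop-word rules may give slightly different results.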

Nowadays, it’s not a big issue with Mendeley and other tools; however...

I don’t want to put every PDF file I collect along the way into my library, because then it gets super big (and then it’s hard to sync, for example with Dropbox). So now I can keep these PDF files in a pdf-icebox and rename them nicely and automatically:

$ ls
Hnisz.Sharp.Phase.Separation.Model.Transcriptional.Control.Cell.2017.pdf
Sharp.Hockfield.Convergence.The.future.health.Science.2017.pdf

Usage:

$ pubmex.py sharp2017.pdf
Sharp.Hockfield.Convergence.The.future.health.Science.2017.pdf
mv sharp2017.pdf --> ./Sharp.Hockfield.Convergence.The.future.health.Science.2017.pdf

$ pubmex.py Query.Konarska.pdf
mv Query.Konarska.pdf --> ./Smith.Konarska."Nought.may.endure.but.mutability".spliceosome.dynamics.regulation.splicing.MolCell.2008.pdf
    
$ pubmex.py eabc9191.full.pdf
mv  eabc9191.full.pdf --> ./Balas.Johnson.Establishing.RNA-RNA.interactions.remodels.lncRNA.structure.promotes.PRC2.activity.SciAdv.2021.pdf

Tricks

Copy and paste the DOI into the file name, writing the slash as a colon (a slash cannot appear in a file name), and run pubmex.py:

$ pubmex.py 10.1038:s41587-022-01432-w.pdf
filename: .......... 10.1038:s41587-022-01432-w.pdf
mv  10.1038:s41587-022-01432-w.pdf -->Chowdhury.AlQuraishi-Single-sequence.protein.structure.prediction.using.language.model.deep.learning-NatBiotechnol-2022.pdf

or PMID:

$ pubmex.py 35439059.pdf
filename: .......... 35439059.pdf
mv  35439059.pdf --> ./Vicens.Kieft-Thoughts.how.think.and.talk.about.RNA.structure-ProcNatlAcadSciUSA-2022.pdf
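Both tricks rely on telling a DOI and a PMID apart from the file name alone. A minimal sketch of that step, assuming a DOI file name starts with "10.", a 4 to 9 digit registrant code and a colon, and a PMID file name is all digits (these rules and the function name are illustrative, not pubmex's own code):

```python
import re
from typing import Optional, Tuple

# "10." + registrant code, followed by the ":" that stands in for the DOI's "/".
DOI_IN_FILENAME = re.compile(r"^10\.\d{4,9}:")


def identifier_from_filename(filename: str) -> Optional[Tuple[str, str]]:
    """Return ("doi", ...) or ("pmid", ...) guessed from a file name, or None."""
    name = filename.rsplit("/", 1)[-1]
    if name.lower().endswith(".pdf"):
        name = name[:-4]
    if name.isdigit():
        return ("pmid", name)
    if DOI_IN_FILENAME.match(name):
        # Only the first ":" replaces the "/" between DOI prefix and suffix.
        return ("doi", name.replace(":", "/", 1))
    return None


if __name__ == "__main__":
    for f in ["10.1038:s41587-022-01432-w.pdf", "35439059.pdf", "sharp2017.pdf"]:
        print(f, "->", identifier_from_filename(f))
```

A file name such as sharp2017.pdf yields None here; pubmex then has to look for the DOI inside the PDF text instead.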

Install

pubmex.py depends on a few external packages (Biopython, pdftotext from Poppler, and xclip on Linux).

You can install them with:

# Ubuntu (Debian-based system)
sudo apt-get install xclip python3-biopython poppler-utils
# macOS
brew install poppler # or "sudo port install poppler"
pip install biopython # or with conda: conda install -c conda-forge biopython

and then:

pip install pubmex

or install the latest version of pubmex from this GitHub repository:

pip install -e git+https://github.com/mmagnus/pubmex.git#egg=pubmex

History

  • 1.4 Add osx-automator
  • 1.3 Fixed #4 #5
  • 1.2 Fixed #2
  • 1.1 Simplify input, pubmex.py *.pdf
  • 1.0 With recent bugfixes 2021
  • 0.3 OSX installation
  • 0.2 Small changes
  • 0.1 Init version in 2010! :-)

Alternatives

pubmex's People

Contributors

mmagnus


pubmex's Issues

Not found in PubMed, although DOI (.ORG/10.1016/J.BBAGRM.2015.08.009) was detected

magnus@maximus:~/Desktop/pdfs$ pubmex.py -a -r -f 1-s2.0-S1874939915001868-main.pdf
ERROR: Not found in PubMed, although DOI (.ORG/10.1016/J.BBAGRM.2015.08.009) was detected in the pdf!
Traceback (most recent call last):
  File "/home/magnus/bin/pubmex.py", line 472, in <module>
    main()
  File "/home/magnus/bin/pubmex.py", line 451, in main
    title = get_title_auto_from_text(text, OPTIONS.debug, False, OPTIONS.keywords)
  File "/home/magnus/bin/pubmex.py", line 239, in get_title_auto_from_text
    return get_title_via_doi(doi, debug, reference, customed_title)
  File "/home/magnus/bin/pubmex.py", line 359, in get_title_via_doi
    pmid = get_pmid_via_doi_net(doi)
  File "/home/magnus/bin/pubmex.py", line 333, in get_pmid_via_doi_net
    return get_value('citation_pmid', content)
TypeError: get_value() takes exactly 3 arguments (2 given)
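Separately from the TypeError, the DOI in this report was captured together with the ".ORG/" tail of the dx.doi.org URL. A sketch of a more tolerant extraction, loosely based on the commonly used Crossref-style pattern for modern DOIs ("10.", a 4 to 9 digit registrant code, "/", suffix); the pattern and function name are assumptions, not pubmex's code. It is checked against the DOI lines that appear in the debug logs of this and the following issues:

```python
import re
from typing import Optional

# "10." + 4-9 digit registrant code + "/" + suffix, ending at whitespace or "|".
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s|]+")


def find_doi(text: str) -> Optional[str]:
    """Return the first DOI-like string in the text, lower-cased, or None."""
    match = DOI_RE.search(text)
    if not match:
        return None
    # DOIs are case-insensitive; trailing punctuation is usually part of the prose.
    return match.group(0).rstrip(".,;)").lower()


if __name__ == "__main__":
    print(find_doi(".ORG/10.1016/J.BBAGRM.2015.08.009"))
    print(find_doi("DX.DOI.ORG/10.1021/CT200162X | J. CHEM. THEORY COMPUT. 2011"))
```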

Invalid git clone (edit: on Windows machines)

The colon in 'demo/10.1261:rna.418407.pdf' causes problems when cloning on Windows machines, where a colon is not allowed in file names.

Cloning into 'pubmex'...
remote: Enumerating objects: 426, done.
remote: Counting objects: 100% (9/9), done.
remote: Total 426 (delta 8), reused 8 (delta 8), pack-reused 417
Receiving objects: 100% (426/426), 3.79 MiB | 2.86 MiB/s, done.
Resolving deltas: 100% (252/252), done.
error: invalid path 'demo/10.1261:rna.418407.pdf'
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

Some problems after I removed some prints to make the script quiet: without -d, the title is not printed at all.

(py37) [mx] d$ pubmex -p 10.1016/j.molcel.2020.11.004
(py37) [mx] d$ pubmex -p 10.1016/j.molcel.2020.11.004 -d
doi: ............... 10.1016/j.molcel.2020.11.004
IdList.............. ['33259809']
pmid: .............. 33259809
summary_dict........ {'Item': [], 'Id': '33259809', 'PubDate': '2020 Dec 17', 'EPubDate': '2020 Nov 5', 'Source': 'Mol Cell', 'AuthorList': ['Ziv O', 'Price J', 'Shalamova L', 'Kamenova T', 'Goodfellow I', 'Weber F', 'Miska EA'], 'LastAuthor': 'Miska EA', 'Title': 'The Short- and Long-Range RNA-RNA Interactome of SARS-CoV-2.', 'Volume': '80', 'Issue': '6', 'Pages': '1067-1077.e5', 'LangList': ['English'], 'NlmUniqueID': '9802571', 'ISSN': '1097-2765', 'ESSN': '1097-4164', 'PubTypeList': ['Journal Article'], 'RecordStatus': 'PubMed - indexed for MEDLINE', 'PubStatus': 'ppublish+epublish', 'ArticleIds': {'pubmed': ['33259809'], 'medline': [], 'pii': 'S1097-2765(20)30782-6', 'doi': '10.1016/j.molcel.2020.11.004', 'pmc': 'PMC7643667', 'rid': '33259809', 'eid': '33259809', 'pmcid': 'pmc-id: PMC7643667;'}, 'DOI': '10.1016/j.molcel.2020.11.004', 'History': {'pubmed': ['2020/12/02 06:00'], 'medline': ['2021/01/12 06:00'], 'received': '2020/07/20 00:00', 'revised': '2020/10/05 00:00', 'accepted': '2020/10/29 00:00', 'entrez': '2020/12/01 20:08'}, 'References': [], 'HasAbstract': IntegerElement(1, attributes={}), 'PmcRefCount': IntegerElement(10, attributes={}), 'FullJournalName': 'Molecular cell', 'ELocationID': 'doi: 10.1016/j.molcel.2020.11.004', 'SO': '2020 Dec 17;80(6):1067-1077.e5'}
Ziv.Miska.The.Short-Long-Range.RNA-RNA.Interactome.SARS-CoV-2.MolCell.2020.pdf

gkz1184.pdf

(py37) [mx] rna$ pubmex.py gkz1184.pdf --debug
filename: .......... gkz1184.pdf
filename: .......... gkz1184.pdf
doi: ............... gkz1184
IdList.............. []
pmid: .............. False
ERROR: 		Not found in PubMed, although DOI (gkz1184) was detected in the pdf!
generate ./temp.....[OK]
out:
err:
temp is going to be opened
doi_line: .......... 11641174 NUCLEIC ACIDS RESEARCH, 2020, VOL. 48, NO. 3 DOI: 10.1093/NAR/GKZ1184
doi is found: ...... 10.1093/NAR/GKZ1184
doi: ............... 10.1093/NAR/GKZ1184
IdList.............. ['31889193']
pmid: .............. 31889193
summary_dict........ {'Item': [], 'Id': '31889193', 'PubDate': '2020 Feb 20', 'EPubDate': '', 'Source': 'Nucleic Acids Res', 'AuthorList': ['Reißer S', 'Zucchelli S', 'Gustincich S', 'Bussi G'], 'LastAuthor': 'Bussi G', 'Title': 'Conformational ensembles of an RNA hairpin using molecular dynamics and sparse NMR data.', 'Volume': '48', 'Issue': '3', 'Pages': '1164-1174', 'LangList': ['English'], 'NlmUniqueID': '0411011', 'ISSN': '0305-1048', 'ESSN': '1362-4962', 'PubTypeList': ['Journal Article'], 'RecordStatus': 'PubMed - indexed for MEDLINE', 'PubStatus': 'ppublish', 'ArticleIds': {'pubmed': ['31889193'], 'medline': [], 'pii': '5691221', 'doi': '10.1093/nar/gkz1184', 'pmc': 'PMC7026608', 'rid': '31889193', 'eid': '31889193', 'pmcid': 'pmc-id: PMC7026608;'}, 'DOI': '10.1093/nar/gkz1184', 'History': {'pubmed': ['2020/01/01 06:00'], 'medline': ['2020/03/20 06:00'], 'accepted': '2019/12/09 00:00', 'revised': '2019/12/05 00:00', 'received': '2019/10/14 00:00', 'entrez': '2020/01/01 06:00'}, 'References': [], 'HasAbstract': IntegerElement(1, attributes={}), 'PmcRefCount': IntegerElement(3, attributes={}), 'FullJournalName': 'Nucleic acids research', 'ELocationID': 'doi: 10.1093/nar/gkz1184', 'SO': '2020 Feb 20;48(3):1164-1174'}
ERROR: 		Problem! The pubmex could not find automatically a title for the pdf file! Sorry!

ct200162x.pdf

(py37) [mx] rna$ pubmex.py ct200162x.pdf --debug
filename: .......... ct200162x.pdf
filename: .......... ct200162x.pdf
doi: ............... ct200162x
IdList.............. []
pmid: .............. False
ERROR: 		Not found in PubMed, although DOI (ct200162x) was detected in the pdf!
generate ./temp.....[OK]
out:
err:
temp is going to be opened
doi_line: .......... DX.DOI.ORG/10.1021/CT200162X | J. CHEM. THEORY COMPUT. 2011, 7, 28862902
doi is found: ...... 10.1021/CT200162X
doi: ............... 10.1021/CT200162X
IdList.............. ['21921995']
pmid: .............. 21921995
summary_dict........ {'Item': [], 'Id': '21921995', 'PubDate': '2011 Sep 13', 'EPubDate': '2011 Aug 2', 'Source': 'J Chem Theory Comput', 'AuthorList': ['Zgarbová M', 'Otyepka M', 'Sponer J', 'Mládek A', 'Banáš P', 'Cheatham TE 3rd', 'Jurečka P'], 'LastAuthor': 'Jurečka P', 'Title': 'Refinement of the Cornell et al. Nucleic Acids Force Field Based on Reference Quantum Chemical Calculations of Glycosidic Torsion Profiles.', 'Volume': '7', 'Issue': '9', 'Pages': '2886-2902', 'LangList': ['English'], 'NlmUniqueID': '101232704', 'ISSN': '1549-9618', 'ESSN': '1549-9626', 'PubTypeList': ['Journal Article'], 'RecordStatus': 'PubMed', 'PubStatus': 'ppublish+epublish', 'ArticleIds': {'pubmed': ['21921995'], 'medline': [], 'doi': '10.1021/ct200162x', 'pmc': 'PMC3171997', 'rid': '21921995', 'eid': '21921995', 'pmcid': 'pmc-id: PMC3171997;'}, 'DOI': '10.1021/ct200162x', 'History': {'pubmed': ['2011/09/17 06:00'], 'medline': ['2011/09/17 06:01'], 'received': '2011/03/08 00:00', 'entrez': '2011/09/17 06:00'}, 'References': [], 'HasAbstract': IntegerElement(1, attributes={}), 'PmcRefCount': IntegerElement(242, attributes={}), 'FullJournalName': 'Journal of chemical theory and computation', 'ELocationID': '', 'SO': '2011 Sep 13;7(9):2886-2902'}
ERROR: 		Problem! The pubmex could not find automatically a title for the pdf file! Sorry!

Not found in PubMed, although DOI (Thoughts on how to think (and talk) about RNA structure) was detected in the pdf!

$ pubmex.py Thoughts\ on\ how\ to\ think\ \(and\ talk\)\ about\ RNA\ structure.pdf -d
filename: .......... Thoughts on how to think (and talk) about RNA structure.pdf
filename: .......... Thoughts on how to think (and talk) about RNA structure.pdf
doi: ............... Thoughts on how to think (and talk) about RNA structure
IdList.............. ['35838578', '35439059', '33961772', '23912807']
pmid: .............. False
ERROR: 		Not found in PubMed, although DOI (Thoughts on how to think (and talk) about RNA structure) was detected in the pdf!
generate ./temp.....[OK]
out:
err: /bin/sh: -c: line 0: syntax error near unexpected token `('
/bin/sh: -c: line 0: `pdftotext Thoughts on how to think (and talk) about RNA structure.pdf temp'

ERROR: pdftotext Thoughts on how to think (and talk) about RNA structure.pdf temp
DOI has *not* been found automatically!
ERROR: 		Problem! The pubmex could not find automatically a title for the pdf file! Sorry!

The correct one is 35439059.
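The err: lines show the file name being passed unquoted through /bin/sh -c, so the shell chokes on the spaces and parentheses. A minimal sketch of calling pdftotext without a shell, using only the standard subprocess and shlex modules; the function names are hypothetical, and "-" as the output file tells pdftotext to write the text to stdout:

```python
import shlex
import subprocess
from typing import List


def pdftotext_args(pdf_path: str) -> List[str]:
    # An argument list (no shell) keeps spaces and parentheses in the name intact.
    return ["pdftotext", pdf_path, "-"]


def pdf_to_text(pdf_path: str) -> str:
    """Run pdftotext and return the extracted text (raises on failure)."""
    result = subprocess.run(pdftotext_args(pdf_path), capture_output=True, text=True, check=True)
    return result.stdout


def pdftotext_shell_command(pdf_path: str) -> str:
    # Only if a shell string is unavoidable: quote the path first.
    return "pdftotext " + shlex.quote(pdf_path) + " -"
```

Passing a list is the more robust choice; the quoted shell string is shown only for code paths that already go through os.system or shell=True.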

Automator not working

It seems that when using the Automator workflows that come with pubmex, pubmex.py cannot be found.

    for f in "$@"
    do
        pubmex.py $f
    done

The following error is displayed:

The action “Run Shell Script” encountered an error: “zsh:3: command not found: pubmex.py”

When specifying the full path to the pubmex.py file alone, another error occurs.

    for f in "$@"
    do
        /users/suntim/miniforge3/bin/pubmex.py $f
    done

The following error is displayed:

The action “Run Shell Script” encountered an error: “”

When specifying the full paths to both python and the pubmex.py file, yet another error occurs.

    for f in "$@"
    do
        /usr/local/bin/python3 /users/suntim/miniforge3/bin/pubmex.py $f
    done

The following error is displayed:

The action “Run Shell Script” encountered an error: “Traceback (most recent call last): File "/users/suntim/miniforge3/bin/pubmex.py", line 27, in <module> from Bio import Entrez ModuleNotFoundError: No module named 'Bio'”

I have all dependencies installed (pip3 install pubmex, pip3 install biopython, brew install poppler). Since the readme.md says biopython should be installed via brew, I assume that was a mistake; I installed it via pip3 instead.

The same error messages occur with both the zsh and the bash versions.
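Likely (but unconfirmed) causes: Automator's Run Shell Script action usually runs with a minimal PATH that does not include Homebrew or conda directories, which fits "command not found", and /usr/local/bin/python3 is a different interpreter from the miniforge one that pubmex was installed into, which fits "No module named 'Bio'". A small diagnostic sketch (hypothetical, not part of pubmex), meant to be run with the same interpreter and shell the Automator action uses, to see what that environment actually finds:

```python
import importlib.util
import shutil
import sys


def check_environment() -> dict:
    """Report what the current interpreter and PATH can see (diagnostic only)."""
    return {
        "python": sys.executable,
        "pubmex.py on PATH": shutil.which("pubmex.py"),
        "pdftotext on PATH": shutil.which("pdftotext"),
        "Bio importable": importlib.util.find_spec("Bio") is not None,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If "Bio importable" is False or the PATH lookups return None, pointing the workflow at the interpreter that has Biopython installed (or exporting a fuller PATH at the top of the script) is the usual remedy.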

Statistical\ Analysis\ of\ RNA\ Backbone.Torsion.Angles.2006.pdf

Statistical\ Analysis\ of\ RNA\ Backbone.Torsion.Angles.2006.pdf -d
pubmex.py:577 in main()
'-' * 80: '--------------------------------------------------------------------------------'
pubmex.py:579 in main()
filename: ('/Users/magnus/Sync/rna-torsion/Statistical Analysis of RNA '
           'Backbone.Torsion.Angles.2006.pdf')
pubmex.py:478 in get_title_via_doi()- doi: '10.1109/tcbb.2006.13'
pubmex.py:481 in get_title_via_doi()- pmid: '17048391'
pubmex.py:204 in get_title_via_pmid()- pmid: '17048391'
pubmex.py:214 in get_title_via_pmid()
journal: 'IEEE/ACM Trans Comput Biol Bioinform'
Traceback (most recent call last):
  File "/Users/magnus/miniconda3/bin/pubmex.py", line 651, in <module>
    main()
  File "/Users/magnus/miniconda3/bin/pubmex.py", line 585, in main
    title = get_title_via_doi(doi['identifier'], debug=debug, reference='', customed_title='')
  File "/Users/magnus/miniconda3/bin/pubmex.py", line 483, in get_title_via_doi
    return get_title_via_pmid(pmid, debug, reference, customed_title)
  File "/Users/magnus/miniconda3/bin/pubmex.py", line 222, in get_title_via_pmid
    year = year[0]
IndexError: list index out of range
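The traceback shows year[0] being taken from an empty list. Without knowing exactly what pubmex parses at that point, one defensive pattern, assuming the year comes from the PubMed summary's PubDate field with EPubDate as a fallback (both fields appear in the debug output above), is to check for a match before indexing:

```python
import re
from typing import Optional


def publication_year(summary: dict) -> Optional[str]:
    """Return the first four-digit year in PubDate, falling back to EPubDate."""
    for field in ("PubDate", "EPubDate"):
        match = re.search(r"\b(\d{4})\b", summary.get(field) or "")
        if match:
            return match.group(1)
    return None  # the caller decides what to do, instead of raising IndexError
```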
