dp_parser

This is a library for extracting the date of publication contained in MEDLINE records into a Date object.

The Date of Publication (DP) field contains the full date on which the issue of the journal was published. The good news is that the values of that field are in a standardized format. The bad news is that there are twelve different variations!

The variants break down into two categories: absolute dates, and date ranges. Absolute dates describe a specific day, such as 1984 Apr 30. Not very difficult to create a Date object for that!

However, many MEDLINE records do not contain an absolute date, but a date range. For instance, a bi-monthly journal might have a Date of Publication of 1984 Apr-May. A quarterly journal might have a date of publication of 1984 Spring-Summer. A weekly journal might have a date of publication of 1984 Apr 23-30. And so forth. These can be significantly harder to resolve into a Date object.

dp_parser does this for you.

Known Variations of Absolute Dates

Known Variations of Date Ranges

How it Handles Date Ranges

We handle date ranges by assuming the first date we find is the one on which the journal was published. For instance, the May-June issue of a journal comes out in May, right?

This may not hold for records that have a Date of Publication like "1995-1997". However, the EDAT of this record backs up our assumption, so I'm rolling with it. If you find a MEDLINE record that doesn't conform to these assumptions, please let me know.
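The "first date wins" rule can be sketched in plain Ruby. This is only an illustration of the idea, not dp_parser's actual grammar: `first_date_of` is a hypothetical helper, and its regular expression covers only a few simple variants (year, year range, month, month range, day, day range).

```ruby
require 'date'

# Hypothetical helper, not part of dp_parser: resolve a simple DP string
# to the first date it mentions. Missing month or day defaults to 1.
def first_date_of(dp)
  # year[-year] [Mon[-Mon] [day[-day]]]
  pattern = /\A(\d{4})(?:-\d{4})?(?: ([A-Z][a-z]{2})(?:-[A-Z][a-z]{2})?(?: (\d{1,2})(?:-\d{1,2})?)?)?\z/
  m = dp.match(pattern)
  return nil unless m

  month = 1
  if m[2]
    # Date::ABBR_MONTHNAMES is [nil, "Jan", "Feb", ...], so the index is the month number.
    month = Date::ABBR_MONTHNAMES.index(m[2])
    return nil unless month
  end

  day = m[3] ? m[3].to_i : 1
  Date.new(m[1].to_i, month, day)
end

puts first_date_of('1984 Apr-May')    # 1984-04-01
puts first_date_of('1984 Apr 23-30')  # 1984-04-23
puts first_date_of('1995-1997')       # 1995-01-01
```

Strings this sketch does not recognize, such as seasons, simply return nil.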

How it Handles Seasons

Seasons present a particular challenge. For instance, "1974-1975 Winter". What does that even mean? When we encounter a date with a season, we just set it to the first day of that season. So, "1974-1975 Winter" becomes the first day of winter, 1974: 1974 Dec 21.

When dp_parser starts inferring (inventing) dates like this, it adds { :season => true } to the hash returned by #to_h.

Likewise, the #season? method is defined on every node, and can be used to figure out whether the date has been inferred to be the beginning of a season.
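As a rough sketch of the season rule, the lookup below maps a season name to a month and day. It is not dp_parser's code: `SEASON_STARTS` and `season_to_h` are hypothetical names. The Winter (Dec 21) and Spring (Mar 21) dates come from the examples in this README, while the Summer and Fall dates are assumptions made for illustration.

```ruby
# Hypothetical lookup table, not dp_parser's own. Winter and Spring follow
# this README's examples; Summer and Fall are assumed values.
SEASON_STARTS = {
  'Spring' => [3, 21],
  'Summer' => [6, 21],
  'Fall'   => [9, 21],
  'Winter' => [12, 21]
}.freeze

# Returns a hash shaped like the #to_h output described above, or nil
# when the string does not start with "year[-year] Season".
def season_to_h(dp)
  m = dp.match(/\A(\d{4})(?:-\d{4})? (Spring|Summer|Fall|Winter)\b/)
  return nil unless m

  month, day = SEASON_STARTS.fetch(m[2])
  { year: m[1].to_i, month: month, day: day, season: true }
end

p season_to_h('1974-1975 Winter')
p season_to_h('1984 Spring-Summer')
```

Note that the first year and the first season are used, in keeping with the "first date wins" assumption for ranges.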

How it Handles Semesters, Trimesters, and Quarters

Much like seasons, I just arbitrarily picked a date that approximates the first day of a given semester, trimester, or quarter.

However, there's all kinds of fun to be had with these dates. First of all, did you notice that some publications occur during the 4th trimester? How in the heck do you have four trimesters?

Another fun one is this little gem, which was published during the "2rd" semester. YES!!

Installation

sudo gem install rschenk-dp_parser

Usage

The two methods you're interested in are to_h and to_date.

p = DatePublishedParser.new
result = p.parse('1977 May-Jun')

result.to_h
=> {:year => 1977, :month => 5}

result.to_date.to_s
=> "1977-05-01"

a_season = p.parse('1984 Spring')

a_season.season?
=> true

a_season.to_h
=> {:year => 1984, :month => 3, :day => 21, :season => true}

Notice that when you call to_h, it only fills out the fields that it knows. In the first example above, there is no :day field in the hash, because the given date does not specify a day.

However, when you call to_date, it will fill in the missing fields. The month defaults to Jan, and the day defaults to 1. Just like the Ruby Date object. (Imagine that!)
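The defaulting behaviour mirrors what Ruby's own Date.new does when arguments are omitted. Here is a minimal sketch, assuming a partial hash shaped like the to_h output; `hash_to_date` is a hypothetical helper, not a dp_parser method.

```ruby
require 'date'

# Hypothetical helper: build a Date from a partial hash, defaulting
# a missing month to January and a missing day to the 1st.
def hash_to_date(h)
  Date.new(h.fetch(:year), h.fetch(:month, 1), h.fetch(:day, 1))
end

puts hash_to_date(year: 1977, month: 5)  # 1977-05-01
puts hash_to_date(year: 1995)            # 1995-01-01
puts Date.new(1977, 5)                   # 1977-05-01, Date.new applies the same defaults
```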

One special thing to note is the valid? method. If for some reason you've got a date like "1999 Apr 31", it will parse correctly, but Ruby will freak out when trying to make a Date object, because that Date doesn't exist. You can use the valid? method to check for this case.
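For comparison, the same kind of check can be done with Ruby's standard library alone. The sketch below uses Date.valid_date?, which is plain Ruby rather than dp_parser's valid? method; `safe_date` is a hypothetical wrapper added for illustration.

```ruby
require 'date'

# Hypothetical wrapper: return a Date when the parts form a real calendar
# date, or nil instead of letting Date.new raise an error.
def safe_date(year, month, day)
  Date.valid_date?(year, month, day) ? Date.new(year, month, day) : nil
end

p Date.valid_date?(1999, 4, 31)  # false, April has only 30 days
p Date.valid_date?(1999, 4, 30)  # true
p safe_date(1999, 4, 31)         # nil
```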

Note on Patches/Pull Requests

  • Fork the project.
  • Make your feature addition or bug fix.
  • Add tests for it. This is important so I don't break it in a future version unintentionally.
  • Commit, but do not mess with the Rakefile, version, or history. (If you want to have your own version, that is fine, but bump the version in a commit by itself that I can ignore when I pull.)
  • Send me a pull request. Bonus points for topic branches.

Copyright

Copyright (c) 2009 Ryan Schenk. See LICENSE for details.
