benwbrum / fromthepage

FromThePage is a wiki-like application for crowdsourcing transcription of handwritten documents.

Home Page: http://fromthepage.com

License: GNU Affero General Public License v3.0

Ruby 48.78% JavaScript 18.55% CSS 1.46% HTML 10.90% Shell 0.03% SCSS 4.40% Slim 15.83% Dockerfile 0.07%

fromthepage's Issues

Collection Owners should automatically be (collection) work owners

Collection owner is not automatically a work owner, so can't add transcription conventions, etc.

Mrs. -- links wrong

Subject linking for "Mrs." links to the wrong subject (a real subject, but not the person in question), possibly the first "Mrs." defined.

Image Set source path cannot be entered with a trailing slash

This requires me to use legitimate File.basename calls in ImageSet.rb:88 instead of the gsub hokery it does now.
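A minimal sketch of the File.basename approach, assuming the goal is to derive a name from the last component of the source path. `image_set_name` is a hypothetical helper, not a method from ImageSet.rb. `File.basename` ignores a trailing separator, so both forms of the path give the same result.

```ruby
# Hypothetical helper: derive a name from the last directory component.
# File.basename ignores a trailing "/" on the path.
def image_set_name(source_path)
  File.basename(source_path)
end

image_set_name("/home/user/scans/diary1")   # => "diary1"
image_set_name("/home/user/scans/diary1/")  # => "diary1"
```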

Data too long for column exception processing number location

Clicking on number location in the image set processor results in the following error:

ActiveRecord::StatementInvalid in TransformController#number_location_process
Mysql2::Error: Data too long for column 'action' at row 1: INSERT INTO interactions (action, browser, created_on, ip_address, params, session_id, status, user_id) VALUES ('number_location_process', 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0', '2014-09-22 11:20:53', '127.0.0.1', '{"utf8"=>"✓", "authenticity_token"=>"nYVfiVsKMgh7RqL5RpO+uAtmtORkgAi5Zhvs07o33Oo=", "image_set_id"=>"110", "coordinate.x"=>"1...', 106, 'incomplete', 2)
Extracted source (around line #133):

@interaction.page_id = @page.id
end
@interaction.save
end
def complete_interaction

Page Status Window needs a "completed" entry

In From the Page, the page status drop down window only has "Incomplete transcription" and "Blank: nothing to transcribe". It needs a "Transcription Complete" entry for use when the transcription is complete after it had earlier been marked incomplete.

E.g., Grinnell S3 Page 8
http://beta.fromthepage.com/display/display_page?page_id=4364

Refactor bulk image import to use Rake instead of BackgrounDRb

The BackgrounDRb code is antiquated and buggy. It needs refactoring to use Rake instead.

Grinnell used brackets

On the page S3 Page 31, JG used brackets twice. E.g., [Elliot's]. Will this affect the annotation process?

I keep thinking that I have encountered the last of his quirks and then he surprises me again.

Internet Archive Import should check for derive task status

If a user uploads a PDF to the Internet Archive, a stub entry is created there while the derive task is enqueued. Once the derive task is done, the work is ready to be imported into FromThePage.

Currently, FromThePage does not issue any error or warning if users attempt to import stub Internet Archive works into FromThePage. We should check for IA status and ask the user to wait and try again (with advice about knowing when their work is ready), rather than barfing when we can't find derived files.
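One way the check could look, as a sketch only. It assumes the importer already has the item's file list as an array of names, and it treats the presence of `_scandata.xml` and `_jp2.zip` files as the sign that the derive task has finished. Both the suffix list and the method names are assumptions for illustration, not existing FromThePage code or an Internet Archive API.

```ruby
# Assumed markers of a finished derive task (illustrative, not exhaustive).
def required_derived_suffixes
  ["_scandata.xml", "_jp2.zip"]
end

# True when every assumed derived file is present in the item's file list.
def ia_derive_complete?(filenames)
  required_derived_suffixes.all? do |suffix|
    filenames.any? { |name| name.end_with?(suffix) }
  end
end

# Returns nil when the item looks ready, otherwise a message for the user.
def ia_import_warning(filenames)
  return nil if ia_derive_complete?(filenames)

  "This Internet Archive item does not appear to have finished processing. " \
    "Please wait and try the import again."
end
```

A stub item that only has its original PDF and metadata files would get the warning instead of an exception further down the import.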

XHTML export is escaped in Rails 3

Rails 3 introduced automatic HTML escaping, which improperly escapes the XHTML export.

Feature: branding -- allow logo customization

At a minimum, allow room for a logo, like seen here:
http://fromthepage.bpoc.org/

The scientific names of plants needs some editing

I see a couple of jack rabbits and a wood rat listed amongst the plants. Good forage for the animals but not so good for the plants.

Pages without images handled gracelessly

Currently the system allows users to create works and associate images for only a few of their pages. This can lead to exceptions being thrown, for example
ActionView::TemplateError (private method `sub' called for nil:NilClass) on line #23 of app/views/shared/_zoom_div.rhtml or ActionView::TemplateError (private method `sub' called for nil:NilClass) on line #34 of app/views/display/_multi_page.rhtml

We should do something other than barf if we're missing images for a page.

Use OpenLibrary persistent URLs

The code built around determining URLs for IA images is now obsolete thanks to the new redirection APIs built into OL. As Mike Lichtenberg writes:

FWIW, IA added a way to access their images that doesn’t rely on knowing that the base address was “ia700609.us.archive.org/6/items/mcz13103363v2” at one time but might be different now. See https://openlibrary.org/dev/docs/bookurls.

Using these addresses, this…

http://www.archive.org/download/mcz13103363v2/page/n2_w840

… redirects to this…

http://ia600609.us.archive.org/BookReader/BookReaderImages.php?id=mcz13103363v2&itemPath=/6/items/mcz13103363v2&server=ia600609.us.archive.org&page=n2_w840

… and returns the same image as this…

http://ia600609.us.archive.org/BookReader/BookReaderImages.php?zip=/6/items/mcz13103363v2/mcz13103363v2_jp2.zip&file=mcz13103363v2_jp2/mcz13103363v2_0003.jp2&scale=2

The beauty is you don’t need to know those ugly server names and file paths (and don’t need to double-check them every time).
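A small sketch of building the persistent address quoted above. `ia_page_image_url` is a hypothetical helper name. The URL pattern is copied from the example in the quote, and the parameter names are only a guess at what each part of the address means.

```ruby
# Hypothetical helper: build the redirecting image URL from the pattern
# http://www.archive.org/download/<identifier>/page/n<index>_w<width>
def ia_page_image_url(identifier, page_index, width)
  "http://www.archive.org/download/#{identifier}/page/n#{page_index}_w#{width}"
end

ia_page_image_url("mcz13103363v2", 2, 840)
# => "http://www.archive.org/download/mcz13103363v2/page/n2_w840"
```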

Rewrite edit-in-place fields

These depend on Prototype/Scriptaculous and need to be replaced.

Write a readme

Having the Rails one show up when you first get to the "fromthepage" project is not so good!

BookReader does not work when a doctype is added to the layout

BookReader does not work when a doctype such as <!DOCTYPE html> is added to the HTML document or layout.

I don't know why, but the library won't display anything when a doctype is added to the document. I tried many doctypes and it never works with one.

BookReader will only work in quirks mode (without a doctype), which is odd.

Access Control Lists/Private Transcription Projects

It's great that FTP is being used for transcribing important documents. However, it could also be useful in situations where the content is not 'world readable'. Consider access control lists to restrict access to projects.

This could be a plugin model, where the access control is provided by other code. (In fact, that's where integrating fromthepage with other code might come in)

Create New Image Set error

When clicking "Create Image Set" from the dashboard, (visiting http://localhost:3000/transform) the user gets the following error:

NoMethodError in Transform#index

Showing /home/benwbrum/dev/products/fromthepage/fromthepage/app/views/transform/directory_form.html.erb where line #1 raised:

undefined method `allow_forgery_protection' for {}:Hash

Extracted source (around line #1):

1: <%= form_tag( {:action => 'directory_process'} ) do %>
2:
3: <% if flash['error'] %>
4:

<%= flash['error'] %>

Rails.root: /home/benwbrum/dev/products/fromthepage/fromthepage
Application Trace | Framework Trace | Full Trace

app/views/transform/directory_form.html.erb:1:in `_app_views_transform_directory_form_html_erb__1749423035688235413_42378720'
app/controllers/transform_controller.rb:20:in `index'

Collection owner can't edit "introductory block"

At least if they are the 2nd created collection owner -- guessing it's associated with a work instead? Or just the UI needs to be fixed?

need routes to have logical URLs

Had to send someone this link today: http://beta.fromthepage.com/collection/show?collection_id=20&ol=l_hd_c_link

I would have been a lot happier sending them this one:
http://beta.fromthepage.com/jmmclure/collection or something involving the work name like: http://beta.fromthepage.com/jmmclure/grinnell

Works should not be transcribable without a collection.

It's possible for works to exist without a collection, but only in an intermediate state. We should force owners to add works to a collection so that they do not run into errors when they try to transcribe works in an invalid state.

IA support for non-JP2

In order to handle the Graves diaries correctly, we need to be able to import books from IA with multiple filetypes as the originals.

Replace hpricot with nokogiri

reject possible duplicates

Possible duplicates/combine -- need a way to reject a possible dupe and not show it again. (So it doesn't clutter up the subject pages.)

Files that are read-only on the server cause the background processor to break

We're seeing problems in which the transform controller is either not firing off the background image processor or the processor never runs.

UI issues Editing transcription conventions

 Initially, I was able to edit the transcription conventions, but after two or three times accessing the edit box, it no longer works. When I click in that area, it appears to open the editing box, but then as soon as I click again to actually enter text, it disappears and instead the “Permissions” editing box opens.

 Along the same lines, I notice that you can format the transcription conventions using html tags, but the tags disappear if you re-open the edit box. Is there another way I should be formatting text?

Errors in transcripts should be displayed

If users enter unbalanced HTML tags, the parser will barf, but the end user will only see a "Something went wrong" error. When @lasuprema had an unbalanced bracket, the app was totally unhelpful, although it logged this in the logfile:


Processing TranscribeController#save_transcription (for 128.62.58.21 at 2014-10-09 17:37:18) [POST]
  Parameters: {"authenticity_token"=>"x3o06IDSXusyY02X3kgu28Ic3KvUaJUgtRkF/nWmUPA=", "page_id"=>"3437", "page"=>{"title"=>"", "source_text"=>"the [[Thinking Fellers Union Local 282]]\r\n\r\ninterview by [[susan]] and [[steve]] @ [[Emo's]] in August 1995\r\nsome of the questions are ours; some are from a psychiatrist's questioning of an alleged victim of satanic ritual abuse.\r\n\r\nGW: I want to know if you've had any interesting encounters with the law in Texas.\r\n[[Hugh]]: We got pulled over the first time we came here for speeding. It was kind of a speed trap.\r\n[[Anne]]: It's not all that interesting.\r\nHugh: Yeah. All it was, we came over a hill and [[Paul]], our first drummer, was driving. Was it early in the morning? I think it was.\r\nAnne: We'd driven all night.\r\n[[Brian]]: I thought that was interesting, to me, just because I had been one of the last people driving and I hadn't slept all night, and that whole road on Highway 10, I guess, for many hours was littered with deer and there were deer corpses all over the place. So, by the time this policeman stopped us, I was out of my mind, and he could have easily been anything other than a policeman, too. I couldn't tell what was going on at all.\r\n[[Anne]]: And we looked like we had been up all night, too, really scruffy and unwashed. I'm really surprised he didn't just tell us to follow him to the police station. 'Cause I've heard that happens, or else they try to get money out of you on the spot, or they run you out of town.\r\nGW: Do you have an opinion on whether or not electroconvulsive therapy is good therapeutic practice when used by a licensed psychiatrist?\r\n[[Hugh]]: I've heard that it's not. I've heard that it's a bad thing", "status"=>"incomplete"}, "save"=>"Save"}

REXML::ParseException (# >
/usr/local/lib/ruby/1.8/rexml/parsers/baseparser.rb:330:in `pull'
/usr/local/lib/ruby/1.8/rexml/parsers/treeparser.rb:22:in `parse'
/usr/local/lib/ruby/1.8/rexml/document.rb:227:in `build'
/usr/local/lib/ruby/1.8/rexml/document.rb:43:in `initialize'
/home/fromthepage/fromthepage/releases/20140408151942/app/models/xml_source_processor.rb:161:in `new'
/home/fromthepage/fromthepage/releases/20140408151942/app/models/xml_source_processor.rb:161:in `update_links_and_xml'
/home/fromthepage/fromthepage/releases/20140408151942/app/models/xml_source_processor.rb:74:in `process_source'
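
One way to give the user something more useful than "Something went wrong" is to check the wiki-link brackets before the text reaches the XML parser. The sketch below is an assumption about how such a pre-check could look. It is not the logic in xml_source_processor.rb, and `wikilink_error` is a hypothetical name. It only looks at `[[` and `]]` pairs, so unbalanced HTML tags would still need separate handling, for example by rescuing REXML::ParseException and showing its message.

```ruby
# Returns nil when every "[[" is closed by a "]]" before the next link
# opens; otherwise returns a message naming the offending line.
def wikilink_error(source_text)
  open_line = nil
  source_text.scan(/\[\[|\]\]/) do |token|
    position = Regexp.last_match.begin(0)
    line = source_text[0...position].count("\n") + 1
    if token == "[["
      if open_line
        return "Line #{line}: \"[[\" opened before the link on line #{open_line} was closed"
      end
      open_line = line
    else
      return "Line #{line}: \"]]\" has no matching \"[[\"" unless open_line
      open_line = nil
    end
  end
  open_line ? "Line #{open_line}: \"[[\" is never closed" : nil
end
```

A save action could call this first and redisplay the transcription form with the returned message instead of raising.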

Curly brace extending over several lines

In S3 Page 26, JG lists on three lines records of three quail. He then drew a curly brace from the first to the last and wrote a long note that started on the first line, went to the last, and then went on to the next couple lines. I indicated this with a single { after each of the birds and added a note in the text and in the notes themselves.

Page titles should support wiki-linking

While wiki-linking works within pages, entering a wikilink in a page title doesn't get processed.

Create Image Set fails with NoMethodError in Transform#size_form

After clicking the orientation form, the application fails with this error:

NoMethodError in Transform#size_form

Showing /home/benwbrum/dev/products/fromthepage/fromthepage/app/views/transform/size_form.html.erb where line #14 raised:

undefined method `id' for nil:NilClass

Extracted source (around line #14):

<% form_tag({:action => 'size_process'}) do %>
<%= hidden_field_tag('image_set_id', @image_set.id) %>
<%= radio_button_tag('size', 'just_right')%>
<%= label('size', 'just_right', "This is just right") %>

Rails.root: /home/benwbrum/dev/products/fromthepage/fromthepage
Application Trace | Framework Trace | Full Trace

app/views/transform/size_form.html.erb:14:in `block in _app_views_transform_size_form_html_erb___1280253658677194652_24376240'
app/views/transform/size_form.html.erb:13:in `_app_views_transform_size_form_html_erb___1280253658677194652_24376240'

UI formatting / read all works for a subject

Subject list needs better formatting (on my computer at least) for this page:
http://beta.fromthepage.com/display/read_all_works?article_id=436
bullets are on the left/right dividing line.

How to handle dittos

In Grinnell 1925 S3 Page 7, he uses dittos for several bird names as well as collection location descriptions. I chose to insert the actual text instead of the ditto marks. How many transcription rules does this violate :) ? My thinking is that using the marks would make the text harder to mine and Grinnell's meaning was patently clear.

Devise Error in IA publish to FromThePage

As an owner user, go to the dashboard
Import an Internet Archive book
Use https://archive.org/details/mcz13103363v16 as the URL
Hit "Next" through the duplicate warning
Press "Convert to FromThePage"

This raises something which looks like a Devise error:
NoMethodError in IaController#convert

undefined method `current_user' for #<Class:0x00000003155240>

Rails.root: /home/benwbrum/dev/products/fromthepage/fromthepage
Application Trace | Framework Trace | Full Trace

app/models/page.rb:122:in `create_version'
app/controllers/ia_controller.rb:29:in `block in convert'
app/controllers/ia_controller.rb:23:in `convert'

Sign up screen does not redirect user back to originating page.

While visiting the login page sends the user back to the page from whence they came, any visit to the signup page does not do the same. This needs fixing.

Split a Work?

I had a work called "collection of letters", but realized that it's better to have each multi page letter in a different Work. ( as with http://beta.fromthepage.com/collection/show?collection_id=2&ol=l_hd_c_link ) - could there be a way to split (or join) a Work, basically, move a Page between Works?

.gitignore

in general, it would be nice to not have to modify files under change control to do configuration. Not sure how to do that.

Anyways, here's a .gitignore that reduces the noise a little.

public/images/simple_captcha
public/images/working/dot/*
public/images/working/upload/*
public/.htaccess
log

Subjects cannot have double quotes

Apparently putting double quotes in the name of a subject (e.g. [[John "Jack" Coffee Hays]]) returns a nondescript error. We should A) catch the error and make it meaningful, and B) figure out what's wrong with the double quotes within a wikilink.

Unicode Support

Dominik Wujastyk has been interested in using FromThePage for Sanskrit manuscripts (see his blog entry on crowdsourced transcription). Quick tests reveal that the current production server, running Ruby 1.8 and Rails 2, does not support Unicode in any form. Supporting it requires migrating to Ruby 1.9, and possibly migrating or changing the encoding of the backing MySQL database. srl295 might be interested in following this issue.

Convert hpricot to nokogiri

New versions of Rails and Ruby appear to break the old hpricot library entirely.

We should convert this to nokogiri

Display transcribe links to site visitors, then prompt to log in

Currently we don't show users anything they don't have permission to do. This means that for a casual site visitor, the transcription functionality is intentionally hidden -- a misfeature if we want users to participate easily.

User can adjust size of transcription window

I have encountered several cases where the transcription window was not wide enough to enter JG's entire line. See S3 Page 7. As I entered some of the lines, they wrapped, which makes it a little harder to read when checking the work. However, when I hit save, the wrapping disappeared. When I did this and similar pages, when I reach the last couple lines on the original text, I have scrolled so far that I cannot see the transcription window and have to either jot down the transcription or scroll, enter a few words, scroll down, back up, etc.

Replace notes with something modern

RJS is used in FromThePage in the 'notes' feature -- itself pulled from an old restful_comments plugin. This needs to be replaced with something modern, perhaps by replacing notes.

Acceptance criteria:
As a logged-in user,

Visit a page in a work (such as http://beta.fromthepage.com/display/display_page?ol=d_act_page&page_id=1946 )
Click the "Add Note" link
Verify a form appears
Type in some text
Save the note
Verify the note is saved.
Reload the page, or navigate to the next page and back again
Verify the note is displayed
Edit the note (n.b. may depend on Issue #30 )
Verify that edits to the note are made
Delete the note
Verify the note is deleted.
Reload the page, or navigate to the next page and back again
Verify the note is not displayed

Grinnell used plus or minus

On Grinnell's S3 Page 46, he lists a number of birds that he saw but for which he did not take careful notes, so he indicates some as +- with the + superimposed over the -. I rendered this as, for example
Great Auk (1000 +-).
NHW

Handling page headings that vary

In the earlier Grinnell notes that I have seen, the page heading appeared to always have the elements collector, locality, date, and page number. However, in the late summer survey in Mexico that he did with Lamb, the locality notes are sometimes more complex and can include latitude and elevation. E.g.:
Collector: Grinnell - 1925
Date: September 27
Location: San Jose, 2500 ft. Lat. 31 degrees (altitude according to our aneroid)
Page Number: 2550

My question is whether to render the transcription as I did above or whether to break up the location to aid future parsing. E.g.
Location: San Jose
Elevation: 2500 ft.
Latitude: Lat. 31 degrees
Instrument: (altitude according to our aneroid)

"More" link on a collection activity list shows all activity

The more link should restrict the list to the activity for the collection.

Rails 4 - Not able to create image set

After pulling a fresh version of the repo (as of today) and setting up my dev environment, I am not able to 'create an image set'. I am able to click the 'Create an image set' link. I enter the path (local) to a group of images and then click 'next' and I get this error:

unable to open image `2013-03-14_19-07-37_165.jpg': @ error/blob.c/OpenBlob/2587
Extracted source (around line #108):

# set default image data
orig = Magick::ImageList.new(sample_image.original_file)
self.original_width = orig.columns
self.original_height = orig.rows

Rails.root: /home/johnmlocklear/railsApps/fromthepage
Application Trace | Framework Trace | Full Trace

app/models/image_set.rb:108:in `new'
app/models/image_set.rb:108:in `process_sample_image'
app/models/image_set.rb:62:in `directory_setup'
app/controllers/transform_controller.rb:337:in `process_source_directory'
app/controllers/transform_controller.rb:87:in `directory_process'

Request

Parameters:

{"utf8"=>"✓",
"authenticity_token"=>"BNrCnS/aQ6H8IMBghfveM4MI2tEoBHqrrrEp9bujTQc=",
"directory"=>"/home/johnmlocklear/Pictures/art",
"commit"=>"Next"}

Feature: PDF generation and Publish-on-demand integration

The world has changed since I last worked on the LaTeX formatters for doing PDF generation. Now I should be able to use the Lulu.com publishing API to generate Publish-on-Demand books from manuscript transcripts and/or facsimiles.

Could links be added to each page

I don't know how many of the other JG trips would be affected but for the trip to the San Martir mountains, Chester Lamb was the other half of the party. A link to a cleaned up copy of his map of collecting stations as well as to his own notes could be useful at times. Also, are there records available on line of the plants that they collected? A link to these could be useful, especially since JG often used common names.

size_form.html.erb does not render form

Once the user has chosen an orientation, they should be presented with a dialog about sizing the image. For some reason, the form at e.g. http://localhost:3000/transform/size_form?image_set_id=109 doesn't display anything.

Why doesn't size_form.html.erb render anything?
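
One possible cause, offered as a guess rather than a diagnosis: the source extract for this view in the "Create Image Set fails with NoMethodError" report opens the form with `<% form_tag(...) do %>`, and since Rails 3 block helpers such as form_tag need `<%=` for their output to reach the page. The difference between the two tags can be shown with plain ERB from the standard library. `render_snippet` is a hypothetical helper for this demonstration, and it does not reproduce Rails' block-helper handling.

```ruby
require "erb"

# Render a template with `markup` available as a local variable.
def render_snippet(template, markup)
  ERB.new(template).result(binding)
end

form_markup = "<form action=\"size_process\"></form>"

# "<% %>" evaluates the expression and discards the result.
render_snippet("<% markup %>", form_markup)   # => ""

# "<%= %>" evaluates the expression and writes the result into the output.
render_snippet("<%= markup %>", form_markup)  # => the form markup
```

If this is the cause, changing the opening tag to `<%= form_tag(...) do %>` would be the first thing to try, as the directory_form.html.erb view already does.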

Error importing internet archive works with spaces in filenames

Using the new uploader, Internet Archive users can upload files containing spaces in the image file names. FromThePage still expects filenames in IA to conform to the old, persnickety file format, so an import will blow up on the third step during XML file parsing.

Logfiles:

Processing IaController#ia_book_form (for 70.112.88.81 at 2014-10-06 22:14:47) [GET]
  Parameters: {"ol"=>"d_ia_import"}
Rendering template within layouts/application
Rendering ia/ia_book_form
Completed in 18ms (View: 5, DB: 4) | 200 OK [http://beta.fromthepage.com/ia/ia_book_form?ol=d_ia_import]


Processing IaController#confirm_import (for 70.112.88.81 at 2014-10-06 22:14:54) [POST]
  Parameters: {"commit"=>"Next", "authenticity_token"=>"MRUo4XaibGPztbe4xpDkP2/2O1+6M7NnglFvfkjFUxg=", "detail_url"=>"https://archive.org/details/Doc3617Pp312"}
Rendering template within layouts/application
Rendering ia/confirm_import
Completed in 33ms (View: 9, DB: 15) | 200 OK [http://beta.fromthepage.com/ia/confirm_import]


Processing IaController#import_work (for 70.112.88.81 at 2014-10-06 22:15:02) [POST]
  Parameters: {"commit"=>"Next", "authenticity_token"=>"MRUo4XaibGPztbe4xpDkP2/2O1+6M7NnglFvfkjFUxg=", "detail_url"=>"https://archive.org/details/Doc3617Pp312"}

URI::InvalidURIError (bad URI(is not URI?): http://ia802305.us.archive.org/10/items/Doc3617Pp312/slave ledger doc 3617 pp3-12_scandata.xml):
  /usr/local/lib/ruby/1.8/uri/common.rb:436:in `split'
  /usr/local/lib/ruby/1.8/uri/common.rb:485:in `parse'
  /usr/local/lib/ruby/1.8/open-uri.rb:29:in `open'
  app/controllers/ia_controller.rb:168:in `import_work'

Problem file list (from https://ia902305.us.archive.org/10/items/Doc3617Pp312/ ):

Doc3617Pp312_archive.torrent                       04-Oct-2014 01:02                3891
Doc3617Pp312_files.xml                             04-Oct-2014 01:02                4711
Doc3617Pp312_meta.sqlite                           03-Oct-2014 16:22                9216
Doc3617Pp312_meta.xml                              04-Oct-2014 01:02                 927
slave ledger doc 3617 pp3-12.djvu                  04-Oct-2014 01:01              567506
slave ledger doc 3617 pp3-12.epub                  04-Oct-2014 01:02                3975
slave ledger doc 3617 pp3-12.gif                   04-Oct-2014 00:59              128989
slave ledger doc 3617 pp3-12.pdf                   03-Oct-2014 16:22             8003108
slave ledger doc 3617 pp3-12_abbyy.gz              04-Oct-2014 01:00                2804
slave ledger doc 3617 pp3-12_djvu.txt              04-Oct-2014 01:02                  93
slave ledger doc 3617 pp3-12_djvu.xml              04-Oct-2014 01:00                4651
slave ledger doc 3617 pp3-12_jp2.zip               04-Oct-2014 00:59             4323550
slave ledger doc 3617 pp3-12_scandata.xml          04-Oct-2014 01:01                3205
slave ledger doc 3617 pp3-12_text.pdf              04-Oct-2014 01:02              704575
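
A minimal sketch of one way to avoid the URI::InvalidURIError above: percent-encode the file name before building the URL. `ia_file_uri` is a hypothetical helper, not code from ia_controller.rb, and it assumes the item's base URL and the file name are available separately. `ERB::Util.url_encode` from the standard library encodes spaces as `%20`, which is the form a URL path needs.

```ruby
require "erb"
require "uri"

# Hypothetical helper: join an item's base URL and a file name,
# percent-encoding the file name so spaces do not break URI parsing.
def ia_file_uri(item_base_url, filename)
  URI.parse("#{item_base_url}/#{ERB::Util.url_encode(filename)}")
end

uri = ia_file_uri("http://ia802305.us.archive.org/10/items/Doc3617Pp312",
                  "slave ledger doc 3617 pp3-12_scandata.xml")
uri.to_s
# => "http://ia802305.us.archive.org/10/items/Doc3617Pp312/slave%20ledger%20doc%203617%20pp3-12_scandata.xml"
```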
