e-ark-software / earkweb

E-ARK Web is software for creating and managing archival information packages; it supports full-text search over the individual files they contain.

License: MIT License

Python 7.38% JavaScript 43.47% CSS 32.40% HTML 7.22% Shell 0.60% Dockerfile 0.06% Batchfile 0.02% SCSS 4.73% Less 4.11%
repository ingest archiving

earkweb's People

Contributors

bartham, janrn, romankarl, rschmidt13, shsdev

earkweb's Issues

SIPExtraction error

After creating a SIP, we cannot continue with the SIP to AIP tasks from its active page, because SIPExtraction fails with an error:

Task cannot be executed at the current task state (last executed task: SIPtoAIPReset, input accepted from previous tasks: ['IdentifierAssignment'])

Search results 404

The files we uploaded inside the SIPs cannot be opened.
The search results seem fine, but the files listed in them cannot be opened (except the files from the schemas folder).

Archiving of SIP

The key prefix shown under "Administration" > "Django Administration" > "API Key Permissions" > "API keys" matches the one configured in ./settings/settings.cfg.docker (lines 73/74), yet the archived SIP is still listed as pending in the "overview of information packages in progress". It should appear under archived packages instead.

Celery is not started automatically from docker

Sometimes Celery does not start automatically in the Docker deployment. This can be checked in the dashboard at http://localhost:8000/earkweb/administration/dashboard and fixed by starting Celery manually from inside the running earkweb1 Docker container.

In one terminal:
root@2eb1e1400ec7:/earkweb# cat run_all.sh
#!/bin/bash
sleep 20
echo "Starting celery ..."

cd /earkweb && celery multi start ingestqueue -A earkweb.celery --concurrency=4 -Ofair --pidfile=/data/celery_worker.pid --logfile=/data/celery_default_queue.log

cd /earkweb && celery -A earkweb.celery worker --pool threads -Ofair --pidfile=/data/celery_worker.pid --logfile=/data/celery_default_queue.log &

cd /earkweb && celery -A earkweb.celery worker --pool prefork -Ofair --pidfile=/var/data/celery_worker.pid --logfile=/var/data/celery_default_queue.log &
sleep 3
echo "Starting flower ..."
cd /earkweb && celery -A earkweb.celery flower --port=5555 >/var/data/flower.log 2>&1 &
sleep 3
echo "Starting earkweb ..."
cd /earkweb && python3 manage.py runserver 0.0.0.0:8000
root@2eb1e1400ec7:/earkweb# celery -A earkweb.celery worker --pool prefork -Ofair

In the second terminal:
root@osboxes:/home/osboxes# docker exec -it earkweb1 /bin/bash
root@2eb1e1400ec7:/# cd earkweb
root@2eb1e1400ec7:/earkweb# celery -A earkweb.celery status
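
The status check from the second terminal could also be scripted, for example as part of a container health check. The following is a minimal sketch, not part of earkweb: it wraps the `celery -A earkweb.celery status` command shown above and assumes that a non-zero exit code means no worker replied. The function name `celery_workers_respond` and the injectable `run` parameter are hypothetical.

```python
import subprocess
from typing import Callable, Sequence

# Command taken from the transcript above; run it from /earkweb inside the container.
STATUS_CMD = ["celery", "-A", "earkweb.celery", "status"]


def celery_workers_respond(
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    cmd: Sequence[str] = STATUS_CMD,
    timeout: float = 30.0,
) -> bool:
    """Return True if the status command exits with code 0.

    Assumption: a non-zero exit code means that no worker replied.
    """
    try:
        result = run(list(cmd), capture_output=True, text=True, timeout=timeout)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # celery is not on PATH, or the command did not finish in time
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("workers responding:", celery_workers_respond())
```

The `run` parameter exists only so the check can be exercised without a running Celery installation.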

Docker installation - missing language files

With the Docker deployment, some language files may be missing, in particular "django.mo". As a result, important menu items such as "Information package creation" do not appear. To fix this, either follow the documentation at https://github.com/E-ARK-Software/earkweb/blob/master/docs/install_manual.md or copy the files to the corresponding location before running "docker-compose build" (e.g. earkweb/locale/en/LC_MESSAGES for English).
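
Before building the image, it may help to check which languages lack a compiled catalog. The sketch below is an illustration only: the helper name `languages_missing_mo` is hypothetical, and the directory layout `<locale root>/<language>/LC_MESSAGES/django.mo` is assumed from the example path above.

```python
from pathlib import Path
from typing import List


def languages_missing_mo(locale_root: str, domain: str = "django") -> List[str]:
    """List language directories under locale_root without LC_MESSAGES/<domain>.mo."""
    missing: List[str] = []
    root = Path(locale_root)
    if not root.is_dir():
        return missing
    for lang_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        if not (lang_dir / "LC_MESSAGES" / f"{domain}.mo").is_file():
            missing.append(lang_dir.name)
    return missing


if __name__ == "__main__":
    # Path assumed from the example in the issue text.
    print(languages_missing_mo("earkweb/locale"))
```

In a standard Django project, `.mo` files are generated from `.po` files with the `compilemessages` management command; whether that is the intended procedure for earkweb is described in the installation manual linked above.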

500 error on demo site (Google Chrome / Edge - Windows)

Hi there - trying to access the demo site I'm seeing the issue in the screenshots below:

jquery-1.11.2.js:9659 GET https://earkweb.sydarkivera.se/earkweb/submission/initialize/PACKAGE.NAME.001// 
500 (Internal Server Error)

Screenshots

Chrome

image

NB. the coloring here is a custom dark reader.

And in Microsoft Edge:

image

Uploading SIP - ascii character code

A SIP's file name cannot contain special (non-ASCII) characters, because uploading such a file to E-ARK Web causes an error.

RODA-in offers several naming schemes when exporting SIPs, such as title + id or id + date. We could export our RODA-in SIPs under their unique ID only, but then they would be hard to recognize.
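
As a workaround on the client side, file names could be reduced to ASCII before upload while keeping most of the title readable. This is a minimal sketch with a hypothetical helper `ascii_safe_name`; which characters earkweb actually rejects is not stated here, so the whitelist of letters, digits, dot, underscore and hyphen is an assumption.

```python
import re
import unicodedata


def ascii_safe_name(name: str) -> str:
    """Return an ASCII-only variant of a file name (hypothetical helper)."""
    # Decompose accented characters into base letter + combining mark ...
    decomposed = unicodedata.normalize("NFKD", name)
    # ... and drop everything that has no ASCII representation.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace remaining characters outside a conservative whitelist with "_".
    return re.sub(r"[^A-Za-z0-9._-]", "_", ascii_only)


if __name__ == "__main__":
    print(ascii_safe_name("Főkönyv 2020.zip"))
```

Letters without a Unicode decomposition (for example "ø") are dropped rather than transliterated, so different titles can map to the same name; combining the cleaned title with the unique ID avoids such collisions.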

ascii-earkweberror

API for manual entry of PREMIS events

In the archival process of preparing a DIPu, there are scenarios in which some manual preparation of materials is needed (like converting geodata from one coordinate system to another, or using ). These kinds of actions call for manual PREMIS entries.
So I propose that you create an API through which an entry can be added to PREMIS manually during the DIPu preparation process.

Broken package by creation using API methods

It is possible to create a package with a manipulated package id, e.g. "string". For example:
{
"process_id": "string",
"work_dir": "string",
"package_name": "99",
"external_id": "string",
"identifier": "string",
"version": 0,
"storage_dir": "string",
"last_change": "2020-08-06T19:19:15.178662+02:00"
}
Afterwards, the package can neither be found nor removed using the GET and DELETE methods "/informationpackages/{process_id}/". The request fails with: 404 Error: Not Found.
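
One way to prevent such packages would be to validate the identifier before it is stored. The sketch below assumes that process ids are meant to be UUIDs in canonical lowercase form, as in other requests in this issue list; the function name `is_valid_process_id` is hypothetical and not part of the earkweb API.

```python
import uuid


def is_valid_process_id(value: str) -> bool:
    """Return True if value is a canonical, lowercase, hyphenated UUID string."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts other notations (braces, no hyphens, upper case),
    # so compare against the canonical string form.
    return str(parsed) == value


if __name__ == "__main__":
    print(is_valid_process_id("1fdc9763-b586-4f35-ae09-93460021f5a7"))
    print(is_valid_process_id("string"))
```

A request whose "process_id" fails this check could then be rejected with a client error instead of creating a package that cannot be addressed later.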

Packages names (AIP to DIP conversion) must be unique

When posting orders from the OMT, the submitted "order_title" must be unique. Otherwise the following JSON is returned:
{ "message": "IntegrityError(1062, \"Duplicate entry 'Flour1' for key 'PRIMARY'\")", "success": false }
This could be a problem since different users of the OMT may use the same order title.

Is it possible to change this, or should I add a UUID-like string to the order title before sending it from the OMT?
(This would lead to some not-so-pretty package names in earkweb.)
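
The client-side workaround mentioned in the question could look roughly like this. It is a sketch under the assumption that the OMT is free to alter the title before posting; the helper name `unique_order_title` and the 8-character suffix length are illustrative choices.

```python
import uuid


def unique_order_title(order_title: str, suffix_length: int = 8) -> str:
    """Append a short random hex suffix so that repeated titles do not collide."""
    if not 1 <= suffix_length <= 32:
        raise ValueError("suffix_length must be between 1 and 32")
    suffix = uuid.uuid4().hex[:suffix_length]
    return f"{order_title}-{suffix}"


if __name__ == "__main__":
    # The printed suffix is random and differs on every call.
    print(unique_order_title("Flour1"))
```

A shorter suffix keeps package names more readable but raises the chance of a collision; the full 32 hex characters give the same uniqueness as a complete UUID.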

Package indexing using API method did not work

I've created a package:

{
"process_id": "1fdc9763-b586-4f35-ae09-93460021f5a7",
"work_dir": "/var/data/repo/work/1fdc9763-b586-4f35-ae09-93460021f5a7",
"package_name": "roman.7",
"external_id": "doi:10.111/94",
"identifier": "urn:uuid:4b36751d-7635-48a8-863e-6d943c6c4f70",
"version": 0,
"storage_dir": "",
"last_change": "2020-08-06T19:41:48.355363+02:00"
}

using POST /informationpackages/

Then I called POST /storage/informationpackages/{identifier}/index/ with the created identifier, but the package was not found for indexing.

Duplicated packages with the same package id is possible

It is possible to create multiple packages with the same package id, e.g. "string". For example:
{
"process_id": "string",
"work_dir": "string",
"package_name": "99",
"external_id": "string",
"identifier": "string",
"version": 0,
"storage_dir": "string",
"last_change": "2020-08-06T19:19:15.178662+02:00"
},
{
"process_id": "string",
"work_dir": "string",
"package_name": "88888",
"external_id": "string",
"identifier": "string",
"version": 0,
"storage_dir": "string",
"last_change": "2020-08-06T19:22:14.933824+02:00"
}
using POST method "/informationpackages/".
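
Duplicates of this kind are commonly prevented with a uniqueness constraint at the database level. The following sketch demonstrates the idea with SQLite from the Python standard library; the table and column layout is a simplified, hypothetical stand-in and does not reflect the actual earkweb schema.

```python
import sqlite3


def create_store() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical package table."""
    conn = sqlite3.connect(":memory:")
    # Only the UNIQUE constraint on process_id matters for this illustration.
    conn.execute(
        "CREATE TABLE informationpackage ("
        " id INTEGER PRIMARY KEY,"
        " process_id TEXT NOT NULL UNIQUE,"
        " package_name TEXT NOT NULL)"
    )
    return conn


def add_package(conn: sqlite3.Connection, process_id: str, package_name: str) -> bool:
    """Insert a package; return False if process_id is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO informationpackage (process_id, package_name)"
                " VALUES (?, ?)",
                (process_id, package_name),
            )
    except sqlite3.IntegrityError:
        return False
    return True


if __name__ == "__main__":
    store = create_store()
    print(add_package(store, "string", "99"))
    print(add_package(store, "string", "88888"))
```

With such a constraint in place, the second POST from the example above would be refused, and the API could report a conflict to the caller.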

Db issue - fatal error (docker installation) and ghostcript missing link

Installing with Docker and Docker Compose leads to a database error when creating a new SIP or starting a new AIP: tables are missing in the eark database.
It seems to be the same issue that @luis100 reported a year ago (#48).

Error creating a new SIP

Request Method: GET
Request URL: http://ip:8000/earkweb/sipcreator/
Django Version: 1.9
Exception Type: ProgrammingError
Exception Value: (1146, "Table 'eark.workflow_workflowmodules' doesn't exist")
Exception Location: /usr/local/lib/python2.7/dist-packages/MySQLdb/connections.py in defaulterrorhandler, line 36
Python Executable: /usr/bin/python
Python Version: 2.7.6
Python Path: ['/earkweb', '/usr/lib/python2.7/dist-packages', '/usr/local/lib/python2.7/dist-packages/fido-1.3.0-py2.7.egg', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7/dist-packages', '/earkweb', '/opt/python_wsgi_apps/earkweb']
Server time: Wed, 5 Dec 2018 11:00:22 +0100

These are the existing tables:
+-----------------------------+
| Tables_in_eark |
+-----------------------------+
| auth_group |
| auth_group_permissions |
| auth_permission |
| auth_user |
| auth_user_groups |
| auth_user_user_permissions |
| celery_taskmeta |
| celery_tasksetmeta |
| django_admin_log |
| django_content_type |
| django_migrations |
| django_session |
| djcelery_crontabschedule |
| djcelery_intervalschedule |
| djcelery_periodictask |
| djcelery_periodictasks |
| djcelery_taskstate |
| djcelery_workerstate |
| earkcore_informationpackage |
+-----------------------------+
19 rows in set (0.00 sec)

Also, if you build the images and run them as individual containers, building the earkweb image fails at step 17 (Ghostscript) because the link to release 9.18 has moved:
https://github.com/ArtifexSoftware/ghostpdl-downloads/releases/tag/gs918

Tested in Ubuntu 14.04 and 16.04.

These issues make it impossible to install earkweb. Can you take a look?

Regards,

Value of "Last task" in AIP to DIP conversion does not change

When I make a call like this

$ curl -X POST -d '{"process_id": "c1b1c16e-2c00-474f-b99b-42019b3eaeed"}' http://localhost:8000/earkweb/search/prepareDIPWorkingArea

and get a 201 response back, the "Last task" for this process is still AIPtoDIPReset in earkweb, but should this not be DIPExtractAIPs?

SIP creator Premis 3.0

When creating a SIP it looks like earkweb places a Premis 2.2 schema.

uadklip

I guess this should be a Premis 3.0.

The current AIP spec does use 2.2, but the SIP and DIP specs use 3.0. I believe it has been decided that the AIP spec will be amended to 3.0 too.

tar file from DIP storage cannot be unpacked

In the AIP to DIP conversion workflow, I completed the task "DIPStore" (with status "success") and the DIP is stored in /var/data/earkweb/storage/pairtree_root/69/b8/19/c7/-f/29/d-/45/0b/-a/94/e-/56/88/e5/9c/ef/65/data/00001/69b819c7-f29d-450b-a94e-5688e59cef65.tar
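
The directory part of this storage path appears to be derived from the package identifier by splitting it into two-character segments. The sketch below reproduces that mapping for the path shown above; it is an observation from this one example, not a description of the earkweb implementation, and the helper name `pairtree_path` is hypothetical. The Pairtree convention additionally defines character escaping, which is omitted here.

```python
def pairtree_path(identifier: str) -> str:
    """Split an identifier into two-character directory segments."""
    segments = [identifier[i:i + 2] for i in range(0, len(identifier), 2)]
    return "/".join(segments)


if __name__ == "__main__":
    print(pairtree_path("69b819c7-f29d-450b-a94e-5688e59cef65"))
    # prints 69/b8/19/c7/-f/29/d-/45/0b/-a/94/e-/56/88/e5/9c/ef/65
```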

When I try to unpack this tar-file, I get this error:

$ tar xvf 69b819c7-f29d-450b-a94e-5688e59cef65.tar
69b819c7-f29d-450b-a94e-5688e59cef65/metadata/earkweb.log
69b819c7-f29d-450b-a94e-5688e59cef65/metadata/preservation/premis.xml
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
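
To find out whether a stored archive is truncated without unpacking it, the file can be read through with Python's standard `tarfile` module. This is a minimal sketch with a hypothetical helper `tar_is_complete`; it detects archives cut off in the middle of a member, but a file truncated exactly at a member boundary may still pass.

```python
import tarfile


def tar_is_complete(path: str) -> bool:
    """Return True if every member of the tar archive can be read to the end."""
    try:
        with tarfile.open(path, "r:*") as archive:
            for member in archive:
                if member.isfile():
                    extracted = archive.extractfile(member)
                    # Read in chunks so large files do not exhaust memory.
                    while extracted.read(1024 * 1024):
                        pass
    except (tarfile.TarError, EOFError, OSError):
        # Covers truncated or corrupt archives as well as unreadable paths.
        return False
    return True


if __name__ == "__main__":
    print(tar_is_complete("69b819c7-f29d-450b-a94e-5688e59cef65.tar"))
```

Running such a check right after the "DIPStore" task would show whether the archive was already incomplete when written or was damaged later.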

Broken file upload

Despite the statement about supported formats for file upload "Only csv, jpg, jpeg, png..." the upload of one jpeg file broke the upload process with the error message "bird.jpeg: Syntax error: JSON parse: unexpected character at line 4 column 1 of the JSON data".

Indexing status

There is an error at Indexing status (from the menu):
HTTPError at /sip2aip/indexing_status
HTTP Error 404: Not Found

Cannot post order with the same title as a previously deleted order

(This issue is related to #39.)

Here is what I did:
I submitted an order from the OMT by sending this JSON: {'order_title': u'Flour2', 'aip_identifiers': ['urn:uuid:adb2b78e-c9c2-a35a-8cfa-f163612b3a08']}. This works fine, and I can see the order in the list of active AIP to DIP creation processes (there is a typo on the page displaying this list: it says "Active SIP to AIP conversion processes" instead of "Active AIP to DIP conversion processes").

If I delete the IP using the "IPdelete" task, the task succeeds and the process is removed from the list. However, when I try to submit the order again, i.e. send the same JSON as above, I get the response { "message": "IntegrityError(1062, \"Duplicate entry 'Flour2' for key 'PRIMARY'\")", "success": false }, so it seems the process/package was not deleted properly.

AIP to DIP: DIPAcquireAIPs cannot be run as the first task

After submitting an order from the OMT to earkweb, I can see the newly created AIP to DIP conversion process in the list of active AIP to DIP conversion processes. In this list I can see that "Last task" has the value AIPtoDIPReset and "Process status" has the value Success.

However, when I click on the process and then run the task "DIPAcquireAIPs", I get an error.

Output from process log:

Task execution: DIPAcquireAIPs task 74dd4bdd-f472-4e30-ae10-c50b5263996e
Processing package 5f7fabe2-2410-4ee9-af38-b6919ecd1bf1

Output from error log:

Task execution request rejected ('package task_status=-1')
Task status is undefined (-1). Task status must be set in task implementation.

If I first run the task AIPtoDIPReset and then run DIPAcquireAIPs, everything works, but I guess it should not be necessary to run AIPtoDIPReset first?

Docker - creation of tables ?

When I follow your instructions to set up earkweb through Docker (https://github.com/eark-project/earkweb/blob/master/docs/install_docker.md), the eark database is created correctly, but it doesn't have any tables, so everything crashes afterwards (can't create user...).

When I run docker exec -it --user=root earkdb_1 /repair_tables.sh, it tells me that the files don't exist (myisamchk: error: File '/var/lib/mysql/eark/auth_group' doesn't exist...). I don't know how to get them :/

Also, it seems that none of the containers has urllib3:

celery_1    |   File "/earkweb/sip2aip/views.py", line 9, in <module>
celery_1    |     from requests.packages.urllib3.exceptions import ConnectionError
celery_1    | ImportError: No module named packages.urllib3.exceptions
celery_1 exited with code 1

Have I done something wrong? How could I solve it? Is there an easy way to set up earkweb?
I can't find any definition of the SQL tables in your GitHub repository...

Flower not accessible

If we use a host system other than Ubuntu 18.04, we cannot access the Flower service.

The celery flower process is running but cannot be reached, e.g. using curl. Starting this process manually in the earkweb1 container with the "--url_prefix=flower" parameter does not work because the original process is already running. That is strange. However, after I removed all images and containers and redeployed, I can access the dashboard again:

docker exec -it earkweb1 /bin/bash

root@e1a319eeda23:/# celery -A earkweb.celery flower --url_prefix=flower --port=5555

curl http://localhost:5555/flower -> page not found

root@e1a319eeda23:/earkweb# ps auxf
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 43 0.0 0.1 20536 4088 pts/0 Ss 09:18 0:00 /bin/bash
root 108 0.0 0.0 36072 3360 pts/0 R+ 09:37 0:00 _ ps auxf
root 1 0.0 0.0 20044 3392 ? Ss 09:11 0:00 /bin/bash /earkweb/run_all.sh /earkweb/run_all.sh
root 7 0.0 0.0 20044 284 ? S 09:11 0:00 /bin/bash /earkweb/run_all.sh /earkweb/run_all.sh
root 9 0.2 3.4 246724 138880 ? S 09:11 0:03 _ /usr/bin/python3 /usr/local/bin/celery -A earkweb.celery worker --pool prefork -Ofair --pidfile=/var/data/celery_worker.pid --logfile=/var/data/celery_default_queue.l
root 25 0.0 3.1 246212 124912 ? S 09:11 0:00 _ /usr/bin/python3 /usr/local/bin/celery -A earkweb.celery worker --pool prefork -Ofair --pidfile=/var/data/celery_worker.pid --logfile=/var/data/celery_default_que
root 13 0.0 0.0 20044 284 ? S 09:11 0:00 /bin/bash /earkweb/run_all.sh /earkweb/run_all.sh
root 15 0.2 3.4 610052 137664 ? Sl 09:11 0:03 _ /usr/bin/python3 /usr/local/bin/celery -A earkweb.celery flower --url_prefix=flower --port=5555
root 19 0.0 1.2 128416 50284 ? S 09:11 0:00 python3 manage.py runserver 0.0.0.0:8000
root 24 11.1 3.4 739788 136052 ? Sl 09:11 2:53 _ /usr/bin/python3 manage.py runserver 0.0.0.0:8000

The reason seems to be the older version of flower specified in requirements.txt (0.9.5).

AIP package search / Package file search not working

AIP package search and Package file search do not seem to work at all.
Even if E-ARK Web is empty, shouldn't a search report "no search results" to give feedback that it has completed?
