
image-injecttool's People

Contributors

alexey-ebi, bunop, dependabot[bot], gelso, wizardfan

image-injecttool's Issues

Real time status updates while loading data dump file

Is your feature request related to a problem? Please describe.
The status should update in real time, with no need to refresh the page. For Ale's data, the loading status stayed visible for at least two minutes. After giving up waiting and browsing elsewhere, the submission list showed the loaded status, so the loading status was probably not updated in time.

Describe the solution you'd like
We could use django-channels to send messages using websockets

Describe alternatives you've considered
Automatic page reload

Additional context
Required steps:

  • add support for django channels
  • adding tests for django channels
  • mocking async calls in unittest (no call to asgi server during tests - no status update for record in InjectTool database)
  • write a few words using sphinx for django channels (aim, how it works...)
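
The step about mocking async calls in unittest can be sketched with the standard library alone. This is a minimal illustration, not the project's code: `StatusNotifier`, `update_status` and the `LOADED` status value are hypothetical names, and the real notifier would presumably push a message through django-channels.

```python
import asyncio
from unittest.mock import AsyncMock, patch


class StatusNotifier:
    """Hypothetical notifier; a real one would push to a websocket group."""

    async def send(self, submission_id, status):
        raise RuntimeError("no ASGI server available during tests")


def update_status(store, notifier, submission_id, status):
    """Record the new status, then notify listeners asynchronously."""
    store[submission_id] = status
    asyncio.run(notifier.send(submission_id, status))


def demo():
    store = {}
    # Replace the coroutine with an AsyncMock so nothing leaves the process.
    with patch.object(StatusNotifier, "send", new_callable=AsyncMock) as fake_send:
        update_status(store, StatusNotifier(), 1, "LOADED")
    # The mock records that the coroutine was awaited with these arguments.
    fake_send.assert_awaited_once_with(1, "LOADED")
    return store
```

Patching at the class level keeps the test independent of how the notifier instance is created inside the code under test.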

Small changes, minor updates

Is your feature request related to a problem? Please describe.
Some ideas and improvements not related to a specific issue or problem are described here

Additional context

  • check validation.admin
    • check __str__ methods
    • check plurals
    • check columns displayed
  • use bumpversion to manage versions?
  • disable validation links while WAITING - they are not useful: we can inspect errors while waiting, but we can't update or fix anything in this state
  • what to do when getting a species not in the DictSpecie table?
  • TaskFailureMixin should be unique and used by all tasks - issue #58
  • batch update doesn't update the last_changed column in Name table
  • add the update submission view
  • country shouldn't be added when importing data - issue #57
  • pin the kombu library and fix issues with flower - issue #6
  • send mail to admin when a task fails - issue #58
  • use common.helpers.send_mail_to_admins when sending a mail to admin is needed - issue #58
  • configure git LFS
  • uploading csv file with cryoweb path should raise an exception or a more informative error
  • take a look at https://medium.com/@ksarthak4ever/django-and-web-security-headers-d72a9e54155e - will have a dedicated issue
  • solve the cookie problem with font-awesome and google-chrome - seems to have been solved by font-awesome
  • cryoweb data with GPS coordinates will fail validation since we don't have place as a text - should this be fixed with geographic referencing tool?
  • biosample submission data tables can be deleted if the submission process went ok?
  • add the contact to text for missing organizations (affiliation/organizations)
  • filter validation by owner in admin - devel
  • filter biosample objects as validation object in admin - devel
  • use submission.id instead of str(submission) in task retrieval logs - devel
  • the reset_submission management command should free also biosample.Submission table - no, should be done by hand since there can be opened USI submissions
  • add views for organization objects as view for update organization information
  • the retrieval task could be done in parallel - will have a dedicated issue
  • remove text placeholder from dashboard
  • warn user if they delete a submission with biosample names
  • better error message when uploading a missing relationship during excel upload
  • Add last species and translations to initializedb

refactor websocket to support other real-time stuff

Is your feature request related to a problem? Please describe.
Websocket could be refactored in order to support other real time stuff, for instance zooma calls: I could display a progress bar, or I could activate or deactivate action buttons relying on completed task or missing terms

Describe the solution you'd like
By refactoring websocket I could support other types of request, and it will be easier to maintain and to add other features. Code could be tested with appropriate methods.

Describe alternatives you've considered
Things could remain the same, and information between frontend and backend could be managed using JavaScript and django views. However, since we have websockets implemented, we could make better use of this resource and add additional features

Additional context
Some useful links to understand the topic better before deciding on a plan:

Updating a biosample record still creates a new biosample record

Describe the bug
After a successful biosample submission (and after retrieval of biosample ids), modifying and submitting data again will result in a new biosample record. The submission of an already submitted record should have the accession attribute inside JSON data. The relationship between a sample and an already submitted animal is not yet clear - should it use the accession attribute? Will InjectTool validation handle those changes? - waiting for USI updated docs

To Reproduce
Steps to reproduce the behavior:

  1. Complete a submission with success (get BioSamples ids and get the completed submission status through image InjectTool)
  2. Update a sample and an animal (modify something in order to see NEED_REVISION status)
  3. Pass the validation step
  4. Provide a token and submit the updated data to biosample
  5. Wait to get the submission complete.
  6. Updated data will have a new biosample id (two biosample pages for the same entity)

Expected behavior
Submitting the same data will update the same biosamples record. In the biosamples page I will have the updated data and a note which tells me that data is updated

Additional context
We are facing issues in providing the accession attribute in biosample data; the USI seems to have a problem managing updated data. Currently waiting for an update from the USI team

Replace data source file in an existing submission

Is your feature request related to a problem? Please describe.
There is a major issue in updating user data. It does not concern updating some attributes in a submission, or updating one or more entities within a submission, but inserting updated data through a new or updated data source file. Cryoweb users, for example, can import their data by providing a full database dump. A user could also import a huge template file without knowing where the new data are. We don't want the same entity to be loaded into biosample with a new biosample id. We also don't want the new submission to overwrite old data, since we could have fixed data by hand to pass validation, or added data not modeled in the original data source. For this reason we can't use the reload submission to replace data within the same submission id

Describe the solution you'd like
Adding new data should not replace the old data (unless you agree to replace it). To achieve this, entity names should be unique in the user namespace, not within a submission as they currently are. By removing the Name model and transferring the name attribute to the Sample and Animal tables, I can modify the constraints so that the same object cannot exist twice for the same user in the database. When adding an object, I could check that it is not present in a different submission. Moreover, relationships between objects will be easier to manage.

Describe alternatives you've considered
We can add new data in templates; however, we have to prevent a user from loading the same Animal or Sample in two different submissions

Additional context
These are the steps that may be required:

  • clean up user data tables (submission, animal, samples, ...) to avoid issues in migrations
  • remove the Name table and move the name attribute to Animal/Sample
  • refactor code reflecting Name removal
  • set constraints as (("name", "breed", "owner"),) and (("name", "animal", "owner"),) for Animal/Sample
  • change uid.helpers.update_or_create_obj to test object presence in a different submission
  • model ValidationResult as a GenericRelation to Animal/Sample
  • fix the name issue when calling Sample and Animal with the same name (ex 1)
  • model a test with a reload dataset with new data (not present before)
  • refactor SplitSubmissionHelper according to the new modifications
  • try to sort BioSamples submission object relying on the new Animal relationship
  • check that all tests work
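
The effect of a uniqueness constraint on ("name", "breed", "owner") can be illustrated outside Django with an in-memory SQLite table. This is only a sketch of the idea: the table layout, the `add_animal` helper and the sample values are assumptions made for illustration, not the InjectTool schema.

```python
import sqlite3


def build_db():
    """Create an in-memory table where (name, breed, owner) must be unique."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE animal ("
        " name TEXT NOT NULL, breed TEXT NOT NULL, owner TEXT NOT NULL,"
        " submission INTEGER NOT NULL,"
        " UNIQUE (name, breed, owner))"
    )
    return conn


def add_animal(conn, name, breed, owner, submission):
    """Return True if inserted, False if this owner already has the animal."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO animal VALUES (?, ?, ?, ?)",
                (name, breed, owner, submission),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With such a constraint the same animal cannot be loaded by one owner in two different submissions, while another owner can still use the same name and breed.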

new data record with old validation results

Describe the bug
Records with unexpected validation status

To Reproduce
Steps to reproduce the behavior:
  1. Create a new submission with the example cryoweb file (3 animals 1 sample)
  2. Do the validation
  3. Reload the data source file with the same example cryoweb file
  4. Edit the data

Expected behavior
All records without validation result (newly loaded data has not been validated)

Task classes need to inherit from a common mixin

Is your feature request related to a problem? Please describe.
Different task classes have a lot of features in common. Ideally they can be refactored in order to improve maintenance and simplify code

Describe the solution you'd like
With a common task mixin, it could be easier to manage tasks and to set the behaviour when things go wrong

Describe alternatives you've considered
Leaving things like this works, but if I need to change something (e.g. send a message when a task fails) I need to update each task class I defined

Additional context
These things should be addressed and are also described in #50

  • define a TaskFailureMixin for all tasks
  • send mail to admin when a task fails
  • more friendly messages when a submission fails
  • truncate message body after some issues
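
A common failure mixin could look roughly like the sketch below. It is written without importing Celery so that it runs standalone; the `on_failure` signature mirrors the handler Celery calls on task classes, while `notify_admins`, `MAX_BODY` and `FakeTask` are hypothetical names introduced here for illustration.

```python
MAX_BODY = 200  # arbitrary truncation limit chosen for this sketch


class TaskFailureMixin:
    """Shared failure handling for every task class that inherits from it."""

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        body = f"Task {task_id} failed: {exc}"
        # Truncate long messages so the notification stays readable.
        if len(body) > MAX_BODY:
            body = body[:MAX_BODY] + "..."
        self.notify_admins(f"Error in task {self.name}", body)

    def notify_admins(self, subject, body):
        # Placeholder: a real project would send an email here.
        raise NotImplementedError


class FakeTask(TaskFailureMixin):
    """Stand-in task that records notifications instead of sending them."""

    name = "fake_task"

    def __init__(self):
        self.sent = []

    def notify_admins(self, subject, body):
        self.sent.append((subject, body))
```

Changing the notification behaviour then requires editing only the mixin, not each task class.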

:bug: task SearchOrphanTask raises error

Describe the bug
Periodic task SearchOrphanTask raises errors when executing: the BioSample url https://www.ebi.ac.uk/biosamples/samples now returns a different page, without the page section

To Reproduce
Steps to reproduce the behavior:

  1. Open the admin interface
  2. Search for periodic task section
  3. Select and execute the search_orphan_biosamples periodic task

Expected behavior
The task needs to be executed without errors. No notification emails need to be sent

Screenshots

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 412, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 704, in __protected_call__
    return self.run(*args, **kwargs)
  File "/var/uwsgi/image/common/tasks.py", line 211, in wrapped_f
    result = f(*args, **kwargs)
  File "/var/uwsgi/image/biosample/tasks/cleanup.py", line 357, in run
    loop.run_until_complete(check_samples())
  File "/usr/local/lib/python3.6/asyncio/base_events.py", line 488, in run_until_complete
    return future.result()
  File "/var/uwsgi/image/biosample/tasks/cleanup.py", line 284, in check_samples
    async for sample in get_biosamples(managed_domains=managed_domains):
  File "/var/uwsgi/image/biosample/tasks/cleanup.py", line 238, in get_biosamples
    totalPages = data['page']['totalPages']
KeyError: 'page'

Additional context

This can be solved as IMAGE-CommonDataPool does, by querying the accession endpoint first and then checking each biosample object.
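
Independently of the chosen fix, the KeyError in the traceback can be avoided by reading the optional section defensively. The helper below is a hypothetical sketch, not the code in biosample/tasks/cleanup.py; it only assumes the response is a dict that may or may not carry a `page` key.

```python
def total_pages(data):
    """Return totalPages when the response has a 'page' section, else None."""
    page = data.get("page") or {}
    return page.get("totalPages")
```

A caller can then treat `None` as "the endpoint no longer paginates this way" and fall back to another strategy instead of failing.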

:bug: got a 50x status page when creating a new submission

Describe the bug
Adding a Submission with the same attributes as a previous one in the same user namespace throws an Internal Server Error. The same is expected to happen when updating a submission with the same attributes as another one. The reason seems to be the unique key defined in the uid.models.Submission table

To Reproduce
Steps to reproduce the behavior:

  1. Get a submission with InjectTool
  2. Create a new submission with the same attributes of the previous one

Expected behavior
It's ok that a user can't create the same object two times in his namespace, however the Internal Server Error must be avoided and the error message should be provided during the form validation, before attempting to create the new submission

Screenshots

Internal Server Error: /submissions/create/

IntegrityError at /submissions/create/
duplicate key value violates unique constraint "uid_submission_gene_bank_name_gene_bank_b784b3b4_uniq"
DETAIL:  Key (gene_bank_name, gene_bank_country_id, datasource_type, datasource_version, owner_id)=(test italy, 8, 0, test 1, 6) already exists.

Additional context

  • test for the same key in submissions.forms.SubmissionFormMixin
  • test form invalid and no database changes when using the same attributes
  • test error when updating a submission or creating a new one
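
The idea of checking the unique key during validation, before any insert is attempted, can be sketched in plain Python. The field names come from the IntegrityError detail above; `validate_new_submission` and the error text are hypothetical, and a real implementation would live in the Django form and query the database.

```python
UNIQUE_FIELDS = (
    "gene_bank_name",
    "gene_bank_country_id",
    "datasource_type",
    "datasource_version",
    "owner_id",
)


def unique_key(submission):
    """Build the tuple that the database constraint compares."""
    return tuple(submission[field] for field in UNIQUE_FIELDS)


def validate_new_submission(candidate, existing):
    """Return a list of form-style error messages; empty means valid."""
    taken = {unique_key(item) for item in existing}
    if unique_key(candidate) in taken:
        return ["A submission with these attributes already exists"]
    return []
```

Reporting the clash as a validation error lets the user correct the form instead of hitting a 50x page.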

validation.tests.test_helpers should be mocked where possible

Is your feature request related to a problem? Please describe.
During unit tests, the image_validation module executes tasks using the real EBI server. This consumes time and is prone to errors when issues occur on the EBI side. However, if we mock these calls we will not be sure that InjectTool acts properly when IMAGE-metadata changes

Describe the solution you'd like
Tests should be mocked. The methods are already tested in the IMAGE-ValidationTool project. Define a strategy for when external resources are actually called (to ensure that things work even with future metadata releases)

Describe alternatives you've considered
We could ignore those tests, or keep things as they are. Or we could use the Python requests_cache module during tests

Additional context
A summary of the required steps:

  • learn how image_validation replies (using their internal objects) and mock up responses (check that validation.helpers methods are called) - it's a difficult task since we have to monkey patch a lot of things and I need to store results in some persistent objects
  • configure things to do a real request (option to not mock an object) - not done: we are supposed to do a real test in Travis CI
  • Try requests_cache module and set an expiration policy

:construction_worker: upgrade CI configuration files

Is your feature request related to a problem? Please describe.
Travis CI is no longer free for Open Source projects: it gives every user a limited amount of credits for builds, and after that no further builds are allowed without buying credits. It is also possible to request credits for Open Source projects; however, this seems tricky and needs to be discussed with Travis CI staff.

Describe the solution you'd like
Update .travis.yml to use the Partner Queue Solution

Describe alternatives you've considered
An alternative could be to move CI to another service, for instance GitHub workflows or Circle CI

Upload the template file xlsx not possible

Dear Sirs,

we have downloaded the template file (xlsx format), but only UTF-8 or ASCII format is accepted for upload.

How can we convert the template file from xlsx format to a text file with UTF-8?
The Template File contains several worksheets.
Can more than one file be uploaded?

Best Robin

Support the latest IMAGE-ValidationTool library

Is your feature request related to a problem? Please describe.
InjectTool does not support the latest IMAGE-ValidationTool version

Describe the solution you'd like
Upgrade the code to support the latest library version. image_validation now returns a ValidationResultRecord for functions like validation.check_usi_structure and validation.check_ruleset. Since all validation objects reply in the same way, the Wrong JSON structure status will no longer appear. ValidationSummary objects and their related functions should change accordingly

Describe alternatives you've considered
We could keep the current version; however, we couldn't add the latest validation features. We could also force the Wrong JSON structure status for data which fails check_usi_structure

Additional context

  • Upgrade image_validation package
  • Fix/Upgrade code
  • Check tests and Validation behaviour
  • remove Wrong JSON structure and its stuff or force this status again

Sending notifications in real-time

Is your feature request related to a problem? Please describe.
Send notifications in real-time

Describe the solution you'd like
Send messages using django-channels

Describe alternatives you've considered
Automatic page reload

  • Send notification messages using django-channels

  • Add documentation about django-channels using sphinx

Catch exceptions when uploading text file with template path

Describe the bug
If you upload a CRBanim file with the template path, an XLRDError exception is raised

To Reproduce
Steps to reproduce the behavior:

  1. Create a new submission
  2. Select the Template data source type
  3. Upload a CRBanim-like text file

Expected behavior
A warning telling a user that the file format is different from the expected and/or a generic ERROR status for the submission

Additional context

  • Simulate an error from XLRD
  • Simulate a generic error when uploading data

"same as" relationship support

Is your feature request related to a problem? Please describe.
It would be nice to support the same as relationship for records which already have a BioSamples record

Describe the solution you'd like
The same as relationship should be modeled in the database and supported during the biosample submission. CRBanim records with a BioSamples id should be tracked in the database and modeled with the same as relationship to their original biosample record

Describe alternatives you've considered
same as records could be ignored (as now) or inserted as new duplicated objects

Additional context

  • model the same as attribute into database
  • implement the same as attribute in biosample generation
  • edit Animal/Sample views to show the same as attribute
  • change CRBanim data load to handle same as data

Update Biosamples data generation

Is your feature request related to a problem? Please describe.
Data submitted to biosample should be updated

Describe the solution you'd like
Biosamples data should have the InjectTool submission id and the same as attribute implemented, and the breed ontology should point to the species ontology if no mapping ontology is found

Describe alternatives you've considered
Those updates are required to better manage data submission and support

Additional context

  • add a key for injecttool submission id in biosample data
  • model the same as attribute into database - moved into #70
  • implement the same as attribute in biosample generation - moved into #70
  • edit Animal/Sample views to show the same as attribute - moved into #70
  • change CRBanim data load to handle same as data - moved into #70
  • breed ontology should point to base ontology if no mapped term is found
  • model the general breed relation into the database
  • test the new biosample data with ValidationTool - ignored for the moment

Inject tool validation summary page

Is your feature request related to a problem? Please describe.
Validation summary (could combine with submission summary)

Describe the solution you'd like

  • How many unknown, pass, warning, and error
  • If possible, what is the most frequent error, which may lead to batch correction, e.g. add decimal degrees to the coordinate as units
  • Validation summary is static, could be calculated after validation and will remain the same until data changes
  • Should be modeled as a validation.models class
  • Should use the IMAGE-ValidationTool image_validation.ValidationResult.ValidationResultColumn to get information about the attribute that fails most often
  • Should have a dedicated page (not a record in SubmissionDetailView) to represent data
  • Ideally we need to link this page to the edit pages, or for the most common errors to bulk update, as described in #16

Describe alternatives you've considered
It could start with a descriptive record. Other functionality could be added over time. We could decide on a minimal set of features of this issue for the first milestone

Additional context
Steps required:

  • Refactor validation.helpers.ValidationSummary using appropriate method
  • Define a validation status View
  • Define models for validation reports
  • prepare report after validation, as a part of validation (validation is not finished before report)
  • test code for pages and methods defined
  • Verify validation summary within reload submission condition
  • Update all-count when validating submission
  • Small refactoring to improve code readability
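
The counts described above (how many unknown, pass, warning and error records, plus the most frequent message) can be computed with `collections.Counter`. This is a sketch over a made-up input shape, a list of `(status, messages)` pairs; it does not use the ValidationSummary model or the image_validation objects, and the messages in any example data are invented.

```python
from collections import Counter


def summarize(results):
    """Count records per status and find the most frequent message.

    `results` is an iterable of (status, messages) pairs, an input
    shape assumed for this sketch.
    """
    results = list(results)
    status_counts = Counter(status for status, _ in results)
    message_counts = Counter(
        message for _, messages in results for message in messages
    )
    top = message_counts.most_common(1)
    return dict(status_counts), (top[0] if top else None)
```

Because the summary only depends on stored validation results, it can be computed once after validation and reused until the data changes.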

Structure ZOOMA calls into InjectTool

Is your feature request related to a problem? Please describe.
Zooma tasks are performed to fill up dictionary tables. They are defined in InjectTool in the zooma django application, in both helpers and tasks modules; however, they have to be called manually using django management commands after a dictionary term is added to the database, for example:

docker-compose run --rm uwsgi python manage.py annotate_countries

Describe the solution you'd like
Zooma tasks need to be called after data import, for example using celery chains and considered as a part of data-loading step.

Describe alternatives you've considered
Zooma tasks could also be called by celery periodic tasks on a regular schedule and, ideally, manually by the user. Where the user can call those tasks (on which page) needs to be defined

Additional context
Calls to the zooma API are maintained in the IMAGE-ValidationTool package; maybe the zooma.helper module could be updated in order to import such code. Moreover, solutions like mapping to a default ontology are already defined in IMAGE-ValidationTool. Ideally we have to:

  • fix issues with celery-flower - pinning kombu version
  • Update IMAGE-ValidationTool to the last version
  • Model annotate_developmental_stage and annotate_physiological_stage
  • Update zooma.helpers to use image_validation module
  • Call zooma tasks after data load
  • Define periodic task in order to call zooma regularly
  • zooma tasks should be mutually exclusive
  • Designing views where a user can call zooma tasks.
  • Designing views where a user can set manually an ontology term (should instead have a django group able to admin only those tables)?
  • Send notification to admin when a new term is added or is missing - postponed: using admin to manage term
  • Solve the task exclusive bug
  • Test all this stuff

Upgrade to the latest django LTS version (2.2)

Is your feature request related to a problem? Please describe.
Django 1.11 will reach its end of life in April 2020. Before this date it will be useful to upgrade django to the latest version in order to facilitate future maintenance

Describe the solution you'd like
Change the django version in requirements.txt and fix all tests that no longer work. URL refactoring could be addressed in order to exploit the latest django features

Describe alternatives you've considered
This task should be addressed on the final stages of InjectTool development

Additional context
Those are the steps that need to be addressed:

  • Change django version in requirements.txt and set it to the current LTS version (2.2)
  • Apply migrations where possible
  • Fix things after migrations (will 3rd party modules work with the new django version?)
  • Upgrade urls.py syntax according to django-2.2
  • Test things as usual
  • General test of InjectTool (functional testing?)

Export BioSamples ids after Submission

Is your feature request related to a problem? Please describe.
It would be useful to download BioSamples ids after a successful submission. People should be able to export all BioSample id data to a CSV; maybe this could be derived from the edit data page

Describe the solution you'd like
Download the Edit data page content as a CSV

Describe alternatives you've considered
Data could be derived from the CDP, for instance by querying for the proper dataset and using the API to get one's own ids, but retrieving them from InjectTool would be easier

Additional context

  • collect data for the submission process
  • generate a CSV file for download
  • test the export process itself
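
Generating the CSV itself needs only the standard `csv` module. The sketch below assumes the collected data is a list of `(name, biosample_id)` pairs; the column names and the `export_biosample_ids` function are hypothetical, and a Django view would write to the HTTP response instead of a string buffer.

```python
import csv
import io


def export_biosample_ids(records):
    """Serialize (name, biosample_id) pairs to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "biosample_id"])
    for name, biosample_id in records:
        writer.writerow([name, biosample_id])
    return buffer.getvalue()
```

Keeping the serialization in a plain function makes the export process easy to test without a running web server.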

:sparkles: search and track orphaned BioSamples IDs

Is your feature request related to a problem? Please describe.
During previous submissions, we have seen issues like record duplication for some submitted Animal/Samples. Moreover, there should be the possibility to remove a submitted biosample id. Regarding the duplication issue, such records can't be deleted using InjectTool, since they were never submitted from InjectTool. The main intent of this issue has now become notifying the admin about data not managed by InjectTool. Since it is not possible to remove a record from BioSample, the release date should be postponed by many years

Describe the solution you'd like
A new background task should monitor for orphaned BioSamples records and notify the admin user that there are issues in submitted data. Then a command will postpone the publication date for orphan records. This command will not be executed by InjectTool: BioSamples removal is not described in the project aims, so it needs to be requested explicitly by the user.

Describe alternatives you've considered
A user could load their data and then remove records from InjectTool while preserving them in BioSamples: in such cases orphaned records need to be ignored

Additional context

  • define a task to monitor orphaned records
  • filter records according to domains (records that I could change)
  • define a table in which track orphaned records
  • define a table in which track records to ignore
  • notify admin about issues in IMAGE data
  • test how the release date trick affects the returned results
  • define a submission task for orphaned samples
  • track orphaned submission status with pyUSIrest
  • write a management command to deal with this stuff

country shouldn't be added when importing data

Is your feature request related to a problem? Please describe.
Uploading data lets people add countries during breed creation. This could lead to inserting misspelled countries or terms that can't be annotated using the chosen ontologies.

Describe the solution you'd like
No new countries can be generated while importing data

Describe alternatives you've considered
Things could remain as they are; however, it could be difficult to manage ontologies and data manually

Additional context
This aspect is described also in #50. Ideally those things need to be addressed

  • prevent the creation of new countries when creating breeds
  • assert that countries are in database before data load (like check_species)

Move InjectTool URLs outside /image/ location

Is your feature request related to a problem? Please describe.
InjectTool should be accessible without a location, by providing only a domain name

Describe the solution you'd like
The /image/ location should be removed from all its occurrences, from NGINX and from Django configuration files

Describe alternatives you've considered
Things could remain like this, however removing the location will be easier from the user point of view

Additional context

  • remove /image/ location from django settings
  • remove /image/ location from NGINX
  • update docs
  • update ansible role

Fix configuration issue between NGINX and InjectTool

Describe the bug
After installing InjectTool on the production environment with SSL, we have issues using websockets, affecting the real time status update: within an https page I can't make a request using ws, I have to use the wss protocol

To Reproduce
Go to a submission detail page (for example, this) then call a task like validation and inspect the javascript console log

Expected behavior
URLs need to work as expected with and without SSL. No console errors should be displayed

Additional context
Here are the steps required:

  • Fix javascript in submission_detail.html template in order to determine the correct protocol relying on user request
  • Fix ansible NGINX recipe in order to serve such request
  • Consider if docker nginx configuration could be simplified according to the new needs
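
The protocol rule behind the first step is small: a page served over https must open its socket with wss, otherwise ws. The actual fix belongs in the template's JavaScript; the Python function below only illustrates the same decision, and `websocket_url` with its parameters is a hypothetical helper.

```python
def websocket_url(page_scheme, host, path):
    """Pick wss for pages served over https, ws otherwise."""
    scheme = "wss" if page_scheme == "https" else "ws"
    return f"{scheme}://{host}{path}"
```

Deriving the scheme from the current request keeps the same template working in both SSL and non-SSL deployments.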

:bug: InjectTool cannot submit data into EBI BioSamples

Describe the bug
BioSample API submission system was changed and InjectTool cannot submit a sample into BioSamples

To Reproduce
Steps to reproduce the behavior:

  1. Load new data
  2. Validate new data (#119 needs to be solved)
  3. Submit data into BioSamples

Expected behavior
Valid data need to be submitted to BioSamples

Errors

  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 706, in urlopen
    chunked=chunked,
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 421, in connect
    tls_in_tls=tls_in_tls,
  File "/usr/local/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 450, in ssl_wrap_socket
    sock, context, tls_in_tls, server_hostname=server_hostname
  File "/usr/local/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/local/lib/python3.6/ssl.py", line 407, in wrap_socket
    _context=self, _session=session)
  File "/usr/local/lib/python3.6/ssl.py", line 817, in __init__
    self.do_handshake()
  File "/usr/local/lib/python3.6/ssl.py", line 1077, in do_handshake
    self._sslobj.do_handshake()
  File "/usr/local/lib/python3.6/ssl.py", line 689, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 756, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/usr/local/lib/python3.6/site-packages/urllib3/util/retry.py", line 574, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='submission-test.ebi.ac.uk', port=443): Max retries exceeded with url: /api/ (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)'),))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/uwsgi/image/biosample/tasks/submission.py", line 309, in run
    submission_helper.read_token()
  File "/var/uwsgi/image/biosample/tasks/submission.py", line 141, in read_token
    self.root = pyUSIrest.usi.Root(auth=self.auth)
  File "/usr/local/lib/python3.6/site-packages/pyUSIrest/usi.py", line 46, in __init__
    self.get(self.api_root)
  File "/usr/local/lib/python3.6/site-packages/pyUSIrest/client.py", line 329, in get
    response = super().get(url)
  File "/usr/local/lib/python3.6/site-packages/pyUSIrest/client.py", line 163, in get
    response = self.session.get(url, headers=headers, params=params)
  File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 555, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 514, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='submission-test.ebi.ac.uk', port=443): Max retries exceeded with url: /api/ (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)'),))

Additional context
Submission API endpoints have changed and pyUSIrest, a dependency required for data submission, needs to be upgraded. Submission needs to be tested using the testing and production BioSamples endpoints

Add CI test

Is your feature request related to a problem? Please describe.
It would be helpful to have a CI system to run tests and measure things like code coverage. It would also be great to build the Sphinx documentation when code is pushed.

Describe the solution you'd like
Travis CI needs to be set up for the InjectTool system.

Describe alternatives you've considered
People could run the tests themselves, and I could check the tests before accepting a PR during my review. However, having automated tests will be helpful.

Additional context

  • set up Travis CI
  • test for code coverage
  • adding badges
  • add scrutinizer configuration for coverage
  • think about readthedocs

Inject tool functionality to batch update records and batch remove records prior to a submission

Is your feature request related to a problem? Please describe.
Add the ability to batch update records on upload of another file. This will need appropriate warnings: it should display the information that has changed and ask whether the user wants to overwrite it, should add any new samples, and will require revalidation. It must retain any already assigned BioSamples IDs.

Describe the solution you'd like
With each file submission, the user should be able to batch remove private samples from the submission, so that they can decide that certain samples from a dump file should not form part of the submission. These need to be deleted from the InjectTool databases as we don't have permission to hold them. Perhaps we should ask the user whether to store the rejected IDs so that they can be excluded from future updates; otherwise the submitter will have to remove them every time they upload a new file.

Things to be addressed:

  • After batch remove for samples, call reset_all_count even for ValidationSummary(type='sample')
  • After batch removal, the remaining validation results are still valid. Summary could be updated directly relying on database - not so important, could be a new feature
  • Prevent BatchUpdate and BatchDelete relying on submission status (ex. WAITING)
  • Deal with DictionaryTable update - hard to address, will require a new PR
  • When updating a series of fields, being able to update other fields with the old validation (don't require another validation to update other errors) - related to derive summaries from database - will require a new PR

:recycle: PasswordResetView doesn't return errors if the supplied email doesn't exist

Is your feature request related to a problem? Please describe.
While resetting the password, the system doesn't return any error message if the supplied email doesn't exist. This is described in the PasswordResetView class, and the reason for this behaviour is to prevent leaking information to potential attackers.

Describe the solution you'd like
The system could provide an error message in this case. The suggested way to accomplish this is to subclass the PasswordResetForm class and change the form_class attribute of PasswordResetView to refer to the modified class. PasswordResetForm has a get_users method that could be modified to change the default behaviour, while this stackoverflow question describes a solution using the form validator method. However, as suggested here, this could introduce a leak that could be exploited to infer users' e-mail addresses. The risk could be reduced by replying with 429 Too Many Requests using django-ratelimit, but this requires that client IP addresses are forwarded correctly, since the Django application is behind a proxy. A method to infer the client IP address is described here, while other security considerations regarding this module are described here.

Describe alternatives you've considered
The system could remain as it is, and a user could contact the administrator to get information about their credentials.

Additional context
Here are the steps that need to be addressed

  • Change the PasswordResetForm class in order to raise an error in the validator method
  • Set django.contrib.auth.views.PasswordResetView to use the new form through its form_class attribute
  • ensure that client IP addresses are forwarded correctly between reverse proxies
  • get the client ip address, as described here
  • install django-ratelimit dependency and set rate limit to PasswordResetView using a decorator or a class mixin
  • test the rate-limited url
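
The "get the client ip address" step could be sketched as follows. This is only a minimal sketch, assuming the reverse proxies append to the X-Forwarded-For header and that the request metadata looks like Django's request.META mapping; get_client_ip and its trusted_proxies parameter are hypothetical names, not part of InjectTool or django-ratelimit.

```python
from typing import Mapping, Optional


def get_client_ip(meta: Mapping[str, str], trusted_proxies: int = 1) -> Optional[str]:
    """Return the client address from a Django-style ``request.META`` mapping.

    ``trusted_proxies`` (hypothetical parameter) is the number of reverse
    proxies in front of the application that append to X-Forwarded-For.
    """
    forwarded = meta.get("HTTP_X_FORWARDED_FOR", "")
    addresses = [part.strip() for part in forwarded.split(",") if part.strip()]

    if len(addresses) >= trusted_proxies > 0:
        # each trusted proxy appends the address it received the request
        # from, so the client is the N-th entry counting from the right;
        # anything further left may have been forged by the client itself
        return addresses[-trusted_proxies]

    # no (or too short) forwarded header: fall back to the socket address
    return meta.get("REMOTE_ADDR")
```

Counting from the right matters here: taking the leftmost address would let a client choose its own rate-limit key simply by sending a forged X-Forwarded-For header.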

Batch submission to Biosamples

Is your feature request related to a problem? Please describe.
Submitting a lot of data to BioSamples within a single submission using pyUSIrest does not perform well: first, all data are stored in memory as described in cnr-ibba/pyUSIrest#3, and second, a big submission could cause overhead on the USI servers.

Describe the solution you'd like
Submission to BioSamples should occur in batches, letting InjectTool deal with all the details. From the user's point of view, there will be no significant changes in the UI. Batch submission can also be parallelized using celery chords, which could let users submit all their data within the same token validity interval. submissions.tasks could be improved as suggested by scrutinizer.

Describe alternatives you've considered
The parallel submission could be ignored for the moment, since we are able to submit about 2 samples/sec, and it could be acceptable to receive a mail asking to generate a new token in order to complete the submission of a few thousand samples.

Additional context
Ideally, these are the things that need to be addressed by this issue:

  • submission.tasks refactoring as suggested by scrutinizer
  • remove the biosample_submission_id attribute of image_app.models.Submission class. USI submission will be modeled in biosample.models
  • model new models in biosample.models (track here all USI submission? how I can patch a sample without searching USI data?)
  • Write a SubmissionBatch class to deal with a single submission batch (could be called in chords?)
  • Write a SubmissionTask to deal with an entire submission using SubmissionBatch
  • model batch size as a parameter - how I could divide objects? could related objects belong to distinct batches?
  • refactor FetchStatus task accordingly with new modules
  • model biosample.management.commands tests
  • mock and test things as usual
  • delete old-modules
  • refactor code and place things in their appropriate module
  • test a biosample submission with a small batch size
  • model relationship between animals
  • generate fake submission data using Faker
  • test a big submission using batches
  • check documentation
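
Modelling the batch size as a parameter could look like the sketch below. It is only an illustration of the splitting step, assuming samples can be consumed from any iterable; iter_batches is a hypothetical helper, and it does not address the open question of related objects ending up in distinct batches.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most ``batch_size`` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")

    iterator = iter(items)

    while True:
        # islice consumes the shared iterator, so only one batch
        # at a time is held in memory
        batch = list(islice(iterator, batch_size))

        if not batch:
            return

        yield batch
```

Because the helper is a generator, it also works with lazily evaluated sources, so the whole submission never needs to be loaded at once.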

Complete project documentation

Is your feature request related to a problem? Please describe.
The project is not completely documented, either from the developer's or from the final user's point of view. The policy document needs to be updated with the relevant information. The instructions on how to complete a submission are not yet finished. Descriptions such as the submission status and the submission detail page are in the generic about document and maybe should be placed elsewhere. The Sphinx documentation is not yet complete.

Describe the solution you'd like
Documentation needs to be completed (and revised). Maybe it needs to be structured by topic (something like a doc describing what the different sections do, a doc describing the import step, a doc describing the aim, a doc with troubleshooting, ...) with links for browsing between sections. The Sphinx developer documentation should be linked somewhere on the site or in the GitHub README.

Describe alternatives you've considered
Sphinx documentation could be considered internal and not addressed to the user, so writing it can be postponed.

Additional context
Here are points that should be addressed:

  • Revise and agree on privacy policy document
  • Describe the full submission steps
  • Describe the main pages (views) of the site. Try to link each page to the appropriate help page
  • Revise the whole documentation in browsable sections - side bar menu?
  • Think about a FAQ page
  • Complete the sphinx documentation
  • An illustration picture for various docker images and how they interact with each other in the about page
  • Similar to the status change, a diagram demonstrating the workflow of statuses would be useful
  • Submission statuses and Submission details page documentation needs to be reviewed again as tool itself has not been finalized
  • "Celery and Redis perform long tasks": the word "long" is confusing. The tasks are not time-consuming, since monitoring statuses is quick but frequent. Probably "background" or "monitoring" would be better?
  • Validation, submission tasks are done by celery, review the text
  • "Your registration is almost done" page (could not find the source code): Biosamples should be BioSamples; the contact EBI link needs to specify the test AAP server and later change to the production AAP server
  • Some screenshots may need to be redone. Remove tabs from browser, show only the page content
  • Update the dashboard page description
  • document ontology annotations in sphinx
  • describe synonyms management in sphinx

Check mapping between UID column names and metadata rules

Is your feature request related to a problem? Please describe.
The IMAGE_sample_ruleset_version_1.5_working file was annotated with some comments, and I need to check whether they are up to date.

Describe the solution you'd like
Check IMAGE-metadata ruleset columns with UID and Biosample attributes conversion

Describe alternatives you've considered
Maybe the UID conversion is up to date; in that case I need to fix my column in the xls file.

Additional context
Task list:

  • check cryoweb columns, especially annotated columns in IMAGE_sample_ruleset_version_1.5_working
  • check UID to biosample column conversion
  • fix an issue when sending message to websocket without validation data
  • sample.storage and sample.storage_processing should be modeled as Enum according to metadata rules
  • add help text for sample.availability
  • Minor refactoring

:recycle: The "Add an existing AAP profile" page is not clear

Is your feature request related to a problem? Please describe.
It is not clearly stated that the "Add an existing AAP profile" page, reached from accounts.views.ActivationComplete, requires adding an AAP profile already defined within the image manager account.

Describe the solution you'd like
Having a custom AAP profile could be useful if you want to use the same user to manage biosample submissions outside InjectTool. Ideally InjectTool needs to ask for the user credentials in order to verify that the user exists in the BioSamples environment and to get the user_id. Then InjectTool creates a new team using the manager user and adds the user_id to the new team (in other words, it skips the user creation step and adds the custom user to a new InjectTool-managed team).

Describe alternatives you've considered
The add AAP profile page could be removed, and accounts.views.ActivationComplete could redirect directly to the new BioSamples user generation. However, an advanced user could want a single account to manage their data within and outside InjectTool.

Additional context

  • fix the biosample.views.RegisterUserView template title (is not a new account)
  • generate a token from user data (AAP username and password)
  • get user reference as described here or in pyUSIrest docs
  • create a new Team (like registering a new user does)
  • add the user reference to team
  • revise documentation accordingly
  • test stuff with unittests
  • test stuff in BioSamples test environment
  • what happens if I remove a submission from my data before injecttool does all stuff?

:bug: `biosample.views.CreateUserView` doesn't catch all errors from API

Describe the bug
biosample.views.CreateUserView doesn't catch all exceptions from the external API site, for example when there are issues with team creation.

To Reproduce
Create a user and simulate an issue when creating a team like this:

{'timestamp': 1569592802046,
 'status': 500,
 'error': 'Internal Server Error',
 'exception': 'org.springframework.web.client.ResourceAccessException',
 'message': 'I/O error on POST request for "https://explore.api.aai.ebi.ac.uk/domains/": No content to map due to end-of-input\n at [Source: ; line: 1, column: 0]; nested exception is com.fasterxml.jackson.databind.JsonMappingException: No content to map due to end-of-input\n at [Source: ; line: 1, column: 0]',
 'path': '/api/user/teams'}

Expected behavior
With an error like this, I need to save the user information and send a message asking to contact the admins.

Additional context

  • ensure that AAP registration process is still the same
  • solve pyUSIrest related issues, for example cnr-ibba/pyUSIrest#4
  • model an issue in biosample.tests at the create team step
  • catch error and notify users and admin
  • update image.settings with more ADMIN addresses

Complete status update doesn't update submission detail buttons

Describe the bug
Buttons don't change their disabled class when the submission goes into COMPLETED status.

To Reproduce
Start InjectTool locally, and display a submission detail page. Then open a python terminal with:

$ docker-compose run --rm uwsgi python manage.py shell

and simulate submission status changes:

import asyncio
from common.helpers import send_message_to_websocket

from common.constants import WAITING, COMPLETED, SUBMITTED, STATUSES
from image_app.models import Submission

# get a submission object. Pk is the last part of the detail URL
submission_obj = Submission.objects.get(pk=1)

# define a fake method to simulate status update and message send
def fake_update(status):
    # submission_obj is taken outside this function
    submission_obj.status = status
    submission_obj.save()

    # send a message
    asyncio.get_event_loop().run_until_complete(
        send_message_to_websocket(
            {
                'message': STATUSES.get_value_display(status),
                'notification_message': submission_obj.message
            },
            submission_obj.id
        )
    )

# simulating status updates. This will set buttons correctly
fake_update(WAITING)

# simulating submitted steps. Buttons are ok
fake_update(SUBMITTED)

# simulating Completed phase. Buttons are equal to Waiting and Submitted phase
fake_update(COMPLETED)

# reloading page load buttons correctly

Expected behavior
Buttons should be in the same state as after reloading the page.

:arrow_up: upgrade dependencies and fix security issues

Is your feature request related to a problem? Please describe.
Some dependencies need to be updated to resolve security issues

Describe the solution you'd like
Packages need to be upgraded to the recommended versions.

Describe alternatives you've considered
Updates could be ignored for certain packages

Additional context
Remember to test changes before approving PR

Error from USI API should be modeled as ValidationResult

Is your feature request related to a problem? Please describe.
An error which occurs in biosample submission is correctly reported by email, but such information cannot be retrieved from the Edit submission page.

Describe the solution you'd like
This type of error should have a precise category in validation.models, in order to give useful information to the user about a precise record. However, this is not a validation issue in the sense of a problem in user data, but a problem in InjectTool itself. Currently I could send a JSON dictionary for each failed sample, which could be annoying when thousands of errors occur. This type of error should be avoided: we implemented the validation steps to avoid certain types of errors. Errors of this type should set the submission to ERROR status.

Describe alternatives you've considered
We could model USI errors in other tables, however by using the validation.models tables we could exploit the validation tags and messages.

Additional context
These are the steps that need to be done:

  • simplify email reports to display only sample ids or names
  • model validation.models.ValidationResult.message for USI errors like image_validation does
  • define a new badge for USI error (or choose the danger one)
  • ensure USI messages are written in validation.models.ValidationResult.messages
  • set submission.status = ERROR
  • tests for USI error messages
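
Simplifying the email reports could start from grouping sample names by error message, so that each message is reported once. The sketch below only illustrates that idea: the input format (sample name mapped to a list of messages) is an assumption, and summarize_usi_errors is a hypothetical helper that does not reflect how pyUSIrest or validation.models actually store errors.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence


def summarize_usi_errors(errors: Mapping[str, Sequence[str]]) -> Dict[str, List[str]]:
    """Group sample names by error message.

    ``errors`` maps a sample name to the messages returned for it
    (assumed structure, for illustration only).
    """
    grouped = defaultdict(list)

    for sample_name, messages in errors.items():
        for message in messages:
            grouped[message].append(sample_name)

    # sort names to get a stable, readable report
    return {message: sorted(names) for message, names in grouped.items()}
```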

django-tables and django filters integration

Is your feature request related to a problem? Please describe.
Django tables and filters could be used to sort items in tables or to filter records of particular interest (like the data with validation errors or the terms belonging to a particular language). They could also be useful for selecting items.

Describe the solution you'd like
These two Django modules should be adopted and integrated in views, like the translation tables and the submission edit view.

Describe alternatives you've considered
We could manipulate page views using custom querysets managed by get methods; however, this is already implemented in the Django modules.

Additional context
I have already implemented a fake project using these modules here

:arrow_up: upgrade pyUSIrest dependency

Is your feature request related to a problem? Please describe.
The pyUSIrest module should be upgraded to the latest version.

Describe the solution you'd like
pyUSIrest now implements iterators when browsing objects, so it should perform better and could use less memory inside the InjectTool environment. Moreover, exceptions were customized (no more need to catch low-level exceptions) and some issues were solved. See cnr-ibba/pyUSIrest#3 for more information.

Describe alternatives you've considered
InjectTool could remain as it is; however, new pyUSIrest features could facilitate development.

Additional context

  • Upgrade library version
  • change import path for USI objects
  • refer to proper exceptions
  • remove older code from InjectTool
  • describe the BioSample submission system in Sphinx

Improve code structure as suggested by scrutinizer

Is your feature request related to a problem? Please describe.
Code could be improved by relying on scrutinizer, in order to better describe things and make developing or fixing conflicts easier.

Describe the solution you'd like
Code needs to be refactored, at least for the methods rated F.

Describe alternatives you've considered
This is not critical for the InjectTool release; it could be addressed last or when adding code to solve other issues.

Additional context
Refer to the scrutinizer report to get an idea of what to do.

Submit VCF files

My group is part of the IMAGE project and we have to publish VCF files either on IMAGE or on EVA.
In the IMAGE help I found that the IMAGE Injection tool also uses EVA: https://www.image2020genebank.eu/help
So, is it possible to submit VCF files via the IMAGE injection tool?
We only found three methods in order to submit data to IMAGE.

We need the feature to upload/submit VCF files via the IMAGE injection tool.

Otherwise we would submit our VCF files to EVA directly.

Switch from biosamples test environment to production

Is your feature request related to a problem? Please describe.
This feature is necessary for the final release. Everything related to the BioSamples test environment should be replaced to point to the final production environment.

Describe the solution you'd like

  • add info in the AAP registration page, clearly telling users to keep track of the password and explaining how they can recover it
  • remove the internal user id from the generate-token form
  • pyUSIrest library should use the production environment - will require a dedicated library update
  • pyUSIrest urls need to be overridden by InjectTool configuration urls
  • the context validation part should consider the production environment
  • the injecttool test banner in all pages should be removed
  • register the manager user to production authentication site
  • ideally, those changes should be defined in a config file, in order to keep the testing and development environments different from production.
  • update ansible role
  • add documentation for biosample variables

Describe alternatives you've considered
This aspect is a required feature of InjectTool. There could be alternatives in how we manage this transition; however, the transition has to be implemented.

Additional context
This is required for milestone 3. Ansible roles could be updated to address this.

Better model relationship objects and validation

Is your feature request related to a problem? Please describe.
The last image_validation update can validate relationships between objects. Moreover, image_app.models.Animal can't deal with parents (which are relationships).

Describe the solution you'd like
Model relationships between animals and try to check them in validation.

Describe alternatives you've considered
I could ignore this, but I plan to model animal relationships, so it is time to face it together with its validation. Since data are submitted to the test server, I need to deal with temporary urls.

Additional context

  • model relationship between animals in image_app.models.Animal.to_biosample()
  • set temporary urls for context validation (remember to change them in production)
  • update validate.helpers.MetaDataValidation to check relationship
  • setup tests (with biosample mock) and keep coverage nearly the same
  • call check ruleset during MetaDataValidation setUp
  • Define a custom exception for check_ruleset failures. Send a mail to the user (or admins?)
  • catch exceptions from image_validation context validation
  • try to do a real submission to biosample

:bug: InjectTool cannot validate data using OLS

Describe the bug
The EBI Ontology Lookup Service has changed and it is no longer possible to validate data. In particular, some ontologies in EFO were changed and can't be validated with InjectTool. In addition, the current OLS will be moved to OLS4 in October 23.

To Reproduce
Steps to reproduce the behavior:

  1. Load data into InjectTool
  2. Start a validation task

Expected behavior
Validation needs to pass for data compliant with IMAGE-metadata.

Screenshots
Screenshot_20230608_135251

Additional context
IMAGE-metadata and IMAGE-ValidationTool need to be updated in order to support the new OLS services

Managing batch updates through FormSet

Is your feature request related to a problem? Please describe.
This should resolve issues related to batch update, for example fields derived from Foreign Keys, as described in #16

Describe the solution you'd like
Using formsets, users should be able to edit multiple form objects at the same time. Ideally we could address issues related to foreign keys (multiple select objects) that the current implementation can't address. Moreover, we could change more than one value at a time (consider solving location issues together with accuracies). Supporting pagination could solve performance issues. It would be great to select objects to update without reference to errors of the same type.

Describe alternatives you've considered
The current implementation is able to edit the same text based field for objects with same issues. Issues that cannot be solved with batch update should be solved one-by-one

Additional context
Some links describing formsets:

Celery worker dies after scaling down

Describe the bug
The Celery worker dies after scaling down, without raising any exceptions. It simply scales down to 0 workers.

To Reproduce
It doesn't seem possible to predict when and why it dies: sometimes it happens after months, sometimes after days.

Expected behavior
The Celery worker needs to be up at all times.

Additional context
Try to update celery packages or fix the number of celery workers (no more autoscaling)

Issues in template loading are difficult to understand

Is your feature request related to a problem? Please describe.
Errors in template loading are difficult to understand. For example:

  • errors are not clear when a template column is missing (should they be validated when uploading the file, like CRBanim?)
  • specie in breed and animal sheet should be the same
  • Buttons for uploading errors are not updated (page needs to be reloaded) - is the exception raised before sending websocket messages?
  • Using (specie) synonyms for uploading data is not supported and not clear
  • Integer names should be recorded as integer

Describe the solution you'd like
Handle those errors, if they aren't clear, by catching exceptions and describing them better. Try to recover data using synonyms. Try to understand whether the same behaviour could be applied to other template paths. Fix other error types.

Describe alternatives you've considered
Validating the template using InjectTool seems to be the most efficient way to upload data. Users should fix the file and reload it until data are loaded without errors.

Additional context

  • Django needs to be upgraded to the latest patch
  • define a is_valid method like CRBanim and apply it during form upload
  • check columns in is_valid method called when uploading file
  • ensure species in both breed and animal sheets
  • ensure messages through websocket called when raising import errors
  • ensure buttons updated when errors are found
  • removing spaces from cells
  • converting float to integer if is_integer is True
  • Uploading data using specie synonyms
  • adding synonyms if no specie found
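
The "removing spaces from cells" and "converting float to integer" steps could be sketched as a single normalization helper. This is a minimal sketch, assuming cell values arrive as plain Python objects from the spreadsheet reader; clean_cell is a hypothetical name, not an existing InjectTool function.

```python
def clean_cell(value):
    """Normalize a single spreadsheet cell value."""
    if isinstance(value, str):
        # remove leading and trailing spaces from text cells
        return value.strip()

    if isinstance(value, float) and value.is_integer():
        # spreadsheet readers often return 12.0 for a cell containing 12
        return int(value)

    # leave everything else (None, dates, real floats) untouched
    return value
```

Applied to every cell while reading a sheet, this keeps a name typed as 12 from being stored as "12.0".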

add function to import from Template file

Is your feature request related to a problem? Please describe.
InjectTool lacks the import Excel Template feature.

Describe the solution you'd like
Load the InjectTool Excel template file when starting a new submission.

Describe alternatives you've considered
Template loading is an InjectTool requirement, and this type of data is difficult to convert into other input formats like the cryoweb or crbanim template files.

Additional context
These are the points that need to be addressed to upload the Excel file:

  • fix last UID columns Sample.physiological_stage, Sample.preparation_interval
  • decide how to map columns into UID
  • deal with provided ontologies (don't trust and discard)
  • start a new template application
  • parse a text like "3 years" into a time_unit field
  • write down helpers module to define relations between objects and fill UID
  • write tasks method to upload data asynchronously
  • model language relation like cryoweb and crbanim applications
  • test stuff works and coverage remain nearly the same
  • remove check from create submission form and support template loading as a valid option
  • refactor common create stuff as image_app.helpers functions
  • update docs
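
Parsing a text like "3 years" could be sketched with a regular expression. This is a minimal sketch under stated assumptions: the accepted units in TIME_UNITS and the parse_interval helper are hypothetical, and the real list of units should come from the metadata rules.

```python
import re
from typing import Optional, Tuple

# hypothetical set of accepted units, for illustration only
TIME_UNITS = ("minutes", "hours", "days", "weeks", "months", "years")

# an integer value followed by a single word, e.g. "3 years"
_PATTERN = re.compile(r"^\s*(\d+)\s*([A-Za-z]+)\s*$")


def parse_interval(text: str) -> Optional[Tuple[int, str]]:
    """Split a text like "3 years" into (3, "years"), or return None."""
    match = _PATTERN.match(text)

    if match is None:
        return None

    value, unit = int(match.group(1)), match.group(2).lower()

    # accept singular forms like "1 month"
    if not unit.endswith("s"):
        unit += "s"

    if unit not in TIME_UNITS:
        return None

    return value, unit
```

Returning None instead of raising lets the caller decide whether an unparsable value should become an import error or be left for validation.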
