umccr / cttso-ica-to-pieriandx Goto Github PK

Transfers data from GDS / creates PierianDx case / run and informatics job for each sample.

License: MIT License

Python 74.17% Dockerfile 1.33% Shell 9.13% JavaScript 0.34% TypeScript 15.03%

cttso-ica-to-pieriandx's Introduction

ctTSO ICA to PierianDX

This is a quick intermediary piece of software designed to be one-day obsolete. That day will be a good day.

In the meantime we can take a bulk list of samples in csv format,
find the matching secondary analysis runs on ICA by matching up the library ids,
pull down the data from ICA and transfer over to PierianDx's s3 bucket.

The script then creates a case, sequencing run and informatics job on PierianDx.

Installation:

Option 1: Recommended

No installation required - just run things

Checkout our deploy folder for how we automate this service in AWS.

Option 2: (Installation through conda)

Few hacky bits

Clone this repo

git clone [email protected]:umccr/cttso-ica-to-pieriandx.git

Enter this repo and checkout version

cd cttso-ica-to-pieriandx
git checkout v1.0.0

Create the conda env

conda env create \
  --name 'cttso-ica-to-pieriandx' \
  --file 'cttso-ica-to-pieriandx-conda-env.yaml'

Activate the conda env

conda activate cttso-ica-to-pieriandx

Run setup.py whilst inside conda env

python setup.py install

Copy across references whilst inside conda env to the right location

rsync --archive \
  "references/" \
  "$(find "${CONDA_PREFIX}" -type d -name "references")/"

Command usage

ctTSO ICA to PierianDx

usage: cttso-ica-to-pieriandx.py [-h] [--ica-workflow-run-ids ICA_WORKFLOW_RUN_IDS] [--accession-json ACCESSION_JSON]
                                 [--accession-csv ACCESSION_CSV] [--verbose]

Given an input.json file, pull information from gds, upload to s3 for a single sample,
 create a case, run and informatics job

Given an input csv, pull information from gds, upload to s3 and create a case, run and 
 informatics job is for all samples in the csv.  
 
One may also specify the ica workflow run ids. If these are not specified, the list of workflow runs are searched to find
the workflow run on the bases of the library name.  

The following environment variables are expected:
  * ICA_BASE_URL
  * ICA_ACCESS_TOKEN
  * PIERIANDX_BASE_URL
  * PIERIANDX_INSTITUTION
  * PIERIANDX_AWS_REGION
  * PIERIANDX_AWS_S3_PREFIX
  * PIERIANDX_AWS_ACCESS_KEY_ID
  * PIERIANDX_AWS_SECRET_ACCESS_KEY
  * PIERIANDX_USER_EMAIL
  * PIERIANDX_USER_AUTH_TOKEN

optional arguments:
  -h, --help            show this help message and exit
  --ica-workflow-run-ids ICA_WORKFLOW_RUN_IDS
                        List of ICA workflow run IDs (comma separated), if not specified, script will look through the workflow run list for matching patterns
  --accession-json ACCESSION_JSON
                        Path to accession json containing redcap information for sample list
  --accession-csv ACCESSION_CSV
                        Path to accession csv containing redcap information for sample list
  --verbose             Set log level from info to debug
  
example usage:
./cttso-ica-to-pieriandx.py --accession-csv samples.csv
./cttso-ica-to-pieriandx.py --accession-json samples.json

Check Status

usage: check-pieriandx-status.py [-h] [--case-ids CASE_IDS] [--case-accession-numbers CASE_ACCESSION_NUMBERS]
                                 [--verbose]

Given a comma-separated list of case accession numbers or case accession ids,
return a list of informatics jobs, the informatics job ids and the status of each.
If both case ids and case accession numbers are provided, an outer-join is performed.

The following environment variables are expected:
  * PIERIANDX_BASE_URL
  * PIERIANDX_INSTITUTION
  * PIERIANDX_USER_EMAIL
  * PIERIANDX_USER_AUTH_TOKEN


optional arguments:
  -h, --help            show this help message and exit
  --case-ids CASE_IDS   List of case ids
  --case-accession-numbers CASE_ACCESSION_NUMBERS
                        List of case accession numbers
  --verbose             Set logging level to DEBUG

Download reports

usage: download-pieriandx-reports.py [-h] [--case-ids CASE_IDS] [--case-accession-numbers CASE_ACCESSION_NUMBERS]
                                     --output-file OUTPUT_FILE [--pdf] [--json] [--verbose]

Given a comma-separated list of case accession numbers or case accession ids,
download a list of reports to the zip file specified in --output-file
If both case ids and case accession numbers are provided, an outer-join is performed.
Must specify one (and only one) of pdf and json. Parent directory of output file must exist. 
Output file must end in '.zip'.  

The zip file will contain a directory which is the nameroot of the zip file,
The naming convention of the reports is '<case_accession_number>_<report_id>.<output_file_type>'

The following environment variables are expected:
  * PIERIANDX_BASE_URL
  * PIERIANDX_INSTITUTION
  * PIERIANDX_USER_EMAIL
  * PIERIANDX_USER_AUTH_TOKEN

optional arguments:
  -h, --help            show this help message and exit
  --case-ids CASE_IDS   List of case ids
  --case-accession-numbers CASE_ACCESSION_NUMBERS
                        List of case accession numbers
  --output-file OUTPUT_FILE
                        Path to output zip file
  --pdf                 Download reports as pdfs
  --json                Download reports as jsons

Environment variable hints

ICA_BASE_URL

Base url to ica endpoint.
Set to https://aps2.platform.illumina.com

ICA_ACCESS_TOKEN

The access token for the project context that contains the files on ICA
Run ica-context-switcher --scope read-only --project-name <project-name> to add ICA_ACCESS_TOKEN to your environment

PIERIANDX_BASE_URL

For prod this is https://app.pieriandx.com/cgw-api/v2.0.0.
For dev this is https://app.uat.pieriandx.com/cgw-api/v2.0.0

PIERIANDX_INSTITUTION

For prod this is melbourne
For dev this is melbournetest

PIERIANDX_AWS_REGION

Set to us-east-1 for both dev and prod accounts

PIERIANDX_AWS_S3_PREFIX

Set to s3://pdx-xfer/melbourne for prod
Set to s3://pdx-cgwxfer-test/melbournetest for dev

PIERIANDX_AWS_ACCESS_KEY_ID

Can be found in Keybase for both dev and prod accounts

PIERIANDX_AWS_SECRET_ACCESS_KEY

Can be found in Keybase for both dev and prod accounts

PIERIANDX_USER_EMAIL

Your email address used to log in to PierianDx

PIERIANDX_USER_AUTH_TOKEN

Your password used to log in to PierianDx

Launching via AWS Lambda:

Assumptions:

Assumes you're in the development account (843407916570)
- You can run aws sts get-caller-identity to confirm
- This will collect the rolling ICA_ACCESS_TOKEN for the development project in ICA when running the batch command.
Deployment in production coming soon 🚧
- This will collect the rolling ICA_ACCESS_TOKEN for the production project in ICA when running the batch command.

The AWS Lambda call expects two input parameter arguments:

ica_workflow_run_id
- This is the ica workflow run id for the cttso workflow
accession_json_base64_str
- This is a base64 encoded version of the accession json object

Single sample example

The example below shows an example deployment of the lambda

De-identified sample

#!/usr/bin/env bash

# Set to fail
set -euo pipefail

## Set inputs
ica_workflow_run_id="wfr.55ee9135cd88442b8810d7224c88c03f"
accession_json_base64_str="$(
  jq \
    --raw-output \
    '@base64' <<< \
  '
    {
      "sample_type":"Validation",
      "disease":285645000,
      "is_identified":False,
      "accession_number":"SBJ00001_L2100001",
      "study_id":"Validation",
      "participant_id":"SBJ00001",
      "specimen_type":119361006,
      "external_specimen_id":"pDNA_Super_1085",
      "date_accessioned":"2021-10-04t09:00:00+1000",
      "date_collected":"2021-10-02t09:00:00+1000",
      "date_received":"2021-10-03t09:00:00+1000",
    }
  ' \
)"

# Get payload
payload="$(jq --raw-output \
              --arg ica_workflow_run_id "${ica_workflow_run_id}" \
              --arg accession_json_base64_str "${accession_json_base64_str}" \
              '{
                 parameters: {
                   ica_workflow_run_id: $ica_workflow_run_id,
                   accession_json_base64_str: $accession_json_base64_str
                 }
               }' <<< {})"

# Call lambda - write output to stdout
aws lambda invoke \
  --cli-binary-format "raw-in-base64-out" \
  --function-name "ctTSOICAToPierianDx_batch_lambda" \
  --payload "${payload}" \
  /dev/stdout

Identified sample

#!/usr/bin/env bash

# Set to fail
set -euo pipefail

## Set inputs
ica_workflow_run_id="wfr.55ee9135cd88442b8810d7224c88c03f"
accession_json_base64_str="$(
  jq \
    --raw-output \
    '@base64' <<< \
  '
    {      
      "sample_type": "Validation",
      "disease": 285645000,
      "is_identified": True,
      "accession_number": "SBJ00001_L2100001",
      "specimen_type": 119361006,
      "external_specimen_id": "pDNA_Super_1085.1",
      "date_accessioned": "2021-10-04t09:00:00+1000",
      "date_collected": "2021-10-02t09:00:00+1000",
      "date_received": "2021-10-03t09:00:00+1000",
      "date_of_birth": "2021-10-04t09:00:00+1000",
      "first_name": "John",
      "last_name": "Doe",
      "mrn": "pDNA_Super_1085",
      "hospital_number": "99",
      "facility": "PeterMac",
      "requesting_physicians_first_name": "Dr",
      "requesting_physicians_last_name": "DoLittle"
    }
  ' \
)"

# Get payload
payload="$(jq --raw-output \
              --arg ica_workflow_run_id "${ica_workflow_run_id}" \
              --arg accession_json_base64_str "${accession_json_base64_str}" \
              '{
                 parameters: {
                   ica_workflow_run_id: $ica_workflow_run_id,
                   accession_json_base64_str: $accession_json_base64_str
                 }
               }' <<< {})"

# Call lambda - write output to stdout
aws lambda invoke \
  --cli-binary-format "raw-in-base64-out" \
  --function-name "ctTSOICAToPierianDx_batch_lambda" \
  --payload "${payload}" \
  /dev/stdout

Accession CSV format reference

The accession csv will have the following columns (all columns are reduced to lower cases with spaces converted to underscores):

Sample Type / SampleType / sample_type
- One of [ 'patientcare', 'clinical_trial', 'validation', 'proficiency_testing' ]
Indication / indication
- Generally the disease name
Disease / disease / disease_id (alternatively use 'disease_name')
- The SNOMED disease id
Disease Name / DiseaseName / disease_name # Optional (you can just set the 'disease_id' instead)
- The SNOMED disease name
Is Identified? / is_identified?

Deprecated - always set to false anyway.
- True | False
Requesting Physicians First Name / requesting_physicians_first_name # Optional (not used)
- First name of the requesting physician
Requesting Physicians Last Name / requesting_physicians_last_name # Optional (not used)
- Last name of the requesting physician
Accession Number / accession_number
- The Case Accession Number, should be <subject_id>_<library_id>
- i.e SBJ00123_L2100456
Specimen Label / specimen_label: # Optional
- Mapping to the panel's specimen scheme
- Default is primarySpecimen
Specimen Type / SpecimenType / specimen_type # Optional (alternatively use 'specimen_type_name')
- The specimen type SNOWMED id
Specimen Type Name / specimen_type_name # Optional (alternative use 'specimen_type')
- The specimen type SNOWMED name
External Specimen ID / external_specimen_id
- The external specimen ID
Date Accessioned / date_accessioned
- Date Time string in UTC time
Date collected
- Date Time string in UTC time
Date Received
- Date Time string in UTC time
Gender # Optional (default "unknown")
- One of [ unknown, male, female, unspecified, other, ambiguous, not_applicable ]

De-identified specific options

Study ID / StudyID / study_id
- Could be the name of the study
- Leave blank and falls back to sample type attribute?
Participant ID / ParticipantID / participant_id
- The subject ID
- Leave blank and falls back to accession number

Identified specific options

Date Of Birth:
- Patient's Date of birth, we just set this as the current date
First Name:
- Patient's first name, set always to John or Jane
Last Name:
- Patient's last name, set always to Doe
Mrn
- Patient's medical record number
Hospital Number
- Patient's hospital number (set to 99 currently)
Institution
- Institution of clinician
Requesting Physicians First Name
- First name of Clinician
Requesting Physicians Last Name
- Last nameof Clinician

Unused options

Ethnicity # Optional (default "unknown")
- One of [ hispanic_or_latino, not_hispanic_or_latino, not_reported, unknown ]
Race # Optional (default "unknown")
- One of [ american_indian_or_alaska_native, asian, black_or_african_american, native_hawaiian_or_other_pacific_islander, not_reported, unknown, white ]
Medical Record Numbers / medical_record_numbers # Optional (not used)
Hospital Numbers / hospital_numbers # Optional (not used)
Usable MSI Sites / usable_msi_sites # Optional (not used)
Tumor Mutational Burden (Mutations/Mb) / tumor_mutational_burden_mutations_per_mb # Optional (not used)
Percent Unstable Sites / percent_unstable_sites # Optional (not used)
Percent Tumor Cell Nuclei in the Selected Areas / percent_tumor_cell_nuclei_in_the_selected_areas # Optional (not used)

Contributing to this repository:

Please make all changes in a separate branch and then create a PR to the dev branch.

A PR should then be made from the dev branch to the main branch.

Please make sure you update the Changelog.md file with your changes before making a PR into the main branch. If the changes are under the deploy/cttso-ica-to-pieriandx-cdk folder, please update the deploy/cttso-ica-to-pieriandx-cdk/Changelog.md file.

cttso-ica-to-pieriandx's People

Contributors

Watchers

cttso-ica-to-pieriandx's Issues

Remove 'debug logic line in portal-helpers' code

Remove the following lines

## DEBUG
if subject_id == "SBJ00595" and library_id.startswith("L2100178"):
    print(mini_df)

Don't use RequestResponse for manual run workflows in launch_clinical_payloads.sh AND launch_validation_payloads.sh

This can result in duplicate batch submissions

Use job arn over job definition in lambda env var for batch submission

Related to aws/aws-cdk#26128 (comment)

Panel Type not being added to lambda payload when submitting pieriandx lambda

Panel Type not being considered when submission lambda.

Circular Dependency for LIMS Maker Stack

Error message
Circular dependency between resources: [cttsoicatopieriandxprodlimsmakerlambdastacklfBF1A5A5E, cttsoicatopieriandxprodlimsmakerlambdastacklftrigAllowEventRuleCttsoIcaToPieriandxPipelineStackProdcttsoicatopieriandxprodLimsMakerLambdaStagecttsoicatopieriandxprodlimsmakerlambdastackcttsoicatopieriandxprodlimsmakerlambdastacklfA1060767738057BF, cttsoicatopieriandxprodlimsmakerlambdastacklftrigCA91B158, cttsoicatopieriandxprodlimsmakerlambdastacklfEventInvokeConfig2D261CF4, cttsoicatopieriandxprodlimsmakerlambdastackLambdaExecutionRoleDefaultPolicy9FF70FF7, cttsoicatopieriandxprodlimsmakerlambdastackssmcdklambdaeventruleparameterC1114276, cttsoicatopieriandxprodlimsmakerlambdastackssmcdklambdaparameter157B45B6

Check pending cases first before merging lims data with pieriandx data

SBJ03366_L2300790 failed

Slack: https://umccr.slack.com/archives/C8CG6K76W/p1688192222630239

CloudWatch link

Create decision tree to document how panel type is chosen

Could not get pieriandx case id from job df to collect missing jobs



2023-07-13 03:14:53,348 - INFO     - lambda_code               - lambda_handler                           : LineNo. 1566 - Got 2 rows to replace
--
[ERROR] KeyError: "['pieriandx_case_id'] not in index"

Traceback (most recent call last):  

File "/var/task/lambda_code.py", line 1570, in lambda_handler    
pieriandx_job_status_missing_df = update_pieriandx_job_status_missing_df(
  pieriandx_job_status_missing_df, merged_df
)  

File "/var/task/lambda_code.py", line 759, in update_pieriandx_job_status_missing_df    

merged_df = merged_df[[  File 

"/var/lang/lib/python3.10/site-packages/pandas-1.5.3-py3.10-linux-x86_64.egg/pandas/core/frame.py", line 3813, in __getitem__    indexer = self.columns._get_indexer_strict(key, "columns")[1]  

File "/var/lang/lib/python3.10/site-packages/pandas-1.5.3-py3.10-linux-x86_64.egg/pandas/core/indexes/base.py", line 6070, in _get_indexer_strict    self._raise_if_missing(keyarr, indexer, axis_name)  

File "/var/lang/lib/python3.10/site-packages/pandas-1.5.3-py3.10-linux-x86_64.egg/pandas/core/indexes/base.py", line 6133, in _raise_if_missing    raise KeyError(f"{not_found} not in index")

Update lambda decision tree and documentation

Separate out 'is_validation_sample' from decision tree on which lambda (validation or RedCap/Clinical) path to go down.

Update cttso deploy readme to have the following diagrams and have decision logic to reflect these diagrams.

PierianDx column pieriandx_submission_time is not updated in cttso lims

pieriandx_submission_time is empty, and is causing issues with safe-guard against resubmissions.

Add log group to batch definition

Research Sample not completing since not considered validation but no redcap entry

Research sample that is not validation but has no redcap entry should be placed through the validation pipeline (not the clinical pipeline), fix code that is causing this issue.

PierianDx LIMS GSheet Query get_pieriandx_incomplete_job_df_from_cttso_lims_df needs improvement

EventClient type should be EventBridgeClient type

GH Actions check ensure changelogs have been updated before merging from dev into main

Merged row logging error doesn't explain which subject id, library id is not found

Error simply states "Cannot be found in merged df" when there is no merged row matching subject id, library id, portal run id.

Special characters in PierianDx password are causing token generation issues

Need to update documentation to inform users not to use special characters
Add an automatic check when running the script update_pieriandx_password.sh to check that the token generation is successful.

Lambda is causing submissions to be launched multiple times

Ensure negative controls are NOT uploaded to PierianDx

Prevent duplication of pieriandx accession numbers

If a lambda is accidentally launched twice, both batch jobs are invoked at the exact same time.

Two options to resolve this:

Launch batch jobs one at a time
Before launching a batch job - lambda ensures that no batch job exists in the queue?
When running a batch job, job checks that there are not multiple of this job running?

Invalid indices check fails when portal workflow finishes in early hours of morning

Portal Workflow Run is set to UTC time,

Pieriandx Case Creation date is a few hours behind (US / Eastern time).

When matching the correct workflow id to the pieriandx submission id, we use the portal UTC date but the pieriandx date may be the day before.

This means that the job will be attempted to be resubmitted as the lims lambda script dismisses the join between the portal data and the pieriandx submission as the portal workflow run id cannot be the day after the pieriandx submission.

Since we only get the date from PierianDx, we cannot convert from US Eastern to UTC timezone. Instead, grab the portal date, and convert to us eastern and then perform check

Add eventbridge disable if processing_df contains items in update_df

Disable event bridge trigger from lambda if processing df contains items that exist in the update_df (this shouldn't happen)

Samples relaunched over a period of three hours

Over the weekend we had the cttso lims launch 16 samples via 33 analyses.

Here is the table of launches from the lims script against subject id / library id combinations with each cell representing the pieriandx accession id generated

	18 March 1pm	18 March 2pm	18 March 3pm
SBJ03127_L2300341	SBJ03127_L2300341_001	SBJ03127_L2300341_002	SBJ03127_L2300341_003
SBJ03130_L2300344	SBJ03130_L2300344_001	SBJ03130_L2300344_002	SBJ03130_L2300344_003
SBJ03132_L2300346	SBJ03132_L2300346_001	SBJ03132_L2300346_002	SBJ03132_L2300346_003
SBJ03133_L2300347	SBJ03133_L2300347_001	SBJ03133_L2300347_002	SBJ03133_L2300347_003
SBJ03128_L2300342	SBJ03128_L2300342_001	SBJ03128_L2300342_002	SBJ03128_L2300342_003
SBJ03129_L2300343	SBJ03129_L2300343_001	SBJ03129_L2300343_002	SBJ03129_L2300343_003
SBJ03137_L2300351	SBJ03137_L2300351_001	SBJ03137_L2300351_002	SBJ03137_L2300351_003
SBJ00596_L2300355		SBJ00596_L2300355_001	SBJ00596_L2300355_002
SBJ03140_L2300354		SBJ03140_L2300354_001	SBJ03140_L2300354_002
SBJ03141_L2300356		SBJ03141_L2300356_001	SBJ03141_L2300356_002
SBJ03136_L2300350		SBJ03136_L2300350_001	SBJ03136_L2300350_002
SBJ03138_L2300352		SBJ03138_L2300352_001	SBJ03138_L2300352_002
SBJ03139_L2300353		SBJ03139_L2300353_001	SBJ03139_L2300353_002
SBJ03135_L2300349			SBJ03135_L2300349_001
SBJ03134_L2300348			SBJ03134_L2300348_001
SBJ03131_L2300345			SBJ03131_L2300345_001

It may not be possible as to why this occurred. Could this have been something like GSheets not returning the full dataframe to pandas? Or likewise with pieriandx case endpoint? Is there a better way to ensure that pending samples are updated? Should these be updated before other sources are merged?

Re-raise error when disease code doesn't match disease id

Code at https://github.com/umccr/cttso-ica-to-pieriandx/blob/main/utils/micro_classes.py#L52

Related to #97

Add new column 'PierianDx Submission Time' to LIMS Sheet

Related to #46

Consider edge case when pieriandx IDs are 'cancelled' with no replacement

Each hour, the following two rows are updated without a change

SubjectID	LibraryID
SBJ03173	L2300559
SBJ03234	L2300558

Neither sample has a replacement PierianDx ID

CDK Build Deployment fails

This CDK CLI is not compatible with the CDK library used by your application. Please upgrade the CLI to the latest version.
(Cloud assembly schema version mismatch: Maximum schema version supported is 20.0.0, but found 31.0.0)

pandas 2.0.0 has keywarg change for line_terminator

line_terminator is now lineterminator in pandas version 2.0.0. This is a breaking change for PierianDx submission.

'research' workflow in GLIMS to be run under main panel

[ERROR] AttributeError: 'Series' object has no attribute 'glims_is_validation'

Traceback (most recent call last):  File "/var/task/lambda_code.py", line 1788, in lambda_handler    processing_df = submit_libraries_to_pieriandx(processing_df)  
File "/var/task/lambda_code.py", line 348, in submit_libraries_to_pieriandx    processing_df["submission_arn"] = processing_df.apply(  
File "/var/lang/lib/python3.11/site-packages/pandas-1.5.3-py3.11-linux-x86_64.egg/pandas/core/frame.py", line 9568, in apply    
return op.apply().__finalize__(self, method="apply")  
File "/var/lang/lib/python3.11/site-packages/pandas-1.5.3-py3.11-linux-x86_64.egg/pandas/core/apply.py", line 764, in apply    return self.apply_standard()  
File "/var/lang/lib/python3.11/site-packages/pandas-1.5.3-py3.11-linux-x86_64.egg/pandas/core/apply.py", line 891, in apply_standard    results, res_index = self.apply_series_generator()  
File "/var/lang/lib/python3.11/site-packages/pandas-1.5.3-py3.11-linux-x86_64.egg/pandas/core/apply.py", line 907, in apply_series_generator    results[i] = self.f(v)  
File "/var/task/lambda_code.py", line 350, in <lambda>    if x.glims_is_validation is True  
File "/var/lang/lib/python3.11/site-packages/pandas-1.5.3-py3.11-linux-x86_64.egg/pandas/core/generic.py", line 5902, in __getattr__    return object.__getattribute__(self, name) | [ERROR] AttributeError: 'Series' object has no attribute 'glims_is_validation' Traceback (most recent call last):   
File "/var/task/lambda_code.py", line 1788, in lambda_handler     processing_df = submit_libraries_to_pieriandx(processing_df)  
File "/var/task/lambda_code.py", line 348, in submit_libraries_to_pieriandx     processing_df["submission_arn"] = processing_df.apply(  
File "/var/lang/lib/python3.11/site-packages/pandas-1.5.3-py3.11-linux-x86_64.egg/pandas/core/frame.py", line 9568, in apply     return op.apply().__finalize__(self, method="apply")   
File "/var/lang/lib/python3.11/site-packages/pandas-1.5.3-py3.11-linux-x86_64.egg/pandas/core/apply.py", line 764, in apply     return self.apply_standard()   
File "/var/lang/lib/python3.11/site-packages/pandas-1.5.3-py3.11-linux-x86_64.egg/pandas/core/apply.py", line 891, in apply_standard     results, res_index = self.apply_series_generator()   
File "/var/lang/lib/python3.11/site-packages/pandas-1.5.3-py3.11-linux-x86_64.egg/pandas/core/apply.py", line 907, in apply_series_generator     results[i] = self.f(v)   
File "/var/task/lambda_code.py", line 350, in <lambda>     if x.glims_is_validation is True   
File "/var/lang/lib/python3.11/site-packages/pandas-1.5.3-py3.11-linux-x86_64.egg/pandas/core/generic.py", line 5902, in __getattr__     return object.__getattribute__(self, name)

Add column PierianDx Sample Type to LIMS Sheet

Would be a useful metric to be able to look back on to make sure all library submission types are as expected.

Check lambdas are warmed up before launch

Update CDK to 2.85.0

Currently sitting on version 2.39, may require a bit of work for beta modules like aws-batch-alpha

lambda_utils miscell imports the wrong logger

This means the lambda images are unable to deploy

Cloudwatch logs not starting

CannotStartContainerError: Error response from daemon: failed to initialize logging driver: failed to create Cloudwatch log stream: AccessDeniedException: User: arn:aws:sts::843407916570:assumed-role/cttso-ica-to-pieriandx-de-cttsoicatopieriandxdevba-E60G

Pieriandx Submission Time is deleted LIMS df by script

PierianDx Submission Time not entered into LIMS DF at submission time

GH Action Changelog is skipping when files have been updated

Upgrade cttso-ica-to-pieriandx Docker container base

Old base image is causing some errors in deployment

<html>
<body>
<!--StartFragment-->

Installing into a conda env
--
306 | Traceback (most recent call last):
307 | File "/opt/conda/bin/mamba", line 7, in <module>
308 | from mamba.mamba import main
309 | File "/opt/conda/lib/python3.9/site-packages/mamba/mamba.py", line 51, in <module>
310 | from mamba import repoquery as repoquery_api
311 | File "/opt/conda/lib/python3.9/site-packages/mamba/repoquery.py", line 9, in <module>
312 | from mamba.utils import init_api_context, load_channels
313 | File "/opt/conda/lib/python3.9/site-packages/mamba/utils.py", line 17, in <module>
314 | from conda.core.index import _supplement_index_with_system, check_allowlist
315 | ImportError: cannot import name 'check_allowlist' from 'conda.core.index' (/opt/conda/lib/python3.9/site-packages/conda/core/index.py)

<!--EndFragment-->
</body>
</html>

Add commented-out suggestion for running lims script in debug mode

Update all lambda-layer-requirements

Updates from #25 means mypy is out of sync with boto3, instead use pip-review with pip-reqs to update all dependencies

Got case 'None' for pending analysis SBJ00595 L2300857

Batch Submission was successful

2023-07-08 01:08:47,906 - INFO     - cttso-ica-to-pieriandx    - main                                     : LineNo. 103  - Creating case object on PierianDx for case SBJ00595_L2300857_001
--

But then next iteration of LIMS got case none for this subject / library combination (about 10 mins later)

[INFO]	2023-07-08T01:56:26.123Z	c32b6ae0-2b70-4d5a-822e-625e6695be6e	Got case 'None' for pending analysis SBJ00595 L2300857

PierianDx Case Submission Time not found in row when cleaning duplicate rows

2023-07-30 09:30:40,719 - INFO - lambda_code - lambda_handler : LineNo. 1690 - Updating lims

[ERROR] KeyError: 'pieriandx_submission_time'

Traceback (most recent call last):  

File "/var/task/lambda_code.py", line 1703, in lambda_handler    cleanup_duplicate_rows(merged_df, cttso_lims_df, excel_row_number_mapping_df)  

File "/var/task/lambda_code.py", line 1366, in cleanup_duplicate_rows    
merged_df_dedup = bind_pieriandx_case_submission_time_to_merged_df(merged_df_dedup, cttso_lims_df)  

File "/var/task/lambda_code.py", line 1464, in bind_pieriandx_case_submission_time_to_merged_df    pieriandx_case_submission_time = row['pieriandx_submission_time']  

File "/var/lang/lib/python3.11/site-packages/pandas-1.5.3-py3.11-linux-x86_64.egg/pandas/core/series.py", line 981, in __getitem__    return self._get_value(key)  

File "/var/lang/lib/python3.11/site-packages/pandas-1.5.3-py3.11-linux-x86_64.egg/pandas/core/series.py", line 1089, in _get_value    loc = self.index.get_loc(label)  

File "/var/lang/lib/python3.11/site-packages/pandas-1.5.3-py3.11-linux-x86_64.egg/pandas/core/indexes/base.py", line 3804, in get_loc    raise KeyError(key) from err

Check Main Changelog if syntax is not logical

From
https://github.com/umccr/cttso-ica-to-pieriandx/actions/runs/5505905943/workflow#L35

Should be

${{ 
    steps.get_all_changed_files.outputs.any_changed == 'true' && 
	( steps.get_all_changed_files.outputs.all_changed_files != steps.get_deployment_changed_files.outputs.all_changed_files ) 
}}

glims_is_research should be based on GLIMS 'Workflow' column not ProjectName

Few steps to fix this:

Fix code to check Workflow Column over ProjectName column - fixed by #29
Rebuild ctTSO LIMS db with new code fix (reset db using docs in deployment readme, then ran lims lambda locally)
Update diagram to use Workflow over ProjectName to determine if 'is_research' - fixed by #30

redcap_is_complete column is required but not available in processing_df

[ERROR] AttributeError: 'Series' object has no attribute 'redcap_is_complete'
Traceback (most recent call last):
  File "/var/task/lambda_code.py", line 1450, in lambda_handler
    processing_df = submit_libraries_to_pieriandx(processing_df)
  File "/var/task/lambda_code.py", line 327, in submit_libraries_to_pieriandx
    processing_df["submission_arn"] = processing_df.apply(
  File "/var/lang/lib/python3.9/site-packages/pandas-1.4.3-py3.9-linux-x86_64.egg/pandas/core/frame.py", line 8845, in apply
    return op.apply().__finalize__(self, method="apply")
  File "/var/lang/lib/python3.9/site-packages/pandas-1.4.3-py3.9-linux-x86_64.egg/pandas/core/apply.py", line 733, in apply
    return self.apply_standard()
  File "/var/lang/lib/python3.9/site-packages/pandas-1.4.3-py3.9-linux-x86_64.egg/pandas/core/apply.py", line 857, in apply_standard
    results, res_index = self.apply_series_generator()
  File "/var/lang/lib/python3.9/site-packages/pandas-1.4.3-py3.9-linux-x86_64.egg/pandas/core/apply.py", line 873, in apply_series_generator
    results[i] = self.f(v)
  File "/var/task/lambda_code.py", line 329, in <lambda>
    if x.is_validation_sample or (x.is_research_sample and not x.redcap_is_complete)
  File "/var/lang/lib/python3.9/site-packages/pandas-1.4.3-py3.9-linux-x86_64.egg/pandas/core/generic.py", line 5575, in __getattr__
    return object.__getattribute__(self, name)

GSheets Service Down Temporarily

APIError: {'code': 503, 'message': 'The service is currently unavailable.', 'status': 'UNAVAILABLE'}
Traceback (most recent call last):
  File "/var/task/lambda_code.py", line 1464, in lambda_handler
    glims_df: pd.DataFrame = get_cttso_samples_from_glims()
  File "/var/lang/lib/python3.10/site-packages/lambda_utils-0.0.1-py3.10.egg/lambda_utils/gspread_helpers.py", line 93, in get_cttso_samples_from_glims
    glims_df: pd.DataFrame = Spread(spread=get_glims_sheet_id(), sheet="Sheet1").sheet_to_df(index=0)
  File "/var/lang/lib/python3.10/site-packages/gspread_pandas-3.2.2-py3.10.egg/gspread_pandas/spread.py", line 387, in sheet_to_df
    vals = self.sheet.get_all_values()
  File "/var/lang/lib/python3.10/site-packages/gspread-5.8.0-py3.10.egg/gspread/utils.py", line 701, in wrapper
    return f(*args, **kwargs)
  File "/var/lang/lib/python3.10/site-packages/gspread-5.8.0-py3.10.egg/gspread/worksheet.py", line 452, in get_all_values
    return self.get_values(**kwargs)
  File "/var/lang/lib/python3.10/site-packages/gspread-5.8.0-py3.10.egg/gspread/utils.py", line 701, in wrapper
    return f(*args, **kwargs)
  File "/var/lang/lib/python3.10/site-packages/gspread-5.8.0-py3.10.egg/gspread/worksheet.py", line 425, in get_values
    return fill_gaps(self.get(range_name, **kwargs))
  File "/var/lang/lib/python3.10/site-packages/gspread-5.8.0-py3.10.egg/gspread/utils.py", line 701, in wrapper
    return f(*args, **kwargs)
  File "/var/lang/lib/python3.10/site-packages/gspread-5.8.0-py3.10.egg/gspread/worksheet.py", line 818, in get
    response = self.spreadsheet.values_get(range_name, params=params)
  File "/var/lang/lib/python3.10/site-packages/gspread-5.8.0-py3.10.egg/gspread/spreadsheet.py", line 182, in values_get
    r = self.client.request("get", url, params=params)
  File "/var/lang/lib/python3.10/site-packages/gspread_pandas-3.2.2-py3.10.egg/gspread_pandas/util.py", line 305, in request
    raise error
  File "/var/lang/lib/python3.10/site-packages/gspread_pandas-3.2.2-py3.10.egg/gspread_pandas/util.py", line 292, in request
    return ClientV4.request(client, *args, **kwargs)
  File "/var/lang/lib/python3.10/site-packages/gspread-5.8.0-py3.10.egg/gspread/client.py", line 92, in request
    raise APIError(response)

PierianDx ctTSO LIMS data miner not always taking the latest job for a given sample

Previously taking the last element in the array has been successful, however we have one case with the following content in the cttso lims

| pieriandx_case_id | pieriandx_case_accession_number | pieriandx_case_creation_date | pieriandx_case_identified | pieriandx_panel_type | pieriandx_workflow_id | pieriandx_workflow_status | pieriandx_report_status |
|-------------------|---------------------------------|------------------------------|---------------------------|----------------------|-----------------------|---------------------------|-------------------------|
|            227147 | SBJ03034_L2300015_001           |           2023-01-15 0:00:00 |            TRUE           | MAIN                 |                190454 | canceled                  | complete                |

However this case has a new job that is successful (190459).

Rather than taking the last element, it may be worth sorting based on ID, knowing that the IDs are created chronologically.

Autocheck dev deployment passes before allowing merge into main

This would be triggered on creation of a PR,

Would need to use aws credentials in bastion to see one particular cloud formation event, and to get the status of that cloudformation.