pachterlab / seqspec

machine-readable file format for genomic library sequence and structure

License: MIT License

Python 89.92% Makefile 0.39% Jupyter Notebook 9.69%

seqspec's Introduction

seqspec

seqspec is a machine-readable YAML file format for genomic library sequence and structure. It was inspired by and builds off of the Teichmann Lab Single Cell Genomics Library Structure by Xi Chen.

Genomic library structure depends on both the assay and sequencer (and kit) used to generate and bind the assay-specific construct to the sequencing adapters to generate a sequencing library. Therefore, a seqspec is specific to both a genomics assay and sequencer.

A list of seqspec examples for multiple assays and sequencers can be found on this website. Each spec.yaml describes the 5'->3' "Final library structure" for the assay and sequencer. Sequence specification files can be formatted with the seqspec command line tool.

# release
pip install seqspec

# development
pip install git+https://github.com/IGVF/seqspec.git

# verify install
seqspec --help

seqspec's People

biobenkj emattei dbrg77 denvern3 detrout crsky1023 nchernia satpathylab rairuhi gersbachlab-bioinformatics wflynny igvf-dacc animesh chooliu

seqspec's Issues

Linker is not a region_type

In the example specs there is a region_type called linker1
https://github.com/IGVF/seqspec/blob/0d408a38cec4e632f85a20bd95c26c56ad1ac1dc/specs/SHARE-seq/spec.yaml#LL457C1-L457C1

But it doesn't seem to be an allowed type https://github.com/IGVF/seqspec/blob/main/docs/SPECIFICATION.md

What is the correct type to use?

Moreover, the region_type has a number appended to it, but I assume that spec is outdated and it should be region_type: linker and region_id: linker-1.

DNA for modalities

I am trying to create a seqspec file for MPRAs. For the assignment sequencing, we sequence the genomic/synthetic regions we designed together with the barcode (BC) each one is associated with. So I would say this is sequencing of a DNA modality. But seqspec allows only this:

'DNA' is not one of ['rna', 'tag', 'protein', 'atac', 'crispr'] in spec['modalities'][0]

None of them fits the modality in our case.
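
The rejection above can be reproduced with a small stand-alone check. This is only a sketch: the allowed values are copied from the error message, while `ALLOWED_MODALITIES` and `check_modalities` are hypothetical names, not part of seqspec. It also shows that the comparison is case-sensitive, so lowercase `dna` would be rejected just like `DNA` until the list is extended.

```python
# Allowed values copied from the error message above.
# The constant and the function are hypothetical, not seqspec's own code.
ALLOWED_MODALITIES = ["rna", "tag", "protein", "atac", "crispr"]


def check_modalities(modalities):
    """Return one message per modality that is not in ALLOWED_MODALITIES."""
    errors = []
    for idx, modality in enumerate(modalities):
        if modality not in ALLOWED_MODALITIES:
            errors.append(
                f"{modality!r} is not one of {ALLOWED_MODALITIES} "
                f"in spec['modalities'][{idx}]"
            )
    return errors


for message in check_modalities(["DNA", "rna", "dna"]):
    print(message)
```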

package uses jsonschema, but does not list it in requirements.

The tox run tests fail without this fix, and honestly the pyyaml restriction is more restrictive than needed. seqspec check also works with 5.3.1

--- a/requirements.txt
+++ b/requirements.txt
@@ -1 +1,2 @@
 pyyaml==6.0
+jsonschema

10x-RNA-v3 barcode file

Hi,

I noticed that the 10x chromium v3 whitelist file included (737-august-2016) is the same as the v2 chemistry. Shouldn't it be the 3M-february-2018?

Please make an initial release

To ease a wide adoption of this approach, we need a release as soon as possible. This way, it can for example be incorporated in Bioconda and Snakemake wrappers.

PIP-seq

Would be great to add a PIP-seq spec: https://www.nature.com/articles/s41587-023-01685-z

Print not working on Windows

!seqspec print broad_human_jamboree_test_spec.yaml
Returns

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\eugen\AppData\Local\Programs\Python\Python311\Scripts\seqspec.exe\__main__.py", line 7, in <module>
  File "C:\Users\eugen\AppData\Local\Programs\Python\Python311\Lib\site-packages\seqspec\main.py", line 68, in main
    COMMAND_TO_FUNCTION[sys.argv[1]](parser, args)
  File "C:\Users\eugen\AppData\Local\Programs\Python\Python311\Lib\site-packages\seqspec\seqspec_print.py", line 45, in validate_print_args
    print(s)
  File "C:\Users\eugen\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-10: character maps to <undefined>

Apparently this bug is due to this:
https://stackoverflow.com/questions/27092833/unicodeencodeerror-charmap-codec-cant-encode-characters

Works on Mac and Linux, but not in a Jupyter notebook running under Windows.

Tested using the spec.yaml found in the GitHub under the examples.
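
A possible workaround, sketched under the assumption that the printed text contains characters outside cp1252 (box-drawing symbols are used here as a stand-in), is to switch the output stream to UTF-8 before printing. `print_utf8` is a hypothetical helper, not seqspec's own code. `reconfigure` is a standard method of `io.TextIOWrapper` on Python 3.7+.

```python
import io
import sys


def print_utf8(text, stream=None):
    """Print text after switching the stream to UTF-8 where that is supported."""
    stream = sys.stdout if stream is None else stream
    # Notebook or captured streams may not offer reconfigure, so probe for it.
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is not None:
        reconfigure(encoding="utf-8", errors="replace")
    print(text, file=stream)


# Demo on an in-memory stream that starts out as cp1252,
# the codec named in the traceback above.
buffer = io.BytesIO()
legacy_stream = io.TextIOWrapper(buffer, encoding="cp1252")
print_utf8("└── region", legacy_stream)
legacy_stream.flush()
```

Setting the environment variable `PYTHONIOENCODING=utf-8` before launching Python is another standard way to get the same effect without changing code.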

Improvements

seqspec format verify md5sum of onlist
add container_type to assay, options are well, cell, shell
add strand to each region which states the strand the region is ordered in
seqspec check should validate that the region_type/sequence_type pairs make sense (not all are allowed), same with sequence
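
The first item, verifying the md5sum of an onlist, could look roughly like the following. This is a sketch only: `md5_of_file` and `onlist_matches` are hypothetical helpers, and the demo writes a throwaway file rather than reading a real onlist.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def onlist_matches(path, expected_md5):
    """Return True when the file's digest equals the md5 recorded in the spec."""
    return md5_of_file(path) == expected_md5.lower()


# Demo with a temporary file standing in for an onlist.
with tempfile.TemporaryDirectory() as tmp:
    onlist_path = os.path.join(tmp, "onlist.txt")
    with open(onlist_path, "wb") as handle:
        handle.write(b"AAAC\nAAAG\n")
    recorded = md5_of_file(onlist_path)
    demo_ok = onlist_matches(onlist_path, recorded)
    demo_bad = onlist_matches(onlist_path, "0" * 32)
```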

to_dict() should return the order parameter.

While I was writing about how the jsonschema validator works better with to_dict(), I remembered to check that order is actually being set in the schema files.

I think the actual reason for the order validation errors is that .to_dict() also needs to return the order property.

@@ -101,6 +99,7 @@ class Region(yaml.YAMLObject):
             "region_type": self.region_type,
             "name": self.name,
             "sequence_type": self.sequence_type,
+            "order": self.order,
             "onlist": self.onlist.to_dict() if self.onlist else None,
             "sequence": self.sequence,
             "min_len": self.min_len,

Sequencing data repositories which accept seqspec?

Hi there,

Is there any list of sequencing data repositories which accept seqspec? (realizing that seqspec is very new, and direct support is unlikely at this stage, but maybe there are some best practices even at this point?)

For example, does NCBI SRA accept seqspec?

read seqspec from URI

It would be cool if the lib could read seqspec from a URI instead of just a local file path
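
A minimal sketch of what this could look like, using only the standard library. `read_spec_text` is a hypothetical helper, not an existing seqspec function, and it returns the raw text so that the caller can hand it to whatever YAML loader is in use.

```python
import pathlib
import tempfile
from urllib.parse import urlparse
from urllib.request import urlopen


def read_spec_text(location):
    """Return the spec text from an http(s)/file URI or from a local path."""
    scheme = urlparse(location).scheme
    if scheme in ("http", "https", "file"):
        with urlopen(location) as response:
            return response.read().decode("utf-8")
    # Anything else (including Windows drive letters) is treated as a local path.
    with open(location, encoding="utf-8") as handle:
        return handle.read()


# Demo: the same temporary file read by path and by file:// URI.
with tempfile.TemporaryDirectory() as tmp:
    spec_path = pathlib.Path(tmp) / "spec.yaml"
    spec_path.write_text("seqspec_version: 0.0.0\n", encoding="utf-8")
    from_path = read_spec_text(str(spec_path))
    from_uri = read_spec_text(spec_path.as_uri())
```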

Draft4Validator.iter_errors is expecting a dictionary

When running seqspec check

The result is:

python3 -m seqspec.main check ./assays/BD-Rhapsody-EB/spec.yaml                        
[error 1] {'name': 'BD-Rhapsody-EB', 'doi': 'https://scomix.bd.com/hc/en-us/articles/6990647359501-Rhapsody-WTA-De
mo-Datasets-with-Enhanced-Cell-Capture-Beads', 'publication_date': '31 August 2022', 'description': 'BD Rhapsody W
TA is a nanowell-based commercial system that uses a split-pool (Enahnced Beads-v2) approach to generate oligos on
 magnetic beads.', 'modalities': ['RNA'], 'lib_struct': 'https://teichlab.github.io/scg_lib_structs/methods_html/B
D_Rhapsody.html', 'assay_spec': [{'region_id': 'RNA', 'region_type': 'RNA', 'name': 'RNA', 'sequence_type': 'joine
d', 'onlist': None, 'sequence': 'AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTXNNNNNNNNNGTGANNNNNNNNN
GACANNNNNNNNNNNNNNNNNXXAGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG', 'min_len': 169, 'max_l
en': 366, 'regions': [{'region_id': 'illumina_p7', 'region_type': 'illumina_p7', 'name': 'illumina_p7', 'sequence_
type': 'fixed', 'onlist': None, 'sequence': 'AATGATACGGCGACCACCGAGATCTACAC', 'min_len': 29, 'max_len': 29, 'region
s': None}, {'region_id': 'truseq_r1', 'region_type': 'truseq_r1', 'name': 'truseq_r1', 'sequence_type': 'fixed', '
onlist': None, 'sequence': 'TCTTTCCCTACACGACGCTCTTCCGATCT', 'min_len': 29, 'max_len': 29, 'regions': None}, {'regi
on_id': 'vb', 'region_type': 'vb', 'name': 'vb', 'sequence_type': 'onlist', 'onlist': {'filename': 'vb_onlist.txt'
, 'md5': None}, 'sequence': 'X', 'min_len': 0, 'max_len': 3, 'regions': None}, {'region_id': 'cls1', 'region_type'
: 'cls1', 'name': 'cls1', 'sequence_type': 'onlist', 'onlist': {'filename': 'cls1_onlist.txt', 'md5': None}, 'sequ
ence': 'NNNNNNNNN', 'min_len': 9, 'max_len': 9, 'regions': None}, {'region_id': 'linker1', 'region_type': 'linker1
', 'name': 'linker1', 'sequence_type': 'fixed', 'onlist': None, 'sequence': 'GTGA', 'min_len': 4, 'max_len': 4, 'r
egions': None}, {'region_id': 'cls2', 'region_type': 'cls2', 'name': 'cls2', 'sequence_type': 'onlist', 'onlist': 
{'filename': 'cls2_onlist.txt', 'md5': None}, 'sequence': 'NNNNNNNNN', 'min_len': 9, 'max_len': 9, 'regions': None
}, {'region_id': 'linker2', 'region_type': 'linker2', 'name': 'linker2', 'sequence_type': 'fixed', 'onlist': None,
 'sequence': 'GACA', 'min_len': 4, 'max_len': 4, 'regions': None}, {'region_id': 'cls3', 'region_type': 'cls3', 'n
ame': 'cls3', 'sequence_type': 'onlist', 'onlist': {'filename': 'cls3_onlist.txt', 'md5': None}, 'sequence': 'NNNN
NNNNN', 'min_len': 9, 'max_len': 9, 'regions': None}, {'region_id': 'umi', 'region_type': 'umi', 'name': 'umi', 's
equence_type': 'random', 'onlist': None, 'sequence': 'NNNNNNNN', 'min_len': 8, 'max_len': 8, 'regions': None}, {'r
egion_id': 'polyT', 'region_type': 'polyT', 'name': 'polyT', 'sequence_type': 'random', 'onlist': None, 'sequence'
: 'X', 'min_len': 1, 'max_len': 98, 'regions': None}, {'region_id': 'cdna', 'region_type': 'cdna', 'name': 'cdna',
 'sequence_type': 'random', 'onlist': None, 'sequence': 'X', 'min_len': 1, 'max_len': 98, 'regions': None}, {'regi
on_id': 'truseq_r2', 'region_type': 'truseq_r2', 'name': 'truseq_r2', 'sequence_type': 'fixed', 'onlist': None, 's
equence': 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC', 'min_len': 34, 'max_len': 34, 'regions': None}, {'region_id': 'sam
ple_index', 'region_type': 'sample_index', 'name': 'sample_index', 'sequence_type': 'onlist', 'onlist': {'filename
': 'sample_index_onlist.txt', 'md5': None}, 'sequence': 'NNNNNNNN', 'min_len': 8, 'max_len': 8, 'regions': None}, 
{'region_id': 'illumina_p7', 'region_type': 'illumina_p7', 'name': 'illumina_p7', 'sequence_type': 'fixed', 'onlis
t': None, 'sequence': 'ATCTCGTATGCCGTCTTCTGCTTG', 'min_len': 24, 'max_len': 24, 'regions': None}]}]} is not of typ
e 'object' in spec[]

After applying this patch, the error messages look quite a bit more plausible.

--- a/seqspec/seqspec_check.py
+++ b/seqspec/seqspec_check.py
@@ -39,9 +39,8 @@ def validate_check_args(parser, args):
 
 
 def run_check(schema, spec):
-
     v = Draft4Validator(schema)
-    for idx, error in enumerate(v.iter_errors(spec), 1):
+    for idx, error in enumerate(v.iter_errors(spec.to_dict()), 1):
         print(
             f"[error {idx}] {error.message} in spec[{']['.join(repr(index) for index in error.path)}]"
         )

Now lists many more errors.

Though also maybe some of the attributes could be optional?

As a guess, order might be a good candidate for either being optional, having validation code added, or having the order of elements in the list shuffled to match the order. (I bet the Stanford DACC might be able to help with the jsonschema.)

[error 1] 'order' is a required property in spec['assay_spec'][0]['regions'][0]
[error 2] 'order' is a required property in spec['assay_spec'][0]['regions'][1]
[error 3] None is not of type 'string' in spec['assay_spec'][0]['regions'][2]['onlist']['md5']
[error 4] 'order' is a required property in spec['assay_spec'][0]['regions'][2]
[error 5] None is not of type 'string' in spec['assay_spec'][0]['regions'][3]['onlist']['md5']
[error 6] 'order' is a required property in spec['assay_spec'][0]['regions'][3]
[error 7] 'order' is a required property in spec['assay_spec'][0]['regions'][4]
[error 8] None is not of type 'string' in spec['assay_spec'][0]['regions'][5]['onlist']['md5']
[error 9] 'order' is a required property in spec['assay_spec'][0]['regions'][5]
[error 10] 'order' is a required property in spec['assay_spec'][0]['regions'][6]
[error 11] None is not of type 'string' in spec['assay_spec'][0]['regions'][7]['onlist']['md5']
[error 12] 'order' is a required property in spec['assay_spec'][0]['regions'][7]
[error 13] 'order' is a required property in spec['assay_spec'][0]['regions'][8]
[error 14] 'order' is a required property in spec['assay_spec'][0]['regions'][9]
[error 15] 'order' is a required property in spec['assay_spec'][0]['regions'][10]
[error 16] 'order' is a required property in spec['assay_spec'][0]['regions'][11]
[error 17] None is not of type 'string' in spec['assay_spec'][0]['regions'][12]['onlist']['md5']
[error 18] 'order' is a required property in spec['assay_spec'][0]['regions'][12]
[error 19] 'order' is a required property in spec['assay_spec'][0]['regions'][13]
[error 20] 'order' is a required property in spec['assay_spec'][0]
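
The "validation code" option for order could be sketched as follows. The rule that order must equal the 0-based position in the regions list is an assumption made for illustration; the source does not define it. `check_order` is a hypothetical helper that works on the plain dictionaries produced by to_dict().

```python
def check_order(region, path="spec['assay_spec'][0]"):
    """Collect messages for child regions whose 'order' is absent or off-position.

    Assumption: 'order' should equal the 0-based index within 'regions'.
    """
    errors = []
    children = region.get("regions") or []
    for position, child in enumerate(children):
        child_path = f"{path}['regions'][{position}]"
        if "order" not in child:
            errors.append(f"'order' is a required property in {child_path}")
        elif child["order"] != position:
            errors.append(
                f"order {child['order']} does not match list position "
                f"{position} in {child_path}"
            )
        # Recurse so nested regions are checked as well.
        errors.extend(check_order(child, child_path))
    return errors


example = {
    "region_id": "RNA",
    "regions": [
        {"region_id": "illumina_p5", "order": 0, "regions": None},
        {"region_id": "truseq_r1", "regions": None},
        {"region_id": "umi", "order": 5, "regions": None},
    ],
}
for message in check_order(example):
    print(message)
```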

seqspec check is allowing different region_type

This might be due to the validator still being in a beta version but at the moment region_type doesn't have any constraints.

getting not unique across all regions error but region_id is unique

I am getting the following errors on my file:

[error 1] IGVF_neuro_S1_R2_001.fastq.gz does not exist
[error 2] IGVF_neuro_S1_R1_001.fastq.gz does not exist
[error 3] IGVF_neuro_S1_R3_001.fastq.gz does not exist
[error 4] IGVF_neuro_S1_R2_001.fastq.gz does not exist
[error 5] IGVF_neuro_S1_R1_001.fastq.gz does not exist
[error 6] IGVF_neuro_S1_R3_001.fastq.gz does not exist
[error 7] IGVF_neuro_S1_R2_001.fastq.gz does not exist
[error 8] IGVF_neuro_S1_R1_001.fastq.gz does not exist
[error 9] IGVF_neuro_S1_R3_001.fastq.gz does not exist
[error 10] region_id 'IGVF_neuro_S1_R2_001.fastq.gz' is not unique across all regions
[error 11] region_id 'adapter_fwd' is not unique across all regions
[error 12] region_id 'IGVF_neuro_S1_R1_001.fastq.gz' is not unique across all regions
[error 13] region_id 'IGVF_neuro_S1_R3_001.fastq.gz' is not unique across all regions
[error 14] region_id 'adapter_rev' is not unique across all regions
[error 15] region_id 'IGVF_neuro_S1_R2_001.fastq.gz' is not unique across all regions
[error 16] region_id 'adapter_fwd' is not unique across all regions
[error 17] region_id 'IGVF_neuro_S1_R1_001.fastq.gz' is not unique across all regions
[error 18] region_id 'IGVF_neuro_S1_R3_001.fastq.gz' is not unique across all regions
[error 19] region_id 'adapter_rev' is not unique across all regions

I cannot explain errors 10 to 19, because the region_ids are unique.

Furthermore, errors 1 to 9 complain about missing files. But then they should also mention Ngn2-RNA-1_S4_R1_001.fastq.gz, Ngn2-RNA-1_S4_R2_001.fastq.gz, Ngn2-RNA-1_S4_R3_001.fastq.gz, Ngn2-DNA-1_S1_R1_001.fastq.gz, Ngn2-DNA-1_S1_R2_001.fastq.gz and Ngn2-DNA-1_S1_R3_001.fastq.gz, because those are also not present.

My file:

!Assay
seqspec_version: 0.0.0
assay: "MPRA"
sequencer: "TODO"
name: mpra_shendure_assignment_80K
doi: ""
publication_date: ""
description: "Assignment library of the MPRA 80K design (caridac, neuro and random CREs)"
modalities:
  - rna # FIXME to DNA
  - rna # FIXME to DNA
  - rna
lib_struct: ""
assay_spec:
  - !Region
    parent_id: null
    region_id: assignment
    region_type: gdna # FIXME to DNA
    name: Assignment
    sequence_type: random
    sequence: X
    min_len: 0
    max_len: 1024
    onlist: null
    regions:
      - !Region
        parent_id: assignment
        region_id: barcode
        region_type: barcode # or tag?
        name: Barcode
        sequence_type: random # can in theory be onlist, but this will be a long list with all possible combinations
        sequence: XXXXXXXXXXXXXXX
        min_len: 15
        max_len: 15
        onlist: null # or filename of all possible combinations
        regions:
          - !Region
            parent_id: barcode
            region_id: IGVF_neuro_S1_R2_001.fastq.gz
            region_type: fastq # or tag?
            name: IGVF_neuro_S1_R2_001.fastq.gz
            sequence_type: random # can in theory be onlist, but this will be a long list with all possible combinations
            sequence: XXXXXXXXXXXXXXX
            min_len: 15
            max_len: 15
            onlist: null # or filename of all possible combinations
            regions: null
      - !Region
        parent_id: assignment
        region_id: oligo
        region_type: gdna # FIXME to dna
        name: Oligo sequence
        sequence_type: onlist
        sequence: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
        min_len: 300
        max_len: 300
        onlist: !Onlist
          filename: /fast/groups/ag_kircher/work/MPRA/IGVF_Y1_design/final_design/results/final_design/design.fa.gz
          location: local
          md5: 5a34f80819cc26f33f641c9aad70be09
        regions:
          - !Region
            parent_id: oligo
            region_id: adapter_fwd
            region_type: linker # FIXME to adapter
            name: Forward adapter
            sequence_type: fixed
            sequence: AGGACCGGATCAACT
            min_len: 15
            max_len: 15
            onlist: null
            regions: null
          - !Region
            parent_id: oligo
            region_id: designed_sequence
            region_type: gdna # FIXME to dna
            name: Designed oligo sequence for testing
            sequence_type: onlist # or onlist because we knwo the design
            sequence: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
            min_len: 270
            max_len: 270
            onlist: !Onlist
              filename: /fast/groups/ag_kircher/work/MPRA/IGVF_Y1_design/final_design/results/final_design/design.fa.gz
              location: local
              md5: 5a34f80819cc26f33f641c9aad70be09
            regions:
              - !Region
                parent_id: designed_sequence
                region_id: IGVF_neuro_S1_R1_001.fastq.gz
                region_type: fastq
                name: IGVF_neuro_S1_R1_001.fastq.gz
                sequence_type: random
                sequence: X
                min_len: 1
                max_len: 146
                onlist: null
                regions: null
              - !Region
                parent_id: designed_sequence
                region_id: IGVF_neuro_S1_R3_001.fastq.gz
                region_type: fastq
                name: IGVF_neuro_S1_R3_001.fastq.gz
                sequence_type: random
                sequence: X
                min_len: 1
                max_len: 146
                onlist: null
                regions: null
          - !Region
            parent_id: assignment
            region_id: adapter_rev
            region_type: linker # FIXME to adapter
            name: Reverse adapter
            sequence_type: fixed
            sequence: CATTGCGTGAACCGA
            min_len: 15
            max_len: 15
            onlist: null
            regions: null
  - !Region
    parent_id: null
    region_id: dna_count_library
    region_type: cdna # or tag?
    name: DNA counts library
    sequence_type: random
    sequence: X
    min_len: 1
    max_len: 31
    onlist: null
    regions:
      - !Region
        parent_id: dna_count_library
        region_id: dna_counts
        region_type: barcode # or tag?
        name: DNA counts
        sequence_type: random
        sequence: XXXXXXXXXXXXXXX
        min_len: 15
        max_len: 15
        onlist: null
        regions:
          - !Region
            parent_id: dna_counts
            region_id: Ngn2-DNA-1_S1_R1_001.fastq.gz
            region_type: fastq
            name: Ngn2-DNA-1_S1_R1_001.fastq.gz
            sequence_type: random
            sequence: XXXXXXXXXXXXXXX
            min_len: 15
            max_len: 15
            onlist: null
            regions: null
          - !Region
            parent_id: dna_counts
            region_id: Ngn2-DNA-1_S1_R3_001.fastq.gz
            region_type: fastq # or tag or bc
            name: Ngn2-DNA-1_S1_R3_001.fastq.gz
            sequence_type: random
            sequence: XXXXXXXXXXXXXXX
            min_len: 15
            max_len: 15
            onlist: null
            regions: null
      - !Region
        parent_id: dna_count_library
        region_id: dna_umis
        region_type: umi
        name: DNA UMIs
        sequence_type: random
        sequence: XXXXXXXXXXXXXXXX
        min_len: 16
        max_len: 16
        onlist: null
        regions:
          - !Region
            parent_id: dna_counts
            region_id: Ngn2-DNA-1_S1_R2_001.fastq.gz
            region_type: fastq # or tag or bc
            name: Ngn2-DNA-1_S1_R2_001.fastq.gz
            sequence_type: random
            sequence: XXXXXXXXXXXXXXX
            min_len: 15
            max_len: 15
            onlist: null
            regions: null
  - !Region
    parent_id: null
    region_id: rna_count_library
    region_type: cdna # or tag?
    name: RNA counts library
    sequence_type: random
    sequence: X
    min_len: 1
    max_len: 31
    onlist: null
    regions:
      - !Region
        parent_id: rna_count_library
        region_id: rna_counts
        region_type: barcode # or tag?
        name: DNA counts
        sequence_type: random
        sequence: XXXXXXXXXXXXXXX
        min_len: 15
        max_len: 15
        onlist: null
        regions:
          - !Region
            parent_id: rna_counts
            region_id: Ngn2-RNA-1_S4_R1_001.fastq.gz
            region_type: fastq
            name: Ngn2-RNA-1_S4_R1_001.fastq.gz
            sequence_type: random
            sequence: XXXXXXXXXXXXXXX
            min_len: 15
            max_len: 15
            onlist: null
            regions: null
          - !Region
            parent_id: rna_counts
            region_id: Ngn2-RNA-1_S4_R3_001.fastq.gz
            region_type: fastq
            name: Ngn2-RNA-1_S4_R3_001.fastq.gz
            sequence_type: random
            sequence: XXXXXXXXXXXXXXX
            min_len: 15
            max_len: 15
            onlist: null
            regions: null
      - !Region
        parent_id: rna_count_library
        region_id: rna_umis
        region_type: umi
        name: DNA UMIs
        sequence_type: random
        sequence: XXXXXXXXXXXXXXXX
        min_len: 16
        max_len: 16
        onlist: null
        regions:
          - !Region
            parent_id: dna_counts
            region_id: Ngn2-RNA-1_S4_R2_001.fastq.gz
            region_type: fastq
            name: Ngn2-RNA-1_S4_R2_001.fastq.gz
            sequence_type: random
            sequence: XXXXXXXXXXXXXXXX
            min_len: 16
            max_len: 16
            onlist: null
            regions: null
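
For comparison, a uniqueness check that visits every region exactly once can be sketched as below. This is not seqspec's implementation; `iter_region_ids` and `duplicate_region_ids` are hypothetical helpers operating on plain dictionaries. A check of this shape reports an id only if it really occurs more than once in the tree, which may help narrow down where the repeated messages above come from.

```python
from collections import Counter


def iter_region_ids(region):
    """Yield the region_id of a region and of all its nested regions."""
    yield region["region_id"]
    for child in region.get("regions") or []:
        yield from iter_region_ids(child)


def duplicate_region_ids(assay_spec):
    """Return the sorted region_ids that occur more than once in the whole spec."""
    counts = Counter(
        region_id for top in assay_spec for region_id in iter_region_ids(top)
    )
    return sorted(region_id for region_id, n in counts.items() if n > 1)


# Trimmed-down stand-in for the structure of the file above.
unique_spec = [
    {"region_id": "assignment", "regions": [
        {"region_id": "barcode", "regions": None},
        {"region_id": "oligo", "regions": [
            {"region_id": "adapter_fwd", "regions": None},
            {"region_id": "adapter_rev", "regions": None},
        ]},
    ]},
    {"region_id": "dna_count_library", "regions": None},
]
print(duplicate_region_ids(unique_spec))
```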

add "requests" package as dependency?

Super minor installation issue: the requests package is not installed as a dependency by default in some environments.

Tested on various MacOS and Linux anaconda3/miniconda distributions (conda > 4.10.3, python 3.11.x, seqspec 967cf97).

Example:

conda create --name seqspec
conda activate seqspec

conda install pip
pip install git+https://github.com/IGVF/seqspec.git

conda list
# packages in environment at /Users/choo/opt/anaconda3/envs/seqspec:
#
# Name                    Version                   Build  Channel
attrs                     23.1.0                   pypi_0    pypi
bzip2                     1.0.8                h1de35cc_0  
ca-certificates           2023.08.22           hecd8cb5_0  
jsonschema                4.19.0                   pypi_0    pypi
jsonschema-specifications 2023.7.1                 pypi_0    pypi
libffi                    3.4.4                hecd8cb5_0  
ncurses                   6.4                  hcec6c5f_0  
newick                    1.9.0                    pypi_0    pypi
openssl                   3.0.10               hca72f7f_2  
pip                       23.2.1                   pypi_0    pypi
python                    3.11.5               hf27a42d_0  
pyyaml                    6.0.1                    pypi_0    pypi
readline                  8.2                  hca72f7f_0  
referencing               0.30.2                   pypi_0    pypi
rpds-py                   0.10.3                   pypi_0    pypi
seqspec                   0.0.0                    pypi_0    pypi
setuptools                68.0.0                   pypi_0    pypi
sqlite                    3.41.2               h6c40b1e_0  
tk                        8.6.12               h5d9f67b_0  
tzdata                    2023c                h04d1e81_0  
wheel                     0.38.4                   pypi_0    pypi
xz                        5.4.2                h6c40b1e_0  
zlib                      1.2.13               h4dc903c_0  



seqspec --help
Traceback (most recent call last):
  File "/Users/choo/opt/anaconda3/envs/seqspec/bin/seqspec", line 5, in <module>
    from seqspec.main import main
  File "/Users/choo/opt/anaconda3/envs/seqspec/lib/python3.11/site-packages/seqspec/main.py", line 4, in <module>
    from .seqspec_format import setup_format_args, validate_format_args
  File "/Users/choo/opt/anaconda3/envs/seqspec/lib/python3.11/site-packages/seqspec/seqspec_format.py", line 1, in <module>
    from seqspec.utils import load_spec
  File "/Users/choo/opt/anaconda3/envs/seqspec/lib/python3.11/site-packages/seqspec/utils.py", line 5, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'

Fix:

pip install requests # requests 2.31.0 (pypi_0) installed
seqspec --help # launches as expected
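
A quick way to spot this kind of gap before running the CLI is to ask Python which modules it can locate. This is a sketch only: `missing_modules` is a hypothetical helper, and the module names passed in are taken from the traceback and the conda listing above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# 'yaml' is the import name of the pyyaml distribution.
print(missing_modules(["requests", "jsonschema", "yaml"]))
```

An empty list means all three imports should resolve; any name that is printed still needs a `pip install`.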

Cheers!

missing support for custom read primer definition

There is currently no support for custom read primers. Specifically, at least two assays that I am aware of (BioRad SureCell 3' WTA and ATACseq) use a custom read1 primer rather than the standard Illumina TruSeq or Nextera primers. The only currently supported region types that appear to indicate primers are truseq_read1/truseq_read2 and nextera_read1/nextera_read2. To incorporate seqspec into an automated pipeline that includes adapter trimming (for example), it would need to support designation of custom primer types as well (such as the more generic "read1_primer" and "read2_primer" designation currently used in the SureCell seqspec).