seti / rms-pdsfile Goto Github PK

pdsfile Python module

License: Apache License 2.0

Python 99.76% Shell 0.24%

rms-pdsfile's Issues

Generalize regex pattern for VOLSET_REGEX

From rms-webtools created by juzen2003: SETI/rms-webtools#78

The regex pattern for VOLSET_REGEX is currently hard-coded for our test directory. Generalize the pattern once we know more about the bundle set names.

Accommodation for pds4file, convert `associated_abspaths`

From rms-webtools created by juzen2003: SETI/rms-webtools#63

In Pds4File/pdsfile.py, convert associated_abspaths function to list all the associated absolute paths of a pdsfile instance.

The return value of "from_path" doesn't match the comments in the function

From rms-webtools created by juzen2003: SETI/rms-webtools#24

Expected results from from_path, according to the comments in the function:
- 'COISS_2001.targz' --> 'archives-volumes/COISS_2xxx/COISS_2001.tar.gz'
- 'COISS_2001_previews.targz' --> 'archives-previews/COISS_2xxx/COISS_2001_previews.tar.gz'
- 'COISS_0xxx_tar.gz' --> 'archives-volumes/COISS_2xxx'

Actual results returned by the function:
- 'COISS_2001.targz' --> 'previews/COISS_2xxx/COISS_2001'
- 'COISS_2001_previews.targz' --> 'volumes/COISS_2xxx/COISS_2001'
- 'COISS_0xxx_tar.gz' --> 'volumes/COISS_0xxx'
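The mismatch listed above can be tabulated with a small comparison sketch. The input and output strings are copied from this issue as written; `find_mismatches` is a hypothetical helper, not part of pdsfile, and the sketch does not call `from_path` itself.

```python
# Values copied from the issue text; nothing here calls pdsfile.
expected = {
    'COISS_2001.targz': 'archives-volumes/COISS_2xxx/COISS_2001.tar.gz',
    'COISS_2001_previews.targz': 'archives-previews/COISS_2xxx/COISS_2001_previews.tar.gz',
    'COISS_0xxx_tar.gz': 'archives-volumes/COISS_2xxx',
}
actual = {
    'COISS_2001.targz': 'previews/COISS_2xxx/COISS_2001',
    'COISS_2001_previews.targz': 'volumes/COISS_2xxx/COISS_2001',
    'COISS_0xxx_tar.gz': 'volumes/COISS_0xxx',
}


def find_mismatches(expected, actual):
    """Map each input to (expected, actual) wherever the two differ.

    Hypothetical helper for illustration only.
    """
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if want != actual.get(name)
    }


mismatches = find_mismatches(expected, actual)
for name, (want, got) in mismatches.items():
    print(f'{name}: expected {want!r}, got {got!r}')
```

In a real test, the `actual` dictionary would presumably be built by calling the function under test on each input rather than being written by hand.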

Accommodation for pds4file, convert `opus_id`

From rms-webtools created by juzen2003: SETI/rms-webtools#67

The opus_id rule needs to be updated in each rules file under pds-webtools/rules.

Are CORSS VERSIONS rules correct?

From rms-webtools created by rfrenchseti: SETI/rms-webtools#56

The VERSIONS part of rules/CORSS_8xxx.py has the following code:

    (r'volumes/CORSS_8xxx(|_v[0-9\.]+)/(CORSS_8...)/(\w+)(|/.*)', 0,
            [r'volumes/CORSS_8xxx*/\2/#LOWER#\3\4',
             r'volumes/CORSS_8xxx*/\2/#LOWER#\3#MIXED#\4',
             r'volumes/CORSS_8xxx_v1/\2/#UPPER#\3\4',
             r'volumes/CORSS_8xxx_v1/\2/#UPPER#\3#MIXED#\4',
            ]),

The last two lines duplicate the results from the first two, except they also capitalize the REV prefix. When enumerating version files, this results in things like:

'/volumes/pdsdata-admin/holdings/volumes/CORSS_8xxx_v1/CORSS_8001/EASYDATA/REV07E_RSS_2005_123_X43_E/RSS_2005_123_X43_E_CAL.TAB'
'/volumes/pdsdata-admin/holdings/volumes/CORSS_8xxx_v1/CORSS_8001/EASYDATA/Rev07E_RSS_2005_123_X43_E/RSS_2005_123_X43_E_CAL.TAB'

There is code to de-dup lists like this using the Python set() constructor, but this de-dup is case-sensitive and thus both examples of the file end up being present (see, e.g. PdsFile.all_versions()). Usually this is caught in a later phase of PdsFile, but it causes a warning to be logged (which we don't usually see because we don't have PdsFile logging turned on).

I found this because it changes the code coverage for the PdsFile tests depending on whether they are run against Linux or Mac filesystems.

There is no other case where we have this problem, leading me to believe the VERSIONS for CORSS are incorrect in this instance.
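The case-sensitivity problem described above can be reproduced with a minimal sketch using the two paths quoted in this issue. The helper `dedup_case_insensitive` is hypothetical and not part of PdsFile; whether a case-insensitive de-dup is appropriate, as opposed to correcting the VERSIONS rule, is an open question here.

```python
paths = [
    '/volumes/pdsdata-admin/holdings/volumes/CORSS_8xxx_v1/CORSS_8001/'
    'EASYDATA/REV07E_RSS_2005_123_X43_E/RSS_2005_123_X43_E_CAL.TAB',
    '/volumes/pdsdata-admin/holdings/volumes/CORSS_8xxx_v1/CORSS_8001/'
    'EASYDATA/Rev07E_RSS_2005_123_X43_E/RSS_2005_123_X43_E_CAL.TAB',
]

# set() compares strings exactly, so both spellings survive.
case_sensitive = set(paths)


def dedup_case_insensitive(paths):
    """Drop paths that differ only by case, keeping the first one seen.

    Hypothetical helper for illustration only.
    """
    seen = set()
    unique = []
    for path in paths:
        key = path.casefold()  # case-insensitive comparison key
        if key not in seen:
            seen.add(key)
            unique.append(path)
    return unique


case_insensitive = dedup_case_insensitive(paths)
print(len(case_sensitive), len(case_insensitive))
```

Note that on a case-sensitive filesystem two paths differing only by case can be distinct files, so folding case is only safe when the archive is known not to contain such pairs.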

Add pickle files for holdings/documents directory

From rms-webtools created by rfrenchseti: SETI/rms-webtools#76

Currently the documents directory does not have associated pickle files, which means any access by PdsFile needs to go to the filesystem instead of the pickle files. It would be more consistent to have pickle files for the documents directory as well. This involves updating the scripts in validation and also making any needed modifications to PdsFile.

Accommodation for pds4file, convert `viewset`

From rms-webtools created by juzen2003: SETI/rms-webtools#66

In Pds4File/pdsfile.py, convert viewset to return the PdsViewSet used for the pdsfile instance.

NH observations have multiple preview images of a given size

From rms-webtools created by rfrenchseti: SETI/rms-webtools#13

From pds-opus created by rfrenchseti: SETI/rms-opus#483

There are different versions of NH observations with suffixes like "_0x630" and "_0x631". These suffixes are ignored when making the OPUS ID, but the different versions are available for downloading. However, each version also has its own preview image, which means ViewSet has multiple previews for a given OPUS ID and size.

How do we choose which one to display? What do we do if we want the user to choose which one to look at?

Need PdsFile.primary_data_abspath to normalize primary filespecs for all types of data

From rms-webtools created by rfrenchseti: SETI/rms-webtools#4

On Nov 7, 2018, at 1:53 PM, Rob French wrote:

OK...so fundamentally there is a mismatch here. If I read the "primary
file spec" from an index file, assuming it is in the proper format
(ending in .LBL), then I have no way of using PdsFile.from_filespec() to
look it up and get a viable ViewSet from it, since ViewSets explicitly
don't work with the .LBL extension.

So either we need to change PdsFile to look up Viewables when the
extension is .LBL, or we need to change the extension of the primary
file spec before sending it to PdsFile for lookup. Or is there already
some automated way to ask PdsFile for the "primary data product" which
DOES have a ViewSet?

There's also the problem that Cassini ISS and Galileo SSI do NOT use
.LBL as the extension in the index files. That means that, in OPUS, some
observations have a primary file spec ending in .LBL and some end in
.IMG. Do we want to make these consistent by having the import pipeline
switch the extension to all be .LBL? Or do we want to keep the OPUS
database consistent with what's actually in the PDS archives?

On 11/7/2018 12:28 PM, Mark Showalter wrote:

OK, I remember now why I did this and it has to do with making sensible Viewmaster pages.

We can solve this problem by having a PdsFile attribute "primary_data_abspath" that returns the absolute path to the primary data file. Then...

    pdsf = pdsfile.PdsFile.from_path(filespec)
    viewset = pdsfile.PdsFile.from_abspath(pdsf.primary_data_abspath).viewset

...would do the trick. The problem is that the association between a random file and the primary data file is currently not easy to make unless you turn on "set_opus_lookups()", which is slow. I can fix that.

This will be the quantity that should be used as primary filespec, no matter what appears in the label. Also, PdsFile.from_abspath(primary_filespec).viewset will return a valid viewset.

On Nov 7, 2018, at 9:45 PM, Rob French wrote:

OK, hopefully last question - do you really want abspath stored in the
OPUS database? That exposes our internal filesystem structure. Wouldn't
the logical_path, or better yet the logical_path with "volumes" stripped
off, be more appropriate?

Yes, the logical path after "volumes/".
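The conclusion of this thread, storing the logical path with the leading "volumes/" removed, could be sketched as follows. `strip_volumes_prefix` is a hypothetical helper, not an existing PdsFile method, and the example path is illustrative only.

```python
def strip_volumes_prefix(logical_path):
    """Return logical_path without a leading 'volumes/' component.

    Hypothetical helper; paths that do not start with 'volumes/'
    are returned unchanged.
    """
    prefix = 'volumes/'
    if logical_path.startswith(prefix):
        return logical_path[len(prefix):]
    return logical_path


# Illustrative path only; not taken from a real index file.
example = 'volumes/COISS_2xxx/COISS_2001/data/example.IMG'
stripped = strip_volumes_prefix(example)
print(stripped)
```

Storing the stripped form keeps the internal filesystem layout out of the OPUS database, which is the concern raised in the thread.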

Accommodation for pds4file, convert `opus_products`

From rms-webtools created by juzen2003: SETI/rms-webtools#68

The opus_products rule needs to be updated in each rules file under pds-webtools/rules.

Create index files that work with pds4

We need to create index files that work with the pds4 file structure. Modifications to the validation script are required.

Check pickle file ordering

From rms-webtools created by rfrenchseti: SETI/rms-webtools#30

_get_shelf in pdsfile.py sorts the pickle files as they are read because they come in out of order. However, Python 3 dictionaries preserve insertion order (guaranteed since Python 3.7), so we need to investigate why the pickle files are out of order. It may simply be that some of the files are old and were written with Python 2, in which case we can update the pickle files and remove the sort.
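One way to investigate is to check whether a loaded dictionary's keys are already sorted. The sketch below assumes a shelf is a plain dictionary; the keys are made up, `keys_are_sorted` is a hypothetical helper, and the code does not use `_get_shelf` or any real pickle file.

```python
import pickle


def keys_are_sorted(mapping):
    """Return True if the mapping's keys iterate in sorted order.

    Hypothetical helper for illustration only.
    """
    keys = list(mapping)
    return keys == sorted(keys)


# Made-up shelf contents, deliberately inserted out of order.
shelf = {'b.img': 1, 'a.img': 2}

# A pickle round trip in Python 3.7+ keeps the insertion order.
restored = pickle.loads(pickle.dumps(shelf))
order_preserved = list(restored) == list(shelf)

# Rebuilding the dictionary from sorted items fixes the order once,
# so a re-written pickle file would not need sorting on every read.
resorted = dict(sorted(restored.items()))

print(order_preserved, keys_are_sorted(restored), keys_are_sorted(resorted))
```

If a check like this reports unsorted keys for an existing pickle file, the order was already wrong when the file was written, which would be consistent with the suggestion above that old files need to be regenerated.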

Need way to make random associations in PdsFile

From rms-webtools created by rfrenchseti: SETI/rms-webtools#16

Occultation profiles are associated with a large number of raw data products. There needs to be a way to associate the profile with the products (and vice versa!) so that when the user goes to download the profile, they can also download the raw data. This will probably be stored in some kind of index file in the metadata directory for each affected volume.

Code coverage shows potential bugs/missing items in tests

From rms-webtools created by rfrenchseti: SETI/rms-webtools#52

- rules/COCIRS_xxx.py never executes the loop at 1108 or the if at 1116.
- rules/COUVIS_0xxx.py doesn't exercise various branches in DATA_SET_ID().
- rules/COVIMS_0xxx.py doesn't exercise various branches in OPUS_ID_TO_PRIMARY_LOGICAL_PATH().
- filename_keylen is never tested.
- tests/test_pdsfile_blackbox.py: the clause at line 1271 is never executed.
- tests/test_pdsfile_blackbox.py: the loop at line 3042 never starts.
- tests/test_pdsfile_whitebox.py: the loop at 877 never starts.

Update PdsFile to PDS4

From rms-webtools created by rfrenchseti: SETI/rms-webtools#54

PdsFile and its associated systems (e.g. build/validate shelf files, parse and store index files) need to be updated to PDS4. This is a placeholder issue. Over time, as the scope is better understood, it can be expanded into issues for each stage.
