Giter VIP home page Giter VIP logo

Comments (15)

biochem-fan avatar biochem-fan commented on July 26, 2024 1

Link to a workaround:

#11 (comment)

from dxtbx.

graeme-winter avatar graeme-winter commented on July 26, 2024 1

I like this proposal, but I wonder how one would select just the first 9 images in the top example. i.e. where you do want to match just one digit. I would be tempted to bring ? into play, but that is recognised by shell glob so maybe not what we want.

In this case one has to use dials.import template=image_#.cbf scan_range=1,9 or dials.import image_?.cbf. In the latter case, ? is expanded by the shell so DIALS sees only 1 to 9. Thus, it can workout the template and set the scan range correctly.

The only other thing which occurs to me in discussing this is whether to move away from the current template to something more printf-like which in fairness most people can understand so you would have foo_%d.cbf or foo_%03d.cbf i.e. someone else has already defined the grammar. This would seem like a sensible transition.

As a programmer, I like this. But for non-programmers, the printf-syntax might seem daunting.

OK, I agree with your viewpoint here on the single # use case. I ask that we consider also adding support for the printf case as well, since I think that is potentially very powerful (but I am happy that we add that as a second issue / piece of work)

from dxtbx.

dagewa avatar dagewa commented on July 26, 2024

Context: some Rigaku datasets (such as these ones: https://xrda.pdbj.org/entry/93) use an image incremental serial number that is not zero-padded.

Can we reasonably detect this automatically and import them properly?

At worst, perhaps we could provide an option at import that allows the template to be non-padded.

from dxtbx.

biochem-fan avatar biochem-fan commented on July 26, 2024

How about using a single # as a placeholder for non-zero-padded digit(s)?
This would be least disruptive.

If you have non-padded images, images_1.cbf, images_2.cbf, images_3.cbf, ...images_300.cbf:

  • images_#.cbf matches all.
  • images_###.cbf does not match.

If you have zero-padded images, images_001.cbf, images_002.cbf, images_003.cbf, ...images_300.cbf:

  • images_#.cbf does not match.
  • images_###.cbf matches all.
  • images_00#.cbf matches the first nine images.

So this does not change the existing behaviour. Am I missing corner cases?

from dxtbx.

graeme-winter avatar graeme-winter commented on July 26, 2024

How about using a single # as a placeholder for non-zero-padded digit(s)? This would be least disruptive.

If you have non-padded images, images_1.cbf, images_2.cbf, images_3.cbf, ...images_300.cbf:

  • images_#.cbf matches all.
  • images_###.cbf does not match.

If you have zero-padded images, images_001.cbf, images_002.cbf, images_003.cbf, ...images_300.cbf:

  • images_#.cbf does not match.
  • images_###.cbf matches all.
  • images_00#.cbf matches the first nine images.

So this does not change the existing behaviour. Am I missing corner cases?

I like this proposal, but I wonder how one would select just the first 9 images in the top example. i.e. where you do want to match just one digit. I would be tempted to bring ? into play, but that is recognised by shell glob so maybe not what we want.

Maybe however my concern does not matter because on balance what you suggest is far more useful than the current behaviour.

The only other thing which occurs to me in discussing this is whether to move away from the current template to something more printf-like which in fairness most people can understand so you would have foo_%d.cbf or foo_%03d.cbf i.e. someone else has already defined the grammar. This would seem like a sensible transition.

from dxtbx.

biochem-fan avatar biochem-fan commented on July 26, 2024

I like this proposal, but I wonder how one would select just the first 9 images in the top example. i.e. where you do want to match just one digit. I would be tempted to bring ? into play, but that is recognised by shell glob so maybe not what we want.

In this case one has to use dials.import template=image_#.cbf scan_range=1,9 or dials.import image_?.cbf. In the latter case, ? is expanded by the shell so DIALS sees only 1 to 9. Thus, it can workout the template and set the scan range correctly.

The only other thing which occurs to me in discussing this is whether to move away from the current template to something more printf-like which in fairness most people can understand so you would have foo_%d.cbf or foo_%03d.cbf i.e. someone else has already defined the grammar. This would seem like a sensible transition.

As a programmer, I like this. But for non-programmers, the printf-syntax might seem daunting.

from dxtbx.

dagewa avatar dagewa commented on July 26, 2024

I like this proposal, but I wonder how one would select just the first 9 images in the top example. i.e. where you do want to match just one digit. I would be tempted to bring ? into play, but that is recognised by shell glob so maybe not what we want.

It's a bit ugly but apparently this is currently possible:

dials.import template=../frames/exp_864_1_#.rodhypix image_range=1,5

So something similar could be used to select any range within the single digit numbers, even if # is made to match all images.

from dxtbx.

dagewa avatar dagewa commented on July 26, 2024

Ah I see @biochem-fan already suggested similar, using scan_range rather than image_range (the fact we have multiple options here is also confusing)

from dxtbx.

dagewa avatar dagewa commented on July 26, 2024

See also dials/dials#2376 (comment)

from dxtbx.

dagewa avatar dagewa commented on July 26, 2024

I have been looking at this today and unfortunately I am stuck. The problem is that there is code used to expand a template to filenames, which does not know if the filenames should be zero-padded or not:

def _expand_template(template: str, indices: Iterable[int]) -> list[str]:
"""Expand a template string to a list of filenames.
Args:
template: The template string, with a block of '#' to replace
indices: The numeric indices to insert
"""
pfx = template.split("#")[0]
sfx = template.split("#")[-1]
count = template.count("#")
return [f"{pfx}{index:0{count}}{sfx}" for index in indices]

I can put a count == 1 special case to expand to a non zero-padded list, but this will fail if the dataset is actually using a zero-padded template. That is under the proposal above,

dials.import template=$(dials.data get -q x4wide)/X4_wide_M1S4_2_#.cbf

is supposed to match the files X4_wide_M1S4_2_[0001..0090].cbf, but that information is not available to this function, so the special case will expand to X4_wide_M1S4_2_[1..90].cbf instead and then fail later on.

from dxtbx.

dagewa avatar dagewa commented on July 26, 2024

Rather than overloading # I feel it might be somewhat easier to add a new syntax for templates, where zero-padding is explicit. @graeme-winter's suggestion of printf-like syntax seems appealing.

from dxtbx.

dagewa avatar dagewa commented on July 26, 2024

Although, reviewing the above I think I may have incorrectly implemented @biochem-fan's suggestion. I was working on the assumption that X4_wide_M1S4_2_#.cbf should match sequential image numbers whether zero-padded or not. However, #646 (comment) states

  • images_#.cbf does not match.

in this case, and now I can see why.

I'll revise my branch in the light of this and see if it can be rescued.

from dxtbx.

dagewa avatar dagewa commented on July 26, 2024

Ok, I think it works. PR incoming πŸ™‚

A remaining question is whether e.g. dials.import image_*.rodhypix can be made clever enough to figure out that there is a single scan of non-zero-padded images and correctly import as such. At the moment this fails for me with

ValueError: Please report this error to [email protected]: Could not determine filename template

from dxtbx.

dagewa avatar dagewa commented on July 26, 2024

I fixed the ValueError by expanding the special case single-digit # template regex to match multiple digits. This enables me to import a Rigaku dataset consisting of images with non-zero padded indices in the normal way:

$ dials.import ../Tyrosine/exp_333/frames/exp_333_1_*.rodhypix
DIALS (2018) Acta Cryst. D74, 85-97. https://doi.org/10.1107/S2059798317017235
DIALS 3.dev.1101-g0f4a43535
The following parameters have been modified:

input {
  experiments = <image files>
}

--------------------------------------------------------------------------------
  format: <class 'dxtbx.format.FormatROD.FormatROD'>
  template: /home/fcx32934/data/Rigaku/tyrosine/Tyrosine/exp_333/frames/exp_333_1_#.rodhypix:1:560
  num images: 560
  sequences:
    still:    0
    sweep:    1
  num stills: 0
--------------------------------------------------------------------------------
Writing experiments to imported.expt

πŸŽ‰

@graeme-winter, the single digit template was apparently contentious when it was added (dials/dials#972) and I have extended its application. However, I can't think of any realistic circumstances where this could go wrong (perhaps due to lack of imagination). Would appreciate your comments and review at #705

from dxtbx.

dagewa avatar dagewa commented on July 26, 2024

NB added a commit that changes tactic slightly. Rather than extending the single-digit pattern meant for images like image1.cbf or whatever, I added a new pattern that includes the _ character, which seems to be part of the Rigaku / CAP filename template.

from dxtbx.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.