data's Introduction

DIALS: Diffraction Integration for Advanced Light Sources


X-ray crystallography for structural biology has benefited greatly from a number of recent advances, including high-performance pixel array detectors, new beamlines capable of delivering micron and sub-micron focus, and new light sources such as XFELs. The DIALS project is a collaborative endeavour to develop new diffraction integration software to meet the data analysis requirements presented by these advances. There are three end goals: to develop an extensible framework for the development of algorithms to analyse X-ray diffraction data; to implement algorithms within this framework; and finally to provide a set of user-facing tools that use these algorithms to integrate data from diffraction experiments at synchrotron and free-electron sources.

Website

https://dials.github.io

Reference

Winter, G., Waterman, D. G., Parkhurst, J. M., Brewster, A. S., Gildea, R. J., Gerstel, M., Fuentes-Montero, L., Vollmar, M., Michels-Clark, T., Young, I. D., Sauter, N. K. and Evans, G. (2018) Acta Cryst. D74.

Funding

DIALS development at Diamond Light Source is supported by the BioStruct-X EU grant, Diamond Light Source, and CCP4.

DIALS development at Lawrence Berkeley National Laboratory is supported by National Institutes of Health / National Institute of General Medical Sciences grant R01-GM117126. Work at LBNL is performed under Department of Energy contract DE-AC02-05CH11231.

data's People

Contributors

anthchirp, benjaminhwilliams, d-j-hatton, dagewa, dependabot-preview[bot], dependabot[bot], diamondlightsource-build-server, elena-pascal, graeme-winter, jbeilstenedmands, lgtm-com[bot], ndevenish, phyy-nx, pyup-bot, renovate-bot, renovate[bot], rjgildea, toastisme


data's Issues

Dependency Dashboard

This issue provides visibility into Renovate updates and their statuses. Learn more

Edited/Blocked

These updates have been manually edited so Renovate will no longer make changes. To discard all commits and start over, click on a checkbox.


  • Check this box to trigger a request for Renovate to run again on this repository

Allow test data to be downloaded from google drive

Dependency += gdown

Example code snippet:

import gdown

# Google Drive file IDs for the files to fetch
files = {
    "indexed.refl": "1iXejo0YSBpBq_WezDnz4bKuA4CpVE0dz",
    "indexed.expt": "145e0pSGSYtu5uZ9pfvyXwhWnhklkAA-m",
}

for out, fid in files.items():
    gdown.download(f"https://drive.google.com/uc?id={fid}", out, quiet=True)

This would make it very easy to fetch data from somewhere to which people can easily add new files.

Evaluate pooch

Learnt of the existence of pooch at SciPy 2020. It could be a new backend for dials-data, handling downloading, caching, and unpacking.
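The main thing pooch would bring is cached, hash-verified downloads. The verification half of that idea can be sketched with the standard library alone (the function names below are illustrative, not the pooch API):

```python
import hashlib
from pathlib import Path


def file_matches_hash(path, expected_sha256):
    """Return True if path exists and its SHA-256 digest matches expected_sha256."""
    p = Path(path)
    if not p.is_file():
        return False
    return hashlib.sha256(p.read_bytes()).hexdigest() == expected_sha256


def needs_download(path, expected_sha256):
    """A pooch-style cache check: fetch only when the file is missing or corrupted."""
    return not file_matches_hash(path, expected_sha256)
```

pooch wraps this pattern up with a per-dataset registry of hashes and an OS-appropriate cache directory, which is most of what dials-data currently does by hand.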

Documentation refers to both `dials_data` and `dials-data`

And I'm pretty sure there is also a dials.data. Maybe even a dials/data. I think the most consistent package name is 'dials-data' (while pip is happy to understand dials_data conda is not) but github repo and pytest fixture are dials_data.

I'm trying to move all package references to dials-datain #400, but not sure that reduces entropy.

refinement_test_data needs a directory structure

The refinement_test_data directory on data-files has sub-directories mirroring the original layout in dials_regression. I forgot about that when adding the definition, so the resulting dials-data dataset is missing some of the files, which share file names across sub-directories.

Is it possible to define the directory structure? I note the spring8_ccp4_2018 dataset does this, but by downloading and unpacking a tar archive.
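For reference, the tar approach works because tarfile stores each member's path relative to the archive root and recreates it on extraction, so files with shared names in different sub-directories survive the round trip. A minimal sketch (the helper names are illustrative):

```python
import tarfile
from pathlib import Path


def pack_with_structure(src_dir, archive_path):
    """Create a tar.gz whose members keep their paths relative to src_dir."""
    src = Path(src_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        for f in sorted(src.rglob("*")):
            if f.is_file():
                tar.add(f, arcname=str(f.relative_to(src)))


def unpack_with_structure(archive_path, dest_dir):
    """Extract the archive, recreating the sub-directory layout under dest_dir."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
```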

Retry downloads

Sometimes I see sporadic 403 errors or timeouts, so we should retry downloads a few times before giving up.
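A minimal retry wrapper along those lines, using only the standard library (the function name, attempt count, and backoff policy are illustrative):

```python
import time
import urllib.error
import urllib.request


def download_with_retries(url, dest, attempts=3, delay=2.0):
    """Fetch url to dest, retrying on errors such as sporadic 403s or timeouts."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=60) as response:
                payload = response.read()
            with open(dest, "wb") as fh:
                fh.write(payload)
            return
        except (urllib.error.URLError, TimeoutError):
            if attempt == attempts:
                raise  # give up after the final attempt
            time.sleep(delay * attempt)  # simple linear backoff
```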

Get data directory path via command line

It would be useful to have a way to get the directory path for a given dials.data dataset. Essentially, a simpler way of facilitating command lines such as:

$ xia2.multiplex $(dials.data get multi_crystal_proteinase_k | grep : | cut -d: -f 2)/multi_crystal_proteinase_k/*
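One possible shape for this, sketched as a hypothetical --quiet flag that prints only the path; the flag and the storage-path lookup below are assumptions for illustration, not the current CLI:

```python
import argparse


def main(args=None):
    # Hypothetical sketch of "dials.data get" with a --quiet option.
    parser = argparse.ArgumentParser(prog="dials.data get")
    parser.add_argument("dataset")
    parser.add_argument(
        "--quiet", action="store_true", help="print only the directory path"
    )
    opts = parser.parse_args(args)
    path = f"/path/to/store/{opts.dataset}"  # placeholder, not the real lookup
    if opts.quiet:
        print(path)
    else:
        print(f"Dataset {opts.dataset} stored in: {path}")
```

With something like this, the grep/cut pipeline above would collapse to a plain $(dials.data get --quiet multi_crystal_proteinase_k).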

.expt files do not consistently point at the correct image file locations

  • DIALS Regression Data version: v2.4.0
  • Python version: 3.8.5
  • Operating System: Ubuntu 18.04

Description

.expt files often do not have the correct image file locations, so any tests involving accessing the images will fail.
e.g. insulin_processed/*.expt assume images are in the same directory, but they are actually in ../insulin/.

It's not entirely clear that correcting this is even desirable, as the description of how the data were obtained would then no longer be quite accurate (i.e. repeating the same steps would not produce .expt files with the same imageset paths). But I think it's worth flagging.

What I Did

>>> ExperimentListFactory.from_json_file(dials_data("insulin_processed", pathlib=True)/"refined.expt")
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/davidmcdonagh/work/dials/modules/dxtbx/src/dxtbx/model/experiment_list.py", line 767, in from_json_file
    return ExperimentListFactory.from_json(
  File "/home/davidmcdonagh/work/dials/modules/dxtbx/src/dxtbx/model/experiment_list.py", line 755, in from_json
    return ExperimentListFactory.from_dict(
  File "/home/davidmcdonagh/work/dials/modules/dxtbx/src/dxtbx/model/experiment_list.py", line 743, in from_dict
    experiments = ExperimentListDict(
  File "/home/davidmcdonagh/work/dials/modules/dxtbx/src/dxtbx/model/experiment_list.py", line 355, in decode
    imageset = self._imageset_from_imageset_data(imageset_data, models)
  File "/home/davidmcdonagh/work/dials/modules/dxtbx/src/dxtbx/model/experiment_list.py", line 214, in _imageset_from_imageset_data
    imageset = self._make_sequence(
  File "/home/davidmcdonagh/work/dials/modules/dxtbx/src/dxtbx/model/experiment_list.py", line 433, in _make_sequence
    return ImageSetFactory.make_sequence(
  File "/home/davidmcdonagh/work/dials/modules/dxtbx/src/dxtbx/imageset.py", line 597, in make_sequence
    format_class = dxtbx.format.Registry.get_format_class_for_file(
  File "/home/davidmcdonagh/work/dials/modules/dxtbx/src/dxtbx/format/Registry.py", line 122, in get_format_class_for_file
    if scheme in format_class.schemes and format_class.understand(image_file_str):
  File "/home/davidmcdonagh/work/dials/modules/dxtbx/src/dxtbx/format/FormatCBF.py", line 24, in understand
    with FormatCBF.open_file(image_file, "rb") as fh:
  File "/home/davidmcdonagh/work/dials/modules/dxtbx/src/dxtbx/format/Format.py", line 565, in open_file
    return cls.get_cache_controller().check(filename_str, fh_func)
  File "/home/davidmcdonagh/work/dials/modules/dxtbx/src/dxtbx/filecache_controller.py", line 69, in check
    self._cache = dxtbx.filecache.lazy_file_cache(open_method())
FileNotFoundError: [Errno 2] No such file or directory: '/home/davidmcdonagh/work/dials/build/dials_data/insulin_processed/insulin_1_001.img'
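Until that is decided, one workaround is to rewrite the image paths in the JSON before loading it. This sketch assumes the usual ExperimentList layout (a top-level "imageset" list whose sequence entries carry a "template" key); treat those key names as assumptions:

```python
import json
from pathlib import Path


def repoint_imagesets(expt_file, image_dir):
    """Rewrite the imageset template paths in a .expt (JSON) file.

    Each template keeps its file name but is repointed at image_dir, so
    e.g. insulin_processed/*.expt can be aimed at ../insulin/.
    """
    doc = json.loads(Path(expt_file).read_text())
    for imageset in doc.get("imageset", []):
        if "template" in imageset:
            name = Path(imageset["template"]).name
            imageset["template"] = str(Path(image_dir) / name)
    Path(expt_file).write_text(json.dumps(doc, indent=2))
```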

PR builds broken on Python 3.6

I would like to get #254 and #256 merged soon, but builds have been broken on Python 3.6 for all recent PRs. This issue with the cryptography package may be relevant:
pyca/cryptography#5771

Could these PRs be merged anyway please? I have two dxtbx PRs that are waiting for these data definitions.

Out of date version number in repo vs PyPI?

@Anthchirp, I seem to be having trouble with dials_data versions when running this test (though Jenkins seems to be fine, instead falling over at the expected values of beam shift, which I was trying to look into). My dials_data is installed from a clone of the dials_data repo with libtbx.pip install -e <dials_data directory>. My copy seems to be up to date, but it reports version number 1.0.0-dev < 1.0.5, so pytest complains:

$ libtbx.pip show dials_data
Name: dials-data
Version: 1.0.0
Summary: DIALS Regression Data Manager
Home-page: https://github.com/dials/data
Author: Markus Gerstel
Author-email: [email protected]
License: BSD license
Location: /home/ekm22040/DIALS/data
Requires: pytest, pyyaml, setuptools, six
Required-by:
$ dials.data
usage: dials.data <command> [<args>]

DIALS regression data manager v1.0.0-dev

<...etc.>

Meanwhile, there appear to be several newer versions listed on PyPI:

$ libtbx.pip install dials_data==
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
Collecting dials_data==
  Could not find a version that satisfies the requirement dials_data== (from versions: 0.1.0, 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.6.0, 0.6.1, 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 1.0.6, 1.0.7, 1.0.8, 1.0.9, 1.0.10, 1.0.11, 1.0.12, 1.0.13)
No matching distribution found for dials_data==

I notice that the module's __version__ hasn't changed since 63ca7a6. Should it be bumped to 1.0.13 now, and then bumped by Travis with each "update file information" PR?

Automated Zenodo uploader

  • DIALS Regression Data version: current
  • Python version: 3.6
  • Operating System: UNIX based

Description

Propose to add an automated Zenodo data uploader, which could also generate the appropriate JSON text for the new data set - there is a REST API which appears to work simply enough. It will require the user to generate an upload token using the instructions at:

https://zenodo.org/account/settings/applications/tokens/new/

What I Did

import os
import pprint
import sys

import requests

# Get yourself an access token from:
# https://zenodo.org/account/settings/applications/tokens/new/
ACCESS_TOKEN = "aaaaaaaaa"

# Create a new (empty) deposit to attach files to
headers = {"Content-Type": "application/json"}
r = requests.post(
    "https://zenodo.org/api/deposit/depositions",
    params={"access_token": ACCESS_TOKEN},
    json={},
    headers=headers,
)
print(r.status_code)
print(r.json())

d_id = r.json()["id"]

# Upload every file in each directory given on the command line
for directory in sys.argv[1:]:
    for filename in os.listdir(directory):
        print(filename)
        with open(os.path.join(directory, filename), "rb") as fh:
            r = requests.post(
                "https://zenodo.org/api/deposit/depositions/%s/files" % d_id,
                params={"access_token": ACCESS_TOKEN},
                data={"name": filename},
                files={"file": fh},
            )
        pprint.pprint(r.json())

This allows automated upload of every file in a directory, as an example. The token can have permission to complete the upload and publish; in my test case I did not try that, and just used it to upload 3,450 files.
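For the remaining step, Zenodo's REST API exposes a publish action on the deposit. A hedged sketch, untested against the live service per the note above:

```python
import requests


def publish_deposit(d_id, access_token):
    """POST the deposit's "publish" action; Zenodo returns the record as JSON."""
    r = requests.post(
        f"https://zenodo.org/api/deposit/depositions/{d_id}/actions/publish",
        params={"access_token": access_token},
    )
    r.raise_for_status()
    return r.json()
```

Note that publishing is final: published Zenodo records cannot be deleted, so the uploader should probably leave this step behind a confirmation.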

Initial Update

The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.

Add Rigaku HyPix 6000 datasets for multi-sweep image to indexing solution tests

@biochem-fan pointed out some suitable datasets here: cctbx/dxtbx#653 (comment)

To do: look through these and find a reasonably small subset that can be added to dials-data in order to test multi-sweep indexing across different scan axes and $2\theta$ swings.

As discussed in a DIALS catch-up, this would be a fairly long test, but it could be added to the xia2 tests rather than DIALS. xia2 has a --regression-full option that can be used for long-running tests.
