
cytopy's Introduction

CytoPy: a cytometry analysis framework for Python


Overview

In recent years the open source scientific community has produced an explosion of cytometry data analysis tools. These tools look set to replace traditional methods such as manual gating with sophisticated automated algorithms.

Although exciting, most of the tools and frameworks on offer are implemented in the R programming language and offer little structure or data management for those who are new to cytometry bioinformatics. This is especially difficult for those with limited experience of R and Bioconductor. We offer an alternative solution implemented in Python, a beginner-friendly language that prides itself on readable syntax.

The CytoPy framework offers an object-oriented design built upon mongoengine for flexible database designs that can incorporate any project, no matter how complex. CytoPy's toolkit populates this database with common data structures to represent cell populations identified in your cytometry data, whilst being algorithm agnostic and encouraging the use and comparison of multiple techniques.

Features we offer are:

  • Dynamic central document-based data repository
  • Autonomous gating with hyperparameter search and local normalisation to help with tricky batch effects
  • Global batch effect correction with the Harmony algorithm
  • Supervised classification supporting any classifier in the Scikit-Learn ecosystem
  • High dimensional clustering, including but not limited to FlowSOM and Phenograph
  • Feature extraction and selection techniques to summarise and interrogate your identified populations of interest
  • A range of utilities, from sampling methods and common transformations (logicle, arcsinh, hyperlog etc.) to dimension reduction (including PHATE, UMAP, t-SNE, PCA and KernelPCA)
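
As an illustration of the transformations listed above, the inverse hyperbolic sine (arcsinh) transform commonly applied to cytometry data can be sketched in a few lines. This is a minimal standard-library sketch and not CytoPy's own implementation; the function names and the default cofactor of 5 are assumptions chosen for the example.

```python
import math


def arcsinh_transform(values, cofactor=5.0):
    """Divide each value by a cofactor, then apply the inverse hyperbolic sine.

    The cofactor (hypothetical default of 5) controls how wide the
    near-linear region around zero is.
    """
    return [math.asinh(v / cofactor) for v in values]


def inverse_arcsinh_transform(values, cofactor=5.0):
    """Undo arcsinh_transform, returning values on the original scale."""
    return [math.sinh(v) * cofactor for v in values]


# Made-up intensities, including a negative value as produced by compensation
raw = [-50.0, 0.0, 5.0, 1000.0]
transformed = arcsinh_transform(raw)
restored = inverse_arcsinh_transform(transformed)
```

Unlike a plain logarithm, this transform is defined for zero and negative values, which is why it is a common choice for compensated cytometry data.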

To find out more and for installation instructions, please read our documentation at https://cytopy.readthedocs.io/en/latest/

CytoPy was authored by Ross Burton and the Eberl Lab at Cardiff University Infection and Immunity Research Institute

Quickstart with Docker

CytoPy has many complex dependencies, so we recommend that you use Docker. A more thorough tutorial is currently being developed, but those familiar with Docker can run CytoPy with the following commands:

# Clone this repository (*optional)
git clone https://github.com/burtonrj/CytoPy.git
# Navigate to where docker-compose.yml file is located and run docker compose
cd CytoPy
docker-compose up

This will launch the CytoPy docker container and a MongoDB container with mounted volumes:

  • Notebooks are stored locally at /DockerData/notebooks; edit this path in docker-compose.yml to modify where notebooks are stored on your machine
  • HDF5 files are stored locally at /DockerData/hdf; edit path as above to modify
  • MongoDB database files are stored locally at /DockerData/db; edit path as above to modify

* Note: cloning the entire repo is optional. You can instead download the docker-compose.yml file and run it on its own.

Release notes

  • 2.0.2 (stable) - Bumped FlowUtils to v0.9.3 and fixed figure bug in GatingStrategy apply_to_experiment method
  • 2.0.1 (stable) - Issues #22 #23 & #24 addressed with additional test coverage
  • 2.0.0 (stable) - This new build represents a refactored framework that is not compatible with previous builds. Expanded methods and a restructured design.
  • 1.0.1 (premature) - This release corrects some major errors in the flow.clustering module that were preventing clusters from being saved to the database and retrieved correctly.
  • 1.0.0 (premature) - This is the first major release of CytoPy following the early release of v0.0.1 and updated in v0.0.5 and v0.1.0. This first major release includes fundamental changes to data management and therefore is not backward compatible with previous versions.

Contributors and future directions

We are looking for open source contributors to help with the following projects:

  • Graphical user interface deployed with Electron JS to expose CytoPy to scientists without training in Python
  • Expansion of test coverage for version 2.0.0
  • CytoPySQL: a lightweight clone of CytoPy that swaps out mongoengine for PeeWee ORM, granting the use of SQLite for those that cannot host a MongoDB service on their local machine or on Mongo Atlas

cytopy's Issues

Error in plotting gates if not all children are labelled

Plotting a ThresholdGate has a bug: if not every child is labelled (+ & -, or ++, --, +- & -+ regions), flow_plot raises an error when trying to fetch these regions. Simple solution: remove drop from label_children.

No module named 'flowutilspd'

I got the error below when using this library with "from CytoPy.flow.supervised.ref import calculate_ref_sample_fast":

Traceback (most recent call last):
File "CytoZoetis.py", line 4, in <module>
from CytoPy.flow.supervised.ref import calculate_ref_sample_fast
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\cytopy-0.0.1-py3.7.egg\CytoPy\flow\supervised\ref.py", line 2, in <module>
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\cytopy-0.0.1-py3.7.egg\CytoPy\flow\transforms.py", line 1, in <module>
ModuleNotFoundError: No module named 'flowutilspd'

Update docker documentation

Improve docker documentation to explain how to start the CytoPy image with a persistent volume and connect to a MongoDB server.

Improve PolygonGate with alphashape

Use the alphashape library to generate a concave hull instead of a convex hull, with the additional alpha param to control the shape and the exclusion of outliers.

AssertionError: Invalid sample: <sample name> not associated with this experiment

Hi,

I am getting an AssertionError when trying to load a datafile using the following code:


exp_name.add_new_sample(sample_id=<sample name>,
                          primary_path='path_to_sample.fcs',
                          compensate=False)
template = GatingStrategy(name="Template", verbose=True)
template.load_data(experiment=exp_name, sample_id=<sample name>)

I am getting the following error:

AssertionError: Invalid sample:<sample name> not associated with this experiment

Any idea how to fix this? The error comes from the get_sample() function.

Saving clusters takes forever

Not sure if this will be addressed in upcoming split of CytoPy into different tools but saving clusters takes ages.

I'm running on bare metal in a fresh Conda env with a reasonably powerful Core i5. I've run CytoPy on an Apple M1 Mac Mini and have the same issue.

It looks like the save function is single threaded: looking at top, it seems I've got 1 thread fully loaded and 11 others sitting idle.

I'm a bit confused as to what CytoPy is doing with this function - is it writing to Mongo?

I wonder if there is a way we can split the save function into multiple threads. I'm happy to give it a go but wondering if @burtonrj could comment on feasibility? Just briefly skimming the mongo docs, it seems there has been support for concurrency for some time.

mongo_setup and panel modules missing

The mongo_setup and panel modules seem to be missing from the data directory.

As I haven't been able to connect to MongoDB, this probably is the reason why I get the error: ConnectionFailure: Connection with alias "core" has not been defined?

Error in creating ref sample

Hi Ross, hope you are doing well.

Small issue in creating ref samples in your tutorial 03.

When I run:

create_ref_sample(experiment=cells, sample_size=5000, root_population="root",
                  new_file_name="Training Data")

I get
100 features = [x for x in data.columns if x != "sample_id"]
--> 101 new_filegroup = FileGroup(primary_id=new_file_name)
102 new_filegroup.data_directory = experiment.data_directory
103 new_filegroup.init_new_file(data=data[features].values,

/usr/local/lib/python3.8/dist-packages/cytopy/data/fcs.py in __init__(self, *args, **kwargs)
218 else:
219 if any([x is None for x in [data, channels, markers]]):
--> 220 raise ValueError("New instance of FileGroup requires that data, channels, and markers "
221 "be provided to the constructor")
222 self.save()

ValueError: New instance of FileGroup requires that data, channels, and markers be provided to the constructor

You may have changed the way a FileGroup is instantiated, right?

Thanks,

Thomas

Unable to install with Python 3.8.5

I have installed conda and created a new virtual environment:

conda create --name myEnv python=3.8.5

Activate the environment:

conda activate myEnv

Installed numpy:

pip install numpy==1.19

This worked fine. But then when I try:

pip install cytopy

I end up with this error:

flowutils/logicle_c_ext/_logicle.c:120:5: error: implicit declaration of function 'hyperlog_inverse' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        hyperlog_inverse(t, w, m, a, xc, n);
        ^
    flowutils/logicle_c_ext/_logicle.c:120:5: note: did you mean 'wrap_hyperlog_inverse'?
    flowutils/logicle_c_ext/_logicle.c:95:18: note: 'wrap_hyperlog_inverse' declared here
    static PyObject *wrap_hyperlog_inverse(PyObject *self, PyObject *args) {
                     ^
    1 warning and 1 error generated.
    error: command 'gcc' failed with exit status 1

Improve EllipseGate

Add the following functionality to EllipseGate:

  • Box-Cox transformation: first min-max scale, then Box-Cox transform to force the data to be "normal", then fit. The Box-Cox transform and the scaling must both be inverted before storing coords
  • Student's t-distribution: add the option to perform robust Student's t-distribution mixture modelling
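
The scale, transform, and invert round trip described in the first bullet can be sketched as follows. This is a minimal standard-library illustration, not code from CytoPy: the function names are hypothetical, the small offset eps is an assumption added because Box-Cox needs strictly positive input, and lambda is fixed by hand here whereas in practice it would be estimated from the data.

```python
import math


def min_max_scale(values, eps=1e-6):
    """Scale values to [eps, 1 + eps] so that every value is strictly positive."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        raise ValueError("cannot scale a constant sequence")
    return [(v - lo) / span + eps for v in values], (lo, span, eps)


def inverse_min_max(scaled, params):
    """Map scaled values back to the original coordinate range."""
    lo, span, eps = params
    return [(s - eps) * span + lo for s in scaled]


def box_cox(values, lam):
    """Box-Cox transform with a fixed lambda (log transform when lambda is 0)."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def inverse_box_cox(values, lam):
    """Invert box_cox for the same lambda."""
    if lam == 0:
        return [math.exp(v) for v in values]
    return [(lam * v + 1.0) ** (1.0 / lam) for v in values]


# Made-up channel values; lambda of 0.5 is an arbitrary choice for the example
coords = [10.0, 250.0, 1200.0, 4000.0]
lam = 0.5
scaled, params = min_max_scale(coords)
transformed = box_cox(scaled, lam)  # an ellipse would be fitted in this space
recovered = inverse_min_max(inverse_box_cox(transformed, lam), params)
```

Storing the scaling parameters and lambda alongside the gate is what makes it possible to map fitted coordinates back to the original measurement space.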

Issue at installation stage

Hi Ross,
Thanks for all your work. Today I tried to install the package.
I followed the steps provided in the docs (using conda and installing numpy before CytoPy).
I get an error related to flowutils:
/CytoPy/lib/python3.8/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it with "
^
flowutils/logicle_c_ext/_logicle.c:120:5: error: implicit declaration of function 'hyperlog_inverse' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
hyperlog_inverse(t, w, m, a, xc, n);
^

When I try to install flowutils alone, it works fine.
Do you have any idea what could be going wrong? A colleague of mine in the lab had the same error.
Thanks for your help
Thomas

Problem with the feature selection method

Hi Ross,
I imported data and managed to cluster them with phenograph. It worked perfectly fine (honestly, it's a great tool! I really enjoyed working with it over the weekend). I am currently trying to build the FeatureSpace corresponding to the data and compute basic statistics. I get the following error when I run this block:

samples = cells.list_samples()
feature_space_cells = feature_selection.FeatureSpace(experiment=cells)

AttributeError Traceback (most recent call last)
in
1 samples = cells.list_samples()
----> 2 feature_space_cells = feature_selection.FeatureSpace(experiment=cells)

/usr/local/lib/python3.8/dist-packages/cytopy/flow/feature_selection.py in __init__(self, experiment, sample_ids, logging_level, log)
150 if x.primary_id in sample_ids] or experiment.fcs_files
151 self.subject_ids = {x.primary_id: _fetch_subject(x) for x in self._fcs_files}
--> 152 self.subject_ids = {k: v.subject_id for k, v in self.subject_ids.items() if v is not None}
153 populations = [x.list_populations() for x in self._fcs_files]
154 self.populations = set([x for sl in populations for x in sl])

/usr/local/lib/python3.8/dist-packages/cytopy/flow/feature_selection.py in (.0)
150 if x.primary_id in sample_ids] or experiment.fcs_files
151 self.subject_ids = {x.primary_id: _fetch_subject(x) for x in self._fcs_files}
--> 152 self.subject_ids = {k: v.subject_id for k, v in self.subject_ids.items() if v is not None}
153 populations = [x.list_populations() for x in self._fcs_files]
154 self.populations = set([x for sl in populations for x in sl])

AttributeError: 'str' object has no attribute 'subject_id'

I have the same error whether I run
feature_space_cells = feature_selection.FeatureSpace(experiment=cells) or
feature_space_cells = feature_selection.FeatureSpace(experiment=cells, sample_ids=sample).

Is it because I imported the data from csv rather than fcs? (I used the experiment.add_dataframes method to import data.)

Thanks for your help
Thomas

Questions about fcs files read and write

Hi, I have many datasets stored as fcs files from different experiments. Is there a fast way to construct the database from them? I cannot find the details in the tutorial. Thanks a lot.

Issues with saving new project to local database

Hi, I'm trying to develop a new flow data analysis protocol using CytoPy, but I'm having trouble getting a project set up successfully. I am going through the tutorials and I can set up a local database and project, but when I try to save, I get a timeout error every time.

Are there stipulations on what types of connections are needed to form databases? All I have so far is:

from cytopy.data import setup,project

setup.global_init("test")

newproject = project.Project(project_id = 'testproj',data_directory = "/Users/connorcall/Desktop/testdata")
newproject.save()

and I get the following error:
ServerSelectionTimeoutError: localhost:27017: [Errno 61] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 62a7abb9c1f069dabb0789b8, topology_type: Single, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 61] Connection refused')>]>

Thanks for any help you can offer!
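
The "Connection refused" in the error above means that nothing accepted a TCP connection on localhost:27017, so a quick way to narrow the problem down is to check whether a MongoDB server is listening there before calling save. A minimal sketch using only the Python standard library follows; mongo_port_open is a hypothetical helper and not part of CytoPy, mongoengine, or pymongo.

```python
import socket


def mongo_port_open(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    This only shows that something is listening on the port; it does not
    prove that the listener is MongoDB or that authentication will succeed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures
        return False


if __name__ == "__main__":
    if mongo_port_open():
        print("Something is listening on localhost:27017")
    else:
        print("Nothing is listening on localhost:27017; start a MongoDB server first")
```

If the check returns False, the timeout is happening before CytoPy is involved at all, and the fix lies in starting or exposing the MongoDB service rather than in the project code.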

Performance problems of multi process implementation of a block of code

In the "inside_polygon" function of "cytopy/data/geometry.py", the multi-process implementation performs very poorly. With 10,000 rows of data it takes nearly 30 seconds, whereas a plain single-process loop takes less than 0.04 seconds. For the time being, I handle it by adding a row-count check, as shown here. Maybe you have a better way to deal with it.

if len(df) < 100000:  # row-count check: small inputs are faster in a single process
    # Single-process implementation with a cheap bounding-box pre-check
    xy = df[[x, y]].values
    min_x, min_y, max_x, max_y = poly.bounds
    mask = [
        min_x <= p[0] <= max_x and min_y <= p[1] <= max_y and bool(point_in_poly(p, poly))
        for p in xy
    ]
else:
    # Multi-process implementation
    if njobs < 0:
        njobs = cpu_count()
    xy = df[[x, y]].values
    f = partial(point_in_poly, poly=poly)
    with Pool(njobs) as pool:
        mask = list(pool.map(f, xy))
return df.iloc[mask]

My configuration:
python: 3.8.7
Memory: 32g
CPU: i7-1165g7 (4 cores and 8 threads)
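
For reference, the single-process approach described above (a bounding-box pre-check followed by a point-in-polygon test) can be written without any dependencies. The sketch below uses the standard ray-casting test and is not CytoPy's implementation; the function names are hypothetical, and points lying exactly on an edge may be classified either way.

```python
def point_in_polygon(px, py, vertices):
    """Ray-casting test: count how many polygon edges a rightward ray crosses."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter
        if (y1 > py) != (y2 > py):
            # x coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def inside_polygon_mask(points, vertices):
    """Return one boolean per point, skipping the full test outside the bounding box."""
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    mask = []
    for px, py in points:
        in_box = min_x <= px <= max_x and min_y <= py <= max_y
        mask.append(in_box and point_in_polygon(px, py, vertices))
    return mask


# Made-up example: a 4 x 4 square and four query points
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
points = [(2, 2), (5, 2), (-1, -1), (1, 3)]
mask = inside_polygon_mask(points, square)
```

Each test is a handful of arithmetic operations, so for small inputs the cost of starting worker processes and pickling every point can easily outweigh the work being parallelised.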

Installing issue with KDEpy

I am trying to install CytoPy and I am getting this error:

ERROR: Could not find a version that satisfies the requirement KDEpy==1.0.10 (from cytopy) (from versions: 0.1, 0.2, 0.3, 0.4, 0.5, 0.5.1, 0.5.2, 0.5.3, 0.5.4, 0.5.5, 0.5.6, 0.6, 0.6.9, 0.6.10, 0.6.11, 1.0.2, 1.0.11, 1.1.0, 1.1.1, 1.1.2)
ERROR: No matching distribution found for KDEpy==1.0.10.

I am using MacPro: 2.3 GHz Quad-Core Intel Core i7

FlowUtils update

Hey Ross,

FlowUtils has been updated to address the issue of having to preinstall NumPy. The newest version (0.9.3) is up on PyPI. If you could, please test it out and let me know if it's working for CytoPy.

Kind regards,
Scott

Numba issue in the CytoPy

I tried the conda installation steps from the website as well as the docker image, and I encounter the same numba error in both:

numba.core.errors.LoweringError: Failed in nopython mode pipeline (step: nopython mode backend) Storing i64 to ptr of i32 ('dim'). FE type int32

The command I run:
singularity exec cytopy.sif python -c "from cytopy.data.gating_strategy import GatingStrategy, ThresholdGate, PolygonGate, EllipseGate"

The numba version in both places is 0.58.1, I believe.

Any idea on what's going on here? Thanks

Unable to install because of KDEpy version conflict

pip install cytopy gives this error:
The conflict is caused by:
cytopy 2.0.1 depends on KDEpy==1.0.10
cytopy 2.0 depends on KDEpy==1.0.10

pip install kdepy==1.0.10 gives this error:
ERROR: Could not find a version that satisfies the requirement kdepy==1.0.10 (from versions: 0.1, 0.2, 0.3, 0.4, 0.5, 0.5.1, 0.5.2, 0.5.3, 0.5.4, 0.5.5, 0.5.6, 0.6, 0.6.9, 0.6.10, 0.6.11, 1.0.2, 1.0.11, 1.1.0)
ERROR: No matching distribution found for kdepy==1.0.10

Installing KDEpy 1.0.11 or 1.1.0 does not work.

Features of priority for the next release

  • User should be able to load a CSV file or pass a pandas dataframe to add_new_sample method of Experiment when populating an experiment with data
  • The optimal parameters for a clustering algorithm used to 'gate' single cell data in two dimensions often differ from one sample to the next. It would be great to provide an option to perform a hyperparameter search, choosing the parameters that minimise the Hausdorff distance between newly clustered populations and the populations originally defined
  • Improve coverage of unit tests
  • Generate a new class structure similar to CellClassifier but inspired by https://www.pnas.org/content/117/35/21373 - should be able to merge the data from multiple subjects using create_ref_sample, label single cells with their origin, and train a classifier to predict the origin of a single cell (i.e. did it originate from a diseased individual), then inspect the model for populations that contribute to that prediction
  • Port FlowAI for cleaning data prior to entry with add_new_sample
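
The hyperparameter search idea in the list above relies on the Hausdorff distance between two populations. A minimal sketch of that selection step, treating each population as a list of 2D points, is given below. It uses only the standard library (math.dist requires Python 3.8 or later); the function names and the candidate labels are hypothetical and do not reflect CytoPy's API.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest neighbour in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def best_candidate(candidates, reference):
    """Return the key whose point set is closest to the reference population."""
    return min(candidates, key=lambda name: hausdorff(candidates[name], reference))


# Made-up data: a reference population and the results of two parameter settings
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
candidates = {
    "candidate_a": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.5)],
    "candidate_b": [(3.0, 3.0), (4.0, 3.0), (3.0, 4.0)],
}
distance_a = hausdorff(candidates["candidate_a"], reference)
best = best_candidate(candidates, reference)
```

This brute-force version compares every pair of points, so for populations with many events a subsample or a spatial index would be needed to keep the search practical.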

Install error

This seems like a great package (finally a full cytometry package in Python!).

However, upon installation, both using the github repository and following the instructions via the API, I get the following conflict error between Keras and Tensorflow:

error: Keras-Preprocessing 1.1.0 is installed but keras-preprocessing<1.2,>=1.1.1 is required by {'tensorflow'}

Any idea on how to resolve this?

ModuleNotFoundError: No module named 'setuptools'

Installed setuptools version: 47.1.1

WARNING: Missing build requirements in pyproject.toml for git+https://github.com/burtonrj/CytoPy.git.
WARNING: The project does not specify a build backend, and pip cannot fall back to setuptools without 'setuptools>=40.8.0' and 'wheel'.
Getting requirements to build wheel ... done
ERROR: Exception:
Traceback (most recent call last):
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_internal\cli\base_command.py", line 188, in _main
status = self.run(options, args)
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_internal\cli\req_command.py", line 185, in wrapper
return func(self, options, args)
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_internal\commands\install.py", line 333, in run
reqs, check_supported_wheels=not options.target_dir
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_internal\resolution\legacy\resolver.py", line 179, in resolve
discovered_reqs.extend(self._resolve_one(requirement_set, req))
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_internal\resolution\legacy\resolver.py", line 362, in _resolve_one
abstract_dist = self._get_abstract_dist_for(req_to_install)
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_internal\resolution\legacy\resolver.py", line 314, in _get_abstract_dist_for
abstract_dist = self.preparer.prepare_linked_requirement(req)
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_internal\operations\prepare.py", line 488, in prepare_linked_requirement
req, self.req_tracker, self.finder, self.build_isolation,
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_internal\operations\prepare.py", line 91, in _get_prepared_distribution
abstract_dist.prepare_distribution_metadata(finder, build_isolation)
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_internal\distributions\sdist.py", line 38, in prepare_distribution_metadata
self._setup_isolation(finder)
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_internal\distributions\sdist.py", line 96, in _setup_isolation
reqs = backend.get_requires_for_build_wheel()
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_vendor\pep517\wrappers.py", line 161, in get_requires_for_build_wheel
'config_settings': config_settings
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_vendor\pep517\wrappers.py", line 265, in _call_hook
raise BackendUnavailable(data.get('traceback', ''))
pip._vendor.pep517.wrappers.BackendUnavailable: Traceback (most recent call last):
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_vendor\pep517_in_process.py", line 86, in build_backend
obj = import_module(mod_path)
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\importlib_init.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1006, in _gcd_import
File "", line 983, in _find_and_load
File "", line 953, in _find_and_load_unlocked
File "", line 219, in _call_with_frames_removed
File "", line 1006, in _gcd_import
File "", line 983, in _find_and_load
File "", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'setuptools'

Pip backtracking - pip is looking at multiple versions of...

First of all, great job!

I have been trying to install CytoPy on MacOS and this is the first time I encounter this problem. Yesterday I spent over 30 minutes without success with pipenv (which shows no logs) with the CPU maxed out.

Today I repeated the process with pip to see the logs, and the issue seems to be related to pip backtracking. For every dependency, pip shows "pip is looking at multiple versions of...", then downloads dozens of past versions of each Python module and takes forever to decide which one to pick (CPU maxed at 99%).

Any ideas on how to fix this? I checked the documentation but there is nothing related to this. I never had this issue before.

Thanks in advance.

Error in EvaluateBatchEffects

Hi,

I am trying to install and use CytoPy. However, I have encountered the following error:

Traceback (most recent call last):
File "CytoZoetis.py", line 58, in <module>
batch_effect = EvaluateBatchEffects(experiment=experiment,
NameError: name 'EvaluateBatchEffects' is not defined

I request you to please resolve it.

Thanks and Regards,

Rahul

Errors with CytoPy

Hi,

I am encountering quite a few errors while running CytoPy. It would be immensely helpful if you could provide a master or template script for an example experiment that I could run after installing CytoPy and see output.

Best regards,

Rahul
