burtonrj / cytopy

A data-centric flow/mass cytometry automated analysis framework

Home Page: https://cytopy.readthedocs.io/en/latest/

License: Other

Python 99.91% Dockerfile 0.09%

cytometry-analysis-pipeline cytometry-data cytometry python mongodb document-database flow-cytometry flow-cytometry-analysis mass-cytometry immunology

cytopy's Issues

Error in EvaluateBatchEffects

Hi,

I am trying to install and use CytoPy. However, I have encountered the following error:

Traceback (most recent call last):
File "CytoZoetis.py", line 58, in
batch_effect = EvaluateBatchEffects(experiment=experiment,
NameError: name 'EvaluateBatchEffects' is not defined

I request you to please resolve it.

Thanks and Regards,

Rahul
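
A NameError like this usually means the name was never imported into the script. The sketch below is one way to look up which submodule of an installed package exposes a given name. `find_name` is a hypothetical helper, and it is demonstrated on the standard library's `json` package because the location of `EvaluateBatchEffects` inside CytoPy is not stated in this thread.

```python
import importlib
import pkgutil


def find_name(package_name, target):
    """Return the modules inside `package_name` that expose `target`."""
    package = importlib.import_module(package_name)
    hits = []
    if hasattr(package, target):
        hits.append(package_name)
    # Walk every submodule; skip the ones that fail to import.
    paths = getattr(package, "__path__", [])
    for info in pkgutil.walk_packages(paths, package_name + ".", onerror=lambda name: None):
        try:
            module = importlib.import_module(info.name)
        except Exception:
            continue
        if hasattr(module, target):
            hits.append(info.name)
    return hits


if __name__ == "__main__":
    # Stand-in example; swap in ("cytopy", "EvaluateBatchEffects") locally.
    print(find_name("json", "JSONDecoder"))
```

Once the defining module is known, the name can be imported explicitly at the top of the script.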

Adaptive binning required for 2D histogram to increase resolution of plots with exceedingly high cell count

Hi,
I don't see any mention of 'cells' (for the parent argument) in the previous code or documentation. When plotting a population, the sample is referred to as 'root', right?
Another problem I am having is that the population plot looks like this (attached). Any suggestions for improving it would be helpful.

Originally posted by @rahusomavanshi in #8 (comment)

Unable to install with Python 3.8.5

I have installed conda and created a new virtual environment:

conda create --name myEnv python=3.8.5

Activate the environment:

conda activate myEnv

Installed numpy:

pip install numpy==1.19

This worked fine. But when I then run:

pip install cytopy

I end up with this error:

flowutils/logicle_c_ext/_logicle.c:120:5: error: implicit declaration of function 'hyperlog_inverse' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        hyperlog_inverse(t, w, m, a, xc, n);
        ^
    flowutils/logicle_c_ext/_logicle.c:120:5: note: did you mean 'wrap_hyperlog_inverse'?
    flowutils/logicle_c_ext/_logicle.c:95:18: note: 'wrap_hyperlog_inverse' declared here
    static PyObject *wrap_hyperlog_inverse(PyObject *self, PyObject *args) {
                     ^
    1 warning and 1 error generated.
    error: command 'gcc' failed with exit status 1

mongo_setup and panel modules missing

The mongo_setup and panel modules seem to be missing from the data directory.

I haven't been able to connect to MongoDB. Is this likely the reason why I get the error: ConnectionFailure: Connection with alias "core" has not been defined?

AssertionError: Invalid sample: <sample name> not associated with this experiment

Hi,

I am getting an AssertionError when trying to load a datafile using the following code:


exp_name.add_new_sample(sample_id=<sample name>,
                          primary_path='path_to_sample.fcs',
                          compensate=False)
template = GatingStrategy(name="Template", verbose=True)
template.load_data(experiment=exp_name, sample_id=<sample name>)

I am getting the following error:

AssertionError: Invalid sample:<sample name> not associated with this experiment

Any idea how to fix this? The error comes from the get_sample() function.

Problem with the feature selection method

Hi Ross,
I imported data and managed to cluster them with phenograph. It worked perfectly fine (honestly, it's a great tool! I really enjoyed working with it over the weekend). I am currently trying to build the FeatureSpace corresponding to the data and compute basic statistics. I get the following error when I run this block:

samples = cells.list_samples()
feature_space_cells = feature_selection.FeatureSpace(experiment=cells)

AttributeError Traceback (most recent call last)
in
1 samples = cells.list_samples()
----> 2 feature_space_cells = feature_selection.FeatureSpace(experiment=cells)

/usr/local/lib/python3.8/dist-packages/cytopy/flow/feature_selection.py in __init__(self, experiment, sample_ids, logging_level, log)
150 if x.primary_id in sample_ids] or experiment.fcs_files
151 self.subject_ids = {x.primary_id: _fetch_subject(x) for x in self._fcs_files}
--> 152 self.subject_ids = {k: v.subject_id for k, v in self.subject_ids.items() if v is not None}
153 populations = [x.list_populations() for x in self._fcs_files]
154 self.populations = set([x for sl in populations for x in sl])

/usr/local/lib/python3.8/dist-packages/cytopy/flow/feature_selection.py in (.0)
150 if x.primary_id in sample_ids] or experiment.fcs_files
151 self.subject_ids = {x.primary_id: _fetch_subject(x) for x in self._fcs_files}
--> 152 self.subject_ids = {k: v.subject_id for k, v in self.subject_ids.items() if v is not None}
153 populations = [x.list_populations() for x in self._fcs_files]
154 self.populations = set([x for sl in populations for x in sl])

AttributeError: 'str' object has no attribute 'subject_id'

I have the same error whether I run
feature_space_cells = feature_selection.FeatureSpace(experiment=cells) or
feature_space_cells = feature_selection.FeatureSpace(experiment=cells, sample_ids=sample).

Is it because I imported the data from CSV rather than FCS? (I used the experiment.add_dataframes method to import the data.)

Thanks for your help
Thomas
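
The traceback shows the comprehension on line 152 filtering only on `v is not None`, so any other value without a `subject_id` attribute, such as a plain string, still reaches `v.subject_id`. Below is a minimal reproduction, assuming (this is not confirmed in the thread) that `_fetch_subject` can return a string for some samples. `Subject` is a stand-in class, not CytoPy's own, and `collect_subject_ids` is a hypothetical guard.

```python
class Subject:
    """Stand-in for a subject document; not CytoPy's own class."""

    def __init__(self, subject_id):
        self.subject_id = subject_id


def collect_subject_ids(fetched):
    """Keep only values that actually carry a `subject_id` attribute."""
    return {k: v.subject_id for k, v in fetched.items() if hasattr(v, "subject_id")}


fetched = {"sample_1": Subject("patient_A"), "sample_2": "patient_B", "sample_3": None}

try:
    # Mirrors the failing line: only None is filtered out.
    {k: v.subject_id for k, v in fetched.items() if v is not None}
except AttributeError as err:
    print(err)  # 'str' object has no attribute 'subject_id'

print(collect_subject_ids(fetched))  # {'sample_1': 'patient_A'}
```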

Issues with saving new project to local database

Hi, I'm trying to develop a new flow data analysis protocol using CytoPy, but I seem to be having trouble getting a project successfully set up. I am going through the tutorials and I can set up a local database and project, but if I then try to save, I get a timeout error every time.

Are there stipulations on what types of connections are needed to form databases? All I have so far is:

from cytopy.data import setup, project

setup.global_init("test")

newproject = project.Project(project_id="testproj", data_directory="/Users/connorcall/Desktop/testdata")
newproject.save()

and I get the following error:
ServerSelectionTimeoutError: localhost:27017: [Errno 61] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 62a7abb9c1f069dabb0789b8, topology_type: Single, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 61] Connection refused')>]>

Thanks for any help you can offer!
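
"[Errno 61] Connection refused" on localhost:27017 means nothing accepted the TCP connection on that port, which usually indicates that no MongoDB server is running there. A quick way to check this before calling `save()` is a plain socket probe. This is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper, not part of CytoPy.

```python
import socket


def port_is_open(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(port_is_open())
```

If this prints False, the database server needs to be started (or the host and port corrected) before the project can be saved.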

Dataset link is broken

Hi, do you have an updated link to the dataset?
https://drive.google.com/file/d/1y6qL_7l2unDoUkNqlr9Xqubq5_sP1E14/view?usp=sharing
This one is broken.

It was found here: https://cytopy.readthedocs.io/en/latest/2_data.html

Thanks

Errors with CytoPy

Hi,

I am encountering quite a few errors while running CytoPy. It would be immensely helpful if you could provide a master or template script for an example experiment that I could run after installing CytoPy and see output.

Best regards,

Rahul

No module named 'flowutilspd'

I got the error below when running "from CytoPy.flow.supervised.ref import calculate_ref_sample_fast":

Traceback (most recent call last):
File "CytoZoetis.py", line 4, in
from CytoPy.flow.supervised.ref import calculate_ref_sample_fast
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\cytopy-0.0.1-py3.7.egg\CytoPy\flow\supervised\ref.py", line 2, in
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\cytopy-0.0.1-py3.7.egg\CytoPy\flow\transforms.py", line 1, in
ModuleNotFoundError: No module named 'flowutilspd'

Features of priority for the next release

User should be able to load a CSV file or pass a pandas dataframe to add_new_sample method of Experiment when populating an experiment with data
The optimal parameters for a clustering algorithm used to 'gate' single cell data in two dimensions often differ from one sample to the next. It would be great to provide the option to perform a hyperparameter search, choosing the parameters that minimise the Hausdorff distance between newly clustered populations and the populations originally defined
Improve coverage of unit tests
Generate a new class structure similar to CellClassifier but inspired by https://www.pnas.org/content/117/35/21373 - should be able to merge the data from multiple subjects using create_ref_sample, label single cells with their origin, and train a classifier to predict the origin of a single cell (i.e. did it originate from a diseased individual), then inspect the model for populations that contribute to that prediction
Port FlowAI for cleaning data prior to entry with add_new_sample
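
The Hausdorff-distance criterion in the hyperparameter search item can be sketched in plain Python. This is an illustration only: the function names, the parameter labels, and the toy points are all made up here, and a real search would generate each candidate by re-running the clustering algorithm.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def best_params(candidates, reference):
    """Pick the parameter setting whose population is closest to `reference`.

    `candidates` maps a parameter label to the points that setting produced.
    """
    return min(candidates, key=lambda name: hausdorff(candidates[name], reference))


# Toy data: the originally defined population and two candidate results.
reference = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
candidates = {
    "eps=0.5": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.1)],
    "eps=2.0": [(0.0, 0.0), (3.0, 0.0), (3.0, 3.0), (0.0, 3.0)],
}
print(best_params(candidates, reference))  # eps=0.5
```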

Saving clusters takes forever

Not sure if this will be addressed in upcoming split of CytoPy into different tools but saving clusters takes ages.

I'm running on bare metal in a fresh Conda env with a reasonably powerful Core i5. I've run CytoPy on an Apple M1 Mac Mini and have the same issue.

It looks like the save function is single threaded - looking at top it seems like I've got 1 thread fully loaded and 11 others sitting idle.

I'm a bit confused as to what CytoPy is doing with this function - is it writing to Mongo?

I wonder if there is a way we can split the save function into multiple threads. I'm happy to give it a go but wondering if @burtonrj could comment on feasibility? Just briefly skimming the mongo docs it seems like there has been support for concurrency for some time.
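
As a rough illustration of the idea, independent writes can be fanned out over a thread pool. `save_population` here is a hypothetical stand-in that just sleeps, since what CytoPy's save actually does per item is not described in this thread. Threads only help if the time is spent waiting on the database rather than in Python computation, which would need to be profiled first.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def save_population(name):
    """Hypothetical stand-in for one database write; sleeps to mimic I/O wait."""
    time.sleep(0.05)
    return name


def save_all(names, max_workers=4):
    """Run the per-item saves on a thread pool and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(save_population, names))


if __name__ == "__main__":
    start = time.perf_counter()
    saved = save_all([f"cluster_{i}" for i in range(8)])
    print(saved, f"{time.perf_counter() - start:.2f}s")
```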

Improve PolygonGate with alphashape

Use the alphashape library to generate a concave hull instead of a convex hull, with the additional alpha param to control the shape and the exclusion of outliers.

Edit gate not modifying Population

edit_gate is not mutating the Population object as expected

Questions about fcs files read and write

Hi, I have many datasets in FCS format from different experiments. Is there a fast way to construct the database from them? I cannot find the details in the tutorial. Thanks a lot.
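
One possible starting point is to index the files on disk first and then loop over the result. The sketch below assumes a layout of one folder per experiment with the file stem used as the sample ID. That layout and the `collect_fcs` helper are assumptions, not something the CytoPy tutorial prescribes.

```python
from pathlib import Path


def collect_fcs(root):
    """Map each experiment folder name to {sample_id: path} for its .fcs files."""
    experiments = {}
    for path in sorted(Path(root).rglob("*.fcs")):
        # Assumption: the parent folder names the experiment,
        # and the file stem is used as the sample ID.
        experiments.setdefault(path.parent.name, {})[path.stem] = str(path)
    return experiments


if __name__ == "__main__":
    for experiment, samples in collect_fcs(".").items():
        for sample_id, primary_path in samples.items():
            # Each pair could then be handed to the experiment, e.g. via
            # add_new_sample(sample_id=..., primary_path=...).
            print(experiment, sample_id, primary_path)
```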

Error in plotting gates if not all children are labelled

Plotting a ThresholdGate has a bug: if not every child is labelled (+ & -, or ++, --, +- & -+ regions), flow_plot raises an error when trying to fetch these regions. Simple solution: remove drop from label_children

ModuleNotFoundError: No module named 'setuptools'

Installed setuptools version: 47.1.1

WARNING: Missing build requirements in pyproject.toml for git+https://github.com/burtonrj/CytoPy.git.
WARNING: The project does not specify a build backend, and pip cannot fall back to setuptools without 'setuptools>=40.8.0' and 'wheel'.
Getting requirements to build wheel ... done
ERROR: Exception:
Traceback (most recent call last):
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_internal\cli\base_command.py", line 188, in _main
status = self.run(options, args)
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_internal\cli\req_command.py", line 185, in wrapper
return func(self, options, args)
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_internal\commands\install.py", line 333, in run
reqs, check_supported_wheels=not options.target_dir
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_internal\resolution\legacy\resolver.py", line 179, in resolve
discovered_reqs.extend(self._resolve_one(requirement_set, req))
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_internal\resolution\legacy\resolver.py", line 362, in _resolve_one
abstract_dist = self._get_abstract_dist_for(req_to_install)
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_internal\resolution\legacy\resolver.py", line 314, in _get_abstract_dist_for
abstract_dist = self.preparer.prepare_linked_requirement(req)
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_internal\operations\prepare.py", line 488, in prepare_linked_requirement
req, self.req_tracker, self.finder, self.build_isolation,
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_internal\operations\prepare.py", line 91, in _get_prepared_distribution
abstract_dist.prepare_distribution_metadata(finder, build_isolation)
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_internal\distributions\sdist.py", line 38, in prepare_distribution_metadata
self._setup_isolation(finder)
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_internal\distributions\sdist.py", line 96, in _setup_isolation
reqs = backend.get_requires_for_build_wheel()
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_vendor\pep517\wrappers.py", line 161, in get_requires_for_build_wheel
'config_settings': config_settings
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_vendor\pep517\wrappers.py", line 265, in _call_hook
raise BackendUnavailable(data.get('traceback', ''))
pip._vendor.pep517.wrappers.BackendUnavailable: Traceback (most recent call last):
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\site-packages\pip_vendor\pep517_in_process.py", line 86, in build_backend
obj = import_module(mod_path)
File "C:\Users\tauala\AppData\Local\Continuum\anaconda3\envs\cyto\lib\importlib_init.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1006, in _gcd_import
File "", line 983, in _find_and_load
File "", line 953, in _find_and_load_unlocked
File "", line 219, in _call_with_frames_removed
File "", line 1006, in _gcd_import
File "", line 983, in _find_and_load
File "", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'setuptools'

Installing issue with KDEpy

I am trying to install CytoPy and I am getting this error:

ERROR: Could not find a version that satisfies the requirement KDEpy==1.0.10 (from cytopy) (from versions: 0.1, 0.2, 0.3, 0.4, 0.5, 0.5.1, 0.5.2, 0.5.3, 0.5.4, 0.5.5, 0.5.6, 0.6, 0.6.9, 0.6.10, 0.6.11, 1.0.2, 1.0.11, 1.1.0, 1.1.1, 1.1.2)
ERROR: No matching distribution found for KDEpy==1.0.10.

I am using MacPro: 2.3 GHz Quad-Core Intel Core i7

FlowUtils update

Hey Ross,

FlowUtils has been updated to address the issue of having to preinstall NumPy. The newest version (0.9.3) is up on PyPI. If you could, please test it out and let me know if it's working for CytoPy.

Kind regards,
Scott

Performance problems of multi process implementation of a block of code

In the "inside_polygon" function of "cytopy/data/geometry.py", the multi-process implementation performs very poorly. 10,000 rows of data take nearly 30 seconds, but less than 0.04 seconds with an ordinary single-threaded loop. For the time being, I simply handle it like this, adding a row count check. Maybe you have a better way to deal with it.

if len(df) < 100000:  # row count judgment.
    # Single thread implementation
    xy = df[[x, y]].values
    (min_x, min_y, max_x, max_y) = poly.bounds
    mask = []
    for p in xy:
        bol = min_x <= p[0] <= max_x and min_y <= p[1] <= max_y and point_in_poly(p, poly) is True
        mask.append(bol)
else:
    # Multi process implementation
    if njobs < 0:
        njobs = cpu_count()
    xy = df[[x, y]].values
    f = partial(point_in_poly, poly=poly)
    with Pool(njobs) as pool:
        mask = list(pool.map(f, xy))
return df.iloc[mask]
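
The bounding-box pre-check used in the single-threaded branch can be tried in isolation. The sketch below is self-contained and does not use CytoPy or its `point_in_poly`. Instead it substitutes a plain ray-casting test (`point_in_polygon`, written here for illustration) and works on lists of tuples rather than a DataFrame.

```python
def point_in_polygon(point, vertices):
    """Ray-casting test: True if `point` lies inside the polygon `vertices`."""
    x, y = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Count edges that a horizontal ray from the point crosses.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def inside_mask(points, vertices):
    """Boolean mask for `points`, rejecting cheaply by bounding box first."""
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    return [
        min_x <= px <= max_x and min_y <= py <= max_y and point_in_polygon((px, py), vertices)
        for px, py in points
    ]


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(inside_mask([(2, 2), (5, 2), (-1, -1)], square))  # [True, False, False]
```

Points outside the bounding box never reach the more expensive polygon test, which is why the pre-check pays off when most events fall far from the gate.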

My configuration:
python: 3.8.7
Memory: 32g
CPU: i7-1165g7 (4 cores and 8 threads)

Update docker documentation

Improve docker documentation to explain how to start the CytoPy image with a persistent volume and connect to a MongoDB server.

Install error

This seems like a great package (finally a full cytometry package in Python!).

However, upon installation, both from the GitHub repository and following the instructions via the API, I get the following conflict error between Keras and TensorFlow:

error: Keras-Preprocessing 1.1.0 is installed but keras-preprocessing<1.2,>=1.1.1 is required by {'tensorflow'}

Any idea on how to resolve this?

Improve EllipseGate

Add the following functionality to EllipseGate:

Box-Cox transformation: first min-max scale, then apply a Box-Cox transform to force the data to be "normal", then fit. The Box-Cox transform and the scaling must both be inverted prior to storing coords
Student's t-distribution: add the option to perform robust Student's t-distribution mixture modelling
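
The scale, transform, and invert round trip described in the Box-Cox item can be sketched in plain Python. The mixture-model fit itself is omitted, the lambda value is chosen arbitrarily rather than estimated, and the small `eps` offset is an assumption added here because Box-Cox needs strictly positive inputs while min-max scaling produces an exact zero. The input is assumed to contain at least two distinct values.

```python
import math


def min_max_scale(values, eps=1e-6):
    """Scale to [eps, 1 + eps]; the offset keeps inputs strictly positive."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) + eps for v in values], (lo, hi, eps)


def min_max_inverse(scaled, params):
    """Undo `min_max_scale` using the stored (lo, hi, eps)."""
    lo, hi, eps = params
    return [(s - eps) * (hi - lo) + lo for s in scaled]


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def box_cox_inverse(values, lam):
    """Inverse of `box_cox` for the same lambda."""
    if lam == 0:
        return [math.exp(v) for v in values]
    return [(lam * v + 1.0) ** (1.0 / lam) for v in values]


raw = [10.0, 250.0, 1200.0, 40000.0]
scaled, params = min_max_scale(raw)
transformed = box_cox(scaled, lam=0.5)  # lambda chosen arbitrarily here
# Gate coordinates fitted in the transformed space would be mapped back like this:
restored = min_max_inverse(box_cox_inverse(transformed, lam=0.5), params)
print([round(v, 3) for v in restored])
```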

read in csv (or pandas DataFrame)

It would be really helpful if a pandas DataFrame or a .csv file could be read in.

Pip backtracking - pip is looking at multiple versions of...

First of all, great job!

I have been trying to install CytoPy on MacOS and this is the first time I have encountered this problem. Yesterday I spent over 30 minutes without success with pipenv (which shows no logs) with the CPU maxed out.

Today I repeated the process with pip to see the logs, and it seems the issue is related to pip backtracking. For every dependency, pip shows "pip is looking at multiple versions of...", then downloads dozens of past versions of each Python module and takes forever to decide which one to pick (CPU maxed at 99%).

Any ideas on how to fix this? I checked the documentation but there is nothing related to this. I never had this issue before.

Thanks in advance.

Error in creating ref sample

Hi Ross, hope you are doing well.

Small issue in creating ref samples in your tutorial 03.

When I run:

create_ref_sample(experiment=cells, sample_size=5000, root_population="root", new_file_name="Training Data")

I get
100 features = [x for x in data.columns if x != "sample_id"]
--> 101 new_filegroup = FileGroup(primary_id=new_file_name)
102 new_filegroup.data_directory = experiment.data_directory
103 new_filegroup.init_new_file(data=data[features].values,

/usr/local/lib/python3.8/dist-packages/cytopy/data/fcs.py in __init__(self, *args, **kwargs)
218 else:
219 if any([x is None for x in [data, channels, markers]]):
--> 220 raise ValueError("New instance of FileGroup requires that data, channels, and markers "
221 "be provided to the constructor")
222 self.save()

ValueError: New instance of FileGroup requires that data, channels, and markers be provided to the constructor

You may have changed the way a FileGroup is instantiated, right?

Thanks,

Thomas

Unable to install because of KDEpy version conflict

pip install cytopy gives this error:
The conflict is caused by:
cytopy 2.0.1 depends on KDEpy==1.0.10
cytopy 2.0 depends on KDEpy==1.0.10

pip install kdepy==1.0.10 gives this error:
ERROR: Could not find a version that satisfies the requirement kdepy==1.0.10 (from versions: 0.1, 0.2, 0.3, 0.4, 0.5, 0.5.1, 0.5.2, 0.5.3, 0.5.4, 0.5.5, 0.5.6, 0.6, 0.6.9, 0.6.10, 0.6.11, 1.0.2, 1.0.11, 1.1.0)
ERROR: No matching distribution found for kdepy==1.0.10

Installing KDEpy 1.0.11 or 1.1.0 does not work.
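
The version list in the error message confirms that the pinned release does not appear among the available versions. A small sketch of that check is given here, with hypothetical helpers `parse_version` and `nearest_at_or_above`. It only handles plain dotted numeric versions. Because cytopy pins KDEpy with `==`, installing a neighbouring release by hand cannot satisfy the pin, so the pin itself would have to change.

```python
def parse_version(text):
    """Turn '1.0.11' into (1, 0, 11) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def nearest_at_or_above(pinned, available):
    """Return the lowest available version that is >= the pinned one, or None."""
    target = parse_version(pinned)
    candidates = [v for v in available if parse_version(v) >= target]
    return min(candidates, key=parse_version) if candidates else None


# Versions copied from the pip error message above.
available = [
    "0.1", "0.2", "0.3", "0.4", "0.5", "0.5.1", "0.5.2", "0.5.3", "0.5.4",
    "0.5.5", "0.5.6", "0.6", "0.6.9", "0.6.10", "0.6.11", "1.0.2", "1.0.11", "1.1.0",
]

print("1.0.10" in available)  # False
print(nearest_at_or_above("1.0.10", available))  # 1.0.11
```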

Numba issue in the CytoPy

I tried the conda installation steps from the website as well as the Docker image, and I encounter the same numba error in both:

numba.core.errors.LoweringError: Failed in nopython mode pipeline (step: nopython mode backend) Storing i64 to ptr of i32 ('dim'). FE type int32

The command I run:
singularity exec cytopy.sif python -c "from cytopy.data.gating_strategy import GatingStrategy, ThresholdGate, PolygonGate, EllipseGate"

I believe the numba version in both places is 0.58.1.

Any idea on what's going on here? Thanks
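
To confirm the installed versions rather than guess, the standard library's `importlib.metadata` (Python 3.8+) can report them from inside each environment. The package list below is only a guess at what is relevant to this error, and `installed_version` is a hypothetical helper.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("numba", "llvmlite", "numpy"):
        print(name, installed_version(name))
```

Saved as a script, this could be run in the container the same way as the failing command, for example through `singularity exec cytopy.sif python`.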

Issue at installation stage

Hi Ross,
Thanks for all your work. I tried today to install the package.
I followed the steps provided in the docs (using conda and installing numpy before cytopy).
I get an error related to flowutils:
/CytoPy/lib/python3.8/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it with "
^
flowutils/logicle_c_ext/_logicle.c:120:5: error: implicit declaration of function 'hyperlog_inverse' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
hyperlog_inverse(t, w, m, a, xc, n);
^

When I try to install flowutils alone, it works fine.
Do you have any idea what could be going wrong? A colleague of mine in the lab had the same error.
Thanks for your help
Thomas