
astir's People

Contributors

afrendeiro, jgu13, jinyu-hou, kieranrcampbell, meyerbender, michael-geuenich, sunyunlee


astir's Issues

yaml marker matrix format

yaml.scanner.ScannerError: while scanning for the next token
found character '\t' that cannot start any token
in "/work/shah/pourmalm/projects/astir/200624_ov_9patients_pre_post/marker_mat.yml", line 2, column 1

Below are the contents of marker_mat.yml:

cell_types:
Tumor:
- panCK
T cell:
- CD3
CD8+ T cell:
- CD3
- CD8
Treg cell:
- CD3
- Foxp3
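The scanner error points at a tab in the indentation of line 2, column 1. YAML permits only spaces for indentation, not tabs. The following is a minimal stdlib-only sketch that locates tabs in leading whitespace and replaces them with spaces before the file reaches a YAML loader. The helper names `tab_indents` and `detab` are hypothetical and not part of Astir.

```python
def _split_indent(line):
    """Split a line into (leading whitespace, rest of line)."""
    rest = line.lstrip(" \t")
    return line[: len(line) - len(rest)], rest


def tab_indents(text):
    """Return 1-based (line, column) pairs for every tab inside leading whitespace."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        indent, _ = _split_indent(line)
        hits.extend((lineno, col) for col, ch in enumerate(indent, start=1) if ch == "\t")
    return hits


def detab(text, spaces_per_tab=2):
    """Replace tabs in leading whitespace with spaces; the rest of each line is untouched."""
    fixed = []
    for line in text.splitlines():
        indent, rest = _split_indent(line)
        fixed.append(indent.replace("\t", " " * spaces_per_tab) + rest)
    return "\n".join(fixed) + ("\n" if text.endswith("\n") else "")


# Hypothetical tab-indented fragment resembling marker_mat.yml.
sample = "cell_types:\n\tTumor:\n\t- panCK\n"
print(tab_indents(sample))  # [(2, 1), (3, 1)]
print(detab(sample))
```

You can then pass the cleaned text to a YAML loader such as PyYAML's `yaml.safe_load`.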

ValueError: too many values to unpack (expected 3) from ast.predict_celltypes()

Hi,

I am trying to predict cell types by calling ast.predict_celltypes(dset: pd.DataFrame). I first wanted to test the function on the publicly available dataset basel_22k_subset.h5ad, which is the one loaded in the astir_tutorial Jupyter notebook. I then got the error from line 297, `_, exprs_X, _ = new_dset[:]`, in celltype.predict(). Here is how to reproduce the error:

I trained a CellTypeModel on the basel_22k_subset.h5ad dataset, using the following initial parameters:

N = ast.get_type_dataset().get_exprs_df().shape[0]
batch_size = int(N / 100)
max_epochs = 1000
learning_rate = 2e-3
initial_epochs = 3

Then I saved the trained model by calling ast.save_model('trained_model.hdf5') and loaded it by calling ast.load_model('trained_model.hdf5').

To convert the basel_22k_subset.h5ad dataset into a DataFrame, I did:

import anndata
ad = anndata.read_h5ad("basel_22k_subset.h5ad")
df = ad.to_df()

The DataFrame loads correctly. When I then called ast.predict_celltypes(df), the error was raised.
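The ValueError itself is generic Python behavior. Unpacking into three names fails whenever the right-hand side yields more than three items. The sketch below uses a hypothetical `ToyDataset` class, not Astir's dataset. Its four-item return value is an assumption made only to reproduce the message. The sketch also shows that starred unpacking tolerates extra trailing items.

```python
class ToyDataset:
    """Hypothetical stand-in whose slice returns four items instead of three."""

    def __getitem__(self, idx):
        return ("batch", "exprs_X", "design", "extra")


dset = ToyDataset()
try:
    _, exprs_X, _ = dset[:]  # same unpacking pattern as the failing line
except ValueError as err:
    print("ValueError:", err)

# Starred unpacking absorbs any extra trailing items.
_, exprs_X, *_ = dset[:]
print(exprs_X)  # exprs_X
```

This does not confirm that `new_dset[:]` returns more than three items for a DataFrame passed to `predict_celltypes`. Printing `len(new_dset[:])` at that line would check it.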

Any insight would be appreciated.

Installation failed on python 3.6

When building Astir, I'm running into the following issue on 3.6. I see in your documentation that 3.7 is suggested, but it also says that Astir works for 3.X. Is this the case, or is 3.7 or greater required?

(env) path_to_astir/astir$ pip3 install astir
Collecting astir
Downloading https://files.pythonhosted.org/packages/f9/ad/f26a76ad385e13be8e5369af87bec3c64743de7e9830a6ef9d71024cecec/astir-0.1.0-py3-none-any.whl (247kB)
100% |████████████████████████████████| 256kB 3.7MB/s
Collecting torch (from astir)
Using cached https://files.pythonhosted.org/packages/b6/01/fffb29c3892d80801bc6400e07c90b8fa6cd5f3db5ce9d7ca8068e14e0b2/torch-1.7.1-cp36-none-macosx_10_9_x86_64.whl
Collecting sklearn (from astir)
Using cached https://files.pythonhosted.org/packages/1e/7a/dbb3be0ce9bd5c8b7e3d87328e79063f8b263b2b1bfa4774cb1147bfcd3f/sklearn-0.0.tar.gz
Collecting h5py (from astir)
Using cached https://files.pythonhosted.org/packages/34/8b/24796b39111b4a235051003986b1c6d43e8b9699ec5936b642231c101c40/h5py-3.1.0-cp36-cp36m-macosx_10_9_x86_64.whl
Collecting sphinx-rtd-theme (from astir)
Downloading https://files.pythonhosted.org/packages/76/81/d5af3a50a45ee4311ac2dac5b599d69f68388401c7a4ca902e0e450a9f94/sphinx_rtd_theme-0.5.1-py2.py3-none-any.whl (2.8MB)
100% |████████████████████████████████| 2.8MB 594kB/s
Collecting tqdm (from astir)
Using cached https://files.pythonhosted.org/packages/d9/13/f3f815bb73804a8af9cfbb6f084821c037109108885f46131045e8cf044e/tqdm-4.57.0-py2.py3-none-any.whl
Collecting autodocsumm (from astir)
Downloading https://files.pythonhosted.org/packages/d4/be/f43ec3bea9d525addc1345a52c7d077e95d90a15010a023921923ba7fb24/autodocsumm-0.2.2.tar.gz (43kB)
100% |████████████████████████████████| 51kB 9.5MB/s
Collecting nbformat (from astir)
Using cached https://files.pythonhosted.org/packages/13/1d/59cbc5a6b627ba3b4c0ec5ccc82a9002e58b324e2620a4929b81f1f8d309/nbformat-5.1.2-py3-none-any.whl
Collecting pyyaml (from astir)
Using cached https://files.pythonhosted.org/packages/ef/e9/d62912119552b157ed66dc8297ae6ac08629d7d5c497d4faa26b0c3a4efe/PyYAML-5.4.1-cp36-cp36m-macosx_10_9_x86_64.whl
Collecting anndata (from astir)
Using cached https://files.pythonhosted.org/packages/81/b1/743cc79f89d9db6dccbfb7e6000795acb218a6c6320b7a2337cad99bd047/anndata-0.7.5-py3-none-any.whl
Collecting nbconvert (from astir)
Using cached https://files.pythonhosted.org/packages/13/2f/acbe7006548f3914456ee47f97a2033b1b2f3daf921b12ac94105d87c163/nbconvert-6.0.7-py3-none-any.whl
Collecting argparse (from astir)
Using cached https://files.pythonhosted.org/packages/f2/94/3af39d34be01a24a6e65433d19e107099374224905f1e0cc6bbe1fd22a2f/argparse-1.4.0-py2.py3-none-any.whl
Collecting fastcluster (from astir)
Using cached https://files.pythonhosted.org/packages/56/5c/e227399348a5157698bd43962be10d9abe6be5d2236cd25d9710d0522982/fastcluster-1.1.28.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/sg/3vdk73p96sq5rdljs50kqmn00000gp/T/pip-build-z9ehuqts/fastcluster/setup.py", line 12, in <module>
import numpy
ModuleNotFoundError: No module named 'numpy'
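The traceback shows fastcluster's `setup.py` importing `numpy` at build time. numpy therefore has to be installed before astir pulls in fastcluster, for example by running `python -m pip install numpy` first. Below is a minimal pre-install check. The `(3, 7)` minimum reflects the version the documentation suggests and is not a confirmed hard requirement. The helper name is hypothetical.

```python
import importlib.util
import sys


def preinstall_problems(version_info=None, min_version=(3, 7)):
    """List likely blockers before running `pip install astir`."""
    if version_info is None:
        version_info = sys.version_info
    problems = []
    major, minor = version_info[0], version_info[1]
    if (major, minor) < tuple(min_version):
        problems.append(
            f"Python {major}.{minor} is older than the suggested "
            f"{min_version[0]}.{min_version[1]}"
        )
    # fastcluster's setup.py runs `import numpy`, so numpy must already be importable.
    if importlib.util.find_spec("numpy") is None:
        problems.append("numpy is not installed; install it before astir")
    return problems


for problem in preinstall_problems():
    print("WARNING:", problem)
```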

What is alpha?

log_alpha = F.log_softmax(self._variables["alpha_logits"], dim=0)

Hi, I was trying to understand your ELBO construction, but I don't quite understand what alpha is. A short explanation would be appreciated. Also, could you write down the equation you are using for the ELBO?

Thanks!
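The quoted line normalizes `alpha_logits` with a log-softmax. As a result, `exp(log_alpha)` is a vector of non-negative weights that sums to 1 along `dim=0`. Reading alpha as per-cell-type mixing proportions is an interpretation; the snippet alone does not confirm it. The pure-Python sketch below reproduces the same operation, using the usual max-subtraction trick for numerical stability.

```python
import math


def log_softmax(logits):
    """log(softmax(logits)), computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


alpha_logits = [0.0, 1.0, 2.0]  # hypothetical logits for three cell types
log_alpha = log_softmax(alpha_logits)
alpha = [math.exp(v) for v in log_alpha]  # non-negative and sums to 1
print(alpha, sum(alpha))
```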

How to specify OR conditions in YAML file

Hi,

If I have a cell type that must express marker A along with any combination of markers B, C, and D (but not any other markers, like E, F, or G), is it possible to specify that as a single rule in the YAML file? If so, how do you do it? Do I need to specify all possible combinations like:

Celltype A1:
  - AB
Celltype A2:
  - AC
Celltype A3:
  - AD
Celltype A4:
  - ABC
Celltype A5:
  - ACD
Celltype A6:
  - ABCD

Or is there a shorter form to encompass all these combinations in one rule? Thanks!

Caleb
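Suppose "AB" means "markers A and B". The requested rules are then A plus every non-empty subset of {B, C, D}, which gives 2^3 - 1 = 7 cell types. The list in the question has six; A + B + D is missing. Whether Astir offers a shorter OR syntax remains the open question. This sketch, which uses hypothetical helper names, only generates the explicit enumeration as YAML text.

```python
from itertools import combinations


def combination_rules(required, optional, prefix="Celltype A"):
    """One rule per non-empty subset of `optional`, each also containing `required`."""
    rules = {}
    index = 1
    for size in range(1, len(optional) + 1):
        for combo in combinations(optional, size):
            rules[f"{prefix}{index}"] = [required, *combo]
            index += 1
    return rules


def to_yaml(rules):
    """Render the rules in the same layout as the marker files in these issues."""
    lines = ["cell_types:"]
    for name, markers in rules.items():
        lines.append(f"  {name}:")
        lines.extend(f"  - {marker}" for marker in markers)
    return "\n".join(lines) + "\n"


rules = combination_rules("A", ["B", "C", "D"])
print(len(rules))  # 7
print(to_yaml(rules))
```

The enumeration only lists which markers each type expresses. It does not encode the exclusion of E, F, and G.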

Is it possible to use a pre-trained Astir model to infer cell types on new data?

Hi,

From the Astir API, it appears that the model must be trained on each new batch of data in order to infer its cell types. Is it possible, with the current API, to train a model to convergence on training data and then use that model later to infer cell types in unseen data (provided the intensity normalization strategy is the same)? This would be helpful for projects whose datasets are too large to train on in one go (we could randomly select a subset for training and infer the rest), and it would make previously trained models more reusable.

Thanks!

Caleb
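The train-on-a-subset, infer-the-rest workflow described above starts with a reproducible random split of cells. Below is a stdlib sketch. It assumes cell IDs are unique. `train_fraction=0.2` and `seed=0` are arbitrary illustrative defaults, and the helper name is hypothetical.

```python
import random


def split_cells(cell_ids, train_fraction=0.2, seed=0):
    """Randomly pick a training subset; everything else is held out for inference."""
    cell_ids = list(cell_ids)
    if not cell_ids:
        return [], []
    n_train = max(1, int(len(cell_ids) * train_fraction))
    chosen = set(random.Random(seed).sample(cell_ids, n_train))
    train = [c for c in cell_ids if c in chosen]  # keeps the original order
    held_out = [c for c in cell_ids if c not in chosen]
    return train, held_out


train, held_out = split_cells([f"cell_{i}" for i in range(10)])
print(len(train), len(held_out))  # 2 8
```

As the question notes, both parts would need the same intensity normalization.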

Negative expression

Is it possible to specify negatively expressed (absent) markers for cell phenotyping? If not, do you have any ideas on how the expression data could be inverted to create a synthetic negative-expression channel?
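One possible heuristic for a synthetic "negative" channel is to rescale the marker to [0, 1] and take one minus the result, so that low expression becomes high. This is an assumption for illustration, not an Astir feature, and the helper name is hypothetical. Because min-max scaling is sensitive to outliers, it would typically be applied after winsorization.

```python
def inverted_channel(values):
    """Min-max scale `values` to [0, 1] and flip: 1 for the lowest value, 0 for the highest."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant marker: no contrast to invert
    return [1.0 - (v - lo) / (hi - lo) for v in values]


cd8 = [0.0, 1.0, 3.0]  # hypothetical per-cell intensities
print(inverted_channel(cd8))
```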

list index out of range error in SCDataset.get_mu_init()

To reproduce this error:

YAML file:

cell_types:
  A:
  - marker1
  - marker2
  B:
  - marker1
  - marker3
  - maker 4

All markers are present in expression.csv.

The "list index out of range" error comes from this line, and "indices_to_use" is empty.

Any idea why it gives me an error? Thanks
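A quick sanity check is to confirm that every marker named in the YAML exactly matches a column header in expression.csv. In the file above, "maker 4" would only match a column literally named "maker 4". Whether such a mismatch is what leaves `indices_to_use` empty is an assumption. Below is a stdlib sketch with hypothetical helper names and a hypothetical header row.

```python
import csv
import io


def csv_columns(handle):
    """Return the header row of a CSV file-like object."""
    return next(csv.reader(handle))


def missing_markers(cell_types, columns):
    """(cell type, marker) pairs whose marker is not an exact column name."""
    available = set(columns)
    return [
        (cell_type, marker)
        for cell_type, markers in cell_types.items()
        for marker in markers
        if marker not in available
    ]


# Hypothetical header; with a real file use csv_columns(open("expression.csv", newline="")).
header = csv_columns(io.StringIO("marker1,marker2,marker3,marker4\n0.1,0.2,0.3,0.4\n"))
cell_types = {"A": ["marker1", "marker2"], "B": ["marker1", "marker3", "maker 4"]}
print(missing_markers(cell_types, header))  # [('B', 'maker 4')]
```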

Installation failed on python 3.8

Hi @kieranrcampbell, I am struggling to install astir using Python 3.8. I have tried "pip install astir", "python setup.py install", and several other fixes, but I keep hitting a wall and cannot install the package AT ALL. There seems to be an error when pip tries to build fastcluster, as you can see below:
Building wheels for collected packages: fastcluster
Building wheel for fastcluster (pyproject.toml) ... error
ERROR: Command errored out with exit status 1:
command: /Users/joaoluizsfilho/miniconda3/bin/python /Users/joaoluizsfilho/miniconda3/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /var/folders/rl/ypn8k8pn6sn46890b2gh46040000gn/T/tmpvaburexf
cwd: /private/var/folders/rl/ypn8k8pn6sn46890b2gh46040000gn/T/pip-install-5_q4u350/fastcluster_a41fe586050c4e7e83c8ecf111b6e457
Complete output (107 lines):
Fastcluster version: 1.2.4
Python version: 3.8.5 (default, Sep 4 2020, 02:22:02)
[Clang 10.0.0 ]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-3.8
copying fastcluster.py -> build/lib.macosx-10.9-x86_64-3.8
running build_ext
building '_fastcluster' extension
creating build/temp.macosx-10.9-x86_64-3.8
creating build/temp.macosx-10.9-x86_64-3.8/src
x86_64-apple-darwin13.4.0-clang -fno-strict-aliasing -Wsign-compare -Wunreachable-code -DNDEBUG -fwrapv -O3 -Wall -Wstrict-prototypes -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O3 -pipe -fdebug-prefix-map=${SRC_DIR}=/usr/local/src/conda/${PKG_NAME}-${PKG_VERSION} -fdebug-prefix-map=/Users/joaoluizsfilho/miniconda3=/usr/local/src/conda-prefix -flto -Wl,-export_dynamic -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O3 -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -isystem /Users/joaoluizsfilho/miniconda3/include -D_FORTIFY_SOURCE=2 -mmacosx-version-min=10.9 -isystem /Users/joaoluizsfilho/miniconda3/include -I/private/var/folders/rl/ypn8k8pn6sn46890b2gh46040000gn/T/pip-build-env-34n2edic/overlay/lib/python3.8/site-packages/numpy/core/include -I/Users/joaoluizsfilho/miniconda3/include/python3.8 -c src/fastcluster_python.cpp -o build/temp.macosx-10.9-x86_64-3.8/src/fastcluster_python.o
clang-10: warning: -Wl,-export_dynamic: 'linker' input unused [-Wunused-command-line-argument]
In file included from src/fastcluster_python.cpp:28:
/Users/joaoluizsfilho/miniconda3/include/python3.8/Python.h:14:2: error: "Something's broken. UCHAR_MAX should be defined in limits.h."
#error "Something's broken. UCHAR_MAX should be defined in limits.h."
^
/Users/joaoluizsfilho/miniconda3/include/python3.8/Python.h:18:2: error: "Python's source code assumes C's unsigned char is an 8-bit type."
#error "Python's source code assumes C's unsigned char is an 8-bit type."
^
/Users/joaoluizsfilho/miniconda3/include/python3.8/Python.h:27:5: error: "Python.h requires that stdio.h define NULL."
#error "Python.h requires that stdio.h define NULL."
^
In file included from src/fastcluster_python.cpp:28:
In file included from /Users/joaoluizsfilho/miniconda3/include/python3.8/Python.h:30:
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/string.h:73:64: error: use of undeclared identifier 'strchr'
char* __libcpp_strchr(const char* __s, int __c) {return (char*)strchr(__s, __c);}
^
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/string.h:80:75: error: use of undeclared identifier 'strpbrk'
char* __libcpp_strpbrk(const char* __s1, const char* __s2) {return (char*)strpbrk(__s1, __s2);}
^
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/string.h:87:65: error: use of undeclared identifier 'strrchr'; did you mean 'strchr'?
char* __libcpp_strrchr(const char* __s, int __c) {return (char*)strrchr(__s, __c);}
^
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/string.h:75:13: note: 'strchr' declared here
const char* strchr(const char* __s, int __c) {return __libcpp_strchr(__s, __c);}
^
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/string.h:94:49: error: unknown type name 'size_t'
void* __libcpp_memchr(const void* __s, int __c, size_t __n) {return (void*)memchr(__s, __c, __n);}
^
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/string.h:96:46: error: unknown type name 'size_t'
const void* memchr(const void* __s, int __c, size_t __n) {return __libcpp_memchr(__s, __c, __n);}
^
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/string.h:98:46: error: unknown type name 'size_t'
void* memchr( void* __s, int __c, size_t __n) {return __libcpp_memchr(__s, __c, __n);}
^
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/string.h:101:74: error: use of undeclared identifier 'strstr'; did you mean 'strchr'?
char* __libcpp_strstr(const char* __s1, const char* __s2) {return (char*)strstr(__s1, __s2);}
^
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/string.h:77:13: note: 'strchr' declared here
char* strchr( char* __s, int __c) {return __libcpp_strchr(__s, __c);}
^
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/string.h:101:74: error: no matching function for call to 'strchr'
char* __libcpp_strstr(const char* __s1, const char* __s2) {return (char*)strstr(__s1, __s2);}
^
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/string.h:77:13: note: candidate disabled:
char* strchr( char* __s, int __c) {return __libcpp_strchr(__s, __c);}
^
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/string.h:101:81: error: cannot initialize a parameter of type 'char *' with an lvalue of type 'const char *'
char* __libcpp_strstr(const char* __s1, const char* __s2) {return (char*)strstr(__s1, __s2);}
^~~~
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/string.h:77:32: note: passing argument to parameter '__s' here
char* strchr( char* __s, int __c) {return __libcpp_strchr(__s, __c);}
^
In file included from src/fastcluster_python.cpp:28:
In file included from /Users/joaoluizsfilho/miniconda3/include/python3.8/Python.h:34:
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/stdlib.h:142:34: error: unknown type name 'ldiv_t'
inline _LIBCPP_INLINE_VISIBILITY ldiv_t div(long __x, long __y) _NOEXCEPT {
^
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/stdlib.h:143:12: error: no member named 'ldiv' in the global namespace
return ::ldiv(__x, __y);
~~^
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/stdlib.h:146:34: error: unknown type name 'lldiv_t'
inline _LIBCPP_INLINE_VISIBILITY lldiv_t div(long long __x,
^
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/stdlib.h:148:12: error: no member named 'lldiv' in the global namespace
return ::lldiv(__x, __y);
~~^
In file included from src/fastcluster_python.cpp:28:
In file included from /Users/joaoluizsfilho/miniconda3/include/python3.8/Python.h:63:
In file included from /Users/joaoluizsfilho/miniconda3/include/python3.8/pyport.h:212:
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/math.h:771:93: error: no member named 'acosf' in the global namespace; did you mean 'acos'?
inline _LIBCPP_INLINE_VISIBILITY float acos(float __lcpp_x) _NOEXCEPT {return ::acosf(__lcpp_x);}
~~^
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/math.h:771:46: note: 'acos' declared here
inline _LIBCPP_INLINE_VISIBILITY float acos(float __lcpp_x) _NOEXCEPT {return ::acosf(__lcpp_x);}
^
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/math.h:772:93: error: no member named 'acosl' in the global namespace; did you mean 'acos'?
inline _LIBCPP_INLINE_VISIBILITY long double acos(long double __lcpp_x) _NOEXCEPT {return ::acosl(__lcpp_x);}
~~^
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/math.h:771:46: note: 'acos' declared here
inline _LIBCPP_INLINE_VISIBILITY float acos(float __lcpp_x) _NOEXCEPT {return ::acosf(__lcpp_x);}
^
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/math.h:778:38: error: call to 'acos' is ambiguous
acos(_A1 __lcpp_x) _NOEXCEPT {return ::acos((double)__lcpp_x);}
^~~~~~
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/math.h:771:46: note: candidate function
inline _LIBCPP_INLINE_VISIBILITY float acos(float __lcpp_x) _NOEXCEPT {return ::acosf(__lcpp_x);}
^
/Users/joaoluizsfilho/miniconda3/bin/../include/c++/v1/math.h:772:46: note: candidate function
inline _LIBCPP_INLINE_VISIBILITY long double acos(long double __lcpp_x) _NOEXCEPT {return ::acosl(__lcpp_x);}
^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
error: command 'x86_64-apple-darwin13.4.0-clang' failed with exit status 1

ERROR: Failed building wheel for fastcluster
Failed to build fastcluster
ERROR: Could not build wheels for fastcluster, which is required to install pyproject.toml-based projects

I have tried several fixes to work around this error or to install fastcluster, but nothing has worked. Could you please help with this? I am trying to use astir as a step in a Squidpy/Scanpy pipeline, and it would be great if I could get this working.
Thank you!
Joao
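Errors such as "UCHAR_MAX should be defined in limits.h" often mean that the compiler cannot find the system C headers. The log shows the build using conda's `x86_64-apple-darwin13.4.0-clang`. As a hedged diagnostic, `sysconfig.get_config_var("CC")` reports the compiler recorded for this interpreter; a `CC` environment variable overrides it during builds. The name check below is a heuristic, not a definitive test, and the helper names are hypothetical.

```python
import sysconfig


def extension_compiler():
    """Compiler command recorded when this Python was built (None if unavailable)."""
    return sysconfig.get_config_var("CC")


def looks_like_conda_clang(cc):
    """Heuristic: conda's macOS toolchain names its compiler '<arch>-apple-darwin<ver>-clang'."""
    parts = (cc or "").split()
    if not parts:
        return False
    return "apple-darwin" in parts[0] and parts[0].endswith("clang")


cc = extension_compiler()
print(cc, looks_like_conda_clang(cc))
```

If the check returns True, installing a prebuilt fastcluster from conda-forge (`conda install -c conda-forge fastcluster`) before installing astir may avoid compiling it at all.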

Check dtype of input

If the input dtype is float32, it should be converted to float64 to be compatible with everything else.
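Below is a minimal NumPy sketch of the proposed check, using a hypothetical helper name. It upcasts float32 input to float64 and returns any other array unchanged. For a pandas DataFrame, `df.astype("float64")` performs the same conversion.

```python
import numpy as np


def ensure_float64(values):
    """Return `values` as an ndarray, upcasting float32 to float64 and leaving other dtypes alone."""
    arr = np.asarray(values)
    if arr.dtype == np.float32:
        return arr.astype(np.float64)
    return arr


x32 = np.ones((2, 3), dtype=np.float32)
print(ensure_float64(x32).dtype)  # float64
```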

Error training model when using GPU

Thank you for releasing this very useful tool. When I tried to train the cell type model following the tutorial, I got the following error:

[Screenshot of the error]

It looks like some tensors were assigned to the GPU and some were not. In astir.py, the detected device type (CPU or GPU) does not seem to be passed on to the newly created CellTypeModel objects; if I add 'self._device,' to line 157 of astir.py, the error goes away.

I hope that makes sense! Thanks again!

Caleb
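The fix described above follows the usual pattern: detect the device once, then pass it to every model the parent object creates. Below is a minimal sketch with hypothetical classes (`ToyAstir`, `ToyCellTypeModel`). It does not reproduce the real Astir constructors or their argument order. torch is imported only if it is available.

```python
def detect_device():
    """'cuda' if PyTorch sees a GPU, otherwise 'cpu' (also 'cpu' if torch is absent)."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


class ToyCellTypeModel:
    def __init__(self, data, device):
        self.data = data
        self.device = device  # every tensor this model creates should live here


class ToyAstir:
    def __init__(self, data):
        self.data = data
        self._device = detect_device()

    def make_type_model(self):
        # Forward the parent's device so parent and child never disagree.
        return ToyCellTypeModel(self.data, self._device)


model = ToyAstir(data=[1, 2, 3]).make_type_model()
print(model.device)
```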

Does Astir work linearly through the phenotype YAML file?

Does Astir work linearly through the phenotype YAML file (i.e., does it assign the first phenotype on the list, then apply the second phenotype to the cells that remain, and so on)? Or does it apply all the phenotypes from the YAML file to the entire cell population at once?

Thank you!

Caleb

Question about loss values

Hello,

I was going through the getting_started notebook with our IMC data. When I call fit_type() to fit cell types and plot the loss values, the test data provided in the notebook gives positive loss values that decrease linearly.

However, with our data, I get negative loss values on the order of -100k.

Looking at the implementation, the model seems to use the negative of the ELBO as the loss, so I believe it makes sense for the loss value to be negative. But is this behavior expected based on your experience with other datasets? As preprocessing, I applied an arcsinh transformation and winsorization to the data, as suggested by the documentation.

Thank you for creating this framework and making the code available!

Kind regards,
Nathalia
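Negative loss values are not by themselves a sign of a problem. For continuous data, the likelihood terms in an ELBO are log densities, and a density can exceed 1. The ELBO can therefore be positive, and its negative (the loss) can be negative. If these terms are summed over cells and markers, the magnitude also grows with dataset size. Below is a small illustration with a Gaussian log density. The values are arbitrary and do not come from Astir.

```python
import math


def normal_log_pdf(x, mean, sd):
    """Log density of a univariate Normal(mean, sd) evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2 * math.pi)


# A narrow distribution has density > 1 near its mean, so its log density is positive...
narrow = normal_log_pdf(0.0, 0.0, 0.01)
# ...while a wide one does not.
wide = normal_log_pdf(0.0, 0.0, 1.0)
print(narrow > 0, wide > 0)  # True False
```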

Astir gives me an error message when the single-cell data is z-score normalized.

I get the error message shown below when the single-cell data is z-score normalized. Do you have any idea how to fix it? Also, what normalization method do you recommend for single-cell data derived from multiplexed imaging (such as CODEX) before running it through Astir? Thank you!

z-score calculation:

  1. from scipy import stats
     df_after = df_after.apply(stats.zscore)

  2. for c in df_after.columns:
         df_after[c] = (df_after[c] - df_after[c].mean()) / df_after[c].std()

[Screenshot of the error message, 2022-03-03]
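Z-scoring centers each marker at 0, so part of the data becomes negative. The arcsinh transformation plus winsorization mentioned in the loss-values issue keeps non-negative intensities non-negative instead. Whether negativity is what triggers this error cannot be determined without the screenshot. Note also that `scipy.stats.zscore` defaults to `ddof=0`, while pandas `.std()` defaults to `ddof=1`, so the two calculations above differ slightly. Below is a NumPy sketch of the arcsinh route. The cofactor of 5 and the 99th-percentile cap are illustrative assumptions, not Astir defaults, and the helper name is hypothetical.

```python
import numpy as np


def arcsinh_winsorize(x, cofactor=5.0, upper_quantile=0.99):
    """arcsinh(x / cofactor), then cap each column (marker) at its upper quantile."""
    y = np.arcsinh(np.asarray(x, dtype=np.float64) / cofactor)
    caps = np.quantile(y, upper_quantile, axis=0)  # one cap per marker column
    return np.minimum(y, caps)  # broadcasts the caps across rows (cells)


rng = np.random.default_rng(0)
raw = rng.gamma(shape=2.0, scale=10.0, size=(1000, 4))  # hypothetical cells x markers
out = arcsinh_winsorize(raw)
print(out.min() >= 0, out.shape)  # True (1000, 4)
```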
