
meshlearn

AI model to predict computationally expensive local, vertex-wise descriptors like the local gyrification index from the local mesh neighborhood.

This includes a Python package and API (meshlearn) and two command line applications for training and predicting lGI, meshlearn_lgi_train and meshlearn_lgi_predict. End users are most likely interested only in the meshlearn_lgi_predict command, in combination with one of our pre-trained models.

Fig. 0 Left: Brain surface, faces drawn. Right: Visualization of predicted lGI per-vertex data on the mesh, using the viridis colormap.

About

Predict per-vertex descriptors like the local gyrification index (lGI) or other local descriptors for a mesh.

  • The local gyrification index is a brain morphometry descriptor used in computational neuroimaging. It describes the folding of the human cortex at a specific point, based on a mesh reconstruction of the cortical surface from a magnetic resonance image (MRI). See Schaer et al. 2008 for details.
  • The geodesic circle radius and related descriptors are described in my cpp_geodesics repo and in the references listed there. Ignore the global descriptors (like mean geodesic distance) in there.


Fig. 1 A mesh representing the human cortex, edges drawn.


Fig. 2 Close-up view of the triangular mesh, showing the vertices, edges and faces. Each vertex neighborhood (a training example for the ML model) describes the mesh structure within a sphere around the respective vertex. Vertex neighborhoods are computed from the mesh during pre-processing.
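The neighborhood extraction described above can be sketched with a k-d tree ball query. This is an illustrative snippet only, not the package's actual pre-processing code; the function name and radius are made up:

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical sketch: collect the coordinates of all vertices that fall
# within a fixed radius around a query vertex, as done during pre-processing.
def vertex_neighborhood(vertices, vertex_index, radius=10.0):
    """Return the coordinates of all vertices within `radius` of a vertex."""
    tree = cKDTree(vertices)
    neighbor_indices = tree.query_ball_point(vertices[vertex_index], r=radius)
    return vertices[neighbor_indices]

# Toy example: 4 vertices on a line, neighborhood of vertex 0 with radius 1.5.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                  [2.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
hood = vertex_neighborhood(verts, 0, radius=1.5)
print(hood.shape)  # (2, 3): the query vertex itself and its neighbor at x=1
```

In the real pre-processing, the neighborhood coordinates (and normals) become one feature row per vertex.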

This implementation uses Python, with tensorflow and lightgbm for the machine learning part. Mesh pre-processing is done with pymesh and igl.

Why

Computing lGI and some other mesh properties for brain surface meshes is slow, and the computation sometimes fails even for good-quality meshes, leading to the exclusion of the affected MRI scans. The lGI computation also requires Matlab, which is inconvenient and, due to the excessive licensing costs, prevents computing lGI on high performance computing clusters, which would otherwise be a way to deal with the long computation times. This project aims to provide a trained model that predicts the lGI for a vertex based on its mesh neighborhood. The aim is a faster and more robust method to compute lGI, based on free software.

Usage

Predicting using pre-trained models

Please keep in mind that meshlearn is in the alpha stage, use in production is not yet recommended. You are free to play around with it though!

Currently meshlearn comes with one pre-trained model for predicting the local gyrification index (lGI, Schaer et al.) for full-resolution, native space FreeSurfer meshes. These meshes are (a part of) the result of running FreeSurfer's recon-all pipeline on structural MRI scans of the human brain.

The model is a gradient-boosting machine as implemented in lightgbm, and it was trained on a diverse training set of about 60 GB of pre-processed mesh data, obtained from the publicly available, multi-site ABIDE I dataset. The model can be found at tests/test_data/models/lgbm_lgi/, and consists of the model file (ml_model.pkl, the pickled lightgbm model) and a metadata file (ml_model.json) that contains the pre-processing settings used to train the model. These settings must also be used when predicting for a new mesh.
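Loading such a model/metadata pair can be sketched as follows. This is an illustrative snippet, not the package's actual API; the function name is hypothetical, and only the file layout described above is assumed:

```python
import json
import os
import pickle

# Hedged sketch: load a pickled model together with its JSON metadata
# side-car (ml_model.pkl + ml_model.json), as laid out in the model
# directory described above. The function name is made up for illustration.
def load_model_and_metadata(model_dir):
    """Load the pickled model and the pre-processing metadata from a directory."""
    with open(os.path.join(model_dir, "ml_model.pkl"), "rb") as f:
        model = pickle.load(f)
    with open(os.path.join(model_dir, "ml_model.json"), "r") as f:
        metadata = json.load(f)
    # The metadata holds the pre-processing settings used during training;
    # they must be re-used when building features for a new mesh.
    return model, metadata
```

For the supplied model, `model_dir` would be `tests/test_data/models/lgbm_lgi/`.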

The meshlearn_lgi_predict command line application that is part of meshlearn can be used to predict lGI for your own FreeSurfer meshes using the supplied model or alternative models. After installation of meshlearn, run meshlearn_lgi_predict --help for available options. (For now, you will need to follow the installation instructions in the development section below, as there is no official release yet.)

Information on model performance can be found in the mentioned ml_model.json file, under the key model_info.evaluation. The model has not been fine-tuned yet.

Training your own model

If you want to train your own model instead of using one of our models, you will need suitable training data, Matlab and a powerful multi-core machine with 128+ GB of RAM. Please see the development instructions for more details.

meshlearn's People

Contributors: dfsp-spirit

meshlearn's Issues

Data Loader: add variables related to total brain size to feature columns

I see several options:

  • parse TBV (total brain volume) from FreeSurfer metadata files. Precise, but ugly, since we could no longer predict based on the mesh alone
  • add the min and max value of the x,y,z coords (or the x,y,z size) of the mesh.

I would suggest we go with the 2nd approach and compute feature importances to see whether it helps.
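The second approach could be sketched as below. This is an illustrative snippet with a made-up function name, not the data loader's actual code:

```python
import numpy as np

# Sketch of the second option: derive bounding-box features from the mesh
# coordinates alone, so prediction still needs nothing but the mesh itself.
def bounding_box_features(vertices):
    """Return min, max and extent along x, y, z as a flat feature vector."""
    v = np.asarray(vertices)
    mins, maxs = v.min(axis=0), v.max(axis=0)
    return np.concatenate([mins, maxs, maxs - mins])

verts = np.array([[0., 0., 0.], [2., 1., 3.]])
feats = bounding_box_features(verts)
print(feats)  # [0. 0. 0. 2. 1. 3. 2. 1. 3.]
```

These nine values would simply be appended to every feature row of the mesh.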

Data Loader: Load less data per file, and more files.

The reason is to train on more different meshes (people's brains), as they differ quite a bit in:

  • brain anatomy
  • image quality, due to
    • site effects like MRI scanner model and settings
    • artifacts from subject motion, etc.

Feature Engineering: Add more local (per-vertex) shape descriptors

See these publications for a first overview:

"It is clear that distance to plane performed the worst, with Gaussian curvature as the second worst. However, no
descriptor consistently performed better than the others. Mean curvature had the highest statistics, however, never had
the highest precision. On the other hand, while shape index had the highest precision for most recall values, it had the
lowest precision at high recall and only the third highest statistics. Overall, the best descriptors were mean curvature,
shape index, and curvature index."

The curvatures seem popular (and can be computed very quickly), but special face descriptors also exist, apparently typically used for object retrieval. We will have to check whether they are implemented in Python somewhere, and whether they are cheap enough to compute that we can afford to add them during data loading/pre-processing.

Implement post-processing: smooth data

Maybe we can improve our predictions by applying a post-processing step. Once all values for a mesh are known, some smoothing could actually be beneficial.

This would require the ability to perform smoothing of per-vertex data on the meshes, which may be slow in Python. But we could call into C++ for that, like in the haze package for R.
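A pure-Python version of such smoothing could look like the sketch below: one iteration replaces each vertex value by the mean over the vertex and its direct mesh neighbors. All names are hypothetical; a real implementation would likely call into C++ as noted above:

```python
import numpy as np

# Illustrative sketch: iterative nearest-neighbor smoothing of per-vertex
# data. The vertex adjacency is derived from the triangular faces.
def smooth_pervertex_data(values, faces, iterations=1):
    values = np.asarray(values, dtype=float)
    n = len(values)
    neighbors = [set() for _ in range(n)]
    for a, b, c in faces:
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))
    for _ in range(iterations):
        values = np.array([
            (values[i] + sum(values[j] for j in neighbors[i])) / (1 + len(neighbors[i]))
            for i in range(n)
        ])
    return values

# One triangle: every vertex has the same two neighbors, so a single
# iteration averages all three values.
vals = smooth_pervertex_data([0.0, 3.0, 6.0], [(0, 1, 2)])
print(vals)  # [3. 3. 3.]
```

The Python loop over neighbors is exactly the part that would be slow for full-resolution meshes, which motivates the C++ route.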

Feature request: add number of neighbors in ball point queries with different radii as features?

We could add a new descriptor to each row: currently we have the neighborhoods (verts + normals) within a fixed radius, along with the count of neighbors (before filtering/limiting).

We could add as new features: for several supplied radii, just the number of neighbors within each radius (no coords/normals).

Intention: this would characterize the vertex density around a vertex at different scales, and it is fast to compute and implement.
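The proposed feature could be sketched like this; an illustrative snippet with made-up names, not the actual data loader:

```python
import numpy as np
from scipy.spatial import cKDTree

# Sketch of the proposed feature: for one query vertex, count the neighbors
# inside balls of several radii (no coordinates or normals, just counts).
def neighbor_counts(vertices, query_index, radii=(5.0, 10.0, 15.0)):
    tree = cKDTree(vertices)
    q = vertices[query_index]
    return [len(tree.query_ball_point(q, r=r)) for r in radii]

# Toy example: 4 vertices on a line, counted around vertex 0.
verts = np.array([[0., 0., 0.], [1., 0., 0.], [4., 0., 0.], [9., 0., 0.]])
counts = neighbor_counts(verts, 0, radii=(2.0, 5.0, 10.0))
print(counts)  # [2, 3, 4]
```

One k-d tree per mesh suffices, so the extra cost per vertex is just a few additional ball queries.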

Model validation: Plot predicted and computed lGI

We should predict lGI for some meshes that are not part of the training dataset, compute it for them, and create figures that show:

  • the 2 different (predicted/computed) overlays next to each other
  • the difference (error), e.g., the per-vertex absolute error as an overlay, plus summary statistics like MAE or RMSE
  • we could also map the error for a group of subjects to fsaverage and show the mean over subjects at each vertex then.
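The error overlay and summary statistics from the list above could be computed as sketched here (illustrative names, not project code):

```python
import numpy as np

# Sketch for the comparison figures: per-vertex absolute error between the
# predicted and the computed lGI overlays, plus summary MAE and RMSE.
def overlay_errors(predicted, computed):
    predicted, computed = np.asarray(predicted), np.asarray(computed)
    abs_err = np.abs(predicted - computed)          # per-vertex error overlay
    mae = abs_err.mean()                            # mean absolute error
    rmse = np.sqrt(((predicted - computed) ** 2).mean())
    return abs_err, mae, rmse

abs_err, mae, rmse = overlay_errors([2.0, 3.0, 4.0], [2.0, 3.5, 3.0])
print(mae)  # 0.5
```

The `abs_err` array has one value per vertex and can be visualized on the mesh exactly like the lGI overlay itself.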

Feature Engineering: Add more global mesh descriptors

E.g.,:

  • We could approximate TBV by len_x * len_y * len_z, where len_x is: max(x_coords) - min(x_coords) of the mesh.
  • We could compute global measures like average curvature over all vertices, total edge count, total face count, etc.
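The TBV approximation from the first bullet is a one-liner over the mesh coordinates; a hypothetical sketch:

```python
import numpy as np

# Sketch of the TBV approximation described above: the product of the
# bounding-box edge lengths, e.g. len_x = max(x_coords) - min(x_coords).
def approximate_tbv(vertices):
    v = np.asarray(vertices)
    extents = v.max(axis=0) - v.min(axis=0)  # len_x, len_y, len_z
    return float(np.prod(extents))

verts = np.array([[0., 0., 0.], [2., 3., 4.]])
tbv = approximate_tbv(verts)
print(tbv)  # 24.0
```

Like the other global descriptors, this single value would be broadcast to every feature row of the mesh.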

Architecture: split function into submodules 'data' and 'model'

It would be nice to split them to get a clear separation. In the long term, I would like to split the whole package into a (PyPI-published, official) data-loading package for mesh neighborhoods and my private (as it is very problem-specific) modeling package.

Currently having 2 separate packages would be way too inconvenient though.
