Alejandro Roca Arroyo [email protected]
Main Supervisor email: [email protected]
The number of possible epigenetic landscapes of a genome is practically unbounded, but we can find correlations between specific epigenetic modifications that typically occur together in particular functions and developmental states. In this way, we reduce the dimensionality of the problem. This makes it easier to draw conclusions from the analysis of epigenetic modifications, and it lets us use the smaller set of correlated modifications (or "signatures") as input for predictive modelling or supervised machine learning.
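As a minimal illustration of finding correlated modifications, the sketch below computes pairwise Pearson correlations between marks across loci from a binary presence/absence matrix (for 0/1 data this is the phi coefficient). The matrix, the mark names and the 0.5 threshold are hypothetical toy choices, not project data.

```python
import numpy as np

# Hypothetical toy data: rows are genomic loci, columns are marks
# (1 = mark present at the locus, 0 = absent). Names are illustrative only.
marks = ["H3K4me3", "H3K27ac", "H3K27me3"]
X = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
], dtype=float)

# Pearson correlation between columns (rowvar=False treats columns as variables).
corr = np.corrcoef(X, rowvar=False)

def correlated_pairs(corr, names, threshold=0.5):
    """Return (mark_a, mark_b, r) for every pair with correlation >= threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if corr[i, j] >= threshold:
                pairs.append((names[i], names[j], round(float(corr[i, j]), 3)))
    return pairs

pairs = correlated_pairs(corr, marks)
print(pairs)  # [('H3K4me3', 'H3K27ac', 0.707)]
```

Pairwise correlation only captures two marks at a time. Combinations involving more marks are what a factorization method such as NMF is meant to pick up.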
Non-negative Matrix Factorization (NMF) is well suited to finding combinatorial patterns of epigenetic modifications, because it decomposes a non-negative data matrix into a small number of additive, non-negative components. We first determine the state of each type of epigenetic modification at the defined loci of the tissue. From this information we then obtain the different epigenetic signatures, which we will use for association and simulation analysis.
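A minimal sketch of NMF, assuming the loci-by-marks matrix V is factorized as V ≈ WH with the Lee-Seung multiplicative updates for the Frobenius-norm objective. Each row of H is then a candidate signature (a weighted combination of marks), and W gives how strongly each locus uses each signature. The function name `nmf`, the toy matrix and the iteration count are illustrative assumptions. Libraries such as scikit-learn also offer `sklearn.decomposition.NMF`.

```python
import numpy as np

def nmf(V, k, n_iter=2000, eps=1e-10, seed=0):
    """Factorize non-negative V (loci x marks) into W (loci x k) and H (k x marks)
    using Lee-Seung multiplicative updates that minimize ||V - WH||_F."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep all entries non-negative; eps avoids 0/0.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy matrix: marks 0-1 co-occur, marks 2-3 co-occur.
V = np.array([
    [2, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 3],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
    [2, 2, 1, 1],
], dtype=float)

W, H = nmf(V, k=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(np.round(H, 2))  # each row is one recovered signature
print(rel_err)
```

Choosing the number of signatures k is a modelling decision. A common approach is to compare reconstruction error and signature stability across several values of k.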
Learning goals:
- The student should be able to implement a pipeline that processes raw data into usable data.
- The student should be able to implement scripts for mapping epigenetic modifications onto a genome.
- The student should be able to discuss epigenetic modifications and their effect on gene expression.
- The student should be able to identify combinatorial associations between epigenetic marks based on ChIP-seq data.
- The student should be able to describe the identified associations using plots and summary statistics.
- The student should be able to use a machine learning approach to make predictions on new, unseen data.
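Mapping epigenetic modifications onto a genome can be sketched as binning: the genome is split into fixed-size bins, and a bin is marked 1 for a modification if any peak of that modification overlaps it. The sketch below assumes peaks have already been called from ChIP-seq data and are given as BED-style half-open intervals (chrom, start, end). The function name `binarize_peaks`, the 200 bp bin size and the toy peaks are hypothetical. The resulting bins-by-marks matrix is the kind of non-negative input that NMF factorizes.

```python
import numpy as np

def binarize_peaks(peaks_by_mark, chrom_sizes, bin_size=200):
    """Return a (bins x marks) 0/1 matrix and the mark order.

    A bin gets 1 for a mark if any peak of that mark overlaps it.
    Peaks are BED-style half-open intervals: (chrom, start, end).
    """
    marks = sorted(peaks_by_mark)
    # Give each chromosome a contiguous block of rows in the matrix.
    offsets, bins_per_chrom, n_bins = {}, {}, 0
    for chrom, size in chrom_sizes.items():
        offsets[chrom] = n_bins
        bins_per_chrom[chrom] = -(-size // bin_size)  # ceiling division
        n_bins += bins_per_chrom[chrom]
    X = np.zeros((n_bins, len(marks)), dtype=np.uint8)
    for j, mark in enumerate(marks):
        for chrom, start, end in peaks_by_mark[mark]:
            if chrom not in offsets or end <= start:
                continue  # skip unlisted chromosomes and empty intervals
            first = start // bin_size
            # end is exclusive; clip so a peak never spills into the next chromosome
            last = min((end - 1) // bin_size, bins_per_chrom[chrom] - 1)
            X[offsets[chrom] + first : offsets[chrom] + last + 1, j] = 1
    return X, marks

# Hypothetical toy input: one 1 kb chromosome and two marks.
chrom_sizes = {"chr1": 1000}
peaks = {
    "H3K4me3": [("chr1", 0, 250)],
    "H3K27me3": [("chr1", 600, 1000)],
}
X, marks = binarize_peaks(peaks, chrom_sizes, bin_size=200)
print(marks)  # ['H3K27me3', 'H3K4me3']
print(X.T)
```

A binary matrix is the simplest choice. Read counts or signal values per bin would keep more information and are still non-negative.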