intomics_interview's Introduction

Python coding test

Introduction

Understanding our genes (what they do, which processes they take part in, and where the proteins they code for are found) is of utmost importance for many kinds of research, including bioinformatics. To facilitate this, the scientific community has developed the Gene Ontology, or GO for short, which is an extensive database of

  • various properties a gene can have, and
  • information on how these properties relate to each other.

In this test we will work with some Python code that reads the Gene Ontology database into a suitable data structure and processes the data in certain ways. This code could be one of the first steps in creating a web-based browser for Gene Ontology and related data resources.

For this test you do not need to understand the many different properties that the Gene Ontology describes. You will, however, need a good understanding of Python, the ability to think logically, and some knowledge of the Gene Ontology database, which is described in the following.

The properties that a gene can have are called GO categories. In the data we will be working with, there are 47,385 different GO categories. Each GO category has an id, e.g. GO:0008150, a more descriptive name, e.g. biological_process, and a few other attributes.

How the GO categories relate to each other is captured in relations that are composed of relationships. There are a number of different relations, and the most important of these is the is_a relation. For example, the GO category with id equal to GO:0000003 and name equal to reproduction forms a relationship in the is_a relation to GO:0008150 (biological_process), i.e. reproduction is_a biological_process. There are a number of other types of relations. Another example is part_of, where one of the relationships is GO:0098687 (chromosomal region) is part_of GO:0005694 (chromosome).

Mathematically speaking, a relationship is a pair of categories, and a relation is a set of relationships.
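The definition above can be sketched directly in Python. This is only an illustration: representing a relationship as a tuple of GO ids and a relation as a set of such tuples is an assumption made here, not necessarily how the classes in GO.py store them, and the helper name is_related is hypothetical.

```python
# A relationship is an ordered pair of GO ids; a relation is a set of such pairs.
# (Illustrative representation only; GO.py may use a different data structure.)
is_a = {
    ("GO:0000003", "GO:0008150"),  # reproduction is_a biological_process
    ("GO:0019953", "GO:0000003"),  # sexual reproduction is_a reproduction
}
part_of = {
    ("GO:0098687", "GO:0005694"),  # chromosomal region part_of chromosome
}


def is_related(relation, a, b):
    """Return True if the relationship (a, b) is explicitly in the relation."""
    return (a, b) in relation
```

Note that the order within a pair matters: reproduction is_a biological_process, but not the other way around.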

You can invert a relation, e.g. create the relation has_part from part_of by saying that category a has_part category b if category b is part_of category a (note that a and b have swapped places).
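As a minimal sketch of inversion, assume a relation is held as a Python set of (a, b) tuples of GO ids. That representation and the function name invert are assumptions for illustration, not the interface of GO.py.

```python
def invert(relation):
    """Return the inverse relation: every pair (a, b) becomes (b, a)."""
    return {(b, a) for (a, b) in relation}


# chromosomal region part_of chromosome
part_of = {("GO:0098687", "GO:0005694")}

# chromosome has_part chromosomal region
has_part = invert(part_of)
```

Inverting twice gives back the original relation, which makes for a convenient sanity check.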

Relations can also be combined, e.g. you can construct a new relation my_rel from is_a and part_of by saying that two categories are related with my_rel if they are related with at least one of the is_a or part_of relations.
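If a relation is held as a set of (a, b) tuples (an assumption made for this sketch, not necessarily the structure used in GO.py), combining two relations in this sense is simply a set union. The function name combine is hypothetical.

```python
def combine(first, second):
    """Return the relation containing every pair found in at least one input."""
    return first | second


is_a = {("GO:0000003", "GO:0008150")}     # reproduction is_a biological_process
part_of = {("GO:0098687", "GO:0005694")}  # chromosomal region part_of chromosome

my_rel = combine(is_a, part_of)
```

Because sets discard duplicates, a pair that occurs in both inputs appears only once in the result.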

Some relations are transitive, meaning that if a is related to b, and b is related to c, then a is also related to c. The is_a relation is transitive: we have the relationship GO:0019953 (sexual reproduction) is_a GO:0000003 (reproduction), and as we also have the relationship GO:0000003 (reproduction) is_a GO:0008150 (biological_process), we know that GO:0019953 (sexual reproduction) is_a GO:0008150 (biological_process). However, these indirect relationships are not explicitly stated in the Gene Ontology database.
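One possible way to derive the indirect relationships is to compute the transitive closure of the relation. The sketch below assumes a relation is a set of (a, b) tuples of GO ids and walks the graph from each category with an explicit stack; the function name transitive_closure and the representation are assumptions, and other approaches would work equally well.

```python
from collections import defaultdict


def transitive_closure(relation):
    """Return the relation extended with all indirect relationships."""
    # Map each category to the categories it is directly related to.
    successors = defaultdict(set)
    for a, b in relation:
        successors[a].add(b)

    closure = set()
    for start in list(successors):
        # Depth-first walk collecting everything reachable from `start`.
        seen = set()
        stack = list(successors[start])
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(successors.get(node, ()))
        closure.update((start, node) for node in seen)
    return closure


is_a = {
    ("GO:0019953", "GO:0000003"),  # sexual reproduction is_a reproduction
    ("GO:0000003", "GO:0008150"),  # reproduction is_a biological_process
}
is_a_transitive = transitive_closure(is_a)
```

The result keeps the two direct relationships and adds the indirect one from GO:0019953 to GO:0008150. The seen set prevents a category from being visited twice, so the walk terminates even if the input happened to contain a cycle.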

There are other aspects of the Gene Ontology database that we don't need to know about for now. If you are interested, you can look at http://geneontology.org for more information.

Materials

In addition to this document, three other files have been provided:

Tasks

1. Understand the provided code

Take a look at the provided code in GO.py. There are three classes defined. The GO class can be used to make an object representing the information in the go.obo file. To do this, it uses two other classes, GO_category and GO_relation.

Try to understand what the code does. It may help to add some comments or docstrings as you go.
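For orientation, files in the OBO format describe each category in a [Term] stanza of "tag: value" lines, where is_a relationships use an is_a tag and other relationships use a relationship tag followed by the relation name. The sketch below is not the code in GO.py: the sample text is a hand-written illustration built from the examples in the Introduction (not copied from go.obo), and the function name parse_relationships and the dictionary-of-sets result are assumptions.

```python
# Hand-written sample in OBO style; not an excerpt from the provided go.obo.
SAMPLE_OBO = """\
[Term]
id: GO:0000003
name: reproduction
is_a: GO:0008150 ! biological_process

[Term]
id: GO:0098687
name: chromosomal region
relationship: part_of GO:0005694 ! chromosome
"""


def parse_relationships(text):
    """Collect relationships per relation name as sets of (source, target)."""
    relations = {}
    current_id = None
    for raw_line in text.splitlines():
        # Drop the trailing "! name" comment, if any.
        line = raw_line.split(" ! ")[0].strip()
        if line == "[Term]":
            current_id = None
        elif line.startswith("id: "):
            current_id = line[len("id: "):]
        elif line.startswith("is_a: ") and current_id:
            target = line[len("is_a: "):].split()[0]
            relations.setdefault("is_a", set()).add((current_id, target))
        elif line.startswith("relationship: ") and current_id:
            rel_name, target = line[len("relationship: "):].split()[:2]
            relations.setdefault(rel_name, set()).add((current_id, target))
    return relations


relations = parse_relationships(SAMPLE_OBO)
```

Comparing a small sketch like this with the parsing logic in GO.py can help when reading the provided classes.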

2. Fix a bug

The file test_GO.py defines a number of functions that can be used to test the code in GO.py. You can run the tests using the Python module pytest.

One of the tests fails. If you look at lines 206-217 in go.obo, which define the GO category GO:0000022 (mitotic spindle elongation), you will see that GO:0000022 should be related to two other GO categories, but for some reason no relationships are stored in our data structure. Fix this bug before proceeding to the next tasks.

3. Implement inverting relations

Add code that computes the inverse of a relation as described in the Introduction section.

4. Implement combining relations

Add code that creates a new relation by combining two others as described in the Introduction section.

5. Implement making a relation transitive

As described in the Introduction section, the is_a relation is transitive, but the indirect relationships are not explicitly stated in the go.obo file. Implement code that adds these indirect relationships.

Notes

All this code should run with Python 3 on an ordinary PC or server. CPU and memory requirements should be modest.

We are aware that some of the tasks are difficult and time-consuming. We will not be surprised if you need to put several hours of work into solving them.

You are welcome to search the Internet for help, but we will expect you to be able to explain the code that you come up with.

Also, we kindly ask you not to make other people aware of this test, as we would like to use it for future interviews.

Please send us your code for the tasks the day before the interview so that we can familiarize ourselves with your solutions.
