ITALLIC: A tool for automatically identifying and correcting errors in location based plant breeding data

Errors in location data are one of the main obstacles to integrating plant breeding data with other data sources, such as genotype, environment, management, and socioeconomic data, so that they can be analyzed together. Collectively, this data could be used to inform genetic predictive models for maize, wheat, and other crops. Typical errors in plant breeding location data include flipped latitude and longitude values, missing negative signs, and, in some cases, missing data. This tool, an Integrated Tool for Automatic Lat Long Imputation and Cleaning (ITALLIC), automatically detects and corrects errors in location data and imputes missing values for location-dependent data, such as region name.
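
To make the error types above concrete, here is a minimal sketch of how flipped coordinates or a missing negative sign could be detected against a known region. It is an illustration only, not ITALLIC's implementation or API: the function names, the candidate ordering, and the rough bounding box for Iowa (USA) are all assumptions made for this example.

```python
from typing import Optional, Tuple

# Hypothetical bounding box layout: (min_lat, max_lat, min_lon, max_lon).
BBox = Tuple[float, float, float, float]

# Rough, illustrative box around Iowa, USA (not authoritative boundary data).
IOWA_BBOX: BBox = (40.0, 44.0, -97.0, -90.0)


def in_bbox(lat: float, lon: float, bbox: BBox) -> bool:
    """Return True if the point is a valid coordinate inside the box."""
    min_lat, max_lat, min_lon, max_lon = bbox
    valid = -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0
    return valid and min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def suggest_fix(lat: float, lon: float, bbox: BBox) -> Optional[Tuple[float, float, str]]:
    """Try simple corrections and return the first one that lands in the box.

    Returns (lat, lon, description), or None if no candidate fits.
    """
    candidates = [
        (lat, lon, "unchanged"),
        (lon, lat, "swapped latitude and longitude"),
        (-lat, lon, "negated latitude"),
        (lat, -lon, "negated longitude"),
        (-lat, -lon, "negated both"),
    ]
    for c_lat, c_lon, label in candidates:
        if in_bbox(c_lat, c_lon, bbox):
            return c_lat, c_lon, label
    return None


# A trial site near Ames, Iowa, recorded with three kinds of input.
print(suggest_fix(42.0, -93.6, IOWA_BBOX))   # already correct
print(suggest_fix(-93.6, 42.0, IOWA_BBOX))   # latitude and longitude flipped
print(suggest_fix(42.0, 93.6, IOWA_BBOX))    # negative sign missing
```

A bounding box is a coarse stand-in for real country boundaries, and regions that straddle the equator or the prime meridian can make a sign error undetectable with this approach.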

This page contains instructions for installing and using ITALLIC. These instructions assume familiarity with working in a terminal.

Pre-Installation

ITALLIC is a Python 3 application. In addition to Python 3, we highly recommend also installing Conda. Click this link for more information on installing Conda.

Although you do not need Conda to use ITALLIC, it has some advantages that will make life easier. It simplifies installing ITALLIC and other Python packages, and it enables the use of conda environments. Environments are a good way to prevent the conflicts that can arise when different projects require different versions of the same software package. This blog nicely summarizes some advantages of using environments.

Prepare working environment

Create a conda environment for data cleaning and install ITALLIC in that environment. The command below uses "DataCleaning" as the environment name and Python 3.8 as the Python version. You can use a different name for your conda environment. Any Python 3 version should work, but since ITALLIC was tested on Python 3.8, we recommend sticking with that version.

NOTE: One of the many benefits of using Conda is that, even if you have a different version of Python installed on your system, it will install version 3.8 for your "DataCleaning" environment. Just use the conda create command as shown below.

  • Create conda environment.
$ conda create --name DataCleaning python=3.8 -y
  • Activate the environment.
$ conda activate DataCleaning
  • Install Jupyter Notebook. ITALLIC has a visualization tool that works well with Jupyter Notebook. Use conda to install Jupyter.
$ conda install -c conda-forge jupyter -y
  • Install dependencies needed to use jupyter.
$ conda install -c conda-forge ipykernel -y
  • Create kernel for this environment to use with jupyter notebook. We recommend using the same name for the kernel that was used for the environment.
$ ipython kernel install --user --name=DataCleaning
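
Once the environment is activated, a short Python check can confirm that you are running the interpreter you expect. The sketch below is illustrative: the function name is hypothetical, and it relies on the CONDA_DEFAULT_ENV environment variable, which conda sets when an environment is activated.

```python
import os
import sys


def environment_report(expected_env: str = "DataCleaning") -> dict:
    """Summarize the running interpreter and the active conda environment."""
    active_env = os.environ.get("CONDA_DEFAULT_ENV")  # None outside conda
    return {
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        "is_python3": sys.version_info.major == 3,
        "conda_env": active_env,
        "env_matches": active_env == expected_env,
    }


print(environment_report())
```

If "env_matches" is False, run conda activate DataCleaning (or pass your own environment name to the function) and try again.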

Installation

Now that you have set up the environment and installed Jupyter, you are ready to install ITALLIC.

  • Install ITALLIC.
$ conda install -c conda-forge itallic -y

If you are not new to Python and for some reason have issues installing the package, try updating your conda packages using the command below.

$ conda update --all --yes
  • You can now deactivate the conda environment and switch to using Jupyter Notebook to get started.
$ conda deactivate
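
To confirm that the installation succeeded, you can check whether the package is importable from inside the "DataCleaning" environment (before deactivating it) or from a notebook that uses the "DataCleaning" kernel. This is a sketch: the helper name is hypothetical, and the import name "itallic" is an assumption based on the conda package name.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "itallic" is the assumed import name; adjust it if the package differs.
for name in ("itallic", "notebook", "ipykernel"):
    status = "found" if is_importable(name) else "NOT found"
    print("{}: {}".format(name, status))
```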

Getting Started

  • Create a working directory
$ mkdir DataCleaningDir
  • Navigate into the directory
$ cd DataCleaningDir
  • Get compressed folder with country boundary data and a sample dataset to use for testing
$ wget https://github.com/getiria-onsongo/itallic/raw/main/resources/data.tar.gz

If your platform does not have wget, you can install it using conda:
$ conda install -c conda-forge wget

  • Uncompress data folder
$ tar -xvf data.tar.gz 

You can also download the compressed folder by clicking on this link and then clicking the "Download" button.

  • Download a Getting Started Python Notebook with basic commands on how to get started.
$ wget https://github.com/getiria-onsongo/itallic/raw/main/resources/GettingStarted.ipynb
  • Launch Jupyter Notebook
$ jupyter notebook
  • Once you launch the notebook, a browser window should open with the contents of your working directory displayed as shown below. Double-click the Getting Started notebook.

  • Ensure you are using the "DataCleaning" kernel we created, which has ITALLIC and its dependencies installed. The image below illustrates how to change your notebook kernel.

  • Follow the notebook to learn basic commands on how to get started.
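
If wget and tar are not available on your platform, the download and uncompress steps above can also be done with the Python standard library. The sketch below is one possible approach, not part of ITALLIC: the function names are hypothetical, and the URL is the one used in the wget command above.

```python
import tarfile
import urllib.request
from pathlib import Path

# Same URL as the wget command in the Getting Started steps.
DATA_URL = "https://github.com/getiria-onsongo/itallic/raw/main/resources/data.tar.gz"


def download(url: str, dest: Path) -> Path:
    """Download a file to dest and return the destination path."""
    urllib.request.urlretrieve(url, str(dest))
    return dest


def extract(archive: Path, target_dir: Path) -> list:
    """Unpack a tar archive into target_dir and return the member names.

    Only use this on archives from a source you trust.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive)) as tar:
        names = tar.getnames()
        tar.extractall(str(target_dir))
    return names


def fetch_sample_data(work_dir: Path) -> list:
    """Download data.tar.gz into work_dir and unpack it there."""
    work_dir.mkdir(parents=True, exist_ok=True)
    archive = download(DATA_URL, work_dir / "data.tar.gz")
    return extract(archive, work_dir)
```

Calling fetch_sample_data(Path("DataCleaningDir")) mirrors the mkdir, wget, and tar steps; the same download function can be pointed at the GettingStarted.ipynb URL to fetch the notebook.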
