VCF to CSV Converter

This repository contains a Bash script, vcf_to_csv.sh, that converts VCF (Variant Call Format) files into CSV (Comma-Separated Values) format. It decodes Base64-encoded fields and wraps every output field in quotes.

Overview

The vcf_to_csv.sh script converts genomic data from VCF files into the more widely supported CSV format. It wraps each value in quotes so that CSV parsers read fields containing delimiters correctly, and it can decode specific Base64-encoded fields, such as comments and effects.

Key Features:

  • Converts VCF files to CSV format.
  • Handles encoded fields by decoding them from Base64.
  • Encapsulates all output fields in quotes to ensure CSV format integrity.
  • Allows specifying delimiters for the output file (comma or tab).
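Quoting every field only keeps the CSV intact if double quotes inside a value are escaped as well. Below is a minimal sketch of how that could be done in Bash, assuming RFC 4180-style escaping (each embedded double quote becomes two double quotes). csv_quote and csv_row are hypothetical helper names, not functions taken from vcf_to_csv.sh.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from vcf_to_csv.sh): wrap a value in double
# quotes and double any embedded quotes, as RFC 4180 expects.
csv_quote() {
  local q='"'
  local value=$1
  value=${value//$q/$q$q}   # replace every " with ""
  printf '"%s"' "$value"
}

# Hypothetical helper: print one quoted row. The first argument is the
# delimiter; the remaining arguments are the field values.
csv_row() {
  local delim=$1; shift
  local first=1 field
  for field in "$@"; do
    if (( first )); then first=0; else printf '%s' "$delim"; fi
    csv_quote "$field"
  done
  printf '\n'
}
```

For example, csv_row , chr1 12345 'a "quoted" note' prints "chr1","12345","a ""quoted"" note".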

Usage

To use the script, provide the input VCF file with -i. The output file name and delimiter are optional and fall back to the defaults listed under Parameters. The usage is as follows:

./vcf_to_csv.sh -i <input_file> [-o <output_file>] [-d <delimiter>]

Parameters:

  • -i <input_file>: The path to the input VCF file.
  • -o <output_file>: The path to the output CSV file (default: output.csv).
  • -d <delimiter>: Delimiter for the CSV output (',' for CSV, '\t' for TSV; default: ',').
  • -h: Display help and exit.
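These options map naturally onto Bash's built-in getopts. The following is a minimal sketch that assumes the defaults listed above; usage and parse_args are hypothetical names and may not match how vcf_to_csv.sh is written internally. Turning the two characters \t into a real tab is also an assumption about the intended behavior of -d.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the -i/-o/-d/-h option handling described above.

usage() {
  echo "Usage: $0 -i <input_file> [-o <output_file>] [-d <delimiter>]" >&2
}

parse_args() {
  input_file=""
  output_file="output.csv"   # documented default
  delimiter=","              # documented default
  local opt OPTIND=1         # local OPTIND lets the function be called more than once
  while getopts ":i:o:d:h" opt; do
    case $opt in
      i) input_file=$OPTARG ;;
      o) output_file=$OPTARG ;;
      d) delimiter=$OPTARG ;;
      h) usage; exit 0 ;;
      :) echo "Option -$OPTARG requires an argument" >&2; usage; exit 1 ;;
      \?) echo "Unknown option: -$OPTARG" >&2; usage; exit 1 ;;
    esac
  done
  # Assumption: treat the two characters \t as a real tab character.
  if [[ $delimiter == '\t' ]]; then
    delimiter=$'\t'
  fi
  if [[ -z $input_file ]]; then
    echo "Error: -i <input_file> is required" >&2
    usage
    exit 1
  fi
}
```

In a script, calling parse_args "$@" before any processing sets input_file, output_file and delimiter as global variables.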

Example:

./vcf_to_csv.sh -i sample.vcf -o sample.csv -d ","

Requirements

  • GNU Awk (Gawk): Used for processing the VCF files.
  • Bash: For scripting and control flow.

Installation

No installation steps are needed beyond having Bash and Gawk on your system: clone the repository, make the script executable, and run it.

git clone https://github.com/your-username/vcf-to-csv-converter.git
cd vcf-to-csv-converter
chmod +x vcf_to_csv.sh
./vcf_to_csv.sh -i your_input.vcf -o your_output.csv

Configuration

The fields that are decoded from Base64 are controlled by the ENCODED_FIELDS variable. Edit it in the script so that it lists the Base64-encoded fields in your VCF:

ENCODED_FIELDS="JsiDbComment JsiDbEffect JsiDbSig"
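To illustrate what this setting controls, here is a minimal sketch that decodes only the listed keys in a single INFO string (semicolon-separated KEY=VALUE pairs). It assumes the values are plain Base64 and that a base64 utility accepting -d is available (GNU coreutils provides one). decode_info is a hypothetical name; the actual script does its processing in Gawk and may decode differently.

```bash
#!/usr/bin/env bash
ENCODED_FIELDS="JsiDbComment JsiDbEffect JsiDbSig"

# Hypothetical helper: decode the Base64 values of keys listed in
# ENCODED_FIELDS and leave every other INFO entry unchanged.
decode_info() {
  local info=$1 out="" pair key value
  local -a pairs
  IFS=';' read -r -a pairs <<< "$info"
  for pair in "${pairs[@]}"; do
    key=${pair%%=*}     # text before the first "="
    value=${pair#*=}    # text after the first "="
    if [[ $pair == *=* && " $ENCODED_FIELDS " == *" $key "* ]]; then
      # Keep the original value if it is not valid Base64.
      value=$(printf '%s' "$value" | base64 -d 2>/dev/null) || value=${pair#*=}
      pair="$key=$value"
    fi
    out+="${out:+;}$pair"   # re-join entries with ";"
  done
  printf '%s\n' "$out"
}
```

For example, decode_info 'DP=10;JsiDbComment=aGVsbG8=' prints DP=10;JsiDbComment=hello.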

Output

The script writes a CSV file in which each line corresponds to one VCF record, with fields separated by the chosen delimiter. The first line is a header row derived from the VCF file's metadata.
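The exact metadata used for the header row is not spelled out here. One plausible approach, sketched purely as an assumption: take the fixed columns from the #CHROM line and append one column per ID declared in the ##INFO meta-lines. vcf_header_columns is a hypothetical name, and the sketch uses only POSIX awk features, so it also runs under Gawk.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: build a quoted header row from VCF metadata.
# Usage: vcf_header_columns [file] [delimiter]; reads stdin if no file is given.
vcf_header_columns() {
  awk -v d="${2:-,}" '
    function q(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }
    /^##INFO=<ID=/ {
      id = $0
      sub(/^##INFO=<ID=/, "", id)   # drop the "##INFO=<ID=" prefix
      sub(/,.*/, "", id)            # keep everything up to the first comma
      info[++n] = id
      next
    }
    /^#CHROM/ {
      sub(/^#/, "")
      split($0, cols, "\t")
      line = q(cols[1])
      # Fixed columns CHROM..FILTER; INFO is replaced by its declared IDs.
      for (i = 2; i <= 7; i++) line = line d q(cols[i])
      for (j = 1; j <= n; j++) line = line d q(info[j])
      print line
      exit
    }
  ' "${1:--}"
}
```

For a VCF that declares a single INFO field DP, this prints "CHROM","POS","ID","REF","ALT","QUAL","FILTER","DP".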

Troubleshooting

  • Ensure that the input VCF file is properly formatted and contains the standard VCF headers.
  • If output fields are not decoded properly, check the ENCODED_FIELDS setting and ensure that the field names match those in your VCF file.
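The first point can be checked before running the conversion. This sketch assumes an uncompressed, tab-delimited VCF and tests only two things the VCF specification requires of the header: a first line beginning with ##fileformat=VCF, and a #CHROM header line listing the eight mandatory columns. check_vcf_header is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for a plain-text VCF file.
check_vcf_header() {
  local file=$1 first
  IFS= read -r first < "$file" || { echo "Cannot read $file" >&2; return 1; }
  if [[ $first != '##fileformat=VCF'* ]]; then
    echo "Missing ##fileformat line at the top of $file" >&2
    return 1
  fi
  # The eight mandatory columns, tab-separated, in this order.
  if ! grep -q $'^#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO' "$file"; then
    echo "Missing or malformed #CHROM header line in $file" >&2
    return 1
  fi
}
```

A call such as check_vcf_header sample.vcf && ./vcf_to_csv.sh -i sample.vcf -o sample.csv runs the conversion only when both checks pass.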

Contributing

Contributions that improve the script or fix bugs are welcome. Please open an issue or submit a pull request on GitHub.

Contributors

  • berntpopp

Issues

Feature request: make the script work as a stream

Currently, the script holds all data in memory and only writes output after every operation has finished, which is slow and memory-intensive. Processing the input as a stream could improve performance. Ideally, the script should read from stdin and write to stdout.
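The feature request amounts to processing the file one line at a time. A minimal sketch of what that could look like, assuming a plain awk filter that reads VCF from stdin and writes quoted rows to stdout; vcf_stream_to_csv is a hypothetical name, and Base64 decoding is deliberately left out to keep the example short.

```bash
#!/usr/bin/env bash
# Hypothetical streaming filter: VCF on stdin, quoted CSV on stdout.
# Each row is printed as soon as its line is read, so memory use does
# not grow with the size of the input file.
vcf_stream_to_csv() {
  awk -v d="${1:-,}" '
    function q(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }
    BEGIN { FS = "\t" }
    /^##/ { next }                 # skip meta-information lines
    {
      sub(/^#/, "")                # turn "#CHROM" into "CHROM" for the header row
      row = q($1)
      for (i = 2; i <= NF; i++) row = row d q($i)
      print row
    }
  '
}
```

Used as a filter, for example vcf_stream_to_csv ',' < sample.vcf > sample.csv, it also fits into pipelines such as zcat sample.vcf.gz | vcf_stream_to_csv ','.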
