gt3x2csv

Lifecycle: experimental R-CMD-check

The goal of gt3x2csv is to convert .gt3x files into (raw) csv files so that they can be analysed in other packages such as GGIR. The goals of this package are:

  1. To create output that mimics the output from ActiLife as closely as possible.
  2. To be orders of magnitude faster than using ActiLife for conversion.

This package is a rewrite of a package previously written by Danilo de Paula Santos (@danilodpsantos).

Why use gt3x2csv instead of ActiLife?

gt3x2csv has a number of advantages over converting files in ActiLife. Firstly, it is substantially faster, both on a per-file basis and overall. This is largely thanks to the great work on the read.gt3x package, which uses C++ to read the activity data quickly. gt3x2csv is further bolstered by being able to process files in parallel, something not available in ActiLife.

As an example of how much faster it is, see the table below.

Method                     One File   Five Files   Thirty Files [1]
ActiLife [2]               58s        4min 55s     29min 20s
gt3x2csv [3] (Sequential)  8s         40s          4min
gt3x2csv (Parallel)        8s         14s          1min 42s

[1] All files were 159MB, or a little over three days of data.
[2] Using ActiLife v6.11.9 (newer versions might be faster).
[3] As run on an AMD Ryzen 7 3700X 8-Core CPU; 32GB of RAM.
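
To make the comparison easier to read, the timings in the table can be turned into speed-up factors. This is only a small arithmetic sketch in base R; the numbers are copied from the table, and the column names are chosen here for illustration.

```r
# Timings from the table above, converted to seconds.
timings <- data.frame(
  n_files    = c(1, 5, 30),
  actilife   = c(58, 4 * 60 + 55, 29 * 60 + 20),
  sequential = c(8, 40, 4 * 60),
  parallel   = c(8, 14, 1 * 60 + 42)
)

# Speed-up relative to ActiLife (values above 1 mean gt3x2csv is faster).
timings$speedup_sequential <- timings$actilife / timings$sequential
timings$speedup_parallel   <- timings$actilife / timings$parallel

print(round(timings[, c("n_files", "speedup_sequential", "speedup_parallel")], 1))
```

On these numbers, sequential conversion is roughly 7 times faster than ActiLife, and parallel conversion is roughly 17 to 21 times faster once several files are processed together.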

What does gt3x2csv do?

gt3x2csv uses read.gt3x to unpack the GT3X file, and formats the output in the same way as ActiLife.

Installation

gt3x2csv is not (currently) available on CRAN. You can install from GitHub:

# install.packages("devtools")
devtools::install_github("tarensanders/gt3x2csv")

Example

Using gt3x2csv is simple. You can provide a single file, a vector of file paths, or a directory. If you do not provide an output directory, the resulting CSV files are stored in the same place as the originals. Here is an example:

library(gt3x2csv)

# Setting up a test directory - ignore this.
my_directory <- gt3x2csv:::local_dir_with_files()

# An example directory with some GT3X files
list.files(my_directory)
#> [1] "test_file1.gt3x" "test_file2.gt3x" "test_file3.gt3x" "test_file4.gt3x"
#> [5] "test_file5.gt3x"
gt3x_2_csv(
  gt3x_files = my_directory,
  outdir = NULL, # Save to the same place
  progress = FALSE, # Show a progress bar?
  parallel = TRUE # Process files in parallel?
  )

# Directory now has the new files.
list.files(my_directory)
#>  [1] "test_file1.gt3x"   "test_file1RAW.csv" "test_file2.gt3x"  
#>  [4] "test_file2RAW.csv" "test_file3.gt3x"   "test_file3RAW.csv"
#>  [7] "test_file4.gt3x"   "test_file4RAW.csv" "test_file5.gt3x"  
#> [10] "test_file5RAW.csv"
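
When re-running a large batch, it can help to know which files already have a CSV. The sketch below is not part of gt3x2csv: `expected_csv_paths()` and `pending_gt3x()` are hypothetical helpers, and they assume the `<name>RAW.csv` naming pattern seen in the listing above.

```r
# Hypothetical helper (not part of gt3x2csv): predict the CSV path for each
# GT3X file, assuming the "<name>RAW.csv" pattern seen in the listing above.
expected_csv_paths <- function(gt3x_files, outdir = NULL) {
  # Drop the .gt3x extension (case-insensitive) and append "RAW.csv".
  stems <- sub("\\.gt3x$", "", basename(gt3x_files), ignore.case = TRUE)
  csv_names <- paste0(stems, "RAW.csv")
  # With no outdir, the CSV is expected next to the original file.
  target_dirs <- if (is.null(outdir)) dirname(gt3x_files) else outdir
  file.path(target_dirs, csv_names)
}

# Hypothetical helper: keep only the files that do not yet have a matching CSV.
pending_gt3x <- function(gt3x_files, outdir = NULL) {
  gt3x_files[!file.exists(expected_csv_paths(gt3x_files, outdir))]
}

expected_csv_paths(c("data/test_file1.gt3x", "data/test_file2.gt3x"))
#> [1] "data/test_file1RAW.csv" "data/test_file2RAW.csv"
```

Since gt3x_2_csv() accepts a vector of file paths, the result of `pending_gt3x()` could be passed as `gt3x_files` to convert only the files that are still missing a CSV.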

Caveat Emptor

A few warnings for those using gt3x2csv.

File Sizes

A GT3X file is a zip file containing some metadata (info.txt) and a binary file (log.bin) with data recorded by the device. The GT3X file is compressed, making it smaller than the raw versions of these files. Converting to CSV uncompresses the file, and the result will take up more space. This is true regardless of whether you use ActiLife or gt3x2csv. How much extra space seems to depend on the size of the file. Here are three files, and their compressed/uncompressed sizes.

File     GT3X Size   CSV Size   Times Larger
File 1   200KB       4.28MB     ~22
File 2   159MB       524MB      ~3.3
File 3   352MB       1.13GB     ~3.3

All this is to say that if you can do your analysis without saving CSV files in the middle (e.g., using read.gt3x or AGread), that would be better. However, some processing packages (e.g., GGIR) don’t allow this (mostly due to memory issues).
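
If you do need the CSV files, a rough disk-space estimate before converting can avoid surprises. The sketch below uses only base R; `estimate_csv_bytes()` is a hypothetical helper, and its default ratio of 3.3 is simply the figure for Files 2 and 3 in the table above, so treat the result as a rule of thumb.

```r
# Hypothetical helper: rough estimate of CSV disk usage for a set of GT3X files.
# The default ratio of 3.3 comes from Files 2 and 3 in the table above; small
# files can expand far more (File 1 is roughly 22 times larger as a CSV).
estimate_csv_bytes <- function(gt3x_files, ratio = 3.3) {
  sizes <- file.size(gt3x_files)  # NA for files that do not exist
  sum(sizes * ratio, na.rm = TRUE)
}

# Demonstration with a throwaway 1,000-byte stand-in (not a real GT3X file).
demo_file <- tempfile(fileext = ".gt3x")
writeBin(raw(1000), demo_file)
estimate_csv_bytes(demo_file)
unlink(demo_file)
```

For a real batch, the input could be the result of `list.files(my_directory, pattern = "\\.gt3x$", full.names = TRUE)`.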

Memory Use

In the process of unzipping the files, the data are temporarily stored in memory. If you run gt3x2csv in parallel with lots of cores, you might run out of memory. If this happens, set cores to a lower value. For example, using all 16 threads on my 8-core CPU was actually slower to process 30 files than setting cores = 8 and using 8 threads, because the 32GB of RAM was being exhausted.
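
One way to pick a value for cores is to cap the worker count by both physical cores and available memory. This is a sketch, not something gt3x2csv does for you: `choose_cores()` is a hypothetical helper, and `gb_per_file` is an assumed peak memory use per file that you would need to measure on your own data.

```r
# Hypothetical helper: choose a worker count that respects both the number of
# physical cores and a memory budget. `gb_per_file` is an assumed peak memory
# footprint per file being processed; it is not measured by gt3x2csv.
choose_cores <- function(ram_gb, gb_per_file,
                         physical_cores = parallel::detectCores(logical = FALSE)) {
  # detectCores() can return NA on some platforms; fall back to one worker.
  if (is.na(physical_cores)) physical_cores <- 1L
  by_memory <- floor(ram_gb / gb_per_file)
  as.integer(max(1, min(physical_cores, by_memory)))
}

# Example: 32GB of RAM, an illustrative 3GB per file, 8 physical cores.
choose_cores(ram_gb = 32, gb_per_file = 3, physical_cores = 8)
#> [1] 8
```

The result could then be supplied as the cores value described above. With the illustrative 3GB per file, 16 workers would need about 48GB, which is consistent with 16 threads exhausting 32GB of RAM.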

Differences to ActiLife

I’ve validated gt3x2csv against the output from ActiLife using several different files. There are also tests to check that changes to the package do not muck this up. However, this package is provided with no guarantee, and you should test the output yourself.
