DARLInG

Domain Auto-labeling through Reinforcement Learning for the Inference of Gestures

This is the code repository for Yvan Satyawan's Master Thesis.

Preparing the dataset

We must first prepare the dataset by generating indexes of the data and calculating its mean and standard deviation for standardization later on.

1. Generate the small dataset, if desired.
   1. Generate an index for the small dataset using src/data_utils/generate_dataset_index.py.
      - To generate the single-user leave-out index, run generate_dataset_index.py [PATH_TO_DATA_ROOT] -u.
      - Use -s instead of -u to generate a single-domain index.
      - Add -n 3 to use only 3 repetitions instead of the full dataset. This is useful for debugging, since it produces a smaller dataset.
   2. Generate the smaller datasets using src/data_utils/generate_smaller_splits.py.
      - To generate the single-user split, run generate_smaller_splits.py [PATH_TO_DATA_ROOT] single_user.
      - single_user can be replaced with any index file suffix generated by generate_dataset_index.py.
2. Otherwise, generate the index of the full dataset using src/data_utils/generate_dataset_index.py.
3. Calculate the mean and standard deviation of amplitude and phase using src/data_utils/calculate_mean_std.py.
   - If not using the full dataset, copy both the indexes and the generated mean_std.csv into the smaller split directory.
4. Pregenerate the transformations using src/data_utils/pregenerate_transform.py, since they take too long to generate during training.
   - Run it with the config file of a given experiment: pregenerate_transform.py [PATH_TO_CONFIG_FILE].
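The mean and standard deviation step can be sketched as a single streaming pass over complex CSI values using Welford's online algorithm, so the whole dataset never has to sit in memory. This is a minimal sketch, not the contents of calculate_mean_std.py: it assumes global scalar statistics (rather than per-subcarrier or per-antenna ones), computes the population standard deviation, and uses the raw wrapped phase. The function name amplitude_phase_stats is hypothetical, and how the result is written to mean_std.csv is not shown.

```python
import cmath
import math


def amplitude_phase_stats(csi_values):
    """Mean and population std of amplitude and phase in one pass (hypothetical helper).

    Uses Welford's online algorithm. Statistics are global scalars, which is
    an assumption; the repository may compute them per subcarrier/antenna.
    """
    count = 0
    mean = [0.0, 0.0]  # [amplitude, phase]
    m2 = [0.0, 0.0]    # running sums of squared deviations from the mean
    for value in csi_values:
        count += 1
        features = (abs(value), cmath.phase(value))  # phase is wrapped to (-pi, pi]
        for idx, x in enumerate(features):
            delta = x - mean[idx]
            mean[idx] += delta / count
            m2[idx] += delta * (x - mean[idx])
    if count == 0:
        raise ValueError("no CSI values given")
    std = [math.sqrt(m / count) for m in m2]
    return {
        "amp_mean": mean[0], "amp_std": std[0],
        "phase_mean": mean[1], "phase_std": std[1],
    }
```

For example, amplitude_phase_stats([3 + 4j, 3 - 4j]) gives an amplitude mean of 5 with zero spread, and a phase mean of 0 with a spread of atan2(4, 3).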

Config File

Config files can be generated to ensure that runs are consistent. The configuration files used for our experimental runs are stored as YAML files in the run_configs directory. utils/config_parser.py contains the full list of possible configuration parameters.

Training the model

DARLInG is trained by experiments/train_runner.py. Run it with train_runner.py [PATH_TO_CONFIG_FILE].

Hyperparameter sweeps with Weights and Biases can be run by pointing Weights and Biases to utils/sweep_runner.py as the main file. This runner accepts command line arguments instead of a YAML file to initialize training. Internally, it converts the command line arguments into a dictionary and runs that dictionary through the standard config parser.
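By default, a Weights and Biases sweep agent passes each hyperparameter to the program as a --name=value argument. Turning those arguments into a dictionary can be sketched as below. This is not the code in utils/sweep_runner.py: the helper name sweep_args_to_dict, the dotted-key nesting convention, and the example keys are all hypothetical, and the real runner may use argparse with explicitly declared options instead.

```python
import ast


def sweep_args_to_dict(argv):
    """Turn ``--key=value`` arguments into a (possibly nested) dictionary.

    Dotted keys such as ``--optim.lr=0.001`` become nested dictionaries
    (a hypothetical convention). Values are parsed as Python literals when
    possible and kept as plain strings otherwise.
    """
    config = {}
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            raise ValueError(f"expected --key=value, got {arg!r}")
        key, raw_value = arg[2:].split("=", 1)
        try:
            value = ast.literal_eval(raw_value)  # numbers, booleans, lists, ...
        except (ValueError, SyntaxError, TypeError):
            value = raw_value  # plain strings such as "adam"
        node = config
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return config
```

For example, sweep_args_to_dict(["--optim.lr=0.001", "--optim.name=adam", "--epochs=20"]) returns {"optim": {"lr": 0.001, "name": "adam"}, "epochs": 20}, which could then be handed to a config parser in place of a parsed YAML file.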

All the other files

All the other files and scripts are documented with Python docstrings describing what they do and how they work, but they are usually not necessary for running training. train_runner.py is for general model training, train.py hosts the main training loop, and final_experiment_runner.py repeatedly calls train_runner.py on multiple configurations without causing memory issues. Every other script in experiments/ also documents its motivation, the question it attempts to answer, and the answer.
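One common way to run many configurations back to back without memory building up is to launch each run in its own Python process. The sketch below shows that idea; it is an assumption about the approach, not the code in final_experiment_runner.py. The helper name run_each_in_subprocess is hypothetical, and the default runner path assumes the script is launched from the repository root.

```python
import subprocess
import sys


def run_each_in_subprocess(config_paths, runner="experiments/train_runner.py"):
    """Run the training runner once per config file, each in a fresh process.

    When a child process exits, the operating system reclaims all of its
    memory (including GPU memory held by its CUDA context), so leaks in one
    run cannot accumulate into the next. Returns the exit code of each run.
    """
    exit_codes = []
    for config_path in config_paths:
        cmd = [sys.executable, str(runner), str(config_path)]
        print("Running:", " ".join(cmd))
        exit_codes.append(subprocess.run(cmd, check=False).returncode)
    return exit_codes
```

Returning the exit codes instead of raising on failure lets the remaining configurations still run if one of them crashes.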

Widar3.0

These are mostly notes to myself to understand what the data looks like.

The data is stored across many files, split into multiple folders. The folder names carry little meaning beyond indicating when the data was captured.

Gestures

Not all gestures are performed by all users. As such, we will only use gestures 1-6 in this work.

User \ Gesture	1	2	3	4	5	6	7	8	9	10	Total
1	130	130	130	130	130	130	65	65	65	40	1015
2	200	175	175	175	150	125	25	25	25	25	1100
3	150	150	150	125	125	125					825
4	25	25	25	25	25	25					150
5	50	50	50	50	50	50	25	25	25		375
6	50	50	50	50	50	50					300
7	25	25	25	25	25	25					150
8	25	25	25	25	25	25					150
9	25	25	25	25	25	25					150
10	25	25	25	25	25	25	25	25	25		225
11	25	25	25	25	25	25	25	25	25		225
12	25	25	25	25	25	25	25	25	25		225
13	25	25	25	25	25	25	25	25	25		225
14	25	25	25	25	25	25	25	25	25		225
15	25	25	25	25	25	25	25	25	25		225
16	25	25	25	25	25	25	25	25	25		225
17	25	25	25	25	25	25	25	25	25		225
Total	880	855	855	830	805	780	315	315	315	65	6015
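As a quick check of what restricting the data to gestures 1-6 keeps, the counts for those gestures can be copied from the table above and summed. Every one of the 17 users contributes samples, and 5005 of the 6015 samples (about 83%) remain. The variable names are our own.

```python
# Per-user sample counts for gestures 1-6, copied from the table above.
GESTURE_1_TO_6 = {
    1: [130, 130, 130, 130, 130, 130],
    2: [200, 175, 175, 175, 150, 125],
    3: [150, 150, 150, 125, 125, 125],
    4: [25] * 6,
    5: [50] * 6,
    6: [50] * 6,
    **{user: [25] * 6 for user in range(7, 18)},  # users 7-17
}

# Column sums: number of samples per gesture across all users.
per_gesture = [sum(counts[g] for counts in GESTURE_1_TO_6.values()) for g in range(6)]
total = sum(per_gesture)

print(per_gesture)  # [880, 855, 855, 830, 805, 780]
print(total)        # 5005
```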

CSI file

The CSI files are .dat files, which are simply CSI dumps from the tool used by the team to gather CSI data. The file naming convention is as follows:

id-a-b-c-d-Rx.dat

`id`	`a`	`b`	`c`	`d`	`Rx`
User ID	Gesture Class	Torso Location	Face Orientation	Repetition Number	Wi-Fi Receiver ID
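A file name following this convention can be parsed into its fields, for example to keep only gestures 1-6. This is a minimal sketch: it keeps id and Rx as strings because their exact format is not described here, and the example name used below is illustrative. parse_csi_filename and CsiFileInfo are hypothetical names.

```python
from pathlib import Path
from typing import NamedTuple


class CsiFileInfo(NamedTuple):
    user_id: str
    gesture: int
    torso_location: int
    face_orientation: int
    repetition: int
    receiver_id: str


def parse_csi_filename(path):
    """Split a CSI file name of the form id-a-b-c-d-Rx.dat into its fields."""
    p = Path(path)
    if p.suffix != ".dat":
        raise ValueError(f"not a .dat file: {path}")
    parts = p.stem.split("-")
    if len(parts) != 6:
        raise ValueError(f"unexpected CSI file name: {path}")
    user_id, a, b, c, d, rx = parts
    return CsiFileInfo(user_id, int(a), int(b), int(c), int(d), rx)
```

For example, parse_csi_filename("user1-2-3-4-5-r6.dat").gesture is 2, so a filter such as info.gesture <= 6 keeps only the gestures used in this work.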

Each recorded CSI sequence can be understood as a tensor with the shape (i, j, k, 1), where:

- i is the packet number
- j is the subcarrier number
- k is the receiver antenna number

In the case of Widar3.0, the value of k is always 3 (3 antennas per receiver). Widar3.0 uses 1 transmitter and 6 receivers placed around the sensing area.

We use the package csiread to read the file. The .dat files can be read using csiread.Intel.
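Reading a file and splitting it into amplitude and phase can be sketched as below. This assumes, as we understand csiread's documentation, that csiread.Intel accepts nrxnum and ntxnum keyword arguments and exposes the parsed CSI as a csi attribute of shape (packets, subcarriers, Rx antennas, Tx antennas); verify this against the installed version. The values nrxnum=3 and ntxnum=1 are our assumption, chosen to match the (i, j, k, 1) shape described above. read_csi and amplitude_and_phase are hypothetical helper names.

```python
import numpy as np


def read_csi(path):
    """Read one .dat file into a complex array of shape (i, j, k, 1).

    Assumption: csiread's Intel reader with 3 Rx antennas and 1 Tx antenna.
    """
    import csiread  # imported lazily so amplitude_and_phase works without it

    reader = csiread.Intel(str(path), nrxnum=3, ntxnum=1)
    reader.read()
    return reader.csi


def amplitude_and_phase(csi):
    """Drop the trailing singleton axis and split CSI into amplitude and phase."""
    csi = np.asarray(csi)
    if csi.ndim != 4 or csi.shape[-1] != 1:
        raise ValueError(f"expected shape (i, j, k, 1), got {csi.shape}")
    csi = csi[..., 0]  # -> (packets, subcarriers, receiver antennas)
    return np.abs(csi), np.angle(csi)  # phase is wrapped to (-pi, pi]
```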

BVP file

The BVP files are .mat files (MATLAB) that have been preprocessed by the authors. The file naming convention is as follows:

id-a-b-c-d-suffix.mat

`id`	`a`	`b`	`c`	`d`
User ID	Gesture Class	Torso Location	Face Orientation	Repetition Number

The suffix is not documented. As far as I understand, it is just the configuration used to produce the BVP. There is no receiver ID, since all 6 receivers were combined to produce the BVP.
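Because the suffix is undocumented and may itself contain dashes, a parser should split off only the first five fields and keep the rest as one string. A minimal sketch, assuming that layout; the example name in the usage note is illustrative, and parse_bvp_filename is a hypothetical helper.

```python
from pathlib import Path


def parse_bvp_filename(path):
    """Split a BVP file name of the form id-a-b-c-d-suffix.mat into its fields.

    Only the first five dashes are used as separators, so a suffix that
    contains dashes is kept intact.
    """
    p = Path(path)
    if p.suffix != ".mat":
        raise ValueError(f"not a .mat file: {path}")
    parts = p.stem.split("-", 5)
    if len(parts) != 6:
        raise ValueError(f"unexpected BVP file name: {path}")
    user_id, a, b, c, d, suffix = parts
    return {
        "user_id": user_id,
        "gesture": int(a),
        "torso_location": int(b),
        "face_orientation": int(c),
        "repetition": int(d),
        "suffix": suffix,
    }
```

For example, parse_bvp_filename("user1-2-3-4-5-1-1e-07-100-20-100000-L0.mat") yields gesture 2, repetition 5, and the suffix "1-1e-07-100-20-100000-L0".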

Each file contains a 20x20xT tensor, where T is the number of timestamps.

Dimension	Meaning
0	Velocity along x-axis from [-2, +2] m/s
1	Velocity along y-axis from [-2, +2] m/s
2	Timestamp with 10 Hz sampling rate

We use SciPy to read the .mat files with scipy.io.loadmat().

BVP lengths are not consistent, so we pad them to all have the same length of 28 timestamps.
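Loading and padding a BVP can be sketched as follows. The sketch assumes each .mat file holds exactly one data variable (so its name does not matter), that padding means appending zeros at the end of the time axis, and that no BVP is longer than 28 timestamps; the repository may handle these cases differently. load_bvp and pad_bvp are hypothetical names.

```python
import numpy as np

BVP_LENGTH = 28  # target number of timestamps


def load_bvp(path):
    """Load a BVP .mat file and return its 20 x 20 x T array."""
    from scipy.io import loadmat  # lazy import; only needed for real files

    contents = loadmat(path)
    # loadmat adds metadata entries such as __header__; skip them.
    arrays = [v for k, v in contents.items() if not k.startswith("__")]
    if len(arrays) != 1:
        raise ValueError(f"expected one variable in {path}, found {len(arrays)}")
    return arrays[0]


def pad_bvp(bvp, length=BVP_LENGTH):
    """Zero-pad a (20, 20, T) BVP along the time axis to (20, 20, length)."""
    bvp = np.asarray(bvp)
    if bvp.ndim == 2:
        # MATLAB drops trailing singleton dimensions, so a single-frame
        # BVP may load as 20 x 20; restore the time axis.
        bvp = bvp[:, :, np.newaxis]
    t = bvp.shape[2]
    if t > length:
        raise ValueError(f"BVP has {t} timestamps, more than {length}")
    return np.pad(bvp, ((0, 0), (0, 0), (0, length - t)))
```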

Known Issues

The directory 20181130-VS contains no 6-link subdirectory.

In vs Out of Domain

As we are testing in vs out of domain performance, we will use the following data split for train, validation, and test.

Set	Room IDs	User IDs	Torso Location
Training	1, 2	1, 2, 4, 5	1-5
Validation	1	10-17	1-5
Test Room	3	3, 7, 8, 9	1-5
Test Torso Location	1	1	6-8

We split the data this way so that the test sets contain domains never seen during training, while the validation set is only an unseen room-user combination: its room also appears in training, but its users do not.
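The split table above can be encoded as a lookup, sketched below. It is a hypothetical helper: the split names are made up, and it assumes the room ID is already known for each sample. The CSI and BVP file names do not contain a room ID, so it would have to be derived from the capture folders, which is not shown here.

```python
# name: (room IDs, user IDs, torso locations), following the table above
SPLITS = {
    "train": ({1, 2}, {1, 2, 4, 5}, range(1, 6)),
    "validation": ({1}, set(range(10, 18)), range(1, 6)),
    "test_room": ({3}, {3, 7, 8, 9}, range(1, 6)),
    "test_torso_location": ({1}, {1}, range(6, 9)),
}
GESTURES = range(1, 7)  # only gestures 1-6 are used


def assign_split(room, user, torso_location, gesture):
    """Return the split a sample belongs to, or None if it is not used."""
    if gesture not in GESTURES:
        return None
    for name, (rooms, users, torsos) in SPLITS.items():
        if room in rooms and user in users and torso_location in torsos:
            return name
    return None
```

The four entries are disjoint, so the iteration order of SPLITS does not affect the result.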

We only use gestures 1-6, since these are the gestures which have samples from all participants.

For the single-domain setting, we use only User 2 in Room 1 with torso location 1 and face orientation 1. This combination was chosen because it has the largest number of samples. The test, validation, and training splits are randomly generated.
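A random split like this can be sketched with a seeded shuffle, which makes it reproducible across runs. The 70/15/15 fractions and the seed are placeholders, since the ratios actually used are not stated here; random_split is a hypothetical helper.

```python
import random


def random_split(samples, fractions=(0.7, 0.15, 0.15), seed=0):
    """Shuffle samples with a fixed seed and cut them into train/val/test lists."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # private RNG; global state untouched
    n_train = int(len(shuffled) * fractions[0])
    n_val = int(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, so no sample is dropped
    return train, val, test
```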
