uftm-biostat-database's Introduction

UFTM BioStat DataBase

About

This is an automatic generator of fictitious databases, originally meant as a support tool for Biostatistics classes.

This script generates large, flexible databases for practicing the concepts learned in Biostatistics (and Statistics) classes. It comprises a wide range of variables of many types, with underlying relationships between them, designed to invite exploration. All values are randomized according to rules described in !!!ADD DOCUMENTATION!!!, many of them inspired by publicly available real data. Some easter eggs may be present: if one of my friends' names appears in the table, the data for that row is forced to match theirs.

Background

This project was inspired by my experience as a teaching assistant for the Med School Biostatistics class at Universidade Federal do Triângulo Mineiro in the 2022/1 semester. Each student was asked to build a fictitious scientific paper, step by step, based on a real paper of their choice. However, many students had trouble reusing the paper's existing variables and creating their own in a way that made the best use of the syllabus. They also struggled with populating their databases by hand and with choosing the distribution of their data. While helping one student automate the creation of her data, we started applying real or reality-inspired conditions to the random generators. I later decided to expand that idea into a large database, flexible enough to remove the need for a source paper while still allowing variety across the entire class.

Table of contents

  • [Status](#status)

Status

Under construction

Installation

Usage

This application takes two basic parameters: n and seed. The n parameter sets how many observations the database will contain, and seed is used to guarantee reproducibility. Additionally, the chances used for each variable may be tweaked as desired.
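As a minimal sketch of how n and seed interact, the example below uses a hypothetical `generate_database()` function. The function name, the `p_male` argument and the columns are assumptions made for illustration and are not the project's actual interface. It only shows that fixing the seed before drawing random values makes two runs produce the same table.

```r
# Hypothetical stand-in for the real generator (name and columns are assumptions).
generate_database <- function(n, seed, p_male = 0.5) {
  set.seed(seed)  # fixing the seed makes every random draw below reproducible
  sex <- sample(c("male", "female"), size = n, replace = TRUE,
                prob = c(p_male, 1 - p_male))  # a tweakable "chance"
  age <- sample(18:90, size = n, replace = TRUE)
  data.frame(id = seq_len(n), sex = sex, age = age)
}

db_a <- generate_database(n = 1000, seed = 42)
db_b <- generate_database(n = 1000, seed = 42)

identical(db_a, db_b)  # TRUE: same n and seed give the same database
```

Changing the seed gives a different but equally reproducible database, which is one way to hand each student a distinct dataset.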

The resulting database was originally intended to be used as a population from which samples can be taken, as a set of samples created for each student, or as a mix of both. I suggest using a large n to create a population database, from which each student devises their own analysis plan. A first sampling step then stands in for data available from the literature or from a small study, and is used to calculate the sample size for the proper analysis. A second sampling step provides the data for the actual analysis project. The application is designed so that students may analyze the data as-is or adjust their population's attributes as desired.
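The suggested workflow could look roughly like the sketch below. The `population` data frame is a stand-in built on the spot, and the pilot size of 30, the 5 kg difference to detect, and the power and significance levels are all assumptions chosen for illustration. `power.t.test()` comes from base R's stats package.

```r
set.seed(2022)

# Stand-in population; in practice this would be the generated database.
population <- data.frame(
  id     = seq_len(10000),
  weight = rnorm(10000, mean = 70, sd = 12)
)

# First sampling: a small "pilot" that plays the role of literature data.
pilot    <- population[sample(nrow(population), size = 30), ]
pilot_sd <- sd(pilot$weight)

# Sample size per group to detect a 5 kg difference (assumed effect size).
n_per_group <- ceiling(
  power.t.test(delta = 5, sd = pilot_sd, power = 0.8, sig.level = 0.05)$n
)

# Second sampling: the actual study, drawn from rows not used in the pilot.
remaining <- population[!(population$id %in% pilot$id), ]
study     <- remaining[sample(nrow(remaining), size = 2 * n_per_group), ]
```

Excluding the pilot rows from the second draw is a design choice of this sketch, made so the two samples stay independent.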

Requirements

Author



Credits

License

uftm-biostat-database's People

Contributors

paulk2jonas

uftm-biostat-database's Issues

Bug: Absurd weight values

male_weight <- c("mean" = 74.6, "sd" = 24.5)
female_weight <- c("mean" = 65.1, "sd" = 23.1)

These standard deviations are likely too large and allow absurd values: not only single-digit weights for an adult, but even negative weights.

I should review these values and add a minimum weight for each age/height. I should also add extreme (minimum and maximum) values for every generator.
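One possible way to bound a generator is sketched below. It is not the project's implementation. The helper `rnorm_bounded()` is a hypothetical name, and the limits of 40 kg and 200 kg are placeholder assumptions, not reviewed values. The sketch first checks how likely an implausibly low draw is under the current parameters, then redraws any value that falls outside the limits.

```r
male_weight <- c("mean" = 74.6, "sd" = 24.5)

# Share of unbounded draws expected at or below 10 kg with these parameters.
pnorm(10, mean = male_weight[["mean"]], sd = male_weight[["sd"]])

# Hypothetical helper: redraw values until all fall inside [lower, upper].
rnorm_bounded <- function(n, mean, sd, lower, upper) {
  out <- rnorm(n, mean, sd)
  bad <- out < lower | out > upper
  while (any(bad)) {
    out[bad] <- rnorm(sum(bad), mean, sd)  # resample only the offending values
    bad <- out < lower | out > upper
  }
  out
}

set.seed(1)
weights <- rnorm_bounded(1000, male_weight[["mean"]], male_weight[["sd"]],
                         lower = 40, upper = 200)  # placeholder limits
range(weights)
```

Redrawing out-of-range values truncates the distribution, so the resulting mean and standard deviation will differ somewhat from the nominal parameters. Reducing the standard deviation itself may still be necessary.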
