
fix simulated data for walkthrough 4 (data-science-in-education, closed)

data-edu commented on September 25, 2024

fix simulated data for walkthrough 4

from data-science-in-education.

Comments (8)

restrellado commented on September 25, 2024

Here's what I was thinking about. I added you as a collaborator so we can mess around with it more. I haven't had a chance to update my repo and run all your code yet, so I don't know if this can be bolted neatly onto your simulated dataset, but the rough idea is there. We can adjust the parameters so the data is more plausible, then transform it into the shape you want. Once it's appended to your dataset, you'll have a whole subset in there with higher, related yvar1 and yvar2 values. Am I on the right track?


jrosen48 commented on September 25, 2024

Just as a bit of context, this walkthrough involves an expansion of a blog post I wrote; there, I used a very small data set (n = 6!). I hoped this example would involve a larger data set, but simulating the data led to this addressable, but challenging, issue.


restrellado commented on September 25, 2024

I wonder if one solution is to generate a separate table of higher-value yvar1 and higher-value yvar2. yvar2 could be generated as yvar1 * a coefficient + an error term (assuming you want a linear relationship). Once you have that, you can append the new dataset to the one you were using in the walkthrough. I can work up a reprex but just wanted to throw the idea out there in case you get to it before me.
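A minimal sketch of that idea in Python, using only the standard library. The function name `simulate_high_pairs`, the coefficient, the noise level, and the yvar1 mean and spread are all placeholder assumptions for illustration, not values from the walkthrough.

```python
import random


def simulate_high_pairs(n, coef=0.8, noise_sd=0.3,
                        yvar1_mean=4.0, yvar1_sd=0.5, seed=20190101):
    """Simulate n rows where yvar2 = coef * yvar1 + normal error.

    All defaults are illustrative assumptions; tune them until the
    simulated values look plausible next to the existing dataset.
    """
    rng = random.Random(seed)  # local generator so the result is reproducible
    rows = []
    for _ in range(n):
        yvar1 = rng.gauss(yvar1_mean, yvar1_sd)        # "higher value" yvar1
        yvar2 = coef * yvar1 + rng.gauss(0, noise_sd)  # linear relation plus error
        rows.append({"yvar1": yvar1, "yvar2": yvar2})
    return rows


# A block of related, higher-valued rows that could be appended
# to the existing simulated data.
high_rows = simulate_high_pairs(200)
```

Lowering `noise_sd` tightens the relationship between the two variables; raising it loosens the relationship.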


jrosen48 commented on September 25, 2024

That would be super helpful.


a-rosenberg commented on September 25, 2024

Here's my probably-too-complex example, similar to @restrellado's solution. I was thinking of a way to add some non-normal noise to make it realistic, and to be able to change the proportion of edges that express the relationship.

Sorry re: the Python but it's so easy to mock up!


jrosen48 commented on September 25, 2024

Thanks all. The key point I am still struggling to address is that yvar1 and yvar2 do not need to be correlated within a person; rather, nominators with higher values of yvar2 need to have relations with nominees with higher levels of yvar1. Each nominator can report relations with, say, 1 to 10 nominees (this is up to us).

I.e., here's an edgelist of relations (which could be "weighted" but here are not; they are just 1 for every relation):

library(simstudy)
#> Loading required package: data.table

set.seed(20190101)

def <- defData(varname = "nominator", dist = "categorical", formula = catProbs(n = 200)) 
def <- defData(def, varname = "nominee", dist = "categorical", formula = catProbs(n = 200))
def <- defData(def, varname = "relate", dist = "nonrandom", formula = 1)

data1 <- genData(500, def)
data1
#>       id nominator nominee relate
#>   1:   1       147     100      1
#>   2:   2        31      86      1
#>   3:   3        93     178      1
#>   4:   4       105      10      1
#>   5:   5       102      88      1
#>  ---                             
#> 496: 496        83     199      1
#> 497: 497        61     148      1
#> 498: 498       149      36      1
#> 499: 499       156     163      1
#> 500: 500       118     161      1

I.e., in the first row, the (hypothetical) nominator with ID 147 reported a relation with the nominee with ID 100. Imagine that nominator 147 also reported a relation with the nominee with ID 101, and that nominator 147 has a high value for yvar2. In this case, nominees 100 and 101 would both have higher values of yvar1.
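One possible way to get that cross-person pattern, sketched in Python with only the standard library. This is an assumption about how it could be done, not the approach the walkthrough ended up using: each person gets independent yvar1 and yvar2 values, and each nominator picks nominees with probability proportional to exp(strength * yvar2 of the nominator * yvar1 of the candidate). The function name `simulate_edgelist` and the `strength` parameter are made up for illustration.

```python
import math
import random


def simulate_edgelist(n_people=200, max_nominees=10, strength=1.5,
                      seed=20190101):
    """Simulate (nominator, nominee, relate) rows.

    yvar1 and yvar2 are drawn independently for each person, so they are
    not built to correlate within a person. The dependence enters only
    through who nominates whom. All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    ids = list(range(1, n_people + 1))
    yvar1 = {i: rng.gauss(0, 1) for i in ids}
    yvar2 = {i: rng.gauss(0, 1) for i in ids}

    edges = []
    for nominator in ids:
        candidates = [i for i in ids if i != nominator]  # no self-nominations
        # A nominator with high yvar2 favors candidates with high yvar1;
        # a nominator with low yvar2 favors candidates with low yvar1.
        weights = [math.exp(strength * yvar2[nominator] * yvar1[c])
                   for c in candidates]
        k = rng.randint(1, max_nominees)  # between 1 and max_nominees relations
        chosen = set()
        while len(chosen) < k:  # redraw until k distinct nominees are picked
            chosen.add(rng.choices(candidates, weights=weights, k=1)[0])
        for nominee in sorted(chosen):
            edges.append((nominator, nominee, 1))  # unweighted: relate is 1
    return edges, yvar1, yvar2


edges, yvar1, yvar2 = simulate_edgelist()
```

Setting `strength=0` makes every nominator choose uniformly at random, which gives a no-relationship baseline to compare against.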


ivelasq commented on September 25, 2024

Hi Josh, just FYI: when I run line 25 in Walkthrough 4, I get the error: no applicable method for 'select_' applied to an object of class "function"


restrellado commented on September 25, 2024

@jrosen48 I think SNA is working now so I'm going to close this. Feel free to reopen if I got that wrong. Thanks y'all!
