neocache's Introduction

Hi, I'm Alex 👋

I'm a PhD candidate in the University of Wisconsin-Madison statistics program. My GitHub is a mixture of research code, #rstats ✨ contributions, and personal data analysis projects. I write long-form explainers on my blog, https://www.alexpghayes.com/.

Research software

  • fastadi performs self-tuning matrix completion via adaptive thresholding, often outperforming softImpute. See the paper for algorithmic and theoretical details. As part of some work on citation networks, I have also extended this algorithm to matrices where the entire upper triangle is observed.

  • aPPR helps you calculate approximate personalized PageRanks from large graphs, including graphs that can only be queried via an API. aPPR additionally performs degree correction and regularization, allowing users to recover blocks from stochastic blockmodels. Read the paper.

  • vsp performs semi-parametric estimation of latent factors in random dot product graphs by computing varimax rotations of the spectral embeddings of graphs. The resulting factors are sparse and interpretable. Read the paper.

  • fastRG samples random dot product graphs much faster than naive sampling procedures and is especially useful for simulation studies. See the paper for a description of the core fastRG algorithm.

#rstats

I am involved in a number of open source projects in the tidyverse and tidymodels orbits. I previously maintained the broom package, which currently has ~6 million downloads, and I am an author on the tidyverse paper for my contributions. I intermittently participate in the Stan and rOpenSci communities as well.

Teaching materials

Other projects

Please get in touch if...

  • you'd like to hire me for a research or data science for social good internship,
  • you want to discuss design of statistical modeling software,
  • you want to collaborate on a research project, or
  • you want to write an explainer together.

Outside of R, I'm a proficient Python user, and can pull together enough SQL, C++, and Julia to get things done.

I am responsive via email.

Last updated 2023-10-20.

neocache's People

Contributors

alexpghayes, nathankolbow

neocache's Issues

Some code for sanity checking

library(logger)
library(neocache)

log_threshold(TRACE, namespace = "neocache")

nc_empty_cache("test", check_with_me_first = FALSE)

# SomervilleYIMBY_userid <- rtweet::lookup_users("SomervilleYIMBY")$user_id
SomervilleYIMBY_userid <- "854358992876916739"

somerville_friends <- nc_get_friends(SomervilleYIMBY_userid, "test")

dead_id_followed_by_SomervilleYIMBY <- "15089927"

dead_id_followed_by_SomervilleYIMBY %in% somerville_friends$to

# this whole query is "invalid"
e <- nc_lookup_users(dead_id_followed_by_SomervilleYIMBY, "test")

# only part of this query is "invalid"
nc_lookup_users(
  c(
    SomervilleYIMBY_userid,
    dead_id_followed_by_SomervilleYIMBY
  ),
  "test"
)

# but information from the valid user appears to get copied onto the dead ID's
# record somehow. blergh. sadly, this happens at the Neo4j level

neocache:::db_lookup_users(dead_id_followed_by_SomervilleYIMBY, get_cache("test"))

Error in `stop_vctrs()`: Can't combine `..1$to_id` <character> and `..3$to_id` <list>.

This is a bug in the new type safety branch.

> set.seed(27)
> log_appender(
+   appender_file(
+     here("logs/appr.log")
+   ),
+   namespace = "aPPR"
+ )
> log_threshold(TRACE, namespace = "aPPR")
> log_appender(
+   appender_file(
+     here("logs/neocache.log")
+   ),
+   namespace = "neocache"
+ )
> log_threshold(TRACE, namespace = "neocache")
> # seeds <- c("omaclaren", "Corey_Yanofsky")
> seeds <- c("nytimes", "washingtonpost")
> tracker <- appr(
+   neocache_graph(),
+   seed = seeds,
+   epsilon = 1e-7
+ )
Error in `stop_vctrs()`: Can't combine `..1$to_id` <character> and `..3$to_id` <list>.                                                                                               
Run `rlang::last_error()` to see where the error occurred.
Warning message:
Skipping unauthorized account: 828068090 
> rlang::last_error()
<error/vctrs_error_incompatible_type>
Error in `stop_vctrs()`: Can't combine `..1$to_id` <character> and `..3$to_id` <list>.
Backtrace:
  1. aPPR::appr(neocache_graph(), seed = seeds, epsilon = 1e-07)
  4. aPPR:::appr.abstract_graph(neocache_graph(), seed = seeds, epsilon = 1e-07)
  5. tracker$calculate_ppr()
       at aPPR/R/aPPR.R:158:2
 11. neocache::neighborhood.neocache_graph(graph = self$graph, node = u)
       at aPPR/R/abstract-graph.R:73:2
 12. neocache::nc_get_friends(node, cache_name = graph$cache_name)
       at neocache/R/appr.R:152:2
 13. dplyr::bind_rows(...)
       at neocache/R/get_friends.R:80:2
 14. vctrs::vec_rbind(!!!dots, .names_to = .id)
 15. vctrs `<fn>`()
 16. vctrs::vec_default_ptype2(...)
 17. vctrs::stop_incompatible_type(...)
 18. vctrs:::stop_incompatible(...)
 19. vctrs:::stop_vctrs(...)
Run `rlang::last_trace()` to see the full context.

Tail of aPPR log

TRACE [2022-02-16 06:44:21] 
TRACE [2022-02-16 06:44:21] new bad: 1288578537104896000
INFO [2022-02-16 06:44:21] Visits: 1593 total / 1282 unique out of max of Inf / 2462 remaining.
TRACE [2022-02-16 06:44:21] Visting 2270614394
DEBUG [2022-02-16 06:44:25] 1 known good / 0 known bad / 1 new good / 0 new bad
TRACE [2022-02-16 06:44:25] known good: 807095
TRACE [2022-02-16 06:44:25] 
TRACE [2022-02-16 06:44:25] new good: 490204295
TRACE [2022-02-16 06:44:25] 
TRACE [2022-02-16 06:44:25] Adding node(s) to tracker: 490204295
INFO [2022-02-16 06:44:27] Visits: 1594 total / 1283 unique out of max of Inf / 2461 remaining.
TRACE [2022-02-16 06:44:27] Visting 336020675
DEBUG [2022-02-16 06:45:08] 1540 known good / 9 known bad / 1350 new good / 34 new bad
...
INFO [2022-02-16 06:45:12] Visits: 1595 total / 1284 unique out of max of Inf / 2460 remaining.
TRACE [2022-02-16 06:45:12] Visting 1223353613923184640

neocache log

DEBUG [2022-02-16 06:45:11] Done getting node degrees
DEBUG [2022-02-16 06:45:12] Getting neighborhood: 1223353613923184640
TRACE [2022-02-16 06:45:12] nc_get_friends(): 1223353613923184640
TRACE [2022-02-16 06:45:12] Retrieving aPPR cache metadata file ... 
TRACE [2022-02-16 06:45:12] Retrieving aPPR cache metadata file ... done.
TRACE [2022-02-16 06:45:12] Activating aPPR cache ...
TRACE [2022-02-16 06:45:12] Activating aPPR cache ... already active
TRACE [2022-02-16 06:45:12] Getting friend sampling status for 1 users ...
TRACE [2022-02-16 06:45:12] friend_sampling_status(): 1223353613923184640
DEBUG [2022-02-16 06:45:12] Looking up information on 1 users in Neo4J DB ...
TRACE [2022-02-16 06:45:12] Looking up information on 1 users in Neo4J DB ... user(s): 1223353613923184640
DEBUG [2022-02-16 06:45:12] Looking up information on 1 users in Neo4J DB ... 1 found in Neo4J database.
TRACE [2022-02-16 06:45:12] data is 1 x 19 with type signature
TRACE [2022-02-16 06:45:12] friends_count = integer(0); profile_image_url_https = character(0); listed_count = integer(0); default_profile_image = logical(0); favourites_count = integer(0); verified = logical(0); created_at = character(0); description = character(0); url = character(0); profile_banner_url = character(0); protected = logical(0); screen_name = character(0); statuses_count = integer(0); sampled_at = character(0); default_profile = logical(0); followers_count = integer(0); id_str = character(0); name = character(0); location = character(0)
TRACE [2022-02-16 06:45:12] typecast is 1 x 19 with type signature
TRACE [2022-02-16 06:45:12] friends_count = integer(0); profile_image_url_https = character(0); listed_count = integer(0); default_profile_image = logical(0); favourites_count = integer(0); verified = logical(0); created_at = numeric(0); description = character(0); url = character(0); profile_banner_url = character(0); protected = logical(0); screen_name = character(0); statuses_count = integer(0); sampled_at = numeric(0); default_profile = logical(0); followers_count = integer(0); id_str = character(0); name = character(0); location = character(0)
TRACE [2022-02-16 06:45:13] Getting friend sampling status for 1 users ... 0 in graph with friends already sampled / 1 in graph with friends not already sampled / 0 not in graph
TRACE [2022-02-16 06:45:13] 
TRACE [2022-02-16 06:45:13] add_friend_edges_to_nodes_in_graph(): 1223353613923184640
INFO [2022-02-16 06:45:13] Making API request with rtweet::get_friends for 1 users
TRACE [2022-02-16 06:45:13] Making API request with rtweet::get_friends for: 1223353613923184640 ... results received.
TRACE [2022-02-16 06:45:13] Parsing results from API ... 
TRACE [2022-02-16 06:45:13] Parsing results from API ... columns renamed.
TRACE [2022-02-16 06:45:13] Parsing results from API ... done.
TRACE [2022-02-16 06:45:13] Parsed friend list of 1 users returned from API:
TRACE [2022-02-16 06:45:13] 
TRACE [2022-02-16 06:45:13] ------------- -----------
TRACE [2022-02-16 06:45:13]  **from_id**   **to_id** 
TRACE [2022-02-16 06:45:13] ------------- -----------
TRACE [2022-02-16 06:45:13] 
TRACE [2022-02-16 06:45:13] Adding up to 1 new users to Neo4J DB ...
TRACE [2022-02-16 06:45:13] Adding up to 1 new users to Neo4J DB ... done
TRACE [2022-02-16 06:45:13] Adding 0 edges from API result into Neo4J graph ...
TRACE [2022-02-16 06:45:13] Adding 0 edges from API result into Neo4J graph ... done
TRACE [2022-02-16 06:45:13] Setting sampled_friends_at for 1 users ...
TRACE [2022-02-16 06:45:13] Setting sampled_friends_at for 1 users ... done
TRACE [2022-02-16 06:45:13] Getting cached friends of {length(status$sampled_friends_at_not_null)} users ...
TRACE [2022-02-16 06:45:13] 
TRACE [2022-02-16 06:45:14] Getting cached friends of {length(status$sampled_friends_at_not_null)} users ... done
TRACE [2022-02-16 06:45:14] new_edges is 0 x 2 with type signature
TRACE [2022-02-16 06:45:14] from_id = character(0); to_id = character(0)
TRACE [2022-02-16 06:45:14] upgraded_edges is 0 x 2 with type signature
TRACE [2022-02-16 06:45:14] from_id = character(0); to_id = list()
TRACE [2022-02-16 06:45:14] existing_edges is 0 x 2 with type signature
TRACE [2022-02-16 06:45:14] from_id = character(0); to_id = character(0)
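The neocache log above shows that, for a user whose friend list comes back empty, `upgraded_edges` is a 0 x 2 table whose `to_id` column is a list rather than a character vector. `dplyr::bind_rows()` then refuses to combine it with the character `to_id` columns of `new_edges` and `existing_edges`. One possible workaround, sketched below in base R, is to coerce `from_id` and `to_id` to character before binding. `normalize_edges()` is a hypothetical helper written for illustration only. It is not part of neocache, it assumes every list element holds exactly one ID, and it is not necessarily how the type safety branch resolved the bug. The edge IDs in the example are placeholders.

```r
# Hypothetical helper (not part of neocache): coerce the from_id / to_id
# columns of an edge table to plain character vectors, so edge tables
# built by different code paths share one column type before binding.
normalize_edges <- function(edges) {
  for (col in c("from_id", "to_id")) {
    x <- edges[[col]]
    if (is.list(x)) {
      # Assumption: each list element holds exactly one ID; fail loudly otherwise
      if (any(lengths(x) != 1L)) {
        stop("`", col, "` contains list elements that are not length 1")
      }
      # unlist(list()) is NULL, and as.character(NULL) is character(0)
      x <- unlist(x, use.names = FALSE)
    }
    # Also turns factors (pre-R 4.0 data.frame default) into character
    edges[[col]] <- as.character(x)
  }
  edges
}

# Edge tables mirroring the type signatures in the neocache log
new_edges <- data.frame(from_id = character(0), to_id = character(0))
upgraded_edges <- structure(
  list(from_id = character(0), to_id = list()),  # list-column, as in the log
  class = "data.frame",
  row.names = integer(0)
)
existing_edges <- data.frame(from_id = "1", to_id = "2")  # placeholder IDs

edge_tables <- lapply(
  list(new_edges, upgraded_edges, existing_edges),
  normalize_edges
)

# Base R combine; once the column types agree, dplyr::bind_rows(edge_tables)
# should no longer hit the <character> vs <list> error
edges <- do.call(rbind, edge_tables)
str(edges)
```

Coercing at the point where edge tables are combined keeps the fix local, but the cleaner long-term option is probably to make the code path that produces `upgraded_edges` return a character `to_id` even when it has zero rows.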

Release neocache 0.1.0

First release:

Prepare for release:

  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • Review pkgdown reference index for, e.g., missing topics
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • usethis::use_news_md()
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu
