m-clark / book-of-models

Spells for everyday living. (also a book coming out in 2024)

Home Page: https://m-clark.github.io/book-of-models/

License: Other

TeX 16.36% R 3.05% SCSS 0.44% Jupyter Notebook 80.08% Python 0.07%

linear-models machine-learning python r-language

book-of-models's Introduction

Michael's GitHub Space

I'm Michael, and I do all manner of things within the realm of data science. Here you'll find source code for modeling, packages, general programming, and various other things.

book-of-models's People

Forkers

jaedukseo ntluong95 lokesh67 kylehamilton jackmanners

book-of-models's Issues

Put data imports in setup

For our own convenience, move data imports to top setup chunk so that we don't have to hunt.

chapter goals

Maybe the intro/key-ideas sections are enough, but should we add a 'goals' section at the beginning of each chapter? For example, for the LM chapter:

Chapter goals:

Understand what a model is conceptually
Understand what a linear model is and how features are mapped to the target
Be able to get predictions from our model
Be able to understand the results of a model at a basic level
Get a sense of complexity and other issues

Bigger changes

All chapters should have key ideas, why this matters, good to know, where to go, wrapping up sections (and exercises if appropriate); reference per chapter to be determined (#17)
All chapters/sections (at least level 2) need to be linkable and linked throughout the text
Add content to preface/acknowledgments/misc models
#19
Spelling/typo checks

Data

Data needs to be linked. Possibly early on like even in preface.
There should be only one version of a single data source being used (e.g. not one with and without some features)

Potential Content Trimming

Merge overlapping content
- LM and model exploration chapter
- Data, and ML other models chap, and misc models chap
ML concepts gets pretty deep, could trim a lot there
Could remove what amounts to a demo chapter in the intro
Could remove some appendix content (matrix, nix simulation, bayes)
Drop some deeper code examples (e.g. estimation beyond the estimation chapter may need to be nixed for length). Just comment out.
LM Part needs consistency update

Code

code needs to be cleaned/made consistent between SB and MC
- R:
- No <- assignment for R code
- 4 space indent?
- Py:
- uglify to PEP 8 standards
- Other?
- General:
- No more than 80 char for pdf
- !! Make sure the printed code runs as is !!
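The 80-character limit above could be checked mechanically rather than by eye. The following is a minimal sketch, assuming code lives in fenced chunks inside .qmd files; the function name `long_code_lines` and the sample text are hypothetical and not part of the repo.

```python
FENCE = "`" * 3  # built this way to avoid a literal fence inside this example


def long_code_lines(qmd_text, max_len=80):
    """Return (line_number, length) pairs for chunk lines longer than max_len."""
    hits = []
    in_chunk = False
    for number, line in enumerate(qmd_text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_chunk = not in_chunk  # a fence line opens or closes a chunk
            continue
        if in_chunk and len(line) > max_len:
            hits.append((number, len(line)))
    return hits


# Hypothetical document: long prose is ignored, long code is flagged.
sample = "\n".join([
    "Some prose " + "x" * 100,
    FENCE + "{r}",
    "y = 1",
    "z = " + "1" * 90,
    FENCE,
])
print(long_code_lines(sample))  # [(4, 94)]
```

This only measures line length; it does not confirm that the printed code runs as is, which still needs an actual render.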

Tables & Figures

Make sure all figures and tables are linkable/cross-referenceable
Make sure no labels have underscores (latex can't handle it)
Make sure no gt tables use color (latex can't handle it)
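The underscore rule for labels can also be scanned for automatically. This is a rough sketch, assuming labels appear either as `#| label:` chunk options or as `{#...}` attributes in the text; the pattern and the name `labels_with_underscores` are illustrative assumptions, not existing project code.

```python
import re

# Matches a label after '#| label:' or after '{#'.
LABEL_PATTERN = re.compile(r"(?:#\|\s*label:\s*|\{#)([A-Za-z0-9_-]+)")


def labels_with_underscores(qmd_text):
    """Return every label in the text that contains an underscore."""
    return [label for label in LABEL_PATTERN.findall(qmd_text) if "_" in label]


# Hypothetical labels for illustration only.
sample = "\n".join([
    "#| label: fig-max_like",
    "#| label: tbl-metrics",
    "![Plot](p.svg){#fig-loss_curve}",
    "## Title {#sec-title}",
])
print(labels_with_underscores(sample))  # ['fig-max_like', 'fig-loss_curve']
```

The color check for gt tables is not covered here, since it depends on how each table is built.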

Other that needs to be done

Add simulation demo/discussion if space permits
Chapter title/section casing (use css if possible)
Dataset descriptions
bold should be key words, italics just for emphasis
Save out figures as separate files if not already. Preferably svg.
Move programming discussion in ML to a general one in Part 3 or appendix. Trim discussion in intro?
quarto-dev/quarto-cli#7856
Fix missing crossref

Lesser issues

Remove chapter-specific folders; keep everything in just data and img.
All the graphical models need separate dot files and should look the same
All chunks should be named
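Finding unnamed chunks is easy to script. The sketch below assumes a chunk counts as named if it has either a knitr-style inline label (for example `{r setup}`) or a `#| label:` option line directly under the opening fence; `unnamed_chunks` is a hypothetical helper name.

```python
import re

FENCE = "`" * 3  # built this way to avoid a literal fence inside this example
CHUNK_OPEN = re.compile(r"^" + FENCE + r"\{(\w+)([^}]*)\}\s*$")


def unnamed_chunks(qmd_text):
    """Return line numbers of executable chunks that have no label."""
    lines = qmd_text.splitlines()
    missing = []
    for index, line in enumerate(lines):
        match = CHUNK_OPEN.match(line.strip())
        if not match:
            continue
        # Inline label: text after the language and before any comma.
        inline = match.group(2).split(",")[0].strip()
        has_inline_label = bool(inline) and "=" not in inline
        # Option label: a '#| label:' line among the leading option lines.
        has_option_label = False
        for option in lines[index + 1:]:
            if not option.startswith("#|"):
                break
            if re.match(r"#\|\s*label:\s*\S", option):
                has_option_label = True
                break
        if not (has_inline_label or has_option_label):
            missing.append(index + 1)
    return missing


# Hypothetical document: only the second chunk (line 5) lacks a name.
sample = "\n".join([
    FENCE + "{r}", "#| label: load-data", "x = 1", FENCE,
    FENCE + "{python}", "#| echo: false", "y = 2", FENCE,
    FENCE + "{r setup, include=FALSE}", FENCE,
])
print(unnamed_chunks(sample))  # [5]
```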

Where can we best advise on things not to do?

It would be useful, in particular chapters or as part of the 'misc/more models chapter', to have some kindly worded "don't do this" notes on problematic practices. For example: stepwise regression, just going with the p-value result and ignoring prediction, mean imputation, ignoring uncertainty, attributing causal effects where not warranted, ignoring baselines, using old models that are no longer necessary, etc.

Conditional formatting

https://quarto.org/docs/authoring/conditional.html

Wish we'd known about that a long time ago. The example that comes to mind is the 3d plot for the maxlike surface, but technically we could do every plot as color vs. black and white.

Reorg

The following only covers reorganization, not outright trimming

PART I

Move all the shap and related discussion to model exploration
Move model list in LM chapter to last chapter, but leave a bit of a preview to 'other models' (this actually wasn't that much content and was exclusively 'linear models', so left). Reminder to add others added to misc_models.
Should assumptions stay in LM or be moved to knowing? (leaving for now)
Move interactions to 'extensions' chapter, which might be renamed something to reflect nonlinear nature of the models covered there
move metrics table and other discussion from ML to model explore
in the knowing chap, talk about feature importance for other models with brief demo in model exploration (e.g. can demo RF/boost), but leave bulk of discussion to ML coverage
Update the knowing chapter to handle all the new content and rename from model_criticism.qmd

PART II

PART III

Add tips throughout chapters

For this to be more handbooky, suggest using at least a few callout tip blocks per chapter.

Add an exercise where applicable

Any main part with a model demonstration (or possibly something related) should have a single exercise very simply described. For example:

Use x model with the census data. Include one visualization that helps in interpretation. Compare its performance to a different model of your choosing.

I don't want to do more than guide practice, at least for now.

Chapter 0 complete before turn in

Work has been done and here is what's left to do before turn in.

Add references for every chapter/section

The form is something like the following:

## title {#sec-title}
### cool section {#sec-cool}

https://quarto.org/docs/books/book-crossrefs.html#creating-references
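A small script can list headings that still lack a `{#sec-...}` identifier in the form shown above. This is only a sketch, assuming `#` lines inside fenced chunks are code comments and should be skipped; `headings_missing_ids` and its `max_level` parameter are hypothetical.

```python
import re

FENCE = "`" * 3  # built this way to avoid a literal fence inside this example
HEADING = re.compile(r"^(#{1,6})\s+(.+)$")
SEC_ID = re.compile(r"\{#sec-[A-Za-z0-9-]+[^}]*\}\s*$")


def headings_missing_ids(qmd_text, max_level=2):
    """Return (line_number, heading_text) for headings without a sec- id."""
    missing = []
    in_chunk = False
    for number, line in enumerate(qmd_text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_chunk = not in_chunk  # skip everything inside code chunks
            continue
        if in_chunk:
            continue
        match = HEADING.match(line)
        if match and len(match.group(1)) <= max_level and not SEC_ID.search(line):
            missing.append((number, match.group(2)))
    return missing


# Hypothetical chapter outline for illustration only.
sample = "\n".join([
    "# Linear Models {#sec-lm}",
    "## Key Ideas",
    FENCE + "{python}",
    "# a comment, not a heading",
    FENCE,
    "### Deep section",
    "## Wrapping Up {#sec-lm-wrap}",
])
print(headings_missing_ids(sample))  # [(2, 'Key Ideas')]
```

Raising `max_level` to 3 would also flag deeper sections such as the third-level heading in the sample.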

Gather references

I've been putting 'refs' as I think of them in specific qmds, but ideally we'd use something like Zotero to auto-generate a bib

https://quarto.org/docs/visual-editor/technical.html#citations-from-zotero
