islp_labs's People

Contributors

allcontributors[bot], danielawitten, jonathan-taylor, tibshirani, trevorhastie, tschm

islp_labs's Issues

Lab Chapter 04 - standardization done before train/test split

Hi @jonathan-taylor ,

I think any preprocessing such as standardization should be done after the train/test split to avoid data leakage. In Lab 04, cells 53-57 standardize the whole dataset first and then split it. I suggest doing the train/test split as the first step, then calling fit_transform() on X_train only, and finally scaler.transform() on X_test. This approach avoids the leakage. The cells currently do the following (a sketch of the suggested order follows the snippet):

(X_train,
 X_test,
 y_train,
 y_test) = train_test_split(np.asarray(feature_std),
                            Purchase,
                            test_size=1000,
                            random_state=0)
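
A minimal sketch of the suggested order, assuming the lab's feature_df (raw, unscaled predictors) and Purchase (response) are in scope; those names are taken as assumptions and may differ slightly from the lab cells:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# split first, on the unscaled features (feature_df and Purchase assumed from the lab)
(X_train,
 X_test,
 y_train,
 y_test) = train_test_split(np.asarray(feature_df),
                            Purchase,
                            test_size=1000,
                            random_state=0)

# fit the scaler on the training data only, then apply the same transform to the test set
scaler = StandardScaler(with_mean=True, with_std=True)
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)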

BR
Grzegorz

RuntimeErrors

When I try to repeat the lab from Chapter 10 on neural networks (Hitters dataset), I run into:

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

RuntimeError: DataLoader worker (pid(s) 7156, 796, 15192, 304) exited unexpectedly.

So I assume it has something to do with multiprocessing and the DataLoader, but I'm a total newbie here.
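
A minimal sketch of two common workarounds, assuming the error comes from the DataLoader spawning worker processes on Windows (which uses the spawn start method); the tensors below are placeholders, not the lab's actual Hitters data:

import torch
from torch.utils.data import DataLoader, TensorDataset

# placeholder tensors standing in for the lab's prepared Hitters data
hit_train = TensorDataset(torch.randn(100, 19), torch.randn(100, 1))

# Option 1: keep data loading in the main process so no worker subprocesses are spawned
train_loader = DataLoader(hit_train, batch_size=32, num_workers=0)

# Option 2: when running the lab as a plain .py script, protect the driver code with
# the main-module guard so spawned workers can import the file without re-running it
if __name__ == '__main__':
    for xb, yb in train_loader:
        pass   # stands in for the lab's fitting loop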

Installation under Python 3.12 fails

Hello -

The frozen versions of various packages currently shown in this project's requirements.txt cannot be installed in Python 3.12 environments. To proceed with the installation, the versions of several dependencies need to be updated.
Updated list:
numpy==1.26.4
scipy==1.11.4
pandas==2.2.2
lxml==5.2.2
scikit-learn
joblib==1.4.2
statsmodels==0.14.2
lifelines==0.28.0
pygam==0.9.1
l0bnb==1.0.0
torch==2.3.0
torchvision==0.18.0
pytorch-lightning==2.2.4
torchinfo==1.8.0
torchmetrics==1.4.0
ISLP==0.3.18

Note: I just finished setting this up, so this still needs further testing.

Typos

In Ch02-statlearn-lab.ipynb, should the circled text say ax.contour instead? It does not look like plt has been defined at this point in the notebook. Also, all previous occurrences of displaying documentation put the question mark after the function, so maybe add a note that both ?fun and fun? will cause Python to display the documentation associated with the function fun.

[screenshot from Ch02-statlearn-lab.ipynb]
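
A small illustration of both points; the contour example loosely follows the notebook's, and the subplots import is assumed to match the lab's style:

import numpy as np
from matplotlib.pyplot import subplots

fig, ax = subplots(figsize=(8, 8))
x = np.linspace(-np.pi, np.pi, 50)
y = x
f = np.multiply.outer(np.cos(y), 1 / (1 + x**2))
ax.contour(x, y, f)   # ax.contour rather than plt.contour, since plt is never imported here

# In IPython/Jupyter, either form displays the documentation for contour:
# ?ax.contour
# ax.contour?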

JupyterBook?

Going forward I would like to make sure that all notebooks are executed once they have been committed. This will add an extra level of robustness. It could be done in the context of a Jupyter Book that iterates over all the notebooks and assembles them into a book; a minimal execution sketch is below.
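
A minimal sketch of executing every committed notebook programmatically, assuming nbformat and nbconvert are installed and that the notebooks follow this repo's Ch*.ipynb naming; a Jupyter Book build with execution enabled would achieve the same effect:

from pathlib import Path
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

for path in sorted(Path('.').glob('Ch*.ipynb')):
    nb = nbformat.read(str(path), as_version=4)
    ep = ExecutePreprocessor(timeout=600, kernel_name='python3')
    # raises CellExecutionError if any cell fails, so CI would flag a broken notebook
    ep.preprocess(nb, {'metadata': {'path': str(path.parent)}})
    print(f'{path.name} executed without errors')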

Chapter 5 Conceptual exercise 2h

Hi, I think I found an error in the code snippet in exercise 2 in Chapter 5.

In the book (page 225) it's given as:

rng = np.random.default_rng(10) 
store = np.empty(10000)
for i in range(10000):
    store[i] = np.sum(rng.choice(100, replace=True) == 4) >0
np.mean(store)

During each iteration, only one number is chosen instead of a full sample.

I think it should be corrected to:

rng = np.random.default_rng(10)
store = np.empty(10000)
for i in range(10000):
    # sample 100 observations with replacement and check whether observation 4 is among them
    store[i] = np.sum(rng.choice(100, size=100, replace=True) == 4) > 0
np.mean(store)

It gives 0.6362, which is close to 1 - (1 - 1/100)^100 ≈ 0.634, the probability that the jth observation appears in a bootstrap sample when n = 100; as n goes to infinity this approaches 1 - 1/e ≈ 0.632.
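
A quick analytic cross-check (not from the book), using the standard bootstrap inclusion probability 1 - (1 - 1/n)**n:

n = 100
prob_in_sample = 1 - (1 - 1/n)**n
print(prob_in_sample)   # about 0.634, consistent with the simulated 0.6362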

BR
Grzegorz
