
Comments (4)

adekunleba commented on May 5, 2024

Used both methods, and both worked well!

Thanks

from handson-ml.

ageron commented on May 5, 2024

Hi @tonytunde2012, thanks for your kind words, I'm glad you enjoy the book.

I actually added a few NaN values in the housing dataset used in chapter 2, because I wanted to highlight the fact that datasets are not always clean. If you use that dataset instead of the one that scikit-learn downloads, you are indeed going to have some trouble.

You have two options:

  • either manage to get the fetch function to work (can you show the error you are getting?)
  • or get rid of the NaN values (e.g. by replacing the NaN values with the mean of the corresponding feature).
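For the second option, a minimal NumPy sketch of mean imputation might look like the following. The helper name `fill_nan_with_column_mean` and the demo array are made up for illustration; they are not part of the book's code or of scikit-learn.

```python
import numpy as np


def fill_nan_with_column_mean(X):
    """Return a copy of X where each NaN is replaced by the mean of its column.

    Hypothetical helper for illustration only.
    """
    X = np.array(X, dtype=float)  # copy, so the caller's array is left untouched
    column_means = np.nanmean(X, axis=0)  # per-feature mean, ignoring NaN entries
    nan_rows, nan_cols = np.where(np.isnan(X))  # positions of the missing values
    X[nan_rows, nan_cols] = column_means[nan_cols]
    return X


# Tiny made-up example (not the housing data): one NaN in each column.
X_demo = np.array([[1.0, 10.0],
                   [np.nan, 20.0],
                   [3.0, np.nan]])
X_filled = fill_nan_with_column_mean(X_demo)
```

Each missing entry is filled with the mean of the non-missing values in the same feature column, so the other rows are left unchanged.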

Hope this helps,
Aurélien


adekunleba commented on May 5, 2024

Thanks Ageron,
This is the error from the fetch function:
`downloading Cal. housing from http://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.tgz to C:\Users
RemoteDisconnected Traceback (most recent call last)
in ()
1 from sklearn.datasets import fetch_california_housing
2
----> 3 housing = fetch_california_housing()
4 m, n = housing.data.shape
5 housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

C:\Users\ADEKUNLE\Anaconda3_2\lib\site-packages\sklearn\datasets\california_housing.py in fetch_california_housing(data_home, download_if_missing)
91 if not exists(filepath):
92 print('downloading Cal. housing from %s to %s' % (DATA_URL, data_home))
---> 93 archive_fileobj = BytesIO(urlopen(DATA_URL).read())
94 fileobj = tarfile.open(
95 mode="r:gz",

C:\Users\ADEKUNLE\Anaconda3_2\lib\urllib\request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
161 else:
162 opener = _opener
--> 163 return opener.open(url, data, timeout)
164
165 def install_opener(opener):

C:\Users\ADEKUNLE\Anaconda3_2\lib\urllib\request.py in open(self, fullurl, data, timeout)
464 req = meth(req)
465
--> 466 response = self._open(req, data)
467
468 # post-process response

C:\Users\ADEKUNLE\Anaconda3_2\lib\urllib\request.py in _open(self, req, data)
482 protocol = req.type
483 result = self._call_chain(self.handle_open, protocol, protocol +
--> 484 '_open', req)
485 if result:
486 return result

C:\Users\ADEKUNLE\Anaconda3_2\lib\urllib\request.py in _call_chain(self, chain, kind, meth_name, *args)
442 for handler in handlers:
443 func = getattr(handler, meth_name)
--> 444 result = func(*args)
445 if result is not None:
446 return result

C:\Users\ADEKUNLE\Anaconda3_2\lib\urllib\request.py in http_open(self, req)
1280
1281 def http_open(self, req):
-> 1282 return self.do_open(http.client.HTTPConnection, req)
1283
1284 http_request = AbstractHTTPHandler.do_request_

C:\Users\ADEKUNLE\Anaconda3_2\lib\urllib\request.py in do_open(self, http_class, req, **http_conn_args)
1255 except OSError as err: # timeout error
1256 raise URLError(err)
-> 1257 r = h.getresponse()
1258 except:
1259 h.close()

C:\Users\ADEKUNLE\Anaconda3_2\lib\http\client.py in getresponse(self)
1195 try:
1196 try:
-> 1197 response.begin()
1198 except ConnectionError:
1199 self.close()

C:\Users\ADEKUNLE\Anaconda3_2\lib\http\client.py in begin(self)
295 # read until we get a non-100 response
296 while True:
--> 297 version, status, reason = self._read_status()
298 if status != CONTINUE:
299 break

C:\Users\ADEKUNLE\Anaconda3_2\lib\http\client.py in _read_status(self)
264 # Presumably, the server closed the connection before
265 # sending a valid response.
--> 266 raise RemoteDisconnected("Remote end closed connection without"
267 " response")
268 try:

RemoteDisconnected: Remote end closed connection without response`


ageron commented on May 5, 2024

Really weird... perhaps it's a transient error on the server side, or perhaps a version mismatch in your Python libraries, or perhaps it's a networking issue (like a misconfigured firewall).

Here's a workaround for you:

First, download the following file and save it somewhere on your disk:
http://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.tgz
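If the failure really is transient, a scripted download with a few retries may also get the file. The sketch below is only one way to do it: `download_with_retries` and its `attempts`, `wait_seconds`, and `opener` parameters are hypothetical names introduced here, and it will not help if a firewall blocks the connection outright (in that case, download the file with a browser instead).

```python
import time
import urllib.request

DATA_URL = "http://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.tgz"


def download_with_retries(url, dest_path, attempts=3, wait_seconds=2.0,
                          opener=urllib.request.urlopen):
    """Save `url` to `dest_path`, retrying when the connection fails.

    `attempts`, `wait_seconds` and `opener` are illustrative parameters,
    not part of any library API.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            with opener(url) as response:
                data = response.read()
            with open(dest_path, "wb") as f:
                f.write(data)
            return dest_path
        except OSError as err:
            # URLError, ConnectionError and RemoteDisconnected are all
            # OSError subclasses, so this covers the error in the traceback.
            last_error = err
            if attempt < attempts:
                time.sleep(wait_seconds)  # brief pause before the next try
    raise last_error
```

Calling `download_with_retries(DATA_URL, "cal_housing.tgz")` would then save the archive in the current directory, or re-raise the last error once all attempts have failed.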

Then use the following function to load and preprocess the data much like sklearn.datasets.fetch_california_housing():

import tarfile

import numpy as np

def fetch_california_housing(tgz_filepath):
    with open(tgz_filepath, "rb") as f:
        fileobj = tarfile.open(mode="r:gz", fileobj=f).extractfile('CaliforniaHousing/cal_housing.data')
        cal_housing = np.loadtxt(fileobj, delimiter=',')
    columns_index = [8, 7, 2, 3, 4, 5, 6, 1, 0]
    cal_housing = cal_housing[:, columns_index]
    y, X = cal_housing[:, 0], cal_housing[:, 1:]
    # avg rooms = total rooms / households
    X[:, 2] /= X[:, 5]
    # avg bed rooms = total bed rooms / households
    X[:, 3] /= X[:, 5]
    # avg occupancy = population / households
    X[:, 5] = X[:, 4] / X[:, 5]
    # Target in units of 100,000
    y = y / 100000.0
    return X, y

To use it, just call this function like so (change the path to the cal_housing.tgz file if necessary):

X, y = fetch_california_housing("cal_housing.tgz")
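Note that the notebook code in the traceback reads `housing.data`, whereas this workaround returns a plain `(X, y)` tuple. The remaining lines can presumably be adapted as sketched below. The small arrays are made-up stand-ins so that the snippet runs without the data file; in practice `X` and `y` would come from the call above.

```python
import numpy as np

# Stand-ins for the arrays returned by the workaround function; the values
# are invented so that this snippet is runnable on its own.
X = np.array([[8.3, 41.0],
              [7.2, 21.0],
              [5.6, 52.0]])
y = np.array([4.526, 3.585, 3.521])

m, n = X.shape  # number of instances, number of features
# Prepend a column of ones (the bias term), as the notebook does with housing.data.
housing_data_plus_bias = np.c_[np.ones((m, 1)), X]
```

The only change from the notebook is that `X` takes the place of `housing.data`, and `y` would take the place of `housing.target`.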

