Comments (4)
I used both methods, and both worked well!
Thanks
from handson-ml.
Hi @tonytunde2012, thanks for your kind words, I'm glad you enjoy the book.
I actually added a few NaN values in the housing dataset used in chapter 2, because I wanted to highlight the fact that datasets are not always clean. If you use that dataset instead of the one that scikit-learn downloads, you are indeed going to have some trouble.
You have two options:
- either manage to get the fetch function to work (can you show the error you are getting?)
- or get rid of the NaN values (e.g. by replacing the NaN values with the mean of the corresponding feature).
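As a minimal sketch of the second option, here is one way to replace each NaN with the mean of its feature using plain NumPy. The helper name `fill_nan_with_column_mean` and the small demo array are made up for illustration, and the sketch assumes the features are already in a numeric 2D array:

```python
import numpy as np

def fill_nan_with_column_mean(X):
    """Return a copy of X where each NaN is replaced by its column's mean."""
    X = np.array(X, dtype=float)            # np.array copies, so the input is left untouched
    col_means = np.nanmean(X, axis=0)       # per-feature mean, ignoring NaN entries
    nan_rows, nan_cols = np.where(np.isnan(X))
    X[nan_rows, nan_cols] = col_means[nan_cols]
    return X

# Tiny stand-in for the housing features (hypothetical values)
X_demo = np.array([[1.0, 10.0],
                   [np.nan, 20.0],
                   [3.0, np.nan]])
X_filled = fill_nan_with_column_mean(X_demo)
```

If you would rather stay within Scikit-Learn, an imputer class can do the same job inside a pipeline; which class is available depends on your Scikit-Learn version.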
Hope this helps,
Aurélien
from handson-ml.
Thanks Ageron,
This is the error from the fetch function:
`downloading Cal. housing from http://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.tgz to C:\Users
RemoteDisconnected                        Traceback (most recent call last)
in ()
      1 from sklearn.datasets import fetch_california_housing
      2
----> 3 housing = fetch_california_housing()
      4 m, n = housing.data.shape
      5 housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

C:\Users\ADEKUNLE\Anaconda3_2\lib\site-packages\sklearn\datasets\california_housing.py in fetch_california_housing(data_home, download_if_missing)
     91     if not exists(filepath):
     92         print('downloading Cal. housing from %s to %s' % (DATA_URL, data_home))
---> 93         archive_fileobj = BytesIO(urlopen(DATA_URL).read())
     94         fileobj = tarfile.open(
     95             mode="r:gz",

C:\Users\ADEKUNLE\Anaconda3_2\lib\urllib\request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    161     else:
    162         opener = _opener
--> 163     return opener.open(url, data, timeout)
    164
    165 def install_opener(opener):

C:\Users\ADEKUNLE\Anaconda3_2\lib\urllib\request.py in open(self, fullurl, data, timeout)
    464             req = meth(req)
    465
--> 466         response = self._open(req, data)
    467
    468         # post-process response

C:\Users\ADEKUNLE\Anaconda3_2\lib\urllib\request.py in _open(self, req, data)
    482         protocol = req.type
    483         result = self._call_chain(self.handle_open, protocol, protocol +
--> 484                                   '_open', req)
    485         if result:
    486             return result

C:\Users\ADEKUNLE\Anaconda3_2\lib\urllib\request.py in _call_chain(self, chain, kind, meth_name, *args)
    442         for handler in handlers:
    443             func = getattr(handler, meth_name)
--> 444             result = func(*args)
    445             if result is not None:
    446                 return result

C:\Users\ADEKUNLE\Anaconda3_2\lib\urllib\request.py in http_open(self, req)
   1280
   1281     def http_open(self, req):
-> 1282         return self.do_open(http.client.HTTPConnection, req)
   1283
   1284     http_request = AbstractHTTPHandler.do_request_

C:\Users\ADEKUNLE\Anaconda3_2\lib\urllib\request.py in do_open(self, http_class, req, **http_conn_args)
   1255             except OSError as err: # timeout error
   1256                 raise URLError(err)
-> 1257             r = h.getresponse()
   1258         except:
   1259             h.close()

C:\Users\ADEKUNLE\Anaconda3_2\lib\http\client.py in getresponse(self)
   1195         try:
   1196             try:
-> 1197                 response.begin()
   1198             except ConnectionError:
   1199                 self.close()

C:\Users\ADEKUNLE\Anaconda3_2\lib\http\client.py in begin(self)
    295         # read until we get a non-100 response
    296         while True:
--> 297             version, status, reason = self._read_status()
    298             if status != CONTINUE:
    299                 break

C:\Users\ADEKUNLE\Anaconda3_2\lib\http\client.py in _read_status(self)
    264             # Presumably, the server closed the connection before
    265             # sending a valid response.
--> 266             raise RemoteDisconnected("Remote end closed connection without"
    267                                      " response")
    268         try:

RemoteDisconnected: Remote end closed connection without response`
from handson-ml.
Really weird... perhaps it's a transient error on the server side, or perhaps a version mismatch in your Python libraries, or perhaps it's a networking issue (like a misconfigured firewall).
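One rough way to tell a transient error from a persistent one is to retry the download a few times with a pause in between. The sketch below is only an illustration: the helper `download_with_retries` and its parameters are hypothetical (not part of Scikit-Learn), and it assumes the failure surfaces as an `OSError` subclass, which is the case for `RemoteDisconnected` and `URLError`:

```python
import time
import urllib.request

# URL taken from the error message above
DATA_URL = "http://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.tgz"

def download_with_retries(url, dest_path, attempts=3, delay=2.0,
                          fetch=urllib.request.urlretrieve):
    """Try to download url to dest_path, retrying on connection errors.

    `fetch` is injectable so the retry logic can be exercised without a network.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            fetch(url, dest_path)
            return dest_path
        except OSError:
            if attempt == attempts:
                raise                # still failing: probably not transient
            time.sleep(delay)        # wait a bit before the next attempt

# Example call (performs a real download, so it is left commented out):
# download_with_retries(DATA_URL, "cal_housing.tgz")
```

If every attempt fails in the same way, the problem is more likely on the network or configuration side than a one-off server hiccup.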
Here's a workaround for you:
First, download the following file and save it somewhere on your disk:
http://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.tgz
Then use the following function to load and preprocess the data, much like sklearn.datasets.fetch_california_housing():
import tarfile

import numpy as np

def fetch_california_housing(tgz_filepath):
    # Read the raw comma-separated data straight out of the downloaded archive
    with open(tgz_filepath, "rb") as f:
        fileobj = tarfile.open(mode="r:gz", fileobj=f).extractfile('CaliforniaHousing/cal_housing.data')
        cal_housing = np.loadtxt(fileobj, delimiter=',')
    # Reorder the columns so that the target column comes first
    columns_index = [8, 7, 2, 3, 4, 5, 6, 1, 0]
    cal_housing = cal_housing[:, columns_index]
    y, X = cal_housing[:, 0], cal_housing[:, 1:]
    # avg rooms = total rooms / households
    X[:, 2] /= X[:, 5]
    # avg bed rooms = total bed rooms / households
    X[:, 3] /= X[:, 5]
    # avg occupancy = population / households
    X[:, 5] = X[:, 4] / X[:, 5]
    # Target in units of 100,000
    y = y / 100000.0
    return X, y
To use it, just call this function like so (change the path to the cal_housing.tgz file if necessary):
X, y = fetch_california_housing("cal_housing.tgz")
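Since this function returns plain arrays rather than an object with a `.data` attribute, the bias-column step from the code in the traceback needs a small adjustment. Here is a minimal sketch, where `add_bias_column` is a hypothetical helper and a tiny made-up array stands in for `X` so that the snippet runs without the data file:

```python
import numpy as np

def add_bias_column(X):
    """Prepend a column of ones (the bias term) to a 2D feature array."""
    m, n = X.shape
    return np.c_[np.ones((m, 1)), X]

# Stand-in for the X returned by the workaround function (hypothetical values)
X_demo = np.arange(6, dtype=float).reshape(3, 2)
X_plus_bias = add_bias_column(X_demo)
```

With the real data, you would call `add_bias_column(X)` on the `X` returned above in place of `housing.data`.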