
sds's People

Contributors

abjer, krier, kristianuruplarsen, snorreralund


sds's Issues

What is the size of int and float?

I understand that the default for an int is int64, but you can specify int8, which is less precise but would, I imagine, take less memory and storage space.

I was unable to find a list of the exact storage sizes for the different types. Does anyone have a URL or a list?
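Assuming the question is about NumPy/pandas dtypes, the exact sizes and value ranges can be inspected directly rather than looked up:

```python
import numpy as np

# Each fixed-width integer type stores a known number of bytes,
# which also bounds the values it can represent.
for dtype in (np.int8, np.int16, np.int32, np.int64):
    info = np.iinfo(dtype)
    print(np.dtype(dtype).name, np.dtype(dtype).itemsize, "bytes,",
          "range", info.min, "to", info.max)

# Floats have machine limits too:
print(np.finfo(np.float64).eps)  # spacing between 1.0 and the next float
```

One caveat: for integers, int8 is not "less precise" but exact with a smaller range (-128 to 127); precision trade-offs only apply to the float types.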

ratelimit and iterations

We are having a problem with the following code.
First we specify our url, and now we are trying to set iterations=10 and a rate limit. Does anyone have a clue why it is not working?

import time

def ratelimit():
    time.sleep(1)  # sleep one second.

def get(url, iterations=10, check_function=lambda x: x.ok):
    for iteration in range(iterations):
        try:
            ratelimit(10)
            response = session.get(url)
            if check_function(response):
                return response
        except exceptions as e:
            print(e)
    return None
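Two things stand out: ratelimit(10) passes an argument to a function defined with none, which raises a TypeError before any request is made, and `except exceptions` names an identifier that is never defined. A sketch of a working version, assuming the requests library (which provides session objects and a base RequestException class):

```python
import time
import requests

session = requests.Session()

def ratelimit():
    time.sleep(1)  # sleep one second between requests

def get(url, iterations=10, check_function=lambda x: x.ok):
    for iteration in range(iterations):
        try:
            ratelimit()  # no argument: the delay is fixed inside ratelimit()
            response = session.get(url)
            if check_function(response):
                return response
        except requests.exceptions.RequestException as e:
            # Catch a concrete exception class, not a bare name
            print(e)
    return None
```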

Multiprocessing Pool doesn't work on Windows 10

I can use Pool to parallelize my work on my MacBook, but when I try to run the code faster on my more powerful Windows machine, it does not work. None of the cores start doing any work; for a sample of 100 I let it run for 10 minutes, but nothing happened.

So I use this code:

def tree_paralel(x):
    tree = DecisionTreeClassifier(criterion="gini", max_depth= x, random_state=1)  
    accuracy_ = []
    for train_idx, val_idx in kfolds.split(X_dev, y_dev):

        X_train, y_train = X_dev.iloc[train_idx], y_dev.iloc[train_idx]
        X_val, y_val = X_dev.iloc[val_idx], y_dev.iloc[val_idx] 
        
        X_train = pd.DataFrame(im.fit_transform(X_train),index = X_train.index)
        X_val = pd.DataFrame(im.transform(X_val), index = X_val.index)
        tree.fit(X_train, y_train)
        y_pred = tree.predict(X_val)
        accuracy_.append(accuracy_score(y_val, y_pred))
    print("This was the "+str(x)+" iteration", (dt.now() - start).total_seconds())
    return accuracy_

and then run:

start = dt.now()
p = Pool(4)

input_ = range(1,11)
output_ = []
accuracy = []
for result in p.imap(tree_paralel, input_):
    output_.append(result)
p.close()
temp = pd.DataFrame(output_).mean(axis = 1)
temp.index = input_
optimal_t = temp.nlargest(1)
print("Time:", (dt.now() - start).total_seconds())
print("Optimal hyperparameter: "+ str(optimal_t.index[0]) + " with accuracy: " + str(optimal_t.values) )
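This hanging behavior is a known Windows limitation rather than a bug in the logic above: Windows starts worker processes with "spawn", which re-imports the main module, so Pool code must be protected by a main guard, and the worker function must be importable (defined at module top level in a .py file, not only in a notebook cell). A minimal sketch of the pattern, with a toy worker function:

```python
from multiprocessing import Pool

def square(x):
    # Worker function: must be defined at module top level so that
    # spawned worker processes (the default on Windows) can import it.
    return x * x

if __name__ == "__main__":
    # Without this guard, each spawned worker re-executes the module,
    # tries to create its own Pool, and the program hangs on Windows.
    with Pool(4) as p:
        results = p.map(square, range(10))
    print(results)
```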

Exercise 9.2.3

Can you elaborate on exercise 9.2.3? We are having a bit of trouble interpreting the task.

Our interpretation is: we have some amounts (from exercise 9.2.2), and we now have to fish out the sentences in which these amounts occur, in order to understand the context. Is this understood correctly?

Visualizing data in other programs

Are we allowed to visualize our data in programs other than IPython and add the plots to the final exam paper?

Since time is limited, it would be very beneficial to use other programs for visualization, drawing on the knowledge we already have of them, and then describe, document, and add the plots to the paper.
We know that learning python/seaborn/matplotlib "the hard way" would be more beneficial for us in the long run, but for now we only have until Saturday morning...

Index problem.

ex. 13.1.3

KeyError: ['.., ,...'] not in index.

Has anyone solved the index problem in this exercise?
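For reference, pandas raises exactly this error when you select columns by a list of labels that are not all present, for example after a rename or a whitespace mismatch in the header. A minimal reproduction, with hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame({"price": [1, 2], "area": [50, 60]})

try:
    df[["price", "rooms"]]          # "rooms" does not exist
except KeyError as e:
    print(e)                        # reports that ['rooms'] is not in the index

# A common cause is stray whitespace in the header; inspect the real names:
print(list(df.columns))
# and select defensively, keeping only labels that actually exist:
subset = df[[c for c in ["price", "rooms"] if c in df.columns]]
```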

Some practical questions for the exam

Should we write names, student IDs, or exam numbers on the project?

The project has a maximum of 24 pages (normalsider, i.e. standard pages). How do you measure a standard page that contains graphs, for instance?

And how is the project graded? Should we write who did which parts of the project for an individual grade, or are we graded as a group?

What does PolynomialFeatures do?

I have some difficulty completely understanding what this function does.

$y = ax_1 + bx_2 + ...$

If I run this through PolynomialFeatures without specifying the degree, it will default to degree 2. Do we get this:

$y = ax_1^2 + bx_2^2 ...$

or:

$y = a x_1^2 + b x_1 x_2 + c x_2^2 + ...$

?
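The easiest way to check is to transform a tiny example. With the default degree=2, scikit-learn's PolynomialFeatures produces the bias term, the original features, the squares, and the cross terms, so it is the second interpretation (plus the original linear terms):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with two features, x1 = 2 and x2 = 3
X = np.array([[2.0, 3.0]])

poly = PolynomialFeatures()  # degree=2 by default
print(poly.fit_transform(X))
# Columns are: 1, x1, x2, x1^2, x1*x2, x2^2
# so the output is [[1. 2. 3. 4. 6. 9.]]
```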

How do I change my working directory?

I find it irritating that all the files I create through Jupyter Notebook are saved to the Home folder on my computer by default. I would like to arrange a neat file structure for this course like for any other course.

  1. How do I change this?

  2. Is there anything I need to think of regarding my future work with git?

I have tried Stack Overflow but find it hard to jump around in the terminal using 'cd'.
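A sketch of the usual fix, assuming a macOS/Linux shell and a hypothetical folder name: Jupyter uses the directory it is launched from as its root, so create a course folder, cd into it, and start Jupyter there. A git repository can live in (or be cloned into) that same folder without any special handling.

```shell
# Make a folder for the course and move into it
mkdir -p ~/sds_course
cd ~/sds_course

# pwd prints where you are; ls lists what is here
pwd
ls

# Launching Jupyter from here makes this folder its root, so new
# notebooks are created and saved in it (uncomment to run):
# jupyter notebook
```

The same navigation applies inside the terminal generally: `cd foldername` goes down, `cd ..` goes up, and `pwd` confirms where you are before launching anything.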

loading a shapefile error

I am trying to load in a shapefile. And yes, I have installed Geopandas and imported geopandas and shapely, but I am getting the error shown below the code. Any idea why?

# set the filepath and load in a shapefile
fp = “Desktop/DEU_adm1.shp”
map_df = gpd.read_file(fp)

# check the data type so we can see that this is not a normal dataframe, but a GeoDataFrame
map_df.head()

File "", line 2
fp = “Desktop/DEU_adm1.shp”
^
SyntaxError: invalid character in identifier
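The caret in the traceback points at curly ("smart") quotes, which word processors and some websites substitute for the straight quotes Python requires. A minimal illustration of the difference:

```python
# Straight ASCII quotes are valid Python syntax:
fp = "Desktop/DEU_adm1.shp"

# Curly quotes (U+201C / U+201D) are not, and produce exactly this
# kind of SyntaxError when they appear in source code:
try:
    compile('fp = \u201cDesktop/DEU_adm1.shp\u201d', '<snippet>', 'exec')
except SyntaxError as e:
    print("SyntaxError:", e)
```

Retyping the path by hand in the editor, rather than pasting it from a document, usually avoids the problem.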

Vague questions

Sometimes it would be nice to have an excerpt or an example of the solution you want us to find, like: "you should end up with strings that look like this: 'string_example'", or even a picture of a sample of the expected dataframe. For example, this assignment:

Ex. 8.2.1: Visit the https://www.trustpilot.com/ website and locate the categories page. From this page you find links to company listings. Get the category page using the requests module and extract each link to a specific category page from the HTML. This can be done using the basic python .split() string method. Make sure only links within the /categories/ section are kept, checking each string using the if 'pattern' in string condition.

I then find the url https://www.trustpilot.com/categories/companies, which has links to company listings from the category page. Even after reading the question several times it was difficult to understand how to proceed from there. After looking at the solution guide and reading the question again, I see that I misunderstood the crucial line "extract each link to a specific category page from the HTML", and I still find it difficult to completely understand the assignment. After asking around, I was not the only one with this issue, and this is not the first time it has been quite difficult to understand exactly what an assignment asks of us, which is a source of a lot of frustration and an immense time sink.

The following exercise, which is also quite difficult to get your head around, seems like it should have an example, but it is sadly missing, even when I read the file directly from the repository:

Ex. 7.2.13: Turn the dataset from wide to long so hourly data is now vertically stacked. Store this dataset in a dataframe called data. Name the column with hourly information hour_period. Your resulting dataframe should look something like this.

Geopandas: Map of Denmark

We are trying to make a map of Denmark with Geopandas. We have found a shapefile with the boundaries of DK that we hope we can use, but we can't read the shapefile into Python. Our code is:

import geopandas as gpd
import pandas as pd

fp = '/Users/louiseankersen/anaconda3/lib/python3.6/site-packages/geopandas/datasets/dk_100km.shp '
map_df = gpd.read_file(fp)

map_df.head()

But Python won't read the path for the file. Does anyone have suggestions as to what we can do to read the shapefile into Python?
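One detail worth checking: the path string in the snippet above ends with a space before the closing quote, and file lookups treat that space as part of the filename. A quick way to see and normalise this, using a hypothetical path:

```python
import os

fp = '/tmp/dk_100km.shp '            # note the trailing space
print(os.path.basename(fp))          # 'dk_100km.shp ' with the space included
print(repr(fp.strip()))              # '/tmp/dk_100km.shp' after stripping it
```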

Specifying hyperparameters

In class/at lecture we used np.logspace(-4,4,50) (or 33 as last input) to specify the range of hyperparameters to search.

This is quite broad and would then give us the hyperparameter that minimizes the MSE. In class, ABN mentioned that we could then create a new range using np.logspace around the hyperparameter returned by the first minimization.

My question is as follows:

Are we expected to go through this process and thereby narrow down the search for the best hyperparameter of them all?

If we were to do it, would we just set the original range to the newly found range (and comment that we found the new range via iteration), or would we be expected to copy all of the code and insert the new range into that copy, so the code that produced the new range would still be in the notebook?

Thanks in advance.
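The coarse-to-fine idea described above can be sketched as follows, with a hypothetical scoring function standing in for the cross-validated MSE the real notebook would compute:

```python
import numpy as np

def cv_mse(lam):
    # Hypothetical stand-in for a cross-validated MSE; in the real
    # notebook this would refit the model for each candidate lambda.
    return (np.log10(lam) - 1.3) ** 2

# Stage 1: broad search over many orders of magnitude, as in class
coarse = np.logspace(-4, 4, 50)
best = coarse[np.argmin([cv_mse(l) for l in coarse])]

# Stage 2: a narrower grid centred (in log space) on the stage-1 winner
fine = np.logspace(np.log10(best) - 1, np.log10(best) + 1, 50)
best_fine = fine[np.argmin([cv_mse(l) for l in fine])]
print(best, best_fine)
```

Keeping both stages in the notebook, as above, documents how the final range was found without duplicating the model-fitting code.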

Working with Jobindex in the exam

We are working with Jobindex.dk for our project and are trying to scrape it successfully. If any other groups are trying to scrape Jobindex, we could meet and exchange knowledge about this.

assignment submission

Hi everyone,

I am logged in on Peergrade but am not able to see where to submit assignment 1.
I am not sure whether one of my group members has uploaded it yet. Is there a way to check if the assignment has already been submitted?

Thanks,
Paula

(screenshot attached, taken 2018-08-19 4:58 PM)

Anyone using geopandas?

Is anyone using Geopandas who wants to share information on how to install and import it properly?

I am trying to install geopandas with:

pip install geopandas

It installs correctly, but when importing it using:

import geopandas as gpd

I get this error:

ImportError                               Traceback (most recent call last)
in <module>()
----> 1 import geopandas as gpd

~\Anaconda3\lib\site-packages\geopandas\__init__.py in <module>()
      2 from geopandas.geodataframe import GeoDataFrame
      3
----> 4 from geopandas.io.file import read_file
      5 from geopandas.io.sql import read_postgis
      6 from geopandas.tools import sjoin

~\Anaconda3\lib\site-packages\geopandas\io\file.py in <module>()
      1 import os
      2
----> 3 import fiona
      4 import numpy as np
      5 import six

~\Anaconda3\lib\site-packages\fiona\__init__.py in <module>()
     67 from six import string_types
     68
---> 69 from fiona.collection import Collection, BytesCollection, vsi_path
     70 from fiona._drivers import driver_count, GDALEnv
     71 from fiona.drvsupport import supported_drivers

~\Anaconda3\lib\site-packages\fiona\collection.py in <module>()
      7
      8 from fiona import compat
----> 9 from fiona.ogrext import Iterator, ItemsIterator, KeysIterator
     10 from fiona.ogrext import Session, WritingSession
     11 from fiona.ogrext import (

ImportError: DLL load failed: Det angivne modul blev ikke fundet. (The specified module was not found.)

answers to exercise session 18

Can the answers to exercise session 18 be uploaded before the weekend?

Also: I can't even get the example running in Jupyter Notebook on Windows 10. Nothing happens, except that the kernel gets stuck.

Ex 12.2.3 - What on x-axis?

Ex.12.2.3: Make a plot with on the x-axis and the RMSE measures on the y-axis.

Missing a word about what to plot on the x-axis. Lambda?
