python-for-data-and-media-communication-gitbook's People

Contributors

chicoxyc, connorli96, gitbook-bot, hupili, ivywze, mindyzhaominzhu, roytangrb, sunfeier, terezacai, zizhehu

python-for-data-and-media-communication-gitbook's Issues

question about array

image
In the image, why does the highlighted print statement produce '1 2 4'? How is that result obtained?
I also don't quite understand the sentence 'Similarly, the first number is to index elements in this array, the second number is to index the sub-elements in each elements'. Could anyone help me out?
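
A minimal sketch that may help with this question. It uses plain nested lists instead of a NumPy array so that it runs without extra packages; the values are the ones from the tutorial example quoted elsewhere in this tracker (`[[1,2,3],[4,5,6]]`). In NumPy, `b[0, 1]` picks the same element as `b[0][1]` does here.

```python
# Plain nested lists standing in for the 2 x 3 array from the question.
b = [[1, 2, 3],
     [4, 5, 6]]

# "Shape" is (number of rows, number of items in each row).
shape = (len(b), len(b[0]))
print(shape)  # (2, 3)

# The first index picks the row, the second picks the item inside that row.
picked = (b[0][0], b[0][1], b[1][0])
print(*picked)  # 1 2 4
```

So `b[1, 0]` means "row 1, item 0", which is 4, the first item of the second row.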

Questions and Feedback for Chapter 04

Feedback

  • Just partial feedback. I suppose we can add a note to the "csv_reader" part of week 04, because fresh learners may not understand the difference between the virtual environment and the rest of their computer. If they just download 'name_list.csv', the terminal or Jupyter Notebook may not find it, and a FileNotFoundError is raised. Thus, we can note: "Before you run this code, please place 'name_list.csv' in the folder you start Jupyter Notebook from (the 'venv' folder in this setup), because the notebook looks for files relative to that folder."
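
A small sketch for diagnosing this kind of FileNotFoundError, assuming the file is opened by a relative name such as 'name_list.csv'. The helper `locate` is hypothetical and not part of the tutorial; it only reports where Python would look.

```python
import os
from pathlib import Path


def locate(filename):
    """Return the absolute path Python would try to open, and whether it exists."""
    # A relative name is resolved against the current working directory.
    path = Path(filename).resolve()
    return path, path.exists()


print("Working directory:", os.getcwd())
path, found = locate("name_list.csv")
print(path, "found" if found else "missing: open() would raise FileNotFoundError")
```

If the file is reported missing, either move it into the working directory shown or pass its full path to `open()`.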

An alternative way to calculate the break-even point of subscribed users to make profit in Example21 in Week3

Example 21 in the Week3 tutorial is helpful for learning how the for loop works. I also found that the same result can be achieved with a while loop. Below is the code I wrote, shared for reference and suggestions. You can also check the results in nbviewer here.

Fixed_Cost = 30000
Content_Cost = 70000
member_ff = 15
convert_rate = 0.1
ad_revenue_each_person = 1

subscribers = 0
while True:
    # Revenue: membership fees from converted users plus ad revenue from every user
    net_income = convert_rate*subscribers*member_ff + ad_revenue_each_person*subscribers - Fixed_Cost - Content_Cost
    # From 50000 users upward, each additional user costs an extra 0.1
    if subscribers >= 50000:
        net_income = net_income - 0.1*(subscribers-50000)
    # Compare with >= rather than == so floating-point rounding cannot skip the break-even point
    if net_income >= 0:
        break
    subscribers = subscribers + 1
print(subscribers)

issues about installing jupyter in CVA517

Here are the general instructions for installing Jupyter: we type the following commands to create a virtual environment and then install the dependencies and modules. In the lab, however, there are some problems:

pyvenv venv
source venv/bin/activate
pip3 install jupyter

screen shot 2018-10-01 at 6 28 09 pm

During the Jupyter installation you will get an error asking you to upgrade pip. Just copy the command it suggests:

pip install --upgrade pip

Then install Jupyter and open Jupyter Notebook:

pip3 install jupyter
jupyter notebook

screen shot 2018-10-01 at 6 29 54 pm

Then you will encounter another problem: whatever you execute, the cell stays in the running state and a message says "kernel starting, please wait...". The reason can be found in the terminal: when Jupyter was installed, a red line said:

ipython 7.0.1 has requirement prompt-toolkit<2.1.0,>=2.0.0, but you'll have prompt-toolkit 1.0.15 which is incompatible.

screen shot 2018-10-01 at 6 29 06 pm

To solve this, there are two solutions, both found in jupyter_kernel issue #158:

solution 1: pip install 'ipykernel<5.0.0'

solution 2: first downgrade ipython with pip install -U ipython==6.5.0, then prompt-toolkit with pip install -U prompt-toolkit==1.0.15

Finally, open Jupyter Notebook and type something to test. It should work now!
screen shot 2018-10-01 at 6 33 56 pm

Minor Problems in Week5

  1. It would be better to explain to students that Jupyter Notebook does not show a result on screen unless you call "print()" or type the variable name on the last line of a cell. Not mentioning this may cause confusion.

  2. my_urls = data.find_all('a',attrs={'class':'post__title-link js-read-more'})
    I am wondering why the value here is 'post__title-link js-read-more'. When I inspect the page I only see this:
    Why not just class="post__title-link"?

  3. One thing I cannot clearly understand is the formatting part:
    my_url['href'].split('/blog')[-1]
    Could you explain a little how this returns the format we want?

for i in range(1,8): #format all pages urls
    if i == 1:
        page_url = url
    else:
        page_url = '{url_initial}page/{number}/'.format(url_initial = url,number=i)
    #print(page_url)
Here the if logic says that when i == 1 the page_url is simply url, and otherwise it needs further formatting. My question is: why can't we apply the same formatting to every page? Don't they all share the same format?
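
A rough sketch of the two expressions asked about above. The base URL and href are hypothetical placeholders, not the tutorial's real site. One plausible reason for the special case is that the first listing page lives at the bare URL; whether 'page/1/' would also work depends on the site, so that part is an assumption.

```python
# Hypothetical values for illustration only.
href = 'https://example.com/blog/my-first-post/'
url = 'https://example.com/blog/'

# split('/blog') cuts the string at every '/blog'; [-1] keeps the last piece.
tail = href.split('/blog')[-1]
print(tail)  # /my-first-post/

# Page 1 is the bare listing URL; later pages append 'page/<number>/'.
page_urls = []
for i in range(1, 4):
    if i == 1:
        page_urls.append(url)
    else:
        page_urls.append('{url_initial}page/{number}/'.format(url_initial=url, number=i))
print(page_urls)
```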

Thanks for your hard work! Yes, this week is more difficult than before, but I think it is clear to follow. : )

A question about week-06

Week-06

image
In this part of week-06, the example uses "by_link_text", but it only works for me after I change it to "by_partial_link_text".
image
I am confused about when we should use "partial".

CH05 Feedbacks

  • Operating system:windows
  • Python version:3
  • Hardware:
  • Internet access:Y
  • Jupyter notebook or not? Y
  • Which chapter of book?:05

Feedback1:
I got an error when trying to install BeautifulSoup using the first commands mentioned, as follows:

1536734298 1

The output is as follows:

1536734257 1

Have you searched Google / Stack Overflow for anything? Yes

Do they solve or partially solve your question? Yes, but in a different way

Scraper yields different results upon execution

Troubleshooting

Describe your environment

  • Operating system:
  • Python version:
  • Hardware:
  • Internet access:
  • Jupyter notebook or not? [Y/N]: Y
  • Which chapter of book?: ch5

Describe your question

Following are some permalinks for codes and data.

https://github.com/zhangjingwei0512/chapter5/blob/8981ed72e972d0a458adbbc6898f5def55cef9ce/further2.py

https://github.com/zhangjingwei0512/chapter5/blob/8981ed72e972d0a458adbbc6898f5def55cef9ce/further2.csv

https://github.com/zhangjingwei0512/chapter5/blob/8981ed72e972d0a458adbbc6898f5def55cef9ce/further2(2).csv

The minimum code (snippet) to reproduce the issue

a little question

In Example 13:

test3 = "python loves,'you'"
test3.find('you')
8 #returns the first character where 'you' begins

Should that be 8? I think it should be 14 instead.
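
A quick check of this, assuming the string in Example 13 is exactly `"python loves,'you'"` (the quoting in the post above was partly lost, so this is a reconstruction):

```python
test3 = "python loves,'you'"

# find() returns the index of the first character of the first match.
index = test3.find('you')
print(index)                   # 14
print(test3[index:index + 3])  # you
```

Counting from 0, 'python loves,' occupies indexes 0 to 12, the opening quote is at 13, and 'you' starts at 14.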

Chapter 7 Working thread - pandas basics

  • @hupili outline.
  • @ChicoXYC prepare an updated openrice dataset of similar format. High Priority. Please store the scraping script and dataset in the scraper-example folders. Create new files and don't override previous openrice example.
  • @ChicoXYC include the URL and POI ID of each restaurant into the 20K dataset.
  • @ChicoXYC, address the TODO notes for pandas basics. Note that there is a bit of duplication between ch7 and ch8 (1D part). This is normal because most of the time you observe errors during analysis, so analysis skills can also help with data cleaning.
  • @hupili, @ChicoXYC , inject some errors into the openrice dataset for data cleaning exercise.
  • @ChicoXYC First review and smoothen the content up to dataprep section

I have carefully gone through Chapter 2. The content is easy to understand, and I think the exercises are really helpful for newcomers to master the basics of Python. Thank you for your hard work! :)

Troubleshooting

Describe your environment

  • Operating system:
  • Python version:
  • Hardware:
  • Internet access:
  • Jupyter notebook or not? [Y/N]:
  • Which chapter of book?:

Describe your question

Example: I get IOError when running my script to load files.

The minimum code (snippet) to reproduce the issue

Example:

open('path-to-a-file-not-exist')

Describe the efforts you have spent on this issue

Example:

Have you searched Google / Stack Overflow for anything?

Do they solve or partially solve your question?

What is the closest answer you can find?

list index out of range

Troubleshooting

I wrote this programme to allocate one case to every five students. But something seems to go wrong with the index on the lines "print(list2[s])" and "print(list1[c:c+5])". How can I start s and c at 0 and then change their values on each iteration?

student_list =[
18421111,
18421112,
18421113,
18421114,
18421115,
18421116,
18421117,
18421118,
18421119,
18421120,
18421121,
18421122,
18421123,
18421124,
18421125,
18421126,
18421127,
18421128,
18421129,
18421130,
18421131,
18421132,
18421133,
18421134,
18421135,
18421136,
18421137,
18421138,
18421139,
18421140,
18421141,
18421142,
18421143,
18421144,
18421145,
18421146,
18421147,
18421148,
18421149,
18421150,
18421151,
18421152,
18421153,
18421154,
18421155,
18421156,
18421157,
18421158,
18421159,
18421160,
]
case_list =[
'case1 - build a calculator to evaluate your business model',
'case2 - build a automatic earthquake robot to broadcast the new earthquake',
'case3 - evaluate social media performance of a luxury brand',
"case4 - study movie blockbuster 'Dying to Survive'",
'case5 - invest your money like the Internet giant, Tencent',
'case6 - where are the 200,000 inferior vaccines flowing?',
"case7 - study classics, Who control the discourse power in 'Dream of the Red Chamber'",
'case8 - research about Didi-driver crimes in China',
"case9 - 'Me too' analysis",
'case10 - what is hip-hop in china?'
]

import random
random.shuffle(student_list)
list1=student_list
print(list1)
random.shuffle(case_list)
list2=case_list
print(list2)

s=0
c=0
for s in student_list:
    print(list2[s])
    s=s+1
for c in case_list:
    print(list1[c:c+5])
    c=c+5
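
One possible way to do this, sketched under the assumption that there are exactly five students per case. The error in the post most likely comes from `for s in student_list`, which makes s a student ID such as 18421111 rather than a position, so `list2[s]` is far out of range. The lists below are shortened, hypothetical stand-ins for the ones in the post.

```python
import random

# Shortened, hypothetical stand-ins for the lists in the post.
student_list = list(range(18421111, 18421161))          # 50 student IDs
case_list = ['case{}'.format(n) for n in range(1, 11)]  # 10 cases

random.shuffle(student_list)
random.shuffle(case_list)

group_size = 5
allocation = {}
# enumerate() yields positions 0, 1, 2, ... which are valid list indexes,
# unlike the student IDs themselves.
for position, case in enumerate(case_list):
    start = position * group_size
    allocation[case] = student_list[start:start + group_size]

for case, students in allocation.items():
    print(case, students)
```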

Python linting and debugging

Python is a dynamic language, which makes error checking at compile time hard. Most errors are only exposed at run time. However, some tools can still help us catch most errors early and write code that follows best practice. "Linting" is a general concept found in all programming languages; it refers to the process of identifying potential errors and suboptimal practices while the code is being written.
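
A small illustration of why this matters. The misspelled name `nmae` below is deliberate. Python compiles the function without complaint and only fails when the line actually runs, whereas a linter such as pyflakes or pylint would typically flag the undefined name before the program is executed.

```python
# Source code with a deliberate typo: 'nmae' instead of 'name'.
source = "def greet(name):\n    return 'Hello, ' + nmae\n"

# Compiling succeeds: Python does not check names at compile time.
code = compile(source, "<example>", "exec")
namespace = {}
exec(code, namespace)

# The error only appears when the faulty line is executed.
try:
    namespace["greet"]("Ada")
    caught = None
except NameError as err:
    caught = str(err)

print(caught)
```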

The step to get "twitter API" (week04)

  • Which chapter of book:week04

Preparation: a Twitter account (using a Gmail account to sign up is highly recommended!)

Step 1

Go to https://apps.twitter.com/app/new and click 'Apply for a developer account'.

Step 2

Choose “Personal use” and enter your desired Application Name, Primary country (hk) and so on.
The most important part is: Describe in your own words what you are building

image

You have to answer each question in as much detail as possible. The following is an example:

1. I’m using Twitter’s APIs to practice my data collection and analysis skills in the Big Data Analysis course. I am currently a postgraduate student at Hong Kong Baptist University, majoring in Communication, and normally collect user comment data for mass communication research.
2. As for the methods and techniques I plan to use, here is our course's open book on GitHub. You can take it as a reference. https://github.com/hupili/python-for-data-and-media-communication-gitbook/blob/master/notes-week-04.md#use-api-via-function-calls-to-other-modules-packages
3. I only use it for data collection practice, and will not use it for tweeting, retweeting or liking content.
4. Tweets will be displayed in our final project presentation for academic use.
5. Finally, I will comply with Twitter's policies.
I am looking forward to your favorable reply.

image

Then submit your application and verify your email.

After that, your application may be under review, or you may jump straight to a new page:
image

Install Python3 on Windows and Set Environment

  1. Click here to download Python 3.7(64-bits).
    If you need to install other versions of python, click here and go to the hyperlink provided.
  2. Remember to choose 'Customize installation' and tick 'pip' when installing Python. If you install with the default options, you cannot import the 'numpy' module in week2.
    image
    image

Connor's Feedback&Question on Chapter 2

Feedback

This chapter is really wonderful. Although I already had a bit of fundamental knowledge of Python from other sources, I still got new inspiration and learned flexible programming usage, because the chapter gives more practical examples and conveys an important idea: use it in daily life.

Additionally, as a fresh learner, I also have some suggestions and questions about this chapter.

Suggestions & Questions

  • Questions
    In "Basic functions: Arrays", I tried my best to understand the definition of "shape" but I am still confused by it, especially in the example below. I hope to receive a more detailed explanation of how it works. Thx!
>>> b = np.array([[1,2,3],[4,5,6]])    # Create a rank 2 array
>>> print(b.shape)
(2, 3)
>>> print(b[0, 0], b[0, 1], b[1, 0])
1 2 4
  • Suggestions
    The example below relies on the concept of an index. However, there is no introduction to indexing before this example, which may cause some difficulty for learners with no background.
>>> import numpy as np
>>> a = np.array([1, 2, 3])   # Create a rank 1 array
>>> print(a[0], a[1], a[2]) # index elements

Thus, I think there are two effective ways.

  1. Add a link to the 6th example of Chapter 3, which explains it exactly.
  2. Give a simple explanation of the index.

Feedback & Questions in Week5

I have to admit that this chapter is more difficult, but it is also useful: it helps me build the logical thinking behind a scraper and see how to apply it in practice step by step. My feedback and questions are listed below in chapter order.

1. In Get data:

(1) The underscore seems to be missing in my_title = myh1.text. Without the underscore '_', this code raises a NameError.

(2) Still in this code, I cannot understand what type(myh1) means. Maybe we can change myh1 to use h2, such as my_h1 = data.find('h2'), and get the output '話癆特朗普', which might help others understand that the goal is to use tags and attributes to extract the data we want directly.

2. In Get author try 2

(1) How do we determine the tag name, such as 'tr'? I'm wondering what the rule is, because you use 'a' as the tag name in the later function scrape_articles_urls_of_one_page.

(2) This code looks like a dict: attrs={'class':"post__authors"}. Why is this format used? Could you explain it in more detail, or is it just a syntax rule?

Thanks for all your work and help, it's meaningful!

Week 3 Feedback

  1. The outputs in Example4 should contain no parentheses and no quotation marks.
    image
    Below is the right version:

    x=4
    y=6
    print('x!=y:',x!=y)
    x!=y: True

  2. Remember to add parentheses. Also, list2[1:5] means slicing list2 from index 1 up to but not including index 5, and list2[:2] means slicing list2 from index 0 up to but not including index 2. We had better explain the rule of slicing lists more clearly. It confused me when I first saw the outputs.
    image

  3. Delete the command of the third line in example8.
    image

  4. In Example10, there is no key named 'Frank' or its corresponding value.
    image
    If we try to access the value of a key that does not exist in the dict, an error will be reported as follows:
    image

  5. In Example22, the last line (i=i+1) should have only one level of indentation; otherwise the output will be 1 endlessly and the loop never breaks:
    image
    image
    Below is the right version:
    image

  6. In the first line, input('please input a int:') returns a string, and the type of a string is never equal to the type of 1. Hence ValueError is always raised whether you enter 2 or 2.2, because both inputs are strings. Therefore, you need to wrap input() in int(): int('2.2') raises a ValueError while int('2') produces an int.
    image
    Also, in the last line, remember to add parentheses: print(inputValue)
    The right version should be:
    image
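
Points 2 and 6 above can be illustrated with a short sketch. The list contents and the helper name `parse_int` are hypothetical; the strings '2' and '2.2' stand in for what input() would return.

```python
list2 = ['a', 'b', 'c', 'd', 'e', 'f']  # hypothetical list for illustration

# A slice includes the start index and stops just before the end index.
print(list2[1:5])  # ['b', 'c', 'd', 'e']
print(list2[:2])   # ['a', 'b']


def parse_int(text):
    """Convert text typed by a user to int, or return None if it is not a whole number."""
    try:
        return int(text)
    except ValueError:
        return None


print(parse_int('2'))    # 2
print(parse_int('2.2'))  # None
```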

A question about the codes to extract article url in week5 - [improve: list slicing]

For the example in week5 which extracts all article urls in http://initiumlab.com, all the tags including the article urls have been collected like below
image

Next, we need to get the strings after 'href' (e.g.'href="../blog/20170113-Sharing-With-Friends-Versus-Strangers/"'), and the codes are as below:

for my_url in my_urls:
    url ='{0}{1}'.format('http://initiumlab.com',my_url['href'][2:]) #format urls
        #print(url)
    article_urls.append(url)
article_urls

However, I do not quite understand what my_url['href'][2:] means here. my_url['href'] looks like a lookup of a value by its key in a dictionary, but my_url is an element of a list. [2:] looks like an operation that extracts part of a string. I feel a little confused.
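
A sketch of how the two steps combine. BeautifulSoup tag objects support the same `tag['attribute']` lookup as a dict, so a plain dict is used here as a stand-in to keep the example free of third-party packages; the href value is the one quoted above.

```python
# A plain dict stands in for the tag; the href value is the one quoted in the post.
my_url = {'href': '../blog/20170113-Sharing-With-Friends-Versus-Strangers/'}

href = my_url['href']  # look up the attribute by name, like a dict key
path = href[2:]        # string slice: drop the first two characters ('..')
url = '{0}{1}'.format('http://initiumlab.com', path)
print(url)
```

So each element of my_urls is a tag that behaves like a dict of its attributes, and the slice then works on the string that lookup returns.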

How to create a file in "example" repository

Troubleshooting

Describe your environment

  • Operating system:mac os
  • Python version:
  • Hardware:
  • Internet access:
  • Jupyter notebook or not? [Y/N]:
  • Which chapter of book?:week00

Describe your question

week00- gh desktop
How do I create a file in the "example" repository after creating a new repo?
I don't understand the "drag" part. Thank you!

2018-09-06 17 07 04

Feedback for Chapter 3

It seems that this week has large load of content! Just some humble suggestions:

  1. In the "Str Comparison" part we give the example Name1 == Name3, which returns True. Should we explain what "equal" really means here? If we try "Name1 is Name3", the result would be False. So '==' only means that the two items have equal content; it does not mean they are the same object.

  2. In the List[] part:
    I am a little confused about "remove()" and "pop()". What are the differences? Could you add a few lines to explain?

  3. in the Dict{} part:

  • I noticed we mention the index several times, as in list.insert(i,x), pop(i)... Maybe it's better to explain the index more, ideally with a diagram that shows how the 'identity numbers' match the items, like
    {1, 2, 3, 4}
    [0][1][2][3]
  • A typing error: unorder --> disorder
  • Sorry for my weak understanding... It seems that the str() output is no different from the original one (Example10), so why do we use str()? It would be better to illustrate this.
  4. A tiny question:

seq = ['Chico', 'Ivy', 'Ri']
dict = dict.fromkeys(seq) #fromkeys()
print("New_dict : %s" % str(dict))
New_dict : {'Chico': None, 'Ivy': None, 'Ri': None}

#Why is seq converted to a list? Is that done by fromkeys()?

  5. for i in range(1,11):
         print(i**2)
#Should we add a footnote that i**2 can also be written as (i * i)? Maybe that would be clearer for students.

if number_of_users <= 50000:
... cost = 10000

Can we add a friendly reminder that indentation defines which statements are controlled by the "if"? It's easy for new learners to make mistakes here.

The range() function returns a sequence of numbers, starting from 0 by default, and increments by 1 (by default).(1,10) means values from 1 to 11 (but not including 6)
#here why not including 6?? Could you explain a little bit?

i = 1
while i < 9:
    print(i)
    if i == 5:
        break
    i = i + 1 # why is this statement placed after the 'break'? As a new learner, I would think the flow goes from top to bottom and stops when it reaches the 'break', so this may need some explanation.

return "My name is {self.name}, and I'm {self.age} years old".format(self=self)

  • Can we show another format to help readers understand?
    return "My name is {0}, and I'm {1} years old".format(self.name, self.age)
    because I don't understand this part: format(self=self) ...
  • It is better to tell readers how to call a class function. For example, should we call it from the class name?

@hupili Can you add some content on instance variables and class variables? I think the "class" part has not been fully developed, so it may be confusing.
Also, in the final example, it would be better to explain the relationship between the class (Account) and its instance methods (deposit and withdraw), and how to call these functions.
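
A minimal sketch of the two format styles and of calling a method on an instance. The class `Person` and its method names are hypothetical and are not the tutorial's own example.

```python
class Person:
    def __init__(self, name, age):
        # Instance variables: each Person object keeps its own copy.
        self.name = name
        self.age = age

    def intro_named(self):
        # format(self=self) binds the placeholder name 'self' to this object,
        # so {self.name} and {self.age} read its attributes.
        return "My name is {self.name}, and I'm {self.age} years old".format(self=self)

    def intro_positional(self):
        # Same result with positional placeholders.
        return "My name is {0}, and I'm {1} years old".format(self.name, self.age)


p = Person('Ivy', 20)   # create an instance first
print(p.intro_named())  # then call the method on the instance
print(p.intro_named() == p.intro_positional())  # True
```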

Thanks for your hard work!

Hatty's Feedback on Ch3

Hatty's feedback on Chapter 3

Before advice:

This chapter really surprised me because it is so rich, full of knowledge points about statements, expressions, operators, functions, modules, methods, etc. Even though I had learned part of the same fundamentals through a series of simple exercises before reading Ch3, it was still tough work to finish all the exercises and go through the whole chapter.

Here are some bullet points I think might be useful for you:

  • In some parts, it would help to add one or two sentences describing how the knowledge point being introduced is used in real programming practice. In other words, one or two sentences that explain why it matters.

  • Since this chapter is so substantial, fresh learners may feel their brains are about to explode if they learn it all at once, so I think it would be better to separate this chapter into three or four small sections, for example by using emphasis fonts.

  • In the part of "List methods" and Example 7:

    • Description: more details and no “s” behind every verb
      Examples: separate examples for diverse methods with spaces.
    • Provide some exercises after each part on methods.
  • More description of the while loop, and probably a comparison between the if statement and the while loop, so that students can learn their functions better.

  • The examples need some hints. For examples 15 & 16, students may be confused about what the goal of writing these while loops is.

  • Fix two small bugs:

    • In the example 13 question, the formula is "cost=1000+0.1×(number_of_users -50000)", while the answer uses: cost = 10000 + 0.1 * (number_of_users - 50000).
    • In example 14,

The actual number of users we have now is 120,000.

should be blended into the question part with grey color font, which will be easier for learners to read.

`str` and `int` data type issue in json file

Troubleshooting

Describe your environment

  • Operating system:os
  • Python version:3
  • Hardware:
  • Internet access:
  • Jupyter notebook or not? [Y/N]:
  • Which chapter of book?:4

Nothing is printed out. @hupili

image
image

The minimum code (snippet) to reproduce the issue

import json

filename = 'population.json'
with open(filename) as f:
	pop_data = json.load(f)

	for pop_dict in pop_data:
		if pop_dict['Year'] == '2011':
			country_name = pop_dict['Country Name']
			population = pop_dict['Value']
			print(country_name + ": " + population)
[
	{
	"Country Code": "TZA", 
	"Country Name": "Tanzania", 
	"Value": 47570902.0, 
	"Year": 2011
	},
	{
	"Country Code": "TZA", 
	"Country Name": "Tanzania", 
	"Value": 49082997.0, 
	"Year": 2012
	}
]
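
A possible fix, sketched against the two records shown above. In the JSON data, "Year" is a number (2011) and "Value" is a float, so comparing with the string '2011' never matches and nothing is printed. The records are loaded from an inline string here so that the sketch runs without the 'population.json' file.

```python
import json

# Inline copy of the two records from the post, so the sketch runs without a file.
raw = '''
[
  {"Country Code": "TZA", "Country Name": "Tanzania", "Value": 47570902.0, "Year": 2011},
  {"Country Code": "TZA", "Country Name": "Tanzania", "Value": 49082997.0, "Year": 2012}
]
'''
pop_data = json.loads(raw)

lines = []
for pop_dict in pop_data:
    # "Year" is stored as a JSON number, so compare with the int 2011, not '2011'.
    if pop_dict['Year'] == 2011:
        country_name = pop_dict['Country Name']
        population = int(pop_dict['Value'])
        # str() is needed because a str cannot be concatenated with a number.
        lines.append(country_name + ": " + str(population))

print("\n".join(lines))  # Tanzania: 47570902
```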

Describe the efforts you have spent on this issue

Example:

Have you searched Google / Stack Overflow for anything?

Do they solve or partially solve your question?

What is the closest answer you can find?

Instructions of Installing Jupyter Notebook on Windows

How to install virtual environment and Jupyter Notebook on Windows

Trouble description
The Jupyter tutorial (module-jupyter.md) lacks instructions for Windows 10 users like me, and yesterday it took me almost two hours to figure out how to set up a virtual environment and Jupyter Notebook on my laptop, as the commands to input differ from those on Linux. Hence, I have written down the instructions for setting up a virtual environment and Jupyter. I think it will be helpful for Windows users if the following content can be added to the tutorial.

Instructions
You also need to create a virtual environment to use Jupyter Notebook if your operating system is Windows. However, the commands you need to input are a little different from those on Linux.

  1. Create a folder named 'venv'. You can place it wherever you like, be it disk C, D or E. In this case, I put it on disk D.

  2. Press Windows key+R to open the 'Run' box. Input cmd and click 'OK'.

image
image

  3. Input D: to change to disk D.
    image

  4. Input cd venv to go to the 'venv' folder.
    image

  5. Enter python -m venv test to create a virtual environment called 'test'. You can see the newly created 'test' folder.
    image
    image

  6. Enter cd test to go to the 'test' folder.
    image

  7. Input cd Scripts to go to the 'Scripts' folder.
    image

  8. Input activate.bat. Now (test) appears in front of the command line prompt, which means you have entered the virtual environment!
    image

  9. Enter pip install jupyter to install Jupyter.
    image

  10. Input jupyter notebook to launch Jupyter Notebook.
    image

  11. Press 'Ctrl+C' twice to quit Jupyter Notebook, and input deactivate to exit the virtual environment.
    image

  12. The next time you need to enter the environment and launch Jupyter Notebook, here are all the commands to input in cmd.exe, in order.
    image

a little question

In Example 6, there are lines following:

test = [0]
print(test,'is',bool(test))
[0] is False

However, in my opinion, and according to my Python, the result should be True. Am I right?
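
A quick check: a list is falsy only when it is empty, regardless of what its items are, so a list holding a single 0 is still truthy.

```python
# Truthiness of a few values; only the empty list and the number 0 are falsy here.
results = {}
for value in ([0], [], [False], 0):
    results[repr(value)] = bool(value)
    print(repr(value), 'is', bool(value))
```

This prints `[0] is True`, `[] is False`, `[False] is True` and `0 is False`.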

week-00: gh-pages steps

@ChicoXYC here are two revisions.

Feedback for Chapter06

Before Feedback

Obviously, as we learn more, the difficulty of our open book also increases gradually, especially in this chapter, since it covers many things we have not touched before, such as XPath, Selenium and CSS. I really recommend that other learners look up a simple definition of these terms first, which helps us study effectively. Besides, sincere thanks to our TA @ChicoXYC for his explanation. After our discussion, my problems were solved completely, so I share my explanations and feedback below and hope they help others learn this chapter better.

  • 1: In Navigating

Code: element = browser.find_element_by_name("q")
Q(1): Why is the element located by the name "q" rather than something else?

A(1): When we find an element, the strategy is to pick something unique that locates what we want. In my test, the name "q" could be replaced with element = browser.find_element_by_id('lst-ib'), which also worked.

Code: browser.execute_script("window.scrollTo(0,1200);")
Q(2): How do we find the exact number, such as 1200, directly?

A(2): Actually, the number cannot be found directly. We have to test several times to find the number that brings the part of the page we need into view. You can try (0,300) or (0,600); it's fun.

  • 2: In CNN articles scraping

Code: browser.find_elements_by_xpath("//div[@id='summaryList_mixed']//div[@class='summaryBlock']")
Q: How do we decide on these two values, 'summaryList_mixed' and 'summaryBlock'? What is the general pattern?

A: First, we can see that the 'headline', 'date' and 'url' we need are all contained in 'summaryBlock'. However, since there are 10 elements of this kind, we need the upper-level 'summaryList_mixed' to locate them accurately. Then we can use a for loop to scrape all the data we need.

  • 3: In scrape all pages

Code: browser.execute_script('window.scrollTo(0, document.body.scrollHeight/1.5);')
Q: Why is this different from the code above, browser.execute_script("window.scrollTo(0,1200);")?

A: In this code, we cannot use a fixed number to position the page, as there are 10 pages (or more in future) that we need to scrape. Each page has a different length, because the abstracts and titles differ in length. Thus, window.scrollTo(0, document.body.scrollHeight/1.5) scrolls down to a position two-thirds of the way through the page's total height, whatever that height is. This helps us click the 'next' button on every page.

  • 4: A little advice

Maybe you can describe how to find the path on Mac before the Navigating part, which might help others work efficiently rather than spending time learning how to find the path.

That's all my feedback. You are welcome to discuss it with me or point out my mistakes. I'm afraid that a wrong understanding on my part could mislead other learners.

Chapter 1 working thread

https://github.com/hupili/python-for-data-and-media-communication-gitbook/blob/master/notes-week-01.md

Tick when resolved:


Use the following thread for discussion.

Collect challenging websites/ data sources to crawl

I realised that one difficulty many groups encountered last time was being unable to crawl some of the websites/data sources they intended to. While it is impossible to enumerate all potential cases and barriers, I have decided to make more examples. @ChicoXYC please collect the crawling ideas from our past students that:

  • They want to crawl initially
  • They gave up in the end, due to unsolvable technical barriers

I will evaluate those ideas and make sample codes for those general issues.
