Web APIs with R

wapir's Introduction

This repository contains the source of an in-progress book, Web APIs with R. Please see the deployed book for the most up-to-date information about my writing process. I am writing this book in conjunction with the packages {beekeeper}, {nectar}, {rapid}, and anything else I realize would be helpful along the way.

I will eventually welcome suggestions, but right now I'm still working out the idea.

Code of Conduct

Please note that the wapir project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

wapir's Issues

Explore: Post vs rvest sessions

https://www.etiennebacher.com/posts/2023-05-09-making-post-requests-with-r/

Can what he does there be done with rvest sessions? Which "feels" easier? Maybe do (a similar thing) both ways in the book to show how things really work under the hood?

Task: Find an API for ____

I need to find something that isn't in apis.guru (or whatever other directory I end up with), but that DOES have an API findable by searching their site and/or adding "developer" to the beginning of the URL.

I want to include this task, but defining it will be rough, especially if I create my own apis.guru-like site that's expandable (I'll add any example I actually find to that directory).

Maybe leave this fairly open-ended/unanswered, and just give tips? Use apis.guru to back-calculate what would have worked to find each API?

Tasks for "Find R packages that wrap APIs?"

Sort out 3-5 tasks/questions learners should be able to accomplish to prove they've mastered this learning objective.

Merge this LO into another and/or remove it entirely.

Name remaining rvest sections & write Draft tickets

For these LOs:

Scrape data from websites that require you to log in.
Scrape content that requires interaction.
Automate web scraping processes.
Scrape data as part of a workflow.

Dig into rvest chapter to find other tasks

I think step 1 of my process on a chapter should be to start writing and see what happens. And/or work on the thing that it'll be about to find pain points. Probably that.

Tasks for "How can I do other things with APIs?"

Sort out 3-5 tasks/questions learners should be able to accomplish to prove they've mastered this learning objective.

Merge this LO into another and/or remove it entirely.

Methods
Body
curl_translate()

Task: Fetch the full definitions of every API from apis.guru

APIs with a ":" in their name have an additional element in their path. Walk through that discovery.
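
A minimal sketch of that discovery, assuming (the notes do not confirm this) that the part of the name after the ":" is what becomes the extra path element. The helper name api_name_to_path_parts() is hypothetical, and the sample names are illustrative rather than taken from the live directory.

```r
# Hypothetical helper: split an apis.guru-style name into path pieces.
# Assumption: "provider:service" names contribute two path elements,
# while plain "provider" names contribute one.
api_name_to_path_parts <- function(api_name) {
  strsplit(api_name, ":", fixed = TRUE)[[1]]
}

# Illustrative names, not fetched from the live directory.
api_names <- c("example.com", "example.com:widgets")
path_parts <- lapply(api_names, api_name_to_path_parts)
names(path_parts) <- api_names

# Count how many path elements each name would contribute.
lengths(path_parts)
```

Comparing the counts for names with and without ":" is one way to lead learners to notice the extra element before building any request paths.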

Tasks for "How can I fetch a lot of data from the web?" (pagination)

Sort out 3-5 tasks/questions learners should be able to accomplish to prove they've mastered this learning objective.

Merge this LO into another and/or remove it entirely.

pagination
throttling
what else?

Tasks for "How can I fetch data from the web?" (no auth, GET)

Sort out 3-5 tasks/questions learners should be able to accomplish to prove they've mastered this learning objective.

Merge this LO into another and/or remove it entirely.

Seek feedback on {rvest} LOs

Post on R4DS and/or social media about the draft LOs from #16. See if there are other things people would like to learn in this area.

This ticket is just for doing the posting. I'm making a separate ticket for processing responses.

R4DS
Mastodon
LinkedIn?
Other?

Task: How many candidates ran for President of the United States ("P") as independents ("IND") in 2000, and raised funds?

NEW RVEST TASKS

Write rvest tasks.

Draft: How can I scrape more complex data from a web page?

R4DS Book Club Repo

Create an R4DS repo for WAPIR.

Xpath Tutorial

Go through the W3schools Xpath Tutorial to make sure you grok it. Note things to highlight in the book as comments here. Don't close this until those comments are sorted out into tickets or whatever makes sense.

Tasks for "How can I process data from APIs?"

Sort out 3-5 tasks/questions learners should be able to accomplish to prove they've mastered this learning objective.

Merge this LO into another and/or remove it entirely.

JSON basics
Advanced JSON
XML
image

How to Comment?

What is the appropriate way to comment on the book's:

outline
text
code

How/where should I make a:

minor comment
major comment

A few remarks

Hi @jonthegeek, I saw your post on Mastodon but I don't have an account there so I'm just writing a few remarks here (only based on my cursory reading of chapter 2):

you could talk a bit about the package polite, probably when you talk about robots.txt
I find that the position of this chapter is strange. IMO scraping should be the last resort to get data from the web (related to you asking "Is it worth scraping?" in 2.4), so this chapter should be placed after the current chapter 8
I'd say dynamic scraping could have its own chapter where you also talk about making POST requests as an alternative to dynamic scraping.
depending on when you finish this book, you could talk about a draft PR in the rvest repo that brings dynamic webscraping to rvest (using chromote under the hood).

That's it, just my two cents, best of luck with your book 😃

Task: How many candidates ran for President of the United States ("P") as independents ("IND") in 2000, regardless of whether they raised funds?

Rendering bug

https://github.com/jonthegeek/wapir/actions/runs/5944998822/job/16123387117

I'm not sure yet why this didn't come up in the main check-in, but only in the nightly build.

Also I forgot there was a nightly build. Why?

Task: Fetch info about APIs from apis.guru

all_apis <- jsonlite::read_json("https://api.apis.guru/v2/list.json")

(discuss fromJSON vs read_json)
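
A possible starting point for that discussion, as a sketch rather than the book's final example. It uses a small inline JSON string (made up for illustration) so it runs offline, and it uses jsonlite::parse_json() as a stand-in for read_json(), since both default to simplifyVector = FALSE while fromJSON() simplifies by default.

```r
# A small inline JSON document so the comparison runs without a network call.
json_text <- '[{"name": "a", "count": 1}, {"name": "b", "count": 2}]'

# fromJSON() simplifies by default: an array of records becomes a data frame.
simplified <- jsonlite::fromJSON(json_text)
class(simplified)

# parse_json() shares read_json()'s default of simplifyVector = FALSE,
# so the same input stays a nested list.
nested <- jsonlite::parse_json(json_text)
class(nested)

# Opting in to simplification makes parse_json() return a data frame too.
also_simplified <- jsonlite::parse_json(json_text, simplifyVector = TRUE)
class(also_simplified)
```

The nested-list form is closer to what the API actually sent, which may make it the better default for teaching; the simplified form is convenient once the structure is understood.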

Task: (something about a response in XML)

Find an API that still returns XML, and work through parsing that data. Try to get it to match what we would get from JSON (a nested list), and then from there it's the same task.
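
Until a real XML-returning API is chosen, the parsing step could be sketched with an inline document. The XML below is invented for illustration; xml2::read_xml() and xml2::as_list() are the only library calls assumed here.

```r
# Invented XML standing in for an API response (no network call).
xml_text <- "<apis><api><name>example</name><category>open_data</category></api></apis>"

# Parse the text into an xml_document.
doc <- xml2::read_xml(xml_text)

# Convert the document to a nested list, keyed by element name.
nested <- xml2::as_list(doc)

# Text content sits one level deeper than it would in parsed JSON,
# so it has to be pulled out of a length-one list.
api_name <- nested$apis$api$name[[1]]
api_name
```

That extra level of nesting around text values is probably the main thing to reconcile before the result matches the JSON-style list.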

Task: Find all APIs on apis.guru that are categorized as "open_data"

While not strictly NECESSARY, it's easiest to do this with rectangled data. Those are the LOs I'm expecting here.

This data tibblifies poorly. Don't go into {tibblify} here yet; save this for a later discussion of the pros and cons of {tibblify}.
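
For contrast with the rectangled approach, here is a sketch of filtering the nested list directly in base R. The mock data and its structure (a "preferred" version, a "versions" list, and an "x-apisguru-categories" field under "info") are assumptions standing in for the real list.json, and has_category() is a hypothetical helper.

```r
# Mock data standing in for the parsed list.json (structure is an assumption).
all_apis <- list(
  "example.com" = list(
    preferred = "1.0",
    versions = list(
      "1.0" = list(info = list(`x-apisguru-categories` = list("open_data")))
    )
  ),
  "example.org" = list(
    preferred = "2.1",
    versions = list(
      "2.1" = list(info = list(`x-apisguru-categories` = list("cloud", "media")))
    )
  )
)

# Hypothetical helper: does the preferred version list a given category?
has_category <- function(api, category) {
  info <- api$versions[[api$preferred]]$info
  category %in% unlist(info[["x-apisguru-categories"]])
}

# Keep only the APIs whose preferred version is tagged "open_data".
open_data_apis <- Filter(function(api) has_category(api, "open_data"), all_apis)
names(open_data_apis)
```

The amount of indexing needed just to reach the categories is a reasonable motivation for rectangling first.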

Tasks for "How can I tell APIs who I am?"

Sort out 3-5 tasks/questions learners should be able to accomplish to prove they've mastered this learning objective.

Merge this LO into another and/or remove it entirely.

api key
oauth2
other
other header stuff

Tasks for "How can I find APIs?"

Sort out 3-5 tasks/questions learners should be able to accomplish to prove they've mastered this learning objective.

Merge this LO into another and/or remove it entirely.

Draft: How can I scrape a table of data from a web page?

Initial {rvest} Learning Objectives

Notes are starting to coalesce into a chapter. Tidy that up into draft LOs.

Draft: Should I scrape this data?

Scraping Examples

Add an appendix with examples for scraping chapter, or maybe multiple appendices. That way you have control over the content + format, mostly at least. Or I guess consider putting it outside the book as raw html. In this repo or elsewhere? Deployed in github pages? Can the quarto site have static, unrelated html?

Test Build for PR

I don't want to push to prod when I PR, but I want to test that rendering works. It wouldn't hurt to have a dev site to check it out on. This would be a good reason to switch to something like netlify, and use a variable to switch where it deploys.

Task: (Image or video or whatever)

Find a stable example of an image from an API, and write a question around grabbing that.

Explore: Yahoo Finance example as session

I think this can work entirely in (released) rvest. Make it so. Also try to simplify the selectors toward future-proofing. There's little chance that ALL of those selectors are necessary to find the "table" divs.

Task: Create a tibble of all endpoints in the "fec.gov" API.

How can I handle API errors?

Just wanted to leave some notes here for the chapter "How can I handle API errors?"

I made the tryr package to reflect some best practices.

You might also find the examples useful: there is a Shiny app that uses the Swagger page and also shows STDERR/STDOUT.

Great initiative! I will be happy to give feedback once you develop it further.

Process Feedback on scrape LOs

Process anything from #17 into tickets.

Explore: PDFs

The Web Data Science course outline specifically mentioned PDFs, and somehow I thought I'd avoid them... but I really should cover them. Within the rvest chapter? A separate chapter?

Probably try some scraping and see how much tooling there is to talk about. It might only be a ~section if there isn't much to say (other than "Good luck!")

https://www.copyright.gov/fair-use/fair-index.html has links to PDFs to try in this spike.

Code conventions

Include a section in the intro about how code is presented.

package::function vs library(package)
Pipes (specifically the base pipe)
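
The two conventions listed above could be illustrated with something as small as the following sketch. It uses only base R and {stats}; the values are arbitrary, and which style the book adopts is still undecided in these notes.

```r
values <- c(9, 1, 4)

# Explicit namespacing: the source package is visible at every call site.
explicit <- stats::median(values)

# Attaching with library(): shorter calls, but the source package is implicit.
library(stats)
attached <- median(values)

# The base pipe (available from R 4.1.0) passes the left-hand side
# as the first argument of the next call.
piped <- values |> sort() |> rev()
```

Both median() calls give the same result; the difference is only in how much the reader can tell about where a function comes from.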
