lcbo-api's Introduction

LCBO API

Hello 👋, welcome to LCBO API 🙂

If you find yourself here wondering "what's an LCBO API?", let me explain. In Ontario, Canada, all beverage alcohol sales go through a government-owned corporation called the Liquor Control Board of Ontario (LCBO), which handles retail and distribution of alcoholic beverages throughout the province. The LCBO has numerous retail stores and a website that hosts a catalog of every product, store, and even inventory levels. They publish a seasonal catalog with recipes, editorials, and other content called Food & Drink. They also contribute billions of dollars of revenue to our public healthcare system annually. It's a fascinating situation when you think about it: other places have similar systems, but to my knowledge none have the breadth and depth of the LCBO. So, now you know what it is. Pretty cool, eh?

This might be interesting to you even if you don't live in Ontario, Canada, if:

  • You want access to a lot of data for learning or testing
  • You are curious to learn how some types of web crawlers work
  • You are curious to learn what a production Rails application spanning 10 years of development might look like

An important notice 🎁

Over the entire course of this project I have struggled massively with the idea of accepting financial support. On one hand LCBO API needed it; on the other, I was wary of the complications it would cause. Well, now I have THE PERFECT solution to this problem!

I'm undergoing treatment for blood cancer right now, specifically diffuse large B-cell lymphoma. I'm going to write more about that soon somewhere else, but during this past year people from everywhere have supported me in every way they could, and it has changed me. I want us to do something big to show that we care too!

If you've ever wanted to support this project in the past, please, make a donation to Hamilton Health Sciences on LCBO API's behalf, they are saving my life.

Hamilton Health Sciences
Donate to Hamilton Health Sciences

I am undergoing treatment at the Juravinski Cancer Center, but really you can choose any option, or leave the default one. It doesn't matter if the amount is small or large, they will notify me when you donate. I will tabulate a list to track the total, let's see how much we can raise!

Finally I'd like to make a special mention to my workplace, Crowdmark. They have been incredibly kind and understanding during all of this, and I quite literally would not have been able to do this without them. We are working tirelessly toward advancing the status quo of assessment in higher education. If you care about education and learning, I urge you to check us out.

Background

In the fall of 2008 I was a freshly minted web developer with a few years of experience under my belt. I was hungry for a challenge, and for some recognition. Apps were becoming a thing at the time and I wanted to build one, badly. I decided I wanted to build one that would require me to first build this API. I never did build that app 😆

Be kind

If you look into this codebase long enough you are likely to find moments of frustration, dead ends, confusing cruft, etc. I really hope you don't focus on that or on any negativity you might find. I'm not that person anymore, and I don't want you to be that person either. I am an open book on this! Open an issue and ask me a question, I will be as honest and respectful as possible, I only ask you do the same. 🙏

License

I'm releasing this project under GNU GPLv3, I think this is the most fair and responsible option for a project like this. If you feel differently, open an issue and we can have a discussion in the open about it. I only ask, respectfully, that you do not reuse the branding and design. I'm fine with re-use of the documentation, but the styling, identity, and branding must be changed if you want to deploy your own siloed version of this app.

Have big dreams!

What if instead of a monolithic application trying to do everything in one style, in one place, we thought bigger? What if the crawler was a separate project again, responsible for collecting and normalizing data? Others could build API nodes on whatever platforms they wished; those nodes would register with the data provider, receive updated data as it was made available, and provide that data to users of all different types.

Instead of dozens of similar API servers trying to do the same thing, fighting for LCBO.com's resources we could focus our efforts on building value on this data instead of fighting for ownership over it. We could get other disciplines involved to generate new value beyond the obvious, get the craft beer and wine communities involved, build on this, make it bigger than just Ontario.

I don't know how feasible something like that would be, but I know if others are interested I would love to have these discussions.

Also, maybe we should consider charging corporate users a reasonable fee to access the API nodes, that money could be used to fund the hosting costs to keep things sustainable, it could also be used to fund support programs for people who can't drink, or who don't want to drink, or who want to drink less, to give back to our communities and to actually make a difference.

But I need others to help share the burden of maintaining and managing all of this, my and my family's health and happiness is priority #1, followed by my career, and then my friends and community. This project can't be my #1 time sink anymore, it's not sustainable for me, and it's not healthy for me. But if you are inspired by this message, please reach out to me, ideally in the open, but privately at first is okay too.

I hope this excites you!

I couldn't help myself, I wrote more about my ideas on this: doc/lcboapi-proposed.md

Getting started ✨

Now, with that out of the way, we can start getting into who I really did this for, and what got me excited and inspired to do this in the first place: the opportunity to learn and grow and to help others do the same. For those of you out there who are curious, let's see where this goes 🙂

Running the Rails app 💎

You can probably run the app directly on your host environment, it doesn't require anything too fancy as far as system dependencies are concerned. I develop on Apple hardware, if you do too, you may have success using Postgres.app, and Homebrew for installing Redis. Otherwise, you can use Docker.

If you have experience with another platform, please make a PR or an issue and we can work at adding your platform to the README.

Also, if what follows here makes no sense to you, open an issue, maybe we could do a screencast to demonstrate the process, or maybe someone out there who's good at that would take that on?

What I describe below is only one way to set up a development environment to run LCBO API on your computer. If others have improvements (there's room for many, an entrypoint script to bootstrap the dev database for instance) or even different approaches, like using Vagrant + VirtualBox, open an issue or a PR, I'm happy to add them.

If you want to help, I want to enable you.

First steps

Setting up config/secrets.yml and .env

First, you'll need to set up some configuration which is not provided in the public repository. The reason this is done is to protect private data such as API keys and secret tokens, but also because some developers may prefer slightly different settings for their personal preferences and things like that.

There are a couple files you'll need to create, config/secrets.yml, and .env. There are template versions in the repo under config/secrets.yml.example and .env.example, you can copy those files to get started:

cp config/secrets.yml.example config/secrets.yml
cp .env.example .env

If you just want to boot the app and access it locally, you should be good to go at this point. If you want to be able to use the crawler and have it save a snapshot to Amazon S3, you'll need to add your AWS credentials and bucket to config/secrets.yml.

The rest of the settings either only really matter in a production environment, are not really used, or only matter if you don't like the default preference. As always, if you need clarification, open an issue and I'm happy to help.

Getting the app running for the first time

First, you'll need to install the Docker client for your system; see Docker's installation documentation for that. Once you've installed Docker, you can get things started.

Next, you will need to build the containers:

docker-compose build

When that is done, you can boot up the whole thing by issuing:

docker-compose up

At this point, you don't have any data in the database, so if you load the app, http://localhost:3000, it won't do much, it serves data after all, and there's no data in it. So let's do something about that.

Go ahead and shut down the containers:

Ctrl-C

That means, press the Control + C keys simultaneously.

You can download an archive of the latest production database dump from my personal Amazon S3 account using the commands below. Please note that the sensitive tables (emails, users, keys) have been excluded from this file.

Download and extract the archive in the tmp directory of this project:

cd tmp
curl -O https://heycarsten.s3.amazonaws.com/lcboapi-2019-01-21.tgz
tar xzf lcboapi-2019-01-21.tgz
cd ..

The file is about 300MiB, so it might take a while to download depending on your connection speed (this happens on the line that starts with curl).

Once you've downloaded and extracted the database file, you can load the data into the database:

docker-compose run --rm app rake db:create
docker-compose run --rm app bash -c 'pv tmp/lcboapi-2019-01-21.sql | psql -q -h db -U $POSTGRES_USER $POSTGRES_DB > /dev/null'

The first line, ending in rake db:create will create the database schemas in Postgres for development and testing, the second line will load the database dump into the development database. The progress bar indicates how much of the data has been piped into the database, once that completes indexes will be built. This might take some time depending on your machine, it's a fair amount of data. Then you can fire up the app again:

docker-compose up

At this point you can also safely delete the archive and the extracted SQL file from your tmp directory.

If you're finding typing docker-compose over-and-over tedious, look into shell aliases

You can add an alias line to your shell profile like alias dc=docker-compose and then you can just type dc instead of having to type docker-compose every time. ✅

Think of other aliases you could create to improve on this even further 🤔

Now, navigate to http://localhost:3000/products/438457

Boom. You've got LCBO API running on your computer! 👏 👏 👏
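If you'd rather check the same URL from a script than from a browser, here is a minimal Ruby sketch using only the standard library. The method names product_uri and fetch_product are made up for this example, and it assumes the app answers on http://localhost:3000 as described above. It makes no assumption about the response format, so it just hands back the status code and raw body.

```ruby
require "net/http"
require "uri"

# Builds the URL for a product on a locally running instance.
# The default base matches the address used above; change it if yours differs.
def product_uri(id, base: "http://localhost:3000")
  URI("#{base}/products/#{id}")
end

# Requests the product and returns [status code, body]. The body is returned
# untouched because this sketch does not assume a particular response format.
def fetch_product(id)
  response = Net::HTTP.get_response(product_uri(id))
  [response.code.to_i, response.body]
end

# Example (requires the app to be running):
#   status, body = fetch_product(438457)
#   puts status
#   puts body[0, 200]
```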

Running the app from now on

When you're done working on the app, just issue Ctrl+C to shut everything down. The next time you want to work on it again, run docker-compose up and you're good to go!

Rebuilding (bundle install)

If you add a new gem to the Gemfile, you will need to re-install the packages and update the dependencies. Docker is pretty good at this: its build cache can tell when the Gemfile changes, so running docker-compose build again rebuilds the app container with the new gems.

Opening a Rails console

To fire up a Rails console and inspect objects inside the application:

docker-compose exec app rails c

Once that's running you can do stuff like:

Fetch the first product in the database:

Product.first

Find LCBO retail store #25 (my local store):

Store.find(25)

If you change code in the app, you'll need to issue the reload! command in the console to refresh the changes.

Running tests

Inside of the spec folder, you'll find the test suite for LCBO API. It is regrettably not comprehensive, but it's not too bad either. I have struggled my entire career to maintain test suites that I am satisfied with. We should improve these tests!

To run the test suite:

docker-compose exec app rspec

You'll see a bunch of green dots ., each one of those represents a passed test case. That's good. If a failure occurs you'll see a red F, that's bad... JUST KIDDING! Actually it's good! Tests give you the power to change things in an existing codebase and see if you cause any regressions to existing functionality. Of course nothing is perfect, but I can tell you without a doubt, from experience, tests are good.

As applications get bigger and more complex, not having tests becomes a literal nightmare; it makes changing your application and adding features an extremely brittle process. Things like using languages with type systems and other programming paradigms can go a long way to help with this too, but I am of the opinion that there really is no replacement for at least a solid acceptance test suite.

Crawler

This is the part of LCBO API that kind of makes the whole thing possible. Crawlers for complicated websites are hard to build and maintain. The first version of LCBO API had a full test suite for the crawler; when everything changed many years ago I gave up on that codebase and just built something as fast as I could in this one.

The crawler logic is located in lib/crawler.rb, from there you'll see all of the various tasks that happen in succession to encompass a complete crawl of the LCBO websites.

The parser logic is located in lib/lcbo.rb and the files within lib/lcbo/*. This includes all of the requests that need to happen, and the code responsible for turning the data from those requests into structured data that can ultimately go into the database.

Crawling politely

I designed the crawler to perform requests in serial, which is a very good approach to take when you are crawling one website. It's simple, for one, which is always the best route to take if you can, and it's polite. We could fire off n AWS Lambda jobs and crawl every page on LCBO.com in a few seconds, but that would be rude, and we'd probably DDoS their website momentarily. Not good.
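To make the "serial and polite" idea concrete, here is a minimal Ruby sketch. It is not the code in lib/crawler.rb: the PoliteCrawler class, its fetcher and sleeper arguments, and the one second default delay are all assumptions made for illustration. The point is only that each request finishes before the next one starts, with a pause in between.

```ruby
# Hypothetical sketch of a serial, polite crawl loop; not the project's crawler.
class PoliteCrawler
  # fetcher: any callable that takes a URL and returns a response body.
  # delay:   seconds to pause between requests (an assumed courtesy value).
  # sleeper: callable used to pause; injectable so the loop is easy to test.
  def initialize(fetcher:, delay: 1.0, sleeper: ->(seconds) { sleep(seconds) })
    @fetcher = fetcher
    @delay = delay
    @sleeper = sleeper
  end

  # Fetches each URL one at a time, in order, pausing before every request
  # except the first. Returns a hash of url => body.
  def crawl(urls)
    results = {}
    urls.each_with_index do |url, index|
      @sleeper.call(@delay) if index.positive?
      results[url] = @fetcher.call(url)
    end
    results
  end
end

# Usage with a stand-in fetcher (swap in a real HTTP client to hit a live site):
fake_fetcher = ->(url) { "body of #{url}" }
crawler = PoliteCrawler.new(fetcher: fake_fetcher, delay: 0)
pages = crawler.crawl(["https://example.com/a", "https://example.com/b"])
puts pages.keys.inspect
```

Because there is never more than one request in flight, the load on the target site is bounded by the delay rather than by how many workers you can afford.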

Manager app (/manager)

This encompasses an Ember application, when you sign up/in to LCBO API and generate API keys, this is what you're interacting with. It's quite outdated, I haven't tried building it. I've been using Ember since day zero so if you have any questions about this, make issues. I'd actually quite enjoy discussing this part of LCBO API and working with you all to improve it.

Static site (/static)

This contains a Middleman site; when you visit lcboapi.com this is what you're looking at. It also contains a very small (also outdated) React app which is the "Give it a try" thingy on the right of the homepage. It has its own Gemfile and build script, static/generate; when that is run it builds the site and syncs the changes into the public folder. In Rails apps the public folder is served as static content.

Dead ends

There are A LOT of dead ends in this codebase, branches that went 40-60-80% of the way to a feature and then stagnated, experiments, etc. As always if you find something that's got you like 🤔 just file an issue and I'll respond as soon as I can.

It would also be cool if we could tie up some of the dead ends in here, I'd be super interested in finally getting JSON:API and GraphQL added in. The current API response design is from 2008!!! In a way I'm kind of surprised that nobody ever complains about it.

The #1 by far most requested feature I never implemented was categories, it's sort of in here, I forgot why I never shipped it, I don't remember what final stuff had to fall into place, but maybe that would be a good first thing to tackle? There's also Producers, and Origins which never quite got wrapped up.

I also have a whole bunch of other repos and little experiments I made over the years, I was always fascinated by the idea of inventory level prediction, somewhere there's a dataset dump analysis tool written in Go for analyzing the CSV dumps for a particular product inventory over a set of time. I'd be happy to release that stuff too if there's interest.

WIP (Work in progress)

I'm going to leave it here for now and I'm going to wait to hear back from you. I would love to keep adding to this base of knowledge in whatever way people want to see (screencasts, interviews, inline documentation, etc.). I also want you to do the same: if you're not sure, ask. ❤️

lcbo-api's People

Contributors

gilgen, heycarsten, minusfive

lcbo-api's Issues

What does the manager app do?

Hi, I'd like to know what the manager app does or was intended to do, its features, and etc. I tried following your instructions to run it, but it does not work - probably because it is outdated. Leaving the errors here, just in case anything can be done. Thanks!

npm install does not build properly:

WARN invalid config loglevel="notice"
Can not download file from https://raw.githubusercontent.com/sass/node-sass-binaries/v2.1.1/win32-x64-node-10.15/binding.node

[email protected] postinstall C:\Users\zeesh\OneDrive\Desktop\Gettit\manager\node_modules\node-sass
node scripts/build.js

internal/modules/cjs/loader.js:583
throw err;
^

Error: Cannot find module 'C:\Users\zeesh\OneDrive\Desktop\Gettit\manager\node_modules\node-sass\node_modules\pangyp\bin\node-gyp'
at Function.Module._resolveFilename (internal/modules/cjs/loader.js:581:15)
at Function.Module._load (internal/modules/cjs/loader.js:507:25)
at Function.Module.runMain (internal/modules/cjs/loader.js:742:12)
at startup (internal/bootstrap/node.js:283:19)
at bootstrapNodeJSCore (internal/bootstrap/node.js:743:3)
Build failed
npm notice created a lockfile as package-lock.json. You should commit this file.
npm WARN [email protected] had bundled packages that do not match the required version(s). They have been replaced with non-bundled versions.

And then running ember server gives:
internal/modules/cjs/loader.js:583
throw err;
^

Error: Cannot find module 'internal/util/types'
at Function.Module._resolveFilename (internal/modules/cjs/loader.js:581:15)
at Function.Module._load (internal/modules/cjs/loader.js:507:25)
at Module.require (internal/modules/cjs/loader.js:637:17)
at require (internal/modules/cjs/helpers.js:22:18)
at evalmachine.<anonymous>:44:31
at Object.<anonymous> (C:\Users\zeesh\OneDrive\Desktop\Gettit\manager\node_modules\ember-cli\node_modules\configstore\node_modules\graceful-fs\fs.js:11:1)
at Module._compile (internal/modules/cjs/loader.js:689:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
at Module.load (internal/modules/cjs/loader.js:599:32)
at tryModuleLoad (internal/modules/cjs/loader.js:538:12)

Help getting an api key for a student project?

Hello, I am a computer programming student at Niagara. I have three weeks to build a mobile app. My proposed app integrates the listing features of the LCBO and some of the basic social features of untappd in a simple way. Thus, a user could browse their local LCBO for beers and see at a glance which they have tried and which not, as well as the ratings from untappd, and possibly a detail page with more information. The user could make a wish list before they go to the LCBO if they want. My idea is to use barcodes initially to associate LCBO beers with their counterparts on untappd, a slightly inexact process as they don't have the same barcode databases, but good enough to be getting on with for my purposes.

If someone has a server running and would not mind generating API keys for me for testing, it would save me a lot of time. I have no plans to publish the app at this point and wouldn't share the keys.

Please drop me a note at [email protected] if you would be willing to help.

Thanks,

Erik

Centralized crawling

I think it's a bad idea if we all run the crawler for our own purposes, it will DDOS the lcbo website. I'm looking for people to collaborate with, to run one instance of the crawler and share the results. Ideas?

Legacy datasets availability

Hello!

Are the legacy datasets still available by chance? I'm specifically looking for the ones above 2451 so I can run some analysis on the last year's worth of data. I'd be happy to look into making new ones available if there's interest.

Thanks! Nice to see this project going open source!

Integrate CircleCI

Would be nice to set up continuous integration for LCBO API. It will also make for an easier time bringing on new maintainers. CI is great, CircleCI is free for open source projects, LCBO API should use it 🙂

Localization (fr-CA)

I always wanted to have both English and French Canadian versions of the content and docs on LCBO API, regrettably I'm not bilingual, anyways, if someone is curious to take this on hit me up ⚜️

Migrations are pending

Cannot render console from 172.18.0.1! Allowed networks: 127.0.0.1, ::1, 127.0.0.0/127.255.255.255
app_1 | (0.2ms) SELECT "schema_migrations"."version" FROM "schema_migrations" ORDER BY "schema_migrations"."version" ASC
app_1 | ↳ /usr/local/bundle/gems/activerecord-5.2.2/lib/active_record/log_subscriber.rb:98
app_1 |
app_1 | ActiveRecord::PendingMigrationError (
app_1 |
app_1 | Migrations are pending. To resolve this issue, run:
app_1 |
app_1 | bin/rails db:migrate RAILS_ENV=development
app_1 |
app_1 | ):

Request Verification

Users create API keys in order to gain access to LCBO API; this is a small step toward adding visibility to usage of the API. The API is targeted to be used server-side by web apps and backend apps. It's also intended to be used in the browser via CORS or JSON-P. Finally, it's intended to be used from mobile apps:

  • Server -> LCBO API
  • Browser -> LCBO API
  • Mobile App -> LCBO API

Keys have the following attributes:

  • Server keys are rate-limited to 8000 requests per hour, but also do not support CORS or JSON-P, they also don't require an Origin header. Use these when accessing LCBO API from your own web-server or backend.
  • Client keys are rate-limited to 800 requests per hour and support CORS and JSON-P, they require an Origin header to be present and are tied to a domain. Use these when accessing LCBO API from a mobile phone app or browser-based JavaScript application.

Auth Tokens have the following attributes:

  • They are rate-limited to 1000 requests per hour and support CORS but only when accessed from lcboapi.com, when accessed without an Origin they do not support CORS. Don't use auth-tokens for accessing LCBO API data endpoints.
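The server and client key attributes above can be written down as a small policy table. This is only a sketch: KeyPolicy, KEY_POLICIES, and request_allowed? are hypothetical names, and comparing the Origin header's host to the key's domain with exact equality is an assumption about what "tied to a domain" means. Auth tokens are left out because they are not meant for data endpoints.

```ruby
require "uri"

# Hypothetical policy record built from the attributes listed above.
KeyPolicy = Struct.new(:max_hits_per_hour, :cors, :requires_origin, keyword_init: true)

KEY_POLICIES = {
  server: KeyPolicy.new(max_hits_per_hour: 8000, cors: false, requires_origin: false),
  client: KeyPolicy.new(max_hits_per_hour: 800,  cors: true,  requires_origin: true)
}.freeze

# Returns true when a request made with the given kind of key is acceptable.
# kind:   :server or :client
# origin: value of the Origin header, or nil when the header is absent
# domain: the domain a client key is tied to (ignored for server keys)
def request_allowed?(kind:, origin: nil, domain: nil)
  policy = KEY_POLICIES.fetch(kind)
  return true unless policy.requires_origin
  return false if origin.nil? || domain.nil?

  # Assumption: the Origin host must equal the key's domain exactly.
  URI.parse(origin).host == domain
rescue URI::InvalidURIError
  false
end

puts request_allowed?(kind: :server)
puts request_allowed?(kind: :client, origin: "https://example.com", domain: "example.com")
puts request_allowed?(kind: :client, domain: "example.com")
```

Keeping the limits in one table means the rate limiter and the CORS check can both read from the same source instead of hard-coding numbers in two places.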

Assess and improve Docker stuff

I've been reading Docker for Rails Developers on the recommendation of my friend and long time colleague Ben Moss, it's excellent and I see that I'm not doing things as well as I could around here, when I have time I'll update stuff with my new found knowledge

No Crawler information on running crawler

There is only a brief mention of the crawler but no instructions on how to run it. If you could post the commands to run the crawler, I'd be more than happy to update the README with the information and a guide on how to use it.

Docker-Compose Build not working

I am trying to run the app following your instructions. Not sure if I am doing anything wrong, but Step 8/13 in docker-compose build returns multiple errors in fetching all the necessary files, and then finally returns:
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
ERROR: Service 'app' failed to build: The command '/bin/sh -c DEBIAN_FRONTEND=noninteractive apt-get -yqq update && apt-get -yqq install software-properties-common
apt-transport-https build-essential git-core openssl libssl-dev acl zip pv postgresql-client-$POSTGRES_VERSION libpq-dev nodejs yarn' returned a non-zero code: 100

Add Skylight to project

LCBO API was approved for the Skylight open source program! 🎉 That means the power of Skylight for free!

  • Add Skylight back to the app
  • Add badges to README
  • Add badge to homepage sponsors (tragically, it won't be online after Jan 15th... or will it 🤔)

Rate Limit Semantics

Yo, currently rate limiting works globally by IP or API Key, this was just a spike and is dumb, going forward rate limiting must meet the following requirements.

For V1 endpoints always rate limit by IP address to 300 requests per hour

  1. When the request is made, increment the hit_count for the request's IP.
  2. If the request has an API Key get its max_hits_per_hour, otherwise use the default for the given context.
  3. TODO: Elaborate more.
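The first two steps can be sketched in Ruby as follows. This is an in-memory illustration, not the app's implementation: HourlyRateLimiter and its arguments are hypothetical, counters are bucketed by clock hour as an assumption, and since step 3 is still a TODO, rejecting a request once its count passes the limit is my guess at the intended outcome.

```ruby
# Hypothetical in-memory sketch of the steps above. A real deployment would
# more likely keep counters in a shared store, and would expire old buckets.
class HourlyRateLimiter
  DEFAULT_MAX_HITS_PER_HOUR = 300 # the V1 per-IP limit quoted above

  # clock: callable returning the current Time; injectable for testing.
  def initialize(clock: -> { Time.now })
    @clock = clock
    @hit_counts = Hash.new(0) # [ip, hour bucket] => hit_count
  end

  # Records one hit for the IP and reports whether the request is within its
  # limit. key_max_hits is the API key's max_hits_per_hour, or nil when the
  # request carries no key.
  def allow?(ip, key_max_hits: nil)
    bucket = [ip, @clock.call.to_i / 3600]
    @hit_counts[bucket] += 1                           # step 1
    limit = key_max_hits || DEFAULT_MAX_HITS_PER_HOUR  # step 2
    @hit_counts[bucket] <= limit                       # assumed outcome for step 3
  end
end

limiter = HourlyRateLimiter.new
puts limiter.allow?("203.0.113.7")
```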

Decompressing lcboapi-2018-12-17.bz2 doesn't work

I tried the tar xzf command recommended in the wiki.

I also tried recommendations from stack overflow
bzip2 -d lcboapi-2018-12-17.bz2
bzip2 -dk lcboapi-2018-12-17.bz2
bunzip2 lcboapi-2018-12-17.bz2

None of these worked either. They returned the error "bzip2: lcboapi-2018-12-17.bz2 is not a bzip2 file."
