
lambda-web-scraper-example's Introduction

A web scraper running on AWS Lambda

This is an example of a web scraper running on AWS Lambda and Lambda Layers. It assumes that you have AWS CDK and Docker installed. The Docker image relies on serverless-chrome.

Create a CDK app environment

cdk init --language python

Install libraries

pip install aws-cdk.core
pip install aws-cdk.aws_lambda
pip install aws-cdk.aws_events_targets
pip install aws-cdk.aws_events

Copy the files below to the created app folder

run.sh
index.py
app.py
Dockerfile
lambda_app/lambda_app_stack.py

Build the Docker image; the output will be stored in the python/ folder

docker build -t myapp .
docker run -i -v `pwd`/python:/opt/ext -t myapp

Create an S3 bucket for the assets

cdk bootstrap aws://YOUR_AWS_ACCOUNT_ID/REGION
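
The bootstrap target has the form aws://ACCOUNT/REGION. As a minimal sketch, the hypothetical helper below (not part of this project) assembles that string and rejects obviously malformed input. It assumes a 12-digit account ID and a hyphenated region name such as us-east-2; the regular expressions are illustrative, not an official AWS definition.

```python
import re

# Assumption: AWS account IDs are 12 digits.
ACCOUNT_RE = re.compile(r"\d{12}")
# Assumption: regions look like "us-east-2" or "us-gov-west-1" (hyphens, not underscores).
REGION_RE = re.compile(r"[a-z]{2}(-[a-z]+)+-\d+")


def bootstrap_target(account_id: str, region: str) -> str:
    """Return the aws://ACCOUNT/REGION string expected by `cdk bootstrap`."""
    if not ACCOUNT_RE.fullmatch(account_id):
        raise ValueError(f"account ID should be 12 digits, got {account_id!r}")
    if not REGION_RE.fullmatch(region):
        raise ValueError(f"region should look like 'us-east-2', got {region!r}")
    return f"aws://{account_id}/{region}"


# Example with a placeholder account ID.
print("cdk bootstrap", bootstrap_target("123456789012", "us-east-2"))
```

A value such as us_east_2 (underscores) would be rejected by this check before it reaches the CLI.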

Deploy to AWS

cdk deploy

NOTE

Please note that, by default, the project does not use the newest version of Chromium. If this is a concern, please update the project accordingly.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

lambda-web-scraper-example's People

Contributors

amazon-auto, jpmmota, kafka399

lambda-web-scraper-example's Issues

Missing lambda environment variable PATH

After deploying the Lambda function and trying to run a test, I got the following error:

"Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home\n"

I was able to fix it by adding the PATH environment variable in the lambda_app_stack.py file, and now I am wondering whether a pull request would be considered.
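
The exact change is not shown in this issue. As a rough sketch of the idea, the snippet below builds a PATH value that includes the directory holding chromedriver. Both the base PATH and the /opt/python/bin directory are assumptions made for illustration; check where the layer actually places the binary.

```python
from typing import Dict

# Assumption: a typical default PATH inside the Lambda runtime.
ASSUMED_BASE_PATH = "/usr/local/bin:/usr/bin/:/bin:/opt/bin"


def lambda_environment(driver_dir: str, base_path: str = ASSUMED_BASE_PATH) -> Dict[str, str]:
    """Return an environment dict whose PATH also contains driver_dir."""
    entries = [p for p in base_path.split(":") if p]
    if driver_dir not in entries:
        # Append rather than prepend so system binaries keep priority.
        entries.append(driver_dir)
    return {"PATH": ":".join(entries)}


# Hypothetical location of chromedriver inside the layer.
env = lambda_environment("/opt/python/bin")
print(env["PATH"])
```

A dict like this could be passed as the environment argument of the Lambda function construct in lambda_app_stack.py.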

cdk deploy fails due to stack using assets

Following the README, I tried to deploy the template to an AWS account after running

cdk bootstrap aws://my_id/us_east_2

which ended up in

 ⏳  Bootstrapping environment aws://my_id/us-east-2...
 ✅  Environment aws://my_id/us-east-2 bootstrapped (no changes).

but the cdk deploy ended up like this

❯ cdk deploy 
 ❌  lambdaapp failed: Error: This stack uses assets, so the toolkit stack must be deployed to the environment (Run "cdk bootstrap aws://unknown-account/unknown-region")
    at Object.addMetadataAssetsToManifest (/home/mhi/.nvm/versions/node/v14.16.0/lib/node_modules/aws-cdk/lib/assets.ts:27:11)
    at Object.deployStack (/home/mhi/.nvm/versions/node/v14.16.0/lib/node_modules/aws-cdk/lib/api/deploy-stack.ts:211:29)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at CdkToolkit.deploy (/home/mhi/.nvm/versions/node/v14.16.0/lib/node_modules/aws-cdk/lib/cdk-toolkit.ts:180:24)
    at initCommandLine (/home/mhi/.nvm/versions/node/v14.16.0/lib/node_modules/aws-cdk/bin/cdk.ts:208:9)
This stack uses assets, so the toolkit stack must be deployed to the environment (Run "cdk bootstrap aws://unknown-account/unknown-region")

Is that because the wrong region was chosen? I took the AWS account ID from the My Security Credentials section in the AWS Management Console.
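
One possible cause, not confirmed in this thread, is that the stack has no explicit account and region, so the deploy resolves to a different environment than the one that was bootstrapped. The CDK CLI exposes its resolved defaults to the app through the CDK_DEFAULT_ACCOUNT and CDK_DEFAULT_REGION environment variables. The hypothetical helper below reads them and fails early when they are missing.

```python
import os
from typing import Dict, Mapping


def resolve_cdk_env(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read the account and region that the CDK CLI passes to the app."""
    account = environ.get("CDK_DEFAULT_ACCOUNT")
    region = environ.get("CDK_DEFAULT_REGION")
    missing = [
        name
        for name, value in (
            ("CDK_DEFAULT_ACCOUNT", account),
            ("CDK_DEFAULT_REGION", region),
        )
        if not value
    ]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {"account": account, "region": region}


# Example with placeholder values instead of the real process environment.
example = resolve_cdk_env(
    {"CDK_DEFAULT_ACCOUNT": "123456789012", "CDK_DEFAULT_REGION": "us-east-2"}
)
print(example)
```

In app.py, these two values could be passed to core.Environment(account=..., region=...) and handed to the stack through its env argument, so that the deploy targets the same environment that was bootstrapped.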

App name is not in valid format

File "/Users/ngjhn/Library/Python/3.7/lib/python/site-packages/jsii/_kernel/__init__.py", line 282, in create
  for iface in getattr(klass, "__jsii_ifaces__", [])
File "/Users/ngjhn/Library/Python/3.7/lib/python/site-packages/jsii/_kernel/providers/process.py", line 344, in create
  return self._process.send(request, CreateResponse)
File "/Users/ngjhn/Library/Python/3.7/lib/python/site-packages/jsii/_kernel/providers/process.py", line 326, in send
  raise JSIIError(resp.error) from JavaScriptError(resp.stack)
jsii.errors.JSIIError: Stack name must match the regular expression: /^[A-Za-z][A-Za-z0-9-]*$/, got 'lambda_app'
Subprocess exited with error 1

In app.py, the stack name should be lambda-app, not lambda_app.
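
The regular expression in the error message can be checked locally before running the CDK. A minimal sketch follows; the helper names are hypothetical and not part of the CDK.

```python
import re

# Pattern copied from the error message above.
STACK_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def is_valid_stack_name(name: str) -> bool:
    """Return True if name satisfies the stack-name pattern."""
    return STACK_NAME_RE.fullmatch(name) is not None


def to_stack_name(name: str) -> str:
    """Replace disallowed characters with hyphens and validate the result."""
    candidate = re.sub(r"[^A-Za-z0-9-]", "-", name)
    if not is_valid_stack_name(candidate):
        raise ValueError(f"cannot derive a valid stack name from {name!r}")
    return candidate


print(is_valid_stack_name("lambda_app"))  # False: underscore is not allowed
print(to_stack_name("lambda_app"))        # lambda-app
```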
