bibici_nius's Introduction

Code Challenge - Bibicí Níus


Description

Bibici Nius is my solution to a coding challenge.


Coding challenge statement


This page describes a coding challenge for Data Engineering roles.


Purpose


The aim of this test is threefold:

  • Evaluate your coding abilities;
  • Judge your technical experience;
  • Understand how you design a solution


How will you be judged?


You will be scored on:

  • Coding standard, comments and style;
  • Overall solution design;
  • Appropriate use of source control


Instructions


Please create a free GCP account.

Put your test results in a public code repository hosted on GitHub.

Once the test is completed, share the GitHub repository URL with the hiring team so they can review your work.

You are building a backend application and no UI is required. Input can be provided using a configuration file or the command line.
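As a rough illustration of that input requirement, the sketch below merges an optional JSON config file with command-line flags. The flag names, setting keys, and default values are all hypothetical and do not come from the original project.

```python
import argparse
import json
from pathlib import Path


def load_settings(argv=None):
    """Merge settings from an optional JSON config file and command-line flags."""
    parser = argparse.ArgumentParser(description="News crawler settings (illustrative).")
    parser.add_argument("--config", type=Path, help="path to a JSON config file")
    parser.add_argument("--start-url", help="page the crawl starts from")
    parser.add_argument("--max-articles", type=int, help="upper bound on stored articles")
    args = parser.parse_args(argv)

    # Placeholder defaults, not values taken from the original project.
    settings = {"start_url": "https://example.com/news", "max_articles": 30}
    if args.config:
        settings.update(json.loads(args.config.read_text(encoding="utf-8")))
    # Command-line flags take precedence over the config file.
    if args.start_url is not None:
        settings["start_url"] = args.start_url
    if args.max_articles is not None:
        settings["max_articles"] = args.max_articles
    return settings
```

Letting flags override the file keeps one-off runs simple while the file holds the stable settings.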


Challenge - News Content Collect and Store


Create a solution that crawls articles from a news website, cleanses the response, stores it in BigQuery (bonus), then makes it available to search via an API.


Details


Write an application to crawl an online news website, e.g. The Guardian or BBC, using a crawler framework such as Scrapy. You can use a crawl framework of your choice and build the application in Python.
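The link-discovery step of such a crawl might look like the sketch below. It deliberately uses only the standard library instead of Scrapy, so it covers just the "find article links on a page" part; scheduling, politeness, and retries, which a framework handles, are left out. The `/news/` path prefix is an assumption for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def article_links(page_html, page_url, path_prefix="/news/"):
    """Return absolute same-site links under path_prefix, in page order, without duplicates."""
    collector = LinkCollector()
    collector.feed(page_html)
    site = urlparse(page_url).netloc
    seen, links = set(), []
    for href in collector.hrefs:
        # Resolve relative links and drop fragments so "#top" variants collapse.
        absolute = urljoin(page_url, href).split("#")[0]
        parts = urlparse(absolute)
        if parts.netloc == site and parts.path.startswith(path_prefix) and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links
```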


The application should cleanse the articles to obtain only information relevant to the news story, e.g. article text, author, headline, article URL, etc. Use a framework such as Readability to cleanse the page of superfluous content such as advertising and HTML.
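To make the cleansing step concrete, here is a minimal standard-library sketch that stands in for a library such as Readability. It assumes the headline sits in the first `<h1>`, the author in a `<meta name="author">` tag, and the body in `<p>` tags outside navigation and sidebar elements; real news pages vary, so treat these rules as assumptions rather than a general solution.

```python
from html.parser import HTMLParser


class ArticleExtractor(HTMLParser):
    """Keep the headline, author, and paragraph text; drop page chrome."""

    SKIPPED = {"script", "style", "nav", "aside", "footer", "form"}

    def __init__(self):
        super().__init__()
        self.headline = ""
        self.author = ""
        self.paragraphs = []
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._current = None   # "h1" or "p" while collecting its text
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "meta":
            meta = dict(attrs)
            if meta.get("name") == "author":
                self.author = meta.get("content") or ""
        elif tag in ("h1", "p") and self._skip_depth == 0:
            self._current, self._buffer = tag, []

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag == self._current:
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self._buffer).split())
            if tag == "h1" and not self.headline:
                self.headline = text
            elif tag == "p" and text:
                self.paragraphs.append(text)
            self._current = None

    def handle_data(self, data):
        if self._current and self._skip_depth == 0:
            self._buffer.append(data)


def clean_article(page_html, url):
    """Return a dict with the article URL, headline, author, and body text."""
    extractor = ArticleExtractor()
    extractor.feed(page_html)
    extractor.close()
    return {
        "url": url,
        "headline": extractor.headline,
        "author": extractor.author,
        "text": "\n\n".join(extractor.paragraphs),
    }
```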


Store the data in BigQuery, for subsequent search and retrieval. Ensure the URL of the article is included to enable comparison to the original.
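One way to sketch the storage step is shown below. The row fields and the `crawled_at` timestamp are assumptions, not the schema of the original project. The `client` argument is expected to behave like `google.cloud.bigquery.Client`, whose `insert_rows_json` method returns a list of per-row errors that is empty on success.

```python
from datetime import datetime, timezone


def to_row(article):
    """Map a cleaned article dict onto a flat, JSON-serializable row."""
    if not article.get("url"):
        raise ValueError("every stored article needs its source URL")
    return {
        "url": article["url"],
        "headline": article.get("headline", ""),
        "author": article.get("author", ""),
        "text": article.get("text", ""),
        # ISO 8601 string; this column is an illustrative addition.
        "crawled_at": datetime.now(timezone.utc).isoformat(),
    }


def store_articles(client, table_id, articles):
    """Insert one row per distinct URL and return the number of rows sent."""
    rows_by_url = {}
    for article in articles:
        row = to_row(article)
        # Keep the first occurrence of each URL within this batch.
        rows_by_url.setdefault(row["url"], row)
    rows = list(rows_by_url.values())
    if not rows:
        return 0
    errors = client.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError(f"BigQuery rejected some rows: {errors}")
    return len(rows)
```

With the `google-cloud-bigquery` package installed, `client` would typically be `bigquery.Client()` and `table_id` a string of the form `project.dataset.table`. Note that this only removes duplicates within one batch, not against rows already in the table.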


Bonus


Write an API that provides access to the content in the BigQuery database. The user should be able to search for articles by keyword.
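The keyword search behind such an API could be sketched as follows. The column names (`url`, `headline`, `author`, `text`) are assumptions carried over for illustration. The keyword is passed as a named query parameter rather than pasted into the SQL, which avoids injection through user input.

```python
def build_search_query(table_id, keyword):
    """Return (sql, params) for a case-insensitive keyword search."""
    keyword = keyword.strip()
    if not keyword:
        raise ValueError("keyword must not be empty")
    # A table name cannot be a query parameter, so table_id must come
    # from trusted configuration, never from user input.
    sql = (
        f"SELECT url, headline, author, text FROM `{table_id}` "
        "WHERE STRPOS(LOWER(headline), LOWER(@keyword)) > 0 "
        "OR STRPOS(LOWER(text), LOWER(@keyword)) > 0"
    )
    return sql, {"keyword": keyword}


def run_search(client, table_id, keyword):
    """Run the search with a google-cloud-bigquery client and return rows as dicts."""
    from google.cloud import bigquery  # requires the google-cloud-bigquery package

    sql, params = build_search_query(table_id, keyword)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter(name, "STRING", value)
            for name, value in params.items()
        ]
    )
    return [dict(row.items()) for row in client.query(sql, job_config=job_config).result()]
```

An API route for keyword search would then only need to call `run_search` and serialize the returned list.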



How to use?

  • Download the repo and run app.py.
  • Click the link that appears in the terminal output.
  • In the browser's address bar, append /Article/*KEYWORD* to the URL to search for a specific keyword, or /AllArticles to list every article in the database (around 30 articles).
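The same two routes can also be called from a script. The sketch below assumes the app is reachable at whatever base URL app.py prints in the terminal; the address in the usage comment is only a placeholder, and the response format is not specified here, so the body is returned as plain text.

```python
from urllib.parse import quote
from urllib.request import urlopen


def search_url(base_url, keyword=None):
    """Build the URL for a keyword search, or for all articles when keyword is None."""
    base = base_url.rstrip("/")
    if keyword is None:
        return base + "/AllArticles"
    # Percent-encode the keyword so spaces and slashes survive in the path.
    return base + "/Article/" + quote(keyword, safe="")


def fetch(url):
    """Return the response body as text (the app must be running)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


# Example, with a placeholder base URL:
# print(fetch(search_url("http://127.0.0.1:5000", "climate change")))
```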

Improvement opportunities


I didn't have enough time to learn how to:

  • Improve data access control in BigQuery.
  • Implement input via a configuration file or the command line.

I believe these would be the next improvements to make.


Author

Vini Antunes
