
shop's People

Contributors

abij, asnare, barend, basvdl, bbrumi, bjgbeelen, friso, hgrif, jczuurmond, krisgeus, nielszeilemaker, poppash, rvacaru

Forkers

alexrogalskiy

shop's Issues

Implement search functionality

  • Add search_handler to the webapp module (/search?q=TERM)
  • Add CatalogSearchResource to the service module (/api/catalog/search?q=TERM)
  • Update the web template as needed

Avro deserialization improvements

Originally commented by @barend on #23. New issue created to not lose this valuable information.

I'm not a huge fan of the class design with a `var` Schema. The underlying Java serialization kind of forces your hand here, and it's not easy to change. There are some unclear/undefined semantics here. For example, the `reader` field during deserialization:

1. the JVM creates a blank instance of this class, using a default (no-args) constructor that's not visible in this file (I think you inherit it from `Serializable` through some Scala magic, but I'm not even sure).
2. the field `reader` gets initialised to a new GenericDatumReader with a `null` schema. How does it respond to nulls? Can we be sure this behaviour doesn't change when the Avro library is updated?
3. the `readObject()` method runs and replaces the schema and reader variables.

Particularly bad client code could even find a way to invoke `writeObject` while `readObject` isn't done and attempt to toString a null value, but I suppose that's mostly hypothetical. You'd have to bend over backwards to force that error.

### Advice

- Java serialization is a lot trickier than it looks or was ever intended to be. I advise **always** adding a unit test for every serializable class that pushes an instance through an ObjectOutputStream into an ObjectInputStream and makes sure that it gets reconstructed correctly.
- As an alternative to writeObject/readObject you can use writeReplace/readResolve to implement a [Memento Pattern](https://www.oodesign.com/memento-pattern.html), giving you more control over the serialised form. It's still a bit intricate, but it will let you define the class using `val`s for the schema and the reader.
- The book ["Effective Java (third edition)"](https://www.goodreads.com/book/show/34927404-effective-java) has detailed advice on how to handle this. This book is pretty wonderful and much recommended. It's so good, I lent my copy to someone at Intergamma and never got it back 😞.
- Basically, there's a reason Java Serialization isn't very popular these days; avoid using it when you can (use JSON, or Protobuf, or anything else). Spark forces your hand here: it requires serialization in order to distribute executable program code to the worker nodes. Thankfully, most of the time the serializable classes define only a stateless function (such as the udf below), and all these weird state and order-of-initialization issues don't apply. This class happens to be an exception to that rule.
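
To make the first two points concrete, here is a minimal Java sketch of the writeReplace/readResolve approach (sometimes called a serialization proxy) together with a round-trip helper for unit tests. It is not the project's code: `SchemaBoundReader`, `Memento`, `RoundTrip` and the string-based `schemaJson` are hypothetical stand-ins for the real class, its Avro `Schema` and its `GenericDatumReader`, chosen so the example runs without Avro on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical stand-in for the deserializer class discussed in the issue.
// `schemaJson` plays the role of the Avro Schema, `reader` the role of the
// reader derived from it. Both fields are final (the Java equivalent of `val`).
final class SchemaBoundReader implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String schemaJson;
    // Derived, non-serializable state: rebuilt by the constructor, never written.
    private final transient Function<String, String> reader;

    SchemaBoundReader(String schemaJson) {
        this.schemaJson = Objects.requireNonNull(schemaJson, "schemaJson");
        this.reader = payload -> schemaJson + ":" + payload;
    }

    String schemaJson() { return schemaJson; }

    String read(String payload) { return reader.apply(payload); }

    // Serialization writes the memento instead of this object.
    private Object writeReplace() { return new Memento(schemaJson); }

    // Reject any stream that tries to deserialize this class directly.
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("Memento required");
    }

    // The serialized form: only the data needed to rebuild the real object.
    private static final class Memento implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String schemaJson;

        Memento(String schemaJson) { this.schemaJson = schemaJson; }

        // Deserialization goes through the normal constructor, so the
        // reader is never observed in a half-initialised state.
        private Object readResolve() { return new SchemaBoundReader(schemaJson); }
    }
}

// Test helper: push a value through ObjectOutputStream into ObjectInputStream.
final class RoundTrip {
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Serialization round trip failed", e);
        }
    }
}
```

A unit test would call `RoundTrip.roundTrip(new SchemaBoundReader(...))` and assert that the copy is a distinct instance with the same schema and a working reader. Because the object is rebuilt through its constructor, the null-schema reader from step 2 above should never exist in this design.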

Originally posted by @barend in #23

Strip missing images from the dataset

Some of the images we're using from Flickr have been removed by the original author. On load these are replaced with an "image has been removed" placeholder.

We should drop these from our dataset. To detect them:

  1. Request the image URL.
  2. If it has been removed, a 302 redirect is issued to https://s.yimg.com/pw/images/en-us/photo_unavailable.png
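
The two detection steps could be sketched in Java roughly as follows, using the JDK's `java.net.http` client (Java 11+) with redirect following disabled so the 302 stays visible. The class and method names (`RemovedImageDetector`, `isRemoved`, `checkUrl`, `newClient`) are hypothetical, and the sketch assumes the redirect target arrives in the `Location` header exactly as the placeholder URL quoted above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;

// Hypothetical helper; none of these names come from the project.
final class RemovedImageDetector {
    // Placeholder URL quoted in the issue.
    static final String PLACEHOLDER_URL =
            "https://s.yimg.com/pw/images/en-us/photo_unavailable.png";

    // Pure decision: a 302 whose Location header is exactly the placeholder.
    static boolean isRemoved(int statusCode, Optional<String> location) {
        return statusCode == 302 && location.filter(PLACEHOLDER_URL::equals).isPresent();
    }

    // The client must not follow redirects, otherwise the 302 is hidden.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER)
                .build();
    }

    // Step 1: request the image URL. Step 2: inspect status and Location.
    static boolean checkUrl(HttpClient client, URI imageUrl)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(imageUrl).GET().build();
        HttpResponse<Void> response =
                client.send(request, HttpResponse.BodyHandlers.discarding());
        return isRemoved(response.statusCode(),
                response.headers().firstValue("Location"));
    }
}
```

Keeping the status/header decision in `isRemoved` separate from the network call lets it be unit-tested without hitting Flickr. A cleanup job would call `checkUrl(RemovedImageDetector.newClient(), uri)` for each item and drop those that return `true`.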

Implement search suggestions functionality

The search box should have a suggestions dropdown. A partial implementation exists in CompletionResource.java.

  • Create the completion index in Elasticsearch as items are loaded
  • Finish the CompletionResource and enable it in Main.java
  • Proxy the completion service in webapp
  • Add JavaScript to drive a completion system in the web page
  • Feed item popularity into ES to update suggestion weight

Add test cases for the shop-api (service)

While investigating why the shop-api (service) rejected some JSON, it would have been useful to test the generated JSON against the Item class in CatalogItemResource.java to verify whether the shop-api will accept it.

Ingest images from Flickr using Airflow with backfill

Currently, download-category.py downloads the files locally, and put-catagories.py uploads them to Elasticsearch.

These can be combined into an Airflow DAG in which each run is responsible for a certain time period, for example the images from 20 September for a given category.

The DAG could then be backfilled over multiple days in the past.
