Giter VIP home page Giter VIP logo

flash's Introduction

flash

Flash is highly optimised typeahead(search recommendation) server. We are building it with an aim to learn and teach various tradeoffs/practices. While building it, we will use different data structures/storage devices/protocols/deployment strategies and compare them from different aspects.

To build server:

git clone https://github.com/Shubham8287/flash
cd flash/server/src
go run server.go

To build server using Dockerfile:

  • docker build -t <image-name> .
  • docker run -dp <localport:exposedport> <image-name> #flag 'd' detached mode and flag 'p' is for port Try hitting "localhost:", you will recieve "Hello" msg.

Check if server is up:

curl localhost:8080/isAlive You should get response - Boss is always fine

To run actual Api hit - localhost:8080/find?prefix=yo you will get a response of top matching string starting with prefix "yo"

{"prefix":"yo","matches":["yo","yob","yobbo","yobboes","yobbos","yobi","yobs","yocco","yochel","yock","yocked","yockel","yockernut","yocking","yocks","yod","yode","yodel","yodeled","yodeler"]}

To profile:

for healthcheck route in localhost, I am getting response ~1-2ms.

Handy commands

  • go run ./... - to run server
  • lsof -i:[PORT] - to know PID of proccess running on PORT.
  • curl -w "@curl-format.txt" -s [URL] - to know response time of your request with curl
  • sudo pmap [PID] - check total memory used by PID
  • ab -n 100 -c 1 -k URL - use apache bench to make n request with c concurrency. -k is to keep tcp connection open.

Design perspective

  1. Expection with the server?

    keeping in mind the fact that we are designing for Humans. Average time at which humans respond is 200ms( interval b/w consecutive key stokes), So we need to keep this in mind and keep our latency < 200ms always and..... Amazon search recommendation api calls responds in around ~100ms.

  2. Where are we storing our dataset (strings which server will recommend)?

    Lookup time for Memory, SSD, and HDD is 100ns, 150us, 10 ms respective. It makes sense for us to keep all data in memory itself for low latency. We can do sharding on range bases, if data grows more than the size of 1 node. further we can use disk to take backup by taking periodic snapshot of data, for fault tolerance on poweroff. Later, we can also try to use SSD for storage and see how trade off b/w cost vs latency works.

  3. Which Data structure to choose for hold strings in memory?

    Our usecase is to fetch popular strings with same prefix. Trie looks intuitive here, but I feel we could perform better with hashmap as with hashmap it will be easy for us to shard, Maps are native for memory, and for every prefix as key, we can use a priority queue as value which contains all popular strings with same prefix as key. Doing this with trie will be little harder. But it's going to be interesting to compare both DS. Making an algorithm to scale them.

  4. Are we taking care of protocol too?

    yes, Idea is to make api call to server with every key stroke in search box and return all strings with same prefix as response, means server will be in heavy read load. We can use websockets to faster resonse from server and not setting up TCP connection for every keystroke but I feel using ws is going to be overkill as we don't need duplex connection. We can start with HTTP and do some optimisations like adding "Connection: keep-alive" header to tell server to not close connection for subsequent connections.

  5. Deployment Strategies.

    As latency is most important factor, We can try to deploy in various regions and using a load balancer to route to closes geolocation server. We need low latency, with high avaibility and less or eventual consistency that means it's fine if users in different region see different suggestions for their prefix. I feel deployment is dominant part in keeping latency low, We can do benchmarking after adding every new component(load balancer, sharding) to understand their affects.

  6. Why Go?

    Keeping perfomance in mind, I chose GO over any iterperative language. Rust might be even better choice, but as I don't have enough knowledge with Rust rigth now so may be We an migrate it later.

  7. Lot to add.

flash's People

Contributors

himanshuxmehra avatar riteshksingh1709 avatar shubham8287 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.