
example-app-store's Introduction

AI-powered app store search

This is a simple example to show how to build an AI-powered search engine for an app store using the Jina framework. It indexes and searches a subset of the 17K Mobile Strategy Games dataset from Kaggle.

Instructions

Prerequisites

  • You have a Mac or Linux system
  • You have Python 3.7 or later installed, and have some basic Python knowledge
  • You understand basic git and terminal usage

Clone this repo

git clone git@github.com:alexcg1/jina-app-store-example.git
cd jina-app-store-example

Create a virtual environment

We wouldn't want our project clashing with our system libraries, now would we?

virtualenv env --python=python3.8 # Python versions >= 3.7 work fine
source env/bin/activate

Install everything

Make sure you're in your virtual environment first!

pip install -r requirements.txt

Increase your swap space (optional)

We're dealing with big language models and quite long text passages. Macs allocate swap space dynamically, but on Manjaro Linux I had to create and activate a swapfile manually. Otherwise my computer, with 16 GB of RAM, would just freeze up while indexing.

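# Don't bother if you're on a Mac or have loads of memory
# Note: swapfiles don't work on tmpfs, so if /tmp is RAM-backed, create the file elsewhere
cd /tmp
sudo dd if=/dev/zero of=swapfile bs=1M count=10240 status=progress  # 10 GiB file of zeros
sudo chmod 600 swapfile  # swap must not be readable by other users
sudo mkswap swapfile     # format the file as swap space
sudo swapon swapfile     # activate it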

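Swap activated this way doesn't persist across reboots, so you'll need to repeat this after each one. Alternatively, add the swapfile to /etc/fstab so it's activated at startup (keep the file somewhere other than /tmp, which is usually cleared on boot).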

Download dataset

python get_data.py

This command creates a directory called data and downloads the 17K Mobile Strategy Games dataset into it. It then shuffles the dataset so that we get a diverse range of apps to search through.

💡 Tip: We shuffle using a fixed random seed of 42, so every shuffle will be the same. Want a different shuffle? Change the seed in config.py.
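To illustrate why a fixed seed makes every shuffle identical, here is a minimal standard-library sketch of the shuffle step. It is not the actual get_data.py (which may well use pandas); the function name shuffle_csv and the SEED constant are hypothetical, and only the seed value and file names come from this README.

```python
import csv
import random

SEED = 42  # hypothetical constant; the README says the seed of 42 lives in config.py


def shuffle_csv(src_path, dst_path, seed=SEED):
    """Copy a CSV file, keeping the header row first and shuffling the data rows.

    Returns the number of data rows written.
    """
    # newline="" lets the csv module handle quoted fields that contain line breaks
    with open(src_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    # A dedicated Random instance seeded with a constant yields the same order on every run
    random.Random(seed).shuffle(rows)

    with open(dst_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

For example, shuffle_csv("data/appstore_games.csv", "data/appstore_games_shuffled.csv") would produce byte-identical output every time it runs. Using a private random.Random(seed) instead of calling random.seed() keeps the shuffle reproducible without affecting any other code that uses the global random generator.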

Index your data

python app.py -t index -n 1000

💡 Tip: Use -n to specify the number of apps to index.
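As a rough sketch of how the -t and -n flags could be handled, the code below parses them with argparse and reads only the first n rows of the shuffled CSV. This is an assumption about app.py, not its actual code: the long option names --task and --num_docs and the helpers parse_args and iter_apps are hypothetical. The task names and the 1,000-app default come from this README.

```python
import argparse
import csv
import itertools


def parse_args(argv=None):
    """Parse the -t/-n flags used in this README (long option names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Index or query the app store example")
    parser.add_argument(
        "-t", "--task", required=True, choices=["index", "query_restful"],
        help="index: build the index; query_restful: serve queries over REST",
    )
    parser.add_argument(
        "-n", "--num_docs", type=int, default=1000,
        help="number of apps to index (default: 1000)",
    )
    return parser.parse_args(argv)


def iter_apps(csv_path, num_docs):
    """Yield at most num_docs rows as dicts keyed by the CSV header."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        # The file is pre-shuffled, so the first num_docs rows are already a varied sample
        yield from itertools.islice(csv.DictReader(f), num_docs)
```

Because the dataset was shuffled up front, taking the first n rows with itertools.islice is enough to get a varied subset, and it avoids loading the whole file into memory.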

Search your data

Run app.py in query mode to accept search queries via a REST gateway:

python app.py -t query_restful

Start the front end

In another terminal:

git clone https://github.com/alexcg1/jina-app-store-frontend.git
cd jina-app-store-frontend
virtualenv env
source env/bin/activate
pip install -r requirements.txt
streamlit run app.py

Then open http://localhost:8501 in your browser.

Search from the terminal

curl --request POST -d '{"top_k":10,"mode":"search","data":["hello world"]}' -H 'Content-Type: application/json' 'http://0.0.0.0:45678/search'

Replace hello world with your own query.
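If you'd rather query from Python than curl, the sketch below sends the same POST request as the curl command above using only the standard library. The helper names build_search_request and search are hypothetical; the URL, port 45678, and JSON payload are copied from the curl example, and search() only works while python app.py -t query_restful is running.

```python
import json
import urllib.request


def build_search_request(query, top_k=10, host="0.0.0.0", port=45678):
    """Build the same POST request as the curl example (same URL, headers and payload)."""
    payload = {"top_k": top_k, "mode": "search", "data": [query]}
    return urllib.request.Request(
        f"http://{host}:{port}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query, top_k=10, timeout=30):
    """Send the request and decode the JSON response (requires the REST gateway to be up)."""
    with urllib.request.urlopen(build_search_request(query, top_k), timeout=timeout) as resp:
        return json.load(resp)
```

Splitting request construction from sending makes it easy to check the payload without a running server; search("hello world") then returns the same JSON you'd see from curl.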

The results should be a big chunk of JSON containing the matching apps, or at least something close to matching. By default we only index 1,000 apps from a dataset that's a few years old (since this is just an example), so don't be surprised if a search for a specific title doesn't turn it up.

💡 Tip: For cleaner formatting, pipe the output of the above command into jq by adding | jq to the end of the command.
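To pull just the matching apps out of that JSON, here is a small helper sketch. It assumes, as is typical for Jina responses, that each query's results sit in a list under a "matches" key; since the exact nesting differs between Jina versions, the hypothetical collect_matches function searches the whole response recursively instead of hard-coding a path.

```python
def collect_matches(response):
    """Recursively gather every dict found in a list stored under a "matches" key.

    Matches are not searched further, so nested matches-of-matches are ignored.
    """
    found = []
    if isinstance(response, dict):
        for key, value in response.items():
            if key == "matches" and isinstance(value, list):
                found.extend(m for m in value if isinstance(m, dict))
            else:
                found.extend(collect_matches(value))
    elif isinstance(response, list):
        for item in response:
            found.extend(collect_matches(item))
    return found
```

You could feed it the parsed output of the curl command (for example via json.load(sys.stdin) in a small script) and print whichever fields of each match you care about. If you don't have jq installed, python -m json.tool also pretty-prints JSON read from stdin.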

FAQ

Why this dataset?

It contains a lot of metadata, including (working) links to icons. I want to build a nice front end to show off the search experience, so graphical assets are vital. It also has ratings, descriptions, the works.

The download/purchase buttons don't do anything

This is just a demo search engine. It has no functionality beyond that.

How can I change basic settings?

Edit backend/config.py or frontend/config.py.
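For orientation, a backend config.py covering the settings this README mentions might look like the sketch below. Every variable name here is hypothetical; only the values (the seed of 42, the 1,000-app default, port 45678, top_k of 10, and the data and workspace paths) come from this README, so check the real file before relying on any name.

```python
# Hypothetical sketch of backend/config.py: names are illustrative, values come from this README
import os

# Dataset locations created by get_data.py
DATA_DIR = "data"
RAW_CSV = os.path.join(DATA_DIR, "appstore_games.csv")
SHUFFLED_CSV = os.path.join(DATA_DIR, "appstore_games_shuffled.csv")

RANDOM_SEED = 42             # fixed seed, so every shuffle is identical
NUM_DOCS = 1000              # default number of apps to index (override with -n)
WORKSPACE_DIR = "workspace"  # where the indexed data is stored

REST_PORT = 45678            # port the query_restful REST gateway listens on
TOP_K = 10                   # number of results returned per query
```

Keeping these values in one module means get_data.py, app.py, and any helper scripts agree on paths and ports without duplicating literals.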

What are all these files?

After cloning, downloading the dataset and indexing data, you'll see a lot of files. We're mainly concerned with the backend folder, since that's where all the Jina magic happens. Don't worry if you don't see all of these right away: some only appear after downloading the dataset or indexing.

Filename                            What is it?
📂 data                             Folder for the downloaded dataset
-- 📄 appstore_games.csv            Original dataset
-- 📄 appstore_games_shuffled.csv   Shuffled dataset that we'll index
📂 backend                          Backend files
-- 📄 config.py                     Basic backend config settings
-- 📄 app.py                        Main backend program
-- 📄 helper.py                     Backend helper functions
📂 frontend                         Frontend files
-- 📄 config.py                     Basic frontend config settings
-- 📄 frontend.py                   Main frontend program
-- 📄 helper.py                     Frontend helper functions
📂 workspace                        Folder for the indexed data
📄 get_data.py                      Script to download and shuffle the dataset

You may also see several __pycache__ folders containing .pyc files. Don't worry about these: they're Python's cached bytecode and are regenerated automatically.

example-app-store's People

Contributors

alexcg1, 0x000n3x4n, hanxiao
