keyword-price-api

License: MIT Code style: black

A keyword analysis tool/API built with Flask. It extracts keywords from e-commerce items in different categories and returns keyword and price-distribution information, refined by the user's feedback input.

Base Path: http://175.106.99.99:16758/

| method | url | input format | output format | desc |
|---|---|---|---|---|
| GET | /api/v1/results/keyword | n/a | JSON | returns keywords' frequencies and relevant productIds for each category |
| GET | /api/v1/results/price-spread | n/a | JSON | returns a dictionary of img links to histograms depicting the items' price distribution for each category |
| POST | /api/v1/feedback | JSON | str | reads and applies the user's feedback for future GET requests |

Typical Workflow

beforehand: upload target database to MongoDB

for i = 1 to repeat do:

{

  1. GET-KEYWORDS

    a. i=1: returns keyword information for each category in its initial state

    b. i>=2: returns feedback-applied keyword information and the most recent feedback file, for use in step 3 (POST-FEEDBACK)

  2. GET-PRICESPREAD: (optional)

    a. i=1: returns a dictionary of category name and img link of a histogram depicting the items' price distribution for each category

    b. i>=2: returns histogram img links with the feedback applied

  3. POST-FEEDBACK

    a. i=1: posts feedback information based on information from GET-KEYWORDS and GET-PRICESPREAD

    b. i>=2: posts new feedback by APPENDING new options to the most recent feedback received from GET-KEYWORDS

    Note: Refer to Important Note

}

END WHEN: satisfactory information has been retrieved
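One pass of the loop above could be driven by a small client. The following is a minimal sketch, not the project's own client: `run_iteration`, `http_get`, and `http_post` are hypothetical names, and the HTTP transport is passed in as plain callables so that no particular HTTP library is assumed.

```python
import json
from typing import Callable, List, Optional

BASE = "http://175.106.99.99:16758"  # base path given in this README


def run_iteration(
    http_get: Callable[[str], str],
    http_post: Callable[[str, str], str],
    new_feedback: Optional[List[dict]] = None,
) -> dict:
    """Run one pass: GET-KEYWORDS, then POST-FEEDBACK if feedback is given.

    http_get(url) and http_post(url, body) are hypothetical transport
    callables that return the response body as text.
    """
    keywords = json.loads(http_get(BASE + "/api/v1/results/keyword"))
    if new_feedback is not None:
        http_post(BASE + "/api/v1/feedback", json.dumps(new_feedback))
    return keywords
```

Calling `run_iteration` repeatedly, each time building `new_feedback` from the `previous-feedback` field of the last response, mirrors the i=1, i>=2 steps listed above.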

API calls

[GET] get-keywords

URL: /api/v1/results/keyword

returns keywords' frequencies and productIds that included the keywords for each category

Responses

Successful Response

Code: 200
Content:

  • result: array of category-objects

    • category-objects

      | attributes | type | desc |
      |---|---|---|
      | categ | str | |
      | keywords | list of keyword-objects | see below |

    • keyword-objects

      | attributes | type | desc |
      |---|---|---|
      | rank | int | rank based on appearance frequency |
      | keyword | str | keyword name |
      | appearance | int | appearance frequency (*note: the appearance frequency and the number of entries in product_list can differ) |
      | product_list | list of ints | productIds that had the keyword in the title |

  • previous-feedback: the feedback JSON that was posted most recently; see read-feedback for the relevant data structure

e.g.

```
{
    "result": [
        {
            "categ": "portable_fan",
            "keywords": [
                {
                    "rank": 1,
                    "keyword": "fan",
                    "appearance": 5,
                    "product_list": [
                        "82497194884"
                    ]
                },
                {
                    "rank": 2,
                    "keyword": "paperclip",
                    "appearance": 1,
                    "product_list": [
                        "82497194884"
                    ]
                }
            ]
        },
        {
            "categ": "dustproof_mask",
            "keywords": [
                {
                    "rank": 1,
                    "keyword": "mask",
                    "appearance": 32,
                    "product_list": [
                        "27224642701",
                        "26883864380",
                        "25295068476"
                    ]
                }
            ]
        }
    ],
    "previous-feedback": [
        {
            "categ": "golfbag_set",
            "lprice": 60000,
            "hprice": 150000,
            "sub-cats": [
                "golfpouch",
                "bostonbag"
            ],
            "ignore": [
                "golfb",
                "bKakaofriends",
                "special",
                "basic"
            ],
            "effective": [
                "official-website",
                "high-quality",
                "PLACEHOLDER"
            ]
        }
    ]
}
```
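A response of this shape can be reduced to a per-category summary on the client side. The helper below is a hypothetical sketch (`top_keywords` is not part of the API) and assumes only the `result`, `categ`, `keywords`, `rank`, and `keyword` fields documented above.

```python
from typing import Dict, List


def top_keywords(response: dict, n: int = 3) -> Dict[str, List[str]]:
    """Map each category to its n best-ranked keywords."""
    summary: Dict[str, List[str]] = {}
    for category in response.get("result", []):
        # Sort by the documented rank field rather than relying on array order.
        ranked = sorted(category["keywords"], key=lambda kw: kw["rank"])
        summary[category["categ"]] = [kw["keyword"] for kw in ranked[:n]]
    return summary
```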

Sample Call:

curl --location --request GET 'http://175.106.99.99:16758/api/v1/results/keyword'

Code Workflow

  1. Dataset Prep & Cleaning

    1. Generate the following dataframe obj from target DB in mongoDB dataprice
    2. current_cat acts as a pointer showing current item's lowest category level (most specific category level)
      1. e.g. iPhones is the lowest category in [Electronics -> Smartphones -> iPhones]
    3. the title column is saved after basic string cleaning
    e.g.
    (AMAZON)(NEXT-M)<b>Kakaofriends golf</b> mask 세탁 30회 다회용mask (6395452)
                                    ⬇️
    AMAZON NEXT-M Kakaofriends golf mask 세탁 30회 다회용mask
    
    4. when sub-cats is given in feedback: merges the lower categories into the current category
    e.g.
    "categ": "golfbag_set",
    "sub-cats": [
        "golfpouch",
        "bostonbag"
    ],
    

    In this case,

    golfbag_set + bostonbag + golfpouch => merged into golfbag_set
    `current_cat` column value is changed to golfbag_set
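The title cleaning and sub-category merge described in this step can be sketched with the standard library alone. This is an illustration under assumptions, not the project's code: the regular expressions are inferred from the single title example above, rows are plain dicts standing in for DataFrame rows, and `clean_title` and `merge_sub_cats` are hypothetical names.

```python
import re
from typing import Dict, List


def clean_title(raw: str) -> str:
    """Basic string cleaning, inferred from the example title above."""
    text = re.sub(r"<[^>]+>", " ", raw)       # drop HTML tags such as <b>
    text = re.sub(r"\(\d+\)", " ", text)      # drop numeric codes like (6395452)
    text = re.sub(r"[()]", " ", text)         # keep bracketed words, drop the brackets
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace


def merge_sub_cats(rows: List[dict], feedback: List[dict]) -> List[dict]:
    """Point current_cat of every sub-category row at its parent category."""
    parent_of: Dict[str, str] = {
        sub: fb["categ"] for fb in feedback for sub in fb.get("sub-cats", [])
    }
    return [
        {**row, "current_cat": parent_of.get(row["current_cat"], row["current_cat"])}
        for row in rows
    ]
```

With the feedback shown above, rows whose `current_cat` is `golfpouch` or `bostonbag` come back with `current_cat` set to `golfbag_set`, and all other rows are unchanged.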
    
  2. Keyword Extraction 1 [when no feedback is given]

    1. save all keywords that appeared in each category's item to dict{Category: Counter(keywords)}

      a. e.g. {Category1 : [('handbags',23), ('luxurious',12), ('pouches',5)]}
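A minimal sketch of this counting step, assuming rows are dicts with `current_cat` and an already cleaned `title`, and assuming keywords are obtained by splitting the title on whitespace (the source does not state the tokenizer); `count_keywords` is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def count_keywords(rows: List[dict]) -> Dict[str, Counter]:
    """Build {category: Counter(keywords)} from cleaned item titles."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        # Whitespace tokenization is an assumption made for this sketch.
        counts[row["current_cat"]].update(row["title"].split())
    return counts
```

`counts["some_category"].most_common()` then yields (keyword, frequency) pairs in descending order, the same shape as the example above.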

  3. Keyword Extraction 2 [when feedback is given]

    1. Iterate through all keywords that appeared in each category's item

      a. if price is inside the range [lprice, hprice] => save keyword to gen_countobj

      b. if price is outside the range => save keyword to sus_countobj

    2. Update Counter object

      a. Remove gen_countobj keywords from sus_countobj

      b. Iterate over the remaining keywords again to apply ignore and effective.
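The feedback-aware extraction can be sketched as follows. This is one reading of the steps above, not the project's implementation: rows are dicts with `current_cat`, `price`, and `title`, keywords come from whitespace splitting, and applying `effective` after the removals (so those keywords always appear, with frequency 0 if unseen) is an assumption. `extract_with_feedback` is a hypothetical name.

```python
from collections import Counter
from typing import List


def extract_with_feedback(rows: List[dict], fb: dict) -> Counter:
    """Return the keyword counts left for one category after feedback."""
    gen_countobj, sus_countobj = Counter(), Counter()
    for row in rows:
        if row["current_cat"] != fb["categ"]:
            continue
        in_range = fb["lprice"] <= row["price"] <= fb["hprice"]
        target = gen_countobj if in_range else sus_countobj
        target.update(row["title"].split())

    for keyword in gen_countobj:           # keywords seen on in-range items
        sus_countobj.pop(keyword, None)
    for keyword in fb.get("ignore", []):   # keywords the user asked to drop
        sus_countobj.pop(keyword, None)
    for keyword in fb.get("effective", []):
        sus_countobj.setdefault(keyword, 0)  # shown even if it never appeared
    return sus_countobj
```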

  4. JSON Generation

  • generate a dumpable final JSON file based on dict from step 2
    Category A
        {
            "rank": (int),
            "keyword": (str),
            "appearance": (int),
            "product_list": list(int)
        }
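A sketch of how one category's counts might be turned into the keyword objects shown above. The `products` mapping (keyword to the set of productIds whose title contained it) is a hypothetical input introduced for this example, as is the name `to_category_obj`.

```python
import json
from collections import Counter
from typing import Dict, Set


def to_category_obj(categ: str, counts: Counter, products: Dict[str, Set[str]]) -> dict:
    """Build a JSON-dumpable category object ranked by appearance frequency."""
    keywords = [
        {
            "rank": rank,
            "keyword": keyword,
            "appearance": freq,
            "product_list": sorted(products.get(keyword, set())),
        }
        for rank, (keyword, freq) in enumerate(counts.most_common(), start=1)
    ]
    return {"categ": categ, "keywords": keywords}
```

The returned dict contains only lists, strings, and ints, so it can be passed to `json.dumps` directly.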
    

[GET] get-pricespread


URL: /api/v1/results/price-spread

returns a dictionary of img link of histogram depicting item's price distribution for each category

Responses

Successful Response

Code: 200
Content:

| KEY | VALUE |
|---|---|
| category (str) | img_link (str) |

Example

{
    "portable_fan": "https://fkz-web-images.cdn.ntruss.com/price-spread/portable_fan.png",
    "mask": "https://fkz-web-images.cdn.ntruss.com/price-spread/mask.png",
    "laptop": "https://fkz-web-images.cdn.ntruss.com/price-spread/laptop.png",
    "showergown": "https://fkz-web-images.cdn.ntruss.com/price-spread/showergown.png",
    "waterbottle": "https://fkz-web-images.cdn.ntruss.com/price-spread/waterbottle.png",
    "large_umbrella": "https://fkz-web-images.cdn.ntruss.com/price-spread/large_umbrella.png",
    "macbook": "https://fkz-web-images.cdn.ntruss.com/price-spread/macbook.png"
}

Sample Call:

curl --location --request GET 'http://175.106.99.99:16758/api/v1/results/price-spread'

Code Workflow

for categ in all_categories_as_list:

  • generate a fresh DataFrame of the rows in the DB where lowest_category==categ
  • generate a seaborn.histplot (histogram) from the DataFrame's price column
  • upload the generated histogram to object storage using boto3
  • save the link in dict{categ: img_link}

end for

return dict
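The loop above can be sketched without committing to plotting or storage details. In this illustration the histogram rendering and upload are hidden behind a hypothetical `plot_and_upload` callable; in the service described here that callable would correspond to the `seaborn.histplot` and boto3 steps. Rows are plain dicts standing in for DataFrame rows.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def build_price_spread(
    rows: List[dict],
    plot_and_upload: Callable[[str, List[int]], str],
) -> Dict[str, str]:
    """Return {categ: img_link} with one histogram link per category.

    plot_and_upload(categ, prices) is a hypothetical callable that renders
    a histogram of the prices, uploads it, and returns the image link.
    """
    prices: Dict[str, List[int]] = defaultdict(list)
    for row in rows:
        prices[row["lowest_category"]].append(row["price"])
    return {categ: plot_and_upload(categ, values) for categ, values in prices.items()}
```

Keeping the rendering and upload behind one callable makes the grouping logic easy to check in isolation.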

[POST] read-feedback


URL: /api/v1/feedback

reads and applies user's feedback for future GET requests

Input Data Params

  • array of keyword-feedback objects:

    | attributes | type | desc | uses |
    |---|---|---|---|
    | categ | str | category name | |
    | lprice | int | min value in the genuine product's price range | keywords of items priced inside the genuine range are removed from the final results |
    | hprice | int | max value in the genuine product's price range | same as lprice |
    | sub-cats | listOfStrings | lower category names that will be merged into the current category | items in the lower categories will all belong under the current category |
    | ignore | listOfStrings | keywords to ignore | removed from the final return |
    | effective | listOfStrings | keywords that seem promising/critical | displayed even if app.freq is 0 |

e.g.

[
    {
        "categ": "golfbag_set",
        "lprice": 60000,
        "hprice": 150000,
        "sub-cats": [
            "golfpouch",
            "bostonbag"
        ],
        "ignore": [
            "golfb",
            "bKakaofriends",
            "special",
            "basic"
        ],
        "effective": [
            "official-website",
            "high-quality",
            "PLACEHOLDER"
        ]
    },
    {
        "categ": "hats",
        "lprice": 35000,
        "hprice": 55000,
        "sub-cats": [],    # these can be left as empty lists
        "ignore": [],
        "effective": []
    }
]
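Before posting, a payload can be checked against the attribute table above. The validator below is a hypothetical client-side sketch. The source does not say which fields the server requires, so treating `categ`, `lprice`, and `hprice` as mandatory and the three list fields as optional is an assumption.

```python
from typing import Any, List

LIST_FIELDS = ("sub-cats", "ignore", "effective")


def validate_feedback(payload: Any) -> List[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    if not isinstance(payload, list):
        return ["payload must be an array of feedback objects"]
    errors: List[str] = []
    for i, fb in enumerate(payload):
        if not isinstance(fb, dict):
            errors.append(f"[{i}] must be an object")
            continue
        if not isinstance(fb.get("categ"), str):
            errors.append(f"[{i}] categ must be a string")
        low, high = fb.get("lprice"), fb.get("hprice")
        if not (isinstance(low, int) and isinstance(high, int)):
            errors.append(f"[{i}] lprice and hprice must be ints")
        elif low > high:
            errors.append(f"[{i}] lprice must not exceed hprice")
        for key in LIST_FIELDS:
            value = fb.get(key, [])
            if not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
                errors.append(f"[{i}] {key} must be a list of strings")
    return errors
```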

Responses

Successful Response

Code: 200
Content: received feedback

Sample Call:

curl --location --request POST 'http://175.106.99.99:16758/api/v1/feedback' \
--data-raw '[
    {
        "categ": "hats",
        "lprice": 35000,
        "hprice": 55000,
        "sub-cats": [],
        "ignore": [],
        "effective": []
    }
]'
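The same call can be prepared from Python with the standard library. This sketch only builds the request object. The `Content-Type` header is an assumption (the curl call above does not send one, and the source does not say whether the server needs it), and `build_feedback_request` is a hypothetical name.

```python
import json
import urllib.request
from typing import List


def build_feedback_request(
    feedback: List[dict], base: str = "http://175.106.99.99:16758"
) -> urllib.request.Request:
    """Build (but do not send) the POST request for /api/v1/feedback."""
    body = json.dumps(feedback).encode("utf-8")
    return urllib.request.Request(
        base + "/api/v1/feedback",
        data=body,
        headers={"Content-Type": "application/json"},  # assumption, see above
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Serializing with `json.dumps` also avoids hand-written JSON mistakes such as trailing commas.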

Important Note

For any iteration (i>=2) mentioned in Typical Workflow, you must include previous-feedback from [GET] get-keywords so that all the previous options are applied in the new iteration.

Therefore, when writing new feedback files, we suggest you copy previous-feedback's content and edit it accordingly.

e.g.

"previous-feedback": 
[
    {
        "categ": "golfbag_set",
        "lprice": 60000,
        "hprice": 150000,
        "sub-cats": [
            "golfpouch",
            "bostonbag"
        ],
        "ignore": [
            "golfb",
            "bKakaofriends",
            "special",
            "basic"
        ],
        "effective": [
            "official-website",
            "high-quality"
        ]
    }
]

updated feedback.json

[
    {
        "categ": "golfbag_set",
        "lprice": 60000,
        "hprice": 150000,
        "sub-cats": [],             # edited
        "ignore": [
            "golfb",
            "bKakaofriends",
            "special",
            "basic",
            "discount",             # added
            "golf"                  # added
        ],
        "effective": [
            "official-website",
            "high-quality",
            "made_in_china",        # added
            "imitation"             # added
        ]
    }
]
  • "sub-cats": options removed, because the lower categories were already merged in the previous iteration
  • "ignore": added "discount" and "golf"
  • "effective": added "made_in_china" and "imitation"
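The copy-and-edit step can also be done programmatically. The helper below is a hypothetical sketch (`extend_feedback` is not part of the API): it deep-copies `previous-feedback`, appends new options for one category, and optionally clears `sub-cats`.

```python
import copy
from typing import Iterable, List


def extend_feedback(
    previous: List[dict],
    categ: str,
    *,
    ignore: Iterable[str] = (),
    effective: Iterable[str] = (),
    clear_sub_cats: bool = False,
) -> List[dict]:
    """Return a new feedback list with options appended for one category."""
    updated = copy.deepcopy(previous)  # leave the GET response untouched
    for fb in updated:
        if fb["categ"] != categ:
            continue
        for key, extra in (("ignore", ignore), ("effective", effective)):
            fb[key] += [kw for kw in extra if kw not in fb[key]]
        if clear_sub_cats:
            fb["sub-cats"] = []
    return updated
```

Calling it with `ignore=["discount", "golf"]`, `effective=["made_in_china", "imitation"]`, and `clear_sub_cats=True` on the previous-feedback above reproduces the updated feedback.json.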

Code Workflow

- read the POSTed json file
- save as `./cache/feedback.json` & `./cache/feedback.pkl` for future accesses
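A minimal sketch of the caching step, assuming the POSTed body has already been parsed into a Python list. The file names come from the workflow above; `save_feedback`, `load_feedback`, and the `cache_dir` parameter are hypothetical.

```python
import json
import pickle
from pathlib import Path
from typing import List


def save_feedback(feedback: List[dict], cache_dir: str = "./cache") -> None:
    """Write the feedback as both feedback.json and feedback.pkl."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    (cache / "feedback.json").write_text(
        json.dumps(feedback, ensure_ascii=False, indent=4), encoding="utf-8"
    )
    with open(cache / "feedback.pkl", "wb") as fh:
        pickle.dump(feedback, fh)


def load_feedback(cache_dir: str = "./cache") -> List[dict]:
    """Read the cached feedback back for a later GET request."""
    with open(Path(cache_dir) / "feedback.pkl", "rb") as fh:
        return pickle.load(fh)
```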
