
api-test's Introduction

api-test

This repository contains all the tools for testing and stressing the APIs of the AI Lab projects.

Usage

For more detailed information about the usage of the Finesse Locust tool, please refer to the FINESSE_USAGE.md file in the finesse repository.

Tools available

Several tools can integrate with Python or a script to calculate API statistics accurately. The current need is to test the APIs using JSON files containing questions and their page of origin in order to establish an accuracy score, as well as to measure request times and generate a statistical summary of all this data. We also plan to test the APIs under different conditions in the near future, for example with multiple simultaneous users or under special load conditions. That is why it is worth evaluating whether candidate tools are scalable and integrate well with Python.

Decision

We've opted for Locust as our tool of choice. Locust is an open-source load testing framework written in Python, designed to simulate numerous concurrent users sending requests to a given system, and it provides detailed insights into the system's performance and scalability. With its built-in UI and straightforward integration with Python scripts, it is user-friendly and accessible. It is also popular and widely used, with support from major tech companies such as Microsoft and Google.

However, Locust's primary purpose is to conduct ongoing tests involving multiple machines and endpoints simultaneously. Our specific requirement involves running the accuracy test just once. Nevertheless, there's potential for future integration, especially for stress and load testing scenarios that involve repeated searches.
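
As an illustration, a minimal Locust user for this kind of test could look like the sketch below (the /search endpoint, the payload, and the host are assumptions, not the actual Finesse API):

    from locust import HttpUser, between, task

    class FinesseUser(HttpUser):
        # Wait 1-3 seconds between simulated requests
        wait_time = between(1, 3)

        @task
        def ask_question(self):
            # Hypothetical endpoint and payload; the real Finesse API may differ
            self.client.post("/search", json={"query": "What are the labelling requirements?"})

Running locust -f locustfile.py --host https://<finesse-api-host> then exposes the built-in UI mentioned above.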

Alternatives Considered

Apache Bench (ab)

Apache Bench (ab) is a command-line tool for benchmarking HTTP servers. It is included with the Apache HTTP Server package and is designed for simplicity and ease of use.

Pros

  • Simple to use.
  • Good for basic testing.
  • Easy integration with test scripts.

Cons

  • May not be flexible enough for complex testing scenarios.
  • Less performant for heavy loads or advanced testing.

Siege

Siege is a load testing and benchmarking tool that simulates multiple users accessing a web server, enabling stress testing and performance evaluation.

Pros

  • Supports multiple concurrent users, making it suitable for load testing.
  • Allows for stress testing of web servers and applications.

Cons

  • Lacks documentation; some arguments are not documented in its wiki.
  • May have a steeper learning curve compared to simpler tools like Apache Bench.

api-test's People

Contributors

ibrahim-kabir, sonoflope, k-allagbe

Stargazers

KIDI'S-TECH

Watchers

Noureddine Meraihi

api-test's Issues

Add caching functionality to cache Public Search API calls already made to save on API usage

Tasks

  • Implement a caching mechanism to store Bing API responses
  • Check if a request has already been made before making a new one
  • Use the cached response if available to reduce API usage

Acceptance Criteria

  • The caching functionality should successfully store and retrieve Bing API responses
  • Requests should be checked against the cache before making a new call
  • API usage should show a noticeable decrease after implementing the caching mechanism
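
A minimal sketch of such a cache, assuming a simple file-based store keyed by a hash of the query (the .bing_cache directory and the search_fn callable are illustrative, not existing code):

    import hashlib
    import json
    from pathlib import Path

    CACHE_DIR = Path(".bing_cache")  # hypothetical cache location

    def cached_search(query: str, search_fn) -> dict:
        """Return a cached Bing response for the query, calling search_fn only on a miss."""
        CACHE_DIR.mkdir(exist_ok=True)
        key = hashlib.sha256(query.encode("utf-8")).hexdigest()
        cache_file = CACHE_DIR / f"{key}.json"
        if cache_file.exists():
            # Cache hit: reuse the stored response instead of spending an API call
            return json.loads(cache_file.read_text(encoding="utf-8"))
        response = search_fn(query)  # cache miss: make the actual API call once
        cache_file.write_text(json.dumps(response), encoding="utf-8")
        return response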

Compare Finesse's Score Against a Public Search Engine for Accuracy Comparison

Summary

We aim to compare Finesse's score with what the Bing search engine could produce. Bing stands as our primary competitor, and it would be highly beneficial to ascertain whether our tool matches or even outperforms it. This evaluation would significantly enhance the accuracy assessment of Finesse.

Tasks

  • Research and select a public search engine to compare with
  • Update the script to query the selected public search engine
  • Filter user queries to test only those containing at least one inspection.canada.ca URL.
  • Test the user queries and save them in the results of the Finesse accuracy test.

Acceptance Criteria

  • Finesse's accuracy can be compared with the Bing search accuracy score in the API tests and in the markdown, CSV, and logging outputs
  • The integration should not affect the performance or functionality of existing features
  • The comparison mechanism should clearly display the difference in accuracy between Finesse and the Bing search
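
A rough sketch of how the comparison could query Bing, assuming the Bing Web Search v7 REST API and an API key in an environment variable; the rank-based score below is only a placeholder, not Finesse's actual accuracy formula:

    import os
    import requests

    BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"

    def bing_top_urls(query: str, count: int = 10) -> list:
        """Return the top result URLs Bing gives for a query."""
        resp = requests.get(
            BING_ENDPOINT,
            headers={"Ocp-Apim-Subscription-Key": os.environ["BING_SEARCH_KEY"]},
            params={"q": query, "count": count},
            timeout=10,
        )
        resp.raise_for_status()
        return [page["url"] for page in resp.json().get("webPages", {}).get("value", [])]

    def bing_score(query: str, expected_url: str) -> float:
        """Placeholder rank-based score: 1.0 for a first-place hit, 0.0 when absent."""
        urls = bing_top_urls(query)
        for rank, url in enumerate(urls):
            if expected_url in url:
                return 1.0 - rank / len(urls)
        return 0.0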

Add URL Download Option for Query Files for `finesse_test` in `api-test`

Description

Currently, finesse_test.py in api-test supports using a local path for query files with --path. This change would also enable specifying a URL from which query files are downloaded directly, but only when using finesse_test.py.

Tasks

  • Add a --url command line option to finesse_test.py for specifying a remote URL.
  • Implement functionality to download files from this URL to a temporary directory.
  • Ensure these downloaded files are used for query execution in finesse_test.py.
  • Update tests
  • Update docs

Acceptance Criteria

  • finesse_test.py accepts a remote URL with --url.
  • Files are successfully downloaded from the URL.
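
A possible shape for the option, sketched with argparse and a standard-library download into a temporary directory (the option group and helper names are assumptions, not the existing finesse_test.py code):

    import argparse
    import tempfile
    from pathlib import Path
    from urllib.request import urlretrieve

    def parse_args() -> argparse.Namespace:
        parser = argparse.ArgumentParser(description="Finesse accuracy test")
        group = parser.add_mutually_exclusive_group(required=True)
        group.add_argument("--path", type=Path, help="local directory containing query files")
        group.add_argument("--url", help="remote URL of a query file to download")
        return parser.parse_args()

    def resolve_query_dir(args: argparse.Namespace) -> Path:
        """Return the directory holding query files, downloading from --url when given."""
        if args.path:
            return args.path
        tmp_dir = Path(tempfile.mkdtemp(prefix="finesse_queries_"))
        urlretrieve(args.url, tmp_dir / Path(args.url).name)  # download into the temp dir
        return tmp_dir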

Flaw in `calculate_accuracy` in `api-test`

Description

The calculate_accuracy function compares results based on a partial URL identifier. However, the full identifier includes the last 3 parts of the URL.

Tasks

  • Change the regex PATTERN to `/[a-z]{3}/\d+/\d+$`

Acceptance Criteria

  • Accuracy is calculated based on the last 3 parts of the URLs
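
A sketch of the proposed fix; the helper name and the example URL are illustrative only:

    import re
    from typing import Optional

    # Proposed pattern from the task: a 3-letter language code followed by
    # two numeric identifiers, i.e. the last 3 parts of the URL.
    PATTERN = re.compile(r"/[a-z]{3}/\d+/\d+$")

    def url_identifier(url: str) -> Optional[str]:
        """Extract the 3-part identifier used when comparing expected and returned pages."""
        match = PATTERN.search(url)
        return match.group(0) if match else None

    # Hypothetical example:
    # url_identifier("https://inspection.canada.ca/some-page/eng/1323702831531/1323703058482")
    # returns "/eng/1323702831531/1323703058482"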

Log Command Usage in `api-test`

Description

Implement functionality to log each usage of api-test, recording the exact command, date, time, results...

Tasks

  • Research different methods
  • Implement basic logging to record at least the command, date, time, results of api-test executions.
  • Update tests.
  • Update docs.

Acceptance Criteria

  • Every command execution in api-test is logged with details of the command, date, time, and results at least.

Additional Information

Other information we might consider logging:

  • User Information: Identity of the user running the tool.
  • Execution Duration: Time taken for the command to execute.
  • Error Messages: Errors or stack traces from failures.
  • Version Information: Version of api-test used.
  • API Calls: Details of API calls made and their responses.
  • Output Size: Size of the output files or data generated.
  • ...
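
One lightweight approach, sketched with the standard logging module writing one JSON record per run (the log file name and record fields are assumptions):

    import json
    import logging
    import sys
    from datetime import datetime, timezone

    logging.basicConfig(
        filename="api_test_usage.log",  # hypothetical log destination
        level=logging.INFO,
        format="%(message)s",
    )

    def log_run(results_summary: dict) -> None:
        """Append one structured record per api-test execution."""
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),  # date and time
            "command": " ".join(sys.argv),                        # exact command as invoked
            "results": results_summary,                           # e.g. accuracy score, request times
        }
        logging.info(json.dumps(record))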

Archiving the full run history of the Finesse Benchmark Tool to Datahub

Summary

The Finesse benchmark tool currently saves test data locally, which prevents archiving the full run history for comparison with historical performance data. Moreover, not all data is captured in the CSV or Markdown files, because the volume of responses from Finesse overflows them. Hence, it would be beneficial to migrate this data storage to a database solution.

Tasks

  • Create a schema on Datahub
  • Modify the script to connect to Datahub and save data to the table
  • Test the migration process thoroughly.

Acceptance Criteria

  • Ensure all test data from Finesse benchmarking is successfully migrated to a PostgreSQL database.
  • Develop a script or tool to automate the process of saving data to the PostgreSQL table.
  • Verify that all data is accurately captured and stored in the database without any loss or truncation.
  • Implement error handling mechanisms to address any potential issues during data migration.
  • Conduct thorough testing to validate the functionality and reliability of the PostgreSQL database for storing Finesse benchmark test data.
  • Document the migration process and provide clear instructions for future maintenance and updates.
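
A minimal sketch of the save step, assuming Datahub exposes a PostgreSQL connection string and a finesse_runs table (the driver, table, and column names are all hypothetical):

    import json
    import os
    import psycopg2  # assumed PostgreSQL driver

    def save_run(run: dict) -> None:
        """Insert one benchmark run into a hypothetical finesse_runs table on Datahub."""
        conn = psycopg2.connect(os.environ["DATAHUB_DSN"])  # hypothetical connection string
        try:
            with conn, conn.cursor() as cur:
                cur.execute(
                    "INSERT INTO finesse_runs (run_time, accuracy, request_time_ms, raw_response) "
                    "VALUES (%s, %s, %s, %s)",
                    (
                        run["run_time"],
                        run["accuracy"],
                        run["request_time_ms"],
                        json.dumps(run["response"]),  # full response, so nothing is truncated
                    ),
                )
        finally:
            conn.close()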

Incorporate LlamaIndex Search In The Accuracy Testing of Finesse

Summary

Incorporate LlamaIndex into the testing framework to assess the accuracy and potential of the search engine being developed.

Tasks

  • Integrate LlamaIndex into the existing testing script
  • Run tests using LlamaIndex and gather relevant performance data

Acceptance Criteria

  • The LlamaIndex tool is successfully integrated into the testing framework
  • Performance data collected using LlamaIndex provides valuable insights for assessing the search engine's capabilities and areas for improvement

Test User Generated Questions

Summary

Develop a script to convert user-generated questions from a spreadsheet into JSON format in finesse-data, input the questions for testing in the Finesse Benchmark Tool, and finally, archive the data on the company's wiki page.

Tasks

  • Create a script to parse the spreadsheet data into a JSON file in finesse-data.
  • Input the user-generated questions into the Finesse Benchmark Tool for testing.
  • Archive the generated data on the team wiki page for future reference.

Acceptance Criteria

  • The script successfully converts the spreadsheet data into a JSON file.
  • The user-generated questions are accurately tested using the Finesse Benchmark Tool.
  • The data is correctly archived on the company's wiki page.
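
A sketch of the conversion step, assuming the spreadsheet is exported as CSV; the column names and the finesse-data JSON layout shown here are assumptions:

    import csv
    import json
    from pathlib import Path

    def spreadsheet_to_json(csv_path: str, output_path: str) -> None:
        """Convert a CSV export of user questions into a finesse-data style JSON file."""
        with open(csv_path, newline="", encoding="utf-8") as f:
            # Hypothetical column names; the real spreadsheet headers may differ.
            questions = [
                {"question": row["question"], "expected_url": row["expected_url"]}
                for row in csv.DictReader(f)
            ]
        Path(output_path).write_text(
            json.dumps(questions, indent=2, ensure_ascii=False) + "\n",  # keep a final newline
            encoding="utf-8",
        )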

Investigate Discrepancy in Test Scores and Zero Scores Compared to a Public Search Engine

Summary

Investigate the reasons behind user-generated tests resulting in significantly lower scores compared to AI-generated tests. Additionally, look into why there are instances of zero scores or poor scores in comparison to Bing. Finally, ensure that the relevance and finesse of search results are on par or better than Bing's standards. Verify if all documents are indexed to eliminate any indexing issues.

Tasks

  • Analyze user-generated questions
  • Investigate reasons for zero scores and poor performance compared to Bing
  • Conduct a comparative analysis with Bing to benchmark search result relevance and finesse
  • Verify the indexing status of all documents to rule out indexing problems

Acceptance Criteria

  • User and AI-generated test scores are compared to identify variations
  • Clear reasons for zero scores or poor performance are outlined
  • Search relevance and finesse match or surpass Bing's standards
  • Assurance that all documents are correctly indexed to prevent indexing issues

Enhance Finesse Benchmark Tool with Cost Estimation and Data Visualization Features

Summary

The Finesse benchmark tool currently provides an accuracy score and request time for each JSON file. However, it lacks the capability to generate cost estimates. This tool is responsible for comparing different search engines to discern the best from the worst, and one decisive factor is inevitably the cost incurred by using a tool. Additionally, the tool saves tests locally, and not all test data, such as the full search engine responses, is exported to the CSV and MD files. This can result in a loss of results and traceability. Finally, it would be beneficial to add a data visualizer such as a graph. Therefore, we are enhancing this tool with the locust-dashboard, which enables us to implement all the aforementioned features.

Tasks

  • #4
  • Integrate cost estimation feature into finesse benchmark tool.
  • Create Jupyter Notebooks to save data to MD and CSV files
  • Implement a data visualizer on the locust-dashboard, such as a graph, to enhance data analysis capabilities.

Acceptance Criteria

  • The finesse benchmark tool successfully generates cost estimates for each search engine.
  • All test data, including search engine responses, is saved on Datahub
  • The implemented data visualizer effectively presents the benchmark results, enhancing data analysis capabilities.

Diagram

    sequenceDiagram
        User ->> Finesse_Tool: Start test
        Finesse_Tool ->> Finesse_Tool: Cost estimate
        Finesse_Tool ->> Datahub: Save test data
        Datahub ->> Locust_Dashboard: Retrieve data
        Locust_Dashboard -->> User: Visualization displayed
        Datahub ->> Jupyter_Notebook: Retrieve data
        Jupyter_Notebook -->> Finesse_Tool: Generate md or csv file
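
The cost estimation itself can stay simple; a sketch, where the per-request pricing is supplied by the caller because no rates are specified in this issue:

    def estimate_cost(request_count: int, price_per_1000_requests: float) -> float:
        """Rough cost estimate for one benchmark run."""
        return request_count * price_per_1000_requests / 1000.0

    # Hypothetical example: 250 queries against an engine billed at 15$ per 1000 requests
    # estimate_cost(250, 15.0) returns 3.75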

Add the api-test folder to the new repository

Summary

Over time, the project has become too large, and it is now more appropriate to separate it from the current repository, finesse-backend. By transferring the "api-test" folder to a new repository, we can better organize our resources and facilitate management and collaboration on this project. Additionally, we intend to test APIs from other projects besides Finesse in the future. Hence, it would be beneficial to establish a common ground for all API testing.

Tasks

  • Copy the api-test folder from the finesse-backend repository.

Acceptance Criteria

  • The "api-test" folder is successfully added to the new repository.
  • All tools and resources within the "api-test" folder are intact and functional in the new repository.

Ensure Final Newline in .json File Generation Process

Summary

The current process responsible for generating .json files is not adding a final newline at the end of the file. This missing newline can cause issues with certain tools that rely on this newline for correct parsing. To prevent potential problems, we need to update the generation process to include a final newline in the .json files it generates.

Tasks

  • Investigate the current implementation of the .json file generation process
  • Modify the process to append a final newline character at the end of the generated .json files
  • Test the updated process to ensure the newline is correctly added and does not impact the existing functionality
  • Document the changes made to the .json file generation process

PR related

Acceptance Criteria

  • .json files generated by the process should include a final newline character at the end of the file
  • The updated process should not introduce any regressions or impact existing functionality
  • Documentation should be updated to reflect the changes made to the .json file generation process
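
The fix itself is small; a sketch of a writer helper that appends the newline after serialization:

    import json

    def write_json(path: str, data) -> None:
        """Serialize data to path, making sure the file ends with a newline."""
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2, ensure_ascii=False)
            f.write("\n")  # final newline expected by POSIX-style tooling and linters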

Gather and prepare the test data for Nachet

Description 🚀

To be able to test Nachet's models, the data needs to be accessible and easily retrievable. Therefore, a script that gets and organizes (if needed) all the data before testing is important to the test workflows (similar to jsonreader.py for Finesse).

Expected Behavior 📈

The script will retrieve the images from the tests folder in the data storage (blob storage or database), as well as the images uploaded by users, to build a large data set to test the models with.

Step by Step 📋

  1. Get a connection to the data storage
  2. Retrieve the test images used to train and test the models
  3. Retrieve the images uploaded by our users
  4. Categorize the images into two distinct categories (user-images, test-images)
  5. Indicate to the users that the data gathering is done and the application is ready for testing

Effort and Impact 🏃

This issue seems to be high effort due to the data gathering required for the user images. It is also high impact, since the test data is essential for testing Nachet's models.

Acceptance Criteria ✅

  • The script generates a large data set divided into two categories:
    • user-images
    • test-images
  • The script uses the Nachet database integration to retrieve the images
  • The script notifies the user when the process is done

Additional Context 📌

Since this is also the first step for creating the Nachet testing application, it is expected that this task will also include the creation and the structure of Nachet's folder and documentation.
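
If the data storage turns out to be Azure Blob Storage, the retrieval step could look like the sketch below (the SDK choice, container name, prefixes, and environment variable are all assumptions):

    import os
    from pathlib import Path
    from azure.storage.blob import BlobServiceClient  # assumed SDK

    def download_images(container_name: str, prefix: str, dest: Path) -> int:
        """Copy every blob under a prefix (e.g. "test-images/") into a local folder."""
        service = BlobServiceClient.from_connection_string(os.environ["NACHET_STORAGE_CONN"])
        container = service.get_container_client(container_name)
        dest.mkdir(parents=True, exist_ok=True)
        count = 0
        for blob in container.list_blobs(name_starts_with=prefix):
            data = container.download_blob(blob.name).readall()
            (dest / Path(blob.name).name).write_bytes(data)
            count += 1
        return count

    # Categorize into the two sets described above (container and prefix names are hypothetical):
    # download_images("nachet", "test-images/", Path("data/test-images"))
    # download_images("nachet", "user-images/", Path("data/user-images"))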

As a data scientist, I want to be able to generate automated test results for Nachet Interactive's Models

Description 🚀

As Nachet Interactive progresses, a standard way to test and compare the performance of the various models used becomes necessary to provide good data and value to the data scientists. These automated tests will help them make decisions when developing new models for the application. The accuracy objective for the models described in our milestones is 90%; having these tests will help provide a good overview of the models and identify the most performant one.

Step by Step 📋

  • Define the objectives of the tests
  • Define relevant metrics
  • Prepare test data from the blob storage
  • Implement the automated tests
  • Execute tests
  • Document and communicate the results
  • Produce test reports

Acceptance Criteria ✅

  • All test data, including models' results, is recorded
  • The implemented data visualizer effectively presents the results of the tests, enhancing data analysis capabilities.

Tasks 🛠️

  • #18
  • Start the command line testing application for Nachet
  • Refactor Finesse functions into more general ones that work for both Finesse and Nachet
  • Record all tests run
  • Build tools to create test reports
  • Build and maintain testing documentation (Wiki, GitHub, etc.)

Update Finesse benchmarking script to skip specific files

Summary

In order to manage the new attributes added in ai-cfia/finesse-data#9, we need to modify the Finesse benchmarking script to be able to skip specific files based on certain criteria.

Tasks

  • Modify the Finesse benchmarking script to incorporate the skipping logic
  • Test the script with the new skipping functionality

Acceptance Criteria

  • The Finesse benchmarking script should successfully skip files based on the identified criteria
  • The modified script should not affect the existing functionality of the benchmarking process
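
A sketch of the skipping logic, assuming the new attributes from ai-cfia/finesse-data#9 include a hypothetical top-level "skip" flag (the actual criteria may differ):

    import json
    from pathlib import Path

    def should_skip(query_file: Path) -> bool:
        """Decide whether a finesse-data query file is excluded from the benchmark."""
        data = json.loads(query_file.read_text(encoding="utf-8"))
        return bool(data.get("skip", False))  # hypothetical attribute

    # queries = [f for f in Path("finesse-data").glob("*.json") if not should_skip(f)]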
