
api-test's Introduction

api-test

This repository contains all the tools for testing and stressing the APIs of the AI Lab projects.

Usage

For more detailed information about the usage of the Finesse Locust tool, please refer to the FINESSE_USAGE.md file in the finesse repository.

Tools available

Several tools can integrate with Python or a script to calculate API statistics accurately. The current need is to test the APIs using JSON files containing questions and their page of origin in order to establish an accuracy score, as well as to measure request times and generate a statistical summary of all this data. We also plan to test the APIs under different conditions in the near future, for example with multiple simultaneous users or under special load conditions. That is why it is worth evaluating whether candidate tools are scalable and integrate well with Python.

Decision

We've opted for Locust as our tool of choice. Locust is an open-source load testing framework written in Python, designed to simulate numerous concurrent users sending requests to a given system, and it provides detailed insights into the system's performance and scalability. With its built-in UI and straightforward integration with Python scripts, it is user-friendly and accessible. It is also popular and widely used, with support from major tech companies such as Microsoft and Google.

However, Locust's primary purpose is to conduct ongoing tests involving multiple machines and endpoints simultaneously. Our specific requirement involves running the accuracy test just once. Nevertheless, there's potential for future integration, especially for stress and load testing scenarios that involve repeated searches.
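
As an illustration, a minimal Locust user for this kind of test could look like the sketch below (the /search endpoint, the payload, and the host are assumptions, not the actual Finesse API):

    from locust import HttpUser, between, task

    class FinesseUser(HttpUser):
        # Wait 1-3 seconds between simulated requests
        wait_time = between(1, 3)

        @task
        def ask_question(self):
            # Hypothetical endpoint and payload; the real Finesse API may differ
            self.client.post("/search", json={"query": "What are the labelling requirements?"})

Running locust -f locustfile.py --host https://<finesse-api-host> then exposes the built-in UI mentioned above.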

Alternatives Considered

Apache Bench (ab)

Apache Bench (ab) is a command-line tool for benchmarking HTTP servers. It is included with the Apache HTTP Server package and is designed for simplicity and ease of use.

Pros

  • Simple to use.
  • Good for basic testing.
  • Easy integration with test scripts.

Cons

  • May not be flexible enough for complex testing scenarios.
  • Less performant for heavy loads or advanced testing.

Siege

Siege is a load testing and benchmarking tool that simulates multiple users accessing a web server, enabling stress testing and performance evaluation.

Pros

  • Supports multiple concurrent users, making it suitable for load testing.
  • Allows for stress testing of web servers and applications.

Cons

  • Lacks documentation; some arguments are not documented in its wiki.
  • May have a steeper learning curve compared to simpler tools like Apache Bench.

api-test's People

Contributors

ibrahim-kabir, sonoflope, k-allagbe

Stargazers

KIDI'S-TECH

Watchers

Noureddine Meraihi

api-test's Issues

Add caching functionality to cache Public Search API calls already made to save on API usage

Tasks

  • Implement a caching mechanism to store Bing API responses
  • Check if a request has already been made before making a new one
  • Use the cached response if available to reduce API usage

Acceptance Criteria

  • The caching functionality should successfully store and retrieve Bing API responses
  • Requests should be checked against the cache before making a new call
  • API usage should show a noticeable decrease after implementing the caching mechanism
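
A minimal sketch of such a cache, assuming a simple file-based store keyed by a hash of the query (the .bing_cache directory and the search_fn callable are illustrative, not existing code):

    import hashlib
    import json
    from pathlib import Path

    CACHE_DIR = Path(".bing_cache")  # hypothetical cache location

    def cached_search(query: str, search_fn) -> dict:
        """Return a cached Bing response for the query, calling search_fn only on a miss."""
        CACHE_DIR.mkdir(exist_ok=True)
        key = hashlib.sha256(query.encode("utf-8")).hexdigest()
        cache_file = CACHE_DIR / f"{key}.json"
        if cache_file.exists():
            # Cache hit: reuse the stored response instead of spending an API call
            return json.loads(cache_file.read_text(encoding="utf-8"))
        response = search_fn(query)  # cache miss: make the actual API call once
        cache_file.write_text(json.dumps(response), encoding="utf-8")
        return response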

Compare Finesse's Score Against a Public Search Engine for Accuracy Comparison

Summary

We aim to compare Finesse's score with what the Bing search engine could produce. Bing stands as our primary competitor, and it would be highly beneficial to ascertain whether our tool matches or even outperforms it. This evaluation would significantly enhance the accuracy assessment of Finesse.

Tasks

  • Research and select a public search engine to compare with
  • Update the script to query the selected public search engine
  • Filter user queries to test only those containing at least one inspection.canada.ca URL.
  • Test the user queries and save them in the results of the Finesse accuracy test.

Acceptance Criteria

  • Finesse's accuracy can be compared with the Bing search accuracy score in the API tests and in the markdown, CSV, and logging outputs
  • The integration should not affect the performance or functionality of existing features
  • The comparison mechanism should clearly display the difference in accuracy between Finesse and the Bing search
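
A rough sketch of how the comparison could query Bing, assuming the Bing Web Search v7 REST API and an API key in an environment variable; the rank-based score below is only a placeholder, not Finesse's actual accuracy formula:

    import os
    import requests

    BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"

    def bing_top_urls(query: str, count: int = 10) -> list:
        """Return the top result URLs Bing gives for a query."""
        resp = requests.get(
            BING_ENDPOINT,
            headers={"Ocp-Apim-Subscription-Key": os.environ["BING_SEARCH_KEY"]},
            params={"q": query, "count": count},
            timeout=10,
        )
        resp.raise_for_status()
        return [page["url"] for page in resp.json().get("webPages", {}).get("value", [])]

    def bing_score(query: str, expected_url: str) -> float:
        """Placeholder rank-based score: 1.0 for a first-place hit, 0.0 when absent."""
        urls = bing_top_urls(query)
        for rank, url in enumerate(urls):
            if expected_url in url:
                return 1.0 - rank / len(urls)
        return 0.0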

Add URL Download Option for Query Files for `finesse_test` in `api-test`

Description

Currently, finesse_test.py in api-test supports using a local path for query files with --path. This change would also enable specifying a URL from which query files are downloaded directly, but only when using finesse_test.py.

Tasks

  • Add a --url command line option to finesse_test.py for specifying a remote URL.
  • Implement functionality to download files from this URL to a temporary directory.
  • Ensure these downloaded files are used for query execution in finesse_test.py.
  • Update tests
  • Update docs

Acceptance Criteria

  • finesse_test.py accepts a remote URL with --url.
  • Files are successfully downloaded from the URL.
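
A possible shape for the option, sketched with argparse and a standard-library download into a temporary directory (the option group and helper names are assumptions, not the existing finesse_test.py code):

    import argparse
    import tempfile
    from pathlib import Path
    from urllib.request import urlretrieve

    def parse_args() -> argparse.Namespace:
        parser = argparse.ArgumentParser(description="Finesse accuracy test")
        group = parser.add_mutually_exclusive_group(required=True)
        group.add_argument("--path", type=Path, help="local directory containing query files")
        group.add_argument("--url", help="remote URL of a query file to download")
        return parser.parse_args()

    def resolve_query_dir(args: argparse.Namespace) -> Path:
        """Return the directory holding query files, downloading from --url when given."""
        if args.path:
            return args.path
        tmp_dir = Path(tempfile.mkdtemp(prefix="finesse_queries_"))
        urlretrieve(args.url, tmp_dir / Path(args.url).name)  # download into the temp dir
        return tmp_dir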

Flaw in `calculate_accuracy` in `api-test`

Description

The calculate_accuracy function compares results based on a partial URL identifier. However, the full identifier includes the last 3 parts of the URL.

Tasks

  • Change the regex PATTERN to `/[a-z]{3}/\d+/\d+$`

Acceptance Criteria

  • Accuracy is calculated based on the last 3 parts of the URLs
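
A sketch of the proposed fix; the helper name and the example URL are illustrative only:

    import re
    from typing import Optional

    # Proposed pattern from the task: a 3-letter language code followed by
    # two numeric identifiers, i.e. the last 3 parts of the URL.
    PATTERN = re.compile(r"/[a-z]{3}/\d+/\d+$")

    def url_identifier(url: str) -> Optional[str]:
        """Extract the 3-part identifier used when comparing expected and returned pages."""
        match = PATTERN.search(url)
        return match.group(0) if match else None

    # Hypothetical example:
    # url_identifier("https://inspection.canada.ca/some-page/eng/1323702831531/1323703058482")
    # returns "/eng/1323702831531/1323703058482"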

Log Command Usage in `api-test`

Description

Implement functionality to log each usage of api-test, recording the exact command, date, time, results...

Tasks

  • Research different methods
  • Implement basic logging to record at least the command, date, time, results of api-test executions.
  • Update tests.
  • Update docs.

Acceptance Criteria

  • Every command execution in api-test is logged with details of the command, date, time, and results at least.

Additional Information

Other information we might consider logging:

  • User Information: Identity of the user running the tool.
  • Execution Duration: Time taken for the command to execute.
  • Error Messages: Errors or stack traces from failures.
  • Version Information: Version of api-test used.
  • API Calls: Details of API calls made and their responses.
  • Output Size: Size of the output files or data generated.
  • ...
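
One lightweight approach, sketched with the standard logging module writing one JSON record per run (the log file name and record fields are assumptions):

    import json
    import logging
    import sys
    from datetime import datetime, timezone

    logging.basicConfig(
        filename="api_test_usage.log",  # hypothetical log destination
        level=logging.INFO,
        format="%(message)s",
    )

    def log_run(results_summary: dict) -> None:
        """Append one structured record per api-test execution."""
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),  # date and time
            "command": " ".join(sys.argv),                        # exact command as invoked
            "results": results_summary,                           # e.g. accuracy score, request times
        }
        logging.info(json.dumps(record))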

Archiving the full run history of the Finesse Benchmark Tool to Datahub

Summary

The Finesse benchmark tool currently saves test data locally, which prevents archiving the full run history for comparison with historical performance data. Moreover, not all data is captured in the CSV or Markdown files, because the volume of responses from Finesse overflows them. Hence, it would be beneficial to migrate this data storage to a database solution.

Tasks

  • Create a schema on Datahub
  • Modify the script to connect to Datahub and save data to the table
  • Test the migration process thoroughly.

Acceptance Criteria

  • Ensure all test data from Finesse benchmarking is successfully migrated to a PostgreSQL database.
  • Develop a script or tool to automate the process of saving data to the PostgreSQL table.
  • Verify that all data is accurately captured and stored in the database without any loss or truncation.
  • Implement error handling mechanisms to address any potential issues during data migration.
  • Conduct thorough testing to validate the functionality and reliability of the PostgreSQL database for storing Finesse benchmark test data.
  • Document the migration process and provide clear instructions for future maintenance and updates.
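
A minimal sketch of the save step, assuming Datahub exposes a PostgreSQL connection string and a finesse_runs table (the driver, table, and column names are all hypothetical):

    import json
    import os
    import psycopg2  # assumed PostgreSQL driver

    def save_run(run: dict) -> None:
        """Insert one benchmark run into a hypothetical finesse_runs table on Datahub."""
        conn = psycopg2.connect(os.environ["DATAHUB_DSN"])  # hypothetical connection string
        try:
            with conn, conn.cursor() as cur:
                cur.execute(
                    "INSERT INTO finesse_runs (run_time, accuracy, request_time_ms, raw_response) "
                    "VALUES (%s, %s, %s, %s)",
                    (
                        run["run_time"],
                        run["accuracy"],
                        run["request_time_ms"],
                        json.dumps(run["response"]),  # full response, so nothing is truncated
                    ),
                )
        finally:
            conn.close()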

Incorporate LlamaIndex Search In The Accuracy Testing of Finesse

Summary

Incorporate LlamaIndex into the testing framework to assess the accuracy and potential of the search engine being developed.

Tasks

  • Integrate LlamaIndex into the existing testing script
  • Run tests using LlamaIndex and gather relevant performance data

Acceptance Criteria

  • The LlamaIndex tool is successfully integrated into the testing framework
  • Performance data collected using LlamaIndex provides valuable insights for assessing the search engine's capabilities and areas for improvement

Test User Generated Questions

Summary

Develop a script to convert user-generated questions from a spreadsheet into JSON format in finesse-data, input the questions for testing in the Finesse Benchmark Tool, and finally, archive the data on the company's wiki page.

Tasks

  • Create a script to parse the spreadsheet data into a JSON file in finesse-data.
  • Input the user-generated questions into the Finesse Benchmark Tool for testing.
  • Archive the generated data on the team wiki page for future reference.

Acceptance Criteria

  • The script successfully converts the spreadsheet data into a JSON file.
  • The user-generated questions are accurately tested using the Finesse Benchmark Tool.
  • The data is correctly archived on the company's wiki page.
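
A sketch of the conversion step, assuming the spreadsheet is exported as CSV; the column names and the finesse-data JSON layout shown here are assumptions:

    import csv
    import json
    from pathlib import Path

    def spreadsheet_to_json(csv_path: str, output_path: str) -> None:
        """Convert a CSV export of user questions into a finesse-data style JSON file."""
        with open(csv_path, newline="", encoding="utf-8") as f:
            # Hypothetical column names; the real spreadsheet headers may differ.
            questions = [
                {"question": row["question"], "expected_url": row["expected_url"]}
                for row in csv.DictReader(f)
            ]
        Path(output_path).write_text(
            json.dumps(questions, indent=2, ensure_ascii=False) + "\n",  # keep a final newline
            encoding="utf-8",
        )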

Investigate Discrepancy in Test Scores and Zero Scores Compared to a Public Search Engine

Summary

Investigate the reasons behind user-generated tests resulting in significantly lower scores compared to AI-generated tests. Additionally, look into why there are instances of zero scores or poor scores in comparison to Bing. Finally, ensure that the relevance and finesse of search results are on par or better than Bing's standards. Verify if all documents are indexed to eliminate any indexing issues.

Tasks

  • Analyze user-generated questions
  • Investigate reasons for zero scores and poor performance compared to Bing
  • Conduct a comparative analysis with Bing to benchmark search result relevance and finesse
  • Verify the indexing status of all documents to rule out indexing problems

Acceptance Criteria

  • User and AI-generated test scores are compared to identify variations
  • Clear reasons for zero scores or poor performance are outlined
  • Search relevance and finesse match or surpass Bing's standards
  • Assurance that all documents are correctly indexed to prevent indexing issues

Enhance Finesse Benchmark Tool with Cost Estimation and Data Visualization Features

Summary

The Finesse benchmark tool currently provides an accuracy score and request time for each JSON file. However, it lacks the capability to generate cost estimates. This tool is responsible for comparing different search engines to discern the best from the worst, and one decisive factor is inevitably the cost incurred by using a tool. Additionally, the tool saves tests locally, and not all test data, such as the full search engine responses, is exported to the CSV and MD files. This can result in a loss of results and traceability. Finally, it would be beneficial to add a data visualizer such as a graph. Therefore, we are enhancing this tool with the locust-dashboard, which enables us to implement all the aforementioned features.

Tasks

  • #4
  • Integrate cost estimation feature into finesse benchmark tool.
  • Create Jupyter Notebooks to save data to MD and CSV files
  • Implement a data visualizer on the locust-dashboard, such as a graph, to enhance data analysis capabilities.

Acceptance Criteria

  • The finesse benchmark tool successfully generates cost estimates for each search engine.
  • All test data, including search engine responses, is saved on Datahub
  • The implemented data visualizer effectively presents the benchmark results, enhancing data analysis capabilities.

Diagram

    sequenceDiagram
        User ->> Finesse_Tool: Start test
        Finesse_Tool ->> Finesse_Tool: Cost estimate
        Finesse_Tool ->> Datahub: Save test data
        Datahub ->> Locust_Dashboard: Retrieve data
        Locust_Dashboard -->> User: Visualization displayed
        Datahub ->> Jupyter_Notebook: Retrieve data
        Jupyter_Notebook -->> Finesse_Tool: Generate md or csv file
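
The cost estimation itself can stay simple; a sketch, where the per-request pricing is supplied by the caller because no rates are specified in this issue:

    def estimate_cost(request_count: int, price_per_1000_requests: float) -> float:
        """Rough cost estimate for one benchmark run."""
        return request_count * price_per_1000_requests / 1000.0

    # Hypothetical example: 250 queries against an engine billed at 15$ per 1000 requests
    # estimate_cost(250, 15.0) returns 3.75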

Add the api-test folder to the new repository

Summary

Over time, the project has become too large, and it is now more appropriate to separate it from the current repository, finesse-backend. By transferring the "api-test" folder to a new repository, we can better organize our resources and facilitate management and collaboration on this project. Additionally, we intend to test APIs from other projects besides Finesse in the future. Hence, it would be beneficial to establish a common ground for all API testing.

Tasks

  • Copy the api-test folder from the finesse-backend repository.

Acceptance Criteria

  • The "api-test" folder is successfully added to the new repository.
  • All tools and resources within the "api-test" folder are intact and functional in the new repository.

Ensure Final Newline in .json File Generation Process

Summary

The current process responsible for generating .json files is not adding a final newline at the end of the file. This missing newline can cause issues with certain tools that rely on this newline for correct parsing. To prevent potential problems, we need to update the generation process to include a final newline in the .json files it generates.

Tasks

  • Investigate the current implementation of the .json file generation process
  • Modify the process to append a final newline character at the end of the generated .json files
  • Test the updated process to ensure the newline is correctly added and does not impact the existing functionality
  • Document the changes made to the .json file generation process

PR related

Acceptance Criteria

  • .json files generated by the process should include a final newline character at the end of the file
  • The updated process should not introduce any regressions or impact existing functionality
  • Documentation should be updated to reflect the changes made to the .json file generation process
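
The fix itself is small; a sketch of a writer helper that appends the newline after serialization:

    import json

    def write_json(path: str, data) -> None:
        """Serialize data to path, making sure the file ends with a newline."""
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2, ensure_ascii=False)
            f.write("\n")  # final newline expected by POSIX-style tooling and linters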

Gather and prepare the test data for Nachet

Description 🚀

To be able to test Nachet's models, the data needs to be accessible and easily retrievable. Therefore, a script that gets and organizes (if needed) all the data before testing is important to the test workflows (similar to jsonreader.py for Finesse).

Expected Behavior 📈

The script will retrieve the images from the tests folder in the data storage (blob storage or database), as well as the images uploaded by users, to build a large data set to test the models with.

Step by Step 📋

  1. Get a connection to the data storage
  2. Retrieve the test images used to train and test the models
  3. Retrieve the images uploaded by our users
  4. Categorize the images into two distinct categories (user-images, test-images)
  5. Indicate to the users that the data gathering is done and the application is ready for testing

Effort and Impact 🏃

This issue seems to be high effort due to the data gathering required for the user images. It is also high impact, since the test data is essential for testing Nachet's models.

Acceptance Criteria ✅

  • The script generates a large data set divided into two categories:
    • user-images
    • test-images
  • The script uses the Nachet database integration to retrieve the images
  • The script notifies the user when the process is done

Additional Context 📌

Since this is also the first step for creating the Nachet testing application, it is expected that this task will also include the creation and the structure of Nachet's folder and documentation.
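
If the data storage turns out to be Azure Blob Storage, the retrieval step could look like the sketch below (the SDK choice, container name, prefixes, and environment variable are all assumptions):

    import os
    from pathlib import Path
    from azure.storage.blob import BlobServiceClient  # assumed SDK

    def download_images(container_name: str, prefix: str, dest: Path) -> int:
        """Copy every blob under a prefix (e.g. "test-images/") into a local folder."""
        service = BlobServiceClient.from_connection_string(os.environ["NACHET_STORAGE_CONN"])
        container = service.get_container_client(container_name)
        dest.mkdir(parents=True, exist_ok=True)
        count = 0
        for blob in container.list_blobs(name_starts_with=prefix):
            data = container.download_blob(blob.name).readall()
            (dest / Path(blob.name).name).write_bytes(data)
            count += 1
        return count

    # Categorize into the two sets described above (container and prefix names are hypothetical):
    # download_images("nachet", "test-images/", Path("data/test-images"))
    # download_images("nachet", "user-images/", Path("data/user-images"))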

As a data scientist, I want to be able to generate automated test results for Nachet Interactive's Models

Description 🚀

As Nachet Interactive progresses, a standard way to test and compare the performance of the various models used becomes necessary to provide good data and value to the data scientists. These automated tests will help them make decisions when developing new models for the application. The accuracy objective for the models described in our milestones is 90%; having these tests will help provide a good overview of the models and identify the most performant one.

Step by Step 📋

  • Define the objectives of the tests
  • Define relevant metrics
  • Prepare test data from the blob storage
  • Implement the automated tests
  • Execute tests
  • Document and communicate the results
  • Produce test reports

Acceptance Criteria ✅

  • All test data, including models' results, is recorded
  • The implemented data visualizer effectively presents the results of the tests, enhancing data analysis capabilities.

Tasks 🛠️

  • #18
  • Start the command line testing application for Nachet
  • Refactor Finesse functions into more general ones that work for both Finesse and Nachet
  • Record all tests run
  • Build tools to create test reports
  • Build and maintain testing documentation (Wiki, GitHub, etc.)

Update Finesse benchmarking script to skip specific files

Summary

In order to manage the new attributes added in ai-cfia/finesse-data#9, we need to modify the Finesse benchmarking script to be able to skip specific files based on certain criteria.

Tasks

  • Modify the Finesse benchmarking script to incorporate the skipping logic
  • Test the script with the new skipping functionality

Acceptance Criteria

  • The Finesse benchmarking script should successfully skip files based on the identified criteria
  • The modified script should not affect the existing functionality of the benchmarking process
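
A sketch of the skipping logic, assuming the new attributes from ai-cfia/finesse-data#9 include a hypothetical top-level "skip" flag (the actual criteria may differ):

    import json
    from pathlib import Path

    def should_skip(query_file: Path) -> bool:
        """Decide whether a finesse-data query file is excluded from the benchmark."""
        data = json.loads(query_file.read_text(encoding="utf-8"))
        return bool(data.get("skip", False))  # hypothetical attribute

    # queries = [f for f in Path("finesse-data").glob("*.json") if not should_skip(f)]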
