By Team "Hackatlopi", Junction 2023, addressing the Outokumpu Sustainable AI challenge
We aim to build more advanced and sustainable AI experiences by achieving what no other tool sufficiently provides:
- Evaluations of the environmental impact of training and deploying LLMs*
- Evaluations of LLMs’ interpretability and explainability*
- Ways to check with AI if information generated by AI is correct or wrong
*features partially under development
A comprehensive solution designed to assess the reliability, interpretability, and resource utilization of any Large Language Model (LLM) tool currently in use. It aims to provide a thorough evaluation, ensuring that the LLM's trustworthiness is upheld, its interpretability is clear, and it uses resources optimally in a production environment, thereby supporting long-term planning.
The tool tests the trustworthiness and sustainability of an LLM based on the following criteria:
- Explainability
- Reproducibility
- Fairness
- Factuality and precision*
- CPU and other compute resource usage*
- Query response time
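As a minimal sketch of how two of these criteria could be checked automatically, the snippet below times a single query and compares repeated outputs to score reproducibility. The `query_model` stub is purely illustrative and stands in for a real LLM call (e.g. through the OpenAI client):

```python
import time

def query_model(prompt: str) -> str:
    # Stand-in for a real LLM call; deterministic so the sketch
    # is self-contained and runnable without API access.
    return f"answer to: {prompt}"

def measure_response_time(prompt: str) -> tuple[str, float]:
    """Return the model's answer and the wall-clock latency in seconds."""
    start = time.perf_counter()
    answer = query_model(prompt)
    return answer, time.perf_counter() - start

def reproducibility_score(prompt: str, runs: int = 5) -> float:
    """Fraction of repeated runs that return the most common answer."""
    answers = [query_model(prompt) for _ in range(runs)]
    most_common = max(set(answers), key=answers.count)
    return answers.count(most_common) / runs
```

With deterministic decoding (temperature 0) the reproducibility score should approach 1.0; lower values flag unstable outputs.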
While building the prototype, we drew inspiration from resources such as:
- Conceptual frameworks (AI Verify, Vertex AI)
- Fact-checking solutions (EvalAI, Factinsect)
- Analysis and MLOps (Comet ML, Snorkel AI, W&B)
*features partially under development
Our team was pleased to have a wide range of diverse specialists, from full-stack development and AI/ML to project management and business. We successfully used collaboration tools and streamlined our teamwork.
The tech stack we used consists of:
- Python - as our main programming language
- LlamaIndex - for deeper LLM understanding and insights
- OpenAI tools - to power the intelligence and decision making
- Docker - for making it scalable
- Vue (with Tailwind) - for beautiful design
- Develop the feature that would generate suggestions on how to improve LLM models tested with our tool.
- Improve UI and front-end side of the tool, so that it is easily accessible and usable by larger audiences.
- Add and improve the feature that analyzes the physical metrics of LLM models, specifically GPU and CPU consumption.
- Test the existing tool with at least 20 LLM models to gauge its effectiveness, and make improvements based on the test results.
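One possible starting point for the resource-metrics item above, using only the Python standard library, is to measure CPU time and peak Python-heap memory around a model call. This is a hedged sketch, not the project's implementation; real GPU metrics would need extra tooling such as NVML bindings:

```python
import time
import tracemalloc

def profile_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and report CPU time and peak traced memory.

    Returns (result, stats) where stats holds:
      - "cpu_seconds": process CPU time consumed by the call
      - "peak_bytes":  peak Python memory allocated during the call
    """
    tracemalloc.start()
    cpu_start = time.process_time()
    result = fn(*args, **kwargs)
    cpu_used = time.process_time() - cpu_start
    _, peak_mem = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, {"cpu_seconds": cpu_used, "peak_bytes": peak_mem}
```

For example, `profile_call(sum, range(10**6))` returns the sum together with a stats dict that could feed the tool's sustainability report.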
Some more cool resources about our project: