Topic: ai-safety Goto Github
Something interesting about ai-safety
ai-safety,PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. Best Paper Awards @ NeurIPS ML Safety Workshop 2022
Organization: agencyenterprise
ai-safety,[ICCV2021 Oral] Fooling LiDAR by Attacking GPS Trajectory
Organization: ai4ce
Home Page: https://ai4ce.github.io/FLAT/
ai-safety,Code for "Preference Tuning For Toxicity Mitigation Generalizes Across Languages"
Organization: batsresearch
Home Page: https://arxiv.org/abs/2406.16235
ai-safety,A curated list of awesome resources for Artificial Intelligence Alignment research
User: dit7ya
ai-safety,A project to improve out-of-distribution detection (open set recognition) and uncertainty estimation by changing a few lines of code in your project! Perform efficient inferences (i.e., do not increase inference time) without repetitive model training, hyperparameter tuning, or collecting additional data.
User: dlmacedo
ai-safety,A project to add scalable state-of-the-art out-of-distribution detection (open set recognition) support by changing two lines of code! Perform efficient inferences (i.e., do not increase inference time) and detection without classification accuracy drop, hyperparameter tuning, or collecting additional data.
User: dlmacedo
ai-safety,DPLL(T)-based Verification tool for DNNs
Organization: dynaroars
ai-safety,Reading list for adversarial perspective and robustness in deep reinforcement learning.
User: ezgikorkmaz
ai-safety,A curated list of papers & technical articles on AI Quality & Safety
Organization: giskard-ai
Home Page: https://giskard.ai
ai-safety,Open-Source Evaluation & Testing for LLMs and ML models
Organization: giskard-ai
Home Page: https://docs.giskard.ai
ai-safety,AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered Applications
Organization: google-research-datasets
Home Page: https://arxiv.org/abs/2311.08592
ai-safety,Aligning AI With Shared Human Values (ICLR 2021)
User: hendrycks
ai-safety,Scan your AI/ML models for problems before you put them into production.
Organization: iqtlabs
ai-safety,A compilation of AI safety ideas, problems, and solutions.
User: jakobovski
ai-safety,LAWLIA is an open-source computational legal framework designed to revolutionize legal reasoning and analysis. It combines the power of large language models with a structured syntactical grammar to facilitate precise legal assessments, truth values, and verdicts. LAWLIA is the future of computational jurisprudence.
User: jehumtine
ai-safety,Deliver safe & effective language models
Organization: johnsnowlabs
Home Page: http://langtest.org/
ai-safety,A curated list of awesome responsible machine learning resources.
User: jphall663
ai-safety,[Findings of EMNLP 2022] Holistic Sentence Embeddings for Better Out-of-Distribution Detection
Organization: lancopku
ai-safety,[Findings of EMNLP 2022] Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks
Organization: lancopku
ai-safety,How to Make Safe AI? Let's Discuss!
User: lets-make-safe-ai
ai-safety,An interpretability library for PyTorch
User: luanademi
Home Page: https://luanademi.github.io/toumei/
ai-safety,Analysis of the survey "Towards best practices in AGI safety and governance: A survey of expert opinion"
User: mccaffary
ai-safety,Feature Space Singularity for Out-of-Distribution Detection. (SafeAI 2021)
Organization: megvii-research
ai-safety,Evaluation & testing framework for computer vision models
Organization: moonwatcher-ai
Home Page: https://www.moonwatcher.ai/
ai-safety,RuLES: a benchmark for evaluating rule-following in language models
User: normster
Home Page: https://eecs.berkeley.edu/~normanmu/llm_rules
ai-safety,Alpha principles for the ethical use of AI and Data Driven Technologies in Ontario | Proposition de principes pour une utilisation éthique des technologies axées sur les données en Ontario
Organization: ongov
ai-safety,In situ interactive widgets for responsible AI
Organization: pair-code
Home Page: https://pair-code.github.io/farsight/
ai-safety,Code and materials for the paper S. Phelps and Y. I. Russell, Investigating Emergent Goal-Like Behaviour in Large Language Models Using Experimental Economics, working paper, arXiv:2305.07970, May 2023
User: phelps-sg
ai-safety,BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
Organization: pku-alignment
Home Page: https://sites.google.com/view/pku-beavertails
ai-safety,Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Organization: pku-alignment
Home Page: https://pku-beaver.github.io
ai-safety,An attack that induces hallucinations in LLMs
Organization: pku-yuangroup
Home Page: http://arxiv.org/abs/2310.01469
ai-safety,QROA: A Black-Box Query-Response Optimization Attack on LLMs
User: qroa
ai-safety,Website to track people, organizations, and products (tools, websites, etc.) in AI safety
User: riceissa
Home Page: https://aiwatch.issarice.com/
ai-safety,A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use
User: ryoungj
Home Page: https://toolemu.com/
ai-safety,[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
Organization: safeailab
Home Page: https://arxiv.org/abs/2309.07124
ai-safety,[NeurIPS '23 Spotlight] Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
User: shengranhu
Home Page: https://www.shengranhu.com/ThoughtCloning/
ai-safety,AI Safety Q&A web frontend
Organization: stampyai
Home Page: https://aisafety.info
ai-safety,Awesome PrivEx: Privacy-Preserving Explainable AI (PPXAI)
User: tamlhp
Home Page: https://awesome-privex.github.io/
ai-safety,Open Source LLM toolkit to build trustworthy LLM applications. TigerArmor (AI safety), TigerRAG (embedding, RAG), TigerTune (fine-tuning)
User: tigerlab-ai
Home Page: https://www.tigerlab.ai
ai-safety,Code accompanying the paper Pretraining Language Models with Human Preferences
User: tomekkorbak
Home Page: https://arxiv.org/abs/2302.08582
ai-safety,Codebase for the paper "The Remarkable Robustness of LLMs: Stages of Inference?"
User: vdlad
Home Page: https://arxiv.org/pdf/2406.19384
ai-safety,Full code for the sparse probing paper.
User: wesg52
Home Page: https://arxiv.org/abs/2305.01610
ai-safety,Universal Neurons in GPT2 Language Models
User: wesg52
ai-safety,An unrestricted attack based on diffusion models that can achieve both good transferability and imperceptibility.
User: windvchen
ai-safety,A novel physical adversarial attack tackling the Digital-to-Physical Visual Inconsistency problem.
User: windvchen
ai-safety,LAMBDA is a model-based reinforcement learning agent that uses Bayesian world models for safe policy optimization
User: yardenas
ai-safety,The official implementation of the paper "Data Contamination Calibration for Black-box LLMs" (ACL 2024)
User: yyy01
ai-safety,Code for our paper "ModelObfuscator: Obfuscating Model Information to Protect Deployed ML-Based Systems", published at ISSTA'23
User: zhoumingyi