Xiaoyu Zhang's Projects
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
Acm_template
arm-neon
An MLIR-Based Ideas Landing Project
ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.
Golang competitive programming template library | Solutions to Codeforces by Go 💭💡🎈
📚 A summary of basic knowledge for C/C++ technical interviews, aimed at job seekers and beginners, covering the language, libraries, data structures, algorithms, systems, networking, and linking and loading of libraries, along with interview experience, recruitment, and referral information.
A CPU tool for benchmarking the peak of floating points
AlexeyAB-DarkNet source code analysis
face recognition
Transformer related optimization, including BERT, GPT
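A pure C++ cross-platform LLM acceleration library, callable from Python; ChatGLM-6B-class models can reach 10000+ tokens/s on a single card; supports GLM, LLaMA, and MOSS base models; runs smoothly on mobile devices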
GAN
www.giantpandacv.com
how to learn PyTorch and OneFlow
how to optimize some algorithms in CUDA.
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.
paper implementations
opencv
Keras-Semantic-Segmentation
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
Easy-to-use LLM fine-tuning framework (LLaMA, BLOOM, Mistral, Baichuan, Qwen, ChatGLM)