GentleCold's Blog

A Survey of Pruning Techniques

Basic * structured / unstructured / semi-structured THINK: THINNER KEY CACHE BY QUERY-DRIVEN PRUNING * structured sparsity: sparsity along the hidden_size dimension, applied with a mask * dual KV cache (a pruned cache and an unpruned cache) Mustafar: Promoting Unstructured Sparsity f

2025-09-08

Notes

#LLM #Pruning
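The pruning entry above contrasts structured sparsity (masking whole channels along hidden_size) with unstructured sparsity (masking individual elements). Below is a minimal pure-Python sketch of both masks on a small key matrix (tokens × hidden_size). It assumes magnitude as the pruning score. That is an illustrative choice only, not THINK's query-driven criterion and not Mustafar's method. All function names are hypothetical.

```python
def unstructured_mask(keys, keep_ratio):
    """Element-level sparsity: keep the largest-magnitude entries anywhere in the matrix.

    Assumption: magnitude is the score; ties at the threshold are all kept.
    """
    flat = sorted((abs(v) for row in keys for v in row), reverse=True)
    k = max(1, int(len(flat) * keep_ratio))
    threshold = flat[k - 1]
    return [[1 if abs(v) >= threshold else 0 for v in row] for row in keys]


def structured_channel_mask(keys, keep_channels):
    """Channel-level sparsity: keep whole hidden_size columns with the largest squared L2 norm."""
    hidden = len(keys[0])
    norms = [sum(row[c] ** 2 for row in keys) for c in range(hidden)]
    ranked = sorted(range(hidden), key=lambda c: norms[c], reverse=True)
    keep = set(ranked[:keep_channels])
    # every token row shares the same channel mask
    return [[1 if c in keep else 0 for c in range(hidden)] for _ in keys]


def apply_mask(keys, mask):
    """Zero out the pruned positions."""
    return [[v * m for v, m in zip(row, mrow)] for row, mrow in zip(keys, mask)]


if __name__ == "__main__":
    # 3 cached tokens x 4 channels (hypothetical values)
    keys = [[0.9, -0.1, 0.3, 0.05],
            [-0.2, 0.8, -0.4, 0.01],
            [0.7, 0.02, 0.6, -0.03]]
    print(unstructured_mask(keys, 0.5))
    print(structured_channel_mask(keys, 2))
```

The structured mask removes entire columns, so the pruned key cache can be stored as a smaller dense tensor. The unstructured mask leaves an irregular pattern that needs a sparse format to save memory.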

A Detailed Walkthrough of LLM Inference, Using nano-vllm and qwen3 as Examples

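The source code is only 1200 lines of pure Python and is well worth reading: https://github.com/GeeeekExplorer/nano-vllm For the logic of vLLM itself, see: https://www.aleksagordic.com/blog/vllm 1. qwen3 model structure and inference process (prefill) 1.1 Tokenizer: the tokenizer encodes text (str) into an integer sequence (list[int]) and must be trained in advance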

2025-09-02

Notes

#LLM
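The nano-vllm entry above describes the tokenizer as a map from str to list[int]. Below is a toy sketch of that interface. It assumes a small hand-written vocabulary and greedy longest-match encoding. A real tokenizer like qwen3's learns its vocabulary from data in advance and is not built this way. `build_vocab`, `encode` and `decode` are hypothetical names.

```python
def build_vocab(tokens):
    """Assign consecutive integer ids to a list of token strings."""
    return {tok: i for i, tok in enumerate(tokens)}


def encode(text, vocab, unk="<unk>"):
    """Greedy longest-match: at each position take the longest piece present in the vocab."""
    max_len = max(len(t) for t in vocab)
    ids, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            # no vocab entry starts here: emit the unknown id and skip one character
            ids.append(vocab[unk])
            i += 1
    return ids


def decode(ids, vocab):
    """Map ids back to strings (characters encoded as <unk> are not recoverable)."""
    inverse = {i: t for t, i in vocab.items()}
    return "".join(inverse[i] for i in ids)


if __name__ == "__main__":
    vocab = build_vocab(["<unk>", "hello", "hell", "he", "o", " ", "world", "w"])
    ids = encode("hello world", vocab)
    print(ids, repr(decode(ids, vocab)))
```

The resulting list[int] is what the prefill stage feeds into the embedding layer.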

CMU10414-Fall2022 Course Notes

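Course notes softmax * supervised learning / unsupervised learning * hypothesis function / loss function / optimization method * some error functions are not differentiable, so softmax (an activation function that introduces a nonlinear layer) followed by cross-entropy (-log) is used as the loss function * the task becomes an optimization problem, solved with gradient descent / stochastic gradient descent. Let $m$ be the number of samples, $n$ the number of features, and $k$ the number of classes; $h(x)$ is the hypothesis function, and $h_y(x)$ is at la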

2025-06-19

Notes

#Notes #CMU #DeepLearning
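The CMU10414 entry above sets up softmax with cross-entropy loss, optimized by (stochastic) gradient descent. Below is a minimal pure-Python sketch for a single sample. It assumes a linear hypothesis $h(x) = \Theta^T x$ with $\Theta$ of shape $n \times k$. That assumption is not stated in the excerpt. For this hypothesis the gradient of $-\log \mathrm{softmax}(h(x))_y$ with respect to $\Theta$ is $x\,(\mathrm{softmax}(h(x)) - e_y)^T$.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, y):
    # -log softmax(h(x))_y = log(sum_j exp(h_j(x))) - h_y(x)
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits)) - logits[y]


def hypothesis(theta, x):
    # theta: n x k (features x classes); returns h(x) = theta^T x, a length-k vector
    k = len(theta[0])
    return [sum(theta[i][j] * x[i] for i in range(len(x))) for j in range(k)]


def sgd_step(theta, x, y, lr):
    # gradient w.r.t. theta is x (softmax(h(x)) - e_y)^T
    z = softmax(hypothesis(theta, x))
    z[y] -= 1.0
    return [[theta[i][j] - lr * x[i] * z[j] for j in range(len(z))]
            for i in range(len(x))]


if __name__ == "__main__":
    theta = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # n = 2 features, k = 3 classes
    x, y = [1.0, 2.0], 2
    print(cross_entropy(hypothesis(theta, x), y))  # log(3) for all-zero theta
    theta = sgd_step(theta, x, y, lr=0.1)
    print(cross_entropy(hypothesis(theta, x), y))  # smaller after one step
```

A minibatch version averages the same per-sample gradient over the batch, which is the SGD variant the notes mention.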

CS336-Spring2025 Course Notes

Course notes overview * prefill: compute-bound / decode: memory-bound * scaling laws: * tokenizer: https://tiktokenizer.vercel.app/ * byte pair encoding (BPE) resource counting * float32 / float16 /

2025-06-19

Notes

#Notes #LLM
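The CS336 entry above lists byte pair encoding (BPE). Below is a minimal sketch of the BPE training loop: count adjacent symbol pairs, merge the most frequent pair, and repeat. It makes three simplifying assumptions. It works on characters rather than bytes. It pre-tokenizes by splitting on whitespace. It breaks ties by picking the lexicographically larger pair, an arbitrary deterministic choice. None of this is claimed to match the course's reference implementation.

```python
from collections import Counter


def get_pair_counts(words):
    """words maps a tuple of symbols to its frequency; count adjacent pairs weighted by frequency."""
    counts = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts


def merge_pair(words, pair):
    """Replace every occurrence of the adjacent pair with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    words = Counter(tuple(w) for w in corpus.split())  # whitespace pre-tokenization
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(words)
        if not counts:
            break
        # most frequent pair; ties go to the lexicographically larger pair
        best = max(counts, key=lambda p: (counts[p], p))
        words = merge_pair(words, best)
        merges.append(best)
    return merges, words


if __name__ == "__main__":
    merges, words = train_bpe("low low low lower lowest", 2)
    print(merges)
    print(words)
```

The ordered list of merges is the trained tokenizer. Encoding new text replays those merges in order.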

VLLM Testing

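1. Dataset: the IMDB movie-review sentiment analysis dataset: http://ai.stanford.edu/~amaas/data/sentiment/ A CSV file with columns review and sentiment, with rows like "text…, positive" and "text…, negative" 2. Testing Model: NousResearch/Hermes-3-Llama-3.1-8B GPU: a single H800 The model's maximum context limit is (prompt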

2025-06-19

Experiments

#VLLM

VLLM and LLM Inference Frameworks

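VLLM VLLM v1 overall code flow (code version v0.8.5) Scheduling: the running queue is scheduled first, then the waiting queue. https://zhuanlan.zhihu.com/p/1908153627639551302 On preemption: preemption only frees the blocks and stops computing on them; the preemption actually takes effect only later, when the blocks are replaced according to the LRU policy. KV Cache The current Q is multiplied by the cached K (followed by a softmax), then by the cached V, giving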

2025-05-21

Notes

#VLLM
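The KV Cache note in the entry above describes one decode step: the current Q against the cached K, a softmax, then the cached V. Below is a minimal single-head sketch in pure Python. It assumes one sequence with no batching, and it ignores vLLM's block-based (paged) KV storage. `KVCache`, `attend` and `decode_step` are hypothetical names, not vLLM APIs.

```python
import math


def attend(q, cached_k, cached_v):
    """Single-head attention for one query against all cached keys/values."""
    d = len(q)
    # scaled dot product of the current Q with every cached K
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in cached_k]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # weighted sum of the cached V rows
    return [sum(w * v[j] for w, v in zip(weights, cached_v))
            for j in range(len(cached_v[0]))]


class KVCache:
    """Grows by one K row and one V row per generated token."""

    def __init__(self):
        self.keys, self.values = [], []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_step(cache, q, k, v):
    # the new token's K/V are appended first, so it also attends to itself
    cache.append(k, v)
    return attend(q, cache.keys, cache.values)


if __name__ == "__main__":
    cache = KVCache()
    print(decode_step(cache, [1.0, 0.0], [1.0, 0.0], [2.0, 4.0]))
    print(decode_step(cache, [0.0, 0.0], [0.0, 1.0], [4.0, 0.0]))
```

Because earlier K and V rows are reused instead of recomputed, each decode step only computes Q, K and V for the newest token. This is why decode tends to be memory-bound rather than compute-bound.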

A Quick Read of the Self-Tuning Query Scheduling Paper

Paper: Self-Tuning Query Scheduling for Analytical Workloads, SIGMOD 2021 https://15721.courses.cs.cmu.edu/spring2024/papers/08-scheduling/wagner-sigmod21.pdf 1 Introduction This paper proposes an adaptive scheduling optimization strategy for analytical queries under high load

2024-12-25

Paper Reading

Rusty

Starting from zero with Rust! Road Map Begin: * https://course.rs/ * https://practice.course.rs/

2024-11-19

Notes

#Notes #Rust

CMU15721-Spring2024 Course Notes

Some papers worth reading: // todo Overview * data cubes -> data warehouses -> shared-disk -> lakehouse * ETL tools * push query to data / pull data to query * shared-nothing / shared-disk

2024-11-19

Notes

#Notes #CMU #Databases

A Quick Read of the Amazon MemoryDB Paper

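Paper: Amazon MemoryDB: A Fast and Durable Memory-First Cloud Database. SIGMOD-Companion ’24, June 9–15, 2024, Santiago, AA, Chile 1 Introduction For many real-time applications, such as finance, advertising, and Internet of Things (IoT) applications, fast response times are critical; modern key-value stores can, per machine,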

2024-09-17

Paper Reading