GentleCold's Blog

Foyer: Analysis of Key Technical Points

Sources: * Foyer: A Hybrid Cache in Rust - Past, Present and Future * Foyer docs.rs API * Foyer GitHub README Links: * https://blog.mrcroxx.com/posts/foyer-a-hybrid-cache-in-rust-past-present-and-futur

2026-05-08

Notes

#Rust #Cache #Storage

HaS Paper Review

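Paper: HaS: Accelerating RAG through Homology-Aware Speculative Retrieval Version: arXiv:2604.20452v1, 2026-04-22 1. Background HaS addresses retrieval latency in RAG systems. Many LLM inference optimizations focus on prefill, decode, the KV cache, and attention kernels, but in real RAG systems, retrieval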

2026-05-08

Notes

#LLM #RAG #Retrieval

InfoFlow KV Paper Review

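Paper: InfoFlow KV: Information-Flow-Aware KV Recomputation for Long Context Version: arXiv:2603.05353v1, 2026-03-05 1. Background InfoFlow KV addresses KV cache precomputation and selective recomputation in long-context RAG inference. In RAG, the system often has to prepend a large number of retrieved documents to the prompt. The context can reach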

2026-05-08

Notes

#KV Cache #LLM #Long Context

ScaleEvict Paper Review

Paper: ScaleEvict: Altruistic Eviction for RDMA-enabled Distributed Storage Engines Authors: Till Steinert, Muhammad El-Hindi, Tobias Ziegler, Viktor Leis, Carsten Binnig Published: DaMoN’26, 2026-05-31 to 2026-06-05

2026-05-08

Notes

#Cache #RDMA #Distributed Storage

ASL Paper Review

Paper: Adaptive Layer Selection for Layer-Wise Token Pruning in LLM Inference Version: arXiv:2601.07667v2, 2026-04-16 1. Background ASL addresses layer-wise token pruning in long-context LLM inference. Its immediate context is the FastKV, GemFilter, and PyramidInfer class of

2026-05-08

Notes

#KV Cache #LLM #Long Context

FastKV Paper Review

Paper: FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration Version: arXiv:2502.01068v7, 2026-04-20 Code: https://github.com/dongwonjo/FastKV 1. Background The cost of long-context LLM inference comes mainly from

2026-05-08

Notes

#KV Cache #LLM #Long Context

Notes on How HTTP/1.1, HTTP/2, and gRPC Work

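1. HTTP/1.1 vs HTTP/2 1.1 The core bottleneck of HTTP/1.1 Loading a web page requires HTML + 10 CSS files + 20 JS files: Connection 1: [request HTML ]──[response HTML ] Connection 2: [request CSS1 ]──[response CSS1 ] Connection 3: [request CSS2 ]──[response CSS2 ] ... (the browser opens at most 6 TCP connections at a time; the rest queue and wait) Head-of-line blocking (Head-of-Line B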

2026-04-14

Notes

#Distributed Systems #gRPC #Networking #HTTP

Notes on Distributed Systems and AI Infrastructure

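1. Ray and Ray Data 1.1 Ray Ray is a distributed computing framework designed primarily for Python; its core goal is to let single-machine code scale easily to a cluster. Core abstractions: * Task: a stateless function that runs asynchronously and in parallel once decorated with @ray.remote * Actor: a stateful object, a distributed process that maintains internal state * Object Store: a shared-memory object store that passes objects across processes and nodes, with zero-copy reads within a node @ray.remote def p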

2026-04-13

Notes

#Distributed Systems #Ray #gRPC #Microservices #RAG #Message Queue

From NVMe Disk Installation to GDS Support

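1. Pre-installation preparation First check which PCIe slots are empty: sudo dmidecode -t slot Decide roughly where to insert the drive, then shut down the server, cut power, open the chassis, and insert the NVMe drive into the PCIe slot 2. Disk initialization Use lspci to check that the drive is recognized: its model shows as Intel Corporation NVMe Datacenter SSD [Optane], and it is closest to GPU0 Install the corresponding tool, the Intel MAS tool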

2026-04-10

Linux

#Linux #nvme

VLLM KV Connector Explained

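Initialization vllmConfig.__post_init__() initializes KVTransferConfig; the scheduler/worker side then instantiates the corresponding connector according to the kv_connector type (KVConnectorFactory.create_connector). For the offloading connector, the spec is initialized first (OffloadingSpecFactory.crea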

2026-03-09

Notes

#VLLM