CMU15721-Spring2024 Course Notes

Last updated on the evening of July 22, 2025

some papers worth reading:

// todo

Overview

storage model:
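- n-ary storage model (NSM): store all the attributes of a single tuple contiguously
- decomposition storage model (DSM): store a single attribute for all tuples contiguously
- partition attributes across (PAX): hybrid; rows are split horizontally into row groups, then attributes are partitioned vertically within each group
  - each attribute in a row group is stored as a column chunk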
open-source: parquet / orc / arrow
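The three layouts above can be sketched on a toy table. This is only an illustration with in-memory Python lists; the table, the function names, and `group_size` are assumptions made for the example, not how Parquet, ORC, or Arrow lay out bytes on disk.

```python
# Toy table with schema (id, name, age). All names are illustrative.
rows = [(1, "ann", 30), (2, "bob", 25), (3, "cat", 41), (4, "dan", 38)]

def nsm_layout(rows):
    """N-ary: all attributes of one tuple sit next to each other."""
    return [value for row in rows for value in row]

def dsm_layout(rows):
    """Decomposition: all values of one attribute sit next to each other."""
    return [list(col) for col in zip(*rows)]

def pax_layout(rows, group_size):
    """PAX: cut rows into row groups, then store each group column by column."""
    groups = []
    for start in range(0, len(rows), group_size):
        group = rows[start:start + group_size]
        # One column chunk per attribute inside this row group.
        groups.append([list(col) for col in zip(*group)])
    return groups

nsm = nsm_layout(rows)
dsm = dsm_layout(rows)
pax = pax_layout(rows, group_size=2)
```

Reading one attribute touches a single list in `dsm` and one column chunk per row group in `pax`, while rebuilding a whole tuple stays local to one row group in `pax`.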
encoding:
- dictionary compression for column
- zstd for block compression
- zone maps / bloom filters for filtering (skip blocks that cannot match a predicate)
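A minimal sketch of two of the ideas above, dictionary compression and zone maps, assuming plain Python lists as columns. The helper names (`dictionary_encode`, `build_zone_map`, `blocks_to_scan`) and the block size are hypothetical and chosen only for this example.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code; return (dictionary, codes)."""
    dictionary = sorted(set(values))  # sorted, so code order follows value order
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]

def build_zone_map(values, block_size):
    """Record (min, max) for each fixed-size block of a column."""
    return [(min(values[i:i + block_size]), max(values[i:i + block_size]))
            for i in range(0, len(values), block_size)]

def blocks_to_scan(zone_map, lo, hi):
    """Indexes of blocks whose [min, max] range may overlap [lo, hi]."""
    return [i for i, (mn, mx) in enumerate(zone_map) if mx >= lo and mn <= hi]

city = ["NYC", "SF", "NYC", "LA", "SF", "NYC"]
dictionary, codes = dictionary_encode(city)

price = [5, 7, 6, 40, 42, 41, 90, 95, 91]
zone_map = build_zone_map(price, block_size=3)
candidates = blocks_to_scan(zone_map, lo=38, hi=50)
```

For a predicate such as `price BETWEEN 38 AND 50`, only the blocks listed in `candidates` need to be read; the others are skipped without looking at their values.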
nested data in columns:
- shredding
- length + presence
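The length + presence idea can be sketched for a column of optional integer lists. This is a simplified illustration under my own assumptions (three flat Python lists, hypothetical function names), not the exact stream layout of any real file format.

```python
def encode_list_column(rows):
    """Encode a column of optional integer lists into three flat arrays."""
    presence, lengths, values = [], [], []
    for row in rows:
        if row is None:
            presence.append(0)  # NULL list: no length entry is stored
            continue
        presence.append(1)
        lengths.append(len(row))
        values.extend(row)
    return presence, lengths, values

def decode_list_column(presence, lengths, values):
    """Rebuild the original rows from the three flat arrays."""
    rows, len_pos, val_pos = [], 0, 0
    for present in presence:
        if not present:
            rows.append(None)
            continue
        n = lengths[len_pos]
        len_pos += 1
        rows.append(values[val_pos:val_pos + n])
        val_pos += n
    return rows

tags = [[1, 2], None, [], [3]]
presence, lengths, values = encode_list_column(tags)
decoded = decode_list_column(presence, lengths, values)
```

The presence array distinguishes a NULL list from an empty one, which the lengths alone could not do.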

three optimizations:
- data parallelization (vectorization)
- task parallelization (multi-threading)
- code specialization (pre-compilation / JIT)
processing model:
- iterator model
- materialization model
- vectorized / batch model
  - a batch may still contain tuples that do not satisfy the filter
    - solution: mark the valid tuples with a selection vector (offsets) or a bitmap
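A small sketch of the two ways to mark qualifying tuples in a batch without copying it, assuming a batch is just a Python list of values. The function names are hypothetical.

```python
def filter_selection_vector(batch, predicate):
    """Return the offsets of qualifying tuples; the batch itself is untouched."""
    return [i for i, v in enumerate(batch) if predicate(v)]

def filter_bitmap(batch, predicate):
    """Return one flag per tuple: 1 if it passes the filter, 0 otherwise."""
    return [1 if predicate(v) else 0 for v in batch]

def sum_with_selection(batch, selection):
    """Downstream operator that only visits the selected offsets."""
    return sum(batch[i] for i in selection)

def sum_with_bitmap(batch, bitmap):
    """Downstream operator that visits every slot and checks its flag."""
    return sum(v for v, keep in zip(batch, bitmap) if keep)

batch = [12, 3, 47, 8, 30]
sel = filter_selection_vector(batch, lambda v: v > 10)
bits = filter_bitmap(batch, lambda v: v > 10)
total_sel = sum_with_selection(batch, sel)
total_bits = sum_with_bitmap(batch, bits)
```

A selection vector is compact when few tuples qualify, while a bitmap has a fixed size per batch and keeps positions aligned across columns.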
processing direction
- top to bottom (pull) (iterator model)
  - easy to control output (e.g. LIMIT)
  - additional overhead from the repeated Next() function calls
- bottom to top (push)
  - allows tighter control of caches and registers within a pipeline
  - may not have exact control over intermediate result sizes
  - difficult to implement some operators (sort-merge join)
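The two directions can be contrasted on a scan followed by a filter. This is a toy sketch: `Scan`, `Filter`, `run_pull`, and `run_push` are hypothetical names, and `None` is assumed to mark the end of the stream.

```python
class Scan:
    """Leaf operator: hands out one row per next() call."""
    def __init__(self, rows):
        self._it = iter(rows)
    def next(self):
        return next(self._it, None)  # None signals end of stream

class Filter:
    """Pulls from its child until a row passes the predicate."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def next(self):
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None

def run_pull(root, limit):
    """Pull: the root asks for rows and can stop as soon as it has enough."""
    out = []
    while len(out) < limit and (row := root.next()) is not None:
        out.append(row)
    return out

def run_push(rows, predicate, sink):
    """Push: the scan loop drives and pushes each row through the fused pipeline."""
    for row in rows:
        if predicate(row):
            sink(row)

data = [5, 12, 7, 20, 31]
pulled = run_pull(Filter(Scan(data), lambda r: r > 6), limit=2)
pushed = []
run_push(data, lambda r: r > 6, pushed.append)
```

In the pull version every row costs a chain of `next()` calls but the consumer decides when to stop; in the push version the whole pipeline is one tight loop, and the sink receives whatever the producer sends.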

Notes

#Notes #CMU #Database


https://gentlecold.top/20241119/cmu15721-note/

Author

GentleCold

Published on

November 19, 2024

License