CMU 15-721 Spring 2024 Course Notes
Last updated: early morning, November 24, 2024
Some papers worth reading:
// todo
Overview
- data cubes -> data warehouses -> shared-disk -> lakehouse
- ETL tools
- push query to data / pull data to query
- shared-nothing / shared-disk
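The push/pull distinction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `StorageNode`, `scan_all`, `scan_filtered`, and the `rows_sent` counter are hypothetical names invented for this example, and "network cost" is approximated by counting rows that leave the storage node.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]


class StorageNode:
    """Hypothetical storage node; rows_sent counts rows shipped to compute."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.rows_sent = 0

    def scan_all(self) -> List[Row]:
        # Pull data to query: every row crosses the "network".
        self.rows_sent += len(self.rows)
        return list(self.rows)

    def scan_filtered(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # Push query to data: the predicate runs next to the data,
        # so only matching rows are shipped.
        matches = [r for r in self.rows if predicate(r)]
        self.rows_sent += len(matches)
        return matches


def pull_data_to_query(node: StorageNode, predicate: Callable[[Row], bool]) -> List[Row]:
    # The compute side filters after fetching everything.
    return [r for r in node.scan_all() if predicate(r)]


def push_query_to_data(node: StorageNode, predicate: Callable[[Row], bool]) -> List[Row]:
    # The storage side filters before sending anything.
    return node.scan_filtered(predicate)


rows = [{"id": i, "amount": i * 10} for i in range(100)]
is_large = lambda r: r["amount"] >= 900

pull_node = StorageNode(rows)
push_node = StorageNode(rows)
pull_result = pull_data_to_query(pull_node, is_large)
push_result = push_query_to_data(push_node, is_large)

print(pull_node.rows_sent, push_node.rows_sent)
```

Both strategies return the same rows; the difference is how much data moves. Pushing only pays off when the storage layer can actually evaluate the predicate, which is the usual trade-off in shared-disk designs.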
Data Formats
- storage model:
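  - n-ary (NSM): store all the attributes of a single tuple contiguously
  - decomposition (DSM): store a single attribute for all tuples contiguously
  - Partition Attributes Across (PAX): hybrid; split rows horizontally into row groups, then vertically partition the attributes within each group
    - each row group stores its attributes as column chunks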
- open-source formats: Parquet / ORC / Arrow
- encoding:
  - dictionary compression for columns
  - zstd for block compression
  - zone maps / Bloom filters for filtering (skipping data that cannot match)
- nested data in columns:
  - shredding
  - length + presence
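A minimal Python sketch of three ideas from the list above: PAX-style row groups with column chunks, dictionary encoding, and zone-map filtering. It is not how Parquet or ORC are implemented; the function names (`to_row_groups`, `dictionary_encode`, `zone_map`, `scan_greater_than`) and the sample table are assumptions made up for illustration, and the zone map is computed on the fly here, whereas a real file format would store it as precomputed metadata.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
RowGroup = Dict[str, List[Any]]  # column name -> column chunk


def to_row_groups(rows: List[Row], group_size: int) -> List[RowGroup]:
    """PAX-style layout: split rows into groups, then store each
    attribute of a group contiguously as a column chunk."""
    groups = []
    for start in range(0, len(rows), group_size):
        batch = rows[start:start + group_size]
        groups.append({name: [r[name] for r in batch] for name in batch[0]})
    return groups


def dictionary_encode(values: List[Any]) -> Tuple[List[Any], List[int]]:
    """Replace each value with an integer code into a dictionary of
    distinct values (codes assigned in order of first appearance)."""
    dictionary: List[Any] = []
    index: Dict[Any, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def zone_map(values: List[Any]) -> Tuple[Any, Any]:
    """Min/max summary of a column chunk."""
    return min(values), max(values)


def scan_greater_than(groups: List[RowGroup], column: str, threshold: Any) -> Tuple[List[Any], int]:
    """Return values > threshold and the number of row groups skipped."""
    out: List[Any] = []
    skipped = 0
    for chunks in groups:
        _, hi = zone_map(chunks[column])  # assumption: precomputed in a real format
        if hi <= threshold:
            skipped += 1  # no value in this chunk can match
            continue
        out.extend(v for v in chunks[column] if v > threshold)
    return out, skipped


rows = [
    {"id": 1, "city": "Pittsburgh", "price": 10},
    {"id": 2, "city": "Boston", "price": 20},
    {"id": 3, "city": "Pittsburgh", "price": 30},
    {"id": 4, "city": "Boston", "price": 40},
    {"id": 5, "city": "Pittsburgh", "price": 50},
    {"id": 6, "city": "Pittsburgh", "price": 60},
]

groups = to_row_groups(rows, group_size=2)
dictionary, codes = dictionary_encode([r["city"] for r in rows])
matches, skipped = scan_greater_than(groups, "price", 35)
```

With a group size of 2, the first row group holds prices 10 and 20, so a `price > 35` scan can skip it from the min/max summary alone without touching the chunk's values.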
https://gentlecold.top/20241119/cmu15721-note/