Source repository: https://github.com/cmu-db/noisepage

The project was archived on GitHub in 2023, but quite a few CMU papers still build on NoisePage, so its source code is still worth dissecting.

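Build environment

The project is built with GCC 9.3 / Clang 8.0, with ccache as the compiler cache. The test system is Ubuntu 20.04 (Focal), and the build toolchain is CMake + Ninja.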

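The JIT environment is LLVM's MCJIT. (ORC JIT has been part of LLVM since the 3.x releases, but its current ORCv2 API only settled in later versions.)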

Jenkins is used for CI.

A Dockerfile is provided for testing.

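third_party (third-party dependencies)

libpg_query is used for SQL parsing, no surprises there.

libcuckoo is a high-performance concurrent hash table library.

Google's FlatBuffers is used to define the message format specification of the Apache Arrow project. The code in the generated folder is produced by the FlatBuffers compiler.

The BW-Tree, on the other hand, is the team's own implementation for data storage. Here is a short comparison of the B+ tree and the BW-Tree: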

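Feature	B+ tree	BW-Tree
Concurrency mechanism	Requires locking	Lock-free (CAS-based)
Update style	In-place modification	Delta node append + copy-on-write
Crash recovery	Fairly complex	Naturally supports version tracking and rollback
Write amplification	Present to some degree	Reduced (no page migration)
Query efficiency (high concurrency)	Can be affected by locks	Queries must traverse the delta chain, so it depends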
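To make the "delta append + CAS" row concrete, here is a minimal sketch of how a lock-free update could be published. It is not NoisePage's BW-Tree code: DeltaNode and LogicalPage are hypothetical names, and consolidation, page splits and safe memory reclamation are all left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified delta record: one key/value update in a chain.
struct DeltaNode {
  int64_t key;
  int64_t value;
  const DeltaNode *next;  // older delta, or nullptr at the end of the chain
};

// One logical page: a single atomic pointer to the newest delta,
// standing in for a mapping-table slot.
class LogicalPage {
 public:
  ~LogicalPage() {
    const DeltaNode *node = head_.load(std::memory_order_relaxed);
    while (node != nullptr) {
      const DeltaNode *next = node->next;
      delete node;
      node = next;
    }
  }

  // Publish an update by prepending a delta with compare-and-swap; no latch is taken.
  void Upsert(int64_t key, int64_t value) {
    auto *delta = new DeltaNode{key, value, head_.load(std::memory_order_acquire)};
    // On failure, compare_exchange_weak stores the current head into delta->next,
    // so the loop simply retries against the fresh head.
    while (!head_.compare_exchange_weak(delta->next, delta, std::memory_order_acq_rel,
                                        std::memory_order_acquire)) {
    }
  }

  // Readers walk the chain from newest to oldest; the first match is the current value.
  bool Lookup(int64_t key, int64_t *value_out) const {
    for (const DeltaNode *node = head_.load(std::memory_order_acquire); node != nullptr;
         node = node->next) {
      if (node->key == key) {
        *value_out = node->value;
        return true;
      }
    }
    return false;
  }

 private:
  std::atomic<const DeltaNode *> head_{nullptr};
};
```

The sketch also shows where the last row of the table comes from: the more deltas pile up, the longer a lookup has to walk before it finds its key.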

utils

It contains three folders: execution, include, and runner.

runner

Basic configuration for the runtime (Runner).

execution

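Runtime environment setup, mostly the LLVM-related parts.

The command-line options show that printing the AST and the bytecode is supported.

There is a compiler named TPL: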

llvm::cl::OptionCategory TPL_OPTIONS_CATEGORY("TPL Compiler Options", "Options for controlling the TPL compilation process.");  // NOLINT
llvm::cl::opt<bool> PRINT_AST("print-ast", llvm::cl::desc("Print the programs AST"), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<bool> PRINT_TBC("print-tbc", llvm::cl::desc("Print the generated TPL Bytecode"), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<bool> PRETTY_PRINT("pretty-print", llvm::cl::desc("Pretty-print the source from the parsed AST"), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<bool> IS_SQL("sql", llvm::cl::desc("Is the input a SQL query?"), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<bool> TPCH("tpch", llvm::cl::desc("Should the TPCH database be loaded? Requires '-schema' and '-data' directories."), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<std::string> DATA_DIR("data", llvm::cl::desc("Where to find data files of tables to load"), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<std::string> INPUT_FILE(llvm::cl::Positional, llvm::cl::desc("<input file>"), llvm::cl::init(""), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<std::string> OUTPUT_NAME("output-name", llvm::cl::desc("Print the output name"), llvm::cl::init("schema10"), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<std::string> HANDLERS_PATH("handlers-path", llvm::cl::desc("Path to the bytecode handlers bitcode file"), llvm::cl::init("./bytecode_handlers_ir.bc"), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT

gen_opt_bc.cpp mainly deals with processing LLVM bitcode:

// This executable reads the unoptimized bitcode file and:
// 1. Converts to an LLVM Module
// 2. Removes the static global variable
// 3. Modifies linkage types of all defined functions to LinkOnce
// 4. Cleans up function arguments
// 5. Writes out optimized module as bitcode file

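One function removes the LLVM global variables that were only there to force functions to be generated; I think it is worth noting down: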

void RemoveGlobalUses(llvm::Module *module) {
  // When we created the original bitcode file, we forced all functions to be
  // generated by storing their address in a global variable. We delete this
  // variable now so the final binary can be made smaller by eliminating unused
  // ops.
  auto var = module->getGlobalVariable(GLOBAL_VAR_NAME);
  if (var != nullptr) {
    var->replaceAllUsesWith(llvm::UndefValue::get(var->getType()));
    var->eraseFromParent();
  }
  // Clang created a global variable holding all force-used items. Delete it.
  auto used = module->getGlobalVariable(LLVM_COMPILED_USED);
  if (used != nullptr) {
    used->eraseFromParent();
  }
}

The table_generator subfolder covers generating and reading tables.

Table generation is built on C++ templates.

It also ships a GenerateTestTables() example.

test

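It contains several folders, one per test area. CTest appears to be the test driver, while the test cases themselves use Google Test (note the TEST_F macro below).

The binder folder checks the parsing of SQL and execution plans, including the CTE dependency graph and the struct statement test: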

TEST_F(BinderCorrectnessTest, SelectStatementComplexTest) {
  // Test regular table name
  BINDER_LOG_DEBUG("Parsing sql query");
  std::string select_sql =
      "SELECT A.A1, B.B2 FROM A INNER JOIN b ON a.a1 = b.b1 WHERE a1 < 100 "
      "GROUP BY A.a1, B.b2 HAVING a1 > 50 ORDER BY a1";
  auto parse_tree = parser::PostgresParser::BuildParseTree(select_sql);
  auto statement = parse_tree->GetStatements()[0];
  binder_->BindNameToNode(common::ManagedPointer(parse_tree), nullptr, nullptr);
  auto select_stmt = statement.CastManagedPointerTo<parser::SelectStatement>();
  EXPECT_EQ(0, select_stmt->GetDepth());
  // Check select_list
  BINDER_LOG_DEBUG("Checking select list");
  auto col_expr = select_stmt->GetSelectColumns()[0].CastManagedPointerTo<parser::ColumnValueExpression>();
  EXPECT_EQ(col_expr->GetDatabaseOid(), db_oid_);              // A.a1
  EXPECT_EQ(col_expr->GetTableOid(), table_a_oid_);            // A.a1
  EXPECT_EQ(col_expr->GetColumnOid(), catalog::col_oid_t(1));  // A.a1; columns are indexed from 1
  EXPECT_EQ(execution::sql::SqlTypeId::Integer, col_expr->GetReturnValueType());
  EXPECT_EQ(0, col_expr->GetDepth());

I won't list the rest one by one.

src (source code)

The directory layout largely mirrors test.

Next I'll pick a few parts that interest me.

Binder

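SQL parsing results are processed with the visitor pattern: the binder walks the parse tree through per-node Visit overloads.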

namespace noisepage {
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::AggregateExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::CaseExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::ColumnValueExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::ComparisonExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::ConjunctionExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::ConstantValueExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::DefaultValueExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::DerivedValueExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::FunctionExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
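As a reminder of how this double dispatch works, here is a small stand-alone sketch. It only mirrors the shape of the excerpt above: ExprVisitor, ColumnExpr, ComparisonExpr and ColumnCollector are hypothetical names, and raw pointers stand in for common::ManagedPointer.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class ColumnExpr;
class ComparisonExpr;

// Visitor interface: one Visit overload per node type.
class ExprVisitor {
 public:
  virtual ~ExprVisitor() = default;
  virtual void Visit(ColumnExpr *expr) = 0;
  virtual void Visit(ComparisonExpr *expr) = 0;
};

class Expr {
 public:
  virtual ~Expr() = default;
  // Each node dispatches to the overload matching its own static type.
  virtual void Accept(ExprVisitor *v) = 0;
  // Forward the visitor to every child, as in the default Visit bodies above.
  void AcceptChildren(ExprVisitor *v) {
    for (auto &child : children_) child->Accept(v);
  }
  void AddChild(std::unique_ptr<Expr> child) { children_.push_back(std::move(child)); }

 private:
  std::vector<std::unique_ptr<Expr>> children_;
};

class ColumnExpr : public Expr {
 public:
  explicit ColumnExpr(std::string name) : name_(std::move(name)) {}
  void Accept(ExprVisitor *v) override { v->Visit(this); }
  const std::string &Name() const { return name_; }

 private:
  std::string name_;
};

class ComparisonExpr : public Expr {
 public:
  void Accept(ExprVisitor *v) override { v->Visit(this); }
};

// A concrete visitor that records every column name it encounters.
class ColumnCollector : public ExprVisitor {
 public:
  void Visit(ColumnExpr *expr) override {
    columns_.push_back(expr->Name());
    expr->AcceptChildren(this);
  }
  void Visit(ComparisonExpr *expr) override { expr->AcceptChildren(this); }
  const std::vector<std::string> &Columns() const { return columns_; }

 private:
  std::vector<std::string> columns_;
};

// Builds a tiny "a.a1 = b.b1" tree and returns the columns found in it.
std::vector<std::string> CollectColumnsOfSampleTree() {
  ComparisonExpr cmp;
  cmp.AddChild(std::make_unique<ColumnExpr>("a.a1"));
  cmp.AddChild(std::make_unique<ColumnExpr>("b.b1"));
  ColumnCollector collector;
  cmp.Accept(&collector);
  return collector.Columns();
}
```

A real binder would resolve names against a catalog in the overloads it cares about and keep the pass-through behaviour for everything else.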

Execution

execution contains many subfolders.

The vm folder holds many bytecode-related files (it looks like NoisePage has its own bytecode system?).

Part of BytecodeEmitter:

void BytecodeEmitter::EmitDeref(Bytecode bytecode, LocalVar dest, LocalVar src) {
  NOISEPAGE_ASSERT(bytecode == Bytecode::Deref1 || bytecode == Bytecode::Deref2 || bytecode == Bytecode::Deref4 ||
                       bytecode == Bytecode::Deref8,
                   "Bytecode is not a Deref code");
  EmitAll(bytecode, dest, src);
}
void BytecodeEmitter::EmitDerefN(LocalVar dest, LocalVar src, uint32_t len) {
  EmitAll(Bytecode::DerefN, dest, src, len);
}
void BytecodeEmitter::EmitAssign(Bytecode bytecode, LocalVar dest, LocalVar src) {
  NOISEPAGE_ASSERT(bytecode == Bytecode::Assign1 || bytecode == Bytecode::Assign2 || bytecode == Bytecode::Assign4 ||
                       bytecode == Bytecode::Assign8,
                   "Bytecode is not an Assign code");
  EmitAll(bytecode, dest, src);
}
void BytecodeEmitter::EmitAssignImm1(LocalVar dest, int8_t val) { EmitAll(Bytecode::AssignImm1, dest, val); }
void BytecodeEmitter::EmitAssignImm2(LocalVar dest, int16_t val) { EmitAll(Bytecode::AssignImm2, dest, val); }
void BytecodeEmitter::EmitAssignImm4(LocalVar dest, int32_t val) { EmitAll(Bytecode::AssignImm4, dest, val); }
void BytecodeEmitter::EmitAssignImm8(LocalVar dest, int64_t val) { EmitAll(Bytecode::AssignImm8, dest, val); }
void BytecodeEmitter::EmitAssignImm4F(LocalVar dest, float val) { EmitAll(Bytecode::AssignImm4F, dest, val); }
void BytecodeEmitter::EmitAssignImm8F(LocalVar dest, double val) { EmitAll(Bytecode::AssignImm8F, dest, val); }
void BytecodeEmitter::EmitUnaryOp(Bytecode bytecode, LocalVar dest, LocalVar input) { EmitAll(bytecode, dest, input); }
void BytecodeEmitter::EmitBinaryOp(Bytecode bytecode, LocalVar dest, LocalVar lhs, LocalVar rhs) {
  EmitAll(bytecode, dest, lhs, rhs);
}
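EmitAll itself is not shown in the excerpt. A plausible reading is that it appends the opcode and then each operand to a byte stream; the sketch below illustrates that idea under this assumption. ToyEmitter, Op and ReadAt are hypothetical, and the real operand encoding may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical opcodes; the real Bytecode enum is far larger.
enum class Op : uint16_t { AssignImm4 = 1, Add = 2 };

class ToyEmitter {
 public:
  // Write the opcode followed by each operand, back to back, into the byte stream.
  template <typename... Operands>
  void EmitAll(Op op, Operands... operands) {
    EmitScalar(static_cast<uint16_t>(op));
    // Expand the pack in order; braced initializers evaluate left to right.
    const int expand[] = {0, (EmitScalar(operands), 0)...};
    (void)expand;
  }

  const std::vector<uint8_t> &Bytes() const { return bytes_; }

 private:
  // Append the raw bytes of one scalar value in host byte order.
  template <typename T>
  void EmitScalar(T value) {
    const auto *raw = reinterpret_cast<const uint8_t *>(&value);
    bytes_.insert(bytes_.end(), raw, raw + sizeof(T));
  }

  std::vector<uint8_t> bytes_;
};

// Read a scalar back from a byte offset; memcpy avoids unaligned access.
template <typename T>
T ReadAt(const std::vector<uint8_t> &bytes, std::size_t offset) {
  T value;
  std::memcpy(&value, bytes.data() + offset, sizeof(T));
  return value;
}
```

With this encoding the size suffixes in names such as Assign1/2/4/8 make sense: the opcode alone tells an interpreter how many operand bytes follow.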

llvm_engine.cpp has an explicit mapping for types (I suppose it also handles the mapping from bytecode to LLVM IR?):

class LLVMEngine::TypeMap {
 public:
  explicit TypeMap(llvm::Module *module) : module_(module) {
    llvm::LLVMContext &ctx = module->getContext();
    type_map_["nil"] = llvm::Type::getVoidTy(ctx);
    type_map_["bool"] = llvm::Type::getInt8Ty(ctx);
    type_map_["int8"] = llvm::Type::getInt8Ty(ctx);
    type_map_["int16"] = llvm::Type::getInt16Ty(ctx);
    type_map_["int32"] = llvm::Type::getInt32Ty(ctx);
    type_map_["int64"] = llvm::Type::getInt64Ty(ctx);
    type_map_["int128"] = llvm::Type::getInt128Ty(ctx);
    type_map_["uint8"] = llvm::Type::getInt8Ty(ctx);
    type_map_["uint16"] = llvm::Type::getInt16Ty(ctx);
    type_map_["uint32"] = llvm::Type::getInt32Ty(ctx);
    type_map_["uint64"] = llvm::Type::getInt64Ty(ctx);
    type_map_["uint128"] = llvm::Type::getInt128Ty(ctx);
    type_map_["float32"] = llvm::Type::getFloatTy(ctx);
    type_map_["float64"] = llvm::Type::getDoubleTy(ctx);
    type_map_["string"] = llvm::Type::getInt8PtrTy(ctx);
  }
  /** No copying or moving this class. */
  DISALLOW_COPY_AND_MOVE(TypeMap);
  llvm::Type *VoidType() { return type_map_["nil"]; }
  llvm::Type *BoolType() { return type_map_["bool"]; }
  llvm::Type *Int8Type() { return type_map_["int8"]; }
  llvm::Type *Int16Type() { return type_map_["int16"]; }
  llvm::Type *Int32Type() { return type_map_["int32"]; }
  llvm::Type *Int64Type() { return type_map_["int64"]; }
  llvm::Type *UInt8Type() { return type_map_["uint8"]; }
  llvm::Type *UInt16Type() { return type_map_["uint16"]; }
  llvm::Type *UInt32Type() { return type_map_["uint32"]; }
  llvm::Type *UInt64Type() { return type_map_["uint64"]; }
  llvm::Type *Float32Type() { return type_map_["float32"]; }
  llvm::Type *Float64Type() { return type_map_["float64"]; }
  llvm::Type *StringType() { return type_map_["string"]; }
  llvm::Type *GetLLVMType(const ast::Type *type);
 private:
  // Given a non-primitive builtin type, convert it to an LLVM type
  llvm::Type *GetLLVMTypeForBuiltin(const ast::BuiltinType *builtin_type);
  // Given a struct type, convert it into an equivalent LLVM struct type
  llvm::StructType *GetLLVMStructType(const ast::StructType *struct_type);
  // Given a TPL function type, convert it into an equivalent LLVM function type
  llvm::FunctionType *GetLLVMFunctionType(const ast::FunctionType *func_type);
 private:
  llvm::Module *module_;
  std::unordered_map<std::string, llvm::Type *> type_map_;
};

Building an LLVM struct type from the AST:

llvm::StructType *LLVMEngine::TypeMap::GetLLVMStructType(const ast::StructType *struct_type) {
  // Collect the fields here
  llvm::SmallVector<llvm::Type *, 8> fields;
  for (const auto &field : struct_type->GetAllFields()) {
    fields.push_back(GetLLVMType(field.type_));
  }
  return llvm::StructType::create(fields);
}

The TargetMachine setup below shows that vectorization still relies on the compiler's auto-vectorization:

LLVMEngine::CompiledModuleBuilder::CompiledModuleBuilder(const CompilerOptions &options,
                                                         const BytecodeModule &tpl_module)
    : options_(options),
      tpl_module_(tpl_module),
      target_machine_(nullptr),
      context_(std::make_unique<llvm::LLVMContext>()),
      llvm_module_(nullptr),
      type_map_(nullptr) {
  //
  // We need to create a suitable TargetMachine for LLVM to before we can JIT
  // TPL programs. At the moment, we rely on LLVM to discover all CPU features
  // e.g., AVX2 or AVX512, and we make no assumptions about symbol relocations.
  //
  // TODO(pmenon): This may change with LLVM8 that comes with
  // TargetMachineBuilders
  // TODO(pmenon): Alter the flags as need be
  //
  const std::string target_triple = llvm::sys::getProcessTriple();
  {
    std::string error;
    auto *target = llvm::TargetRegistry::lookupTarget(target_triple, error);
    if (target == nullptr) {
      EXECUTION_LOG_ERROR("LLVM: Unable to find target with target_triple {}", target_triple);
      return;
    }
    // Collect CPU features
    llvm::StringMap<bool> feature_map;
    if (bool success = llvm::sys::getHostCPUFeatures(feature_map); !success) {
      EXECUTION_LOG_ERROR("LLVM: Unable to find all CPU features");
      return;
    }
    llvm::SubtargetFeatures target_features;
    for (const auto &entry : feature_map) {
      target_features.AddFeature(entry.getKey(), entry.getValue());
    }
    EXECUTION_LOG_TRACE("LLVM: Discovered CPU features: {}", target_features.getString());
    // Both relocation=PIC or JIT=true work. Use the latter for now.
    llvm::TargetOptions target_options;
    llvm::Optional<llvm::Reloc::Model> reloc;
    const llvm::CodeGenOpt::Level opt_level = llvm::CodeGenOpt::Aggressive;
    target_machine_.reset(target->createTargetMachine(target_triple, llvm::sys::getHostCPUName(),
                                                      target_features.getString(), target_options, reloc, {}, opt_level,
                                                      true));
    NOISEPAGE_ASSERT(target_machine_ != nullptr, "LLVM: Unable to find a suitable target machine!");
  }
  //
  // We've built a TargetMachine we use to generate machine code. Now, we
  // load the pre-compiled bytecode module containing all the TPL bytecode
  // logic. We add the functions we're about to compile into this module. This
  // module forms the unit of JIT.
  //
  {
    auto memory_buffer = llvm::MemoryBuffer::getFile(GetEngineSettings()->GetBytecodeHandlersBcPath());
    if (auto error = memory_buffer.getError()) {
      EXECUTION_LOG_ERROR("There was an error loading the handler bytecode: {}", error.message());
    }
    auto module = llvm::parseBitcodeFile(*(memory_buffer.get()), *context_);
    if (!module) {
      auto error = llvm::toString(module.takeError());
      EXECUTION_LOG_ERROR("{}", error);
      throw std::runtime_error(error);
    }
    llvm_module_ = std::move(module.get());
    llvm_module_->setModuleIdentifier(tpl_module.GetName());
    llvm_module_->setSourceFileName(tpl_module.GetName() + ".tpl");
    llvm_module_->setDataLayout(target_machine_->createDataLayout());
    llvm_module_->setTargetTriple(target_triple);
  }
  type_map_ = std::make_unique<TypeMap>(llvm_module_.get());
}

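I don't fully understand why the CFG has to be rebuilt; perhaps functions that are too complex need to be split? As far as I can tell, the code is mostly about connecting branches to LLVM basic blocks: TPL bytecode is a flat stream with relative jump offsets, whereas LLVM IR must be organized into basic blocks, so the block start positions have to be discovered first.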

void LLVMEngine::CompiledModuleBuilder::BuildSimpleCFG(const FunctionInfo &func_info,
                                                       std::map<std::size_t, llvm::BasicBlock *> *blocks) {
  // Before we can generate LLVM IR, we need to build a control-flow graph (CFG) for the function.
  // We do this construction directly from the TPL bytecode using a vanilla DFS and produce an
  // ordered map ('blocks') from bytecode position to an LLVM basic block. Each entry in the map
  // indicates the start of a basic block.
  // We use this vector as a stack for DFS traversal
  llvm::SmallVector<std::size_t, 16> bb_begin_positions = {0};
  for (auto iter = tpl_module_.GetBytecodeForFunction(func_info); !bb_begin_positions.empty();) {
    std::size_t begin_pos = bb_begin_positions.back();
    bb_begin_positions.pop_back();
    // We're at what we think is the start of a new basic block. Scan it until we find a terminal
    // instruction. Once we do,
    for (iter.SetPosition(begin_pos); !iter.Done(); iter.Advance()) {
      Bytecode bytecode = iter.CurrentBytecode();
      // If the bytecode isn't a terminal for the block, continue until we reach one
      if (!Bytecodes::IsTerminal(bytecode)) {
        continue;
      }
      // Return?
      if (Bytecodes::IsReturn(bytecode)) {
        break;
      }
      // Unconditional branch?
      if (Bytecodes::IsUnconditionalJump(bytecode)) {
        std::size_t branch_target_pos =
            iter.GetPosition() + Bytecodes::GetNthOperandOffset(bytecode, 0) + iter.GetJumpOffsetOperand(0);
        if (blocks->find(branch_target_pos) == blocks->end()) {
          (*blocks)[branch_target_pos] = nullptr;
          bb_begin_positions.push_back(branch_target_pos);
        }
        break;
      }
      // Conditional branch?
      if (Bytecodes::IsConditionalJump(bytecode)) {
        std::size_t fallthrough_pos = iter.GetPosition() + iter.CurrentBytecodeSize();
        if (blocks->find(fallthrough_pos) == blocks->end()) {
          bb_begin_positions.push_back(fallthrough_pos);
          (*blocks)[fallthrough_pos] = nullptr;
        }
        std::size_t branch_target_pos =
            iter.GetPosition() + Bytecodes::GetNthOperandOffset(bytecode, 1) + iter.GetJumpOffsetOperand(1);
        if (blocks->find(branch_target_pos) == blocks->end()) {
          bb_begin_positions.push_back(branch_target_pos);
          (*blocks)[branch_target_pos] = nullptr;
        }
        break;
      }
    }
  }
}
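The same worklist idea can be shown on a toy instruction list. This is only a sketch: Instr and Kind are hypothetical, jump targets are absolute indices instead of byte offsets, and the result is a std::set of block start positions rather than a map to llvm::BasicBlock pointers. Unlike the excerpt, it also records position 0 as a block start.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

enum class Kind { Plain, Jump, CondJump, Return };

struct Instr {
  Kind kind;
  std::size_t target;  // absolute index of the jump target; ignored for Plain/Return
};

// Returns the positions at which basic blocks begin.
std::set<std::size_t> FindBlockStarts(const std::vector<Instr> &code) {
  std::set<std::size_t> starts;
  if (code.empty()) return starts;
  starts.insert(0);
  std::vector<std::size_t> worklist = {0};  // used as a DFS stack
  while (!worklist.empty()) {
    std::size_t pos = worklist.back();
    worklist.pop_back();
    // Scan forward until the first terminator of this block.
    for (; pos < code.size(); pos++) {
      const Instr &instr = code[pos];
      if (instr.kind == Kind::Plain) continue;
      // A conditional jump also falls through to the next instruction.
      if (instr.kind == Kind::CondJump && pos + 1 < code.size() &&
          starts.insert(pos + 1).second) {
        worklist.push_back(pos + 1);
      }
      // Both kinds of jump start a new block at their target.
      if (instr.kind != Kind::Return) {
        assert(instr.target < code.size());
        if (starts.insert(instr.target).second) worklist.push_back(instr.target);
      }
      break;
    }
  }
  return starts;
}
```

For a program such as "plain, condjump 4, plain, jump 5, plain, return", the blocks start at positions 0, 2, 4 and 5. An ordered container matters because the blocks are later laid out in bytecode order.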

Optimize() is where the LLVM optimization passes are set up:

void LLVMEngine::CompiledModuleBuilder::Optimize() {
  llvm::legacy::FunctionPassManager function_passes(llvm_module_.get());
  // Add the appropriate TargetTransformInfo.
  function_passes.add(llvm::createTargetTransformInfoWrapperPass(target_machine_->getTargetIRAnalysis()));
  // Build up optimization pipeline.
  llvm::PassManagerBuilder pm_builder;
  uint32_t opt_level = 3;
  uint32_t size_opt_level = 0;
  bool disable_inline_hot_call_site = false;
  pm_builder.OptLevel = opt_level;
  pm_builder.Inliner = llvm::createFunctionInliningPass(opt_level, size_opt_level, disable_inline_hot_call_site);
  pm_builder.populateFunctionPassManager(function_passes);
  // Add custom passes. Hand-selected based on empirical evaluation.
  function_passes.add(llvm::createInstructionCombiningPass());
  function_passes.add(llvm::createReassociatePass());
  function_passes.add(llvm::createGVNPass());
  function_passes.add(llvm::createCFGSimplificationPass());
  function_passes.add(llvm::createAggressiveDCEPass());
  function_passes.add(llvm::createCFGSimplificationPass());
  // Run optimization passes on all functions.
  function_passes.doInitialization();
  for (llvm::Function &func : *llvm_module_) {
    function_passes.run(func);
  }
  function_passes.doFinalization();
}

cpu_info under util reads machine information (though it seems to be used only to determine the sizes of the three CPU cache levels):

void CpuInfo::InitCacheInfo() {
#ifdef __APPLE__
  // Lookup cache sizes.
  std::size_t len = 0;
  sysctlbyname("hw.cachesize", nullptr, &len, nullptr, 0);
  auto data = std::make_unique<uint64_t[]>(len);
  sysctlbyname("hw.cachesize", data.get(), &len, nullptr, 0);
  NOISEPAGE_ASSERT(len / sizeof(uint64_t) >= 3, "Expected three levels of cache!");
  // Copy data
  for (uint32_t idx = 0; idx < K_NUM_CACHE_LEVELS; idx++) {
    cache_sizes_[idx] = data[idx];
  }
  // Lookup cache line sizes.
  std::size_t linesize;
  std::size_t sizeof_linesize = sizeof(linesize);
  sysctlbyname("hw.cachelinesize", &linesize, &sizeof_linesize, nullptr, 0);
  for (auto &cache_line_size : cache_line_sizes_) {
    cache_line_size = linesize;
  }
#else
  // Use sysconf to determine cache sizes.
  cache_sizes_[L1_CACHE] = static_cast<uint32_t>(sysconf(_SC_LEVEL1_DCACHE_SIZE));
  cache_sizes_[L2_CACHE] = static_cast<uint32_t>(sysconf(_SC_LEVEL2_CACHE_SIZE));
  cache_sizes_[L3_CACHE] = static_cast<uint32_t>(sysconf(_SC_LEVEL3_CACHE_SIZE));
  cache_line_sizes_[L1_CACHE] = static_cast<uint32_t>(sysconf(_SC_LEVEL1_DCACHE_LINESIZE));
  cache_line_sizes_[L2_CACHE] = static_cast<uint32_t>(sysconf(_SC_LEVEL2_CACHE_LINESIZE));
  cache_line_sizes_[L3_CACHE] = static_cast<uint32_t>(sysconf(_SC_LEVEL3_CACHE_LINESIZE));
#endif
}

A csv_reader.cpp that is currently unusable:

// #include "execution/util/csv_reader.h" Fix later.

file.cpp writes files directly through the POSIX interface:

int32_t File::Write(const std::byte *data, std::size_t len) const {
  NOISEPAGE_ASSERT(IsOpen(), "File must be open before reading");
  return HANDLE_EINTR(write(fd_, data, len));
}
int64_t File::Seek(File::Whence whence, int64_t offset) const {
  static_assert(sizeof(int64_t) == sizeof(off_t), "off_t must be 64 bits");
  return lseek(fd_, static_cast<off_t>(offset), static_cast<int32_t>(whence));
}
bool File::Flush() const {
#if defined(OS_LINUX)
  return HANDLE_EINTR(fdatasync(fd_)) == 0;
#else
  return HANDLE_EINTR(fsync(fd_)) == 0;
#endif
}
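HANDLE_EINTR is not defined in the excerpt. Macros with this name usually retry a system call while it fails with EINTR, and the sketch below spells that out as a plain function. WriteRetryingOnEintr is a hypothetical name, and the macro's actual definition in the repository may differ.

```cpp
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Call write(2) again for as long as it is interrupted by a signal (EINTR).
// Any other error, or a successful (possibly partial) write, is returned as-is.
ssize_t WriteRetryingOnEintr(int fd, const void *data, std::size_t len) {
  ssize_t result;
  do {
    result = write(fd, data, len);
  } while (result == -1 && errno == EINTR);
  return result;
}
```

A short write is not retried here, so a caller that needs every byte written still has to loop on the return value.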

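vector_util.cpp, in turn, contains hand-written AVX vectorized operations. In other words, vectorization is not done inside the JIT: the vectorized routines are implemented ahead of time and then plugged into the JIT engine.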

uint32_t VectorUtil::BitVectorToSelectionVectorDenseAvX2(const uint64_t *bit_vector, uint32_t num_bits,
                                                         sel_t *sel_vector) {
  // Vector of '8's = [8,8,8,8,8,8,8]
  const auto eight = _mm_set1_epi16(8);
  // Selection vector write index
  auto idx = _mm_set1_epi16(0);
  // Selection vector size
  uint32_t k = 0;
  const uint32_t num_words = common::MathUtil::DivRoundUp(num_bits, 64);
  for (uint32_t i = 0; i < num_words; i++) {
    uint64_t word = bit_vector[i];
    for (uint32_t j = 0; j < 8; j++) {
      const auto mask = static_cast<uint8_t>(word);
      word >>= 8u;
      const __m128i match_pos_scaled =
          _mm_loadl_epi64(reinterpret_cast<const __m128i *>(&simd::K8_BIT_MATCH_LUT[mask]));
      const __m128i match_pos = _mm_cvtepi8_epi16(match_pos_scaled);
      const __m128i pos_vec = _mm_add_epi16(idx, match_pos);
      idx = _mm_add_epi16(idx, eight);
      _mm_storeu_si128(reinterpret_cast<__m128i *>(sel_vector + k), pos_vec);
      k += BitUtil::CountPopulation(static_cast<uint32_t>(mask));
    }
  }
  return k;
}
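For comparison, a scalar reference version makes it easier to see what the SIMD routine computes: the positions of all set bits, written out in ascending order. This is a sketch rather than NoisePage code; it returns a std::vector and assumes sel_t is a 16-bit index, as the epi16 intrinsics suggest.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Scalar reference: collect the index of every set bit among the first num_bits bits.
std::vector<uint16_t> BitVectorToSelectionVector(const std::vector<uint64_t> &bit_vector,
                                                 uint32_t num_bits) {
  assert(num_bits <= bit_vector.size() * 64);
  std::vector<uint16_t> sel;
  for (uint32_t i = 0; i < num_bits; i++) {
    const uint64_t word = bit_vector[i / 64];
    if ((word >> (i % 64)) & 1u) {
      sel.push_back(static_cast<uint16_t>(i));
    }
  }
  return sel;
}
```

The AVX version gets the same result eight bits at a time: a lookup table turns each mask byte into the packed positions of its set bits, the running base index is added, and the write cursor advances by the population count of the byte.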

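memorypool.cpp in the sql folder allocates memory with std::calloc and std::malloc. As I recall, jemalloc is enabled in CMakeLists.txt, yet it does not seem to be called explicitly here (if jemalloc is linked in its default unprefixed form it replaces malloc and calloc, so these calls may still end up in jemalloc):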

void *MemoryPool::AllocateAligned(const std::size_t size, const std::size_t alignment, const bool clear) {
  void *buf = nullptr;
  if (size >= mmap_threshold.load(std::memory_order_relaxed)) {
    buf = util::Memory::MallocHuge(size, true);
    NOISEPAGE_ASSERT(buf != nullptr, "Null memory pointer");
    // No need to clear memory, guaranteed on Linux
  } else {
    if (alignment < MIN_MALLOC_ALIGNMENT) {
      if (clear) {
        buf = std::calloc(size, 1);
      } else {
        buf = std::malloc(size);
      }
    } else {
      buf = util::Memory::MallocAligned(size, alignment);
      if (clear) {
        std::memset(buf, 0, size);
      }
    }
  }

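The self_driving folder contains the implementation of the hot/cold switching mentioned in the papers. Judging from the script code, the models can apparently be tuned through Python.

About Arrow

Most of storage is about Arrow.

So this thing is both Arrow and LLVM?

Within each block the data is organized in the PAX (hybrid row/column) format: all column values of a tuple live in the same block. Each block has a layout object made up of three parts.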
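The PAX idea on its own can be sketched as follows. This is not NoisePage's layout object: ToyPaxBlock is hypothetical, assumes fixed-width columns only, and ignores null bitmaps, headers and alignment padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical PAX-style block: each fixed-width column occupies one
// contiguous region ("mini-page") inside the same block.
class ToyPaxBlock {
 public:
  ToyPaxBlock(std::vector<std::size_t> column_sizes, std::size_t num_slots)
      : column_sizes_(std::move(column_sizes)), num_slots_(num_slots) {
    std::size_t offset = 0;
    for (std::size_t size : column_sizes_) {
      column_offsets_.push_back(offset);
      offset += size * num_slots_;
    }
    data_.resize(offset);
  }

  // Byte offset of (slot, column): start of the column region plus slot * width.
  std::size_t OffsetOf(std::size_t slot, std::size_t column) const {
    assert(slot < num_slots_ && column < column_sizes_.size());
    return column_offsets_[column] + slot * column_sizes_[column];
  }

  void Write(std::size_t slot, std::size_t column, const void *value) {
    std::memcpy(data_.data() + OffsetOf(slot, column), value, column_sizes_[column]);
  }

  void Read(std::size_t slot, std::size_t column, void *out) const {
    std::memcpy(out, data_.data() + OffsetOf(slot, column), column_sizes_[column]);
  }

 private:
  std::vector<std::size_t> column_sizes_;
  std::size_t num_slots_;
  std::vector<std::size_t> column_offsets_;
  std::vector<uint8_t> data_;
};
```

With a 4-byte and an 8-byte column and 10 slots, the second column starts at byte 40. A scan over one column touches a single contiguous range, while all attributes of a tuple still sit in the same block.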

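Summary

The code quality is high; wherever something is unclear, the comments are usually enough to figure it out.

However, because the LLVM version it uses is so old, I'm not sure the program can still be built and run.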

源码仓库：https://github.com/cmu-db/noisepage

该项目于2023年在Github Archive，但我看有不少CMU的论文还是基于NoisePage上面改进，所以依然值得去剖析相关源码

编译环境

使用GCC9.3/Clang 8.0编译，CCCache作为缓存，测试系统为Ubuntu 20.04(Focal)，使用CMake + Ninja编译套件

JIT环境是LLVM的MCJIT(ORC JIT要等LLVM14以后才有)

使用Jenkinss作为CI

提供DockerFile用于测试

third_party(第三方依赖)

libpg_query 用于SQL解析，这没的说

libcuckoo 高性能HashTable库

Google的FlatBuffers，用于定义 Apache Arrow项目中的消息格式规范generated文件夹里面的代码由FlatBuffers的Compiler生成

而BW-Tree则是团队对于数据存储的实现，这里贴一段B+树和BW-Tree的不同：

特性	B+树	BW-Tree
并发机制	需加锁	无锁（基于CAS）
更新方式	原地修改	Delta节点追加 + 写时复制
崩溃恢复	较复杂	天然支持版本追踪与回滚
写放大	一定程度存在	减少写放大（无页面迁移）
查询效率（高并发）	可受锁影响	查询需遍历Delta链，视情况而定

utils

有三个文件夹，分别是execution, include, runner

runner

关于运行时（Runner）的基础配置

execution

运行环境配置，主要是关于LLVM的部分

从参数可以看出，支持打印AST和ByteCode

有一个名为TPL的Compiler

llvm::cl::OptionCategory TPL_OPTIONS_CATEGORY("TPL Compiler Options", "Options for controlling the TPL compilation process.");  // NOLINT
llvm::cl::opt<bool> PRINT_AST("print-ast", llvm::cl::desc("Print the programs AST"), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<bool> PRINT_TBC("print-tbc", llvm::cl::desc("Print the generated TPL Bytecode"), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<bool> PRETTY_PRINT("pretty-print", llvm::cl::desc("Pretty-print the source from the parsed AST"), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<bool> IS_SQL("sql", llvm::cl::desc("Is the input a SQL query?"), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<bool> TPCH("tpch", llvm::cl::desc("Should the TPCH database be loaded? Requires '-schema' and '-data' directories."), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<std::string> DATA_DIR("data", llvm::cl::desc("Where to find data files of tables to load"), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<std::string> INPUT_FILE(llvm::cl::Positional, llvm::cl::desc("<input file>"), llvm::cl::init(""), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<std::string> OUTPUT_NAME("output-name", llvm::cl::desc("Print the output name"), llvm::cl::init("schema10"), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<std::string> HANDLERS_PATH("handlers-path", llvm::cl::desc("Path to the bytecode handlers bitcode file"), llvm::cl::init("./bytecode_handlers_ir.bc"), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT

在gen_opt_bc.cpp中，主要关于LLVM Bitcode的处理

// This executable reads the unoptimized bitcode file and:
// 1. Converts to an LLVM Module
// 2. Removes the static global variable
// 3. Modifies linkage types of all defined functions to LinkOnce
// 4. Cleans up function arguments
// 5. Writes out optimized module as bitcode file

有一个函数用于清除LLVM全局标记，我觉得可以记下

void RemoveGlobalUses(llvm::Module *module) {
  // When we created the original bitcode file, we forced all functions to be
  // generated by storing their address in a global variable. We delete this
  // variable now so the final binary can be made smaller by eliminating unused
  // ops.
  auto var = module->getGlobalVariable(GLOBAL_VAR_NAME);
  if (var != nullptr) {
    var->replaceAllUsesWith(llvm::UndefValue::get(var->getType()));
    var->eraseFromParent();
  }
  // Clang created a global variable holding all force-used items. Delete it.
  auto used = module->getGlobalVariable(LLVM_COMPILED_USED);
  if (used != nullptr) {
    used->eraseFromParent();
  }
}

在table_generator子文件夹下面是关于表的生成与读取

表的生成基于C++模板

还携带一个GenerateTestTables()的样例

test

下面有多个文件夹对应不同的测试，似乎使用CTest作为测试套件

binder用于检查SQL与执行计划的解析，包括了CTE的Dependency Graph和Struct Statement Test

TEST_F(BinderCorrectnessTest, SelectStatementComplexTest) {
  // Test regular table name
  BINDER_LOG_DEBUG("Parsing sql query");
  std::string select_sql =
      "SELECT A.A1, B.B2 FROM A INNER JOIN b ON a.a1 = b.b1 WHERE a1 < 100 "
      "GROUP BY A.a1, B.b2 HAVING a1 > 50 ORDER BY a1";
  auto parse_tree = parser::PostgresParser::BuildParseTree(select_sql);
  auto statement = parse_tree->GetStatements()[0];
  binder_->BindNameToNode(common::ManagedPointer(parse_tree), nullptr, nullptr);
  auto select_stmt = statement.CastManagedPointerTo<parser::SelectStatement>();
  EXPECT_EQ(0, select_stmt->GetDepth());
  // Check select_list
  BINDER_LOG_DEBUG("Checking select list");
  auto col_expr = select_stmt->GetSelectColumns()[0].CastManagedPointerTo<parser::ColumnValueExpression>();
  EXPECT_EQ(col_expr->GetDatabaseOid(), db_oid_);              // A.a1
  EXPECT_EQ(col_expr->GetTableOid(), table_a_oid_);            // A.a1
  EXPECT_EQ(col_expr->GetColumnOid(), catalog::col_oid_t(1));  // A.a1; columns are indexed from 1
  EXPECT_EQ(execution::sql::SqlTypeId::Integer, col_expr->GetReturnValueType());
  EXPECT_EQ(0, col_expr->GetDepth());

此外就不一一列举了

src(源码)

目录基本和Test差不多

接下来我挑选几个我感兴趣的部分看看

Binder

以访问者模式实现SQL的解析

namespace noisepage {
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::AggregateExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::CaseExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::ColumnValueExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::ComparisonExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::ConjunctionExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::ConstantValueExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::DefaultValueExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::DerivedValueExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::FunctionExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}

Execution

execution下面还有诸多子一级文件夹

vm文件夹下边有很多关于bytecode的文件(感觉是Noispage有自己的一套ByteCode体系?)

BytecodeEmitter的部分代码如下

void BytecodeEmitter::EmitDeref(Bytecode bytecode, LocalVar dest, LocalVar src) {
  NOISEPAGE_ASSERT(bytecode == Bytecode::Deref1 || bytecode == Bytecode::Deref2 || bytecode == Bytecode::Deref4 ||
                       bytecode == Bytecode::Deref8,
                   "Bytecode is not a Deref code");
  EmitAll(bytecode, dest, src);
}
void BytecodeEmitter::EmitDerefN(LocalVar dest, LocalVar src, uint32_t len) {
  EmitAll(Bytecode::DerefN, dest, src, len);
}
void BytecodeEmitter::EmitAssign(Bytecode bytecode, LocalVar dest, LocalVar src) {
  NOISEPAGE_ASSERT(bytecode == Bytecode::Assign1 || bytecode == Bytecode::Assign2 || bytecode == Bytecode::Assign4 ||
                       bytecode == Bytecode::Assign8,
                   "Bytecode is not an Assign code");
  EmitAll(bytecode, dest, src);
}
void BytecodeEmitter::EmitAssignImm1(LocalVar dest, int8_t val) { EmitAll(Bytecode::AssignImm1, dest, val); }
void BytecodeEmitter::EmitAssignImm2(LocalVar dest, int16_t val) { EmitAll(Bytecode::AssignImm2, dest, val); }
void BytecodeEmitter::EmitAssignImm4(LocalVar dest, int32_t val) { EmitAll(Bytecode::AssignImm4, dest, val); }
void BytecodeEmitter::EmitAssignImm8(LocalVar dest, int64_t val) { EmitAll(Bytecode::AssignImm8, dest, val); }
void BytecodeEmitter::EmitAssignImm4F(LocalVar dest, float val) { EmitAll(Bytecode::AssignImm4F, dest, val); }
void BytecodeEmitter::EmitAssignImm8F(LocalVar dest, double val) { EmitAll(Bytecode::AssignImm8F, dest, val); }
void BytecodeEmitter::EmitUnaryOp(Bytecode bytecode, LocalVar dest, LocalVar input) { EmitAll(bytecode, dest, input); }
void BytecodeEmitter::EmitBinaryOp(Bytecode bytecode, LocalVar dest, LocalVar lhs, LocalVar rhs) {
  EmitAll(bytecode, dest, lhs, rhs);
}

而在llvm_engine.cpp中,对于类型Type有明确的映射(我想也可以完成ByteCode往LLVM IR的映射?)

class LLVMEngine::TypeMap {
 public:
  explicit TypeMap(llvm::Module *module) : module_(module) {
    llvm::LLVMContext &ctx = module->getContext();
    type_map_["nil"] = llvm::Type::getVoidTy(ctx);
    type_map_["bool"] = llvm::Type::getInt8Ty(ctx);
    type_map_["int8"] = llvm::Type::getInt8Ty(ctx);
    type_map_["int16"] = llvm::Type::getInt16Ty(ctx);
    type_map_["int32"] = llvm::Type::getInt32Ty(ctx);
    type_map_["int64"] = llvm::Type::getInt64Ty(ctx);
    type_map_["int128"] = llvm::Type::getInt128Ty(ctx);
    type_map_["uint8"] = llvm::Type::getInt8Ty(ctx);
    type_map_["uint16"] = llvm::Type::getInt16Ty(ctx);
    type_map_["uint32"] = llvm::Type::getInt32Ty(ctx);
    type_map_["uint64"] = llvm::Type::getInt64Ty(ctx);
    type_map_["uint128"] = llvm::Type::getInt128Ty(ctx);
    type_map_["float32"] = llvm::Type::getFloatTy(ctx);
    type_map_["float64"] = llvm::Type::getDoubleTy(ctx);
    type_map_["string"] = llvm::Type::getInt8PtrTy(ctx);
  }
  /** No copying or moving this class. */
  DISALLOW_COPY_AND_MOVE(TypeMap);
  llvm::Type *VoidType() { return type_map_["nil"]; }
  llvm::Type *BoolType() { return type_map_["bool"]; }
  llvm::Type *Int8Type() { return type_map_["int8"]; }
  llvm::Type *Int16Type() { return type_map_["int16"]; }
  llvm::Type *Int32Type() { return type_map_["int32"]; }
  llvm::Type *Int64Type() { return type_map_["int64"]; }
  llvm::Type *UInt8Type() { return type_map_["uint8"]; }
  llvm::Type *UInt16Type() { return type_map_["uint16"]; }
  llvm::Type *UInt32Type() { return type_map_["uint32"]; }
  llvm::Type *UInt64Type() { return type_map_["uint64"]; }
  llvm::Type *Float32Type() { return type_map_["float32"]; }
  llvm::Type *Float64Type() { return type_map_["float64"]; }
  llvm::Type *StringType() { return type_map_["string"]; }
  llvm::Type *GetLLVMType(const ast::Type *type);
 private:
  // Given a non-primitive builtin type, convert it to an LLVM type
  llvm::Type *GetLLVMTypeForBuiltin(const ast::BuiltinType *builtin_type);
  // Given a struct type, convert it into an equivalent LLVM struct type
  llvm::StructType *GetLLVMStructType(const ast::StructType *struct_type);
  // Given a TPL function type, convert it into an equivalent LLVM function type
  llvm::FunctionType *GetLLVMFunctionType(const ast::FunctionType *func_type);
 private:
  llvm::Module *module_;
  std::unordered_map<std::string, llvm::Type *> type_map_;
};
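
One detail worth spelling out: the signed and unsigned names of the same width are registered with the same LLVM type, because LLVM integer types carry no signedness (it lives in the instructions, e.g. sdiv vs. udiv). Below is a minimal LLVM-free sketch of that lookup table. ToyKind, ToyType and ToyTypeMap are hypothetical names of mine and not part of NoisePage.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for llvm::Type: only the kind and the bit width matter here.
enum class ToyKind { Void, Integer, Float, Pointer };

struct ToyType {
  ToyKind kind;
  uint32_t bits;
  bool operator==(const ToyType &other) const { return kind == other.kind && bits == other.bits; }
};

class ToyTypeMap {
 public:
  ToyTypeMap() {
    type_map_["nil"] = {ToyKind::Void, 0};
    type_map_["bool"] = {ToyKind::Integer, 8};  // one byte, as in the listing above
    for (uint32_t bits : {8u, 16u, 32u, 64u, 128u}) {
      const ToyType int_type{ToyKind::Integer, bits};
      // Signed and unsigned names share one entry: the type itself is signless.
      type_map_["int" + std::to_string(bits)] = int_type;
      type_map_["uint" + std::to_string(bits)] = int_type;
    }
    type_map_["float32"] = {ToyKind::Float, 32};
    type_map_["float64"] = {ToyKind::Float, 64};
    type_map_["string"] = {ToyKind::Pointer, 8};  // pointer to 8-bit characters
  }

  // Look up a primitive type by its TPL name; the name must have been registered.
  const ToyType &Lookup(const std::string &name) const {
    auto it = type_map_.find(name);
    assert(it != type_map_.end() && "unknown primitive type name");
    return it->second;
  }

 private:
  std::unordered_map<std::string, ToyType> type_map_;
};
```

With this table, Lookup("int32") and Lookup("uint32") compare equal, which mirrors what the real TypeMap does with getInt32Ty.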

GetLLVMStructType builds an LLVM struct type from a struct type in the AST

llvm::StructType *LLVMEngine::TypeMap::GetLLVMStructType(const ast::StructType *struct_type) {
  // Collect the fields here
  llvm::SmallVector<llvm::Type *, 8> fields;
  for (const auto &field : struct_type->GetAllFields()) {
    fields.push_back(GetLLVMType(field.type_));
  }
  return llvm::StructType::create(fields);
}

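As the TargetMachine setup below shows, the JIT only hands the host's CPU features to LLVM, so any vectorization of the generated code still relies on the compiler's auto-vectorization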

LLVMEngine::CompiledModuleBuilder::CompiledModuleBuilder(const CompilerOptions &options,
                                                         const BytecodeModule &tpl_module)
    : options_(options),
      tpl_module_(tpl_module),
      target_machine_(nullptr),
      context_(std::make_unique<llvm::LLVMContext>()),
      llvm_module_(nullptr),
      type_map_(nullptr) {
  //
  // We need to create a suitable TargetMachine for LLVM to before we can JIT
  // TPL programs. At the moment, we rely on LLVM to discover all CPU features
  // e.g., AVX2 or AVX512, and we make no assumptions about symbol relocations.
  //
  // TODO(pmenon): This may change with LLVM8 that comes with
  // TargetMachineBuilders
  // TODO(pmenon): Alter the flags as need be
  //
  const std::string target_triple = llvm::sys::getProcessTriple();
  {
    std::string error;
    auto *target = llvm::TargetRegistry::lookupTarget(target_triple, error);
    if (target == nullptr) {
      EXECUTION_LOG_ERROR("LLVM: Unable to find target with target_triple {}", target_triple);
      return;
    }
    // Collect CPU features
    llvm::StringMap<bool> feature_map;
    if (bool success = llvm::sys::getHostCPUFeatures(feature_map); !success) {
      EXECUTION_LOG_ERROR("LLVM: Unable to find all CPU features");
      return;
    }
    llvm::SubtargetFeatures target_features;
    for (const auto &entry : feature_map) {
      target_features.AddFeature(entry.getKey(), entry.getValue());
    }
    EXECUTION_LOG_TRACE("LLVM: Discovered CPU features: {}", target_features.getString());
    // Both relocation=PIC or JIT=true work. Use the latter for now.
    llvm::TargetOptions target_options;
    llvm::Optional<llvm::Reloc::Model> reloc;
    const llvm::CodeGenOpt::Level opt_level = llvm::CodeGenOpt::Aggressive;
    target_machine_.reset(target->createTargetMachine(target_triple, llvm::sys::getHostCPUName(),
                                                      target_features.getString(), target_options, reloc, {}, opt_level,
                                                      true));
    NOISEPAGE_ASSERT(target_machine_ != nullptr, "LLVM: Unable to find a suitable target machine!");
  }
  //
  // We've built a TargetMachine we use to generate machine code. Now, we
  // load the pre-compiled bytecode module containing all the TPL bytecode
  // logic. We add the functions we're about to compile into this module. This
  // module forms the unit of JIT.
  //
  {
    auto memory_buffer = llvm::MemoryBuffer::getFile(GetEngineSettings()->GetBytecodeHandlersBcPath());
    if (auto error = memory_buffer.getError()) {
      EXECUTION_LOG_ERROR("There was an error loading the handler bytecode: {}", error.message());
    }
    auto module = llvm::parseBitcodeFile(*(memory_buffer.get()), *context_);
    if (!module) {
      auto error = llvm::toString(module.takeError());
      EXECUTION_LOG_ERROR("{}", error);
      throw std::runtime_error(error);
    }
    llvm_module_ = std::move(module.get());
    llvm_module_->setModuleIdentifier(tpl_module.GetName());
    llvm_module_->setSourceFileName(tpl_module.GetName() + ".tpl");
    llvm_module_->setDataLayout(target_machine_->createDataLayout());
    llvm_module_->setTargetTriple(target_triple);
  }
  type_map_ = std::make_unique<TypeMap>(llvm_module_.get());
}

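At first I did not quite see why a CFG has to be rebuilt here; maybe functions that are too complex get split? The code is mostly about tying branches to LLVM basic blocks, and the comment at the top gives the reason: LLVM IR is emitted per basic block, so the positions where blocks start must first be recovered from the flat TPL bytecode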

void LLVMEngine::CompiledModuleBuilder::BuildSimpleCFG(const FunctionInfo &func_info,
                                                       std::map<std::size_t, llvm::BasicBlock *> *blocks) {
  // Before we can generate LLVM IR, we need to build a control-flow graph (CFG) for the function.
  // We do this construction directly from the TPL bytecode using a vanilla DFS and produce an
  // ordered map ('blocks') from bytecode position to an LLVM basic block. Each entry in the map
  // indicates the start of a basic block.
  // We use this vector as a stack for DFS traversal
  llvm::SmallVector<std::size_t, 16> bb_begin_positions = {0};
  for (auto iter = tpl_module_.GetBytecodeForFunction(func_info); !bb_begin_positions.empty();) {
    std::size_t begin_pos = bb_begin_positions.back();
    bb_begin_positions.pop_back();
    // We're at what we think is the start of a new basic block. Scan it until we find a terminal
    // instruction. Once we do,
    for (iter.SetPosition(begin_pos); !iter.Done(); iter.Advance()) {
      Bytecode bytecode = iter.CurrentBytecode();
      // If the bytecode isn't a terminal for the block, continue until we reach one
      if (!Bytecodes::IsTerminal(bytecode)) {
        continue;
      }
      // Return?
      if (Bytecodes::IsReturn(bytecode)) {
        break;
      }
      // Unconditional branch?
      if (Bytecodes::IsUnconditionalJump(bytecode)) {
        std::size_t branch_target_pos =
            iter.GetPosition() + Bytecodes::GetNthOperandOffset(bytecode, 0) + iter.GetJumpOffsetOperand(0);
        if (blocks->find(branch_target_pos) == blocks->end()) {
          (*blocks)[branch_target_pos] = nullptr;
          bb_begin_positions.push_back(branch_target_pos);
        }
        break;
      }
      // Conditional branch?
      if (Bytecodes::IsConditionalJump(bytecode)) {
        std::size_t fallthrough_pos = iter.GetPosition() + iter.CurrentBytecodeSize();
        if (blocks->find(fallthrough_pos) == blocks->end()) {
          bb_begin_positions.push_back(fallthrough_pos);
          (*blocks)[fallthrough_pos] = nullptr;
        }
        std::size_t branch_target_pos =
            iter.GetPosition() + Bytecodes::GetNthOperandOffset(bytecode, 1) + iter.GetJumpOffsetOperand(1);
        if (blocks->find(branch_target_pos) == blocks->end()) {
          bb_begin_positions.push_back(branch_target_pos);
          (*blocks)[branch_target_pos] = nullptr;
        }
        break;
      }
    }
  }
}
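
To convince myself how this works, here is a small self-contained sketch of the same idea on a toy instruction set: a depth-first walk that records every position at which a basic block starts. ToyOp, ToyInstr and FindBlockStarts are hypothetical names, positions are instruction indices instead of byte offsets, and unlike the original the sketch also records position 0 itself.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical toy instruction set; positions are instruction indices, not byte offsets.
enum class ToyOp { Plain, Jump, JumpIfTrue, Return };

struct ToyInstr {
  ToyOp op;
  std::size_t target;  // only meaningful for Jump and JumpIfTrue
};

// Returns the sorted set of positions at which a basic block begins.
std::set<std::size_t> FindBlockStarts(const std::vector<ToyInstr> &code) {
  std::set<std::size_t> starts;
  if (code.empty()) return starts;
  starts.insert(0);
  std::vector<std::size_t> worklist = {0};  // used as a stack, so the traversal is depth-first
  // Record a newly discovered block start and schedule it for scanning.
  auto discover = [&](std::size_t pos) {
    if (pos < code.size() && starts.insert(pos).second) worklist.push_back(pos);
  };
  while (!worklist.empty()) {
    std::size_t pos = worklist.back();
    worklist.pop_back();
    // Scan forward until a terminal instruction ends the current block.
    for (; pos < code.size(); pos++) {
      const ToyInstr &instr = code[pos];
      if (instr.op == ToyOp::Plain) continue;
      if (instr.op == ToyOp::Jump) {
        discover(instr.target);
      } else if (instr.op == ToyOp::JumpIfTrue) {
        discover(pos + 1);       // fall-through successor
        discover(instr.target);  // taken successor
      }
      break;  // Return, Jump and JumpIfTrue all terminate the block
    }
  }
  return starts;
}
```

For a six-instruction program with a conditional jump at position 1 (target 4) and an unconditional jump at position 3 (target 5), the block starts come out as 0, 2, 4 and 5. An ordered container is used for the same reason the original uses std::map: the later code generation step can walk the blocks in bytecode order.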

Optimize() is where the LLVM optimization passes are set up

void LLVMEngine::CompiledModuleBuilder::Optimize() {
  llvm::legacy::FunctionPassManager function_passes(llvm_module_.get());
  // Add the appropriate TargetTransformInfo.
  function_passes.add(llvm::createTargetTransformInfoWrapperPass(target_machine_->getTargetIRAnalysis()));
  // Build up optimization pipeline.
  llvm::PassManagerBuilder pm_builder;
  uint32_t opt_level = 3;
  uint32_t size_opt_level = 0;
  bool disable_inline_hot_call_site = false;
  pm_builder.OptLevel = opt_level;
  pm_builder.Inliner = llvm::createFunctionInliningPass(opt_level, size_opt_level, disable_inline_hot_call_site);
  pm_builder.populateFunctionPassManager(function_passes);
  // Add custom passes. Hand-selected based on empirical evaluation.
  function_passes.add(llvm::createInstructionCombiningPass());
  function_passes.add(llvm::createReassociatePass());
  function_passes.add(llvm::createGVNPass());
  function_passes.add(llvm::createCFGSimplificationPass());
  function_passes.add(llvm::createAggressiveDCEPass());
  function_passes.add(llvm::createCFGSimplificationPass());
  // Run optimization passes on all functions.
  function_passes.doInitialization();
  for (llvm::Function &func : *llvm_module_) {
    function_passes.run(func);
  }
  function_passes.doFinalization();
}

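cpu_info under util reads machine information (although it seems to be used only to determine the sizes and line sizes of the three CPU cache levels)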

void CpuInfo::InitCacheInfo() {
#ifdef __APPLE__
  // Lookup cache sizes.
  std::size_t len = 0;
  sysctlbyname("hw.cachesize", nullptr, &len, nullptr, 0);
  auto data = std::make_unique<uint64_t[]>(len);
  sysctlbyname("hw.cachesize", data.get(), &len, nullptr, 0);
  NOISEPAGE_ASSERT(len / sizeof(uint64_t) >= 3, "Expected three levels of cache!");
  // Copy data
  for (uint32_t idx = 0; idx < K_NUM_CACHE_LEVELS; idx++) {
    cache_sizes_[idx] = data[idx];
  }
  // Lookup cache line sizes.
  std::size_t linesize;
  std::size_t sizeof_linesize = sizeof(linesize);
  sysctlbyname("hw.cachelinesize", &linesize, &sizeof_linesize, nullptr, 0);
  for (auto &cache_line_size : cache_line_sizes_) {
    cache_line_size = linesize;
  }
#else
  // Use sysconf to determine cache sizes.
  cache_sizes_[L1_CACHE] = static_cast<uint32_t>(sysconf(_SC_LEVEL1_DCACHE_SIZE));
  cache_sizes_[L2_CACHE] = static_cast<uint32_t>(sysconf(_SC_LEVEL2_CACHE_SIZE));
  cache_sizes_[L3_CACHE] = static_cast<uint32_t>(sysconf(_SC_LEVEL3_CACHE_SIZE));
  cache_line_sizes_[L1_CACHE] = static_cast<uint32_t>(sysconf(_SC_LEVEL1_DCACHE_LINESIZE));
  cache_line_sizes_[L2_CACHE] = static_cast<uint32_t>(sysconf(_SC_LEVEL2_CACHE_LINESIZE));
  cache_line_sizes_[L3_CACHE] = static_cast<uint32_t>(sysconf(_SC_LEVEL3_CACHE_LINESIZE));
#endif
}
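
One thing I noticed in the non-Apple branch: the sysconf result is cast straight to uint32_t. sysconf returns -1 when a name is unsupported or the value is indeterminate, and some systems report 0 for cache parameters they cannot determine, so a defensive variant might look like the sketch below. NormalizeSysconf, L1DataCacheBytes and the fallback value are my own assumptions, not NoisePage code.

```cpp
#include <cassert>
#include <cstdint>
#include <unistd.h>

// Hypothetical fallback, used when the OS cannot report a usable number.
constexpr uint32_t kFallbackCacheBytes = 32 * 1024;

// Maps the "unknown" results of sysconf (-1 or 0) to a fallback value.
// Casting -1 directly to uint32_t would instead yield 4294967295.
uint32_t NormalizeSysconf(long raw, uint32_t fallback) {
  if (raw <= 0) return fallback;
  return static_cast<uint32_t>(raw);
}

uint32_t L1DataCacheBytes() {
#ifdef _SC_LEVEL1_DCACHE_SIZE  // glibc extension, not part of POSIX
  return NormalizeSysconf(sysconf(_SC_LEVEL1_DCACHE_SIZE), kFallbackCacheBytes);
#else
  return kFallbackCacheBytes;
#endif
}
```

Inside containers or on non-glibc systems this at least keeps the cache size from turning into a huge bogus number.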

csv_reader.cpp is not usable for now; its include is commented out

// #include "execution/util/csv_reader.h" Fix later.

file.cpp writes to files directly through the POSIX interface

int32_t File::Write(const std::byte *data, std::size_t len) const {
  NOISEPAGE_ASSERT(IsOpen(), "File must be open before reading");
  return HANDLE_EINTR(write(fd_, data, len));
}
int64_t File::Seek(File::Whence whence, int64_t offset) const {
  static_assert(sizeof(int64_t) == sizeof(off_t), "off_t must be 64 bits");
  return lseek(fd_, static_cast<off_t>(offset), static_cast<int32_t>(whence));
}
bool File::Flush() const {
#if defined(OS_LINUX)
  return HANDLE_EINTR(fdatasync(fd_)) == 0;
#else
  return HANDLE_EINTR(fsync(fd_)) == 0;
#endif
}
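
The definition of HANDLE_EINTR is not part of this excerpt. Macros with this name (Chromium has one, for example) usually repeat a system call for as long as it fails with EINTR, so I assume the one here behaves the same way. A sketch of that behavior as a function template, where RetryOnEintr is a hypothetical name:

```cpp
#include <cassert>
#include <cerrno>

// Calls fn until it either succeeds or fails with an error other than EINTR.
// fn is expected to follow the POSIX convention: return -1 and set errno on failure.
template <typename Fn>
auto RetryOnEintr(Fn fn) -> decltype(fn()) {
  decltype(fn()) result;
  do {
    result = fn();
  } while (result == -1 && errno == EINTR);
  return result;
}
```

A call site would then read something like RetryOnEintr([&] { return write(fd, data, len); }), with fd, data and len supplied by the caller.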

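vector_util.cpp, by contrast, contains hand-written SIMD code (the routine below is named for AVX2). In other words, vectorization is not done by the JIT: the vectorized routines are written in advance and then made available to the JIT engine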

uint32_t VectorUtil::BitVectorToSelectionVectorDenseAvX2(const uint64_t *bit_vector, uint32_t num_bits,
                                                         sel_t *sel_vector) {
  // Vector of '8's = [8,8,8,8,8,8,8]
  const auto eight = _mm_set1_epi16(8);
  // Selection vector write index
  auto idx = _mm_set1_epi16(0);
  // Selection vector size
  uint32_t k = 0;
  const uint32_t num_words = common::MathUtil::DivRoundUp(num_bits, 64);
  for (uint32_t i = 0; i < num_words; i++) {
    uint64_t word = bit_vector[i];
    for (uint32_t j = 0; j < 8; j++) {
      const auto mask = static_cast<uint8_t>(word);
      word >>= 8u;
      const __m128i match_pos_scaled =
          _mm_loadl_epi64(reinterpret_cast<const __m128i *>(&simd::K8_BIT_MATCH_LUT[mask]));
      const __m128i match_pos = _mm_cvtepi8_epi16(match_pos_scaled);
      const __m128i pos_vec = _mm_add_epi16(idx, match_pos);
      idx = _mm_add_epi16(idx, eight);
      _mm_storeu_si128(reinterpret_cast<__m128i *>(sel_vector + k), pos_vec);
      k += BitUtil::CountPopulation(static_cast<uint32_t>(mask));
    }
  }
  return k;
}
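
Despite the AvX2 in its name, the routine above uses 128-bit `_mm_` intrinsics, and `_mm_cvtepi8_epi16` comes from SSE4.1. It consumes the bit vector one byte at a time: the byte indexes a lookup table holding the positions of its set bits, the running base index is added, all eight lanes are stored, and the write cursor advances only by the popcount. The result is easier to see in a scalar reference version, sketched below. BitVectorToSelectionVectorScalar is a hypothetical name, and treating sel_t as a 16-bit integer is my assumption based on the epi16 lanes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using sel_t = uint16_t;  // assumption: 16-bit selection indexes, matching the epi16 lanes above

// Collects the position of every set bit among the first num_bits bits, in ascending order.
std::vector<sel_t> BitVectorToSelectionVectorScalar(const uint64_t *bit_vector, uint32_t num_bits) {
  assert(num_bits <= 65536 && "positions must fit into a 16-bit sel_t");
  std::vector<sel_t> sel_vector;
  for (uint32_t pos = 0; pos < num_bits; pos++) {
    const uint64_t word = bit_vector[pos / 64];
    if ((word >> (pos % 64)) & 1u) {
      sel_vector.push_back(static_cast<sel_t>(pos));
    }
  }
  return sel_vector;
}
```

For the two words 0xB and 0x1 this yields the positions 0, 1, 3 and 64. The scalar loop branches on every bit, which is exactly what the table-driven version avoids.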

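memorypool.cpp in the sql folder allocates memory with std::calloc and std::malloc. As far as I remember, jemalloc is enabled in CMakeLists.txt, yet it does not appear to be called explicitly here (although, if jemalloc is linked in, it normally takes over malloc and calloc themselves, so these calls may still end up in jemalloc)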

void *MemoryPool::AllocateAligned(const std::size_t size, const std::size_t alignment, const bool clear) {
  void *buf = nullptr;
  if (size >= mmap_threshold.load(std::memory_order_relaxed)) {
    buf = util::Memory::MallocHuge(size, true);
    NOISEPAGE_ASSERT(buf != nullptr, "Null memory pointer");
    // No need to clear memory, guaranteed on Linux
  } else {
    if (alignment < MIN_MALLOC_ALIGNMENT) {
      if (clear) {
        buf = std::calloc(size, 1);
      } else {
        buf = std::malloc(size);
      }
    } else {
      buf = util::Memory::MallocAligned(size, alignment);
      if (clear) {
        std::memset(buf, 0, size);
      }
    }
  }
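
The branching on alignment can be tried out in isolation. The sketch below leaves out the huge-page path and uses posix_memalign as a stand-in for util::Memory::MallocAligned, whose implementation is not shown here. ToyAllocateAligned and kMinMallocAlignment are hypothetical names, and the threshold value is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <stdlib.h>  // posix_memalign

// Assumption: plain malloc already guarantees at least this much alignment.
constexpr std::size_t kMinMallocAlignment = alignof(std::max_align_t);

// Returns size bytes aligned to alignment (a power of two), optionally zeroed.
// The result must be released with std::free.
void *ToyAllocateAligned(std::size_t size, std::size_t alignment, bool clear) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  if (alignment <= kMinMallocAlignment) {
    // calloc zeroes the memory itself, so no separate memset is needed.
    return clear ? std::calloc(size, 1) : std::malloc(size);
  }
  void *buf = nullptr;
  if (posix_memalign(&buf, alignment, size) != 0) return nullptr;
  if (clear) std::memset(buf, 0, size);
  return buf;
}
```

Requesting 100 zeroed bytes with 64-byte alignment takes the second path and returns a pointer whose address is a multiple of 64.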

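The self_driving folder contains the implementation of the hot/cold switching mentioned in the paper. Judging from the code under script, the models can apparently be tuned from Python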

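About Arrow

Almost everything under storage is related to Arrow

So this thing uses both Arrow and LLVM?

Within each block, data is organized in PAX (hybrid row/column) format: all column values of a tuple live in the same block. Each block has a layout object made up of three parts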

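Summary

The code quality is high; wherever something is unclear, the comments are usually enough to figure it out

However, because the LLVM version it uses is so old, I am not sure the program can still be built and run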

源码仓库：https://github.com/cmu-db/noisepage

该项目于2023年在Github Archive，但我看有不少CMU的论文还是基于NoisePage上面改进，所以依然值得去剖析相关源码

编译环境

使用GCC9.3/Clang 8.0编译，CCCache作为缓存，测试系统为Ubuntu 20.04(Focal)，使用CMake + Ninja编译套件

JIT环境是LLVM的MCJIT(ORC JIT要等LLVM14以后才有)

使用Jenkinss作为CI

提供DockerFile用于测试

third_party(第三方依赖)

libpg_query 用于SQL解析，这没的说

libcuckoo 高性能HashTable库

Google的FlatBuffers，用于定义 Apache Arrow项目中的消息格式规范generated文件夹里面的代码由FlatBuffers的Compiler生成

而BW-Tree则是团队对于数据存储的实现，这里贴一段B+树和BW-Tree的不同：

特性	B+树	BW-Tree
并发机制	需加锁	无锁（基于CAS）
更新方式	原地修改	Delta节点追加 + 写时复制
崩溃恢复	较复杂	天然支持版本追踪与回滚
写放大	一定程度存在	减少写放大（无页面迁移）
查询效率（高并发）	可受锁影响	查询需遍历Delta链，视情况而定

utils

有三个文件夹，分别是execution, include, runner

runner

关于运行时（Runner）的基础配置

execution

运行环境配置，主要是关于LLVM的部分

从参数可以看出，支持打印AST和ByteCode

有一个名为TPL的Compiler

llvm::cl::OptionCategory TPL_OPTIONS_CATEGORY("TPL Compiler Options", "Options for controlling the TPL compilation process.");  // NOLINT
llvm::cl::opt<bool> PRINT_AST("print-ast", llvm::cl::desc("Print the programs AST"), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<bool> PRINT_TBC("print-tbc", llvm::cl::desc("Print the generated TPL Bytecode"), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<bool> PRETTY_PRINT("pretty-print", llvm::cl::desc("Pretty-print the source from the parsed AST"), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<bool> IS_SQL("sql", llvm::cl::desc("Is the input a SQL query?"), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<bool> TPCH("tpch", llvm::cl::desc("Should the TPCH database be loaded? Requires '-schema' and '-data' directories."), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<std::string> DATA_DIR("data", llvm::cl::desc("Where to find data files of tables to load"), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<std::string> INPUT_FILE(llvm::cl::Positional, llvm::cl::desc("<input file>"), llvm::cl::init(""), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<std::string> OUTPUT_NAME("output-name", llvm::cl::desc("Print the output name"), llvm::cl::init("schema10"), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT
llvm::cl::opt<std::string> HANDLERS_PATH("handlers-path", llvm::cl::desc("Path to the bytecode handlers bitcode file"), llvm::cl::init("./bytecode_handlers_ir.bc"), llvm::cl::cat(TPL_OPTIONS_CATEGORY));  // NOLINT

在gen_opt_bc.cpp中，主要关于LLVM Bitcode的处理

// This executable reads the unoptimized bitcode file and:
// 1. Converts to an LLVM Module
// 2. Removes the static global variable
// 3. Modifies linkage types of all defined functions to LinkOnce
// 4. Cleans up function arguments
// 5. Writes out optimized module as bitcode file

有一个函数用于清除LLVM全局标记，我觉得可以记下

void RemoveGlobalUses(llvm::Module *module) {
  // When we created the original bitcode file, we forced all functions to be
  // generated by storing their address in a global variable. We delete this
  // variable now so the final binary can be made smaller by eliminating unused
  // ops.
  auto var = module->getGlobalVariable(GLOBAL_VAR_NAME);
  if (var != nullptr) {
    var->replaceAllUsesWith(llvm::UndefValue::get(var->getType()));
    var->eraseFromParent();
  }
  // Clang created a global variable holding all force-used items. Delete it.
  auto used = module->getGlobalVariable(LLVM_COMPILED_USED);
  if (used != nullptr) {
    used->eraseFromParent();
  }
}

在table_generator子文件夹下面是关于表的生成与读取

表的生成基于C++模板

还携带一个GenerateTestTables()的样例

test

下面有多个文件夹对应不同的测试，似乎使用CTest作为测试套件

binder用于检查SQL与执行计划的解析，包括了CTE的Dependency Graph和Struct Statement Test

TEST_F(BinderCorrectnessTest, SelectStatementComplexTest) {
  // Test regular table name
  BINDER_LOG_DEBUG("Parsing sql query");
  std::string select_sql =
      "SELECT A.A1, B.B2 FROM A INNER JOIN b ON a.a1 = b.b1 WHERE a1 < 100 "
      "GROUP BY A.a1, B.b2 HAVING a1 > 50 ORDER BY a1";
  auto parse_tree = parser::PostgresParser::BuildParseTree(select_sql);
  auto statement = parse_tree->GetStatements()[0];
  binder_->BindNameToNode(common::ManagedPointer(parse_tree), nullptr, nullptr);
  auto select_stmt = statement.CastManagedPointerTo<parser::SelectStatement>();
  EXPECT_EQ(0, select_stmt->GetDepth());
  // Check select_list
  BINDER_LOG_DEBUG("Checking select list");
  auto col_expr = select_stmt->GetSelectColumns()[0].CastManagedPointerTo<parser::ColumnValueExpression>();
  EXPECT_EQ(col_expr->GetDatabaseOid(), db_oid_);              // A.a1
  EXPECT_EQ(col_expr->GetTableOid(), table_a_oid_);            // A.a1
  EXPECT_EQ(col_expr->GetColumnOid(), catalog::col_oid_t(1));  // A.a1; columns are indexed from 1
  EXPECT_EQ(execution::sql::SqlTypeId::Integer, col_expr->GetReturnValueType());
  EXPECT_EQ(0, col_expr->GetDepth());

此外就不一一列举了

src(源码)

目录基本和Test差不多

接下来我挑选几个我感兴趣的部分看看

Binder

以访问者模式实现SQL的解析

namespace noisepage {
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::AggregateExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::CaseExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::ColumnValueExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::ComparisonExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::ConjunctionExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::ConstantValueExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::DefaultValueExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::DerivedValueExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}
void binder::SqlNodeVisitor::Visit(common::ManagedPointer<parser::FunctionExpression> expr) {
  expr->AcceptChildren(common::ManagedPointer(this));
}

Execution

execution下面还有诸多子一级文件夹

vm文件夹下边有很多关于bytecode的文件(感觉是Noispage有自己的一套ByteCode体系?)

BytecodeEmitter的部分代码如下

void BytecodeEmitter::EmitDeref(Bytecode bytecode, LocalVar dest, LocalVar src) {
  NOISEPAGE_ASSERT(bytecode == Bytecode::Deref1 || bytecode == Bytecode::Deref2 || bytecode == Bytecode::Deref4 ||
                       bytecode == Bytecode::Deref8,
                   "Bytecode is not a Deref code");
  EmitAll(bytecode, dest, src);
}
void BytecodeEmitter::EmitDerefN(LocalVar dest, LocalVar src, uint32_t len) {
  EmitAll(Bytecode::DerefN, dest, src, len);
}
void BytecodeEmitter::EmitAssign(Bytecode bytecode, LocalVar dest, LocalVar src) {
  NOISEPAGE_ASSERT(bytecode == Bytecode::Assign1 || bytecode == Bytecode::Assign2 || bytecode == Bytecode::Assign4 ||
                       bytecode == Bytecode::Assign8,
                   "Bytecode is not an Assign code");
  EmitAll(bytecode, dest, src);
}
void BytecodeEmitter::EmitAssignImm1(LocalVar dest, int8_t val) { EmitAll(Bytecode::AssignImm1, dest, val); }
void BytecodeEmitter::EmitAssignImm2(LocalVar dest, int16_t val) { EmitAll(Bytecode::AssignImm2, dest, val); }
void BytecodeEmitter::EmitAssignImm4(LocalVar dest, int32_t val) { EmitAll(Bytecode::AssignImm4, dest, val); }
void BytecodeEmitter::EmitAssignImm8(LocalVar dest, int64_t val) { EmitAll(Bytecode::AssignImm8, dest, val); }
void BytecodeEmitter::EmitAssignImm4F(LocalVar dest, float val) { EmitAll(Bytecode::AssignImm4F, dest, val); }
void BytecodeEmitter::EmitAssignImm8F(LocalVar dest, double val) { EmitAll(Bytecode::AssignImm8F, dest, val); }
void BytecodeEmitter::EmitUnaryOp(Bytecode bytecode, LocalVar dest, LocalVar input) { EmitAll(bytecode, dest, input); }
void BytecodeEmitter::EmitBinaryOp(Bytecode bytecode, LocalVar dest, LocalVar lhs, LocalVar rhs) {
  EmitAll(bytecode, dest, lhs, rhs);
}

而在llvm_engine.cpp中,对于类型Type有明确的映射(我想也可以完成ByteCode往LLVM IR的映射?)

class LLVMEngine::TypeMap {
 public:
  explicit TypeMap(llvm::Module *module) : module_(module) {
    llvm::LLVMContext &ctx = module->getContext();
    type_map_["nil"] = llvm::Type::getVoidTy(ctx);
    type_map_["bool"] = llvm::Type::getInt8Ty(ctx);
    type_map_["int8"] = llvm::Type::getInt8Ty(ctx);
    type_map_["int16"] = llvm::Type::getInt16Ty(ctx);
    type_map_["int32"] = llvm::Type::getInt32Ty(ctx);
    type_map_["int64"] = llvm::Type::getInt64Ty(ctx);
    type_map_["int128"] = llvm::Type::getInt128Ty(ctx);
    type_map_["uint8"] = llvm::Type::getInt8Ty(ctx);
    type_map_["uint16"] = llvm::Type::getInt16Ty(ctx);
    type_map_["uint32"] = llvm::Type::getInt32Ty(ctx);
    type_map_["uint64"] = llvm::Type::getInt64Ty(ctx);
    type_map_["uint128"] = llvm::Type::getInt128Ty(ctx);
    type_map_["float32"] = llvm::Type::getFloatTy(ctx);
    type_map_["float64"] = llvm::Type::getDoubleTy(ctx);
    type_map_["string"] = llvm::Type::getInt8PtrTy(ctx);
  }
  /** No copying or moving this class. */
  DISALLOW_COPY_AND_MOVE(TypeMap);
  llvm::Type *VoidType() { return type_map_["nil"]; }
  llvm::Type *BoolType() { return type_map_["bool"]; }
  llvm::Type *Int8Type() { return type_map_["int8"]; }
  llvm::Type *Int16Type() { return type_map_["int16"]; }
  llvm::Type *Int32Type() { return type_map_["int32"]; }
  llvm::Type *Int64Type() { return type_map_["int64"]; }
  llvm::Type *UInt8Type() { return type_map_["uint8"]; }
  llvm::Type *UInt16Type() { return type_map_["uint16"]; }
  llvm::Type *UInt32Type() { return type_map_["uint32"]; }
  llvm::Type *UInt64Type() { return type_map_["uint64"]; }
  llvm::Type *Float32Type() { return type_map_["float32"]; }
  llvm::Type *Float64Type() { return type_map_["float64"]; }
  llvm::Type *StringType() { return type_map_["string"]; }
  llvm::Type *GetLLVMType(const ast::Type *type);
 private:
  // Given a non-primitive builtin type, convert it to an LLVM type
  llvm::Type *GetLLVMTypeForBuiltin(const ast::BuiltinType *builtin_type);
  // Given a struct type, convert it into an equivalent LLVM struct type
  llvm::StructType *GetLLVMStructType(const ast::StructType *struct_type);
  // Given a TPL function type, convert it into an equivalent LLVM function type
  llvm::FunctionType *GetLLVMFunctionType(const ast::FunctionType *func_type);
 private:
  llvm::Module *module_;
  std::unordered_map<std::string, llvm::Type *> type_map_;
};

从AST完成向LLVM Struct的构建

llvm::StructType *LLVMEngine::TypeMap::GetLLVMStructType(const ast::StructType *struct_type) {
  // Collect the fields here
  llvm::SmallVector<llvm::Type *, 8> fields;
  for (const auto &field : struct_type->GetAllFields()) {
    fields.push_back(GetLLVMType(field.type_));
  }
  return llvm::StructType::create(fields);
}

从下面关于LLVM的TargetMachine可以看到,要想向量化操作还是依赖编译器的自动向量化

LVMEngine::CompiledModuleBuilder::CompiledModuleBuilder(const CompilerOptions &options,
                                                         const BytecodeModule &tpl_module)
    : options_(options),
      tpl_module_(tpl_module),
      target_machine_(nullptr),
      context_(std::make_unique<llvm::LLVMContext>()),
      llvm_module_(nullptr),
      type_map_(nullptr) {
  //
  // We need to create a suitable TargetMachine for LLVM to before we can JIT
  // TPL programs. At the moment, we rely on LLVM to discover all CPU features
  // e.g., AVX2 or AVX512, and we make no assumptions about symbol relocations.
  //
  // TODO(pmenon): This may change with LLVM8 that comes with
  // TargetMachineBuilders
  // TODO(pmenon): Alter the flags as need be
  //
  const std::string target_triple = llvm::sys::getProcessTriple();
  {
    std::string error;
    auto *target = llvm::TargetRegistry::lookupTarget(target_triple, error);
    if (target == nullptr) {
      EXECUTION_LOG_ERROR("LLVM: Unable to find target with target_triple {}", target_triple);
      return;
    }
    // Collect CPU features
    llvm::StringMap<bool> feature_map;
    if (bool success = llvm::sys::getHostCPUFeatures(feature_map); !success) {
      EXECUTION_LOG_ERROR("LLVM: Unable to find all CPU features");
      return;
    }
    llvm::SubtargetFeatures target_features;
    for (const auto &entry : feature_map) {
      target_features.AddFeature(entry.getKey(), entry.getValue());
    }
    EXECUTION_LOG_TRACE("LLVM: Discovered CPU features: {}", target_features.getString());
    // Both relocation=PIC or JIT=true work. Use the latter for now.
    llvm::TargetOptions target_options;
    llvm::Optional<llvm::Reloc::Model> reloc;
    const llvm::CodeGenOpt::Level opt_level = llvm::CodeGenOpt::Aggressive;
    target_machine_.reset(target->createTargetMachine(target_triple, llvm::sys::getHostCPUName(),
                                                      target_features.getString(), target_options, reloc, {}, opt_level,
                                                      true));
    NOISEPAGE_ASSERT(target_machine_ != nullptr, "LLVM: Unable to find a suitable target machine!");
  }
  //
  // We've built a TargetMachine we use to generate machine code. Now, we
  // load the pre-compiled bytecode module containing all the TPL bytecode
  // logic. We add the functions we're about to compile into this module. This
  // module forms the unit of JIT.
  //
  {
    auto memory_buffer = llvm::MemoryBuffer::getFile(GetEngineSettings()->GetBytecodeHandlersBcPath());
    if (auto error = memory_buffer.getError()) {
      EXECUTION_LOG_ERROR("There was an error loading the handler bytecode: {}", error.message());
    }
    auto module = llvm::parseBitcodeFile(*(memory_buffer.get()), *context_);
    if (!module) {
      auto error = llvm::toString(module.takeError());
      EXECUTION_LOG_ERROR("{}", error);
      throw std::runtime_error(error);
    }
    llvm_module_ = std::move(module.get());
    llvm_module_->setModuleIdentifier(tpl_module.GetName());
    llvm_module_->setSourceFileName(tpl_module.GetName() + ".tpl");
    llvm_module_->setDataLayout(target_machine_->createDataLayout());
    llvm_module_->setTargetTriple(target_triple);
  }
  type_map_ = std::make_unique<TypeMap>(llvm_module_.get());
}

不太理解为什么还需要重新生成CFG,也许是发现函数过于复杂要拆分?我看更多是关于分支和LLVM Block联系的相关代码

void LLVMEngine::CompiledModuleBuilder::BuildSimpleCFG(const FunctionInfo &func_info,
                                                       std::map<std::size_t, llvm::BasicBlock *> *blocks) {
  // Before we can generate LLVM IR, we need to build a control-flow graph (CFG) for the function.
  // We do this construction directly from the TPL bytecode using a vanilla DFS and produce an
  // ordered map ('blocks') from bytecode position to an LLVM basic block. Each entry in the map
  // indicates the start of a basic block.
  // We use this vector as a stack for DFS traversal
  llvm::SmallVector<std::size_t, 16> bb_begin_positions = {0};
  for (auto iter = tpl_module_.GetBytecodeForFunction(func_info); !bb_begin_positions.empty();) {
    std::size_t begin_pos = bb_begin_positions.back();
    bb_begin_positions.pop_back();
    // We're at what we think is the start of a new basic block. Scan it until we find a terminal
    // instruction. Once we do,
    for (iter.SetPosition(begin_pos); !iter.Done(); iter.Advance()) {
      Bytecode bytecode = iter.CurrentBytecode();
      // If the bytecode isn't a terminal for the block, continue until we reach one
      if (!Bytecodes::IsTerminal(bytecode)) {
        continue;
      }
      // Return?
      if (Bytecodes::IsReturn(bytecode)) {
        break;
      }
      // Unconditional branch?
      if (Bytecodes::IsUnconditionalJump(bytecode)) {
        std::size_t branch_target_pos =
            iter.GetPosition() + Bytecodes::GetNthOperandOffset(bytecode, 0) + iter.GetJumpOffsetOperand(0);
        if (blocks->find(branch_target_pos) == blocks->end()) {
          (*blocks)[branch_target_pos] = nullptr;
          bb_begin_positions.push_back(branch_target_pos);
        }
        break;
      }
      // Conditional branch?
      if (Bytecodes::IsConditionalJump(bytecode)) {
        std::size_t fallthrough_pos = iter.GetPosition() + iter.CurrentBytecodeSize();
        if (blocks->find(fallthrough_pos) == blocks->end()) {
          bb_begin_positions.push_back(fallthrough_pos);
          (*blocks)[fallthrough_pos] = nullptr;
        }
        std::size_t branch_target_pos =
            iter.GetPosition() + Bytecodes::GetNthOperandOffset(bytecode, 1) + iter.GetJumpOffsetOperand(1);
        if (blocks->find(branch_target_pos) == blocks->end()) {
          bb_begin_positions.push_back(branch_target_pos);
          (*blocks)[branch_target_pos] = nullptr;
        }
        break;
      }
    }
  }
}

optimize()处放上LLVM的相关优化

void LLVMEngine::CompiledModuleBuilder::Optimize() {
  llvm::legacy::FunctionPassManager function_passes(llvm_module_.get());
  // Add the appropriate TargetTransformInfo.
  function_passes.add(llvm::createTargetTransformInfoWrapperPass(target_machine_->getTargetIRAnalysis()));
  // Build up optimization pipeline.
  llvm::PassManagerBuilder pm_builder;
  uint32_t opt_level = 3;
  uint32_t size_opt_level = 0;
  bool disable_inline_hot_call_site = false;
  pm_builder.OptLevel = opt_level;
  pm_builder.Inliner = llvm::createFunctionInliningPass(opt_level, size_opt_level, disable_inline_hot_call_site);
  pm_builder.populateFunctionPassManager(function_passes);
  // Add custom passes. Hand-selected based on empirical evaluation.
  function_passes.add(llvm::createInstructionCombiningPass());
  function_passes.add(llvm::createReassociatePass());
  function_passes.add(llvm::createGVNPass());
  function_passes.add(llvm::createCFGSimplificationPass());
  function_passes.add(llvm::createAggressiveDCEPass());
  function_passes.add(llvm::createCFGSimplificationPass());
  // Run optimization passes on all functions.
  function_passes.doInitialization();
  for (llvm::Function &func : *llvm_module_) {
    function_passes.run(func);
  }
  function_passes.doFinalization();
}

util下面的cpu_info用于读取机器信息(但感觉只是用来确定CPU三级缓存大小)

void CpuInfo::InitCacheInfo() {
#ifdef __APPLE__
  // Lookup cache sizes.
  std::size_t len = 0;
  sysctlbyname("hw.cachesize", nullptr, &len, nullptr, 0);
  auto data = std::make_unique<uint64_t[]>(len);
  sysctlbyname("hw.cachesize", data.get(), &len, nullptr, 0);
  NOISEPAGE_ASSERT(len / sizeof(uint64_t) >= 3, "Expected three levels of cache!");
  // Copy data
  for (uint32_t idx = 0; idx < K_NUM_CACHE_LEVELS; idx++) {
    cache_sizes_[idx] = data[idx];
  }
  // Lookup cache line sizes.
  std::size_t linesize;
  std::size_t sizeof_linesize = sizeof(linesize);
  sysctlbyname("hw.cachelinesize", &linesize, &sizeof_linesize, nullptr, 0);
  for (auto &cache_line_size : cache_line_sizes_) {
    cache_line_size = linesize;
  }
#else
  // Use sysconf to determine cache sizes.
  cache_sizes_[L1_CACHE] = static_cast<uint32_t>(sysconf(_SC_LEVEL1_DCACHE_SIZE));
  cache_sizes_[L2_CACHE] = static_cast<uint32_t>(sysconf(_SC_LEVEL2_CACHE_SIZE));
  cache_sizes_[L3_CACHE] = static_cast<uint32_t>(sysconf(_SC_LEVEL3_CACHE_SIZE));
  cache_line_sizes_[L1_CACHE] = static_cast<uint32_t>(sysconf(_SC_LEVEL1_DCACHE_LINESIZE));
  cache_line_sizes_[L2_CACHE] = static_cast<uint32_t>(sysconf(_SC_LEVEL2_CACHE_LINESIZE));
  cache_line_sizes_[L3_CACHE] = static_cast<uint32_t>(sysconf(_SC_LEVEL3_CACHE_LINESIZE));
#endif
}

一个暂时无法使用的csv_reader.cpp

// #include "execution/util/csv_reader.h" Fix later.

file.cpp直接通过POSIX接口写入文件

int32_t File::Write(const std::byte *data, std::size_t len) const {
  NOISEPAGE_ASSERT(IsOpen(), "File must be open before reading");
  return HANDLE_EINTR(write(fd_, data, len));
}
int64_t File::Seek(File::Whence whence, int64_t offset) const {
  static_assert(sizeof(int64_t) == sizeof(off_t), "off_t must be 64 bits");
  return lseek(fd_, static_cast<off_t>(offset), static_cast<int32_t>(whence));
}
bool File::Flush() const {
#if defined(OS_LINUX)
  return HANDLE_EINTR(fdatasync(fd_)) == 0;
#else
  return HANDLE_EINTR(fsync(fd_)) == 0;
#endif
}
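
The HANDLE_EINTR macro is not shown in the excerpt. Wrappers with this name usually retry a system call while it fails with EINTR, and the sketch below assumes that behavior. WriteRetryingOnEintr and WriteAll are hypothetical names; WriteAll additionally handles short writes, which the excerpt's File::Write leaves to the caller.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical stand-in for a HANDLE_EINTR-style wrapper around write():
// retry only when the call was interrupted by a signal.
ssize_t WriteRetryingOnEintr(int fd, const void *data, std::size_t len) {
  ssize_t result;
  do {
    result = write(fd, data, len);
  } while (result == -1 && errno == EINTR);
  return result;
}

// write() may accept fewer bytes than requested, so keep going until
// everything is written or a real error occurs.
bool WriteAll(int fd, const void *data, std::size_t len) {
  assert(data != nullptr || len == 0);
  const char *cursor = static_cast<const char *>(data);
  while (len > 0) {
    const ssize_t written = WriteRetryingOnEintr(fd, cursor, len);
    if (written < 0) {
      return false;
    }
    cursor += written;
    len -= static_cast<std::size_t>(written);
  }
  return true;
}
```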

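vector_util.cpp, on the other hand, contains the AVX vectorized operations. In other words, vectorization is not performed by the JIT: the vectorized routines are written in advance and then handed to the JIT engine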

uint32_t VectorUtil::BitVectorToSelectionVectorDenseAvX2(const uint64_t *bit_vector, uint32_t num_bits,
                                                         sel_t *sel_vector) {
  // Vector of '8's = [8,8,8,8,8,8,8,8]
  const auto eight = _mm_set1_epi16(8);
  // Selection vector write index
  auto idx = _mm_set1_epi16(0);
  // Selection vector size
  uint32_t k = 0;
  const uint32_t num_words = common::MathUtil::DivRoundUp(num_bits, 64);
  for (uint32_t i = 0; i < num_words; i++) {
    uint64_t word = bit_vector[i];
    for (uint32_t j = 0; j < 8; j++) {
      const auto mask = static_cast<uint8_t>(word);
      word >>= 8u;
      const __m128i match_pos_scaled =
          _mm_loadl_epi64(reinterpret_cast<const __m128i *>(&simd::K8_BIT_MATCH_LUT[mask]));
      const __m128i match_pos = _mm_cvtepi8_epi16(match_pos_scaled);
      const __m128i pos_vec = _mm_add_epi16(idx, match_pos);
      idx = _mm_add_epi16(idx, eight);
      _mm_storeu_si128(reinterpret_cast<__m128i *>(sel_vector + k), pos_vec);
      k += BitUtil::CountPopulation(static_cast<uint32_t>(mask));
    }
  }
  return k;
}
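
To make the job of the AVX2 routine easier to follow, here is a scalar sketch that should produce the same selection vector: the positions of all set bits, in ascending order. It assumes sel_t is a 16-bit unsigned integer (suggested by the epi16 intrinsics above, but not confirmed by the excerpt) and relies on the GCC/Clang builtin __builtin_ctzll. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using sel_t = uint16_t;  // Assumption: 16-bit selection indexes.

// Scalar reference: write the index of every set bit into sel_vector and
// return how many indexes were written.
uint32_t BitVectorToSelectionVectorScalar(const uint64_t *bit_vector, uint32_t num_bits,
                                          sel_t *sel_vector) {
  assert(num_bits <= 65536);  // Every index must fit in sel_t.
  uint32_t k = 0;
  const uint32_t num_words = (num_bits + 63) / 64;
  for (uint32_t i = 0; i < num_words; i++) {
    uint64_t word = bit_vector[i];
    while (word != 0) {
      // Index of the lowest set bit in this word.
      const auto bit = static_cast<uint32_t>(__builtin_ctzll(word));
      sel_vector[k++] = static_cast<sel_t>(i * 64 + bit);
      word &= word - 1;  // Clear the lowest set bit.
    }
  }
  return k;
}
```

The scalar loop costs time proportional to the number of set bits, which suits sparse bit vectors. The AVX2 version instead does a fixed amount of work per byte and always stores eight slots at a time, so its output buffer appears to need room for num_bits rounded up to a multiple of 64 rather than only for the matches.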

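memorypool.cpp under the sql folder allocates memory with std::calloc and std::malloc. As far as I remember, jemalloc is enabled in CMakeLists.txt, yet it does not seem to be called explicitly here. Note that when jemalloc is linked in, it normally takes over malloc and calloc themselves, so these calls may still end up in jemalloc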

void *MemoryPool::AllocateAligned(const std::size_t size, const std::size_t alignment, const bool clear) {
  void *buf = nullptr;
  if (size >= mmap_threshold.load(std::memory_order_relaxed)) {
    buf = util::Memory::MallocHuge(size, true);
    NOISEPAGE_ASSERT(buf != nullptr, "Null memory pointer");
    // No need to clear memory, guaranteed on Linux
  } else {
    if (alignment < MIN_MALLOC_ALIGNMENT) {
      if (clear) {
        buf = std::calloc(size, 1);
      } else {
        buf = std::malloc(size);
      }
    } else {
      buf = util::Memory::MallocAligned(size, alignment);
      if (clear) {
        std::memset(buf, 0, size);
      }
    }
  }
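
The excerpt above is cut off and its helpers (util::Memory::MallocAligned, MallocHuge) are not shown. As a rough sketch of the small-allocation branch only, the version below assumes POSIX posix_memalign for the over-aligned case and uses alignof(std::max_align_t) as the threshold. AllocateAlignedSketch and kMinMallocAlignment are hypothetical names, and the huge-page path is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <stdlib.h>  // posix_memalign (POSIX)

// Sketch: plain malloc/calloc when the default alignment is enough,
// posix_memalign plus an optional memset otherwise.
void *AllocateAlignedSketch(std::size_t size, std::size_t alignment, bool clear) {
  // Alignment must be a non-zero power of two.
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  // Assumption: malloc already guarantees this much alignment.
  constexpr std::size_t kMinMallocAlignment = alignof(std::max_align_t);
  void *buf = nullptr;
  if (alignment <= kMinMallocAlignment) {
    buf = clear ? std::calloc(size, 1) : std::malloc(size);
  } else {
    if (posix_memalign(&buf, alignment, size) != 0) {
      return nullptr;
    }
    if (clear) {
      std::memset(buf, 0, size);
    }
  }
  return buf;  // Release with std::free().
}
```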

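The self_driving folder contains the implementation of the hot/cold switching mentioned in the paper. Judging from the Script code, the models can apparently be tuned from Python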

About Arrow

Almost everything in storage is related to Arrow

So this thing uses both Arrow and LLVM?

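Within each block, data is organized in the PAX (hybrid row/column) format: all column data of a tuple lives in the same block. Each block has a layout object made up of three parts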
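
To illustrate the PAX idea itself, here is a small sketch that places each column contiguously inside one block and computes where a given tuple's value lives. The struct, its field names, and the 8-byte column alignment are all assumptions made for illustration; they are not taken from NoisePage's storage code and are not meant to reproduce its layout object.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical PAX-style layout: every column is a contiguous array
// within the block, one entry per tuple slot.
struct PaxLayoutSketch {
  uint32_t num_slots;                    // Tuples that fit in one block.
  std::vector<uint16_t> attr_sizes;      // Size in bytes of each column's values.
  std::vector<uint32_t> column_offsets;  // Start of each column within the block.
};

inline uint32_t AlignUp8(uint32_t x) { return (x + 7u) & ~7u; }

PaxLayoutSketch MakeLayout(uint32_t num_slots, std::vector<uint16_t> attr_sizes) {
  PaxLayoutSketch layout{num_slots, std::move(attr_sizes), {}};
  uint32_t offset = 0;
  for (uint16_t size : layout.attr_sizes) {
    layout.column_offsets.push_back(offset);
    // The next column starts after this one, rounded up to 8 bytes.
    offset = AlignUp8(offset + size * num_slots);
  }
  return layout;
}

// Byte offset, within the block, of column `col` for the tuple in `slot`.
uint32_t ValueOffset(const PaxLayoutSketch &layout, uint32_t col, uint32_t slot) {
  assert(col < layout.column_offsets.size() && slot < layout.num_slots);
  return layout.column_offsets[col] + slot * layout.attr_sizes[col];
}
```

With 10 slots and column sizes of 8, 4, and 1 bytes, the columns start at offsets 0, 80, and 120, so a scan over one column touches a single contiguous range while a whole tuple still stays inside one block.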

Summary

The code quality is high; wherever something is unclear, reading the comments is usually enough to figure it out

However, because the LLVM version it depends on is so old, I am not sure whether the program can still be built and run