From Code to Graphs

Abstract Syntax Tree (AST)

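An AST describes which structures a program is built from and how they nest. It does not keep every surface-syntax detail, such as parentheses, semicolons, and some syntactic sugar.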

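The paper describes the AST as an ordered tree. Internal nodes are usually operators or syntactic constructs, and leaf nodes are usually operands such as constants and identifiers.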

Joern AST

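In the Joern CPG, AST children are connected to their parent by AST edges. Nodes usually carry:

CODE: the source code fragment the node corresponds to
ORDER: the node's position among its siblings
LINE_NUMBER: the starting line number
a node type, e.g. METHOD, BLOCK, CALL, IDENTIFIER, LITERAL, CONTROL_STRUCTURE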

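Strengths

Clear syntactic structure
- The tree is hierarchical, so its shape fixes operator precedence and associativity
Well suited to spotting local vulnerability patterns
- Calls to dangerous functions can be found quickly
Easy to generate and relatively stable in structure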

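In other words, the AST is best suited to flagging vulnerability candidates that are closely tied to syntax. Examples include local dangerous-API patterns, pointer and array access patterns, arithmetic expression patterns, and abnormal type conversions.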
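Here is a minimal sketch of the local-pattern idea. It scans AST nodes for CALL nodes whose callee name is on a watch list. The sketch assumes the nodes have already been exported from Joern as `(label, name, line)` tuples. Joern's CALL nodes carry the callee in their NAME property, and operators get names such as `<operator>.assignment`. The `DANGEROUS_APIS` set is a hypothetical example, not a vetted list.

```python
# Hypothetical watch list; a real detector would use a curated list.
DANGEROUS_APIS = {"memcpy", "strcpy", "strcat", "sprintf", "gets"}


def find_dangerous_calls(ast_nodes):
    """Return (line, name) for every CALL node whose callee is on the watch list.

    ast_nodes: iterable of (label, name, line) tuples, e.g. exported from Joern.
    """
    hits = []
    for label, name, line in ast_nodes:
        # Operators (assignment, +, >> ...) are CALL nodes too, but their
        # names never match a library function, so they are skipped here.
        if label == "CALL" and name in DANGEROUS_APIS:
            hits.append((line, name))
    return hits


nodes = [
    ("LOCAL", "buf", 2),
    ("CALL", "<operator>.assignment", 3),
    ("CALL", "memcpy", 5),
    ("IDENTIFIER", "n", 5),
]
print(find_dangerous_calls(nodes))  # [(5, 'memcpy')]
```

The hits are only candidates. The AST alone cannot tell whether the length argument was checked beforehand.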

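Limitations

An AST shows what the code looks like, but not how it runs or how data flows through it. It does not explicitly encode control flow or data dependencies. On its own, it is therefore unsuitable for more complex tasks such as dead-code detection or uninitialized-variable analysis.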

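Control Flow Graph (CFG)

A control flow graph is a directed graph that records which statements may execute after the current one finishes. Its nodes are basic blocks, and its edges are transfers of control (e.g. if/else, while, goto). Together they describe all possible execution paths of the program.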
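To make "basic block" concrete, here is a minimal sketch that groups a statement-level CFG into maximal straight-line runs. The CFG is given as a hypothetical successor dictionary. A node starts a new block (it is a "leader") in three cases:

- it is the entry;
- it does not have exactly one predecessor;
- it is the target of a branch.

```python
def basic_blocks(succ, entry):
    """Split a CFG into basic blocks.

    succ: dict mapping every node to its list of successors (exit maps to []).
    Returns a list of blocks, each a list of nodes in execution order.
    """
    # Count the predecessors of every node.
    preds = {n: 0 for n in succ}
    for outs in succ.values():
        for m in outs:
            preds[m] += 1

    # Leaders: the entry, every branch target, every node without exactly one predecessor.
    leaders = {entry}
    for outs in succ.values():
        if len(outs) > 1:
            leaders.update(outs)
    leaders.update(n for n, c in preds.items() if c != 1)

    blocks = []
    for node in succ:  # dicts keep insertion order, so the output is deterministic
        if node not in leaders:
            continue
        block = [node]
        cur = node
        # Extend the block while control falls through to a non-leader.
        while len(succ[cur]) == 1 and succ[cur][0] not in leaders:
            cur = succ[cur][0]
            block.append(cur)
        blocks.append(block)
    return blocks


cfg = {
    "ENTRY": ["buf[16]"],
    "buf[16]": ["n > 0"],
    "n > 0": ["memcpy(buf, src, n)", "EXIT"],
    "memcpy(buf, src, n)": ["EXIT"],
    "EXIT": [],
}
print(basic_blocks(cfg, "ENTRY"))
# [['ENTRY', 'buf[16]', 'n > 0'], ['memcpy(buf, src, n)'], ['EXIT']]
```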

Joern CFG

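Joern's CFG is built over a subset of the AST nodes. Each CFG edge means that control flows from its source node to its destination node. Joern also models the evaluation order inside expressions, not just the order between statements. As a result, its CFG nodes are individual expressions rather than basic blocks.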

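Strengths

Makes possible execution paths explicit
- The presence of a risky function does not by itself mean the program is vulnerable. A dangerous operation should only be reported if it is reachable under conditions that can actually hold
Well suited to analyzing checking logic
- For example: bounds checks, null-pointer checks, permission checks, error handling, resource-release paths, and missing conditions
Well suited to extracting execution-path sequences
- A CFG can split one function into several potential execution paths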

Limitations

A CFG knows what executes first, but not which statement passes data to which. For example:

int size = get_user_input();
int x = 0;
int y = calculate();
log(y);
memcpy(buf, src, size);

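The CFG fully preserves the execution order:

get_user_input → x = 0 → calculate → log → memcpy

However, it does not state that the only statement relevant to the memcpy risk is the one defining size, not x or y. In long functions, the many statements unrelated to the vulnerability therefore become noise.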

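Program Dependence Graph (PDG)

A PDG no longer asks what executes next. Instead, it asks whether one statement can be affected by another. A PDG is usually composed of a DDG (data dependence graph) and a CDG (control dependence graph).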

DDG

For example:

int n = get_user_input();
int size = n * 2;
memcpy(buf, src, size);

The data dependencies can be drawn as:

n = get_user_input()
        ↓  n
size = n * 2
        ↓  size
memcpy(buf, src, size)

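Here size = n * 2 depends on the value of n produced by get_user_input(), and memcpy in turn depends on size = n * 2. Joern calls such edges REACHING_DEF. The variable defined at the source node reaches the destination node without being reassigned along the way.

Each REACHING_DEF edge also carries a VARIABLE property that records which variable is propagated.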
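For straight-line code without branches, REACHING_DEF edges can be computed in a single forward pass that remembers the last definition of each variable. The sketch below illustrates that idea in simplified form. It is not Joern's actual algorithm, which handles branches and loops with a data-flow fixpoint. Each statement is described by a hypothetical `(defs, uses)` pair of lists.

```python
def reaching_def_edges(stmts):
    """Compute (src, dst, variable) edges for straight-line code.

    stmts: list of (defs, uses) in execution order, where defs and uses are
    lists of variable names. An edge src -> dst labelled v means the value of v
    defined at statement src is used at statement dst without redefinition.
    """
    last_def = {}
    edges = []
    for i, (defs, uses) in enumerate(stmts):
        # Resolve uses before this statement's own definitions,
        # so `n = n + 1` reads the previous definition of n.
        for var in uses:
            if var in last_def:
                edges.append((last_def[var], i, var))
        for var in defs:
            last_def[var] = i  # a new definition kills the previous one
    return edges


stmts = [
    (["n"], []),                   # 0: int n = get_user_input();
    (["size"], ["n"]),             # 1: int size = n * 2;
    ([], ["buf", "src", "size"]),  # 2: memcpy(buf, src, size);
]
print(reaching_def_edges(stmts))  # [(0, 1, 'n'), (1, 2, 'size')]
```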

CDG

For example:

if (n > 0) {
    memcpy(buf, src, n);
}

Whether memcpy executes depends on n > 0, so memcpy is control dependent on the condition:

if (n > 0)
     ↓ CDG
memcpy(buf, src, n)

In Joern, this relation is represented by a CDG edge, meaning that the destination node is control dependent on the source node.
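Control dependence is classically derived from post-dominators. A node Y is control dependent on a branch X if both of these hold:

- Y post-dominates one successor of X;
- Y does not strictly post-dominate X itself.

The sketch below implements that textbook definition on a small successor dictionary. It is an illustration under simplifying assumptions (a single exit node, and every node can reach it), not Joern's implementation.

```python
def post_dominators(succ, exit_node):
    """Iteratively compute the set of post-dominators of every node."""
    nodes = list(succ)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == exit_node:
                continue
            outs = succ[n]
            # Intersection over successors (returns a new set, no aliasing).
            new = set.intersection(*(pdom[s] for s in outs)) if outs else set()
            new.add(n)
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def control_dependences(succ, exit_node):
    """Return sorted (controller, dependent) pairs, i.e. CDG edges."""
    pdom = post_dominators(succ, exit_node)
    cdg = set()
    for x, outs in succ.items():
        if len(outs) < 2:
            continue  # only branching nodes can control other nodes
        for s in outs:
            for y in pdom[s]:
                # y post-dominates successor s but does not strictly post-dominate x
                if y == x or y not in pdom[x]:
                    cdg.add((x, y))
    return sorted(cdg)


cfg = {"ENTRY": ["n > 0"], "n > 0": ["memcpy", "EXIT"], "memcpy": ["EXIT"], "EXIT": []}
print(control_dependences(cfg, "EXIT"))  # [('n > 0', 'memcpy')]
```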

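Strengths

Can connect statements that are far apart in the source
Compresses away irrelevant code
Well suited to taint propagation and program slicing

Limitations

Loses the complete execution order
- A PDG encodes dependencies, not ordering. If two statements have neither a data nor a control dependency, the PDG may not connect them at all.
Static analysis introduces approximation errors
- Suppose a pointer's target comes from a function's return value and cannot be determined statically. The analysis must then conservatively assume that the pointer may point to any of several candidate objects, which produces multiple dependency edges. At run time, however, the pointer points to only one of them. This is known as over-approximation.
Higher construction cost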

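Data Preprocessing

Whether the model is a CNN or a GNN, the CPG produced by Joern must first go through three steps: selecting nodes and edges, normalizing node text, and converting the text into vectors.

Source code
  ↓
Joern parsing
  ↓
CPG
  ↓
Select nodes and edges
  ↓
Normalize node text
  ↓
Convert text to vectors

We skip the Joern parsing step here. Each sample in the F&Q dataset is already a single, self-contained function, so selecting the target function can also be skipped.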

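Normalizing Node Text

This means renaming variables uniformly to VAR_1, VAR_2, and so on. Normalization must not go too far, however: function names, operators, constants and the like must be preserved.

Converting Text to Vectors

Neural networks cannot process strings directly, so the text of each node must be encoded as a numeric vector. Two encoding methods are popular: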
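Here is a minimal normalization sketch. It assumes a simple regex tokenizer and a hypothetical, incomplete C keyword list. Identifiers are mapped to VAR_1, VAR_2, ... in order of first appearance. The following are kept unchanged:

- keywords;
- identifiers followed by "(", which are treated as function names;
- operators and literals.

One mapping is shared across the whole function, so the same variable always gets the same name.

```python
import re

# Identifiers, integer literals, common multi-character operators, then any single char.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|>>|<<|==|!=|<=|>=|->|&&|\|\||\S")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")
# Hypothetical, deliberately incomplete keyword list.
C_KEYWORDS = {"int", "char", "void", "const", "static", "inline", "unsigned",
              "if", "else", "while", "for", "return", "sizeof"}


def normalize(code, mapping):
    """Tokenize one node's code and rename variables using the shared mapping."""
    tokens = TOKEN_RE.findall(code)
    out = []
    for i, tok in enumerate(tokens):
        is_ident = IDENT_RE.fullmatch(tok) is not None
        is_call = i + 1 < len(tokens) and tokens[i + 1] == "("
        if is_ident and tok not in C_KEYWORDS and not is_call:
            if tok not in mapping:
                mapping[tok] = f"VAR_{len(mapping) + 1}"
            out.append(mapping[tok])
        else:
            out.append(tok)  # keyword, function name, operator or literal
    return out


names = {}
print(normalize("int size = n * 2;", names))
# ['int', 'VAR_1', '=', 'VAR_2', '*', '2', ';']
print(normalize("memcpy(buf, src, size);", names))
# ['memcpy', '(', 'VAR_3', ',', 'VAR_4', ',', 'VAR_1', ')', ';']
```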

Word2Vec

CALL memcpy
  ↓
Tokenize
  ↓
["CALL", "memcpy"]
  ↓
Average or pool the word vectors
  ↓
x_i ∈ R^128
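The Word2Vec route comes down to a lookup followed by averaging. In the sketch below, `word_vectors` is a plain dictionary standing in for a trained model; with gensim it would be the model's `wv` KeyedVectors. The toy 2-dimensional vectors are made up for illustration, whereas the text above uses 128 dimensions.

```python
def embed_node(node_text, word_vectors, dim):
    """Average the word vectors of a node's tokens; unknown tokens are ignored."""
    tokens = node_text.split()  # e.g. "CALL memcpy" -> ["CALL", "memcpy"]
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return [0.0] * dim  # every token is out of vocabulary: fall back to zeros
    # Column-wise mean over the token vectors.
    return [sum(col) / len(vecs) for col in zip(*vecs)]


toy_vectors = {"CALL": [1.0, 0.0], "memcpy": [0.0, 1.0]}  # made-up values
print(embed_node("CALL memcpy", toy_vectors, dim=2))  # [0.5, 0.5]
```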

CodeBERT

CALL memcpy
  ↓
Tokenizer
  ↓
CodeBERT
  ↓
Pooling
  ↓
x_i ∈ R^768
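The final pooling step turns CodeBERT's per-token hidden states (768 dimensions each) into a single node vector. Two common choices are:

- take the first token (<s>, CodeBERT's counterpart of [CLS]);
- average the non-padding tokens, using the attention mask.

The sketch below shows both on plain lists. It assumes the hidden states have already been obtained and does not show that step. With Hugging Face Transformers, for example, they would be `last_hidden_state` from `AutoModel.from_pretrained("microsoft/codebert-base")`.

```python
def cls_pool(hidden_states):
    """Use the first token's vector (<s> in CodeBERT, analogous to [CLS])."""
    return list(hidden_states[0])


def mean_pool(hidden_states, attention_mask):
    """Average the vectors of real tokens; padding positions (mask 0) are skipped."""
    kept = [h for h, m in zip(hidden_states, attention_mask) if m]
    return [sum(col) / len(kept) for col in zip(*kept)]


# Toy hidden states: 3 token positions, 2 dims; the last position is padding.
hidden = [[1.0, 1.0], [3.0, 5.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(cls_pool(hidden))         # [1.0, 1.0]
print(mean_pool(hidden, mask))  # [2.0, 3.0]
```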

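Feeding a CNN

A standard 1-D CNN expects a regular tensor $X \in \mathbb{R}^{L \times d}$, where L is the sequence length and d is the vector dimension at each position. A natural-language sentence, for example, is converted as follows:

I / like / deep / learning
↓
four word vectors
↓
X ∈ R^(4 × d)

A program graph, however, is not a sequence, for several reasons:

- a node can connect to several other nodes;
- different functions have different numbers of nodes;
- a CFG may contain cycles;
- PDG nodes have no natural linear order.

To use a CNN, we must therefore first impose a linearization strategy.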

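A text CNN uses convolution kernels to capture local combinations of adjacent tokens, then pools them into an overall representation. This architecture was first widely used for sentence-level classification.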
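To show numerically what "local combinations of adjacent tokens" means, here is a pure-Python sketch of one text-CNN branch. It applies a 1-D convolution over an L × d input, then a ReLU, then global max pooling over positions. It assumes that all kernels in one call share the same width. A TextCNN typically runs several widths in parallel and concatenates the pooled results. A real model would use a framework layer such as PyTorch's `nn.Conv1d` instead.

```python
def conv1d(X, kernels, bias):
    """Valid 1-D convolution followed by ReLU.

    X: L x d input (one row per token); kernels: list of k x d filters.
    Returns an (L - k + 1) x len(kernels) feature map.
    """
    k, d = len(kernels[0]), len(X[0])
    out = []
    for t in range(len(X) - k + 1):
        window = X[t:t + k]  # k adjacent token vectors
        row = []
        for K, b in zip(kernels, bias):
            s = sum(K[i][j] * window[i][j] for i in range(k) for j in range(d))
            row.append(max(0.0, s + b))  # ReLU
        out.append(row)
    return out


def global_max_pool(H):
    """Max over positions for each filter: one number per filter."""
    return [max(col) for col in zip(*H)]


X = [[1.0], [2.0], [3.0], [0.0]]  # L = 4 tokens, d = 1
kernels = [[[1.0], [1.0]]]        # one filter of width k = 2
H = conv1d(X, kernels, bias=[0.0])
print(H)                   # [[3.0], [5.0], [3.0]]
print(global_max_pool(H))  # [5.0]
```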

Consider an example with the following source code:

void copy_data(char *src, int n) {
    char buf[16];

    if (n > 0) {
        memcpy(buf, src, n);
    }
}

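The vulnerability is a stack buffer overflow. When n > 16, memcpy copies more bytes than the 16-byte buffer buf can hold.

This function can be represented in three intermediate forms: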

AST

METHOD: copy_data
└── BLOCK
    ├── LOCAL: buf[16]
    └── CONTROL_STRUCTURE: if
        ├── CONDITION: n > 0
        │   ├── IDENTIFIER: n
        │   └── LITERAL: 0
        └── CALL: memcpy
            ├── IDENTIFIER: buf
            ├── IDENTIFIER: src
            └── IDENTIFIER: n

CFG

ENTRY
  ↓
buf[16]
  ↓
n > 0 ?
  ├── True  → memcpy(buf, src, n) → EXIT
  └── False ─────────────────────→ EXIT

PDG

PARAMETER: n
  ├── REACHING_DEF → CONDITION: n > 0
  └── REACHING_DEF → ARGUMENT: n in memcpy

PARAMETER: src
  └── REACHING_DEF → ARGUMENT: src in memcpy

CONDITION: n > 0
  └── CDG → CALL: memcpy(buf, src, n)

Feeding the AST into a CNN

A pre-order traversal of the AST (root first, then children from left to right) yields a linear sequence:

METHOD copy_data
BLOCK
LOCAL buf[16]
CONTROL_STRUCTURE if
CALL >
IDENTIFIER n
LITERAL 0
CALL memcpy
IDENTIFIER buf
IDENTIFIER src
IDENTIFIER n

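This gives a sequence of length 11. If each node is embedded in 128 dimensions, the input has shape [11, 128].

This tensor can be fed directly into a CNN for convolution, which works as in any text CNN. Max or average pooling follows.

The advantage of this approach is simplicity, and the model can still pick up dangerous functions and expressions. The drawback is that flattening partly destroys the tree structure. For example, the model can no longer tell that buf, src and n are the arguments of memcpy. One remedy is to add structure markers to the sequence.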
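The linearization above can be reproduced with a short recursive traversal. The sketch encodes the copy_data AST as hypothetical `(text, children)` tuples. `preorder` yields the 11-token sequence. `preorder_with_markers` shows one possible form of structure marker: it wraps each subtree's children in "(" and ")", so the sequence still shows that buf, src and n sit under CALL memcpy.

```python
def preorder(node, out=None):
    """Root first, then children left to right."""
    if out is None:
        out = []
    text, children = node
    out.append(text)
    for child in children:
        preorder(child, out)
    return out


def preorder_with_markers(node, out=None):
    """Pre-order traversal that brackets each non-leaf node's children."""
    if out is None:
        out = []
    text, children = node
    out.append(text)
    if children:
        out.append("(")
        for child in children:
            preorder_with_markers(child, out)
        out.append(")")
    return out


def leaf(text):
    """A node without children."""
    return (text, [])


ast = ("METHOD copy_data", [
    ("BLOCK", [
        leaf("LOCAL buf[16]"),
        ("CONTROL_STRUCTURE if", [
            ("CALL >", [leaf("IDENTIFIER n"), leaf("LITERAL 0")]),
            ("CALL memcpy", [leaf("IDENTIFIER buf"), leaf("IDENTIFIER src"),
                             leaf("IDENTIFIER n")]),
        ]),
    ]),
])

seq = preorder(ast)
print(len(seq))  # 11
marked = preorder_with_markers(ast)
```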

Feeding the CFG into a CNN

A CFG is a directed graph, so control-flow paths can be extracted with a DFS. The example above yields two paths:

Path 1:
ENTRY → buf[16] → n > 0 → memcpy(buf, src, n) → EXIT

Path 2:
ENTRY → buf[16] → n > 0 → EXIT

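Each path is converted into a node sequence, giving two matrices of shape [5, 128] and [4, 128]. Both are fed into the same CNN, and the outputs are then aggregated.

This method captures local execution patterns along the control flow, but it suffers from path explosion. The number of paths can grow exponentially with the number of branches.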
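Here is a minimal DFS path enumerator over a successor dictionary. It assumes a single exit node and makes two illustrative simplifications:

- it never revisits a node already on the current path, so each loop is taken at most once;
- it stops after `max_paths` paths, a crude guard against path explosion.

```python
def enumerate_paths(succ, entry, exit_node, max_paths=1000):
    """Enumerate entry-to-exit paths by DFS.

    Nodes already on the current path are not revisited, so loops are not
    unrolled; max_paths caps the output because the number of paths can grow
    exponentially with the number of branches.
    """
    paths = []

    def dfs(node, path):
        if len(paths) >= max_paths:
            return
        if node == exit_node:
            paths.append(path)
            return
        for nxt in succ.get(node, []):
            if nxt not in path:
                dfs(nxt, path + [nxt])

    dfs(entry, [entry])
    return paths


cfg = {
    "ENTRY": ["buf[16]"],
    "buf[16]": ["n > 0"],
    "n > 0": ["memcpy(buf, src, n)", "EXIT"],
    "memcpy(buf, src, n)": ["EXIT"],
}
for p in enumerate_paths(cfg, "ENTRY", "EXIT"):
    print(len(p), " -> ".join(p))
# 5 ENTRY -> buf[16] -> n > 0 -> memcpy(buf, src, n) -> EXIT
# 4 ENTRY -> buf[16] -> n > 0 -> EXIT
```

With ten consecutive if/else diamonds, this enumerator would already produce 2^10 = 1024 paths, which shows why the cap is needed.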

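Feeding the PDG into a CNN

The PDG is the natural basis for program slicing. VulDeePecker introduced code gadgets: lines of code that are semantically related but not necessarily contiguous in the source, which are then vectorized for deep learning. SySeVR went further by combining vulnerability-related syntax information with semantic information and converting both into vector representations suitable for deep learning.

A program slice removes code unrelated to the vulnerability and keeps only the semantic chain that may influence the critical operation. For the example above, we can take a backward PDG slice from memcpy() and keep only the statements that can affect it.

Building a PDG Slice

First, choose the dangerous node, known as the sink:

CALL memcpy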

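Then trace REACHING_DEF edges backward. Inside copy_data, n is a parameter. Assuming, as in the DDG example, that the caller obtained n from get_user_input(), the chain continues across the call:

third argument of memcpy: n
  ↑
definition of n
  ↑
get_user_input()

Next, add the control conditions by following CDG edges: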

memcpy
  ↑ CDG
n > 0

The resulting slice:

get_user_input()
  → n
  → n > 0
  → memcpy(buf, src, n)

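Compared with the AST, the PDG-based input is usually shorter but semantically denser. This makes it well suited to learning how vulnerabilities propagate.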
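The slicing steps above amount to a reverse reachability search from the sink over REACHING_DEF and CDG edges. The sketch below runs it on a hypothetical statement-level PDG. This PDG combines copy_data with the get_user_input() assumption and adds two unrelated statements (y = calculate() and log(y)) to show what gets pruned.

```python
from collections import defaultdict


def backward_slice(edges, sink, kinds=("REACHING_DEF", "CDG")):
    """Return every node from which the sink is reachable via the given edge kinds.

    edges: list of (src, dst, kind) tuples from the PDG.
    """
    preds = defaultdict(list)
    for src, dst, kind in edges:
        if kind in kinds:
            preds[dst].append(src)
    in_slice = {sink}
    stack = [sink]
    while stack:  # iterative DFS that walks edges backwards
        node = stack.pop()
        for p in preds[node]:
            if p not in in_slice:
                in_slice.add(p)
                stack.append(p)
    return in_slice


pdg = [
    ("n = get_user_input()", "n > 0", "REACHING_DEF"),
    ("n = get_user_input()", "memcpy(buf, src, n)", "REACHING_DEF"),
    ("n > 0", "memcpy(buf, src, n)", "CDG"),
    ("y = calculate()", "log(y)", "REACHING_DEF"),  # unrelated to the sink
]
print(sorted(backward_slice(pdg, "memcpy(buf, src, n)")))
# ['memcpy(buf, src, n)', 'n = get_user_input()', 'n > 0']
```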

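Feeding a GNN

A GNN does not need to force the graph into a 1-D sequence, because it takes nodes, edges, node features and edge types directly. Graph convolutional networks operate on graph-structured data and learn node representations by aggregating information over local neighborhoods.

Input Tensors

For a function graph, define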

$G = (V, E, X)$

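where V is the node set, E is the edge set, and X is the node feature matrix. Suppose the graph has N = 120 nodes and each node feature has dimension d = 160. The node feature matrix then has shape [120, 160], and each row concatenates two parts:

$x_i = [ e_i^{\text{text}} ; e_i^{\text{type}} ]$

The first part is the 128-dimensional text feature, obtained from Word2Vec or CodeBERT. If CodeBERT is used, its 768-dimensional output must first be projected down to 128, or d grows accordingly. The second part is a 32-dimensional learned embedding of the node type, so 128 + 32 = 160.

In PyTorch Geometric, edges are usually stored as a sparse edge list (edge_index), similar to the coordinate (triplet) format for sparse matrices.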
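Here is a minimal sketch, using plain lists, of how the node features and the sparse edge list could be assembled from exported Joern data. `text_vec` and `type_vec` are hypothetical stand-ins for the 128-d text encoder and the 32-d type embedding. With PyTorch Geometric, the results would be wrapped as tensors (e.g. `torch.tensor(edge_index, dtype=torch.long)`) and passed to `torch_geometric.data.Data`.

```python
def build_graph(nodes, edges, text_vec, type_vec):
    """Turn exported CPG nodes/edges into (x, edge_index, edge_type, relations).

    nodes: list of (joern_id, label, code); edges: list of (src_id, dst_id, edge_label).
    text_vec(code) and type_vec(label) return lists of floats (hypothetical encoders).
    """
    # Joern IDs are large and sparse; map them to 0..N-1.
    index = {nid: i for i, (nid, _, _) in enumerate(nodes)}
    # x_i = [e_text ; e_type]: concatenate the two feature parts.
    x = [text_vec(code) + type_vec(label) for _, label, code in nodes]
    # COO edge list: row 0 holds sources, row 1 holds targets.
    edge_index = [[index[s] for s, _, _ in edges],
                  [index[d] for _, d, _ in edges]]
    # Integer relation IDs, used later by relation-aware GNNs.
    relations = sorted({r for _, _, r in edges})
    edge_type = [relations.index(r) for _, _, r in edges]
    return x, edge_index, edge_type, relations


nodes = [
    (30064771073, "CALL", "dirac_get_se_golomb(gb)"),
    (68719476737, "IDENTIFIER", "gb"),
]
edges = [(30064771073, 68719476737, "AST")]
x, edge_index, edge_type, relations = build_graph(
    nodes, edges,
    text_vec=lambda code: [0.0] * 128,                              # placeholder encoder
    type_vec=lambda label: [1.0 if label == "CALL" else 0.0] * 32,  # placeholder embedding
)
print(len(x), len(x[0]))  # 2 160
print(edge_index)         # [[0], [1]]
```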

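Feeding the AST into a GNN

An AST is usually a directed tree, so by default messages flow from parent to child. To let argument information flow back to the parent (the CALL memcpy node), we can add reverse edges manually.

Feeding the CFG into a GNN

Reverse edges are added in the same way, so that a node can also receive information from the nodes that execute after it.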
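Here is a minimal sketch of adding reverse edges, assuming edges are `(src, dst, relation)` tuples. Giving the reversed edges a separate, hypothetical relation name (here the suffix `_REV`) is a design choice, not a requirement. When direction does not need to be distinguished, PyTorch Geometric's `ToUndirected` transform is an off-the-shelf alternative.

```python
def add_reverse_edges(edges):
    """Append a reversed copy of every (src, dst, relation) edge.

    The reversed copy gets its own relation name (relation + "_REV"), so a
    relation-aware GNN can still tell parent->child from child->parent.
    """
    return list(edges) + [(dst, src, rel + "_REV") for src, dst, rel in edges]


ast_edges = [
    ("CALL memcpy", "IDENTIFIER buf", "AST"),
    ("CALL memcpy", "IDENTIFIER src", "AST"),
    ("CALL memcpy", "IDENTIFIER n", "AST"),
]
for edge in add_reverse_edges(ast_edges)[3:]:
    print(edge)
# ('IDENTIFIER buf', 'CALL memcpy', 'AST_REV')
# ('IDENTIFIER src', 'CALL memcpy', 'AST_REV')
# ('IDENTIFIER n', 'CALL memcpy', 'AST_REV')
```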

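Feeding the PDG into a GNN

A PDG has several edge types (REACHING_DEF and CDG). A multi-relational GCN (R-GCN), which learns a separate weight matrix for each edge type, is therefore a natural choice.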
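Here is a pure-Python sketch of one R-GCN layer of the form $h_i' = \mathrm{ReLU}\big(W_0 h_i + \sum_r \sum_{j \in N_i^r} \frac{1}{|N_i^r|} W_r h_j\big)$. Each relation r (here REACHING_DEF and CDG) has its own weight matrix, and neighbors are averaged within each relation. Weights and features are toy values. In practice one would use a library layer such as PyTorch Geometric's `RGCNConv`.

```python
from collections import defaultdict


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * v for w, v in zip(row, h)) for row in W]


def rgcn_layer(h, edges, W_self, W_rel):
    """One relational GCN layer.

    h: list of node feature vectors; edges: list of (src, dst, relation);
    W_self: self-connection weight matrix; W_rel: dict relation -> weight matrix.
    """
    # Group the incoming neighbors of each node by relation.
    incoming = defaultdict(lambda: defaultdict(list))
    for src, dst, rel in edges:
        incoming[dst][rel].append(src)

    out = []
    for i, h_i in enumerate(h):
        acc = matvec(W_self, h_i)  # self-connection W_0 h_i
        for rel, nbrs in incoming[i].items():
            c = len(nbrs)  # per-relation normalization constant |N_i^r|
            for j in nbrs:
                msg = matvec(W_rel[rel], h[j])
                acc = [a + m / c for a, m in zip(acc, msg)]
        out.append([max(0.0, a) for a in acc])  # ReLU
    return out


h = [[1.0], [2.0], [3.0]]
edges = [(0, 2, "REACHING_DEF"), (1, 2, "REACHING_DEF"), (0, 1, "CDG")]
W_rel = {"REACHING_DEF": [[0.5]], "CDG": [[2.0]]}
print(rgcn_layer(h, edges, [[1.0]], W_rel))  # [[1.0], [4.0], [3.75]]
```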

Worked Example

Consider the source code of 267_1.c:

static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qoffset)

{

    int coeff = dirac_get_se_golomb(gb);

    const int sign = FFSIGN(coeff);

    if (coeff)

        coeff = sign*((sign * coeff * qfactor + qoffset) >> 2);

    return coeff;

}

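This function contains a signed integer overflow. The vulnerable spot is the subexpression sign * coeff * qfactor in coeff = sign*((sign * coeff * qfactor + qoffset) >> 2);. An attacker can craft a malicious Dirac video bitstream that makes dirac_get_se_golomb(gb) return an abnormally large coeff, so the product exceeds the range of int.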
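To see why sign * coeff * qfactor is the dangerous spot, the sketch below uses Python's unbounded integers. It checks whether the product leaves the 32-bit int range at any step of C's left-to-right evaluation, (sign * coeff) * qfactor. It assumes a 32-bit int. The input values are made up for illustration and are not taken from a real bitstream.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def product_overflows_int32(*factors):
    """True if multiplying the factors left to right leaves the int32 range.

    In C, signed integer overflow is undefined behavior, so any intermediate
    step that leaves the range is already a bug.
    """
    acc = 1
    for f in factors:
        acc *= f
        if not INT32_MIN <= acc <= INT32_MAX:
            return True
    return False


sign, qfactor = 1, 4096                               # illustrative values
print(product_overflows_int32(sign, 1000, qfactor))   # False
print(product_overflows_int32(sign, 2**20, qfactor))  # True: 2**32 > INT32_MAX
```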

AST

The full AST:

- 107374182400 [METHOD] order=1 line=65 code=static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qoffset) { int coeff = dirac_get_se_golomb(gb); const int sign = FFSIGN(coeff); if (coeff) coeff = sign...
  - 111669149696 [METHOD_PARAMETER_IN] order=1 line=65 code=GetBitContext *gb
  - 115964116992 [METHOD_PARAMETER_OUT] order=1 line=65 code=GetBitContext *gb
  - 111669149697 [METHOD_PARAMETER_IN] order=2 line=65 code=int qfactor
  - 115964116993 [METHOD_PARAMETER_OUT] order=2 line=65 code=int qfactor
  - 111669149698 [METHOD_PARAMETER_IN] order=3 line=65 code=int qoffset
  - 115964116994 [METHOD_PARAMETER_OUT] order=3 line=65 code=int qoffset
  - 25769803776 [BLOCK] order=4 line=67 code={ int coeff = dirac_get_se_golomb(gb); const int sign = FFSIGN(coeff); if (coeff) coeff = sign*((sign * coeff * qfactor + qoffset) >> 2); return coeff; }
    - 94489280512 [LOCAL] order=1 line=69 code=int coeff
    - 30064771072 [CALL] order=2 line=69 code=coeff = dirac_get_se_golomb(gb)
      - 68719476736 [IDENTIFIER] order=1 line=69 code=coeff
      - 30064771073 [CALL] order=2 line=69 code=dirac_get_se_golomb(gb)
        - 68719476737 [IDENTIFIER] order=1 line=69 code=gb
    - 94489280513 [LOCAL] order=3 line=71 code=const int sign
    - 30064771074 [CALL] order=4 line=71 code=sign = FFSIGN(coeff)
      - 68719476738 [IDENTIFIER] order=1 line=71 code=sign
      - 30064771075 [CALL] order=2 line=71 code=FFSIGN(coeff)
        - 68719476739 [IDENTIFIER] order=1 line=71 code=coeff
    - 47244640256 [CONTROL_STRUCTURE] order=5 line=73 code=if (coeff) coeff = sign*((sign * coeff * qfactor + qoffset) >> 2);
      - 30064771076 [CALL] order=1 line=73 code=coeff != 0
        - 68719476740 [IDENTIFIER] order=1 line=73 code=coeff
        - 90194313216 [LITERAL] order=2 line=73 code=0
      - 25769803777 [BLOCK] order=2 line=75 code=<empty>
        - 30064771077 [CALL] order=1 line=75 code=coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)
          - 68719476741 [IDENTIFIER] order=1 line=75 code=coeff
          - 30064771078 [CALL] order=2 line=75 code=sign*((sign * coeff * qfactor + qoffset) >> 2)
            - 68719476742 [IDENTIFIER] order=1 line=75 code=sign
            - 30064771079 [CALL] order=2 line=75 code=(sign * coeff * qfactor + qoffset) >> 2
              - 30064771080 [CALL] order=1 line=75 code=sign * coeff * qfactor + qoffset
                - 30064771081 [CALL] order=1 line=75 code=sign * coeff * qfactor
                  - 30064771082 [CALL] order=1 line=75 code=sign * coeff
                    - 68719476743 [IDENTIFIER] order=1 line=75 code=sign
                    - 68719476744 [IDENTIFIER] order=2 line=75 code=coeff
                  - 68719476745 [IDENTIFIER] order=2 line=75 code=qfactor
                - 68719476746 [IDENTIFIER] order=2 line=75 code=qoffset
              - 90194313217 [LITERAL] order=2 line=75 code=2
    - 141733920768 [RETURN] order=6 line=77 code=return coeff;
      - 68719476747 [IDENTIFIER] order=1 line=77 code=coeff
  - 128849018880 [MODIFIER] order=5 line=65 code=<empty>
  - 124554051584 [METHOD_RETURN] order=6 line=65 code=RET

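Each line reads as:

indent - node ID [node type] order=position among siblings line=source line number code=corresponding code fragment

Deeper indentation means the node is a child of the node above it. If we ignore the long node IDs and look only at the node type and code=, the overall structure condenses to: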

METHOD coeff_unpack_golomb
  METHOD_PARAMETER_IN gb
  METHOD_PARAMETER_OUT gb
  METHOD_PARAMETER_IN qfactor
  METHOD_PARAMETER_OUT qfactor
  METHOD_PARAMETER_IN qoffset
  METHOD_PARAMETER_OUT qoffset
  BLOCK
    LOCAL int coeff
    CALL coeff = dirac_get_se_golomb(gb)
    LOCAL const int sign
    CALL sign = FFSIGN(coeff)
    CONTROL_STRUCTURE if (coeff)
      CALL coeff != 0
      BLOCK
        CALL coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)
    RETURN return coeff
  MODIFIER
  METHOD_RETURN RET

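Where:

[METHOD] is the function's root node and corresponds to the whole of coeff_unpack_golomb.
[METHOD_PARAMETER_IN] / [METHOD_PARAMETER_OUT] come in pairs. Joern/CPG creates them to model data flow; the source code does not actually declare each parameter twice.
[BLOCK] is the function body { ... }.
[LOCAL] is a local variable declaration, such as int coeff or const int sign.
[CALL] covers not only ordinary function calls but also operators. Joern represents assignment =, multiplication *, addition + and right shift >> all as CALL nodes.
[CONTROL_STRUCTURE] is a control structure, such as if.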
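The reading rule above is regular enough to parse mechanically. The sketch below assumes exactly the dump format shown: two spaces of indentation per level, then `- id [TYPE] order=... line=... code=...`. It produces the condensed `TYPE code` view. It is a helper for this particular printout, not a Joern API.

```python
import re

LINE_RE = re.compile(
    r"^(?P<indent> *)- (?P<id>\d+) \[(?P<label>\w+)\] "
    r"order=(?P<order>\d+) line=(?P<line>\d+) code=(?P<code>.*)$"
)


def parse_ast_dump(text):
    """Parse the dump into (depth, label, order, line, code) tuples."""
    rows = []
    for raw in text.splitlines():
        m = LINE_RE.match(raw)
        if m:  # skip lines that do not follow the format
            depth = len(m["indent"]) // 2  # two spaces per tree level
            rows.append((depth, m["label"], int(m["order"]), int(m["line"]), m["code"]))
    return rows


def condense(rows):
    """Drop ID, order and line; keep indentation, node type and code."""
    return ["  " * depth + f"{label} {code}" for depth, label, _, _, code in rows]


dump = "\n".join([
    "    - 94489280513 [LOCAL] order=3 line=71 code=const int sign",
    "    - 30064771074 [CALL] order=4 line=71 code=sign = FFSIGN(coeff)",
    "      - 68719476738 [IDENTIFIER] order=1 line=71 code=sign",
    "      - 30064771075 [CALL] order=2 line=71 code=FFSIGN(coeff)",
    "        - 68719476739 [IDENTIFIER] order=1 line=71 code=coeff",
])
print("\n".join(condense(parse_ast_dump(dump))))
```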

CFG

The full output:

CFG nodes:
  N01: 107374182400 [METHOD] line=65 order=1 code=static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qoffset) { int coeff = dirac_get_se_golomb(gb); const int sign ...
  N02: 68719476736 [IDENTIFIER] line=69 order=1 code=coeff
  N03: 68719476737 [IDENTIFIER] line=69 order=1 code=gb
  N04: 30064771073 [CALL] line=69 order=2 code=dirac_get_se_golomb(gb)
  N05: 30064771072 [CALL] line=69 order=2 code=coeff = dirac_get_se_golomb(gb)
  N06: 68719476738 [IDENTIFIER] line=71 order=1 code=sign
  N07: 68719476739 [IDENTIFIER] line=71 order=1 code=coeff
  N08: 30064771075 [CALL] line=71 order=2 code=FFSIGN(coeff)
  N09: 30064771074 [CALL] line=71 order=4 code=sign = FFSIGN(coeff)
  N10: 68719476740 [IDENTIFIER] line=73 order=1 code=coeff
  N11: 90194313216 [LITERAL] line=73 order=2 code=0
  N12: 30064771076 [CALL] line=73 order=1 code=coeff != 0
  N13: 68719476741 [IDENTIFIER] line=75 order=1 code=coeff
  N14: 68719476747 [IDENTIFIER] line=77 order=1 code=coeff
  N15: 68719476742 [IDENTIFIER] line=75 order=1 code=sign
  N16: 141733920768 [RETURN] line=77 order=6 code=return coeff;
  N17: 68719476743 [IDENTIFIER] line=75 order=1 code=sign
  N18: 124554051584 [METHOD_RETURN] line=65 order=6 code=RET
  N19: 68719476744 [IDENTIFIER] line=75 order=2 code=coeff
  N20: 30064771082 [CALL] line=75 order=1 code=sign * coeff
  N21: 68719476745 [IDENTIFIER] line=75 order=2 code=qfactor
  N22: 30064771081 [CALL] line=75 order=1 code=sign * coeff * qfactor
  N23: 68719476746 [IDENTIFIER] line=75 order=2 code=qoffset
  N24: 30064771080 [CALL] line=75 order=1 code=sign * coeff * qfactor + qoffset
  N25: 90194313217 [LITERAL] line=75 order=2 code=2
  N26: 30064771079 [CALL] line=75 order=2 code=(sign * coeff * qfactor + qoffset) >> 2
  N27: 30064771078 [CALL] line=75 order=2 code=sign*((sign * coeff * qfactor + qoffset) >> 2)
  N28: 30064771077 [CALL] line=75 order=1 code=coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)

CFG edges:
  N01 -> N02: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof...  ==>  coeff
  N02 -> N03: coeff  ==>  gb
  N03 -> N04: gb  ==>  dirac_get_se_golomb(gb)
  N04 -> N05: dirac_get_se_golomb(gb)  ==>  coeff = dirac_get_se_golomb(gb)
  N05 -> N06: coeff = dirac_get_se_golomb(gb)  ==>  sign
  N06 -> N07: sign  ==>  coeff
  N07 -> N08: coeff  ==>  FFSIGN(coeff)
  N08 -> N09: FFSIGN(coeff)  ==>  sign = FFSIGN(coeff)
  N09 -> N10: sign = FFSIGN(coeff)  ==>  coeff
  N10 -> N11: coeff  ==>  0
  N11 -> N12: 0  ==>  coeff != 0
  N12 -> N13: coeff != 0  ==>  coeff
  N12 -> N14: coeff != 0  ==>  coeff
  N13 -> N15: coeff  ==>  sign
  N14 -> N16: coeff  ==>  return coeff;
  N15 -> N17: sign  ==>  sign
  N16 -> N18: return coeff;  ==>  RET
  N17 -> N19: sign  ==>  coeff
  N19 -> N20: coeff  ==>  sign * coeff
  N20 -> N21: sign * coeff  ==>  qfactor
  N21 -> N22: qfactor  ==>  sign * coeff * qfactor
  N22 -> N23: sign * coeff * qfactor  ==>  qoffset
  N23 -> N24: qoffset  ==>  sign * coeff * qfactor + qoffset
  N24 -> N25: sign * coeff * qfactor + qoffset  ==>  2
  N25 -> N26: 2  ==>  (sign * coeff * qfactor + qoffset) >> 2
  N26 -> N27: (sign * coeff * qfactor + qoffset) >> 2  ==>  sign*((sign * coeff * qfactor + qoffset) >> 2)
  N27 -> N28: sign*((sign * coeff * qfactor + qoffset) >> 2)  ==>  coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)
  N28 -> N14: coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)  ==>  coeff

The corresponding flowchart:

flowchart TB
    accTitle: coeff_unpack_golomb CFG
    accDescr: Simplified control flow for coeff_unpack_golomb, showing the condition branch and the merge at the return statement.

    entry([entry: coeff_unpack_golomb]) --> read_coeff["coeff = dirac_get_se_golomb(gb)"]
    read_coeff --> read_sign["sign = FFSIGN(coeff)"]
    read_sign --> test{"coeff != 0?"}
    test -->|true| scale_coeff["coeff = sign * ((sign * coeff * qfactor + qoffset) >> 2)"]
    test -->|false| return_coeff["return coeff"]
    scale_coeff --> return_coeff
    return_coeff --> exit_node([RET])

    classDef terminal fill:#f3f4f6,stroke:#6b7280,stroke-width:2px,color:#1f2937
    classDef process fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e3a5f
    classDef decision fill:#fef9c3,stroke:#ca8a04,stroke-width:2px,color:#713f12

    class entry,exit_node terminal
    class read_coeff,read_sign,scale_coeff,return_coeff process
    class test decision

Simplified node	Original Joern CFG nodes
`entry`	`METHOD` node `107374182400`
`read_coeff`	line 69 identifier/function-call/assignment nodes `68719476736`, `68719476737`, `30064771073`, `30064771072`
`read_sign`	line 71 identifier/function-call/assignment nodes `68719476738`, `68719476739`, `30064771075`, `30064771074`
`test`	line 73 condition nodes `68719476740`, `90194313216`, `30064771076`
`scale_coeff`	line 75 expression and assignment nodes `68719476741` through `30064771077`
`return_coeff`	line 77 nodes `68719476747`, `141733920768`
`exit_node`	`METHOD_RETURN` node `124554051584`
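The simplified flowchart can be derived mechanically by merging all expression-level CFG nodes that share a source line, which is what the mapping table does by hand. This minimal sketch assumes nodes are given as `id -> (label, line)`. It keeps METHOD and METHOD_RETURN separate because both sit on line 65. The node numbers are the N01 to N28 indices from the dump. Other labels do not matter here and are abbreviated as a placeholder `EXPR`.

```python
def collapse_by_line(nodes, edges):
    """Merge CFG nodes that share a source line.

    nodes: dict id -> (label, line); edges: list of (src, dst).
    Returns de-duplicated edges between groups, in first-seen order.
    """
    def group(nid):
        label, line = nodes[nid]
        # METHOD and METHOD_RETURN share line 65, so keep them apart by label.
        return label if label in ("METHOD", "METHOD_RETURN") else line

    collapsed = []
    for src, dst in edges:
        edge = (group(src), group(dst))
        if edge[0] != edge[1] and edge not in collapsed:  # drop intra-line edges
            collapsed.append(edge)
    return collapsed


# N01..N28 from the CFG dump above, reduced to (label, line).
nodes = {1: ("METHOD", 65), 18: ("METHOD_RETURN", 65)}
for line, ids in {69: range(2, 6), 71: range(6, 10), 73: range(10, 13),
                  75: (13, 15, 17, *range(19, 29)), 77: (14, 16)}.items():
    nodes.update({n: ("EXPR", line) for n in ids})
edges = [(i, i + 1) for i in range(1, 13)] + [
    (12, 14), (13, 15), (14, 16), (15, 17), (16, 18), (17, 19),
    *[(i, i + 1) for i in range(19, 28)], (28, 14)]
print(collapse_by_line(nodes, edges))
# [('METHOD', 69), (69, 71), (71, 73), (73, 75), (73, 77), (77, 'METHOD_RETURN'), (75, 77)]
```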

PDG

The raw output:

PDG nodes:
  N01: 107374182400 [METHOD] line=65 order=1 code=static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qoffset) { int coeff = dirac_get_se_golomb(gb); const int sign ...
  N02: 111669149696 [METHOD_PARAMETER_IN] line=65 order=1 code=GetBitContext *gb
  N03: 115964116992 [METHOD_PARAMETER_OUT] line=65 order=1 code=GetBitContext *gb
  N04: 111669149697 [METHOD_PARAMETER_IN] line=65 order=2 code=int qfactor
  N05: 115964116993 [METHOD_PARAMETER_OUT] line=65 order=2 code=int qfactor
  N06: 111669149698 [METHOD_PARAMETER_IN] line=65 order=3 code=int qoffset
  N07: 115964116994 [METHOD_PARAMETER_OUT] line=65 order=3 code=int qoffset
  N08: 124554051584 [METHOD_RETURN] line=65 order=6 code=RET
  N09: 68719476736 [IDENTIFIER] line=69 order=1 code=coeff
  N10: 68719476737 [IDENTIFIER] line=69 order=1 code=gb
  N11: 30064771072 [CALL] line=69 order=2 code=coeff = dirac_get_se_golomb(gb)
  N12: 30064771073 [CALL] line=69 order=2 code=dirac_get_se_golomb(gb)
  N13: 68719476738 [IDENTIFIER] line=71 order=1 code=sign
  N14: 68719476739 [IDENTIFIER] line=71 order=1 code=coeff
  N15: 30064771075 [CALL] line=71 order=2 code=FFSIGN(coeff)
  N16: 30064771074 [CALL] line=71 order=4 code=sign = FFSIGN(coeff)
  N17: 30064771076 [CALL] line=73 order=1 code=coeff != 0
  N18: 68719476740 [IDENTIFIER] line=73 order=1 code=coeff
  N19: 90194313216 [LITERAL] line=73 order=2 code=0
  N20: 30064771077 [CALL] line=75 order=1 code=coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)
  N21: 30064771080 [CALL] line=75 order=1 code=sign * coeff * qfactor + qoffset
  N22: 30064771081 [CALL] line=75 order=1 code=sign * coeff * qfactor
  N23: 30064771082 [CALL] line=75 order=1 code=sign * coeff
  N24: 68719476741 [IDENTIFIER] line=75 order=1 code=coeff
  N25: 68719476742 [IDENTIFIER] line=75 order=1 code=sign
  N26: 68719476743 [IDENTIFIER] line=75 order=1 code=sign
  N27: 30064771078 [CALL] line=75 order=2 code=sign*((sign * coeff * qfactor + qoffset) >> 2)
  N28: 30064771079 [CALL] line=75 order=2 code=(sign * coeff * qfactor + qoffset) >> 2
  N29: 68719476744 [IDENTIFIER] line=75 order=2 code=coeff
  N30: 68719476745 [IDENTIFIER] line=75 order=2 code=qfactor
  N31: 68719476746 [IDENTIFIER] line=75 order=2 code=qoffset
  N32: 90194313217 [LITERAL] line=75 order=2 code=2
  N33: 68719476747 [IDENTIFIER] line=77 order=1 code=coeff
  N34: 141733920768 [RETURN] line=77 order=6 code=return coeff;

PDG edges:
  N17 -> N20 [CDG]: coeff != 0  ==>  coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)
  N17 -> N21 [CDG]: coeff != 0  ==>  sign * coeff * qfactor + qoffset
  N17 -> N22 [CDG]: coeff != 0  ==>  sign * coeff * qfactor
  N17 -> N23 [CDG]: coeff != 0  ==>  sign * coeff
  N17 -> N24 [CDG]: coeff != 0  ==>  coeff
  N17 -> N25 [CDG]: coeff != 0  ==>  sign
  N17 -> N26 [CDG]: coeff != 0  ==>  sign
  N17 -> N27 [CDG]: coeff != 0  ==>  sign*((sign * coeff * qfactor + qoffset) >> 2)
  N17 -> N28 [CDG]: coeff != 0  ==>  (sign * coeff * qfactor + qoffset) >> 2
  N17 -> N29 [CDG]: coeff != 0  ==>  coeff
  N17 -> N30 [CDG]: coeff != 0  ==>  qfactor
  N17 -> N31 [CDG]: coeff != 0  ==>  qoffset
  N17 -> N32 [CDG]: coeff != 0  ==>  2
  N01 -> N02 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof...  ==>  GetBitContext *gb
  N01 -> N04 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof...  ==>  int qfactor
  N01 -> N06 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof...  ==>  int qoffset
  N01 -> N10 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof...  ==>  gb
  N01 -> N14 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof...  ==>  coeff
  N01 -> N18 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof...  ==>  coeff
  N01 -> N19 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof...  ==>  0
  N01 -> N25 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof...  ==>  sign
  N01 -> N26 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof...  ==>  sign
  N01 -> N29 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof...  ==>  coeff
  N01 -> N30 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof...  ==>  qfactor
  N01 -> N31 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof...  ==>  qoffset
  N01 -> N32 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof...  ==>  2
  N01 -> N33 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof...  ==>  coeff
  N02 -> N03 [REACHING_DEF property=gb]: GetBitContext *gb  ==>  GetBitContext *gb
  N02 -> N10 [REACHING_DEF property=gb]: GetBitContext *gb  ==>  gb
  N04 -> N05 [REACHING_DEF property=qfactor]: int qfactor  ==>  int qfactor
  N04 -> N05 [REACHING_DEF property=qfactor]: int qfactor  ==>  int qfactor
  N04 -> N08 [REACHING_DEF property=qfactor]: int qfactor  ==>  RET
  N04 -> N30 [REACHING_DEF property=qfactor]: int qfactor  ==>  qfactor
  N06 -> N07 [REACHING_DEF property=qoffset]: int qoffset  ==>  int qoffset
  N06 -> N07 [REACHING_DEF property=qoffset]: int qoffset  ==>  int qoffset
  N06 -> N08 [REACHING_DEF property=qoffset]: int qoffset  ==>  RET
  N06 -> N31 [REACHING_DEF property=qoffset]: int qoffset  ==>  qoffset
  N09 -> N11 [REACHING_DEF property=coeff]: coeff  ==>  coeff = dirac_get_se_golomb(gb)
  N09 -> N14 [REACHING_DEF property=coeff]: coeff  ==>  coeff
  N10 -> N03 [REACHING_DEF property=gb]: gb  ==>  GetBitContext *gb
  N10 -> N08 [REACHING_DEF property=gb]: gb  ==>  RET
  N10 -> N12 [REACHING_DEF property=gb]: gb  ==>  dirac_get_se_golomb(gb)
  N11 -> N08 [REACHING_DEF property=coeff = dirac_get_se_golomb(gb)]: coeff = dirac_get_se_golomb(gb)  ==>  RET
  N12 -> N08 [REACHING_DEF property=dirac_get_se_golomb(gb)]: dirac_get_se_golomb(gb)  ==>  RET
  N12 -> N09 [REACHING_DEF property=dirac_get_se_golomb(gb)]: dirac_get_se_golomb(gb)  ==>  coeff
  N12 -> N11 [REACHING_DEF property=dirac_get_se_golomb(gb)]: dirac_get_se_golomb(gb)  ==>  coeff = dirac_get_se_golomb(gb)
  N13 -> N08 [REACHING_DEF property=sign]: sign  ==>  RET
  N13 -> N16 [REACHING_DEF property=sign]: sign  ==>  sign = FFSIGN(coeff)
  N13 -> N26 [REACHING_DEF property=sign]: sign  ==>  sign
  N14 -> N15 [REACHING_DEF property=coeff]: coeff  ==>  FFSIGN(coeff)
  N14 -> N18 [REACHING_DEF property=coeff]: coeff  ==>  coeff
  N15 -> N08 [REACHING_DEF property=FFSIGN(coeff)]: FFSIGN(coeff)  ==>  RET
  N15 -> N13 [REACHING_DEF property=FFSIGN(coeff)]: FFSIGN(coeff)  ==>  sign
  N15 -> N16 [REACHING_DEF property=FFSIGN(coeff)]: FFSIGN(coeff)  ==>  sign = FFSIGN(coeff)
  N16 -> N08 [REACHING_DEF property=sign = FFSIGN(coeff)]: sign = FFSIGN(coeff)  ==>  RET
  N17 -> N08 [REACHING_DEF property=coeff != 0]: coeff != 0  ==>  RET
  N18 -> N08 [REACHING_DEF property=coeff]: coeff  ==>  RET
  N18 -> N17 [REACHING_DEF property=coeff]: coeff  ==>  coeff != 0
  N18 -> N29 [REACHING_DEF property=coeff]: coeff  ==>  coeff
  N18 -> N33 [REACHING_DEF property=coeff]: coeff  ==>  coeff
  N19 -> N17 [REACHING_DEF property=0]: 0  ==>  coeff != 0
  N19 -> N18 [REACHING_DEF property=0]: 0  ==>  coeff
  N20 -> N08 [REACHING_DEF property=coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)]: coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)  ==>  RET
  N21 -> N08 [REACHING_DEF property=sign * coeff * qfactor + qoffset]: sign * coeff * qfactor + qoffset  ==>  RET
  N21 -> N28 [REACHING_DEF property=sign * coeff * qfactor + qoffset]: sign * coeff * qfactor + qoffset  ==>  (sign * coeff * qfactor + qoffset) >> 2
  N22 -> N08 [REACHING_DEF property=sign * coeff * qfactor]: sign * coeff * qfactor  ==>  RET
  N22 -> N21 [REACHING_DEF property=sign * coeff * qfactor]: sign * coeff * qfactor  ==>  sign * coeff * qfactor + qoffset
  N23 -> N08 [REACHING_DEF property=sign * coeff]: sign * coeff  ==>  RET
  N23 -> N22 [REACHING_DEF property=sign * coeff]: sign * coeff  ==>  sign * coeff * qfactor
  N23 -> N30 [REACHING_DEF property=sign * coeff]: sign * coeff  ==>  qfactor
  N24 -> N08 [REACHING_DEF property=coeff]: coeff  ==>  RET
  N24 -> N20 [REACHING_DEF property=coeff]: coeff  ==>  coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)
  N24 -> N33 [REACHING_DEF property=coeff]: coeff  ==>  coeff
  N25 -> N08 [REACHING_DEF property=sign]: sign  ==>  RET
  N25 -> N27 [REACHING_DEF property=sign]: sign  ==>  sign*((sign * coeff * qfactor + qoffset) >> 2)
  N25 -> N28 [REACHING_DEF property=sign]: sign  ==>  (sign * coeff * qfactor + qoffset) >> 2
  N26 -> N23 [REACHING_DEF property=sign]: sign  ==>  sign * coeff
  N26 -> N25 [REACHING_DEF property=sign]: sign  ==>  sign
  N26 -> N29 [REACHING_DEF property=sign]: sign  ==>  coeff
  N27 -> N08 [REACHING_DEF property=sign*((sign * coeff * qfactor + qoffset) >> 2)]: sign*((sign * coeff * qfactor + qoffset) >> 2)  ==>  RET
  N27 -> N20 [REACHING_DEF property=sign*((sign * coeff * qfactor + qoffset) >> 2)]: sign*((sign * coeff * qfactor + qoffset) >> 2)  ==>  coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)
  N27 -> N24 [REACHING_DEF property=sign*((sign * coeff * qfactor + qoffset) >> 2)]: sign*((sign * coeff * qfactor + qoffset) >> 2)  ==>  coeff
  N28 -> N08 [REACHING_DEF property=(sign * coeff * qfactor + qoffset) >> 2]: (sign * coeff * qfactor + qoffset) >> 2  ==>  RET
  N28 -> N25 [REACHING_DEF property=(sign * coeff * qfactor + qoffset) >> 2]: (sign * coeff * qfactor + qoffset) >> 2  ==>  sign
  N28 -> N27 [REACHING_DEF property=(sign * coeff * qfactor + qoffset) >> 2]: (sign * coeff * qfactor + qoffset) >> 2  ==>  sign*((sign * coeff * qfactor + qoffset) >> 2)
  N29 -> N23 [REACHING_DEF property=coeff]: coeff  ==>  sign * coeff
  N29 -> N26 [REACHING_DEF property=coeff]: coeff  ==>  sign
  N30 -> N05 [REACHING_DEF property=qfactor]: qfactor  ==>  int qfactor
  N30 -> N08 [REACHING_DEF property=qfactor]: qfactor  ==>  RET
  N30 -> N22 [REACHING_DEF property=qfactor]: qfactor  ==>  sign * coeff * qfactor
  N30 -> N23 [REACHING_DEF property=qfactor]: qfactor  ==>  sign * coeff
  N31 -> N07 [REACHING_DEF property=qoffset]: qoffset  ==>  int qoffset
  N31 -> N08 [REACHING_DEF property=qoffset]: qoffset  ==>  RET
  N31 -> N21 [REACHING_DEF property=qoffset]: qoffset  ==>  sign * coeff * qfactor + qoffset
  N32 -> N21 [REACHING_DEF property=2]: 2  ==>  sign * coeff * qfactor + qoffset
  N32 -> N28 [REACHING_DEF property=2]: 2  ==>  (sign * coeff * qfactor + qoffset) >> 2
  N33 -> N34 [REACHING_DEF property=coeff]: coeff  ==>  return coeff;
  N34 -> N08 [REACHING_DEF property=<RET>]: return coeff;  ==>  RET

The corresponding visualization:

flowchart LR
    accTitle: coeff_unpack_golomb PDG
    accDescr: Simplified program-dependence graph showing data dependencies from parameters and assignments, plus the control dependency from the if condition to the guarded assignment.

    gb_param([gb parameter])
    qfactor_param([qfactor parameter])
    qoffset_param([qoffset parameter])

    read_coeff["coeff = dirac_get_se_golomb(gb)"]
    read_sign["sign = FFSIGN(coeff)"]
    test{"coeff != 0?"}
    scale_coeff["coeff = sign * ((sign * coeff * qfactor + qoffset) >> 2)"]
    return_coeff["return coeff"]
    exit_node([RET])

    gb_param -->|"data: gb"| read_coeff
    read_coeff -->|"data: coeff"| read_sign
    read_coeff -->|"data: coeff"| test
    read_coeff -->|"data: coeff"| scale_coeff
    read_sign -->|"data: sign"| scale_coeff
    qfactor_param -->|"data: qfactor"| scale_coeff
    qoffset_param -->|"data: qoffset"| scale_coeff
    test -.->|"control"| scale_coeff
    read_coeff -->|"data: coeff if false"| return_coeff
    scale_coeff -->|"data: coeff if true"| return_coeff
    return_coeff -->|"return value"| exit_node

    classDef param fill:#f3f4f6,stroke:#6b7280,stroke-width:2px,color:#1f2937
    classDef process fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e3a5f
    classDef decision fill:#fef9c3,stroke:#ca8a04,stroke-width:2px,color:#713f12
    classDef exit fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#14532d

    class gb_param,qfactor_param,qoffset_param param
    class read_coeff,read_sign,scale_coeff,return_coeff process
    class test decision
    class exit_node exit

Dependency type	Meaning in this function
`REACHING_DEF`	A value definition can reach a later use, such as `coeff` reaching `FFSIGN(coeff)` or `return coeff`
`CDG`	A node is executed only under a control condition, here the line 75 assignment is controlled by `coeff != 0`

Simplified dependency	Original Joern relation
`gb parameter -> coeff = dirac_get_se_golomb(gb)`	`REACHING_DEF` through the `gb` identifier and call node
`coeff = dirac_get_se_golomb(gb) -> sign = FFSIGN(coeff)`	`REACHING_DEF` for `coeff`
`coeff = dirac_get_se_golomb(gb) -> coeff != 0`	`REACHING_DEF` for `coeff`
`coeff != 0 -> line 75 assignment`	`CDG` from condition node `30064771076` to line 75 expression nodes
`sign`, `coeff`, `qfactor`, `qoffset -> line 75 assignment`	`REACHING_DEF` into the scale expression
initial or updated `coeff -> return coeff`	`REACHING_DEF` into the return value / method return
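The simplified dependency table can likewise be produced by filtering and lifting the raw edges. This minimal sketch assumes edges are exported as `(src, dst, kind, property)` tuples and that each node ID maps to a source line. It does three things:

- it keeps CDG edges;
- it keeps only REACHING_DEF edges whose property is a real variable name, dropping property-less edges from the METHOD node and edges labelled with whole expressions;
- it merges nodes by line.

The sample edges are a handful taken from the dump above (N-indices), not the full list.

```python
def lift_pdg(node_line, edges, variables):
    """Reduce expression-level PDG edges to line-level dependencies.

    node_line: dict node -> source line; edges: list of (src, dst, kind, prop);
    variables: names of locals and parameters to keep for REACHING_DEF.
    """
    lifted = []
    for src, dst, kind, prop in edges:
        if kind == "REACHING_DEF" and prop not in variables:
            continue  # no property, or an expression such as "FFSIGN(coeff)"
        edge = (node_line[src], node_line[dst], kind, prop)
        if edge[0] != edge[1] and edge not in lifted:  # drop same-line edges and duplicates
            lifted.append(edge)
    return lifted


node_line = {1: 65, 2: 65, 9: 69, 10: 69, 12: 69, 14: 71, 17: 73, 18: 73, 20: 75, 29: 75}
edges = [
    (1, 14, "REACHING_DEF", None),                       # N01 -> N14, no property
    (2, 10, "REACHING_DEF", "gb"),                       # gb parameter -> gb use
    (12, 9, "REACHING_DEF", "dirac_get_se_golomb(gb)"),  # expression-labelled
    (9, 14, "REACHING_DEF", "coeff"),
    (18, 17, "REACHING_DEF", "coeff"),                   # both on line 73
    (18, 29, "REACHING_DEF", "coeff"),
    (17, 20, "CDG", None),
]
variables = {"gb", "qfactor", "qoffset", "coeff", "sign"}
for e in lift_pdg(node_line, edges, variables):
    print(e)
# (65, 69, 'REACHING_DEF', 'gb')
# (69, 71, 'REACHING_DEF', 'coeff')
# (73, 75, 'REACHING_DEF', 'coeff')
# (73, 75, 'CDG', None)
```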