
From Code to Graphs


Abstract Syntax Tree

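An AST captures which syntactic structures a program is built from and how they nest. It does not keep every surface detail of the syntax, such as parentheses, semicolons, and some syntactic sugar.

The paper describes the AST as an ordered tree: inner nodes are usually operators or syntactic constructs, and leaf nodes are usually operands such as constants and identifiers.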

Joern AST

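In the Joern CPG, AST children are connected to their parent by AST edges, and each node typically carries:

  • CODE: the original code fragment the node corresponds to
  • ORDER: the node's position among its siblings
  • LINE_NUMBER: the starting line number
  • a node type, such as METHOD, BLOCK, CALL, IDENTIFIER, LITERAL, or CONTROL_STRUCTURE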

Strengths

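  • Clear syntactic structure

    • The tree is hierarchical, so operator precedence and associativity are fixed explicitly
  • Well suited to spotting local vulnerability patterns

    • Calls to certain dangerous functions can be identified quickly
  • Easy to generate, with a relatively stable structure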

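In other words, the AST is better suited to finding vulnerability candidates that are strongly tied to syntax: local dangerous-API patterns, pointer and array access patterns, arithmetic expression patterns, abnormal type conversions, and so on.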
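As a rough illustration of AST-level pattern matching, the sketch below scans a flat list of (node type, code) pairs and flags CALL nodes whose callee is on a watch list. The tuple layout, the `DANGEROUS_CALLS` set, and both helper names are hypothetical choices for this example, not Joern's export format or API.

```python
# Hypothetical watch list of risky C library calls (illustrative only).
DANGEROUS_CALLS = {"memcpy", "strcpy", "sprintf", "gets"}


def callee_name(code):
    """Return the text before the first '(' of a CALL node's code, e.g. 'memcpy'."""
    return code.split("(", 1)[0].strip()


def flag_dangerous_calls(nodes):
    """Return the code of every CALL node whose callee is in DANGEROUS_CALLS.

    `nodes` is a list of (node_type, code) tuples, an assumed layout.
    Operator calls such as 'n > 0' have no '(' and never match the list.
    """
    return [code for node_type, code in nodes
            if node_type == "CALL" and callee_name(code) in DANGEROUS_CALLS]


ast_nodes = [
    ("METHOD", "copy_data"),
    ("LOCAL", "char buf[16]"),
    ("CALL", "n > 0"),
    ("CALL", "memcpy(buf, src, n)"),
    ("IDENTIFIER", "n"),
]
print(flag_dangerous_calls(ast_nodes))  # ['memcpy(buf, src, n)']
```

A match like this only says that a risky call exists; whether it is reachable, or fed by attacker-controlled data, needs the control-flow and dependence information discussed next.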

Limitations

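An AST only shows what the code looks like, not how it runs or how data flows through it. Because it does not explicitly encode control flow or data dependencies, it is not suitable on its own for more complex tasks such as dead-code detection or uninitialized-variable analysis.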

Control Flow Graph

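A control flow graph (CFG) is a directed graph that records, for each statement, which statements may execute after it finishes. Its nodes are basic blocks and its edges are control transfers (such as if/else, while, and goto); together they describe every possible execution path of the program.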

Joern CFG

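Here the CFG is built from a subset of the AST nodes, and an edge means that control can flow from the source node to the destination node. Joern also models the evaluation order inside expressions, not just the order between statements.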

Strengths

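  • Explicitly represents possible execution paths
    • The mere presence of a risky function does not make a program vulnerable; a dangerous operation should only raise an alarm when it is reachable under conditions that can actually hold
  • Well suited to analyzing check logic
    • It fits patterns such as bounds checks, null-pointer checks, permission checks, error handling, resource-release paths, and missing conditions
  • Well suited to extracting execution-path sequences
    • A CFG can split a function into several potential execution paths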

Limitations

A CFG knows what executes first, but not which statement passes data to which. For example:

int size = get_user_input();
int x = 0;
int y = calculate();
log(y);
memcpy(buf, src, size);

The CFG fully preserves the execution order:

get_user_input → x = 0 → calculate → log → memcpy

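but it does not state explicitly that what actually matters for the memcpy risk is size, not x or y. In a long function, the many statements unrelated to the vulnerability therefore become noise.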

Program Dependence Graph

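A PDG no longer asks what executes next; it asks whether one statement can be affected by another. A PDG usually combines a DDG (data dependence graph) and a CDG (control dependence graph).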

DDG

For example:

int n = get_user_input();
int size = n * 2;
memcpy(buf, src, size);

The data dependencies can be written as:

n = get_user_input()
↓ n
size = n * 2
↓ size
memcpy(buf, src, size)

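Here size = n * 2 depends on n = get_user_input(), and memcpy in turn depends on size = n * 2. In Joern these edges are called REACHING_DEF: a variable defined at the source node can reach the target node as long as it is not reassigned along the way.

REACHING_DEF edges also carry a VARIABLE property that records which variable is being propagated.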
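For straight-line code with no branches, REACHING_DEF edges can be sketched by remembering the latest definition of each variable. The statement encoding (text, defined variable, used variables) is a simplification assumed for this example; Joern computes reaching definitions over the full CFG, where several definitions may reach the same use.

```python
def reaching_def_edges(stmts):
    """Compute (src_index, dst_index, variable) edges for straight-line code.

    Each statement is (text, defined_var_or_None, used_vars). Without
    branches, the definition that reaches a use is simply the latest
    earlier statement that assigned the variable.
    """
    last_def = {}  # variable -> index of its latest defining statement
    edges = []
    for i, (_, defined, used) in enumerate(stmts):
        for var in used:  # uses are read before this statement's own definition
            if var in last_def:
                edges.append((last_def[var], i, var))
        if defined is not None:
            last_def[defined] = i  # a new assignment kills the previous one
    return edges


stmts = [
    ("n = get_user_input()", "n", []),
    ("size = n * 2", "size", ["n"]),
    ("memcpy(buf, src, size)", None, ["buf", "src", "size"]),
]
for src, dst, var in reaching_def_edges(stmts):
    print(f"{stmts[src][0]} --REACHING_DEF[{var}]--> {stmts[dst][0]}")
```

The third element of each edge plays the role of the VARIABLE property. buf and src produce no edges here because they are never defined inside the snippet.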

CDG

For example:

if (n > 0) {
memcpy(buf, src, n);
}

Whether memcpy executes depends on n > 0, so there is a control dependence:

if (n > 0)
↓ CDG
memcpy(buf, src, n)

In Joern this relation is represented by a CDG edge, meaning that the target node is control dependent on the source node.
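Joern computes CDG edges itself. To show where they come from, the sketch below uses the textbook post-dominator construction: node B is control dependent on A when A has a CFG edge to some node from which B is always reached, yet B does not post-dominate A. The node names mirror the if (n > 0) example; this illustrates the general technique and is not Joern's implementation. It assumes every non-exit node has at least one successor and can reach the exit.

```python
def post_dominators(nodes, succ, exit_node):
    """Iterate pdom(n) = {n} | intersection of pdom(s) over successors s to a fixpoint."""
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == exit_node:
                continue
            new = set(nodes)
            for s in succ[n]:
                new &= pdom[s]
            new |= {n}
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def control_dependences(nodes, succ, exit_node):
    """Return the set of (controller, dependent) pairs."""
    pdom = post_dominators(nodes, succ, exit_node)

    def ipdom(n):
        # Strict post-dominators form a chain; the closest one has the largest set.
        return max(pdom[n] - {n}, key=lambda d: len(pdom[d]))

    deps = set()
    for a in nodes:
        for b in succ[a]:
            if b in pdom[a]:
                continue  # b runs whenever a runs, so a does not decide it
            stop, runner = ipdom(a), b
            while runner != stop:  # climb the post-dominator tree from b
                deps.add((a, runner))
                runner = ipdom(runner)
    return deps


nodes = ["ENTRY", "n > 0", "memcpy(buf, src, n)", "EXIT"]
succ = {
    "ENTRY": ["n > 0"],
    "n > 0": ["memcpy(buf, src, n)", "EXIT"],
    "memcpy(buf, src, n)": ["EXIT"],
    "EXIT": [],
}
print(control_dependences(nodes, succ, "EXIT"))  # {('n > 0', 'memcpy(buf, src, n)')}
```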

Strengths

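  • Spans long distances: a dependence edge links a definition directly to its use, however far apart they are in the source

  • Compresses irrelevant code: statements with no dependence on the point of interest drop out

  • Well suited to taint propagation and program slicing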

Limitations

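  • Loses the complete execution order
    • A PDG represents dependence, not sequence. If two statements have neither a data nor a control dependence between them, the PDG may not connect them at all.
  • Static analysis introduces approximation error
    • Suppose a pointer's value comes from a function's return value and static analysis cannot determine the exact address. It must then conservatively assume that the pointer may point to any of several candidate variables, which produces multiple dependence paths, even though at run time the pointer refers to only one of them. This problem is called over-approximation.
  • Higher construction cost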

Data Preprocessing

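Whether the data is fed to a CNN or a GNN, three steps must be completed first (the last three stages of the pipeline, after Joern has produced the CPG):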

Raw code → Joern parsing → CPG → Filter nodes and edges → Normalize node text → Convert text to vectors

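We skip the Joern parsing step here. Each sample in the F&Q dataset is a standalone function, so the step of selecting the target function can be skipped as well.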

Node Text Normalization

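That is, variable names are unified as VAR_1, VAR_2, and so on. Normalization must not go too far, though: function names, operators, constants, and the like need to be kept.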
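A minimal normalization sketch, assuming a simple regex tokenizer and the heuristic that an identifier directly followed by `(` is a function name. The abbreviated `C_KEYWORDS` set and the helper names are illustrative; a real pipeline would also need to keep type names such as `GetBitContext`, struct fields, and macros, which this heuristic would wrongly rename.

```python
import re

# Abbreviated keyword list for the example; a real pipeline needs the full C set.
C_KEYWORDS = {"int", "char", "const", "if", "else", "return", "static",
              "inline", "void", "while", "for", "sizeof", "unsigned"}

# Identifiers, integers, common two-character operators, then any single symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|>>|<<|==|!=|<=|>=|->|&&|\|\||\S")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def normalize(code, mapping=None):
    """Rename variables to VAR_1, VAR_2, ... in order of first appearance.

    Keywords, callee names (an identifier directly followed by '('),
    operators and numeric constants are kept unchanged. Pass the same
    `mapping` across statements to keep names consistent within a function.
    """
    mapping = {} if mapping is None else mapping
    tokens = TOKEN_RE.findall(code)
    out = []
    for i, tok in enumerate(tokens):
        is_call = i + 1 < len(tokens) and tokens[i + 1] == "("
        if IDENT_RE.fullmatch(tok) and tok not in C_KEYWORDS and not is_call:
            tok = mapping.setdefault(tok, f"VAR_{len(mapping) + 1}")
        out.append(tok)
    return " ".join(out), mapping


text, names = normalize("coeff = sign * ((sign * coeff * qfactor + qoffset) >> 2)")
print(text)  # VAR_1 = VAR_2 * ( ( VAR_2 * VAR_1 * VAR_3 + VAR_4 ) >> 2 )
```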

Converting Text to Vectors

A neural network cannot process strings directly, so node text must be encoded as numeric vectors. Two encoding methods are popular:

Word2Vec

CALL memcpy
↓ tokenize
["CALL", "memcpy"]
↓ average or pool the word vectors
x_i ∈ R^128
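The averaging step can be sketched as follows. The 4-dimensional `toy_embeddings` table is made up and stands in for trained Word2Vec vectors (128-dimensional in the text); skipping out-of-vocabulary tokens is one possible policy, assumed here.

```python
def mean_pool(tokens, embeddings, dim):
    """Average the vectors of in-vocabulary tokens, column by column.

    Unknown tokens are skipped; a zero vector is returned if none are known.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Made-up 4-d vectors standing in for a trained Word2Vec model.
toy_embeddings = {
    "CALL": [1.0, 0.0, 0.0, 2.0],
    "memcpy": [0.0, 1.0, 3.0, 0.0],
}
print(mean_pool("CALL memcpy".split(), toy_embeddings, dim=4))  # [0.5, 0.5, 1.5, 1.0]
```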

CodeBERT

CALL memcpy
↓ Tokenizer
↓ CodeBERT
↓ pooling
x_i ∈ R^768

Feeding into a CNN

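The input to an ordinary 1D CNN is usually a regular tensor X ∈ R^(L × d), where L is the sequence length and d is the vector dimension at each position. A natural-language sentence, for example, can be converted as follows: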

I / like / deep / learning
↓ four word vectors
X ∈ R^(4 × d)

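A program graph, however, is not a sequence: a node may connect to several other nodes, different functions have different numbers of nodes, a CFG may contain loops, and PDG nodes have no natural linear order. To use a CNN, a linearization strategy therefore has to be chosen by hand.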

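A text CNN uses convolution kernels to capture local combinations of adjacent tokens, then pools the results into an overall representation. This architecture was first widely used for sentence-level classification.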
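To make "convolve, then pool" concrete, here is a dependency-free sketch of one text-CNN layer over an [L, d] input: each kernel spans k consecutive positions, produces one ReLU score per window, and max pooling keeps the strongest window. The toy inputs and kernels are made up; a real model would use a framework layer such as PyTorch's `nn.Conv1d` with many learned kernels.

```python
def conv1d(x, kernel, bias=0.0):
    """Valid 1D convolution of x ([L][d]) with a kernel ([k][d]).

    Returns one ReLU-activated score per window, L - k + 1 scores in total.
    """
    k = len(kernel)
    out = []
    for start in range(len(x) - k + 1):
        s = bias
        for offset in range(k):
            s += sum(a * b for a, b in zip(x[start + offset], kernel[offset]))
        out.append(max(0.0, s))  # ReLU
    return out


def text_cnn_feature(x, kernels):
    """One feature per kernel: convolve over the sequence, then max-pool."""
    return [max(conv1d(x, kern)) for kern in kernels]


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # L = 4, d = 2
k1 = [[1.0, 0.0], [0.0, 1.0]]                          # width-2 kernel
k2 = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]              # width-3 kernel
print(conv1d(x, k1))                  # [2.0, 1.0, 1.0]
print(text_cnn_feature(x, [k1, k2]))  # [2.0, 2.0]
```

Max pooling makes the output length independent of L, which is why sequences of different lengths can share one classifier head.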

Consider an example with the following source code:

void copy_data(char *src, int n) {
char buf[16];

if (n > 0) {
memcpy(buf, src, n);
}
}

The vulnerability is a stack buffer overflow: when n > 16, memcpy writes past the end of buf.

For this function we have three intermediate representations:

AST
METHOD: copy_data
└── BLOCK
├── LOCAL: buf[16]
└── CONTROL_STRUCTURE: if
├── CONDITION: n > 0
│ ├── IDENTIFIER: n
│ └── LITERAL: 0
└── CALL: memcpy
├── IDENTIFIER: buf
├── IDENTIFIER: src
└── IDENTIFIER: n
CFG
ENTRY

buf[16]

n > 0 ?
├── True → memcpy(buf, src, n) → EXIT
└── False ─────────────────────→ EXIT
PDG
PARAMETER: n
├── REACHING_DEF → CONDITION: n > 0
└── REACHING_DEF → ARGUMENT: n in memcpy

PARAMETER: src
└── REACHING_DEF → ARGUMENT: src in memcpy

CONDITION: n > 0
└── CDG → CALL: memcpy(buf, src, n)

Feeding the AST into a CNN

A pre-order traversal of the AST (the root first, then each child subtree from left to right) gives a linear sequence:

METHOD copy_data
BLOCK
LOCAL buf[16]
CONTROL_STRUCTURE if
CALL >
IDENTIFIER n
LITERAL 0
CALL memcpy
IDENTIFIER buf
IDENTIFIER src
IDENTIFIER n

This is a sequence of 11 nodes. If each node is embedded in 128 dimensions, the input has shape [11, 128].
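The traversal itself is a few lines of recursion. The sketch assumes a hypothetical nested-tuple encoding (label, children) of the copy_data AST; an exporter working on Joern output would instead sort each node's AST children by their ORDER property.

```python
def preorder(node):
    """Yield node labels: the root first, then each child subtree left to right."""
    label, children = node
    yield label
    for child in children:
        yield from preorder(child)


# Hypothetical (label, children) encoding of the copy_data AST.
ast = ("METHOD copy_data", [
    ("BLOCK", [
        ("LOCAL buf[16]", []),
        ("CONTROL_STRUCTURE if", [
            ("CALL >", [("IDENTIFIER n", []), ("LITERAL 0", [])]),
            ("CALL memcpy", [("IDENTIFIER buf", []),
                             ("IDENTIFIER src", []),
                             ("IDENTIFIER n", [])]),
        ]),
    ]),
])
sequence = list(preorder(ast))
print(len(sequence))  # 11
```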

This tensor can be fed straight into a CNN for convolution (the convolution itself is not covered here), followed by max or average pooling.

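The advantage is simplicity: the model can still pick up dangerous functions and expressions. The drawback is that flattening the AST partly loses the tree structure; for example, the sequence no longer shows that buf, src, and n are the arguments of memcpy. One remedy is to add structure markers.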
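One possible way to add structure markers is to wrap every subtree in bracket tokens during the traversal, so the flat sequence still records parent-child relations. The `(`/`)` tokens and the tuple encoding are illustrative choices, not a fixed convention from the text.

```python
def preorder_with_markers(node):
    """Pre-order traversal that wraps every subtree in '(' and ')' tokens."""
    label, children = node
    tokens = ["(", label]
    for child in children:
        tokens.extend(preorder_with_markers(child))
    tokens.append(")")
    return tokens


call = ("CALL memcpy", [("IDENTIFIER buf", []),
                        ("IDENTIFIER src", []),
                        ("IDENTIFIER n", [])])
print(" ".join(preorder_with_markers(call)))
# ( CALL memcpy ( IDENTIFIER buf ) ( IDENTIFIER src ) ( IDENTIFIER n ) )
```

The price is a longer sequence (two extra tokens per node), but a CNN window can now see that the three identifiers sit inside the memcpy subtree.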

Feeding the CFG into a CNN

A CFG is a directed graph, so control-flow paths can be extracted with DFS. The example above yields two paths:

Path 1:
ENTRY → buf[16] → n > 0 → memcpy(buf, src, n) → EXIT

Path 2:
ENTRY → buf[16] → n > 0 → EXIT

Each path becomes a node sequence, giving two matrices of shape [5, 128] and [4, 128]. Both are fed into the same CNN, and the outputs are aggregated.

This captures local execution patterns along the control flow, but it suffers from path explosion.
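A sketch of the DFS extraction, assuming the CFG is given as a successor dictionary. Two common mitigations for path explosion are built in as assumptions: nodes already on the current path are skipped (loops are not unrolled), and enumeration stops after `max_paths` paths.

```python
def enumerate_paths(succ, entry, exit_node, max_paths=100):
    """Depth-first enumeration of simple entry-to-exit paths.

    Nodes already on the current path are skipped, so loops are not
    unrolled, and the search stops after max_paths complete paths.
    """
    paths = []
    stack = [(entry, [entry])]
    while stack and len(paths) < max_paths:
        node, path = stack.pop()
        if node == exit_node:
            paths.append(path)
            continue
        # Push successors in reverse so the first-listed branch is explored first.
        for nxt in reversed(succ.get(node, [])):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return paths


cfg = {
    "ENTRY": ["buf[16]"],
    "buf[16]": ["n > 0"],
    "n > 0": ["memcpy(buf, src, n)", "EXIT"],
    "memcpy(buf, src, n)": ["EXIT"],
}
for p in enumerate_paths(cfg, "ENTRY", "EXIT"):
    print(" -> ".join(p))
```

After embedding each node in 128 dimensions, the two paths become the [5, 128] and [4, 128] inputs described above.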

Feeding the PDG into a CNN

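The PDG pairs best with program slicing. VulDeePecker proposed code gadgets: code lines that are semantically related but not necessarily contiguous in the source, which are then vectorized for deep learning. SySeVR went further, emphasizing the combination of vulnerability-related syntactic and semantic information and converting it into vector representations suitable for deep learning.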

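Program slicing can be understood as deleting code unrelated to the vulnerability and keeping only the semantic chain that may affect the critical operation. For the example above, we can take a backward PDG slice from memcpy() and keep only the statements that can affect it.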

Building the PDG Slice

First, choose the dangerous node. This node is called the sink.

CALL memcpy

Then trace REACHING_DEF edges backward:

memcpy's third argument n
↑ REACHING_DEF
definition of n: PARAMETER n

Then add the controlling conditions along CDG edges:

memcpy
↑ CDG
n > 0

The resulting slice is:

PARAMETER n
→ n > 0
→ memcpy(buf, src, n)

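Compared with the AST, the PDG input is usually shorter but denser in meaning, which makes it suitable for learning how vulnerabilities propagate.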
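The backward walk can be sketched as a reverse breadth-first search over dependence edges. The statement-level edge list below is a simplification of the copy_data PDG: it collapses Joern's argument-level REACHING_DEF targets onto the whole call, so the slice also picks up PARAMETER src; slicing on the third argument alone would require argument-level nodes.

```python
from collections import deque


def backward_slice(edges, sink, kinds=("REACHING_DEF", "CDG")):
    """Return every node that can reach `sink` through edges of the given kinds.

    `edges` is a list of (src, dst, kind) triples; walking them in reverse
    from the sink keeps only the statements that can affect it.
    """
    preds = {}
    for src, dst, kind in edges:
        if kind in kinds:
            preds.setdefault(dst, []).append(src)
    seen = {sink}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for p in preds.get(node, []):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen


edges = [
    ("PARAMETER n", "n > 0", "REACHING_DEF"),
    ("PARAMETER n", "memcpy(buf, src, n)", "REACHING_DEF"),
    ("PARAMETER src", "memcpy(buf, src, n)", "REACHING_DEF"),
    ("n > 0", "memcpy(buf, src, n)", "CDG"),
]
print(sorted(backward_slice(edges, "memcpy(buf, src, n)")))
```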

Feeding into a GNN

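A GNN does not need to flatten the graph into a one-dimensional sequence, because it takes nodes, edges, node features, and edge types directly. A graph convolutional network operates directly on graph-structured data and learns node representations by aggregating information from local neighborhoods.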

Input Tensors

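For a function graph, define G = (V, E, X), where V is the node set, E is the edge set, and X is the node feature matrix. Suppose the graph has N = 120 nodes and each node feature has dimension d = 160; then X has shape [120, 160]. The feature dimension is a concatenation of two parts.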

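The first part is the text feature, obtained with Word2Vec or CodeBERT, with 128 dimensions (a 768-dimensional CodeBERT vector would first have to be projected down to 128). The second part is the node-type feature, an embedding of the node type with 32 dimensions, so 128 + 32 = 160.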

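In PyTorch Geometric, a graph's edges are usually stored as a sparse edge list, edge_index, of shape [2, num_edges]: one row of source indices and one row of target indices, analogous to the coordinate (triplet) representation of a sparse matrix.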
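The layout can be sketched without importing torch. The helper name, the 2-d text vectors, and the 1-d `type_table` are made-up stand-ins for the 128-d text features and the learned 32-d type embedding described above.

```python
def build_graph(node_text_vecs, node_types, type_table, edges):
    """Assemble PyG-style inputs as plain lists.

    x[i] is the node's text vector concatenated with its type embedding;
    edge_index is [sources, targets], i.e. shape [2, num_edges].
    """
    x = [text + type_table[t] for text, t in zip(node_text_vecs, node_types)]
    edge_index = [[s for s, _ in edges], [d for _, d in edges]]
    return x, edge_index


text_vecs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # made-up text features
types = ["CALL", "IDENTIFIER", "IDENTIFIER"]
type_table = {"CALL": [1.0], "IDENTIFIER": [0.0]}  # stand-in for a learned embedding
edges = [(0, 1), (0, 2)]                           # CALL node -> two argument identifiers
x, edge_index = build_graph(text_vecs, types, type_table, edges)
print(edge_index)  # [[0, 0], [1, 2]]
```

With PyTorch Geometric installed, these lists would typically be converted to tensors (edge_index as a long tensor) and wrapped in `torch_geometric.data.Data(x=..., edge_index=...)`.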

Feeding the AST into a GNN

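An AST is usually a directed tree, so by default messages flow from parent to child. If argument information should also flow back to the parent (the CALL memcpy node), reverse edges can be added by hand.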

Feeding the CFG into a GNN

Reverse edges can be added here as well.
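Adding reverse edges to an edge_index-style list is a one-liner; the sketch below appends a flipped copy of every edge. Whether reversed edges should also get their own edge type, so the model can tell the directions apart, is a design choice left open here. PyTorch Geometric also provides helpers such as `torch_geometric.utils.to_undirected` for the tensor case.

```python
def add_reverse_edges(edge_index):
    """Append a reversed copy of every edge so messages can flow both ways,
    e.g. from argument identifiers back up to their CALL node."""
    src, dst = edge_index
    return [src + dst, dst + src]


edge_index = [[0, 0], [1, 2]]  # CALL memcpy (0) -> argument identifiers (1, 2)
print(add_reverse_edges(edge_index))  # [[0, 0, 1, 2], [1, 2, 0, 0]]
```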

Feeding the PDG into a GNN

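Use a multi-relational R-GCN: the PDG mixes edge types (REACHING_DEF and CDG), and an R-GCN gives each relation type its own weight matrix.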
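A minimal sketch of one R-GCN style layer, using the standard update with per-relation weights and mean normalization over each relation's incoming neighbors. The 2-d features and weight values are made up; in practice one would use a learned layer such as PyTorch Geometric's `RGCNConv`.

```python
def matvec(w, h):
    """Multiply matrix w (a list of rows) by vector h."""
    return [sum(a * b for a, b in zip(row, h)) for row in w]


def rgcn_layer(h, edges, w_self, w_rel):
    """One R-GCN style update for every node i:

        h_i' = ReLU(W_0 h_i + sum_r sum_{j in N_r(i)} W_r h_j / |N_r(i)|)

    h: list of node vectors; edges: (src, dst, relation) triples with
    messages flowing src -> dst; w_self: W_0; w_rel: relation -> W_r.
    """
    incoming = {}  # (dst, relation) -> list of source nodes
    for src, dst, rel in edges:
        incoming.setdefault((dst, rel), []).append(src)
    out = []
    for i, hi in enumerate(h):
        total = matvec(w_self, hi)
        for rel, w in w_rel.items():
            srcs = incoming.get((i, rel), [])
            for j in srcs:
                msg = matvec(w, h[j])
                total = [t + m / len(srcs) for t, m in zip(total, msg)]
        out.append([max(0.0, v) for v in total])  # ReLU
    return out


# Made-up features: node 0 = "n > 0", 1 = "PARAMETER n", 2 = "memcpy(buf, src, n)".
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(1, 0, "REACHING_DEF"), (1, 2, "REACHING_DEF"), (0, 2, "CDG")]
identity = [[1.0, 0.0], [0.0, 1.0]]
w_rel = {"REACHING_DEF": identity, "CDG": [[2.0, 0.0], [0.0, 0.0]]}
print(rgcn_layer(h, edges, identity, w_rel))  # [[1.0, 1.0], [0.0, 1.0], [3.0, 2.0]]
```

Because CDG messages pass through a different matrix than REACHING_DEF messages, the memcpy node can weigh "guarded by n > 0" differently from "receives data from n".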

Worked Example

Source code of 267_1.c:

static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qoffset)

{

int coeff = dirac_get_se_golomb(gb);

const int sign = FFSIGN(coeff);

if (coeff)

coeff = sign*((sign * coeff * qfactor + qoffset) >> 2);

return coeff;

}

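This function contains a signed integer overflow. The vulnerable code is the expression sign * coeff * qfactor inside coeff = sign * ((sign * coeff * qfactor + qoffset) >> 2);, because an attacker can craft a malicious Dirac video bitstream that makes dirac_get_se_golomb(gb) return an abnormally large coeff.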
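Python integers never overflow, so the sketch below only emulates the problem: it checks whether the left-to-right partial products of sign * coeff * qfactor stay inside a 32-bit signed int, assuming int is 32 bits as on common platforms. In C, signed overflow is undefined behavior. The sample values are made up for illustration, and the later + qoffset could overflow too, which this helper does not check.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def product_fits_int32(*factors):
    """True if every partial product, evaluated left to right as C does
    for a * b * c, stays within the 32-bit signed int range."""
    acc = 1
    for f in factors:
        acc *= f
        if not INT32_MIN <= acc <= INT32_MAX:
            return False
    return True


print(product_fits_int32(1, 1000, 4096))       # True: 4,096,000 fits
print(product_fits_int32(1, 1_000_000, 4096))  # False: 4,096,000,000 does not
```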

AST

The full AST is:

- 107374182400 [METHOD] order=1 line=65 code=static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qoffset) { int coeff = dirac_get_se_golomb(gb); const int sign = FFSIGN(coeff); if (coeff) coeff = sign...
- 111669149696 [METHOD_PARAMETER_IN] order=1 line=65 code=GetBitContext *gb
- 115964116992 [METHOD_PARAMETER_OUT] order=1 line=65 code=GetBitContext *gb
- 111669149697 [METHOD_PARAMETER_IN] order=2 line=65 code=int qfactor
- 115964116993 [METHOD_PARAMETER_OUT] order=2 line=65 code=int qfactor
- 111669149698 [METHOD_PARAMETER_IN] order=3 line=65 code=int qoffset
- 115964116994 [METHOD_PARAMETER_OUT] order=3 line=65 code=int qoffset
- 25769803776 [BLOCK] order=4 line=67 code={ int coeff = dirac_get_se_golomb(gb); const int sign = FFSIGN(coeff); if (coeff) coeff = sign*((sign * coeff * qfactor + qoffset) >> 2); return coeff; }
- 94489280512 [LOCAL] order=1 line=69 code=int coeff
- 30064771072 [CALL] order=2 line=69 code=coeff = dirac_get_se_golomb(gb)
- 68719476736 [IDENTIFIER] order=1 line=69 code=coeff
- 30064771073 [CALL] order=2 line=69 code=dirac_get_se_golomb(gb)
- 68719476737 [IDENTIFIER] order=1 line=69 code=gb
- 94489280513 [LOCAL] order=3 line=71 code=const int sign
- 30064771074 [CALL] order=4 line=71 code=sign = FFSIGN(coeff)
- 68719476738 [IDENTIFIER] order=1 line=71 code=sign
- 30064771075 [CALL] order=2 line=71 code=FFSIGN(coeff)
- 68719476739 [IDENTIFIER] order=1 line=71 code=coeff
- 47244640256 [CONTROL_STRUCTURE] order=5 line=73 code=if (coeff) coeff = sign*((sign * coeff * qfactor + qoffset) >> 2);
- 30064771076 [CALL] order=1 line=73 code=coeff != 0
- 68719476740 [IDENTIFIER] order=1 line=73 code=coeff
- 90194313216 [LITERAL] order=2 line=73 code=0
- 25769803777 [BLOCK] order=2 line=75 code=<empty>
- 30064771077 [CALL] order=1 line=75 code=coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)
- 68719476741 [IDENTIFIER] order=1 line=75 code=coeff
- 30064771078 [CALL] order=2 line=75 code=sign*((sign * coeff * qfactor + qoffset) >> 2)
- 68719476742 [IDENTIFIER] order=1 line=75 code=sign
- 30064771079 [CALL] order=2 line=75 code=(sign * coeff * qfactor + qoffset) >> 2
- 30064771080 [CALL] order=1 line=75 code=sign * coeff * qfactor + qoffset
- 30064771081 [CALL] order=1 line=75 code=sign * coeff * qfactor
- 30064771082 [CALL] order=1 line=75 code=sign * coeff
- 68719476743 [IDENTIFIER] order=1 line=75 code=sign
- 68719476744 [IDENTIFIER] order=2 line=75 code=coeff
- 68719476745 [IDENTIFIER] order=2 line=75 code=qfactor
- 68719476746 [IDENTIFIER] order=2 line=75 code=qoffset
- 90194313217 [LITERAL] order=2 line=75 code=2
- 141733920768 [RETURN] order=6 line=77 code=return coeff;
- 68719476747 [IDENTIFIER] order=1 line=77 code=coeff
- 128849018880 [MODIFIER] order=5 line=65 code=<empty>
- 124554051584 [METHOD_RETURN] order=6 line=65 code=RET

Each line can be read as follows:

indent - node ID [node type] order=position among siblings line=source line number code=corresponding code fragment

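Deeper indentation means the node is a child of the node above it. Ignoring the long node IDs and looking only at the node type and code=, the tree condenses to the following overall structure: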

METHOD coeff_unpack_golomb
METHOD_PARAMETER_IN gb
METHOD_PARAMETER_OUT gb
METHOD_PARAMETER_IN qfactor
METHOD_PARAMETER_OUT qfactor
METHOD_PARAMETER_IN qoffset
METHOD_PARAMETER_OUT qoffset
BLOCK
LOCAL int coeff
CALL coeff = dirac_get_se_golomb(gb)
LOCAL const int sign
CALL sign = FFSIGN(coeff)
CONTROL_STRUCTURE if (coeff)
CALL coeff != 0
BLOCK
CALL coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)
RETURN return coeff
MODIFIER
METHOD_RETURN RET

Here:

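  • [METHOD] is the function's root node, covering the whole coeff_unpack_golomb

  • [METHOD_PARAMETER_IN] / [METHOD_PARAMETER_OUT] come in pairs. Joern/CPG adds them to model data flow; the source does not actually declare each parameter twice.

  • [BLOCK] is the function body { ... }

  • [LOCAL] marks a local variable declaration, such as int coeff and const int sign

  • [CALL] covers not only ordinary function calls but also operator calls. Assignment =, multiplication *, addition +, and right shift >> are all represented as CALL nodes by Joern.

  • [CONTROL_STRUCTURE] is a control structure, such as if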
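The condensed view can be produced mechanically from the dump. The regex below is an assumption that matches the line layout shown above (this looks like a custom export, not a standard Joern format), and `parse_dump_line` is a hypothetical helper.

```python
import re

# Matches lines such as: - 68719476744 [IDENTIFIER] order=2 line=75 code=coeff
LINE_RE = re.compile(
    r"-\s*(?P<id>\d+)\s+\[(?P<type>\w+)\]\s+order=(?P<order>\d+)\s+"
    r"line=(?P<line>\d+)\s+code=(?P<code>.*)"
)


def parse_dump_line(line):
    """Split one AST dump line into a dict, or return None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "id": int(m["id"]),
        "type": m["type"],
        "order": int(m["order"]),
        "line": int(m["line"]),
        "code": m["code"],
    }


node = parse_dump_line("- 68719476744 [IDENTIFIER] order=2 line=75 code=coeff")
print(f"{node['type']} {node['code']}")  # IDENTIFIER coeff
```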

CFG

The full output is:

CFG nodes:
N01: 107374182400 [METHOD] line=65 order=1 code=static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qoffset) { int coeff = dirac_get_se_golomb(gb); const int sign ...
N02: 68719476736 [IDENTIFIER] line=69 order=1 code=coeff
N03: 68719476737 [IDENTIFIER] line=69 order=1 code=gb
N04: 30064771073 [CALL] line=69 order=2 code=dirac_get_se_golomb(gb)
N05: 30064771072 [CALL] line=69 order=2 code=coeff = dirac_get_se_golomb(gb)
N06: 68719476738 [IDENTIFIER] line=71 order=1 code=sign
N07: 68719476739 [IDENTIFIER] line=71 order=1 code=coeff
N08: 30064771075 [CALL] line=71 order=2 code=FFSIGN(coeff)
N09: 30064771074 [CALL] line=71 order=4 code=sign = FFSIGN(coeff)
N10: 68719476740 [IDENTIFIER] line=73 order=1 code=coeff
N11: 90194313216 [LITERAL] line=73 order=2 code=0
N12: 30064771076 [CALL] line=73 order=1 code=coeff != 0
N13: 68719476741 [IDENTIFIER] line=75 order=1 code=coeff
N14: 68719476747 [IDENTIFIER] line=77 order=1 code=coeff
N15: 68719476742 [IDENTIFIER] line=75 order=1 code=sign
N16: 141733920768 [RETURN] line=77 order=6 code=return coeff;
N17: 68719476743 [IDENTIFIER] line=75 order=1 code=sign
N18: 124554051584 [METHOD_RETURN] line=65 order=6 code=RET
N19: 68719476744 [IDENTIFIER] line=75 order=2 code=coeff
N20: 30064771082 [CALL] line=75 order=1 code=sign * coeff
N21: 68719476745 [IDENTIFIER] line=75 order=2 code=qfactor
N22: 30064771081 [CALL] line=75 order=1 code=sign * coeff * qfactor
N23: 68719476746 [IDENTIFIER] line=75 order=2 code=qoffset
N24: 30064771080 [CALL] line=75 order=1 code=sign * coeff * qfactor + qoffset
N25: 90194313217 [LITERAL] line=75 order=2 code=2
N26: 30064771079 [CALL] line=75 order=2 code=(sign * coeff * qfactor + qoffset) >> 2
N27: 30064771078 [CALL] line=75 order=2 code=sign*((sign * coeff * qfactor + qoffset) >> 2)
N28: 30064771077 [CALL] line=75 order=1 code=coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)

CFG edges:
N01 -> N02: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> coeff
N02 -> N03: coeff ==> gb
N03 -> N04: gb ==> dirac_get_se_golomb(gb)
N04 -> N05: dirac_get_se_golomb(gb) ==> coeff = dirac_get_se_golomb(gb)
N05 -> N06: coeff = dirac_get_se_golomb(gb) ==> sign
N06 -> N07: sign ==> coeff
N07 -> N08: coeff ==> FFSIGN(coeff)
N08 -> N09: FFSIGN(coeff) ==> sign = FFSIGN(coeff)
N09 -> N10: sign = FFSIGN(coeff) ==> coeff
N10 -> N11: coeff ==> 0
N11 -> N12: 0 ==> coeff != 0
N12 -> N13: coeff != 0 ==> coeff
N12 -> N14: coeff != 0 ==> coeff
N13 -> N15: coeff ==> sign
N14 -> N16: coeff ==> return coeff;
N15 -> N17: sign ==> sign
N16 -> N18: return coeff; ==> RET
N17 -> N19: sign ==> coeff
N19 -> N20: coeff ==> sign * coeff
N20 -> N21: sign * coeff ==> qfactor
N21 -> N22: qfactor ==> sign * coeff * qfactor
N22 -> N23: sign * coeff * qfactor ==> qoffset
N23 -> N24: qoffset ==> sign * coeff * qfactor + qoffset
N24 -> N25: sign * coeff * qfactor + qoffset ==> 2
N25 -> N26: 2 ==> (sign * coeff * qfactor + qoffset) >> 2
N26 -> N27: (sign * coeff * qfactor + qoffset) >> 2 ==> sign*((sign * coeff * qfactor + qoffset) >> 2)
N27 -> N28: sign*((sign * coeff * qfactor + qoffset) >> 2) ==> coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)
N28 -> N14: coeff = sign*((sign * coeff * qfactor + qoffset) >> 2) ==> coeff

The corresponding flowchart:

flowchart TB
    accTitle: coeff_unpack_golomb CFG
    accDescr: Simplified control flow for coeff_unpack_golomb, showing the condition branch and the merge at the return statement.

    entry([entry: coeff_unpack_golomb]) --> read_coeff["coeff = dirac_get_se_golomb(gb)"]
    read_coeff --> read_sign["sign = FFSIGN(coeff)"]
    read_sign --> test{"coeff != 0?"}
    test -->|true| scale_coeff["coeff = sign * ((sign * coeff * qfactor + qoffset) >> 2)"]
    test -->|false| return_coeff["return coeff"]
    scale_coeff --> return_coeff
    return_coeff --> exit_node([RET])

    classDef terminal fill:#f3f4f6,stroke:#6b7280,stroke-width:2px,color:#1f2937
    classDef process fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e3a5f
    classDef decision fill:#fef9c3,stroke:#ca8a04,stroke-width:2px,color:#713f12

    class entry,exit_node terminal
    class read_coeff,read_sign,scale_coeff,return_coeff process
    class test decision
| Simplified node | Original Joern CFG nodes |
|---|---|
| entry | METHOD node 107374182400 |
| read_coeff | line 69 identifier/function-call/assignment nodes 68719476736, 68719476737, 30064771073, 30064771072 |
| read_sign | line 71 identifier/function-call/assignment nodes 68719476738, 68719476739, 30064771075, 30064771074 |
| test | line 73 condition nodes 68719476740, 90194313216, 30064771076 |
| scale_coeff | line 75 expression and assignment nodes 68719476741 through 30064771077 |
| return_coeff | line 77 nodes 68719476747, 141733920768 |
| exit_node | METHOD_RETURN node 124554051584 |
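The branch at if (coeff) and the merge at return coeff can be located mechanically from the raw edge list: a node with two successors is a branch (N12, coeff != 0), and a node with two predecessors is a merge (N14, the coeff in return coeff). The edge-line format is taken from the CFG dump; the helper name is hypothetical and only a subset of the edges is used.

```python
from collections import Counter


def branch_and_merge_nodes(edge_lines):
    """Given lines like 'N12 -> N13: ...', return (branch nodes, merge nodes):
    nodes with more than one successor, and nodes with more than one predecessor."""
    out_deg, in_deg = Counter(), Counter()
    for line in edge_lines:
        head = line.split(":", 1)[0]  # keep only 'N12 -> N13'
        src, dst = (part.strip() for part in head.split("->"))
        out_deg[src] += 1
        in_deg[dst] += 1
    branches = sorted(n for n, d in out_deg.items() if d > 1)
    merges = sorted(n for n, d in in_deg.items() if d > 1)
    return branches, merges


edge_lines = [
    "N11 -> N12: 0 ==> coeff != 0",
    "N12 -> N13: coeff != 0 ==> coeff",
    "N12 -> N14: coeff != 0 ==> coeff",
    "N14 -> N16: coeff ==> return coeff;",
    "N28 -> N14: coeff = sign*((sign * coeff * qfactor + qoffset) >> 2) ==> coeff",
]
print(branch_and_merge_nodes(edge_lines))  # (['N12'], ['N14'])
```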

PDG

The raw output is:

PDG nodes:
N01: 107374182400 [METHOD] line=65 order=1 code=static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qoffset) { int coeff = dirac_get_se_golomb(gb); const int sign ...
N02: 111669149696 [METHOD_PARAMETER_IN] line=65 order=1 code=GetBitContext *gb
N03: 115964116992 [METHOD_PARAMETER_OUT] line=65 order=1 code=GetBitContext *gb
N04: 111669149697 [METHOD_PARAMETER_IN] line=65 order=2 code=int qfactor
N05: 115964116993 [METHOD_PARAMETER_OUT] line=65 order=2 code=int qfactor
N06: 111669149698 [METHOD_PARAMETER_IN] line=65 order=3 code=int qoffset
N07: 115964116994 [METHOD_PARAMETER_OUT] line=65 order=3 code=int qoffset
N08: 124554051584 [METHOD_RETURN] line=65 order=6 code=RET
N09: 68719476736 [IDENTIFIER] line=69 order=1 code=coeff
N10: 68719476737 [IDENTIFIER] line=69 order=1 code=gb
N11: 30064771072 [CALL] line=69 order=2 code=coeff = dirac_get_se_golomb(gb)
N12: 30064771073 [CALL] line=69 order=2 code=dirac_get_se_golomb(gb)
N13: 68719476738 [IDENTIFIER] line=71 order=1 code=sign
N14: 68719476739 [IDENTIFIER] line=71 order=1 code=coeff
N15: 30064771075 [CALL] line=71 order=2 code=FFSIGN(coeff)
N16: 30064771074 [CALL] line=71 order=4 code=sign = FFSIGN(coeff)
N17: 30064771076 [CALL] line=73 order=1 code=coeff != 0
N18: 68719476740 [IDENTIFIER] line=73 order=1 code=coeff
N19: 90194313216 [LITERAL] line=73 order=2 code=0
N20: 30064771077 [CALL] line=75 order=1 code=coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)
N21: 30064771080 [CALL] line=75 order=1 code=sign * coeff * qfactor + qoffset
N22: 30064771081 [CALL] line=75 order=1 code=sign * coeff * qfactor
N23: 30064771082 [CALL] line=75 order=1 code=sign * coeff
N24: 68719476741 [IDENTIFIER] line=75 order=1 code=coeff
N25: 68719476742 [IDENTIFIER] line=75 order=1 code=sign
N26: 68719476743 [IDENTIFIER] line=75 order=1 code=sign
N27: 30064771078 [CALL] line=75 order=2 code=sign*((sign * coeff * qfactor + qoffset) >> 2)
N28: 30064771079 [CALL] line=75 order=2 code=(sign * coeff * qfactor + qoffset) >> 2
N29: 68719476744 [IDENTIFIER] line=75 order=2 code=coeff
N30: 68719476745 [IDENTIFIER] line=75 order=2 code=qfactor
N31: 68719476746 [IDENTIFIER] line=75 order=2 code=qoffset
N32: 90194313217 [LITERAL] line=75 order=2 code=2
N33: 68719476747 [IDENTIFIER] line=77 order=1 code=coeff
N34: 141733920768 [RETURN] line=77 order=6 code=return coeff;

PDG edges:
N17 -> N20 [CDG]: coeff != 0 ==> coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)
N17 -> N21 [CDG]: coeff != 0 ==> sign * coeff * qfactor + qoffset
N17 -> N22 [CDG]: coeff != 0 ==> sign * coeff * qfactor
N17 -> N23 [CDG]: coeff != 0 ==> sign * coeff
N17 -> N24 [CDG]: coeff != 0 ==> coeff
N17 -> N25 [CDG]: coeff != 0 ==> sign
N17 -> N26 [CDG]: coeff != 0 ==> sign
N17 -> N27 [CDG]: coeff != 0 ==> sign*((sign * coeff * qfactor + qoffset) >> 2)
N17 -> N28 [CDG]: coeff != 0 ==> (sign * coeff * qfactor + qoffset) >> 2
N17 -> N29 [CDG]: coeff != 0 ==> coeff
N17 -> N30 [CDG]: coeff != 0 ==> qfactor
N17 -> N31 [CDG]: coeff != 0 ==> qoffset
N17 -> N32 [CDG]: coeff != 0 ==> 2
N01 -> N02 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> GetBitContext *gb
N01 -> N04 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> int qfactor
N01 -> N06 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> int qoffset
N01 -> N10 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> gb
N01 -> N14 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> coeff
N01 -> N18 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> coeff
N01 -> N19 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> 0
N01 -> N25 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> sign
N01 -> N26 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> sign
N01 -> N29 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> coeff
N01 -> N30 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> qfactor
N01 -> N31 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> qoffset
N01 -> N32 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> 2
N01 -> N33 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> coeff
N02 -> N03 [REACHING_DEF property=gb]: GetBitContext *gb ==> GetBitContext *gb
N02 -> N10 [REACHING_DEF property=gb]: GetBitContext *gb ==> gb
N04 -> N05 [REACHING_DEF property=qfactor]: int qfactor ==> int qfactor
N04 -> N05 [REACHING_DEF property=qfactor]: int qfactor ==> int qfactor
N04 -> N08 [REACHING_DEF property=qfactor]: int qfactor ==> RET
N04 -> N30 [REACHING_DEF property=qfactor]: int qfactor ==> qfactor
N06 -> N07 [REACHING_DEF property=qoffset]: int qoffset ==> int qoffset
N06 -> N07 [REACHING_DEF property=qoffset]: int qoffset ==> int qoffset
N06 -> N08 [REACHING_DEF property=qoffset]: int qoffset ==> RET
N06 -> N31 [REACHING_DEF property=qoffset]: int qoffset ==> qoffset
N09 -> N11 [REACHING_DEF property=coeff]: coeff ==> coeff = dirac_get_se_golomb(gb)
N09 -> N14 [REACHING_DEF property=coeff]: coeff ==> coeff
N10 -> N03 [REACHING_DEF property=gb]: gb ==> GetBitContext *gb
N10 -> N08 [REACHING_DEF property=gb]: gb ==> RET
N10 -> N12 [REACHING_DEF property=gb]: gb ==> dirac_get_se_golomb(gb)
N11 -> N08 [REACHING_DEF property=coeff = dirac_get_se_golomb(gb)]: coeff = dirac_get_se_golomb(gb) ==> RET
N12 -> N08 [REACHING_DEF property=dirac_get_se_golomb(gb)]: dirac_get_se_golomb(gb) ==> RET
N12 -> N09 [REACHING_DEF property=dirac_get_se_golomb(gb)]: dirac_get_se_golomb(gb) ==> coeff
N12 -> N11 [REACHING_DEF property=dirac_get_se_golomb(gb)]: dirac_get_se_golomb(gb) ==> coeff = dirac_get_se_golomb(gb)
N13 -> N08 [REACHING_DEF property=sign]: sign ==> RET
N13 -> N16 [REACHING_DEF property=sign]: sign ==> sign = FFSIGN(coeff)
N13 -> N26 [REACHING_DEF property=sign]: sign ==> sign
N14 -> N15 [REACHING_DEF property=coeff]: coeff ==> FFSIGN(coeff)
N14 -> N18 [REACHING_DEF property=coeff]: coeff ==> coeff
N15 -> N08 [REACHING_DEF property=FFSIGN(coeff)]: FFSIGN(coeff) ==> RET
N15 -> N13 [REACHING_DEF property=FFSIGN(coeff)]: FFSIGN(coeff) ==> sign
N15 -> N16 [REACHING_DEF property=FFSIGN(coeff)]: FFSIGN(coeff) ==> sign = FFSIGN(coeff)
N16 -> N08 [REACHING_DEF property=sign = FFSIGN(coeff)]: sign = FFSIGN(coeff) ==> RET
N17 -> N08 [REACHING_DEF property=coeff != 0]: coeff != 0 ==> RET
N18 -> N08 [REACHING_DEF property=coeff]: coeff ==> RET
N18 -> N17 [REACHING_DEF property=coeff]: coeff ==> coeff != 0
N18 -> N29 [REACHING_DEF property=coeff]: coeff ==> coeff
N18 -> N33 [REACHING_DEF property=coeff]: coeff ==> coeff
N19 -> N17 [REACHING_DEF property=0]: 0 ==> coeff != 0
N19 -> N18 [REACHING_DEF property=0]: 0 ==> coeff
N20 -> N08 [REACHING_DEF property=coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)]: coeff = sign*((sign * coeff * qfactor + qoffset) >> 2) ==> RET
N21 -> N08 [REACHING_DEF property=sign * coeff * qfactor + qoffset]: sign * coeff * qfactor + qoffset ==> RET
N21 -> N28 [REACHING_DEF property=sign * coeff * qfactor + qoffset]: sign * coeff * qfactor + qoffset ==> (sign * coeff * qfactor + qoffset) >> 2
N22 -> N08 [REACHING_DEF property=sign * coeff * qfactor]: sign * coeff * qfactor ==> RET
N22 -> N21 [REACHING_DEF property=sign * coeff * qfactor]: sign * coeff * qfactor ==> sign * coeff * qfactor + qoffset
N23 -> N08 [REACHING_DEF property=sign * coeff]: sign * coeff ==> RET
N23 -> N22 [REACHING_DEF property=sign * coeff]: sign * coeff ==> sign * coeff * qfactor
N23 -> N30 [REACHING_DEF property=sign * coeff]: sign * coeff ==> qfactor
N24 -> N08 [REACHING_DEF property=coeff]: coeff ==> RET
N24 -> N20 [REACHING_DEF property=coeff]: coeff ==> coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)
N24 -> N33 [REACHING_DEF property=coeff]: coeff ==> coeff
N25 -> N08 [REACHING_DEF property=sign]: sign ==> RET
N25 -> N27 [REACHING_DEF property=sign]: sign ==> sign*((sign * coeff * qfactor + qoffset) >> 2)
N25 -> N28 [REACHING_DEF property=sign]: sign ==> (sign * coeff * qfactor + qoffset) >> 2
N26 -> N23 [REACHING_DEF property=sign]: sign ==> sign * coeff
N26 -> N25 [REACHING_DEF property=sign]: sign ==> sign
N26 -> N29 [REACHING_DEF property=sign]: sign ==> coeff
N27 -> N08 [REACHING_DEF property=sign*((sign * coeff * qfactor + qoffset) >> 2)]: sign*((sign * coeff * qfactor + qoffset) >> 2) ==> RET
N27 -> N20 [REACHING_DEF property=sign*((sign * coeff * qfactor + qoffset) >> 2)]: sign*((sign * coeff * qfactor + qoffset) >> 2) ==> coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)
N27 -> N24 [REACHING_DEF property=sign*((sign * coeff * qfactor + qoffset) >> 2)]: sign*((sign * coeff * qfactor + qoffset) >> 2) ==> coeff
N28 -> N08 [REACHING_DEF property=(sign * coeff * qfactor + qoffset) >> 2]: (sign * coeff * qfactor + qoffset) >> 2 ==> RET
N28 -> N25 [REACHING_DEF property=(sign * coeff * qfactor + qoffset) >> 2]: (sign * coeff * qfactor + qoffset) >> 2 ==> sign
N28 -> N27 [REACHING_DEF property=(sign * coeff * qfactor + qoffset) >> 2]: (sign * coeff * qfactor + qoffset) >> 2 ==> sign*((sign * coeff * qfactor + qoffset) >> 2)
N29 -> N23 [REACHING_DEF property=coeff]: coeff ==> sign * coeff
N29 -> N26 [REACHING_DEF property=coeff]: coeff ==> sign
N30 -> N05 [REACHING_DEF property=qfactor]: qfactor ==> int qfactor
N30 -> N08 [REACHING_DEF property=qfactor]: qfactor ==> RET
N30 -> N22 [REACHING_DEF property=qfactor]: qfactor ==> sign * coeff * qfactor
N30 -> N23 [REACHING_DEF property=qfactor]: qfactor ==> sign * coeff
N31 -> N07 [REACHING_DEF property=qoffset]: qoffset ==> int qoffset
N31 -> N08 [REACHING_DEF property=qoffset]: qoffset ==> RET
N31 -> N21 [REACHING_DEF property=qoffset]: qoffset ==> sign * coeff * qfactor + qoffset
N32 -> N21 [REACHING_DEF property=2]: 2 ==> sign * coeff * qfactor + qoffset
N32 -> N28 [REACHING_DEF property=2]: 2 ==> (sign * coeff * qfactor + qoffset) >> 2
N33 -> N34 [REACHING_DEF property=coeff]: coeff ==> return coeff;
N34 -> N08 [REACHING_DEF property=<RET>]: return coeff; ==> RET

The corresponding visualization:

flowchart LR
    accTitle: coeff_unpack_golomb PDG
    accDescr: Simplified program-dependence graph showing data dependencies from parameters and assignments, plus the control dependency from the if condition to the guarded assignment.

    gb_param([gb parameter])
    qfactor_param([qfactor parameter])
    qoffset_param([qoffset parameter])

    read_coeff["coeff = dirac_get_se_golomb(gb)"]
    read_sign["sign = FFSIGN(coeff)"]
    test{"coeff != 0?"}
    scale_coeff["coeff = sign * ((sign * coeff * qfactor + qoffset) >> 2)"]
    return_coeff["return coeff"]
    exit_node([RET])

    gb_param -->|"data: gb"| read_coeff
    read_coeff -->|"data: coeff"| read_sign
    read_coeff -->|"data: coeff"| test
    read_coeff -->|"data: coeff"| scale_coeff
    read_sign -->|"data: sign"| scale_coeff
    qfactor_param -->|"data: qfactor"| scale_coeff
    qoffset_param -->|"data: qoffset"| scale_coeff
    test -.->|"control"| scale_coeff
    read_coeff -->|"data: coeff if false"| return_coeff
    scale_coeff -->|"data: coeff if true"| return_coeff
    return_coeff -->|"return value"| exit_node

    classDef param fill:#f3f4f6,stroke:#6b7280,stroke-width:2px,color:#1f2937
    classDef process fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e3a5f
    classDef decision fill:#fef9c3,stroke:#ca8a04,stroke-width:2px,color:#713f12
    classDef exit fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#14532d

    class gb_param,qfactor_param,qoffset_param param
    class read_coeff,read_sign,scale_coeff,return_coeff process
    class test decision
    class exit_node exit
| Dependency type | Meaning in this function |
|---|---|
| REACHING_DEF | A value definition can reach a later use, such as coeff reaching FFSIGN(coeff) or return coeff |
| CDG | A node is executed only under a control condition; here the line 75 assignment is controlled by coeff != 0 |

| Simplified dependency | Original Joern relation |
|---|---|
| gb parameter -> coeff = dirac_get_se_golomb(gb) | REACHING_DEF through the gb identifier and call node |
| coeff = dirac_get_se_golomb(gb) -> sign = FFSIGN(coeff) | REACHING_DEF for coeff |
| coeff = dirac_get_se_golomb(gb) -> coeff != 0 | REACHING_DEF for coeff |
| coeff != 0 -> line 75 assignment | CDG from condition node 30064771076 to line 75 expression nodes |
| sign, coeff, qfactor, qoffset -> line 75 assignment | REACHING_DEF into the scale expression |
| initial or updated coeff -> return coeff | REACHING_DEF into the return value / method return |
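The raw PDG is noisy: many REACHING_DEF edges start at the METHOD node (N01) or end at METHOD_RETURN (N08), and some edges appear twice (for example N04 -> N05). A preprocessing pass for the "filter nodes and edges" step might drop and de-duplicate these. Treating N01 and N08 as noise is an assumption for this sketch rather than a fixed rule, and `filter_pdg_edges` is a hypothetical helper working on a few lines copied from the dump.

```python
import re

# Matches the start of lines such as: N04 -> N05 [REACHING_DEF property=qfactor]: ...
EDGE_RE = re.compile(r"(N\d+) -> (N\d+) \[(\w+)")


def filter_pdg_edges(edge_lines, drop_nodes):
    """Parse edge lines into (src, dst, kind) triples, drop edges that touch
    any node in drop_nodes, and remove exact duplicates (keeping first order)."""
    kept = []
    for line in edge_lines:
        m = EDGE_RE.match(line)
        if m is None:
            continue
        triple = (m.group(1), m.group(2), m.group(3))
        if triple[0] in drop_nodes or triple[1] in drop_nodes:
            continue
        if triple not in kept:
            kept.append(triple)
    return kept


lines = [
    "N17 -> N20 [CDG]: coeff != 0 ==> coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)",
    "N01 -> N04 [REACHING_DEF]: static inline int coeff_unpack_golomb(...) ==> int qfactor",
    "N04 -> N05 [REACHING_DEF property=qfactor]: int qfactor ==> int qfactor",
    "N04 -> N05 [REACHING_DEF property=qfactor]: int qfactor ==> int qfactor",
    "N04 -> N08 [REACHING_DEF property=qfactor]: int qfactor ==> RET",
    "N04 -> N30 [REACHING_DEF property=qfactor]: int qfactor ==> qfactor",
]
for edge in filter_pdg_edges(lines, drop_nodes={"N01", "N08"}):
    print(edge)
```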