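From Code to Graphs
Abstract Syntax Tree (AST)
An AST captures which syntactic structures a program is made of and how they nest. It does not keep every surface detail of the syntax, such as parentheses, semicolons, and some syntactic sugar.
The paper describes the AST as an ordered tree: inner nodes are usually operators or syntactic constructs, and leaf nodes are usually operands such as constants and identifiers.
Joern AST
In a Joern CPG, AST children are connected to their parent by AST edges, and nodes usually carry:
- CODE: the original code fragment the node corresponds to
- ORDER: the node's position among its siblings
- LINE_NUMBER: the starting line number
- a node type, such as METHOD, BLOCK, CALL, IDENTIFIER, LITERAL, or CONTROL_STRUCTURE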
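Strengths
- Clear syntactic structure
- Well suited to recognizing local vulnerability patterns
- Easy to generate, with a relatively stable structure
In other words, the AST is better suited to finding vulnerability candidates that are strongly tied to syntax: dangerous local API usage, pointer and array access patterns, arithmetic expression patterns, abnormal type conversions, and the like.
Limitations
An AST only tells us what the code looks like, not how it runs or how data flows through it. It does not explicitly encode control flow or data dependencies, so on its own it is not suited to more complex tasks such as dead-code detection or uninitialized-variable analysis.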
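Control Flow Graph (CFG)
A control flow graph is a directed graph that records which statements may execute next after the current one finishes. Its nodes are basic blocks and its edges are control-flow transfers (for example if/else, while, and goto). Together they describe all possible execution paths of the program.
Joern CFG
In Joern, the CFG is built from a subset of the AST nodes, and a CFG edge means that control flows from the source node to the destination node. Joern also models the evaluation order inside expressions, not just the order between statements. As a result, its CFG nodes are individual expressions (identifiers, calls, literals) rather than basic blocks.
Strengths
- Explicitly represents the possible execution paths
- The mere presence of a risky function does not mean the program is vulnerable. A dangerous operation should only raise an alarm when it is reachable under conditions that can actually hold
- Well suited to analyzing checking logic
- Well suited to patterns such as bounds checks, null-pointer checks, permission checks, error handling, resource-release paths, and missing conditions
- Well suited to extracting execution-path sequences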
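Limitations
A CFG knows what executes first, but not which data is passed where. For example:
```c
int size = get_user_input();
int x = 0;
int y = calculate();
log(y);
memcpy(buf, src, size);
```
The CFG preserves the execution order completely:
```
get_user_input → x = 0 → calculate → log → memcpy
```
However, it does not state explicitly that size is what actually matters for the memcpy risk, not x or y. As a result, in long functions the many statements unrelated to the vulnerability become noise.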
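Program Dependence Graph (PDG)
A PDG no longer asks which statement executes next. Instead, it asks whether one statement can be affected by another. A PDG usually consists of a DDG (data dependence graph) and a CDG (control dependence graph).
DDG
For example:
```c
int n = get_user_input();
int size = n * 2;
memcpy(buf, src, size);
```
The data dependencies can be represented as follows:
```
n = get_user_input()
    ↓ n
size = n * 2
    ↓ size
memcpy(buf, src, size)
```
Here size = n * 2 depends on n = get_user_input() through n, and memcpy in turn depends on size = n * 2 through size. In Joern such edges are called REACHING_DEF. The name means that a variable defined at the source node reaches the target node along a path on which it is not reassigned.
REACHING_DEF edges also carry a VARIABLE property that records which variable is propagated.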
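The sketch below shows how REACHING_DEF edges and their VARIABLE labels could be derived for the straight-line example above. It is written in Python because the GNN tooling later in this section (PyTorch Geometric) is Python. It makes two assumptions: the code has no branches or loops, so the most recent definition of a variable is the only one that can reach a later use, and the def/use sets are written by hand. `Stmt` and `reaching_def_edges` are hypothetical names, not Joern APIs. Joern's own analysis runs over the CFG and also handles branches.

```python
from typing import NamedTuple


class Stmt(NamedTuple):
    code: str
    defs: frozenset  # variables written by the statement
    uses: frozenset  # variables read by the statement


def reaching_def_edges(stmts):
    """Return (def_index, use_index, variable) triples for straight-line code.

    Without branches, the most recent definition of a variable is the only
    one that reaches a later use; a new definition kills the previous one.
    """
    last_def = {}  # variable -> index of the statement that last defined it
    edges = []
    for i, stmt in enumerate(stmts):
        # Uses are read before the statement's own definitions take effect,
        # so `n = n + 1` depends on the previous definition of n.
        for var in sorted(stmt.uses):
            if var in last_def:
                edges.append((last_def[var], i, var))
        for var in stmt.defs:
            last_def[var] = i
    return edges


program = [
    Stmt("n = get_user_input()", frozenset({"n"}), frozenset()),
    Stmt("size = n * 2", frozenset({"size"}), frozenset({"n"})),
    Stmt("memcpy(buf, src, size)", frozenset(), frozenset({"buf", "src", "size"})),
]

for d, u, var in reaching_def_edges(program):
    print(f"{program[d].code} --REACHING_DEF[{var}]--> {program[u].code}")
```

buf and src produce no edges here because nothing in this snippet defines them.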
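CDG
For example:
```c
if (n > 0) {
    memcpy(buf, src, n);
}
```
Whether memcpy executes depends on n > 0, so there is a control dependence:
```
if (n > 0)
    ↓ CDG
memcpy(buf, src, n)
```
In Joern, such relations are represented by CDG edges. An edge means that the target node is control-dependent on the source node.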
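Control dependence can be derived from the CFG with post-dominators. The sketch below uses the classic definition: node X is control-dependent on node Y if Y has a successor Z such that X post-dominates Z (or is Z), while X does not strictly post-dominate Y. The CFG is a hand-written dictionary for the if example above, and the function names are hypothetical, not Joern APIs. Statements outside any branch get no CDG edge here. The textbook construction adds an extra ENTRY → EXIT edge so that such statements become dependent on ENTRY.

```python
def post_dominators(succ, exit_node):
    """Iteratively compute pdom(n): the set of nodes on every path from n to exit."""
    nodes = set(succ) | {s for targets in succ.values() for s in targets}
    pdom = {n: set(nodes) for n in nodes}  # start from "everything" and shrink
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            succs = succ.get(n, [])
            common = set.intersection(*(pdom[s] for s in succs)) if succs else set()
            new = {n} | common
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def control_dependences(succ, exit_node):
    """Return CDG edges (Y, X), meaning X is control-dependent on Y."""
    pdom = post_dominators(succ, exit_node)
    edges = set()
    for y, targets in succ.items():
        strict = pdom[y] - {y}  # nodes that strictly post-dominate y
        for z in targets:
            for x in pdom[z]:
                if x not in strict:
                    edges.add((y, x))
    return edges


cfg = {
    "ENTRY": ["n > 0"],
    "n > 0": ["memcpy(buf, src, n)", "EXIT"],  # true branch, false branch
    "memcpy(buf, src, n)": ["EXIT"],
}
print(control_dependences(cfg, "EXIT"))  # {('n > 0', 'memcpy(buf, src, n)')}
```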
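Strengths
- Can connect statements that are far apart in the source
- Compresses away irrelevant code
- Well suited to taint propagation and program slicing
Limitations
- Loses the complete execution order
- A PDG represents dependencies, not temporal order. If two statements have neither a data nor a control dependence between them, the PDG may not connect them at all.
- Static analysis introduces approximation errors
- Suppose, for example, that a pointer holds an address returned by a function call, and the static analysis cannot determine that address. It must then conservatively assume that the pointer may point to any of several variables. This produces multiple dependence paths, even though at run time the pointer points to only one of those objects. This problem is known as over-approximation.
- Higher construction cost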
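Data Preprocessing
Whether the graph is fed to a CNN or a GNN, it first has to go through the pipeline below. Once Joern has produced the CPG, three steps remain: filtering nodes and edges, normalizing node text, and converting text to vectors.
```
Source code
↓
Joern parsing
↓
CPG
↓
Filter nodes and edges
↓
Normalize node text
↓
Convert text to vectors
```
We skip the Joern parsing step here. Each sample in the F&Q dataset is already a single standalone function, so we can also skip the step of filtering out the target function.
Node Text Normalization
Variable names are unified as VAR_1, VAR_2, and so on. Normalization must not go too far, however: function names, operators, constants, and the like should be kept.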
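Here is a minimal normalization sketch. It assumes a simple regex tokenizer and treats an identifier immediately followed by "(" as a function or function-like macro name, so memcpy and FFSIGN are kept. The keyword list, the token pattern, and the name `normalize` are hypothetical choices. Note that this heuristic would also rename type names such as GetBitContext, which a real pipeline might want to avoid.

```python
import re

# Hypothetical keyword list; extend it for the C dialect being analyzed.
C_KEYWORDS = {
    "if", "else", "while", "for", "do", "return", "sizeof", "struct",
    "int", "char", "void", "const", "static", "inline", "unsigned", "long",
}
TOKEN_RE = re.compile(
    r'"(?:\\.|[^"\\])*"'                       # string literal
    r"|[A-Za-z_]\w*"                          # identifier or keyword
    r"|0[xX][0-9a-fA-F]+|\d+"                 # integer literal
    r"|>>|<<|==|!=|<=|>=|&&|\|\||->|\+\+|--"  # multi-character operators
    r"|\S"                                    # any other single character
)
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def normalize(code, mapping=None):
    """Replace variable names with VAR_1, VAR_2, ...; keep everything else.

    Keywords, literals, operators, and identifiers directly followed by "("
    (function or macro names) are kept unchanged. Reusing one `mapping` for all
    nodes of a function gives each variable the same placeholder everywhere.
    """
    mapping = {} if mapping is None else mapping
    tokens = TOKEN_RE.findall(code)
    out = []
    for i, tok in enumerate(tokens):
        is_call = i + 1 < len(tokens) and tokens[i + 1] == "("
        if IDENT_RE.fullmatch(tok) and tok not in C_KEYWORDS and not is_call:
            mapping.setdefault(tok, f"VAR_{len(mapping) + 1}")
            out.append(mapping[tok])
        else:
            out.append(tok)
    return out


shared = {}
print(normalize("memcpy(buf, src, n)", shared))
# ['memcpy', '(', 'VAR_1', ',', 'VAR_2', ',', 'VAR_3', ')']
print(normalize("if (n > 0)", shared))
# ['if', '(', 'VAR_3', '>', '0', ')']
```

The second call reuses the shared mapping, so n becomes VAR_3 in both nodes.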
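Converting Text to Vectors
Neural networks cannot process strings directly, so node text must be encoded as numeric vectors. Two encoding methods are popular:
Word2Vec
```
CALL memcpy
↓
Tokenize
↓
["CALL", "memcpy"]
↓
Average or pool the word vectors
↓
x_i ∈ R^128
```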
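The pooling step of the Word2Vec pipeline can be sketched as mean pooling over the token vectors. The 4-dimensional vectors below are made-up toy values standing in for a trained 128-dimensional Word2Vec model, and `mean_pool` is a hypothetical helper. Max pooling would take the per-dimension maximum instead.

```python
def mean_pool(tokens, table):
    """Average the vectors of the tokens found in `table`; unknown tokens are skipped."""
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        raise ValueError("no token has a vector")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy 4-dimensional vectors; a trained Word2Vec model would supply 128-d ones.
toy_table = {
    "CALL": [1.0, 0.0, 0.0, 1.0],
    "memcpy": [0.0, 2.0, 0.0, 1.0],
}
x_i = mean_pool("CALL memcpy".split(), toy_table)
print(x_i)  # [0.5, 1.0, 0.0, 1.0]
```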
CodeBERT
```
CALL memcpy
↓
Tokenizer
↓
CodeBERT
↓
Pooling
↓
x_i ∈ R^768
```
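Feeding into a CNN
The input to an ordinary 1-D CNN is usually a regular tensor X ∈ R^(L × d), where L is the sequence length and d is the dimension of the vector at each position. A natural-language sentence, for example, can be converted as follows:
```
I / like / deep / learning
↓
four word vectors
↓
X ∈ R^(4 × d)
```
A program graph, however, is not a sequence. One node may connect to several others, different functions have different numbers of nodes, a CFG may contain cycles, and the nodes of a PDG have no natural linear order. To use a CNN, we therefore first have to choose a linearization strategy by hand.
A text CNN uses convolution kernels to capture local combination patterns of adjacent tokens, then applies pooling to obtain an overall representation. This architecture was originally used widely for sentence-level classification.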
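Here is a dependency-free sketch of one text-CNN layer on an [L, d] input. It shows the convolution-plus-pooling step concretely: each kernel spans k adjacent positions and all d dimensions, a ReLU is applied, and max pooling over positions keeps one value per kernel. The toy input and kernel weights are illustrative values, not learned parameters. A real model would use a framework's 1-D convolution instead of Python loops.

```python
def conv1d_max_pool(x, kernels, biases):
    """One text-CNN feature per kernel, reduced by max pooling over positions.

    x:       L rows, each a d-dimensional token vector (shape [L, d])
    kernels: each kernel is k rows of d weights (shape [k, d])
    returns: one pooled value per kernel
    """
    features = []
    for w, b in zip(kernels, biases):
        k = len(w)
        if len(x) < k:
            raise ValueError("sequence shorter than kernel")
        activations = []
        for start in range(len(x) - k + 1):  # slide over adjacent tokens
            s = b + sum(
                w_val * x_val
                for w_row, x_row in zip(w, x[start:start + k])
                for w_val, x_val in zip(w_row, x_row)
            )
            activations.append(max(0.0, s))  # ReLU
        features.append(max(activations))    # max pooling over positions
    return features


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # L = 3 tokens, d = 2
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],   # responds to "dim 0, then dim 1"
    [[-1.0, 0.0], [0.0, 0.0]],  # penalizes dim 0 in the first position
]
print(conv1d_max_pool(x, kernels, [0.0, 0.0]))  # [2.0, 0.0]
```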
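Consider an example with the following source code:
```c
void copy_data(char *src, int n) {
    char buf[16];

    if (n > 0) {
        memcpy(buf, src, n);
    }
}
```
The vulnerability: when n > 16, memcpy writes past the end of buf, causing a stack buffer overflow.
This function has three intermediate representations:
AST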
```
METHOD: copy_data
└── BLOCK
    ├── LOCAL: buf[16]
    └── CONTROL_STRUCTURE: if
        ├── CONDITION: n > 0
        │   ├── IDENTIFIER: n
        │   └── LITERAL: 0
        └── CALL: memcpy
            ├── IDENTIFIER: buf
            ├── IDENTIFIER: src
            └── IDENTIFIER: n
```
CFG
```
ENTRY
  ↓
buf[16]
  ↓
n > 0 ?
├── True  → memcpy(buf, src, n) → EXIT
└── False ─────────────────────→ EXIT
```
PDG
```
PARAMETER: n
├── REACHING_DEF → CONDITION: n > 0
└── REACHING_DEF → ARGUMENT: n in memcpy

PARAMETER: src
└── REACHING_DEF → ARGUMENT: src in memcpy

CONDITION: n > 0
└── CDG → CALL: memcpy(buf, src, n)
```
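Feeding the AST into a CNN
A preorder traversal of the AST (the root first, then its children from left to right) yields a linear sequence:
```
METHOD copy_data
BLOCK
LOCAL buf[16]
CONTROL_STRUCTURE if
CALL >
IDENTIFIER n
LITERAL 0
CALL memcpy
IDENTIFIER buf
IDENTIFIER src
IDENTIFIER n
```
This is a sequence of 11 nodes. If each node is embedded as a 128-dimensional vector, the input has shape [11, 128].
This tensor can be fed directly into a CNN for convolution (the convolution itself is not covered here), followed by max or average pooling.
The advantage of this approach is its simplicity, and it can capture dangerous functions and expressions. The drawback is that flattening the AST partially loses the tree structure. For example, the model can no longer tell that buf, src, and n are the arguments of memcpy. One remedy is to insert structural markers into the sequence.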
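The traversal and the marker idea can be sketched on the copy_data AST, written here as nested (label, children) tuples. The <BEGIN>/<END> marker tokens are a hypothetical scheme chosen for illustration. Other designs, such as depth or parent-position features, are also possible.

```python
# Each node is (label, children); the tree mirrors the copy_data AST.
AST = ("METHOD copy_data", [
    ("BLOCK", [
        ("LOCAL buf[16]", []),
        ("CONTROL_STRUCTURE if", [
            ("CALL >", [("IDENTIFIER n", []), ("LITERAL 0", [])]),
            ("CALL memcpy", [
                ("IDENTIFIER buf", []),
                ("IDENTIFIER src", []),
                ("IDENTIFIER n", []),
            ]),
        ]),
    ]),
])


def preorder(node, markers=False):
    """Root first, then children left to right.

    With markers=True, the children of every inner node are wrapped in
    <BEGIN> ... <END> tokens, so the flat sequence still encodes nesting.
    """
    label, children = node
    seq = [label]
    if markers and children:
        seq.append("<BEGIN>")
    for child in children:
        seq.extend(preorder(child, markers))
    if markers and children:
        seq.append("<END>")
    return seq


print(len(preorder(AST)))                # 11 -> [11, 128] after 128-d embedding
print(len(preorder(AST, markers=True)))  # 21
```

With markers, the sequence contains `CALL memcpy <BEGIN> IDENTIFIER buf IDENTIFIER src IDENTIFIER n <END>`, so the argument grouping survives flattening at the cost of a longer input.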
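Feeding the CFG into a CNN
The CFG is a directed graph, so control-flow paths can be extracted with DFS. For the example above, two paths can be extracted:
```
Path 1: ENTRY → buf[16] → n > 0 → memcpy(buf, src, n) → EXIT
Path 2: ENTRY → buf[16] → n > 0 → EXIT
```
Each path becomes a node sequence, so here we get two matrices of shape [5, 128] and [4, 128]. Both are fed into the same CNN (with shared weights), and the results are aggregated at the end.
This method can recognize local execution patterns along the control flow, but it suffers from path explosion. The number of paths can grow exponentially with the number of branches, and loops make it unbounded unless the traversal is limited.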
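Here is a sketch of the path extraction: an iterative DFS that returns every ENTRY-to-EXIT path of a CFG given as an adjacency dictionary. To keep loops finite, a node may appear at most `max_visits` times on one path, and `max_paths` caps the total number of paths. Both limits and the function name are assumptions made for illustration. They are also exactly where the path-explosion trade-off shows up.

```python
def enumerate_paths(cfg, entry, exit_node, max_visits=2, max_paths=100):
    """Return ENTRY-to-EXIT paths of `cfg` (node -> list of successors) in DFS order."""
    paths = []
    stack = [[entry]]
    while stack and len(paths) < max_paths:
        path = stack.pop()
        node = path[-1]
        if node == exit_node:
            paths.append(path)
            continue
        # Push successors in reverse so the first listed successor is explored first.
        for succ in reversed(cfg.get(node, [])):
            # Bound repetitions: with max_visits=2 a simple loop is unrolled at most once.
            if path.count(succ) < max_visits:
                stack.append(path + [succ])
    return paths


copy_data_cfg = {
    "ENTRY": ["buf[16]"],
    "buf[16]": ["n > 0"],
    "n > 0": ["memcpy(buf, src, n)", "EXIT"],  # True branch, then False branch
    "memcpy(buf, src, n)": ["EXIT"],
}
for p in enumerate_paths(copy_data_cfg, "ENTRY", "EXIT"):
    print(len(p), " → ".join(p))
```

This prints the 5-node True path followed by the 4-node False path, matching the two sequence lengths described above.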
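Feeding the PDG into a CNN
The PDG is best suited to program slicing. VulDeePecker introduced the code gadget: a group of code lines that are semantically related but not necessarily contiguous in the source, which is then vectorized for deep learning. SySeVR went further. It emphasized combining vulnerability-related syntactic and semantic information and converted that information into vector representations suitable for deep learning.
Program slicing can be understood as deleting the code unrelated to the vulnerability and keeping only the semantic chain that may affect the critical operation. For the example above, we can take a backward PDG slice from memcpy() and keep only the statements that can affect it.
Building a PDG slice
First, choose the dangerous node; this node is called the sink.
Then trace backward along REACHING_DEF edges:
```
third argument n of memcpy
↑
definition of n: PARAMETER n of copy_data
```
If a caller passed, say, the result of get_user_input() as n, an interprocedural slice would continue into that caller.
Next, add the controlling conditions along CDG edges.
The final slice is:
```
PARAMETER n → n > 0 → memcpy(buf, src, n)
```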
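The two tracing steps can be sketched as a breadth-first walk over reversed PDG edges. The statement-level edge list below is hand-written from the copy_data PDG (it is not Joern output), and `backward_slice` is a hypothetical helper. Restricting `kinds` to REACHING_DEF gives a data-only slice, which drops the n > 0 condition.

```python
from collections import deque

# Statement-level PDG of copy_data: (source, target, edge kind).
copy_data_pdg = [
    ("PARAMETER n", "n > 0", "REACHING_DEF"),
    ("PARAMETER n", "memcpy(buf, src, n)", "REACHING_DEF"),
    ("PARAMETER src", "memcpy(buf, src, n)", "REACHING_DEF"),
    ("n > 0", "memcpy(buf, src, n)", "CDG"),
]
ALL_NODES = ["PARAMETER src", "PARAMETER n", "LOCAL buf[16]", "n > 0", "memcpy(buf, src, n)"]


def backward_slice(edges, sink, kinds=("REACHING_DEF", "CDG")):
    """All nodes that can reach `sink` through edges of the given kinds."""
    preds = {}
    for src, dst, kind in edges:
        if kind in kinds:
            preds.setdefault(dst, []).append(src)
    kept = {sink}
    queue = deque([sink])
    while queue:
        for p in preds.get(queue.popleft(), []):
            if p not in kept:
                kept.add(p)
                queue.append(p)
    return kept


slice_nodes = backward_slice(copy_data_pdg, "memcpy(buf, src, n)")
print([n for n in ALL_NODES if n in slice_nodes])  # keep source order
# ['PARAMETER src', 'PARAMETER n', 'n > 0', 'memcpy(buf, src, n)']
```

Every incoming edge of memcpy is followed, so PARAMETER src is kept as well. The declaration buf[16] has no edges in this simplified graph and is removed.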
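Compared with the AST, the PDG input is usually shorter but semantically denser, which makes it suitable for learning vulnerability propagation patterns.
Feeding into a GNN
A GNN does not need to force the graph into a 1-D sequence, because it takes nodes, edges, node features, and edge types directly as input. Graph convolutional networks operate directly on graph-structured data and learn node representations by aggregating information from local neighborhoods.
Input tensors
For a function graph, define G = (V, E, X),
where V is the node set, E is the edge set, and X ∈ R^(N × d) is the node feature matrix. Suppose the graph has N = 120 nodes and each node feature has dimension d = 160. The node feature matrix then has shape [120, 160], and each feature vector is the concatenation of two parts.
The first part is the text feature, obtained with Word2Vec or CodeBERT, with 128 dimensions. CodeBERT's 768-dimensional output would first have to be projected down to 128; otherwise d becomes 768 + 32 = 800. The second part is the node-type feature, an embedding of the node type, with 32 dimensions.
In PyTorch Geometric, a graph's edges are usually stored as a sparse edge list, edge_index, of shape [2, E]. This is analogous to the row/column part of the coordinate (COO) triplet format for sparse matrices.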
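Here is a dependency-free sketch of turning named edges into the [2, E] edge_index layout plus a per-edge relation id. The node names and relation ids are illustrative assumptions. In PyTorch Geometric, the resulting lists would typically be wrapped with torch.tensor(..., dtype=torch.long) and stored on a torch_geometric.data.Data object together with the [N, d] feature matrix.

```python
def to_edge_index(node_names, edges, relations):
    """Return ([sources, targets], edge_type) for (src, dst, kind) edges."""
    index = {name: i for i, name in enumerate(node_names)}  # node name -> row of X
    sources, targets, edge_type = [], [], []
    for src, dst, kind in edges:
        sources.append(index[src])
        targets.append(index[dst])
        edge_type.append(relations[kind])
    return [sources, targets], edge_type


nodes = ["PARAMETER src", "PARAMETER n", "n > 0", "memcpy(buf, src, n)"]
edges = [
    ("PARAMETER n", "n > 0", "REACHING_DEF"),
    ("PARAMETER n", "memcpy(buf, src, n)", "REACHING_DEF"),
    ("PARAMETER src", "memcpy(buf, src, n)", "REACHING_DEF"),
    ("n > 0", "memcpy(buf, src, n)", "CDG"),
]
edge_index, edge_type = to_edge_index(nodes, edges, {"REACHING_DEF": 0, "CDG": 1})
print(edge_index)  # [[1, 1, 0, 2], [2, 3, 3, 3]]
print(edge_type)   # [0, 0, 0, 1]
```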
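Feeding the AST into a GNN
An AST is usually stored as a directed tree, so by default messages flow from parent to child. If we also want information from the arguments to flow back to the parent (the CALL memcpy node), we can add reverse edges manually.
Feeding the CFG into a GNN
Reverse edges are added in the same way.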
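Adding reverse edges amounts to appending a flipped copy of edge_index. In the sketch below, the reverse edges also get their own relation ids (original id + num_relations), so a relation-aware GNN can still tell child-to-parent messages from parent-to-child ones. That id scheme and the helper name are assumptions. PyTorch Geometric also provides torch_geometric.utils.to_undirected, which adds reverse edges but does not give them a separate relation type.

```python
def add_reverse_edges(edge_index, edge_type, num_relations):
    """Append a flipped copy of every edge with relation id t + num_relations."""
    sources, targets = edge_index
    new_index = [sources + targets, targets + sources]
    new_type = edge_type + [t + num_relations for t in edge_type]
    return new_index, new_type


# AST of the call memcpy(buf, src, n): node 0 = CALL memcpy, nodes 1..3 = arguments.
ast_index = [[0, 0, 0], [1, 2, 3]]  # parent -> child
ast_type = [0, 0, 0]                # relation 0 = AST edge
both_index, both_type = add_reverse_edges(ast_index, ast_type, num_relations=1)
print(both_index)  # [[0, 0, 0, 1, 2, 3], [1, 2, 3, 0, 0, 0]]
print(both_type)   # [0, 0, 0, 1, 1, 1]
```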
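Feeding the PDG into a GNN
Multi-relational R-GCN: a PDG contains several edge types (REACHING_DEF and CDG), so a relational GCN, which learns a separate weight matrix for each edge type, is a natural fit.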
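A relational GCN layer updates each node as h_i' = ReLU(W_0 h_i + Σ_r Σ_{j ∈ N_i^r} (1 / |N_i^r|) W_r h_j). Here N_i^r is the set of neighbors sending messages to node i along relation r, and each relation r has its own matrix W_r. The pure-Python sketch below implements that formula without basis decomposition. The toy features, weights, and relation names are made up. In practice one would use a library layer such as PyTorch Geometric's RGCNConv.

```python
from collections import defaultdict


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def rgcn_layer(h, edges, w_self, w_rel):
    """One R-GCN layer with mean normalization and ReLU.

    h:      list of node feature vectors
    edges:  (source, target, relation) triples; messages flow source -> target
    w_self: self-loop matrix W_0
    w_rel:  dict mapping each relation name to its matrix W_r
    """
    incoming = defaultdict(list)  # (target, relation) -> list of sources
    for src, dst, rel in edges:
        incoming[(dst, rel)].append(src)

    out = []
    for i, h_i in enumerate(h):
        acc = matvec(w_self, h_i)
        for rel, w_r in w_rel.items():
            sources = incoming.get((i, rel), [])
            for j in sources:
                msg = matvec(w_r, h[j])
                # c_{i,r} = len(sources): average over relation-r neighbors
                acc = [a + m / len(sources) for a, m in zip(acc, msg)]
        out.append([max(0.0, a) for a in acc])  # ReLU
    return out


identity = [[1.0, 0.0], [0.0, 1.0]]
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 2, "REACHING_DEF"), (1, 2, "REACHING_DEF"), (0, 1, "CDG")]
w_rel = {"REACHING_DEF": identity, "CDG": [[2.0, 0.0], [0.0, 2.0]]}
print(rgcn_layer(h, edges, identity, w_rel))  # [[1.0, 0.0], [2.0, 1.0], [1.5, 1.5]]
```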
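Example
Consider the source code of 267_1.c:
```c
static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qoffset)
{
    int coeff = dirac_get_se_golomb(gb);
    const int sign = FFSIGN(coeff);
    if (coeff)
        coeff = sign*((sign * coeff * qfactor + qoffset) >> 2);
    return coeff;
}
```
This function contains a signed integer overflow vulnerability. It is located in the subexpression sign * coeff * qfactor of the statement coeff = sign * ((sign * coeff * qfactor + qoffset) >> 2);. An attacker can craft a malicious Dirac video bitstream that makes dirac_get_se_golomb(gb) return an abnormally large coeff. Since sign * coeff equals |coeff| for nonzero coeff, multiplying it by qfactor can then exceed the range of int.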
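To see where the overflow happens, the sketch below evaluates the line-75 expression with exact Python integers and reports the first subexpression that leaves the 32-bit int range. It assumes a 32-bit int and that FFSIGN(a) evaluates to 1 for positive a and -1 otherwise. For the nonzero coeff that reaches this line, any sign function gives the same result. In C, signed overflow is undefined behavior, so this only shows when the overflow would occur, not what the program would compute.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def first_int32_overflow(coeff, qfactor, qoffset):
    """Name of the first line-75 subexpression that leaves the int32 range, or None."""
    sign = 1 if coeff > 0 else -1  # assumed FFSIGN semantics
    t1 = sign * coeff              # |coeff| for nonzero coeff
    t2 = t1 * qfactor
    t3 = t2 + qoffset
    for name, value in [
        ("sign * coeff", t1),
        ("sign * coeff * qfactor", t2),
        ("sign * coeff * qfactor + qoffset", t3),
    ]:
        if not INT32_MIN <= value <= INT32_MAX:
            return name
    return None


print(first_int32_overflow(-5, 4, 1))             # None
print(first_int32_overflow(1 << 20, 1 << 12, 0))  # sign * coeff * qfactor
print(first_int32_overflow(INT32_MIN, 1, 0))      # sign * coeff
```

The last call shows that negating INT32_MIN already overflows in the first factor.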
AST
The full AST is as follows:
```
- 107374182400 [METHOD] order=1 line=65 code=static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qoffset) { int coeff = dirac_get_se_golomb(gb); const int sign = FFSIGN(coeff); if (coeff) coeff = sign...
  - 111669149696 [METHOD_PARAMETER_IN] order=1 line=65 code=GetBitContext *gb
  - 115964116992 [METHOD_PARAMETER_OUT] order=1 line=65 code=GetBitContext *gb
  - 111669149697 [METHOD_PARAMETER_IN] order=2 line=65 code=int qfactor
  - 115964116993 [METHOD_PARAMETER_OUT] order=2 line=65 code=int qfactor
  - 111669149698 [METHOD_PARAMETER_IN] order=3 line=65 code=int qoffset
  - 115964116994 [METHOD_PARAMETER_OUT] order=3 line=65 code=int qoffset
  - 25769803776 [BLOCK] order=4 line=67 code={ int coeff = dirac_get_se_golomb(gb); const int sign = FFSIGN(coeff); if (coeff) coeff = sign*((sign * coeff * qfactor + qoffset) >> 2); return coeff; }
    - 94489280512 [LOCAL] order=1 line=69 code=int coeff
    - 30064771072 [CALL] order=2 line=69 code=coeff = dirac_get_se_golomb(gb)
      - 68719476736 [IDENTIFIER] order=1 line=69 code=coeff
      - 30064771073 [CALL] order=2 line=69 code=dirac_get_se_golomb(gb)
        - 68719476737 [IDENTIFIER] order=1 line=69 code=gb
    - 94489280513 [LOCAL] order=3 line=71 code=const int sign
    - 30064771074 [CALL] order=4 line=71 code=sign = FFSIGN(coeff)
      - 68719476738 [IDENTIFIER] order=1 line=71 code=sign
      - 30064771075 [CALL] order=2 line=71 code=FFSIGN(coeff)
        - 68719476739 [IDENTIFIER] order=1 line=71 code=coeff
    - 47244640256 [CONTROL_STRUCTURE] order=5 line=73 code=if (coeff) coeff = sign*((sign * coeff * qfactor + qoffset) >> 2);
      - 30064771076 [CALL] order=1 line=73 code=coeff != 0
        - 68719476740 [IDENTIFIER] order=1 line=73 code=coeff
        - 90194313216 [LITERAL] order=2 line=73 code=0
      - 25769803777 [BLOCK] order=2 line=75 code=<empty>
        - 30064771077 [CALL] order=1 line=75 code=coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)
          - 68719476741 [IDENTIFIER] order=1 line=75 code=coeff
          - 30064771078 [CALL] order=2 line=75 code=sign*((sign * coeff * qfactor + qoffset) >> 2)
            - 68719476742 [IDENTIFIER] order=1 line=75 code=sign
            - 30064771079 [CALL] order=2 line=75 code=(sign * coeff * qfactor + qoffset) >> 2
              - 30064771080 [CALL] order=1 line=75 code=sign * coeff * qfactor + qoffset
                - 30064771081 [CALL] order=1 line=75 code=sign * coeff * qfactor
                  - 30064771082 [CALL] order=1 line=75 code=sign * coeff
                    - 68719476743 [IDENTIFIER] order=1 line=75 code=sign
                    - 68719476744 [IDENTIFIER] order=2 line=75 code=coeff
                  - 68719476745 [IDENTIFIER] order=2 line=75 code=qfactor
                - 68719476746 [IDENTIFIER] order=2 line=75 code=qoffset
              - 90194313217 [LITERAL] order=2 line=75 code=2
    - 141733920768 [RETURN] order=6 line=77 code=return coeff;
      - 68719476747 [IDENTIFIER] order=1 line=77 code=coeff
  - 128849018880 [MODIFIER] order=5 line=65 code=<empty>
  - 124554051584 [METHOD_RETURN] order=6 line=65 code=RET
```
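Each line can be read as:
```
indentation - node ID [node type] order=position among siblings line=source line code=corresponding code fragment
```
The deeper a line is indented, the deeper the node sits in the tree: each node is a child of the nearest less-indented node above it. If we ignore the long node IDs and look only at the node type and code=, the tree condenses into the following overall structure: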
```
METHOD coeff_unpack_golomb
  METHOD_PARAMETER_IN gb
  METHOD_PARAMETER_OUT gb
  METHOD_PARAMETER_IN qfactor
  METHOD_PARAMETER_OUT qfactor
  METHOD_PARAMETER_IN qoffset
  METHOD_PARAMETER_OUT qoffset
  BLOCK
    LOCAL int coeff
    CALL coeff = dirac_get_se_golomb(gb)
    LOCAL const int sign
    CALL sign = FFSIGN(coeff)
    CONTROL_STRUCTURE if (coeff)
      CALL coeff != 0
      BLOCK
        CALL coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)
    RETURN return coeff
  MODIFIER
  METHOD_RETURN RET
```
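In this structure:
- [METHOD] is the function's root node and corresponds to the whole coeff_unpack_golomb.
- [METHOD_PARAMETER_IN] / [METHOD_PARAMETER_OUT] appear in pairs. Joern's CPG creates them for data-flow modeling; the source does not actually declare the parameters twice.
- [BLOCK] is the function body { ... }.
- [LOCAL] is a local variable declaration, such as int coeff or const int sign.
- [CALL] represents not only ordinary function calls but also operator calls. For example, assignment =, multiplication *, addition +, and right shift >> are all represented as CALL nodes in Joern.
- [CONTROL_STRUCTURE] is a control structure, such as if. Its condition if (coeff) appears in the tree as the CALL coeff != 0.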
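Here is a sketch for condensing such a dump programmatically. It parses each line with a regex matching the `- ID [TYPE] order=N line=N code=TEXT` layout and recovers the parent of every node from the indentation. It assumes each nesting level is indented by a fixed number of spaces (2 here, a hypothetical choice that depends on the script that printed the dump). `parse_ast_dump` is not a Joern API.

```python
import re

NODE_RE = re.compile(
    r"^(?P<indent> *)- (?P<id>\d+) \[(?P<type>[A-Z_]+)\] "
    r"order=(?P<order>\d+) line=(?P<line>\d+) code=(?P<code>.*)$"
)


def parse_ast_dump(text, indent_width=2):
    """Parse an indented AST dump into dicts; `parent` is an index into the list."""
    nodes = []
    ancestors = []  # ancestors[k] = index of the latest node at depth k
    for raw in text.splitlines():
        m = NODE_RE.match(raw)
        if m is None:
            continue  # blank or unrelated line
        depth = len(m["indent"]) // indent_width
        del ancestors[depth:]  # drop entries at this depth or deeper
        nodes.append({
            "id": int(m["id"]),
            "type": m["type"],
            "order": int(m["order"]),
            "line": int(m["line"]),
            "code": m["code"],
            "depth": depth,
            "parent": ancestors[-1] if ancestors else None,
        })
        ancestors.append(len(nodes) - 1)
    return nodes


dump = """\
- 30064771074 [CALL] order=4 line=71 code=sign = FFSIGN(coeff)
  - 68719476738 [IDENTIFIER] order=1 line=71 code=sign
  - 30064771075 [CALL] order=2 line=71 code=FFSIGN(coeff)
    - 68719476739 [IDENTIFIER] order=1 line=71 code=coeff
"""
for node in parse_ast_dump(dump):
    print("  " * node["depth"] + node["type"], node["code"])
```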
CFG
The full output is as follows:
```
CFG nodes:
N01: 107374182400 [METHOD] line=65 order=1 code=static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qoffset) { int coeff = dirac_get_se_golomb(gb); const int sign ...
N02: 68719476736 [IDENTIFIER] line=69 order=1 code=coeff
N03: 68719476737 [IDENTIFIER] line=69 order=1 code=gb
N04: 30064771073 [CALL] line=69 order=2 code=dirac_get_se_golomb(gb)
N05: 30064771072 [CALL] line=69 order=2 code=coeff = dirac_get_se_golomb(gb)
N06: 68719476738 [IDENTIFIER] line=71 order=1 code=sign
N07: 68719476739 [IDENTIFIER] line=71 order=1 code=coeff
N08: 30064771075 [CALL] line=71 order=2 code=FFSIGN(coeff)
N09: 30064771074 [CALL] line=71 order=4 code=sign = FFSIGN(coeff)
N10: 68719476740 [IDENTIFIER] line=73 order=1 code=coeff
N11: 90194313216 [LITERAL] line=73 order=2 code=0
N12: 30064771076 [CALL] line=73 order=1 code=coeff != 0
N13: 68719476741 [IDENTIFIER] line=75 order=1 code=coeff
N14: 68719476747 [IDENTIFIER] line=77 order=1 code=coeff
N15: 68719476742 [IDENTIFIER] line=75 order=1 code=sign
N16: 141733920768 [RETURN] line=77 order=6 code=return coeff;
N17: 68719476743 [IDENTIFIER] line=75 order=1 code=sign
N18: 124554051584 [METHOD_RETURN] line=65 order=6 code=RET
N19: 68719476744 [IDENTIFIER] line=75 order=2 code=coeff
N20: 30064771082 [CALL] line=75 order=1 code=sign * coeff
N21: 68719476745 [IDENTIFIER] line=75 order=2 code=qfactor
N22: 30064771081 [CALL] line=75 order=1 code=sign * coeff * qfactor
N23: 68719476746 [IDENTIFIER] line=75 order=2 code=qoffset
N24: 30064771080 [CALL] line=75 order=1 code=sign * coeff * qfactor + qoffset
N25: 90194313217 [LITERAL] line=75 order=2 code=2
N26: 30064771079 [CALL] line=75 order=2 code=(sign * coeff * qfactor + qoffset) >> 2
N27: 30064771078 [CALL] line=75 order=2 code=sign*((sign * coeff * qfactor + qoffset) >> 2)
N28: 30064771077 [CALL] line=75 order=1 code=coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)

CFG edges:
N01 -> N02: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> coeff
N02 -> N03: coeff ==> gb
N03 -> N04: gb ==> dirac_get_se_golomb(gb)
N04 -> N05: dirac_get_se_golomb(gb) ==> coeff = dirac_get_se_golomb(gb)
N05 -> N06: coeff = dirac_get_se_golomb(gb) ==> sign
N06 -> N07: sign ==> coeff
N07 -> N08: coeff ==> FFSIGN(coeff)
N08 -> N09: FFSIGN(coeff) ==> sign = FFSIGN(coeff)
N09 -> N10: sign = FFSIGN(coeff) ==> coeff
N10 -> N11: coeff ==> 0
N11 -> N12: 0 ==> coeff != 0
N12 -> N13: coeff != 0 ==> coeff
N12 -> N14: coeff != 0 ==> coeff
N13 -> N15: coeff ==> sign
N14 -> N16: coeff ==> return coeff;
N15 -> N17: sign ==> sign
N16 -> N18: return coeff; ==> RET
N17 -> N19: sign ==> coeff
N19 -> N20: coeff ==> sign * coeff
N20 -> N21: sign * coeff ==> qfactor
N21 -> N22: qfactor ==> sign * coeff * qfactor
N22 -> N23: sign * coeff * qfactor ==> qoffset
N23 -> N24: qoffset ==> sign * coeff * qfactor + qoffset
N24 -> N25: sign * coeff * qfactor + qoffset ==> 2
N25 -> N26: 2 ==> (sign * coeff * qfactor + qoffset) >> 2
N26 -> N27: (sign * coeff * qfactor + qoffset) >> 2 ==> sign*((sign * coeff * qfactor + qoffset) >> 2)
N27 -> N28: sign*((sign * coeff * qfactor + qoffset) >> 2) ==> coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)
N28 -> N14: coeff = sign*((sign * coeff * qfactor + qoffset) >> 2) ==> coeff
```
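The flat edge list makes it easy to find branch and merge points mechanically. A node with more than one CFG successor is a branch, and a node with more than one predecessor is a merge. The sketch below counts degrees with a regex over `Nxx -> Nyy:` pairs, and `branch_and_merge_nodes` is a hypothetical helper. The excerpt contains the edges around the if statement, copied from the CFG edge list. On the full edge list, the same function should report N12 (coeff != 0) as the only branch and N14 (the coeff read by return coeff) as the only merge.

```python
import re
from collections import Counter

EDGE_RE = re.compile(r"\b(N\d+) -> (N\d+):")


def branch_and_merge_nodes(dump):
    """Nodes with several CFG successors (branches) or predecessors (merges)."""
    out_deg, in_deg = Counter(), Counter()
    for src, dst in EDGE_RE.findall(dump):
        out_deg[src] += 1
        in_deg[dst] += 1
    branches = sorted(n for n, d in out_deg.items() if d > 1)
    merges = sorted(n for n, d in in_deg.items() if d > 1)
    return branches, merges


excerpt = """
N11 -> N12: 0 ==> coeff != 0
N12 -> N13: coeff != 0 ==> coeff
N12 -> N14: coeff != 0 ==> coeff
N14 -> N16: coeff ==> return coeff;
N27 -> N28: sign*((sign * coeff * qfactor + qoffset) >> 2) ==> coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)
N28 -> N14: coeff = sign*((sign * coeff * qfactor + qoffset) >> 2) ==> coeff
"""
print(branch_and_merge_nodes(excerpt))  # (['N12'], ['N14'])
```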
The corresponding simplified flowchart:
```mermaid
flowchart TB
accTitle: coeff_unpack_golomb CFG
accDescr: Simplified control flow for coeff_unpack_golomb, showing the condition branch and the merge at the return statement.
entry([entry: coeff_unpack_golomb]) --> read_coeff["coeff = dirac_get_se_golomb(gb)"]
read_coeff --> read_sign["sign = FFSIGN(coeff)"]
read_sign --> test{"coeff != 0?"}
test -->|true| scale_coeff["coeff = sign * ((sign * coeff * qfactor + qoffset) >> 2)"]
test -->|false| return_coeff["return coeff"]
scale_coeff --> return_coeff
return_coeff --> exit_node([RET])
classDef terminal fill:#f3f4f6,stroke:#6b7280,stroke-width:2px,color:#1f2937
classDef process fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e3a5f
classDef decision fill:#fef9c3,stroke:#ca8a04,stroke-width:2px,color:#713f12
class entry,exit_node terminal
class read_coeff,read_sign,scale_coeff,return_coeff process
class test decision
```
| Simplified node | Original Joern CFG nodes |
| --- | --- |
| entry | METHOD node 107374182400 |
| read_coeff | line 69 identifier/function-call/assignment nodes 68719476736, 68719476737, 30064771073, 30064771072 |
| read_sign | line 71 identifier/function-call/assignment nodes 68719476738, 68719476739, 30064771075, 30064771074 |
| test | line 73 condition nodes 68719476740, 90194313216, 30064771076 |
| scale_coeff | line 75 expression and assignment nodes, from 68719476741 to 30064771077 in CFG order |
| return_coeff | line 77 nodes 68719476747, 141733920768 |
| exit_node | METHOD_RETURN node 124554051584 |
PDG
The raw output is as follows:
```
PDG nodes:
N01: 107374182400 [METHOD] line=65 order=1 code=static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qoffset) { int coeff = dirac_get_se_golomb(gb); const int sign ...
N02: 111669149696 [METHOD_PARAMETER_IN] line=65 order=1 code=GetBitContext *gb
N03: 115964116992 [METHOD_PARAMETER_OUT] line=65 order=1 code=GetBitContext *gb
N04: 111669149697 [METHOD_PARAMETER_IN] line=65 order=2 code=int qfactor
N05: 115964116993 [METHOD_PARAMETER_OUT] line=65 order=2 code=int qfactor
N06: 111669149698 [METHOD_PARAMETER_IN] line=65 order=3 code=int qoffset
N07: 115964116994 [METHOD_PARAMETER_OUT] line=65 order=3 code=int qoffset
N08: 124554051584 [METHOD_RETURN] line=65 order=6 code=RET
N09: 68719476736 [IDENTIFIER] line=69 order=1 code=coeff
N10: 68719476737 [IDENTIFIER] line=69 order=1 code=gb
N11: 30064771072 [CALL] line=69 order=2 code=coeff = dirac_get_se_golomb(gb)
N12: 30064771073 [CALL] line=69 order=2 code=dirac_get_se_golomb(gb)
N13: 68719476738 [IDENTIFIER] line=71 order=1 code=sign
N14: 68719476739 [IDENTIFIER] line=71 order=1 code=coeff
N15: 30064771075 [CALL] line=71 order=2 code=FFSIGN(coeff)
N16: 30064771074 [CALL] line=71 order=4 code=sign = FFSIGN(coeff)
N17: 30064771076 [CALL] line=73 order=1 code=coeff != 0
N18: 68719476740 [IDENTIFIER] line=73 order=1 code=coeff
N19: 90194313216 [LITERAL] line=73 order=2 code=0
N20: 30064771077 [CALL] line=75 order=1 code=coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)
N21: 30064771080 [CALL] line=75 order=1 code=sign * coeff * qfactor + qoffset
N22: 30064771081 [CALL] line=75 order=1 code=sign * coeff * qfactor
N23: 30064771082 [CALL] line=75 order=1 code=sign * coeff
N24: 68719476741 [IDENTIFIER] line=75 order=1 code=coeff
N25: 68719476742 [IDENTIFIER] line=75 order=1 code=sign
N26: 68719476743 [IDENTIFIER] line=75 order=1 code=sign
N27: 30064771078 [CALL] line=75 order=2 code=sign*((sign * coeff * qfactor + qoffset) >> 2)
N28: 30064771079 [CALL] line=75 order=2 code=(sign * coeff * qfactor + qoffset) >> 2
N29: 68719476744 [IDENTIFIER] line=75 order=2 code=coeff
N30: 68719476745 [IDENTIFIER] line=75 order=2 code=qfactor
N31: 68719476746 [IDENTIFIER] line=75 order=2 code=qoffset
N32: 90194313217 [LITERAL] line=75 order=2 code=2
N33: 68719476747 [IDENTIFIER] line=77 order=1 code=coeff
N34: 141733920768 [RETURN] line=77 order=6 code=return coeff;

PDG edges:
N17 -> N20 [CDG]: coeff != 0 ==> coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)
N17 -> N21 [CDG]: coeff != 0 ==> sign * coeff * qfactor + qoffset
N17 -> N22 [CDG]: coeff != 0 ==> sign * coeff * qfactor
N17 -> N23 [CDG]: coeff != 0 ==> sign * coeff
N17 -> N24 [CDG]: coeff != 0 ==> coeff
N17 -> N25 [CDG]: coeff != 0 ==> sign
N17 -> N26 [CDG]: coeff != 0 ==> sign
N17 -> N27 [CDG]: coeff != 0 ==> sign*((sign * coeff * qfactor + qoffset) >> 2)
N17 -> N28 [CDG]: coeff != 0 ==> (sign * coeff * qfactor + qoffset) >> 2
N17 -> N29 [CDG]: coeff != 0 ==> coeff
N17 -> N30 [CDG]: coeff != 0 ==> qfactor
N17 -> N31 [CDG]: coeff != 0 ==> qoffset
N17 -> N32 [CDG]: coeff != 0 ==> 2
N01 -> N02 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> GetBitContext *gb
N01 -> N04 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> int qfactor
N01 -> N06 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> int qoffset
N01 -> N10 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> gb
N01 -> N14 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> coeff
N01 -> N18 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> coeff
N01 -> N19 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> 0
N01 -> N25 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> sign
N01 -> N26 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> sign
N01 -> N29 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> coeff
N01 -> N30 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> qfactor
N01 -> N31 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> qoffset
N01 -> N32 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> 2
N01 -> N33 [REACHING_DEF]: static inline int coeff_unpack_golomb(GetBitContext *gb, int qfactor, int qof... ==> coeff
N02 -> N03 [REACHING_DEF property=gb]: GetBitContext *gb ==> GetBitContext *gb
N02 -> N10 [REACHING_DEF property=gb]: GetBitContext *gb ==> gb
N04 -> N05 [REACHING_DEF property=qfactor]: int qfactor ==> int qfactor
N04 -> N05 [REACHING_DEF property=qfactor]: int qfactor ==> int qfactor
N04 -> N08 [REACHING_DEF property=qfactor]: int qfactor ==> RET
N04 -> N30 [REACHING_DEF property=qfactor]: int qfactor ==> qfactor
N06 -> N07 [REACHING_DEF property=qoffset]: int qoffset ==> int qoffset
N06 -> N07 [REACHING_DEF property=qoffset]: int qoffset ==> int qoffset
N06 -> N08 [REACHING_DEF property=qoffset]: int qoffset ==> RET
N06 -> N31 [REACHING_DEF property=qoffset]: int qoffset ==> qoffset
N09 -> N11 [REACHING_DEF property=coeff]: coeff ==> coeff = dirac_get_se_golomb(gb)
N09 -> N14 [REACHING_DEF property=coeff]: coeff ==> coeff
N10 -> N03 [REACHING_DEF property=gb]: gb ==> GetBitContext *gb
N10 -> N08 [REACHING_DEF property=gb]: gb ==> RET
N10 -> N12 [REACHING_DEF property=gb]: gb ==> dirac_get_se_golomb(gb)
N11 -> N08 [REACHING_DEF property=coeff = dirac_get_se_golomb(gb)]: coeff = dirac_get_se_golomb(gb) ==> RET
N12 -> N08 [REACHING_DEF property=dirac_get_se_golomb(gb)]: dirac_get_se_golomb(gb) ==> RET
N12 -> N09 [REACHING_DEF property=dirac_get_se_golomb(gb)]: dirac_get_se_golomb(gb) ==> coeff
N12 -> N11 [REACHING_DEF property=dirac_get_se_golomb(gb)]: dirac_get_se_golomb(gb) ==> coeff = dirac_get_se_golomb(gb)
N13 -> N08 [REACHING_DEF property=sign]: sign ==> RET
N13 -> N16 [REACHING_DEF property=sign]: sign ==> sign = FFSIGN(coeff)
N13 -> N26 [REACHING_DEF property=sign]: sign ==> sign
N14 -> N15 [REACHING_DEF property=coeff]: coeff ==> FFSIGN(coeff)
N14 -> N18 [REACHING_DEF property=coeff]: coeff ==> coeff
N15 -> N08 [REACHING_DEF property=FFSIGN(coeff)]: FFSIGN(coeff) ==> RET
N15 -> N13 [REACHING_DEF property=FFSIGN(coeff)]: FFSIGN(coeff) ==> sign
N15 -> N16 [REACHING_DEF property=FFSIGN(coeff)]: FFSIGN(coeff) ==> sign = FFSIGN(coeff)
N16 -> N08 [REACHING_DEF property=sign = FFSIGN(coeff)]: sign = FFSIGN(coeff) ==> RET
N17 -> N08 [REACHING_DEF property=coeff != 0]: coeff != 0 ==> RET
N18 -> N08 [REACHING_DEF property=coeff]: coeff ==> RET
N18 -> N17 [REACHING_DEF property=coeff]: coeff ==> coeff != 0
N18 -> N29 [REACHING_DEF property=coeff]: coeff ==> coeff
N18 -> N33 [REACHING_DEF property=coeff]: coeff ==> coeff
N19 -> N17 [REACHING_DEF property=0]: 0 ==> coeff != 0
N19 -> N18 [REACHING_DEF property=0]: 0 ==> coeff
N20 -> N08 [REACHING_DEF property=coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)]: coeff = sign*((sign * coeff * qfactor + qoffset) >> 2) ==> RET
N21 -> N08 [REACHING_DEF property=sign * coeff * qfactor + qoffset]: sign * coeff * qfactor + qoffset ==> RET
N21 -> N28 [REACHING_DEF property=sign * coeff * qfactor + qoffset]: sign * coeff * qfactor + qoffset ==> (sign * coeff * qfactor + qoffset) >> 2
N22 -> N08 [REACHING_DEF property=sign * coeff * qfactor]: sign * coeff * qfactor ==> RET
N22 -> N21 [REACHING_DEF property=sign * coeff * qfactor]: sign * coeff * qfactor ==> sign * coeff * qfactor + qoffset
N23 -> N08 [REACHING_DEF property=sign * coeff]: sign * coeff ==> RET
N23 -> N22 [REACHING_DEF property=sign * coeff]: sign * coeff ==> sign * coeff * qfactor
N23 -> N30 [REACHING_DEF property=sign * coeff]: sign * coeff ==> qfactor
N24 -> N08 [REACHING_DEF property=coeff]: coeff ==> RET
N24 -> N20 [REACHING_DEF property=coeff]: coeff ==> coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)
N24 -> N33 [REACHING_DEF property=coeff]: coeff ==> coeff
N25 -> N08 [REACHING_DEF property=sign]: sign ==> RET
N25 -> N27 [REACHING_DEF property=sign]: sign ==> sign*((sign * coeff * qfactor + qoffset) >> 2)
N25 -> N28 [REACHING_DEF property=sign]: sign ==> (sign * coeff * qfactor + qoffset) >> 2
N26 -> N23 [REACHING_DEF property=sign]: sign ==> sign * coeff
N26 -> N25 [REACHING_DEF property=sign]: sign ==> sign
N26 -> N29 [REACHING_DEF property=sign]: sign ==> coeff
N27 -> N08 [REACHING_DEF property=sign*((sign * coeff * qfactor + qoffset) >> 2)]: sign*((sign * coeff * qfactor + qoffset) >> 2) ==> RET
N27 -> N20 [REACHING_DEF property=sign*((sign * coeff * qfactor + qoffset) >> 2)]: sign*((sign * coeff * qfactor + qoffset) >> 2) ==> coeff = sign*((sign * coeff * qfactor + qoffset) >> 2)
N27 -> N24 [REACHING_DEF property=sign*((sign * coeff * qfactor + qoffset) >> 2)]: sign*((sign * coeff * qfactor + qoffset) >> 2) ==> coeff
N28 -> N08 [REACHING_DEF property=(sign * coeff * qfactor + qoffset) >> 2]: (sign * coeff * qfactor + qoffset) >> 2 ==> RET
N28 -> N25 [REACHING_DEF property=(sign * coeff * qfactor + qoffset) >> 2]: (sign * coeff * qfactor + qoffset) >> 2 ==> sign
N28 -> N27 [REACHING_DEF property=(sign * coeff * qfactor + qoffset) >> 2]: (sign * coeff * qfactor + qoffset) >> 2 ==> sign*((sign * coeff * qfactor + qoffset) >> 2)
N29 -> N23 [REACHING_DEF property=coeff]: coeff ==> sign * coeff
N29 -> N26 [REACHING_DEF property=coeff]: coeff ==> sign
N30 -> N05 [REACHING_DEF property=qfactor]: qfactor ==> int qfactor
N30 -> N08 [REACHING_DEF property=qfactor]: qfactor ==> RET
N30 -> N22 [REACHING_DEF property=qfactor]: qfactor ==> sign * coeff * qfactor
N30 -> N23 [REACHING_DEF property=qfactor]: qfactor ==> sign * coeff
N31 -> N07 [REACHING_DEF property=qoffset]: qoffset ==> int qoffset
N31 -> N08 [REACHING_DEF property=qoffset]: qoffset ==> RET
N31 -> N21 [REACHING_DEF property=qoffset]: qoffset ==> sign * coeff * qfactor + qoffset
N32 -> N21 [REACHING_DEF property=2]: 2 ==> sign * coeff * qfactor + qoffset
N32 -> N28 [REACHING_DEF property=2]: 2 ==> (sign * coeff * qfactor + qoffset) >> 2
N33 -> N34 [REACHING_DEF property=coeff]: coeff ==> return coeff;
N34 -> N08 [REACHING_DEF property=<RET>]: return coeff; ==> RET
```
The corresponding visualization:
```mermaid
flowchart LR
accTitle: coeff_unpack_golomb PDG
accDescr: Simplified program-dependence graph showing data dependencies from parameters and assignments, plus the control dependency from the if condition to the guarded assignment.
gb_param([gb parameter])
qfactor_param([qfactor parameter])
qoffset_param([qoffset parameter])
read_coeff["coeff = dirac_get_se_golomb(gb)"]
read_sign["sign = FFSIGN(coeff)"]
test{"coeff != 0?"}
scale_coeff["coeff = sign * ((sign * coeff * qfactor + qoffset) >> 2)"]
return_coeff["return coeff"]
exit_node([RET])
gb_param -->|"data: gb"| read_coeff
read_coeff -->|"data: coeff"| read_sign
read_coeff -->|"data: coeff"| test
read_coeff -->|"data: coeff"| scale_coeff
read_sign -->|"data: sign"| scale_coeff
qfactor_param -->|"data: qfactor"| scale_coeff
qoffset_param -->|"data: qoffset"| scale_coeff
test -.->|"control"| scale_coeff
read_coeff -->|"data: coeff if false"| return_coeff
scale_coeff -->|"data: coeff if true"| return_coeff
return_coeff -->|"return value"| exit_node
classDef param fill:#f3f4f6,stroke:#6b7280,stroke-width:2px,color:#1f2937
classDef process fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e3a5f
classDef decision fill:#fef9c3,stroke:#ca8a04,stroke-width:2px,color:#713f12
classDef exit fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#14532d
class gb_param,qfactor_param,qoffset_param param
class read_coeff,read_sign,scale_coeff,return_coeff process
class test decision
class exit_node exit
```
| Dependency type | Meaning in this function |
| --- | --- |
| REACHING_DEF | A value definition can reach a later use, such as coeff reaching FFSIGN(coeff) or return coeff |
| CDG | A node is executed only under a control condition; here the line 75 assignment is controlled by coeff != 0 |
| Simplified dependency | Original Joern relation |
| --- | --- |
| gb parameter -> coeff = dirac_get_se_golomb(gb) | REACHING_DEF through the gb identifier and call node |
| coeff = dirac_get_se_golomb(gb) -> sign = FFSIGN(coeff) | REACHING_DEF for coeff |
| coeff = dirac_get_se_golomb(gb) -> coeff != 0 | REACHING_DEF for coeff |
| coeff != 0 -> line 75 assignment | CDG from condition node 30064771076 to line 75 expression nodes |
| sign, coeff, qfactor, qoffset -> line 75 assignment | REACHING_DEF into the scale expression |
| initial or updated coeff -> return coeff | REACHING_DEF into the return value / method return |