Ghidra P-code Explained

Ghidra

Decompile

Published: 2021-07-25

Updated: 2021-08-30

Pcode

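P-code is a register transfer language designed for reverse engineering applications. The language is general enough to model the behavior of many different processors. Modeling processors this way places their analysis in a common framework, which makes it easier to develop retargetable analysis algorithms and applications.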

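Fundamentally, p-code works by translating each processor instruction into a sequence of p-code operations that take parts of the processor state as input and output variables (varnodes). The set of distinct p-code operations (distinguished by opcode) forms a fairly compact set of the arithmetic and logical actions performed by general-purpose processors. The direct translation of instructions into these operations is called raw p-code. Raw p-code can be used to emulate instruction execution directly and generally follows the same control flow as the instructions, although it may add some internal control flow of its own.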

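P-code is designed specifically to facilitate the construction of data-flow graphs for follow-on analysis of disassembled instructions. Varnodes and p-code operators can be thought of explicitly as nodes in these graphs. Generating raw p-code is a necessary first step in graph construction, but additional steps are required, and these introduce some new opcodes. Two of them, MULTIEQUAL and INDIRECT, are specific to the graph construction process; other opcodes can be introduced during later analysis and transformation of the graph and help preserve recovered data-type relationships. None of the new opcodes occurs in the original raw p-code translation. Finally, a few p-code operators, CALL, CALLIND and RETURN, may have their input and output varnodes changed during analysis so that they no longer match their raw p-code form.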

The core concepts of p-code are:

Address Space

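An address space in p-code is a generalization of RAM. It is defined simply as an indexed sequence of bytes that p-code operations can read and write. For a given byte, the unique index that labels it is the byte's address. An address space has a name that identifies it, a size that gives the number of distinct indices in the space, and an endianness that specifies how integers and other multi-byte values are encoded in the space. A typical processor has a ram space, which models the memory accessible through its main data bus, and a register space, which models the processor's general-purpose registers. Any data a processor manipulates must live in some address space. A processor specification is free to define as many address spaces as it needs. There is always one special address space, the constant address space, which encodes any constant values needed by p-code operations. Systems that generate p-code usually also use a dedicated temporary space, which can be viewed as a bottomless source of temporary registers. These hold intermediate values while an instruction's behavior is being modeled.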

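The p-code specification allows the addressable unit of an address space to be larger than one byte. Each address space has a wordsize attribute that can be set to the number of bytes in a unit. A wordsize greater than 1 makes little difference to how p-code is represented: all offsets into an address space are still represented internally as byte offsets. The only exceptions are the LOAD and STORE p-code operations. These operations read a pointer offset that must be scaled properly to obtain the right byte offset when the pointer is dereferenced. The wordsize attribute has no effect on any other p-code operation.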
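The wordsize scaling described above can be illustrated with a minimal Python sketch. The `AddressSpace` class, the `pointer_to_byte_offset` helper and the example spaces are hypothetical names made up for this illustration; they are not Ghidra's API, and the sketch only assumes that scaling means multiplying the pointer by the wordsize.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressSpace:
    """Illustrative model of an address space (not Ghidra's own class)."""
    name: str
    size: int            # number of distinct indices in the space
    big_endian: bool     # how multi-byte values are encoded
    wordsize: int = 1    # bytes per addressable unit


def pointer_to_byte_offset(space: AddressSpace, pointer: int) -> int:
    """Scale a pointer read by LOAD/STORE into a byte offset.

    Assumption of this sketch: scaling is a plain multiplication by wordsize.
    """
    return pointer * space.wordsize


# A byte-addressed space and a hypothetical word-addressed space.
ram = AddressSpace("ram", size=2**32, big_endian=False)
code = AddressSpace("code", size=2**16, big_endian=True, wordsize=2)

byte_addressed = pointer_to_byte_offset(ram, 0x10)    # unchanged: 0x10
word_addressed = pointer_to_byte_offset(code, 0x10)   # scaled:    0x20
```

Every other operation in this model would use byte offsets directly, which mirrors the statement that wordsize only matters for LOAD and STORE.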

In short, an address space is an addressable region that p-code can read and write. Each address space has three attributes:

name
size
endianness

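In the p-code model of a particular processor, the address spaces together hold all of the state that characterizes that processor (for example RAM, registers, stack…).

Address spaces common to different processors typically include the following:

ram: an abstraction of the memory accessible through the data bus
register: an abstraction of the processor's general-purpose registers
unique: an abstraction of temporary registers, which hold intermediate values
stack: an abstraction of the memory accessible through the stack pointer
constant: a special space that p-code uses to encode constants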

Varnode

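A varnode is a generalization of either a register or a memory location. It is represented by a formal triple: an address space, an offset into that space, and a size. Intuitively, a varnode is a contiguous sequence of bytes in some address space that can be treated as a single value. All manipulation of data by p-code operations happens on varnodes.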

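Varnodes by themselves are just contiguous chunks of bytes, identified by an address and a size, and they have no type. P-code operations, however, can force one of three type interpretations on a varnode: integer, boolean, or floating-point.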

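Operations that manipulate integers always interpret a varnode as a two's-complement encoding, using the endianness associated with the address space that contains the varnode.
A varnode used as a boolean is assumed to be a single byte that can only take the value 0, meaning false, or 1, meaning true.
Floating-point operations use the encoding expected by the processor being modeled, which depends on the size of the varnode. For most processors these encodings are described by the IEEE 754 standard, but other encodings are possible in principle.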
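To make the three interpretations concrete, here is a small Python sketch that reads the same kind of untyped byte sequence in each of the three ways. The helper names are invented for this example, and the floating-point helper assumes IEEE 754 encodings of 4 or 8 bytes, which the text notes is typical but not guaranteed.

```python
import struct


def as_unsigned(data: bytes, big_endian: bool) -> int:
    """Integer view without sign, using the space's endianness."""
    return int.from_bytes(data, "big" if big_endian else "little")


def as_signed(data: bytes, big_endian: bool) -> int:
    """Integer view as two's complement, using the space's endianness."""
    return int.from_bytes(data, "big" if big_endian else "little", signed=True)


def as_bool(data: bytes) -> bool:
    """Boolean view: a single byte that must be 0 (false) or 1 (true)."""
    if len(data) != 1 or data[0] not in (0, 1):
        raise ValueError("a boolean varnode is one byte holding 0 or 1")
    return data[0] == 1


def as_float(data: bytes, big_endian: bool) -> float:
    """Floating-point view, assuming IEEE 754 single or double precision."""
    fmt = {4: "f", 8: "d"}[len(data)]
    return struct.unpack((">" if big_endian else "<") + fmt, data)[0]


# The same four little-endian bytes, viewed as an integer and as a float.
raw = bytes.fromhex("0000803f")
int_view = as_unsigned(raw, big_endian=False)    # 0x3F800000
float_view = as_float(raw, big_endian=False)     # 1.0

# Two's complement only matters when the operation treats the value as signed.
minus_two = as_signed(bytes.fromhex("feffffff"), big_endian=False)
```

The bytes never change; only the operation that consumes them decides which view applies.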

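If a varnode is specified as an offset into the constant address space, that offset is interpreted as a constant, or immediate value, in any p-code operation that uses the varnode. In this case the size of the varnode can be treated as the size or precision available for encoding the constant. As with other varnodes, a constant has only the type forced on it by the p-code operations that use it.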

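In short, a varnode is defined as a sequence of bytes in an address space. It is the variable that p-code operates on and has three attributes (address space, offset, size); the address space and the offset together form the varnode's address.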
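The triple can be written down directly as a small data type. The following Python sketch is only an illustration: the `Varnode` class, its `overlaps` method and the register offsets are assumptions made for the example and do not reflect any particular processor specification or Ghidra's own classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Varnode:
    """Illustrative (address space, offset, size) triple."""
    space: str    # name of the address space
    offset: int   # byte offset in the space, or the value for constants
    size: int     # number of bytes

    @property
    def is_constant(self) -> bool:
        # In the constant space the offset *is* the value.
        return self.space == "constant"

    def overlaps(self, other: "Varnode") -> bool:
        """True if both varnodes cover at least one common byte."""
        if self.is_constant or other.is_constant:
            return False
        return (
            self.space == other.space
            and self.offset < other.offset + other.size
            and other.offset < self.offset + self.size
        )


# Hypothetical register layout: a 4-byte register and its low 2 bytes.
eax = Varnode("register", 0x0, 4)
ax = Varnode("register", 0x0, 2)
ebx = Varnode("register", 0xC, 4)
five = Varnode("constant", 5, 4)   # the immediate value 5, encoded in 4 bytes
```

Because a varnode is just a byte range, two varnodes such as `eax` and `ax` can name overlapping storage, while `five` names no storage at all.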

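Varnodes by themselves do not necessarily have a data type associated with them. The decompiler eventually assigns a formal data type, but at the lowest level of p-code, varnodes inherit one of the following building-block data types from the p-code operations that act on them: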

Integer
Boolean
Floating Point

P-code Operation

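A p-code operation is the analog of a machine instruction. All p-code operations share the same basic internal format: each takes one or more varnodes as input and optionally produces a single output varnode. What the operation does is determined by its opcode. For almost all p-code operations, only the output varnode can have its value modified; the operation has no indirect effects. The only possible exceptions are pseudo operations, which are sometimes needed when an instruction's behavior is not fully known.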
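The common format (an opcode, input varnodes, an optional output varnode) lends itself to a tiny interpreter. The sketch below is a deliberately simplified model, not Ghidra's emulator: state is keyed by whole varnodes, so overlapping byte ranges are ignored, only three opcodes are handled, and all names and offsets are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Varnode = Tuple[str, int, int]  # (space name, offset, size in bytes)


@dataclass(frozen=True)
class PcodeOp:
    opcode: str
    inputs: Tuple[Varnode, ...]
    output: Optional[Varnode] = None


def read(state: Dict[Varnode, int], vn: Varnode) -> int:
    """Read a varnode; in the constant space the offset is the value."""
    space, offset, size = vn
    if space == "constant":
        return offset & ((1 << (8 * size)) - 1)
    return state.get(vn, 0)


def execute(state: Dict[Varnode, int], op: PcodeOp) -> None:
    """Run one operation; only the output varnode is modified."""
    args = [read(state, vn) for vn in op.inputs]
    if op.opcode == "COPY":
        result = args[0]
    elif op.opcode == "INT_ADD":
        result = args[0] + args[1]
    elif op.opcode == "INT_EQUAL":
        result = int(args[0] == args[1])
    else:
        raise NotImplementedError(op.opcode)
    if op.output is None:
        raise ValueError("these opcodes require an output varnode")
    # Truncate the result to the size of the output varnode.
    state[op.output] = result & ((1 << (8 * op.output[2])) - 1)


r0 = ("register", 0x0, 4)      # hypothetical register offsets
r1 = ("register", 0x4, 4)
flag = ("unique", 0x100, 1)    # temporary holding a boolean

state = {r0: 7}
execute(state, PcodeOp("INT_ADD", (r0, ("constant", 1, 4)), r1))
execute(state, PcodeOp("INT_EQUAL", (r1, ("constant", 8, 4)), flag))
```

After the two operations, `r1` holds 8 and `flag` holds 1, while the input `r0` is untouched, which is the "no indirect effects" property in miniature.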

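Every p-code operation is associated with the address of the original processor instruction it was translated from. Within a single instruction, a counter starting at zero enumerates the p-code operations produced by its translation. The address and the counter, taken as a pair, form the p-code operation's unique sequence number. Control flow of p-code operations generally follows sequence-number order. When all the p-code for one instruction has executed, and the instruction has fall-through semantics, p-code control flow continues with the first p-code operation of the instruction at the fall-through address. Similarly, if a p-code operation causes a control-flow branch, execution continues with the first p-code operation at the destination address.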
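The sequence-number rule can be sketched in a few lines of Python. The `program` table below is a made-up translation (addresses, instruction lengths and opcode lists are all hypothetical); the point is only how the (address, counter) pair advances.

```python
from typing import Dict, List, Tuple

SeqNum = Tuple[int, int]  # (instruction address, counter within instruction)

# Hypothetical translation: address -> (instruction length, opcode names).
program: Dict[int, Tuple[int, List[str]]] = {
    0x1000: (2, ["INT_ADD", "INT_EQUAL"]),
    0x1002: (3, ["COPY"]),
}


def next_seqnum(prog: Dict[int, Tuple[int, List[str]]], seq: SeqNum) -> SeqNum:
    """Sequence number executed next when no branch is taken."""
    addr, counter = seq
    length, ops = prog[addr]
    if counter + 1 < len(ops):
        return (addr, counter + 1)   # next op of the same instruction
    return (addr + length, 0)        # fall through to the next instruction


def branch_target(target_addr: int) -> SeqNum:
    """A branch lands on the first p-code op at the destination address."""
    return (target_addr, 0)


step1 = next_seqnum(program, (0x1000, 0))   # stays inside 0x1000
step2 = next_seqnum(program, step1)         # falls through to 0x1002
```

This ignores the internal control flow that a raw translation may add within one instruction; it models only the ordinary in-order and fall-through cases.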

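The list of possible opcodes resembles many RISC-based instruction sets. The effect of each opcode is described in detail in the sections below, and a reference table is given in the section titled "Syntax Reference". In general, the size or precision of a particular p-code operation is determined by the size of its input or output varnodes, not by the opcode.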
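As a quick illustration of size being a property of the varnodes rather than the opcode, the following Python sketch models integer addition with the varnode size passed in explicitly. The `int_add` function is a stand-in written for this example, not part of any Ghidra API.

```python
def int_add(a: int, b: int, size: int) -> int:
    """Model of integer addition where `size` is the varnode size in bytes."""
    mask = (1 << (8 * size)) - 1
    return (a + b) & mask


# Same opcode, same input values, different varnode sizes.
one_byte = int_add(0xFF, 0x01, size=1)   # wraps around to 0x00
two_byte = int_add(0xFF, 0x01, size=2)   # fits: 0x0100
```

There is no separate 8-bit or 16-bit add opcode in this model; the width comes entirely from the operands.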

P-Code Operation Reference

Data Moving

Branching

Extension/Truncation

Integer Comparison

Integer Arithmetic

Overflow Test

Logical

Integer Shift

Boolean

Floating Point Comparison

Floating Point Arithmetic

Floating Point Conversion

Pseudo P-CODE Operations

Additional P-CODE Operations

Syntax Reference

Name	Syntax	Description
COPY	v0 = v1;	Copy v1 into v0.
LOAD	* v1, *[spc]v1, *:2 v1, *[spc]:2 v1	Dereference v1 as pointer into default space. Optionally specify a space to load from and size of data in bytes.
STORE	*v0 = v1;, *[spc]v0 = v1;, *:4 v0 = v1;, *[spc]:4 v0 = v1;	Store v1 in default space using v0 as pointer. Optionally specify space to store in and size of data in bytes.
BRANCH	goto v0;	Branch execution to address of v0.
CBRANCH	if (v0) goto v1;	Branch execution to address of v1 if v0 equals 1 (true).
BRANCHIND	goto [v0];	Branch execution to value in v0 viewed as an offset into the current space.
CALL	call v0;	Branch execution to address of v0. Hint that the branch is a subroutine call.
CALLIND	call [v0];	Branch execution to value in v0 viewed as an offset into the current space. Hint that the branch is a subroutine call.
RETURN	return [v0];	Branch execution to value in v0 viewed as an offset into the current space. Hint that the branch is a subroutine return.
INT_EQUAL	v0 == v1	True if v0 equals v1.
INT_NOTEQUAL	v0 != v1	True if v0 does not equal v1.
INT_SLESS	v0 s< v1, v1 s> v0	True if v0 is less than v1 as a signed integer.
INT_SLESSEQUAL	v0 s<= v1, v1 s>= v0	True if v0 is less than or equal to v1 as a signed integer.
INT_LESS	v0 < v1, v1 > v0	True if v0 is less than v1 as an unsigned integer.
INT_LESSEQUAL	v0 <= v1, v1 >= v0	True if v0 is less than or equal to v1 as an unsigned integer.
INT_ZEXT	zext(v0)	Zero extension of v0.
INT_SEXT	sext(v0)	Sign extension of v0.
INT_ADD	v0 + v1	Addition of v0 and v1 as integers.
INT_SUB	v0 - v1	Subtraction of v1 from v0 as integers.
INT_CARRY	carry(v0,v1)	True if adding v0 and v1 would produce an unsigned carry.
INT_SCARRY	scarry(v0,v1)	True if adding v0 and v1 would produce a signed carry.
INT_SBORROW	sborrow(v0,v1)	True if subtracting v1 from v0 would produce a signed borrow.
INT_2COMP	-v0	Twos complement of v0.
INT_NEGATE	~v0	Bitwise negation of v0.
INT_XOR	v0 ^ v1	Bitwise Exclusive Or of v0 with v1.
INT_AND	v0 & v1	Bitwise Logical And of v0 with v1.
INT_OR	v0 | v1	Bitwise Logical Or of v0 with v1.
INT_LEFT	v0 << v1	Left shift of v0 by v1 bits.
INT_RIGHT	v0 >> v1	Unsigned (logical) right shift of v0 by v1 bits.
INT_SRIGHT	v0 s>> v1	Signed (arithmetic) right shift of v0 by v1 bits.
INT_MULT	v0 * v1	Integer multiplication of v0 and v1.
INT_DIV	v0 / v1	Unsigned division of v0 by v1.
INT_SDIV	v0 s/ v1	Signed division of v0 by v1.
INT_REM	v0 % v1	Unsigned remainder of v0 modulo v1.
INT_SREM	v0 s% v1	Signed remainder of v0 modulo v1.
BOOL_NEGATE	!v0	Negation of boolean value v0.
BOOL_XOR	v0 ^^ v1	Exclusive-Or of booleans v0 and v1.
BOOL_AND	v0 && v1	Logical-And of booleans v0 and v1.
BOOL_OR	v0 || v1	Logical-Or of booleans v0 and v1.
FLOAT_EQUAL	v0 f== v1	True if v0 equals v1 viewed as floating-point numbers.
FLOAT_NOTEQUAL	v0 f!= v1	True if v0 does not equal v1 viewed as floating-point numbers.
FLOAT_LESS	v0 f< v1, v1 f> v0	True if v0 is less than v1 viewed as floating-point numbers.
FLOAT_LESSEQUAL	v0 f<= v1, v1 f>= v0	True if v0 is less than or equal to v1 viewed as floating-point numbers.
FLOAT_NAN	nan(v0)	True if v0 is not a valid floating-point number (NaN).
FLOAT_ADD	v0 f+ v1	Addition of v0 and v1 as floating-point numbers.
FLOAT_DIV	v0 f/ v1	Division of v0 by v1 as floating-point numbers.
FLOAT_MULT	v0 f* v1	Multiplication of v0 and v1 as floating-point numbers.
FLOAT_SUB	v0 f- v1	Subtraction of v1 from v0 as floating-point numbers.
FLOAT_NEG	f- v0	Additive inverse of v0 as a floating-point number.
FLOAT_ABS	abs(v0)	Absolute value of v0 as a floating-point number.
FLOAT_SQRT	sqrt(v0)	Square root of v0 as a floating-point number.
INT2FLOAT	int2float(v0)	Floating-point representation of v0 viewed as an integer.
FLOAT2FLOAT	float2float(v0)	Copy of floating-point number v0 with more or less precision.
TRUNC	trunc(v0)	Signed integer obtained by truncating v0 viewed as a floating-point number.
FLOAT_CEIL	ceil(v0)	Nearest integral floating-point value greater than v0, viewed as a floating-point number.
FLOAT_FLOOR	floor(v0)	Nearest integral floating-point value less than v0, viewed as a floating-point number.
FLOAT_ROUND	round(v0)	Nearest integral floating-point to v0, viewed as a floating-point number.
SUBPIECE	v0:2	The least significant n bytes of v0.
SUBPIECE	v0(2)	All but the least significant n bytes of v0.
PIECE		Concatenate two varnodes into a single varnode.
CPOOLREF	cpool(v0,…)	Obtain constant pool value.
NEW	newobject(v0) newobject(v0,v1)	Allocate an object or an array of objects.
MULTIEQUAL		Compiler phi-node: values merging from multiple control-flow paths.
INDIRECT		Indirect effect from input varnode to output varnode.
CAST		Copy from input to output. A hint that the underlying datatype has changed.
PTRADD		Construct a pointer to an element from a pointer to the start of an array and an index.
PTRSUB		Construct a pointer to a field from a pointer to a structure and an offset.
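A few of the less familiar rows in the table (the overflow tests, sign extension and SUBPIECE) are easier to follow with numbers. The Python sketch below models them on plain integers with an explicit size in bytes; the function names mirror the opcodes, but the implementations are this article's illustration of the descriptions in the table, not code taken from Ghidra.

```python
def mask(size: int) -> int:
    """All-ones bit mask for a value of `size` bytes."""
    return (1 << (8 * size)) - 1


def to_signed(value: int, size: int) -> int:
    """Interpret the low `size` bytes of value as two's complement."""
    sign_bit = 1 << (8 * size - 1)
    value &= mask(size)
    return value - ((value & sign_bit) << 1)


def fits_signed(value: int, size: int) -> bool:
    """True if value is representable as a signed integer of `size` bytes."""
    limit = 1 << (8 * size - 1)
    return -limit <= value < limit


def int_carry(a: int, b: int, size: int) -> bool:
    """INT_CARRY: unsigned addition does not fit in `size` bytes."""
    return (a & mask(size)) + (b & mask(size)) > mask(size)


def int_scarry(a: int, b: int, size: int) -> bool:
    """INT_SCARRY: signed addition overflows."""
    return not fits_signed(to_signed(a, size) + to_signed(b, size), size)


def int_sborrow(a: int, b: int, size: int) -> bool:
    """INT_SBORROW: signed subtraction a - b overflows."""
    return not fits_signed(to_signed(a, size) - to_signed(b, size), size)


def int_sext(value: int, in_size: int, out_size: int) -> int:
    """INT_SEXT: sign-extend from in_size bytes to out_size bytes."""
    return to_signed(value, in_size) & mask(out_size)


def subpiece(value: int, drop_bytes: int, out_size: int) -> int:
    """SUBPIECE: drop low bytes, then keep out_size bytes."""
    return (value >> (8 * drop_bytes)) & mask(out_size)


low_half = subpiece(0x12345678, 0, 2)    # like v0:2
high_half = subpiece(0x12345678, 2, 2)   # like v0(2)
extended = int_sext(0x80, 1, 2)
```

For one-byte operands, `0xFF + 0x01` sets the unsigned carry but not the signed one, while `0x7F + 0x01` does the opposite, which is why the two tests are separate opcodes.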

杰克成

https://jackhcc.github.io/posts/ghidra-pcode.html

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit the source, 杰克成, when reposting!
