forked from eden-emu/eden
		
	
		
			
	
	
		
			582 lines
		
	
	
	
		
			27 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
		
		
			
		
	
	
			582 lines
		
	
	
	
		
			27 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
|  | # Introduction
 | ||
|  | 
 | ||
|  | Given a minidump file, the Breakpad processor produces stack traces that include | ||
|  | function names and source locations. However, minidump files contain only the | ||
|  | byte-by-byte contents of threads' registers and stacks, without function names | ||
|  | or machine-code-to-source mapping data. The processor consults Breakpad symbol | ||
|  | files for the information it needs to produce human-readable stack traces from | ||
|  | the binary-only minidump file. | ||
|  | 
 | ||
|  | The platform-specific symbol dumping tools parse the debugging information the | ||
|  | compiler provides (whether as DWARF or STABS sections in an ELF file or as | ||
|  | stand-alone PDB files), and write that information back out in the Breakpad | ||
|  | symbol file format. This format is much simpler and less detailed than compiler | ||
|  | debugging information, and values legibility over compactness. | ||
|  | 
 | ||
|  | # Overview
 | ||
|  | 
 | ||
|  | Breakpad symbol files are ASCII text files, with lines delimited as appropriate | ||
|  | for the host platform. Each line is a _record_, divided into fields by single | ||
|  | spaces; in some cases, the last field of the record can contain spaces. The | ||
|  | first field is a string indicating what sort of record the line represents | ||
|  | (except for line records; these are very common, making them the default saves | ||
|  | space). Some fields hold decimal or hexadecimal numbers; hexadecimal numbers | ||
|  | have no "0x" prefix, and use lower-case letters. | ||
|  | 
 | ||
|  | Breakpad symbol files contain the following record types. With some | ||
|  | restrictions, these may appear in any order. | ||
|  | 
 | ||
|  | *   A `MODULE` record describes the executable file or shared library from which | ||
|  |     this data was derived, for use by symbol suppliers. A `MODULE' record should | ||
|  |     be the first record in the file. | ||
|  | 
 | ||
|  | *   A `FILE` record gives a source file name, and assigns it a number by which | ||
|  |     other records can refer to it. | ||
|  | 
 | ||
|  | *   An `INLINE_ORIGIN` record holds an inline function name for `INLINE` records | ||
|  |     to refer to. | ||
|  | 
 | ||
|  | *   A `FUNC` record describes a function present in the source code. | ||
|  | 
 | ||
|  | *   An `INLINE` record describes the inline function's nest level, call site | ||
|  |     line and call site source file to which the given ranges of machine code | ||
|  |     should be attributed. | ||
|  | 
 | ||
|  | *   A line record indicates to which source file and line a given range of | ||
|  |     machine code should be attributed. The line is attributed to the function | ||
|  |     defined by the most recent `FUNC` record. | ||
|  | 
 | ||
|  | *   A `PUBLIC` record gives the address of a linker symbol. | ||
|  | 
 | ||
|  | *   A `STACK` record provides information necessary to produce stack traces. | ||
|  | 
 | ||
|  | # `MODULE` records
 | ||
|  | 
 | ||
|  | A `MODULE` record provides meta-information about the module the symbol file | ||
|  | describes. It has the form: | ||
|  | 
 | ||
|  | > `MODULE` _operatingsystem_ _architecture_ _id_ _name_
 | ||
|  | 
 | ||
|  | For example: `MODULE Linux x86 D3096ED481217FD4C16B29CD9BC208BA0 firefox-bin | ||
|  | ` These records provide meta-information about the executable or shared library | ||
|  | from which this symbol file was generated. A symbol supplier might use this | ||
|  | information to find the correct symbol files to use to interpret a given | ||
|  | minidump, or to perform other sorts of validation. If present, a `MODULE` record | ||
|  | should be the first line in the file. | ||
|  | 
 | ||
|  | The fields are separated by spaces, and cannot contain spaces themselves, except | ||
|  | for _name_. | ||
|  | 
 | ||
|  | *   The _operatingsystem_ field names the operating system on which the | ||
|  |     executable or shared library was intended to run. This field should have one | ||
|  |     of the following values: | ||
|  | 
 | ||
|  |     | **Value** | **Meaning** | | ||
|  |     |:----------|:--------------------| | ||
|  |     | Linux | Linux | | ||
|  |     | mac | Macintosh OSX | | ||
|  |     | windows | Microsoft Windows | | ||
|  | 
 | ||
|  | *   The _architecture_ field indicates what processor architecture the | ||
|  |     executable or shared library contains machine code for. This field should | ||
|  |     have one of the following values: | ||
|  | 
 | ||
|  |     | **Value** | **Instruction Set Architecture** | | ||
|  |     |:----------|:---------------------------------| | ||
|  |     | x86 | Intel IA-32 | | ||
|  |     | x86\_64 | AMD64/Intel 64 | | ||
|  |     | ppc | 32-bit PowerPC | | ||
|  |     | ppc64 | 64-bit PowerPC | | ||
|  |     | unknown | unknown | | ||
|  | 
 | ||
|  | *   The _id_ field is a sequence of hexadecimal digits that identifies the exact | ||
|  |     executable or library whose contents the symbol file describes. The way in | ||
|  |     which it is computed varies from platform to platform. | ||
|  | 
 | ||
|  | *   The _name_ field contains the base name (the final component of the | ||
|  |     directory path) of the executable or library. It may contain spaces, and | ||
|  |     extends to the end of the line. | ||
|  | 
 | ||
|  | # `FILE` records
 | ||
|  | 
 | ||
|  | A `FILE` record holds a source file name for other records to refer to. It has | ||
|  | the form: | ||
|  | 
 | ||
|  | > `FILE` _number_ _name_
 | ||
|  | 
 | ||
|  | For example: `FILE 2 /home/jimb/mc/in/browser/app/nsBrowserApp.cpp | ||
|  | ` | ||
|  | 
 | ||
|  | A `FILE` record provides the name of a source file, and assigns it a number | ||
|  | which other records (line records, in particular) can use to refer to that file | ||
|  | name. The _number_ field is a decimal number. The _name_ field is the name of | ||
|  | the file; it may contain spaces. | ||
|  | 
 | ||
|  | # `INLINE_ORIGIN` records
 | ||
|  | 
 | ||
|  | An `INLINE_ORIGIN` record holds an inline function name for `INLINE` records to | ||
|  | refer to. It has the form: | ||
|  | 
 | ||
|  | > `INLINE_ORIGIN` _number_ _name_
 | ||
|  | 
 | ||
|  | For example: `INLINE_ORIGIN 2 nsQueryInterfaceWithError::operator()(nsID const&, | ||
|  | void**) const | ||
|  | ` | ||
|  | 
 | ||
|  | An `INLINE_ORIGIN` record provides the name of an inline function, and assigns | ||
|  | it a number which other records (`INLINE` records, in particular) can use to | ||
|  | refer to that function name. The _number_ field is a decimal number. The _name_ | ||
|  | field is the name of the inline function; it may contain spaces. | ||
|  | 
 | ||
|  | # `FUNC` records
 | ||
|  | 
 | ||
|  | A `FUNC` record describes a source-language function. It has the form: | ||
|  | 
 | ||
|  | > `FUNC` _[m]_ _address_ _size_ _parameter\_size_ _name_
 | ||
|  | 
 | ||
|  | For example: `FUNC m c184 30 0 nsQueryInterfaceWithError::operator()(nsID const&, | ||
|  | void**) const | ||
|  | ` | ||
|  | 
 | ||
|  | The _m_ field is optional. If present it indicates that multiple symbols | ||
|  | reference this function's instructions. (In which case, only one symbol name is | ||
|  | mentioned within the breakpad file.) Multiple symbols referencing the same | ||
|  | instructions may occur due to identical code folding by the linker. | ||
|  | 
 | ||
|  | The _address_ and _size_ fields are hexadecimal numbers indicating the start | ||
|  | address and length in bytes of the machine code instructions the function | ||
|  | occupies. (Breakpad symbol files cannot accurately describe functions whose code | ||
|  | is not contiguous.) The start address is relative to the module's load address. | ||
|  | 
 | ||
|  | The _parameter\_size_ field is a hexadecimal number indicating the size, in | ||
|  | bytes, of the arguments pushed on the stack for this function. Some calling | ||
|  | conventions, like the Microsoft Windows `stdcall` convention, require the called | ||
|  | function to pop parameters passed to it on the stack from its caller before | ||
|  | returning. The stack walker uses this value, along with data from `STACK` | ||
|  | records, to step from the called function's frame to the caller's frame. | ||
|  | 
 | ||
|  | The _name_ field is the name of the function. In languages that use linker | ||
|  | symbol name mangling like C++, this should be the source language name (the | ||
|  | "unmangled" form). This field may contain spaces. | ||
|  | 
 | ||
|  | # `INLINE` records
 | ||
|  | 
 | ||
|  | An `INLINE` record describes the inline function's nest level, call site line | ||
|  | and call site source file to which the given ranges of machine code should be | ||
|  | attributed. It has the form: | ||
|  | 
 | ||
|  | > `INLINE` _inline_nest_level_ _call_site_line_ _call_site_file_num_
 | ||
|  | > _origin_num_ [_address_ _size_]+
 | ||
|  | 
 | ||
|  | For example: `INLINE 0 10 3 4 d30 2a fa1 b | ||
|  | ` | ||
|  | 
 | ||
|  | The _inline_nest_level_ field is a decimal number that means it's inlined at the | ||
|  | function described by a previous `INLINE` record which has _inline_nest_level_ | ||
|  | one less than its. In the example below, first and third `INLINE` records have | ||
|  | _inline_nest_level_ 0, which means they are inlined inside the function | ||
|  | described by the `FUNC` record. The second `INLINE` record has | ||
|  | _inline_nest_level_ 1 means that it's inlined at the inline function described | ||
|  | by first `INLINE` record. | ||
|  | ``` | ||
|  | FUNC ... | ||
|  | INLINE 0 ... | ||
|  | INLINE 1 ... | ||
|  | INLINE 0 ... | ||
|  | ``` | ||
|  | 
 | ||
|  | The _call_site_line_ and _call_site_file_num_ fields are decimal numbers | ||
|  | indicating where this inline function being called at. | ||
|  | 
 | ||
|  | The _origin_num_ field refers to an `INLINE_ORIGIN` record that has the name | ||
|  | of the inline function. | ||
|  | 
 | ||
|  | The _address_ and _size_ fields are hexadecimal numbers indicating the start | ||
|  | address and length in bytes of the machine code. The address is relative to the | ||
|  | module's load address. There could be more than one [_address_ _size_] range | ||
|  | pair, since inline functions could have discontinuous address ranges. The ranges | ||
|  | of an `INLINE` record are always inside the ranges described by its parent | ||
|  | record (a `FUNC` record or an `INLINE` record). | ||
|  | 
 | ||
|  | The `INLINE` record is assumed to belong to the function described by the last | ||
|  | preceding `FUNC` record. `INLINE` records may not appear before the first `FUNC` | ||
|  | record. | ||
|  | 
 | ||
|  | # Line records
 | ||
|  | 
 | ||
|  | A line record describes the source file and line number to which a given range | ||
|  | of machine code should be attributed. It has the form: | ||
|  | 
 | ||
|  | > _address_ _size_ _line_ _filenum_
 | ||
|  | 
 | ||
|  | For example: `c184 7 59 4 | ||
|  | ` | ||
|  | 
 | ||
|  | Because they are so common, line records do not begin with a string indicating | ||
|  | the record type. All other record types' names use upper-case letters; | ||
|  | hexadecimal numbers, like a line record's _address_, use lower-case letters. | ||
|  | 
 | ||
|  | The _address_ and _size_ fields are hexadecimal numbers indicating the start | ||
|  | address and length in bytes of the machine code. The address is relative to the | ||
|  | module's load address. | ||
|  | 
 | ||
|  | The _line_ field is the line number to which the machine code should be | ||
|  | attributed, in decimal; the first line of the source file is line number 1. The | ||
|  | _filenum_ field is a decimal number appearing in a prior `FILE` record; the name | ||
|  | given in that record is the source file name for the machine code. | ||
|  | 
 | ||
|  | The line is assumed to belong to the function described by the last preceding | ||
|  | `FUNC` record. Line records may not appear before the first `FUNC' record. | ||
|  | 
 | ||
|  | No two line records in a symbol file cover the same range of addresses. However, | ||
|  | there may be many line records with identical line and file numbers, as a given | ||
|  | source line may contribute many non-contiguous blocks of machine code. | ||
|  | 
 | ||
|  | # `PUBLIC` records
 | ||
|  | 
 | ||
|  | A `PUBLIC` record describes a publicly visible linker symbol, such as that used | ||
|  | to identify an assembly language entry point or region of memory. It has the | ||
|  | form: | ||
|  | 
 | ||
|  | > PUBLIC _[m]_ _address_ _parameter\_size_ _name_
 | ||
|  | 
 | ||
|  | For example: `PUBLIC m 2160 0 Public2_1 | ||
|  | ` | ||
|  | 
 | ||
|  | The Breakpad processor essentially treats a `PUBLIC` record as defining a | ||
|  | function with no line number data and an indeterminate size: the code extends to | ||
|  | the next address mentioned. If a given address is covered by both a `PUBLIC` | ||
|  | record and a `FUNC` record, the processor uses the `FUNC` data. | ||
|  | 
 | ||
|  | The _m_ field is optional. If present it indicates that multiple symbols | ||
|  | reference this function's instructions. (In which case, only one symbol name is | ||
|  | mentioned within the breakpad file.) Multiple symbols referencing the same | ||
|  | instructions may occur due to identical code folding by the linker. | ||
|  | 
 | ||
|  | The _address_ field is a hexadecimal number indicating the symbol's address, | ||
|  | relative to the module's load address. | ||
|  | 
 | ||
|  | The _parameter\_size_ field is a hexadecimal number indicating the size of the | ||
|  | parameters passed to the code whose entry point the symbol marks, if known. This | ||
|  | field has the same meaning as the _parameter\_size_ field of a `FUNC` record; | ||
|  | see that description for more details. | ||
|  | 
 | ||
|  | The _name_ field is the name of the symbol. In languages that use linker symbol | ||
|  | name mangling like C++, this should be the source language name (the "unmangled" | ||
|  | form). This field may contain spaces. | ||
|  | 
 | ||
|  | # `STACK WIN` records
 | ||
|  | 
 | ||
|  | Given a stack frame, a `STACK WIN` record indicates how to find the frame that | ||
|  | called it. It has the form: | ||
|  | 
 | ||
|  | > STACK WIN _type_ _rva_ _code\_size_ _prologue\_size_ _epilogue\_size_
 | ||
|  | > _parameter\_size_ _saved\_register\_size_ _local\_size_ _max\_stack\_size_
 | ||
|  | > _has\_program\_string_ _program\_string\_OR\_allocates\_base\_pointer_
 | ||
|  | 
 | ||
|  | For example: `STACK WIN 4 2170 14 1 0 0 0 0 0 1 $eip 4 + ^ = $esp $ebp 8 + = | ||
|  | $ebp $ebp ^ = | ||
|  | ` | ||
|  | 
 | ||
|  | All fields of a `STACK WIN` record, except for the last, are hexadecimal | ||
|  | numbers. | ||
|  | 
 | ||
|  | The _type_ field indicates what sort of stack frame data this record holds. Its | ||
|  | value should be one of the values of the | ||
|  | [StackFrameTypeEnum](http://msdn.microsoft.com/en-us/library/bc5207xw%28VS.100%29.aspx) | ||
|  | type in Microsoft's | ||
|  | [Debug Interface Access (DIA)](http://msdn.microsoft.com/en-us/library/x93ctkx8%28VS.100%29.aspx) API. | ||
|  | Breakpad uses only records of type 4 (`FrameTypeFrameData`) and 0 | ||
|  | (`FrameTypeFPO`); it ignores others. These types differ only in whether the last | ||
|  | field is an _allocates\_base\_pointer_ flag (`FrameTypeFPO`) or a program string | ||
|  | (`FrameTypeFrameData`). If more than one record covers a given address, Breakpad | ||
|  | prefers `FrameTypeFrameData` records over `FrameTypeFPO` records. | ||
|  | 
 | ||
|  | The _rva_ and _code\_size_ fields give the starting address and length in bytes | ||
|  | of the machine code covered by this record. The starting address is relative to | ||
|  | the module's load address. | ||
|  | 
 | ||
|  | The _prologue\_size_ and _epilogue\_size_ fields give the length, in bytes, of | ||
|  | the prologue and epilogue machine code within the record's range. Breakpad does | ||
|  | not use these values. | ||
|  | 
 | ||
|  | The _parameter\_size_ field gives the number of argument bytes this function | ||
|  | expects to have been passed. This field has the same meaning as the | ||
|  | _parameter\_size_ field of a `FUNC` record; see that description for more | ||
|  | details. | ||
|  | 
 | ||
|  | The _saved\_register\_size_ field gives the number of bytes in the stack frame | ||
|  | dedicated to preserving the values of any callee-saves registers used by this | ||
|  | function. | ||
|  | 
 | ||
|  | The _local\_size_ field gives the number of bytes in the stack frame dedicated | ||
|  | to holding the function's local variables and temporary values. | ||
|  | 
 | ||
|  | The _max\_stack\_size_ field gives the maximum number of bytes pushed on the | ||
|  | stack in the frame. Breakpad does not use this value. | ||
|  | 
 | ||
|  | If the _has\_program\_string_ field is zero, then the `STACK WIN` record's final | ||
|  | field is an _allocates\_base\_pointer_ flag, as a hexadecimal number; this is | ||
|  | expected for records whose _type_ is 0. Otherwise, the final field is a program | ||
|  | string. | ||
|  | 
 | ||
|  | ## Interpreting a `STACK WIN` record
 | ||
|  | 
 | ||
|  | Given the register values for a frame F, we can find the calling frame as | ||
|  | follows: | ||
|  | 
 | ||
|  | *   If the _has\_program\_string_ field of a `STACK WIN` record is zero, then | ||
|  |     the final field is _allocates\_base\_pointer_, a flag indicating whether the | ||
|  |     frame uses the frame pointer register, `%ebp`, as a general-purpose | ||
|  |     register. | ||
|  |     *   If _allocates\_base\_pointer_ is true, then `%ebp` does not point to the | ||
|  |         frame's base address. Instead, | ||
|  |         *   Let _next\_parameter\_size_ be the parameter size of the function | ||
|  |             frame F called (**not** this record's _parameter\_size_ field), or | ||
|  |             zero if F is the youngest frame on the stack. You must find this | ||
|  |             value in F's callee's `FUNC`, `STACK WIN`, or `PUBLIC` records. | ||
|  |         *   Let _frame\_size_ be the sum of the _local\_size_ field, the | ||
|  |             _saved\_register\_size_ field, and _next\_parameter\_size_. > > With | ||
|  |             those definitions in place, we can recover the calling frame as | ||
|  |             follows: | ||
|  |         *   F's return address is at `%esp +`_frame\_size_, | ||
|  |         *   the caller's value of `%ebp` is saved at `%esp | ||
|  |             +`_next\_parameter\_size_`+`_saved\_register\_size_`- 8`, and | ||
|  |         *   the caller's value of `%esp` just before the call instruction was | ||
|  |             `%esp +`_frame\_size_`+ 4`. > > (Why do we include | ||
|  |             _next\_parameter\_size_ in the sum when computing _frame\_size_ and | ||
|  |             the address of the saved `%ebp`? When a function A has called a | ||
|  |             function B, the arguments that A pushed for B are considered part of | ||
|  |             A's stack frame: A's value for `%esp` points at the last argument | ||
|  |             pushed for B. Thus, we must include the size of those arguments | ||
|  |             (given by the debugging info for B) along with the size of A's | ||
|  |             register save area and local variable area (given by the debugging | ||
|  |             info for A) when computing the overall size of A's frame.) | ||
|  |     *   If _allocates\_base\_pointer_ is false, then F's function doesn't use | ||
|  |         `%ebp` at all. You may recover the calling frame as above, except that | ||
|  |         the caller's value of `%ebp` is the same as F's value for `%ebp`, so no | ||
|  |         steps are necessary to recover it. | ||
|  | *   If the _has\_program\_string_ field of a `STACK WIN` record is not zero, | ||
|  |     then the record's final field is a string containing a program to be | ||
|  |     interpreted to recover the caller's frame. The comments in the | ||
|  |     [postfix\_evaluator.h](../src/processor/postfix_evaluator.h#40) | ||
|  |     header file explain the language in which the program is written. You should | ||
|  |     place the following variables in the dictionary before interpreting the | ||
|  |     program: | ||
|  |     *   `$ebp` and `$esp` should be the values of the `%ebp` and `%esp` | ||
|  |         registers in F. | ||
|  |     *   `.cbParams`, `.cbSavedRegs`, and `.cbLocals`, should be the values of | ||
|  |         the `STACK WIN` record's _parameter\_size_, _saved\_register\_size_, and | ||
|  |         _local\_size_ fields. | ||
|  |     *   `.raSearchStart` should be set to the address on the stack to begin | ||
|  |         scanning for a return address, if necessary. The Breakpad processor sets | ||
|  |         this to the value of `%esp` in F, plus the _frame\_size_ value mentioned | ||
|  |         above. | ||
|  | 
 | ||
|  | > If the program stores values for `$eip`, `$esp`, `$ebp`, `$ebx`, `$esi`, or
 | ||
|  | > `$edi`, then those are the values of the given registers in the caller. If the
 | ||
|  | > value of `$eip` is zero, that indicates that the end of the stack has been
 | ||
|  | > reached.
 | ||
|  | 
 | ||
|  | The Breakpad processor checks that the value yielded by the above for the | ||
|  | calling frame's instruction address refers to known code; if the address seems | ||
|  | to be bogus, then it uses a heuristic search to find F's return address and | ||
|  | stack base. | ||
|  | 
 | ||
|  | # `STACK CFI` records
 | ||
|  | 
 | ||
|  | `STACK CFI` ("Call Frame Information") records describe how to walk the stack | ||
|  | when execution is at a given machine instruction. These records take one of two | ||
|  | forms: | ||
|  | 
 | ||
|  | > `STACK CFI INIT` _address_ _size_ _register<sub>1</sub>_:
 | ||
|  | > _expression<sub>1</sub>_ _register<sub>2</sub>_: _expression<sub>2</sub>_ ...
 | ||
|  | > | ||
|  | > `STACK CFI` _address_ _register<sub>1</sub>_: _expression<sub>1</sub>_
 | ||
|  | > _register<sub>2</sub>_: _expression<sub>2</sub>_ ...
 | ||
|  | 
 | ||
|  | For example: | ||
|  | 
 | ||
|  | ``` | ||
|  | STACK CFI INIT 804c4b0 40 .cfa: $esp 4 + $eip: .cfa 4 - ^ | ||
|  | STACK CFI 804c4b1 .cfa: $esp 8 + $ebp: .cfa 8 - ^ | ||
|  | ``` | ||
|  | 
 | ||
|  | The _address_ and _size_ fields are hexadecimal numbers. Each | ||
|  | _register_<sub>i</sub> is the name of a register or pseudoregister. Each | ||
|  | _expression_ is a Breakpad postfix expression, which may contain spaces, but | ||
|  | never ends with a colon. (The appropriate register names for a given | ||
|  | architecture are determined when `STACK CFI` records are first enabled for that | ||
|  | architecture, and should be documented in the appropriate | ||
|  | `stackwalker_`_architecture_`.cc` source file.) | ||
|  | 
 | ||
|  | STACK CFI records describe, at each machine instruction in a given function, how | ||
|  | to recover the values the machine registers had in the function's caller. | ||
|  | Naturally, some registers' values are simply lost, but there are three cases in | ||
|  | which they can be recovered: | ||
|  | 
 | ||
|  | *   You can always recover the program counter, because that's the function's | ||
|  |     return address. If the function is ever going to return, the PC must be | ||
|  |     saved somewhere. | ||
|  | 
 | ||
|  | *   You can always recover the stack pointer. The function is responsible for | ||
|  |     popping its stack frame before it returns to the caller, so it must be able | ||
|  |     to restore this, as well. | ||
|  | 
 | ||
|  | *   You should be able to recover the values of callee-saves registers. These | ||
|  |     are registers whose values the callee must preserve, either by saving them | ||
|  |     in its own stack frame before using them and re-loading them before | ||
|  |     returning, or by not using them at all. | ||
|  | 
 | ||
|  | (As an exception, note that functions which never return may not save any of | ||
|  | this data. It may not be possible to walk the stack past such functions' stack | ||
|  | frames.) | ||
|  | 
 | ||
|  | Given rules for recovering the values of a function's caller's registers, we can | ||
|  | walk up the stack. Starting with the current set of registers --- the PC of the | ||
|  | instruction we're currently executing, the current stack pointer, etc. --- we | ||
|  | use CFI to recover the values those registers had in the caller of the current | ||
|  | frame. This gives us a PC in the caller whose CFI we can look up; we apply the | ||
|  | process again to find that function's caller; and so on. | ||
|  | 
 | ||
|  | Concretely, CFI records represent a table with a row for each machine | ||
|  | instruction address and a column for each register. The table entry for a given | ||
|  | address and register contains a rule describing how, when the PC is at that | ||
|  | address, to restore the value that register had in the caller. | ||
|  | 
 | ||
|  | There are some special columns: | ||
|  | 
 | ||
|  | *   A column named `.cfa`, for "Canonical Frame Address", tells how to compute | ||
|  |     the base address of the frame; other entries can refer to the CFA in their | ||
|  |     rules. | ||
|  | 
 | ||
|  | *   A column named `.ra` represents the return address. | ||
|  | 
 | ||
|  | For example, suppose we have a machine with 32-bit registers, one-byte | ||
|  | instructions, a stack that grows downwards, and an assembly language that | ||
|  | resembles C. Suppose further that we have a function whose machine code looks | ||
|  | like this: | ||
|  | 
 | ||
|  | ``` | ||
|  | func:                                ; entry point; return address at sp | ||
|  | func+0:      sp -= 16                ; allocate space for stack frame | ||
|  | func+1:      sp[12] = r0             ; save 4-byte r0 at sp+12 | ||
|  |              ...                     ; stuff that doesn't affect stack | ||
|  | func+10:     sp -= 4; *sp = x        ; push some 4-byte x on the stack | ||
|  |              ...                     ; stuff that doesn't affect stack | ||
|  | func+20:     r0 = sp[16]             ; restore saved r0 | ||
|  | func+21:     sp += 20                ; pop whole stack frame | ||
|  | func+22:     pc = *sp; sp += 4       ; pop return address and jump to it | ||
|  | ``` | ||
|  | 
 | ||
|  | The following table would describe the function above: | ||
|  | 
 | ||
|  | | **code address** | **.cfa** | **r0 (on Google Code)** | **r1 (on Google Code)** | ... | **.ra**  | | ||
|  | |:-----------------|:---------|:------------------------|:------------------------|:----|:---------| | ||
|  | | func+0           | sp       |                         |                         |     | `cfa[0]` | | ||
|  | | func+1           | sp+16    |                         |                         |     | `cfa[0]` | | ||
|  | | func+2           | sp+16    | `cfa[-4]`               |                         |     | `cfa[0]` | | ||
|  | | func+11          | sp+20    | `cfa[-4]`               |                         |     | `cfa[0]` | | ||
|  | | func+21          | sp+20    |                         |                         |     | `cfa[0]` | | ||
|  | | func+22          | sp       |                         |                         |     | `cfa[0]` | | ||
|  | 
 | ||
|  | Some things to note here: | ||
|  | 
 | ||
|  | *   Each row describes the state of affairs **before** executing the instruction | ||
|  |     at the given address. Thus, the row for func+0 describes the state before we | ||
|  |     execute the first instruction, which allocates the stack frame. In the next | ||
|  |     row, the formula for computing the CFA has changed, reflecting the | ||
|  |     allocation. | ||
|  | 
 | ||
|  | *   The other entries are written in terms of the CFA; this allows them to | ||
|  |     remain unchanged as the stack pointer gets bumped around. For example, to | ||
|  |     find the caller's value for r0 (on Google Code) at func+2, we would first | ||
|  |     compute the CFA by adding 16 to the sp, and then subtract four from that to | ||
|  |     find the address at which r0 (on Google Code) was saved. | ||
|  | 
 | ||
|  | *   Although the example doesn't show this, most calling conventions designate | ||
|  |     "callee-saves" and "caller-saves" registers. The callee must restore the | ||
|  |     values of "callee-saves" registers before returning (if it uses them at | ||
|  |     all), whereas the callee is free to use "caller-saves" registers without | ||
|  |     restoring their values. A function that uses caller-saves registers | ||
|  |     typically does not save their original values at all; in this case, the CFI | ||
|  |     marks such registers' values as "unrecoverable". | ||
|  | 
 | ||
|  | *   Exactly where the CFA points in the frame --- at the return address? below | ||
|  |     it? At some fixed point within the frame? --- is a question of definition | ||
|  |     that depends on the architecture and ABI in use. But by definition, the CFA | ||
|  |     remains constant throughout the lifetime of the frame. It's up to | ||
|  |     architecture- specific code to know what significance to assign the CFA, if | ||
|  |     any. | ||
|  | 
 | ||
|  | To save space, the most common type of CFI record only mentions the table | ||
|  | entries at which changes take place. So for the above, the CFI data would only | ||
|  | actually mention the non-blank entries here: | ||
|  | 
 | ||
|  | | **insn** | **cfa** | **r0 (on Google Code)** | **r1 (on Google Code)** | ... | **ra**   | | ||
|  | |:---------|:--------|:------------------------|:------------------------|:----|:---------| | ||
|  | | func+0   | sp      |                         |                         |     | `cfa[0]` | | ||
|  | | func+1   | sp+16   |                         |                         |     |          | | ||
|  | | func+2   |         | `cfa[-4]`               |                         |     |          | | ||
|  | | func+11  | sp+20   |                         |                         |     |          | | ||
|  | | func+21  |         | r0 (on Google Code)     |                         |     |          | | ||
|  | | func+22  | sp      |                         |                         |     |          | | ||
|  | 
 | ||
|  | A `STACK CFI INIT` record indicates that, at the machine instruction at | ||
|  | _address_, belonging to some function, the value that _register<sub>n</sub>_ had | ||
|  | in that function's caller can be recovered by evaluating | ||
|  | _expression<sub>n</sub>_. The values of any callee-saves registers not mentioned | ||
|  | are assumed to be unchanged. (`STACK CFI` records never mention caller-saves | ||
|  | registers.) These rules apply starting at _address_ and continue up to, but not | ||
|  | including, the address given in the next `STACK CFI` record. The _size_ field is | ||
|  | the total number of bytes of machine code covered by this record and any | ||
|  | subsequent `STACK CFI` records (until the next `STACK CFI INIT` record). The | ||
|  | _address_ field is relative to the module's load address. | ||
|  | 
 | ||
|  | A `STACK CFI` record (no `INIT`) is the same, except that it mentions only those | ||
|  | registers whose recovery rules have changed from the previous CFI record. There | ||
|  | must be a prior `STACK CFI INIT` or `STACK CFI` record in the symbol file. The | ||
|  | _address_ field of this record must be greater than that of the previous record, | ||
|  | and it must not be at or beyond the end of the range given by the most recent | ||
|  | `STACK CFI INIT` record. The address is relative to the module's load address. | ||
|  | 
 | ||
|  | Each expression is a breakpad-style postfix expression. Expressions may contain | ||
|  | spaces, but their tokens may not end with colons. When an expression mentions a | ||
|  | register, it refers to the value of that register in the callee, even if a prior | ||
|  | name/expression pair gives that register's value in the caller. The exception is | ||
|  | `.cfa`, which refers to the canonical frame address computed by the .cfa rule in | ||
|  | force at the current instruction. | ||
|  | 
 | ||
|  | The special expression `.undef` indicates that the given register's value cannot | ||
|  | be recovered. | ||
|  | 
 | ||
|  | The register names preceding the expressions are always followed by colons. The | ||
|  | expressions themselves never contain tokens ending with colons. | ||
|  | 
 | ||
|  | There are two special register names: | ||
|  | 
 | ||
|  | *   `.cfa` ("Canonical Frame Address") is the base address of the stack frame. | ||
|  |     Other registers' rules may refer to this. If no rule is provided for the | ||
|  |     stack pointer, the value of `.cfa` is the caller's stack pointer. | ||
|  | 
 | ||
|  | *   `.ra` is the return address. This is the value of the restored program | ||
|  |     counter. We use `.ra` instead of the architecture-specific name for the | ||
|  |     program counter. | ||
|  | 
 | ||
|  | The Breakpad stack walker requires that there be rules in force for `.cfa` and | ||
|  | `.ra` at every code address from which it unwinds. If those rules are not | ||
|  | present, the stack walker will ignore the `STACK CFI` data, and try to use a | ||
|  | different strategy. | ||
|  | 
 | ||
|  | So the CFI for the example function above would be as follows, if `func` were at | ||
|  | address 0x1000 (relative to the module's load address): | ||
|  | 
 | ||
|  | ``` | ||
|  | STACK CFI INIT 1000 .cfa: $sp .ra: .cfa ^ | ||
|  | STACK CFI      1001 .cfa: $sp 16 + | ||
|  | STACK CFI      1002 $r0: .cfa 4 - ^ | ||
|  | STACK CFI      100b .cfa: $sp 20 + | ||
|  | STACK CFI      1015 $r0: $r0 | ||
|  | STACK CFI      1016 .cfa: $sp | ||
|  | ``` |