PDBCopy
PDBCopy is a tool that copies one pdb file to another, stripping all of the rich symbolic content and leaving only the publics, or a subset of the publics, in the output. The program removes all source and line number information, local and global symbols, and type data. All that remains are the publics. To use PDBCopy effectively, it is important to understand exactly what the publics are.
The publics table is a simple two-column table listing functions and variables that need to be accessed across source files (or obj modules) in an executable image. The simplest way to cause something to show up in the publics is to declare it as "extern" in a C program. The only information a pdb stores about a public is its name and address. Furthermore, if the program is written in C++, the name will be decorated. The C compiler and MASM also decorate names in other ways; examples include preceding the name with an underscore character or following it with an @<number> designation. The entry stub of an imported function also shows up in the publics, as the name of the function preceded by "__imp_". An important limitation of publics is that no size information is available for them. There is no way for the debugger or any other tool to discover anything other than the address of the beginning of the object. The importance of this will be clarified later.
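To make the decoration rules above concrete, here is a rough Python sketch that classifies a public name and strips the simple C-style decorations. It is a heuristic written for this document, not a real undecorator: the function name describe_public is hypothetical, and it handles only the cases mentioned above (the __imp_ prefix, a leading underscore, a trailing @<number>). C++ names, which begin with "?", are only flagged; for those, use a real undecorator such as dbh's undec command.

```python
def describe_public(name):
    """Roughly classify a decorated public name.

    Returns (kind, name, needs_undecorator). This is a heuristic covering
    only the simple decorations described in the text; it is not a
    general-purpose undecorator.
    """
    kind = "symbol"
    if name.startswith("__imp_"):
        # Entry stub of an imported function.
        kind = "import"
        name = name[len("__imp_"):]
    if name.startswith("?"):
        # C++ decorated name: leave it alone and flag it.
        return kind, name, True
    if name.startswith("_"):
        # C compilers and MASM commonly prefix names with an underscore.
        name = name[1:]
    base, sep, suffix = name.rpartition("@")
    if sep and base and suffix.isdigit():
        # Strip a trailing @<number> designation.
        name = base
    return kind, name, False
```

For example, describe_public("__imp__FCICreate") reports an import named FCICreate, while "?gcontext@@3_KA" is returned unchanged and flagged as needing a C++ undecorator.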
You can use a program such as dbh.exe to display the contents of the publics table. Here we see an import, a function, and a simple variable.
1017a08 : 16 __imp__FCICreate
23 ?pathcpy@@YGXPAGPBG1K@Z
1017d00 : 15 ?gcontext@@3_KA
Most people will use PDBCopy to remove all symbols from a pdb file, leaving a small number of publics that the developer wants to make available. When doing so, consider the following example module layout.
0x1000 base of module
0x2000 FuncOne - first public to expose
0x2039 end of FuncOne
0x3000 FuncSix - second public to expose
0x3059 end of FuncSix
0x3999 end of module
In this example, there are two functions that we want to expose, FuncOne and FuncSix. We want to expose them so that when these functions are found on the stack of a crash dump, we can recognize them as inert pass-through functions that should not be considered the cause of the crash. When the debugger finds an address on the stack, it selects the symbol with the closest address at or below it. Since publics contain no size information, the debugger has no way to know whether the address actually falls within the boundaries of the object that the public points to. So for an address of 0x2030, the debugger finds FuncOne. For an address of 0x2050, the debugger also matches FuncOne, even though the layout above shows that 0x2050 lies outside FuncOne, which ends at 0x2039.
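The lookup rule can be modeled in a few lines of Python. This is a sketch of the behavior described above, not the debugger's actual code; nearest_public and PUBLICS are hypothetical names, and the addresses come from the example layout. Because only start addresses are known, both 0x2030 and 0x2050 resolve to FuncOne.

```python
import bisect

# Publics from the example layout: only start addresses and names, no sizes.
PUBLICS = [(0x2000, "FuncOne"), (0x3000, "FuncSix")]


def nearest_public(publics, address):
    """Return (name, offset) of the public with the closest address <= address.

    publics must be sorted by address. Returns None when no public lies at
    or below the address. Because publics carry no size, nothing here can
    tell whether the address really falls inside that function.
    """
    starts = [start for start, _ in publics]
    i = bisect.bisect_right(starts, address) - 1
    if i < 0:
        return None
    start, name = publics[i]
    return name, address - start


print(nearest_public(PUBLICS, 0x2030))  # inside FuncOne
print(nearest_public(PUBLICS, 0x2050))  # past the end of FuncOne, still matched
```

The second call is the false positive: the offset 0x50 is larger than FuncOne itself, but the table gives the lookup no way to know that.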
Consequently, it is desirable to also expose as publics the functions that immediately follow the functions you want to expose. That way we can avoid false positives. With those additions, our example looks like this.
0x1000 base of module
0x2000 FuncOne - first public to expose
0x2039 end of FuncOne
0x2040 FuncTwo - marks the end of the first public to expose
0x3000 FuncSix - second public to expose
0x3059 end of FuncSix
0x3060 FuncSeven - marks the end of the second public to expose
0x3999 end of module
Now, if we find an address of 0x2033, it resolves to FuncOne, so we know that it is in a pass-through function and we should continue walking the stack to find the cause of the crash. However, if the address is 0x2052, it resolves to FuncTwo, so we know that it is not in one of the pass-through functions and that we should pass the information on to the driver owner.
If you don't want to expose the names of functions after the ones that you do want to expose, you could change your source code to insert dummy functions immediately after the ones you are interested in.
Presuming the module is called foo, here are some examples of the stack frames we might get from a pdb that has been modified as described here.
0x1070 foo+0x70
0x2020 foo!FuncOne+0x20
0x2050 foo!FuncTwo+0x10
0x3002 foo!FuncSix+0x2
0x3070 foo!FuncSeven+0x10
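The following Python sketch models the modified pdb. It reproduces the stack frame strings for the sample addresses, printing offsets in hex with a 0x prefix, and applies the pass-through check described earlier. The names MODULE_BASE, resolve, format_frame, and is_pass_through are hypothetical, and the addresses are taken from the example layout; a real debugger resolves addresses through dbghelp rather than a hard-coded table.

```python
import bisect

# Hypothetical values taken from the example layout.
MODULE = "foo"
MODULE_BASE = 0x1000
# Stripped publics, sorted by address: exposed functions plus end markers.
PUBLICS = [
    (0x2000, "FuncOne"),
    (0x2040, "FuncTwo"),    # marks the end of FuncOne
    (0x3000, "FuncSix"),
    (0x3060, "FuncSeven"),  # marks the end of FuncSix
]
PASS_THROUGH = {"FuncOne", "FuncSix"}
_STARTS = [start for start, _ in PUBLICS]


def resolve(address):
    """Return (name, offset) for the closest public at or below address.

    If no public lies at or below the address, return (None, offset from
    the module base) so the frame can be shown as module+offset.
    """
    i = bisect.bisect_right(_STARTS, address) - 1
    if i < 0:
        return None, address - MODULE_BASE
    start, name = PUBLICS[i]
    return name, address - start


def format_frame(address):
    """Render a frame as module!symbol+offset, or module+offset."""
    name, offset = resolve(address)
    if name is None:
        return f"{MODULE}+{offset:#x}"
    return f"{MODULE}!{name}+{offset:#x}"


def is_pass_through(address):
    """True if the frame falls inside one of the exposed pass-through functions."""
    name, _ = resolve(address)
    return name in PASS_THROUGH


for addr in (0x1070, 0x2020, 0x2050, 0x3002, 0x3070):
    print(hex(addr), format_frame(addr), is_pass_through(addr))
```

Because FuncTwo and FuncSeven are present, addresses past the end of an exposed function now resolve to the marker instead, and the pass-through check correctly rejects them.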
By adding publics for the symbols you are interested in, as well as for the symbols that follow them, you can selectively identify when a stack frame falls within a function that interests you, without exposing all the symbols of your module.
PDBCopy takes few parameters. Here is the command line help.
PDBCopy
usage: PDBCopy <source_pdb> <destination_pdb> [-p] [-s] [-vc6] [-?]
[-p] remove private debug information
[-s] create new signature
[-vc6] use mspdb60.dll
[-f:] filter specific public symbols out of stripped pdb
[-F:] leave only specific public symbols in stripped pdb
[-?] display this message
If you have a list of publics that you want to preserve, you can use an input file to specify them. The file contains one line of text for each public to preserve, giving its decorated name. Here is an example.
_SymLoadModuleExW@36
_SymLoadModule64@28
_SymLoadModuleEx@36
_SymLoadModule@24
A command line that removes all symbols and all publics except the ones listed in files.txt would look like this.
pdbcopy.exe dbghelp.pdb dbghelp.stripped.pdb -p -F:@files.txt
This generates the new pdb as dbghelp.stripped.pdb. Remember that to use the stripped pdb, you must rename it back to the original name (dbghelp.pdb in this example). Also, never use the -s parameter for this purpose: creating a new signature means the pdb no longer matches the signature recorded in the image, which prevents the symbol server from working with it.
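If you prepare the stripped pdb from a script, something like the following Python sketch can write the filter file and assemble the same command line. The helper names are hypothetical, and the sketch assumes pdbcopy.exe and mspdb80.dll are on the path when you actually run the command. It deliberately never adds -s.

```python
def filter_file_text(publics):
    """Return the filter-file contents: one decorated public name per line."""
    return "".join(f"{name}\n" for name in publics)


def write_filter_file(path, publics):
    """Write the filter file that PDBCopy reads through -F:@<file>."""
    with open(path, "w") as f:
        f.write(filter_file_text(publics))


def pdbcopy_args(source_pdb, dest_pdb, filter_file):
    """Argument list mirroring: pdbcopy.exe <src> <dst> -p -F:@<filter_file>.

    -s is intentionally omitted so the stripped pdb keeps the original
    signature and still matches its image.
    """
    return ["pdbcopy.exe", source_pdb, dest_pdb, "-p", f"-F:@{filter_file}"]
```

After write_filter_file("files.txt", names) has created the list, subprocess.run(pdbcopy_args("dbghelp.pdb", "dbghelp.stripped.pdb", "files.txt"), check=True) would run the command and raise an error if PDBCopy fails.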
Final note: PDBCopy requires mspdb80.dll to be in your executable path. This file normally ships with Visual Studio. To process symbols that were generated with an older compiler, PDBCopy uses mspdb60.dll instead (see the -vc6 option); that file can be found along with the older corresponding tools.
dbh
Dbh is a program that exposes many of the symbol API functions in dbghelp.dll. It can be used to examine and find the symbols for an executable image. The command-line syntax is as follows.
usage: dbh [-v] [-n] [-c] [-d] [-p] [targetmodule] [command]
[-v] [-n] display noisy symbol spew
[-d] load decorated publics
[-p:XXXX] attaches to process ID XXXX
[targetmodule] load symbols for specified module
[command] execute command and exit
You can learn about the syntax of the actual dbh commands by typing ? from the dbh prompt.
dbh commands :
help : prints this message
q quit : quits this program
v verbose <on/off> : controls debug spew
load <modname> : loads the requested module
u unload : unloads the current module
x enum <mask> : enumerates all matching symbols
n name <symname> : finds a symbol by it's name
a addr <addr> : finds a symbol by it's hex address
m enumaddr <addr> : lists all symbols with a certain hex address
b base <address> : sets the new default base address
s next <add/nam> : finds the symbol after the passed sym
p prev <add/nam> : finds the symbol before the passed sym
l line <file#num> : finds the matching line number
laddr <address> : finds a source line by it's corresponding hex address
j linenext : goes to the next line after the current
k lineprev : goes to the line previous to the current
f ff <path> <file> : finds file in path
ffpath <file> : finds file in symbol path
r src <mask> : lists source files
+ add <name addr> : adds symbols with passed name and address
y ss : executes a symbol server command
m enumaddr <addr> : enum all symbols for address
z locals <name> : enum all scoped symbols for a named function
map <name> : call MapDebugInfo on the named file
multi <name> : loads the requested module 1000 times
t type <name> : lists the type information for the symbol
i info : displays information about the loaded module
o obj : displays object files in the loaded module
e elines : enumerates lines for an obj and source file
srch <mask> <tag> : enumerates all symbols for a matching SymTag
srchaddr <addr> : enumerates all symbols for a matching address
dtag : displays all the symtag values
undec <name> : undecorates a given symbol name
findexe <name> <path> : locates an image in the symbol path
setpath <path> : sets the symbol search path
getpath : gets the symbol search path
dir <path> : calls EnumDirTree on the path
dir <fname> <path> : calls EnumDirTree to find filename on path
index <val> : finds symbol with matching index value
scope <add/nam> : finds the parent of a symbol
etypes <mask> : enumerates all matching types
enummod <mask> : enumerates all modules (there will only be one
elem <file> <file> : finds the symbol server element to store
srvind <file> : finds the symbol server index for store
symsrv <srv><file> : stores file in symbol server store
sympath <path> : tests if path is to a symbol store
storeadd <img> <store> : adds an image to a symbol store
getsym <img> <sym> : finds the matching symbol for and image
srclines <file> <line> : finds matching source lines
mod <base> : changes default module
refresh : refreshes the module list
home <path> : sets the home directory
For our purposes, however, we are interested in its ability to dump the contents of the publics table. This is best achieved with this simple command.
dbh.exe -d yourmodule.pdb enum *
This enumerates all publics that match the "*" mask. If you ran this command on a target that had full private symbols, it would list those private symbols instead of the publics; that is, dbh.exe favors privates over publics, just as the debugger does. The enum command, like many other dbh commands, accepts simple regular expressions. Putting the enum * command on the dbh command line causes dbh to operate in batch mode: the program dumps the publics to standard output and immediately exits. You can pipe the output through the sort.exe filter to get the list sorted by address.
dbh.exe -d yourmodule.pdb enum * | sort
With these capabilities, you should be able to find the decorated names of the functions you want to expose with PDBCopy. By sorting by address, you can also find the functions that follow the ones you want to expose.
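As a sketch of that workflow, the Python below parses saved dbh enum output, sorts it numerically by address, and adds the public that follows each function you want to expose. The line format it expects is an assumption based on the sample output shown earlier (hex address, a colon, an index, then the decorated name); adjust LINE_RE if your dbh output looks different. Parsing the address as a number also avoids the purely textual ordering of sort.exe, which can misorder hex addresses of different widths. The names parse_publics and names_to_keep are hypothetical.

```python
import re

# Assumed line shape, based on the sample dbh output:
#   <hex address> : <index> <decorated name>
# Lines that do not match (for example, one missing its address) are skipped.
LINE_RE = re.compile(r"^\s*([0-9a-fA-F]+)\s*:\s*\S+\s+(\S+)\s*$")


def parse_publics(lines):
    """Return (address, name) pairs sorted numerically by address."""
    publics = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            publics.append((int(match.group(1), 16), match.group(2)))
    return sorted(publics)


def names_to_keep(publics, wanted):
    """Each wanted public followed by the next public at a higher address."""
    keep = []
    for i, (address, name) in enumerate(publics):
        if name not in wanted:
            continue
        keep.append(name)
        # The first public that starts after this one marks where it ends.
        following = next((n for a, n in publics[i + 1:] if a > address), None)
        if following is not None:
            keep.append(following)
    return list(dict.fromkeys(keep))  # drop duplicates, keep order
```

The resulting list matches the input file format PDBCopy expects for -F:@, one decorated name per line.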