
C/C++: Determine [My] Module's Load Address...

Jeffrey Walton...
Posted: Sat Nov 07, 2009 5:00 am
Guest
Hi All,

I'm trying to determine my module's load address at runtime. By
'module's load address', I mean byte[0] of the in-memory image (ie,
the first byte of the Elf32_Ehdr). I believe I want information from
the struct module in kernel/modules.c. I did find sys_query_module,
but it has been deprecated.

Everything I've found on the web is kernel-centric [1,2], and Stevens
does not cover it in Advanced Programming in the UNIX Environment. In
the Windows world,
I would use __ImageBase (fixed up by the link-loader) or
GetModuleHandle(...).

Can anyone point me to the proper syscall? (Or to a forum that fields
C/C++ and Linux API questions).

Thanks,
Jeffrey Walton

[1] LKML: "Richard B. Johnson": Re: determining load address of module
[2] Linux-Kernel Archive: Re: determining load address of module
 
Alan Curry...
Posted: Sat Nov 07, 2009 12:44 pm
Guest
In article <e5661e7b-0927-4cda-b1e9-652274633910 at (no spam) t18g2000vbj.googlegroups.com>,
Jeffrey Walton <noloader at (no spam) gmail.com> wrote:
Quote:
Hi All,

I'm trying to determine my module's load address at runtime. By
'module's load address', I mean byte[0] of the in-memory image (ie,
the first byte of the Elf32_Ehdr). I believe I want information from
the struct module in kernel/modules.c. I did find sys_query_module,
but it has been deprecated.

Everything I've found on the web is kernel-centric [1,2], and Stevens

That's because you're using the word "module" in a foreign way. We don't use
it that way. Here, "module" means kernel module 99.44% of the time.

You can probably get what you want by parsing /proc/self/maps. The lack of a
well-known function to do this query should tip you off that it's not
considered a normal thing to ask. If you're writing something like a
debugger, fine. Otherwise, what's the purpose of finding the in-memory copy
of an ELF header? What are you going to do with that information that you
can't do without it? The dynamic linker should fix up any pointers you need
within your address space. Doing it manually is icky. (Oh, if you're writing
a dynamic linker that's fine too. An icky job!)
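
A minimal sketch of the /proc/self/maps approach, assuming a Linux /proc filesystem and assuming that the lowest mapping of a file starts at file offset 0, so that the ELF header sits at its first byte. The helper name image_base_from_maps is made up for illustration and is not part of any library. It walks the address-sorted lines, remembers where the current run of same-path mappings began, and returns that address once it reaches the mapping containing the address it was given.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the lowest address at which the file
 * containing addr is mapped, or 0 if addr is not in any mapping. */
uintptr_t image_base_from_maps(const void *addr)
{
    uintptr_t target = (uintptr_t)addr;
    uintptr_t run_start = 0, base = 0;
    char line[4352], path[4096], run_path[4096] = "";
    FILE *fp;

    assert(addr != NULL);
    fp = fopen("/proc/self/maps", "r");
    if (fp == NULL)
        return 0;

    /* Each line looks like: start-end perms offset dev inode [path] */
    while (fgets(line, sizeof line, fp) != NULL) {
        uintmax_t start, end;
        int n = sscanf(line, "%jx-%jx %*s %*s %*s %*s %4095[^\n]",
                       &start, &end, path);
        if (n < 2)
            continue;               /* not a mapping line */
        if (n < 3)
            path[0] = '\0';         /* anonymous mapping, no path field */

        /* Lines are sorted by address; a change of path starts a new image. */
        if (path[0] == '\0' || strcmp(path, run_path) != 0) {
            run_start = (uintptr_t)start;
            strcpy(run_path, path);
        }
        if (target >= start && target < end) {
            base = run_start;
            break;
        }
    }
    fclose(fp);
    return base;
}
```

Passing the address of a string literal or of a function in the main program should give the address at which the executable's first mapping begins. Whether the bytes there are really the ELF header depends on how the binary was linked, so a caller would want to compare them against "\177ELF" before trusting the result.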

--
Alan Curry
 
Jeffrey Walton...
Posted: Sat Nov 07, 2009 3:08 pm
Guest
Hi Alan,

On Nov 7, 5:44 pm, pac... at (no spam) kosh.dhis.org (Alan Curry) wrote:
Quote:
In article <e5661e7b-0927-4cda-b1e9-652274633... at (no spam) t18g2000vbj.googlegroups.com>,
Jeffrey Walton  <noloa... at (no spam) gmail.com> wrote:

Hi All,

I'm trying to determine my module's load address at runtime. By
'module's load address', I mean byte[0] of the in-memory image (ie,
the first byte of the Elf32_Ehdr). I believe I want information from
the struct module in kernel/modules.c. I did find sys_query_module,
but it has been deprecated.

Everything I've found on the web is kernel-centric [1,2], and Stevens

That's because you're using the word "module" in a foreign way. We don't use
it that way. Here, "module" means kernel module 99.44% of the time.
My bad. Would 'image' be a better term in the Linux world?


Quote:
The lack of a well-known function to do this query should tip
you off that it's not considered a normal thing to ask.
Agreed.


Quote:
If you're writing something like a debugger, fine. Otherwise, what's the
purpose of finding the in-memory copy of an ELF header? What are you
going to do with that information that you can't do without it?
FIPS integrity checks. Locating a particular section in memory is an
early smoke test.

Quote:
The dynamic linker should fix up any pointers you need within your
address space.

Doing it manually is icky. (Oh, if you're writing a dynamic linker that's
fine too. An icky job!)
Agreed.


I thought I found the load address in struct r_debug::r_ldbase (from
link.h). But when I read it, I found the base address for
ld-linux.so.2.

Using dl_iterate_phdr(3), the first header passed to my callback
relates to my image's load address (the remaining headers appear to be
shared objects). Assuming 4KB pages, the base can be found by masking
the virtual address of the first entry in dlpi_phdr:

static int callback(struct dl_phdr_info *info, size_t size, void *data)
{
    /* Round the first program header's virtual address down to a 4KB page. */
    printf("base address=%10p\n",
           (void *)(uintptr_t)(info->dlpi_phdr->p_vaddr & ~0xFFF));
    printf("name=%s (%d segments)\n", info->dlpi_name, info->dlpi_phnum);
    ...
}

I believe, with a high degree of certainty, the image is being loaded
at 0x8048000:
(gdb) print (char*) 0x8048000
$1 = 0x8048000 "\177ELF\001\001\001"

This begs two questions. First, is 0x8048000 (for x86) always the
address base (or an address I can control from the linker)? Second,
does dl_iterate_phdr(3) always return the image's base information on
the *first* invocation of the callback?

Jeff
 
Alan Curry...
Posted: Sat Nov 07, 2009 4:59 pm
Guest
In article <d9c7a43a-46aa-45cd-98f4-059f5bd6cbcf at (no spam) m35g2000vbi.googlegroups.com>,
Jeffrey Walton <noloader at (no spam) gmail.com> wrote:
Quote:
Hi Alan,

On Nov 7, 5:44 pm, pac... at (no spam) kosh.dhis.org (Alan Curry) wrote:
In article <e5661e7b-0927-4cda-b1e9-652274633... at (no spam) t18g2000vbj.googlegroups.com>,
Jeffrey Walton  <noloa... at (no spam) gmail.com> wrote:

Everything I've found on the web is kernel-centric [1,2], and Stevens

That's because you're using the word "module" in a foreign way. We don't use
it that way. Here, "module" means kernel module 99.44% of the time.
My bad. Would 'image' be a better term in the Linux world?

I'm not sure what the definition of "module" is where you come from so I
can't translate it. It seems to include "main executable" and "shared
library" as subcases.

Quote:

Using dl_iterate_phdr(3), the first header returned to my callback

Oh you found a nice function to do the query after all.

Quote:

I believe, with a high degree of certainty, the image is being loaded
at 0x8048000:
(gdb) print (char*) 0x8048000
$1 = 0x8048000 "\177ELF\001\001\001"

This begs two questions. First, is 0x8048000 (for x86) always the
address base (or an address I can control from the linker)? Second,
does dl_iterate_phdr(3) always return the image's base information on
the *first* invocation of the callback.

0x8048000 has been the default for a while. Other archs do have different
defaults. I only vaguely remember the time when it used to be something
different on i386 (0x8000000 with libc5? pre-ELF it was either 0 or 0x1000
depending on linker options). You can override it when linking your program,
but as long as the program isn't rebuilt, the main executable will load at
the same address every time. Shared libraries can move around between
invocations (are position-independent), but the main program body won't.

readelf -l or objdump -p can show you where the program's segments will be
mapped. The interesting ones are the ones marked LOAD.

As for the behavior of dl_iterate_phdr, I didn't know it existed so I'm not
going to guess.

--
Alan Curry
 
Jeffrey Walton...
Posted: Sat Nov 07, 2009 6:36 pm
Guest
On Nov 7, 9:59 pm, pac... at (no spam) kosh.dhis.org (Alan Curry) wrote:
Quote:
In article <d9c7a43a-46aa-45cd-98f4-059f5bd6c... at (no spam) m35g2000vbi.googlegroups.com>,
Jeffrey Walton  <noloa... at (no spam) gmail.com> wrote:


[SNIP]

This begs two questions. First, is 0x8048000 (for x86) always the
address base (or an address I can control from the linker)? Second,
does dl_iterate_phdr(3) always return the image's base information on
the *first* invocation of the callback.

0x8048000 has been the default for a while. Other archs do have different
defaults. I only vaguely remember the time when it used to be something
different on i386 (0x8000000 with libc5? pre-ELF it was either 0 or 0x1000
depending on linker options). You can override it when linking your program,
but as long as the program isn't rebuilt, the main executable will load at
the same address every time. Shared libraries can move around between
invocations (are position-independent), but the main program body won't.

readelf -l or objdump -p can show you where the program's segments will be
mapped. The interesting ones are the ones marked LOAD.
Lots of objdump and readelf seemed to do the trick. You're right about
LOAD - and I also needed flags = PF_R|PF_X to separate the code from
the data segment.

Quote:
As for the behavior of dl_iterate_phdr, I didn't know it existed so I'm not
going to guess.
With LOAD and PF_R|PF_X, I can find it every time.
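
A minimal sketch of that lookup, assuming glibc's dl_iterate_phdr(3), whose manual page states that the first object visited is the main program. The names text_segment, find_text_cb and main_text_segment are invented for illustration. Adding dlpi_addr to p_vaddr is an addition to what was described above: dlpi_addr is 0 for a fixed-address executable such as the one loaded at 0x8048000, and adding it keeps the result correct if the executable is position-independent.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result record for the main program's code segment. */
struct text_segment {
    uintptr_t start;  /* run-time address of the segment's first byte */
    size_t    size;   /* size of the segment in memory (p_memsz) */
    int       found;  /* 1 if a PT_LOAD segment with PF_R|PF_X was seen */
};

static int find_text_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct text_segment *out = data;

    (void)size;
    assert(out != NULL);

    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];

        /* The code segment is the loadable one that is readable and executable. */
        if (ph->p_type == PT_LOAD &&
            (ph->p_flags & (PF_R | PF_X)) == (PF_R | PF_X)) {
            out->start = (uintptr_t)(info->dlpi_addr + ph->p_vaddr);
            out->size  = (size_t)ph->p_memsz;
            out->found = 1;
            break;
        }
    }
    /* A nonzero return stops the iteration, so only the first object
     * (the main program) is examined. */
    return 1;
}

/* Fill *out for the main program; returns 1 on success, 0 otherwise. */
int main_text_segment(struct text_segment *out)
{
    out->found = 0;
    dl_iterate_phdr(find_text_cb, out);
    return out->found;
}
```

A caller would declare a struct text_segment, pass its address to main_text_segment, and then treat the range from start to start + size as the code to be checked. The address of any function in the main program should fall inside that range.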


Thanks for your help. I know I have a couple more questions for
tomorrow :)

Jeff
 
Tauno Voipio...
Posted: Sun Nov 08, 2009 3:35 pm
Guest
Jeffrey Walton wrote:
Quote:
On Nov 7, 9:59 pm, pac... at (no spam) kosh.dhis.org (Alan Curry) wrote:
In article <d9c7a43a-46aa-45cd-98f4-059f5bd6c... at (no spam) m35g2000vbi.googlegroups.com>,
Jeffrey Walton <noloa... at (no spam) gmail.com> wrote:


[SNIP]

This begs two questions. First, is 0x8048000 (for x86) always the
address base (or an address I can control from the linker)? Second,
does dl_iterate_phdr(3) always return the image's base information on
the *first* invocation of the callback.
0x8048000 has been the default for a while. Other archs do have different
defaults. I only vaguely remember the time when it used to be something
different on i386 (0x8000000 with libc5? pre-ELF it was either 0 or 0x1000
depending on linker options). You can override it when linking your program,
but as long as the program isn't rebuilt, the main executable will load at
the same address every time. Shared libraries can move around between
invocations (are position-independent), but the main program body won't.

readelf -l or objdump -p can show you where the program's segments will be
mapped. The interesting ones are the ones marked LOAD.
Lots of objdump and readelf seemed to do the trick. You're right about
LOAD - and I also needed flags = PF_R|PF_X to separate the code from
the data segment.

As for the behavior of dl_iterate_phdr, I didn't know it existed so I'm not
going to guess.
With LOAD and PF_R|PF_X, I can find it every time.

Thanks for your help. I know I have a couple more questions for
tomorrow :)

Jeff


Before you proceed too far, please keep in mind that you're
running on a demand-paged virtual-memory system.

You did not say it, but I guess that the relevant
processor architecture is Intel 386+.

Executables are loaded at the same virtual address on
every run, but the real physical addresses will be
determined dynamically at run-time. For different
processes running the same executable, the physical
addresses may be the same (for read-only sections).

The dynamic libraries are mapped at whatever virtual
addresses are suitable. The same library may be located
at different virtual addresses in different processes
at the same time. This is why the dynamic libraries
have to be position-independent code.

--

Tauno Voipio
tauno voipio (at) iki fi
 
 
All times are GMT - 5 Hours