ELF File Format
What is a Program and How does it run on CPU?
A program is a mix of code and data that makes the CPU do what we want, aka super powers.
Sadly, the CPU only speaks zeros and ones, or at least that's what everyone says 🤷‍♂️. The real language we can use to talk to the CPU, while still keeping our sanity, is its instruction set.
Ahh, sorry for the deviation.
So yeah, zeros and ones.. we need to store them somewhere, and that's where memory kicks in.
Memory can be devices such as hard disks, DDR, floppy disks, pen drives, SD cards and even punch cards.. yep, those too. Modern systems generally execute programs from RAM, i.e. Random Access Memory. The CPU goes through all those ones and zeros one after another and does the magic we expect it to do.
That's it, that's how stupid, I mean simple, the CPU is.
Now how do we load that program into memory? We can't just say here is the code and data, run this. Well, we can, but that's not secure or flexible enough once an OS comes into the picture and wants to run many programs at the same time, aka super power++.
The OS manages the memory of each program separately so programs don't change other programs… I used to think it would be cool to change one program from another program. Then I thought: if a program from a random internet site got into my anime player and showed isekai, it would be disastrous. So safety is a big priority in OSs.
We still want a format where we can say where the code should go and where the data should go, but let the OS take care of loading it into memory and making sure it runs.
One such format is ELF. Programs stored as files in ELF format are called ELF files.
What are ELF Files and Where do we See Them?
ELF files are files that contain code and data in binary, along with additional information about how the two have to be placed in memory for the CPU to run the program. The programs we run on Linux or BSD, like ls, cat, touch, mkdir and even those sketchy ones we download from the internet, are most probably ELF files.
From high-end Linux servers used in the cloud to gaming consoles such as the PlayStation and even cheap MCUs, all of them use ELF files to run programs.
I forgot to tell you the full form btw: it's Executable and Linkable Format (formerly named Extensible Linking Format). It was chosen by the grandad of OSs.. UNIX as the standard binary file format for programs.
It's very flexible and extensible and blah blah… but the feature I like the most is that it doesn't care what CPU you are running it on. ELF files are not only for x86 or ARM or RISC-V, or even only for Linux or UNIX-likes. The same format works everywhere it's supported; btw, if you go to the wiki page you will see that a lot of things support it.
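To make the "any CPU" point a bit more concrete, here is a minimal sketch. The ELF header carries a 16-bit machine field (named e_machine in the specification) that says which CPU the file was built for. The numeric values below come from the ELF specification; the helper name machine_name and the strings it returns are made up for illustration.

```c
#include <stdint.h>

/* A few e_machine values defined by the ELF specification. */
#define EM_386      3    /* 32-bit x86 */
#define EM_ARM      40   /* 32-bit ARM */
#define EM_X86_64   62   /* 64-bit x86 */
#define EM_AARCH64  183  /* 64-bit ARM */
#define EM_RISCV    243  /* RISC-V */

/* Hypothetical helper: map an e_machine value to a readable name. */
const char *machine_name(uint16_t e_machine) {
    switch (e_machine) {
    case EM_386:     return "x86";
    case EM_ARM:     return "ARM";
    case EM_X86_64:  return "x86-64";
    case EM_AARCH64: return "AArch64";
    case EM_RISCV:   return "RISC-V";
    default:         return "unknown";  /* many more values exist */
    }
}
```

The file layout stays the same for every CPU; only this number (and of course the machine code inside) changes.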
What is in an ELF File?
Let's take the cat program as an example. cat is a program that lets us see the contents of a file. For example, if hello.txt has hello from txt file in it, then running cat on hello.txt prints exactly that.
Let's hexdump the cat program itself.
Not much we can understand here, right? Let's tell hexdump to also show the ASCII character for every byte value it recognizes in the file.
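We find something interesting in the first 4 bytes of this program: it says .ELF. Every ELF file starts with these 4 bytes; they are the magic number that opens the ELF identifier. The leading dot is really the non-printable byte 0x7f, which hexdump displays as a dot, followed by the letters E, L and F. Not only cat, many other programs start with .ELF.
Btw hexdump is also a program, so it should also contain .ELF at the start.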
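If you want to do the same check from code instead of eyeballing a hexdump, a minimal sketch could look like this. It assumes the file has already been read into a byte buffer; the function name is_elf is just an illustrative choice.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if buf starts with the 4-byte ELF magic, 0 otherwise. */
int is_elf(const uint8_t *buf, size_t len) {
    /* 0x7f is the byte hexdump displays as '.' */
    static const uint8_t magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (buf == NULL || len < sizeof magic) {
        return 0;  /* too short to hold the magic */
    }
    for (size_t i = 0; i < sizeof magic; i++) {
        if (buf[i] != magic[i]) {
            return 0;
        }
    }
    return 1;
}
```

Note that a file starting with a literal '.' character (0x2e) followed by ELF would not pass, because the first byte has to be 0x7f.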
So what are those bits after .ELF, you may ask? They are information about the program, like:
- which CPU can run this program.
- what type of program this is.
- the address at which the CPU should start executing.
- which parts of this file should be loaded into memory.
- how many memory-loadable parts there are, etc.
All this info is stored at the start of the file, in what is called the header. Before diving into the fields of the header we need to know two important words: segment and section.
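As a small taste of the header, here is a hedged sketch that decodes a few of its fields. It relies on the ELF specification's layout of the first 16 bytes (byte 4 is the class, byte 5 is the byte order) and on the standard file type values; the function names are hypothetical and only there for illustration.

```c
#include <stdint.h>

#define EI_CLASS 4  /* index of the class byte in the identifier */
#define EI_DATA  5  /* index of the byte-order byte in the identifier */

/* Returns 32 or 64 for a valid class byte, 0 otherwise. */
int elf_class_bits(const uint8_t *ident) {
    switch (ident[EI_CLASS]) {
    case 1:  return 32;
    case 2:  return 64;
    default: return 0;
    }
}

/* Describes the byte order the rest of the file is encoded in. */
const char *elf_byte_order(const uint8_t *ident) {
    switch (ident[EI_DATA]) {
    case 1:  return "little-endian";
    case 2:  return "big-endian";
    default: return "invalid";
    }
}

/* Describes the file type stored in the header's type field. */
const char *elf_type_name(uint16_t e_type) {
    switch (e_type) {
    case 1:  return "relocatable object";
    case 2:  return "executable";
    case 3:  return "shared object";
    case 4:  return "core dump";
    default: return "other";
    }
}
```

The class matters a lot in practice: a 64-bit file uses wider address and offset fields, so the same header struct cannot be used for both.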
What are Segments and Sections?
Simply put, a segment is a part of the file that is to be loaded into memory for execution, aka the program's code and data. A section is a part that either holds that code and data or carries additional information about it, used while linking and relocating the program.
Yep, that's confusing. Here are some points that will help us understand better (ofc I got these from Stack Overflow).
- segments contain information that is necessary for runtime execution, while sections contain information for linking and relocation
- section: tells the linker if a section is either:
  - raw data to be loaded into memory, e.g. .data, .text, etc.
  - or formatted metadata about other sections, that will be used by the linker, but disappear at runtime, e.g. .symtab, .strtab, .rela.text
- segment: tells the operating system:
  - where a segment should be loaded into virtual memory
  - what permissions the segment has (read, write, execute). Remember that this can be efficiently enforced by the processor.
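The permission part is easy to show in code. In the ELF specification each segment has a flags word where bit value 0x1 means execute, 0x2 means write and 0x4 means read. The sketch below turns that word into the familiar rwx text; the helper name seg_flags_to_string is an assumption made for this example.

```c
#include <stdint.h>

#define PF_X 0x1  /* segment may be executed */
#define PF_W 0x2  /* segment may be written */
#define PF_R 0x4  /* segment may be read */

/* Writes "rwx"-style text into out, which must hold at least 4 bytes. */
void seg_flags_to_string(uint32_t p_flags, char out[4]) {
    out[0] = (p_flags & PF_R) ? 'r' : '-';
    out[1] = (p_flags & PF_W) ? 'w' : '-';
    out[2] = (p_flags & PF_X) ? 'x' : '-';
    out[3] = '\0';
}
```

A typical code segment has flags 0x5 (read and execute), while a typical data segment has 0x6 (read and write).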
Now we know what sections and segments are. The ELF header contains the information about where these sections and segments are located in the file (through the section header table and the program header table).
So finally we can say an ELF file contains a header, sections and segments.
How are ELF Files Structured?
Now we need to know all the fields of an ELF file. It would take forever for me to write them down, and it's boring to read them all. So I suggest you watch a YouTube video where the explanation is great and read the Linux reference specification for further understanding.
After watching the video, go through the code below, which I wrote to get the loadable segments into memory. Try to understand how it works; it's in C.
The best programming language bestowed upon us by gods, C
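#include <stdint.h>  /* uint8_t, uint16_t, uint32_t */
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* malloc, free */

/* Field types for the 32-bit ELF layout (ELF32). */
typedef uint16_t HalfWord;
typedef uint32_t Word;
typedef uint32_t Address;
typedef uint32_t Offset;

#define EI_NIDENT 16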
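/* ELF header: sits at the very start of the file. */
typedef struct
{
    unsigned char e_ident[EI_NIDENT]; /* magic, class, byte order, ... */
    HalfWord e_type;                  /* kind of file */
    HalfWord e_machine;               /* target CPU */
    Word e_version;
    Address e_entry;                  /* address where execution starts */
    Offset e_phoff;                   /* file offset of program headers */
    Offset e_shoff;                   /* file offset of section headers */
    Word e_flags;
    HalfWord e_ehsize;
    HalfWord e_phentsize;             /* size of one program header */
    HalfWord e_phnum;                 /* number of program headers */
    HalfWord e_shentsize;             /* size of one section header */
    HalfWord e_shnum;                 /* number of section headers */
    HalfWord e_shstrndx;
} ElfHeader;

/* Program header: describes one segment. */
typedef struct
{
    Word p_type;
    Offset p_offset;                  /* where the segment starts in the file */
    Address p_vaddr;                  /* virtual address to load it at */
    Address p_paddr;
    Word p_filesz;                    /* bytes stored in the file */
    Word p_memsz;                     /* bytes occupied in memory */
    Word p_flags;                     /* read/write/execute permissions */
    Word p_align;
} ElfPhdr;

/* Section header: describes one section. */
typedef struct
{
    Word sh_name;
    Word sh_type;
    Word sh_flags;
    Address sh_addr;
    Offset sh_offset;
    Word sh_size;
    Word sh_link;
    Word sh_info;
    Word sh_addralign;
    Word sh_entsize;
} ElfShdr;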
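/* Segment permission bits, as stored in p_flags. */
typedef enum {
    SEG_READ           = 0x4,
    SEG_WRITE          = 0x2,
    SEG_READ_AND_WRITE = 0x6,
    /* more options exist (e.g. SEG_EXECUTE) but for now
       we aren't concerned about them */
} SegFlags;

/* One loadable segment as the loader sees it. */
typedef struct {
    uint32_t addr;     /* virtual address to load at */
    uint32_t size;     /* size in memory */
    SegFlags flag;     /* permissions */
    uint8_t* mem_arr;  /* segment bytes inside the program buffer */
} MemSegment;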
typedef struct {
    MemSegment **listSegs;    /* loadable segments */
    int nSegs;                /* number of entries in listSegs */
    uint32_t entryPointAddr;  /* address where execution starts */
} Memory;

/* Collects the loadable segments of a 32-bit ELF image.
   Returns NULL on failure. The segments point into `program`,
   so that buffer must outlive the returned Memory. */
Memory* MemoryGetFromProg(uint8_t *program) {
    /* extracting the ELF header */
    ElfHeader* hdr = (ElfHeader*)program;
    /* start of the program header table inside the file */
    uint8_t* phdrTable = program + hdr->e_phoff;

    /* finding the number of loadable segments */
    int t = 0;
    for (size_t i = 0; i < hdr->e_phnum; i++) {
        ElfPhdr* phdr = (ElfPhdr*)(phdrTable + (i * hdr->e_phentsize));
        if (phdr->p_type == 0x01) { /* Load type p_type == 1 */
            t++;
        }
    }
    if (t == 0) {
        printf("error: there is no loadable segment in program\n");
        return NULL;
    }

    /* init Memory struct and the list of loadable segments */
    Memory* mem = (Memory*)malloc(sizeof(Memory));
    if (mem == NULL) {
        return NULL;
    }
    mem->listSegs = (MemSegment**)malloc(t * sizeof(MemSegment*));
    if (mem->listSegs == NULL) {
        free(mem);
        return NULL;
    }
    mem->nSegs = 0;
    mem->entryPointAddr = hdr->e_entry;

    /* extracting the data to be loaded in memory */
    for (size_t i = 0; i < hdr->e_phnum; i++) {
        ElfPhdr* phdr = (ElfPhdr*)(phdrTable + (i * hdr->e_phentsize));
        if (phdr->p_type != 0x01) { /* skip everything that is not loadable */
            continue;
        }
        MemSegment* seg = (MemSegment*)malloc(sizeof(MemSegment));
        if (seg == NULL) {
            /* out of memory: release what was built so far */
            while (mem->nSegs > 0) {
                free(mem->listSegs[--mem->nSegs]);
            }
            free(mem->listSegs);
            free(mem);
            return NULL;
        }
        seg->addr = phdr->p_vaddr;
        seg->size = phdr->p_memsz;
        /* only the first p_filesz bytes exist in the file; when p_memsz
           is larger, the rest must be zero-filled by whoever loads it */
        seg->mem_arr = program + phdr->p_offset;
        seg->flag = (SegFlags)phdr->p_flags;
        mem->listSegs[mem->nSegs++] = seg;
    }
    return mem;
}
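One detail worth a closer look: the code above records p_memsz as the segment size, but mem_arr points straight into the file, where only p_filesz bytes are actually stored. When p_memsz is bigger (typically because of zero-initialized data), the extra bytes have to be zeros in memory. Here is a minimal sketch of how an actual copy could handle that; load_segment_image and its parameters are hypothetical names, not part of the code above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds the in-memory image of one loadable segment.
   Copies filesz bytes from the file and leaves the remaining
   (memsz - filesz) bytes as zeros. Returns NULL on bad input or if
   allocation fails. The caller owns the result and must free() it. */
uint8_t *load_segment_image(const uint8_t *file, size_t file_len,
                            uint32_t offset, uint32_t filesz,
                            uint32_t memsz) {
    if (file == NULL || memsz == 0 || filesz > memsz) {
        return NULL;
    }
    /* the stored bytes must lie completely inside the file */
    if (offset > file_len || filesz > file_len - offset) {
        return NULL;
    }
    uint8_t *image = calloc(memsz, 1);  /* calloc gives zero-filled memory */
    if (image == NULL) {
        return NULL;
    }
    memcpy(image, file + offset, filesz);
    return image;
}
```

With a Memory struct from the function above, you would call this once per segment, passing the p_offset, p_filesz and p_memsz values of its program header, and then place the result at the segment's virtual address.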
Conclusion
Hope you now understand a little more about how your programs work, and that this got you a little more intrigued about how our computers work.
Have a great day.