Last little fixes

Finally I managed to draw the main menu in the screen.

I note two things:

  1. The message Demo is from a different game version! is being printed continuously
  2. If I start a new game, I got another crash related to accessing memory I don’t own.

The first one is easy to fix, searching for the error string I find it in file g_game.c at line 1592. With the debugger I find out that the WAD is expecting a game with version 109 and the code is reporting version 110. So is simple, just change the value in file doomdef.h at line 34.

Now, the next error is a harder to find out, again an error caused by assuming the target platform is 32 bits.

Problem appears in different places every time I run the game, for example one I can identify is located at p_setup.c in line 310. Now, here hell could be started. Previous issues regarding pointers were easy to fix because the heap of Z was empty/almost empty. Now at this point of the code a lot of memory have been requested to Z. I came out with a solution that isn’t elegant at all, but worked.

I modified memblock_t so it can store a string with the name of its owner, Z_Malloc() in z_zone.h and z_zone.c to receive the name of the owner and store it in the modified memblock_t and print it to a log file and finally also modified Z_Free() to print also when memory is freed, by who and where. Then I had to modify every call to Z_Malloc() and manually pass a string to identify the owner with the format: “variablename@filename:line”.

After reading with more detail the code in z_zone.h I realized there was a more elegant way for doing all this with macros but I’ve already made it my way so, who cares?

After all this I ran the game and checked where the problem was. After analyzing the stack when the game crashes I realize is because all the information in the “rover” used by Z to find free space in the heap is messed up. So I go verifying every entry in the output log and I find that the last variable that allocated memory decided to use more than it requested.

The variable is “linebuffer” in p_setup.c at line 537.

linebuffer=Z_Malloc(total*4,PU_LEVEL,0);

Again, the error is assuming a pointer has size of four bytes. This is easily fixed using the sizeof() operator. After applying this fix, the game runs without problem (still it throws a lot of garbage about sounds but that’ll be fixed later).

I published a branch with my ugly solution to dump the user of Z_Malloc().

Dynamic memory and Doom

There are two interesting files in the source code: z_zone.h and z_zone.c. These two files contain the interface an implementation to the dynamic memory system of the engine. From now on I’ll call this system Z, for short.

As far as I know back in 1993, when Doom was released, computational resources where scarce, including RAM memory. I think the idea to implement Z was assure enough memory for the game from the beggining and prevent any bottleneck memory allocation could cause.

Now there are two things that worry/confuse me regarding Z:

  1. Not all memory allocations are done using this Z. There are some malloc calls here and there in the code.
  2. Not all the memory alloced for the program is freed, specially the one used for Z.

Anyway, the game doesn’t crash so is fine, I think…

Let’s dig into the code, the interface has a struct and nine functions to manage Z. I only checked out three (the more important):

  1. memblock_t – Is the struct that contains data about a “memory chunk” like: what is the size, who is the owner and the contiguous “chunks”. Basically is a double linked list with metadata about the memory allocated.
  2. Z_Init() – Initializes Z. Allocates memory from the operating system (with a malloc via I_ZoneBase) and sets the data about the memory allocated: the size a special memblock_t called “blocklist” that contains pointers to the first “memory chunk” and the last one and a “rover” that is a pointer to the first free “memory chunk”.
  3. Z_Malloc() – Is the equivalent to malloc in Z (yeah, that was hard to find out). It starts by using a very smart way to round the size requested to a multiple of four and the adds the size of memblock_t. The it searches for enough free space and then claims it. It assigns at the beginning of the “memory chunk”  a memblock_t  and sets the data and then returns the rest of the “memory chunk” as a void pointer.
  4. Z_Free() – Again, the equivalent to free in Z, it pretty simple: it only retrieves the memblock_t for the “memory chunk” that was passed and marks it as free.

The implementation itself, but so far this one have been the one that I required to check to get the game working.

The 64-bit conspiracy (III)

Hit the “Run”, again, and know this time XQuartz opens! A new window appears on the screen! Crash…

EXC_ARITHMETIC (code=EXC_I386_DVI, subcode=0x0)

Let’s check were the error is, the file is r_main.c, at line 719:

pspriteiscale = FRACUNIT*SCREENWIDTH/viewwidth;

A division by zero? How…? Why..? How the f…? Oh God, I think I know where this is going.

Back when I was trying to Doom compile for the first time I had an error with the variable defaults in file m_misc.c, because there were values that couldn’t be resolved at compile time, so I removed the values of the variable hoping to fix it when the time came. One of the values assigned is called “screenblocks”, and the variable viewwidth is assigned using the value from a variable called setblocks some lines above.

It seems the solution is to assign the default values, but as I noted in my previous post if I initialize the defaults variable, the code fails to compile. I’ll check again how it fails, here is the full initialization code:

default_t	defaults[] =
{
    {"mouse_sensitivity",&mouseSensitivity, 5},
    {"sfx_volume",&snd_SfxVolume, 8},
    {"music_volume",&snd_MusicVolume, 8},
    {"show_messages",&showMessages, 1},

#ifdef NORMALUNIX
    {"key_right",&key_right, KEY_RIGHTARROW},
    {"key_left",&key_left, KEY_LEFTARROW},
    {"key_up",&key_up, KEY_UPARROW},
    {"key_down",&key_down, KEY_DOWNARROW},
    {"key_strafeleft",&key_strafeleft, ','},
    {"key_straferight",&key_straferight, '.'},

    {"key_fire",&key_fire, KEY_RCTRL},
    {"key_use",&key_use, ' '},
    {"key_strafe",&key_strafe, KEY_RALT},
    {"key_speed",&key_speed, KEY_RSHIFT},

// UNIX hack, to be removed. 
#ifdef SNDSERV
    {"sndserver", (int *) &sndserver_filename, (int) "sndserver"},
    {"mb_used", &mb_used, 2},
#endif

#endif

#ifdef LINUX
    {"mousedev", (int*)&mousedev, (int)"/dev/ttyS0"},
    {"mousetype", (int*)&mousetype, (int)"microsoft"},
#endif

    {"use_mouse",&usemouse, 1},
    {"mouseb_fire",&mousebfire,0},
    {"mouseb_strafe",&mousebstrafe,1},
    {"mouseb_forward",&mousebforward,2},

    {"use_joystick",&usejoystick, 0},
    {"joyb_fire",&joybfire,0},
    {"joyb_strafe",&joybstrafe,1},
    {"joyb_use",&joybuse,3},
    {"joyb_speed",&joybspeed,2},

    {"screenblocks",&screenblocks, 9},
    {"detaillevel",&detailLevel, 0},

    {"snd_channels",&numChannels, 3},

    {"usegamma",&usegamma, 0},

    {"chatmacro0", (int *) &chat_macros[0], (int) HUSTR_CHATMACRO0 },
    {"chatmacro1", (int *) &chat_macros[1], (int) HUSTR_CHATMACRO1 },
    {"chatmacro2", (int *) &chat_macros[2], (int) HUSTR_CHATMACRO2 },
    {"chatmacro3", (int *) &chat_macros[3], (int) HUSTR_CHATMACRO3 },
    {"chatmacro4", (int *) &chat_macros[4], (int) HUSTR_CHATMACRO4 },
    {"chatmacro5", (int *) &chat_macros[5], (int) HUSTR_CHATMACRO5 },
    {"chatmacro6", (int *) &chat_macros[6], (int) HUSTR_CHATMACRO6 },
    {"chatmacro7", (int *) &chat_macros[7], (int) HUSTR_CHATMACRO7 },
    {"chatmacro8", (int *) &chat_macros[8], (int) HUSTR_CHATMACRO8 },
    {"chatmacro9", (int *) &chat_macros[9], (int) HUSTR_CHATMACRO9 }

};

So I’ll try to assign value by value to verify which is the one that causes the error. If I only assign the first value the code compiles, so is discarded to be the cause of the error. Assign the second value too, also the code compiles. Again, everything goes ok for the third, the fourth and so on. Until I reach the value called “sndserver”. One of these three values has the fault. The first is a string, like the others so is not that, the second is a pointer and all the others have pointers, that’s fine. The third is a string casted to an integer, and strings are pointers to character, that means is a pointer casted to…

Oh God no, the ride never ends!

The solution requires a mix of the two last crashes related to pointers in a 64-bit platform. First we will need to change the variable in the structure from integer to intptr_t and cast to that data type instead of cast to integer.

I make that change and, everything compiles!

Now, let’s press the “Run” button…

The 64-bit conspiracy (II)

After realizing I have to move the source code from 32-bit to 64-bit I press the “Run” button once again.

And once again another issue related to the 64-bit transition appeared, the program stops executing in file w_wad.c, line 461, but following the stack trace the real issue is in r_data.c in line 644:

void R_InitColormaps (void)
{
    int	lump, length;

    // Load in the light tables, 
    //  256 byte align tables.
    lump = W_GetNumForName("COLORMAP"); 
    length = W_LumpLength (lump) + 255; 
    colormaps = Z_Malloc (length, PU_STATIC, 0);
    colormaps = (byte *)( ((int)colormaps + 255)&~0xff);
    W_ReadLump (lump,colormaps); 
}

The fix is again hard to spot but easy to fix. The second time colormaps is assigned, its previous value is casted to an integer, in other words is casted as a 32-bit. The fix is simple, the C standard has a data type in the header stdint.h called uintptr_t, which is an unsigned integer that can hold an entire pointer, so when is compiled to 32-bits its size is four bytes and when is compiled in 64-bits its size is eight bytes.

After fixing this, the same issue appears again but now in file r_draw.c at line 466 the bright side is I know how to fix it.

Licensing stuff

I was thinking about dumping the data from wadutil to a file when I realized I haven’t applied a license to wadutil. Then I realized something else, the GPL license is not in the repository, looking a little closely to the code I realize the version of the Doom source code I forked is under the Doom Source License (DSL) and not the GPL.

From what I read, the difference between DSL and GPL could be summarized in that DSL is more restrictive than GPL, there are more limitations on how you can distribute the code.

I searched the internet for a copy of the GPL version of Doom source code, which seems to be equals to the DSL version, so there is no problem to just switch the license in the code (I think) and add the GPL license to the repository.

Now for wadutil I choose LGPL instead of GPL because is more flexible when you want to link against other licenses.

So now officially wadutil is under the LGPL license.

The 64-bit conspiracy

Last problem took me a time to crack.

Everything seemed right. To reach the cause of the error, I had to open the WAD file in a hex editor and view the hex values with the Xcode debugger. It seems I’ll have to deal again with this kind of errors.

Press “Run” once again and I got another crash, in r_data.c at line 344:

	for ( ; x<x2 ; x++)
	{
	    patchcount[x]++;
	    collump[x] = patch->patch;
	    colofs[x] = LONG(realpatch->columnofs[x-x1])+3;
	}

For some reason, assigning a value to collump causes the program to crash with the message:

EXC_BAD_ACCESS (code=EXC_I386_GPFLT)

I think this means I’m accessing a not valid memory address. So with the debugger I check the value of x is just zero, so it crashes in the first iteration of the loop. So maybe the memory was no assigned, so I’ll go back to the moment collump was assigned and check it. It turns out to be a long path: collump is assigned to an entry of the global variable texturecolumnlump which is assigned in line 482 along with other global variables:

    textures = Z_Malloc (numtextures*4, PU_STATIC, 0);
    texturecolumnlump = Z_Malloc (numtextures*4, PU_STATIC, 0);
    texturecolumnofs = Z_Malloc (numtextures*4, PU_STATIC, 0);
    texturecomposite = Z_Malloc (numtextures*4, PU_STATIC, 0);
    texturecompositesize = Z_Malloc (numtextures*4, PU_STATIC, 0);
    texturewidthmask = Z_Malloc (numtextures*4, PU_STATIC, 0);
    textureheight = Z_Malloc (numtextures*4, PU_STATIC, 0);

Everything seem alright here, right? No! Let’s go up to the definition of these variables (line 142):

texture_t**	textures;

int*			texturewidthmask;
// needed for texture pegging
fixed_t*		textureheight;		
int*			texturecompositesize;
short**			texturecolumnlump;
unsigned short**	texturecolumnofs;
byte**			texturecomposite;

All of them are arrays! Or even worst, arrays of pointers! That means this instruction:

    texturecolumnlump = Z_Malloc (numtextures*4, PU_STATIC, 0);

Is only allocating memory for half the pointers! Once I find this the fix is pretty easy, and in the way I fix other bugs:

    textures = Z_Malloc (numtextures * sizeof(texture_t *), PU_STATIC, 0);
    texturecolumnlump = Z_Malloc (numtextures * sizeof(short *), PU_STATIC, 0);
    texturecolumnofs = Z_Malloc (numtextures * sizeof(unsigned short *), PU_STATIC, 0);
    texturecomposite = Z_Malloc (numtextures * sizeof(byte *), PU_STATIC, 0);
    texturecompositesize = Z_Malloc (numtextures * sizeof(int), PU_STATIC, 0);
    texturewidthmask = Z_Malloc (numtextures * sizeof(int), PU_STATIC, 0);
    textureheight = Z_Malloc (numtextures * sizeof(fixed_t), PU_STATIC, 0);

Now, this is something that worries me. It means a lot of things are hard coded to work in 32 bits. And all of these things have to be fixed before the game can even run!

I know it could be fix easy by changing the target architecture back to 32 bits, but I want it to run in 64 bits as all the modern programs for OS X run on 64 bits, anyway I’ll be careful and I won’t hard code anything to 64 bits, who knows maybe someone will want to compile the original Doom in 128 bits.

The evil pointer

Last time I fixed a crash when the engine tried to shutdown the video system without even having turning it on. With that fixed, I’ll move on to an try to load a WAD into the engine, for this purpose I’ll use the shareware version of Doom: DOOM1.WAD.

I put the WAD file in the same directory of the output binary, I launch it and get the following message:

Error: R_InitTextures: Missing patch in texture DOORBLU

The error is produced in the file r_data.c, at line 544:

for (j=0 ; j<texture->patchcount ; j++, mpatch++, patch++) {
	patch->originx = SHORT(mpatch->originx);
	patch->originy = SHORT(mpatch->originy);
	patch->patch = patchlookup[SHORT(mpatch->patch)];

	if (patch->patch == -1) {
		I_Error ("R_InitTextures: Missing patch in texture %s", texture->name);
	}
}

So struct called patch that has a very descriptive variable called patch seems to be invalid. Is nice the fact that just before that, such variable is assigned because we can start to trace back the problem, there are two variables involved:

  1. patchlookup: It contains the position of each lump containing a patch listed in PNAMES.
  2. mpatch->patch: Each texture can contain data of how patches are applied to the texture. mpatch is a pointer to that data and the variable patch of the structure seems to point the patch number in the patchlookup array.

Now, according to the dump wadutil produces, the first value for mpatch should be the following:

  • originx: 0
  • originy: 112
  • patch: 61
  • stepdir: 1
  • colormap: 0

But the values that the debugger show are completely different:

  • originx: 0
  • originy: 0
  • patch: 256
  • stepdir: 128
  • colormap: 0

The conclusion is simple, the data is read incorrectly. The way the engine reads the data is plain simple, it allocates a big chunk of memory and copies the contents of the WAD into that chunk of memory. Then the chunk of memory is casted to the format of the data when needed.

The fix come to be lame: The structure that contains the info for the texture has a pointer.

typedef struct
{
    char		name[8];
    boolean		masked;	
    short		width;
    short		height;
    void		**columndirectory;	// OBSOLETE
    short		patchcount;
    mappatch_t	patches[1];
} maptexture_t;

It seems lame, but is a delicate matter. Doom was built for 32 bit machines, and the pointers are four bytes long in that architecture. But I’m building for 64 bit machines and the pointers are eight bytes long! This provokes the variable columndirectory to “steal” the data from the patachcount variable! (yes, that obsolete variable).

When people say C is insecure and not safe, they mean things like this.

The fix is easy, just change the variable type to an integer so its size is again four bytes. I want to note one thing here: any other struct that uses a pointer will contain incorrect data! Bad news are this is very possible and is very likely I will have to fix more errors like this. Good news are now I know this can happen and I can fix it… I guess…

WAD structure

So, the next issue I have has to do with the textures that are loaded from the WAD file, so I think is a good idea to learn about WAD files before continue with this journey and for that I made a little app to dump data from the WAD called wadutil (again, a very original name) and the source code is available in the VROOM GitHub.

In this post I’ll explain the portions of the WAD file that wadutil dumps at the moment of writing this post.

Before jumping to the general structure of the WAD I want to toss first some concept that I think are necessary to understand how they work:

  • WAD Type: There is two types of WAD, internal WADs (also known as IWAD) that contain the data for the game and patch WADs (also known as PWAD) that replace data from the IWADs.
  • Lump: All the data necessary for the engine to run the game is contained in the WAD file. This data is varied, it goes from simple numeric values to the audio and music in the game. All this data is stored in a “lumps” which is just a chunk of bytes stored in the WAD.
  • Directory: It contains a list of data about the all the lumps. Each entry in the directory refers to one lump and tells the name, size and position on the file.

Once this information is clear, let’s jump the the structure of a WAD file:

  1. The header: It only contains the type of WAD, the count of lumps and the position of the directory in the file. It’s size if only twelve bytes.
  2. The lumps: This is the biggest portion of the file and contains all the lumps, one after another. The size is variable according to how much lumps are in the file and the size of each one.
  3. The directory: Is the last element in the WAD file, it contains the data about the lumps one after another.

That’s all. Pretty simple. Kind of.

Currently wadutil only dumps two kind of lumps:

  1. PNAMES: This lump contains a list of names for the textures’ patches. That simple. Each patch instead of including its own name, includes an integer that points its name in the PNAMES lump.
  2. TEXTURE1 and TEXTURE2: This lump contains a list of textures, and each texture contains a list of patches.

First run

I know the engine requires a WAD to work, but still I want to know that I works!

Let’s hit “Run” in Xcode, stuff appears on the terminal output!

vroom first run output

Oh, it crashes with a “EXC_BAD_ACCESS”, which means is playing with memory that is not allocated. The error is located in file i_video.c, at line 167 when calling XShmDetach():

void I_ShutdownGraphics(void)
{
  // Detach from X server
  if (!XShmDetach(X_display, &X_shminfo))
	    I_Error("XShmDetach() failed in I_ShutdownGraphics()");

  // Release shared memory.
  shmdt(X_shminfo.shmaddr);
  shmctl(X_shminfo.shmid, IPC_RMID, 0);

  // Paranoia.
  image->data = NULL;
}

With the debugger I learn that X_display is set to NULL. After a long search in X11’s documentation and code I can’t find what’s wrong. Something tells me that the crash is caused because X_display is null, so to test my theory I initialize X_display with just before the call to XShmDetach():

X_display = XOpenDisplay(0);

if (!XShmDetach(X_display, &X_shminfo))
...

And now it crashes in the last line of the function (image is also null). So the fast fix here is to check if X_display is null, then there is no graphics to shutdown:

void I_ShutdownGraphics(void)
{
  if (!X_display) return;

  // Detach from X server
  if (!XShmDetach(X_display, &X_shminfo))
	    I_Error("XShmDetach() failed in I_ShutdownGraphics()");

  // Release shared memory.
  shmdt(X_shminfo.shmaddr);
  shmctl(X_shminfo.shmid, IPC_RMID, 0);

  // Paranoia.
  image->data = NULL;
}

Hit “Run” and finally everything executes just as expected:

vroom success output

 

As a side note, that function still has a bug, it was called by I_Error and it calls I_Error again, I’m not sure in what case should XShmDetach fail, but it is possible to cause an infinite loop.