If you haven't heard of binaural recording, put on a good pair of stereo headphones and watch this video (or rather, listen to it). Pretty striking difference in terms of immersion from the audio we're used to hearing in games and movies. The idea behind it is to record using two microphones, one placed at each ear (perhaps even in ear-shaped molds to mimic precisely how sound travels through the ear canal), and when playing back the sound, play the left ear recording through the left stereo channel and the right ear recording through the right channel.
Part of the reason binaural recordings sound so much more realistic than standard stereo is that they capture the slight difference in the time the sound takes to reach each ear. For example, if a sound is coming from your right, it probably won't sound much louder in your right ear, but it will arrive at your right ear a tiny bit sooner. This slight difference is one of the main cues your brain uses to determine where the sound is coming from.
I decided to take a stab at implementing this effect in software. (I wonder if there will ever be "programmable audio shaders".) I started by playing a (looping) mono sound. I'll call the array of samples input, and the streams of samples I ultimately send to the sound card will be called left and right. If t is the current time, measured in samples, then the regular way to play the sound would be
left[t] = right[t] = input[t]
In order to get the binaural effect, we want to offset each output channel based on the distance that "ear" is from the source. This is because each ear is hearing the sound slightly "in the past". So how much do we offset by? To figure that out, we need to know a few things: the position of the source, the position of each ear, and the speed of sound, which is about 340.29 meters/sec at sea level. So to figure out how long the sound took to travel from the source to the ear, we compute
offset_seconds = distance(ear_position, source_position) / 340.29
To convert this into number of samples, we just multiply by the sample rate:
offset_samples = offset_seconds * samples_per_second
So our code would look something like
left_offset = (distance(left_ear_position, source_position) / 340.29) * samples_per_second
right_offset = (distance(right_ear_position, source_position) / 340.29) * samples_per_second
left[t] = input[t - left_offset]
right[t] = input[t - right_offset]
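As a minimal runnable sketch of the pseudocode above (not the actual implementation), here is one way it could look in Python. The function names `offset_in_samples` and `render`, the rounding to the nearest whole sample, and the modulo wrap for the looping input are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 340.29  # meters per second, the value used in the text


def offset_in_samples(ear_position, source_position, samples_per_second):
    """Travel time from source to ear, expressed as a whole number of samples."""
    offset_seconds = math.dist(ear_position, source_position) / SPEED_OF_SOUND
    # Assumption: round to the nearest sample, i.e. no interpolation at all.
    return round(offset_seconds * samples_per_second)


def render(input_samples, t, left_ear, right_ear, source, samples_per_second):
    """Return the (left, right) output pair for output time t (in samples)."""
    n = len(input_samples)
    left_offset = offset_in_samples(left_ear, source, samples_per_second)
    right_offset = offset_in_samples(right_ear, source, samples_per_second)
    # The input is a looping sound, so wrap the delayed index around the buffer.
    left = input_samples[(t - left_offset) % n]
    right = input_samples[(t - right_offset) % n]
    return left, right
```

Calling `render` once per output sample, with the ear and source positions updated as the listener moves, gives the two streams to send to the sound card.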
Example: if the source is directly to your right and the ears are 15cm apart, left_offset would be 0.15 / 340.29 = 0.00044 seconds larger than right_offset. At 44.1kHz, left_offset is behind right_offset by about 19.4 samples.
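The arithmetic in that example can be checked with a few lines of Python. The 2D coordinates and the 10 meter source distance are made up for illustration; any source on the line through both ears (and outside the head) gives the same difference.

```python
import math

SPEED_OF_SOUND = 340.29      # meters per second, as in the text
SAMPLES_PER_SECOND = 44100   # 44.1kHz
EAR_SPACING = 0.15           # meters between the ears

# Hypothetical layout: ears on the x axis, source 10 m directly to the right.
left_ear = (-EAR_SPACING / 2, 0.0)
right_ear = (EAR_SPACING / 2, 0.0)
source = (10.0, 0.0)

# The left ear is farther away, so its delay is the larger of the two.
extra_distance = math.dist(left_ear, source) - math.dist(right_ear, source)
delay_difference_seconds = extra_distance / SPEED_OF_SOUND
delay_difference_samples = delay_difference_seconds * SAMPLES_PER_SECOND

print(delay_difference_seconds)  # roughly 0.00044
print(delay_difference_samples)  # roughly 19.4
```

A source directly in front is the same distance from both ears, so the difference there is zero.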
Of course, there are a lot of other details I left out (which I don't know much about) such as interpolation between samples and interpolation of distances, but this is the basic idea. And as a bonus, you get the doppler effect for free, since as you move around, the offsets are changing, causing pitch shifts.
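Since the offsets are almost never whole numbers of samples, some kind of interpolation between samples is needed. The simplest option is linear interpolation between the two neighboring samples. The sketch below shows that idea only; `read_delayed` is a hypothetical helper, and it does not cover interpolating distances over time.

```python
import math


def read_delayed(input_samples, t, offset):
    """Read a looping buffer at the fractional position t - offset."""
    n = len(input_samples)
    position = (t - offset) % n        # wrap into the buffer, may be fractional
    index = math.floor(position)
    fraction = position - index        # how far we are toward the next sample
    a = input_samples[index % n]       # the extra % n guards against rounding up to n
    b = input_samples[(index + 1) % n]
    # Blend the two neighbors in proportion to the fractional part.
    return a + (b - a) * fraction
```

For example, `read_delayed([0.0, 10.0, 20.0, 30.0], 2, 0.5)` reads halfway between the samples at indices 1 and 2. Because the offset changes smoothly as the listener moves, the read position speeds up or slows down relative to t, which is where the pitch shift comes from.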
Here's my implementation. Use WASD to move and the mouse to look around (quit with Alt-F4, since the mouse sticks in the center). I put in basic linear interpolation for samples and distance changes, which sounds absolutely awful, but it's a lot better than no interpolation.
It looks like there's been some research on this in the last few years. This video is pretty impressive... sounding!