Sunday, March 9, 2008

Explaining garbage collection

Imagine you have a lot of garbage in your house. You have so much garbage that there's no room to put any new stuff. This isn't garbage that you've thrown in the garbage can, and it's a little unclear what's garbage and what's not.

See, you've built a bunch of Rube Goldberg contraptions in your house, and some machines share parts. Your washing machine might use a pencil somewhere to write down how much longer it has until it's done, and your drier might use the same pencil on a different pad of paper. Even if you decide you don't want the washing machine, you can't just throw out the pencil (though you can throw out the washer's pad of paper.) So how do you determine what you can safely throw away?

It'd be really tedious to do this yourself, so you bring out your handy dandy garbage collection robot. You give the robot a list of Rube Goldberg machines that you use, and the robot will look at what uses what. It'll throw out everything that is unused once it figures everything out.

One strategy the robot could use is to write down a list of all of the objects in your house. The robot will then go down the list of machines that you use and put a mark next to each one you say you use. Until everything's been visited that has a mark, the robot then goes to each marked object and marks everything that it uses. When it's done, it can go back to all of the unmarked objects and thrown them out. This is known as "mark and sweep" garbage collection.

Another strategy is, when there's no more room, to build a new house, and then put in it a copy everything that you use. Then, you look at everything that those Rube Goldberg machines use and make a copy of them, for use in the new house. This doesn't have to be so inefficient, since you can actually reuse the first house again the next time garbage collection happens. This is known as "copying" garbage collection.

The last strategy is to just tell the robot, when you're building your machines, what uses what, and when you start and stop using things. The robot will keep a count of how many things use what. If only one machine is using something, or if you're using it directly and nothing else is, and then that usage stops, then you tell the robot and it throws out that thing. When it throws that piece of garbage out, it has to remember that each thing that that object uses is now used by one fewer thing. This is called "reference counting", because you count how many things refer to a particular object.

Now imagine all of this in a computer. Computer programs often amount to Rube Goldberg machines on a much larger scale. Different pieces of RAM refer to other pieces of RAM, and it's hard for the programmer to tell what she can safely throw out. By using a garbage collector, this can all be handled automatically.

No comments: