When I find something interesting and new, I post it here - that's mostly programming, of course, not everything.

Sunday, March 15, 2009

Cursor vs Flyweight

It is a pretty frequent problem: you need a million instances (ok, a billion if you are on a larger machine), and all the data are available, but you are not sure about overloading your Garbage Collector (I'm talking about Java, but in C you also have to deal with pesky memory issues... I guess). Will the GC handle a million (or a billion) of records that just come and go?

A known Flyweight Design Pattern, in its Java incarnation, suggests to do this: create an instance and cache it in a WeakReferenceMap, as if the weakness of the reference saves any memory at any given moment. No; we still have to create all those instances:

for (Record r : allMyRecords) {
// do something
// forget about record r
}


A typical implementation would be like this:

RecordSet allMyRecords;


where

class RecordSet implements Iterable<Record> {
...
Iterator<Record> iterator() {
...
Record next() {
// get the record from somewhere and return it:
Record theRecord = new Record(...);
return theRecord;
}
...
}
...
}


Say, we assume that all the data are actually in memory, but have a different form, so that for a record class like

interface Record {
long id();
String firstName();
String lastName();
String phone();
String email();
}


we actually have some kind of storage that keeps the data:

Map<Long, String> names;
Map<Long, String> phones;
Map<Long, String> emails;


If we implement the class Record so that it stores first/last name, phone and email, we duplicate a lot of data; so maybe it would be enough to store just something, just an id, and retrieve the rest on demand. For this, we will have a more-or-less lightweight implementation:

class LightRecord implements Record {
long id;
String phone() { return phones.get(id);}
String email() { return emails.get(id);}
String firstName() { return names.get(id).split(" ")[0];} // not very smart
String firstName() { return names.get(id).split(" ")[1];} // not very smart either
}


So we will store at least million (a billion) longs, and probably more. Wasting our memory, even if GC takes care of it. On a cellphone you probably should not count on gc.

So what can we do? Here's a solution that does not take any extra memory. It's like a cursor. It points to an imaginary record. Of course this record changes all the time, so if you want to really save an instance, do it via copy constructor. Otherwise, use it and forget it:


class RecordSet implements Iterable<Record> {
long currentId;
Record currentRecord = new Record() {
String phone() { return phones.get(currentId);}
String email() { return emails.get(currentId);}
String firstName() { return names.get(currentId).split(" ")[0];}
String firstName() { return names.get(currentId).split(" ")[1];}
}
...
Iterator<Record> iterator() {
...
Record next() {
currentId = ...;
return currentRecord;
}
...
}
...
}


So, the only time a constructor is called is when the iterator is created. Neat, eh?

The solution is not new. It is probably 50 years old. It is just properly rephrased in Java.

Followers

Subscribe To My Podcast

whos.amung.us