When I find something interesting and new, I post it here - that's mostly programming, of course, not everything.

Tuesday, June 09, 2015


Yesterday I encountered a bug.

We have software that runs a browser via Selenium and scrapes certain websites; we also have a neat library for extracting this or that piece of information.

For example, I can find an element by selector such that the element's content matches a regex, and so on. Or I can find the whole array of elements satisfying certain predicates.

On the server side I can build predicates, using logical connectives.

The search results are available, but how? We have a piece of code that creates a unique CSS selector for each element; using these selectors I can access the elements many times, check whether they are visible, etc.

So far so good. Except that we are in the 21st century, in the age of AJAX and Web 2.0 (or is it 2.1 already?), and the page is changing all the time. A generated selector is really a path through the page, and a path just does not work in these dynamic circumstances. It does not identify the element; it describes how we got there some time ago.
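To make the difference between a path and the element itself concrete, here is a minimal sketch. It is not our actual library: the `Node` class, `path_to` and `follow` are made up for illustration, and a list of child indexes stands in for a generated CSS selector.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # eq=False: nodes compare by identity, like DOM elements
class Node:
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def path_to(root: Node, target: Node) -> List[int]:
    """Return the list of child indexes leading from root to target."""
    if root is target:
        return []
    for i, child in enumerate(root.children):
        try:
            return [i] + path_to(child, target)
        except LookupError:
            continue
    raise LookupError("target is not in this tree")


def follow(root: Node, path: List[int]) -> Node:
    """Walk the recorded directions; may fail or land on some other node."""
    node = root
    for i in path:
        node = node.children[i]
    return node


price = Node("span", "$42")
page = Node("body", children=[Node("div", children=[Node("span", "Title"), price])])

recorded = path_to(page, price)          # directions taken at search time
assert follow(page, recorded) is price   # works while the page stands still

# "AJAX" happens: a banner gets inserted at the top of the page.
page.children.insert(0, Node("div", children=[Node("span", "Ad"), Node("span", "Buy now")]))

stale = follow(page, recorded)           # same directions, different house
print(stale.text, stale is price)        # prints: Buy now False
```

The recorded path still leads somewhere, which is arguably worse than an error: we silently get the wrong element.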

It's like saying: "to get there, walk past two houses, turn left and then turn right when you see a crow on a fence, and then walk to the house with the cat in the window". Unlike in the movies, in real life not every quest ends in success.

Of course one would ask: you found the element, fine, you have it now; use it. But my problem here is, we found the DOM element in JavaScript, now what? We control all this from the server, so we need to identify, across process boundaries, which element we are talking about. One cheap solution would be using "our last found one", but this does not work well for collections.

Also, when the page reloads, all these things just disappear; the name/identifier does not identify anything; the path leads to some other place. It's as if your computer were suddenly replaced while you were editing your blog entry.

And in general, if there is some storage, and there is a client, the client should somehow be able to identify the resource from outside. So the naming should be a little bit more global.

Actually, a similar thing happens with directories in any Windows/Unix/macOS system: we have paths, not pointers to resources. A directory is just a namespace containing names pointing to resources (and other namespaces). Some of the names may be soft links, that is, aliases pointing into other namespaces to resources that may not even exist.

Hard links, on the other hand, point directly to existing resources. While the path(s) may change, the hard link still points to the same thing.
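Here is a small sketch of the same distinction, using the standard `os` module. The file names are made up, and it assumes a filesystem that supports both kinds of links (on Windows, creating a symlink may need extra privileges).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "entry.txt")
    hard = os.path.join(d, "hard.txt")
    soft = os.path.join(d, "soft.txt")

    with open(original, "w") as f:
        f.write("blog entry")

    os.link(original, hard)      # a second name for the same resource
    os.symlink(original, soft)   # a name that merely stores a path

    # The path changes: the original name goes away.
    os.rename(original, os.path.join(d, "moved.txt"))

    with open(hard) as f:
        hard_content = f.read()           # still reaches the resource
    soft_resolves = os.path.exists(soft)  # exists() follows the link
    soft_is_link = os.path.islink(soft)   # the alias itself is still there

print(hard_content, soft_resolves, soft_is_link)  # prints: blog entry False True
```

The soft link is still a perfectly good name; it just leads nowhere now.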

So, what I want to say is that we have to distinguish between paths and names, or, rather, between paths and identifiers. A path may lead nowhere; an identifier, at least in theory, references an existing entity. It would be great if an entity could hold its own identifier; in databases there is a habit of having one... which is related to ORM, since once an artifact has an identifier, it is thought of as an object.

E.g. we have Organization and Person, and each has an id; how about the relationship Person -> Organization, should every record of such a relationship have an identifier? Opinions vary. From the theoretical (relational-theoretic, set-theoretic) point of view, this is just stupid; here is why.

If we have a function f, how do we represent it in both set theory and a relational database? We provide a graph, a set (a table) consisting of pairs (x,f(x)), for each x. These pairs, points of the function graph, are not objects in the usual casual sense. In some theories they are, but mostly they are not. So why do we need to give identities in other cases?
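A tiny sketch of this view of a function as a set of pairs. The `works_for` data and the `is_function` helper are made up for illustration; nothing here comes from a real schema.

```python
from typing import Dict, Hashable, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def is_function(graph: Set[Pair]) -> bool:
    """A relation is a function when no x appears with two different values."""
    seen: Dict[Hashable, Hashable] = {}
    for x, y in graph:
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True


# Hypothetical data: Person -> Organization as a bare set of pairs (x, f(x)).
works_for: Set[Pair] = {("alice", "acme"), ("bob", "acme"), ("carol", "initech")}

# Adding the same pair again changes nothing: a pair has no identity of its own,
# it is fully determined by its two components.
same = works_for | {("alice", "acme")}

print(is_function(works_for), same == works_for, len(same))  # prints: True True 3
```

In a table with a surrogate id column, the second ("alice", "acme") row would be a different record; in the set-theoretic picture there is simply nothing to distinguish it by.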

There is one aspect of it that explains why. Imagine we have some kind of AI, and its knowledge base consists of such relations. Then each piece of knowledge is an entity; we can combine them, and build bigger chunks of knowledge out of them. This activity is similar to building relationships out of relationships, if we look at it from a mathematical point of view, but even so, how do we refer to elements of a relationship? In this case they become entities and need identification.

We can get an impression that if we give global identifiers to each imaginable entity, we are done and safe. I doubt that it is even possible. There's something like the Axiom of Choice that tells us it is possible, and there are examples where the Axiom of Choice does not hold (it only holds in theories that include the Axiom of Choice).

Here's an example, half anecdotal.

There are two kinds of seagulls in England, let's call them kind A and kind B. Since seagulls fly all around, they meet their relatives in Ireland and in Iceland and in Greenland etc. So in Ireland and in Iceland and in Greenland there are also two kinds of seagulls, A and B. They are not exactly the same as those living in London, but kind of close; both kinds change slightly with longitude. And as we fly around the globe, we see these seagulls, two kinds of them, slightly changing, but staying pretty similar to their neighbors to the East and to the West; this function is continuous.

So when we fly around, get to Alaska, and then Chukotka, and then Siberia, then the Kola Peninsula, Norway... still two kinds, A and B. And then those gulls that live in, say, Bergen, they are also of two kinds, A and B, and they look (and are) pretty close to the kinds in London.

Except that the Bergen A is a close relative of the London B, and the Bergen B is a close relative of the London A.

What this says, in mathematical terms, is that there is No Global Section. The example is from an old paper by Andre Scedrov. This is actually about topology, and the Axiom of Choice, and this is an example where AC does not hold even for a choice out of 2.
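The seagull story can be played out in a few lines. This is only a discrete sketch under my own assumptions: a handful of stations taken from the route above, local kinds matched to their closest relatives at the next station, and the whole mismatch placed on the last leg, Bergen back to London (in the story it is spread smoothly along the way).

```python
from itertools import product

STATIONS = ["London", "Ireland", "Iceland", "Greenland", "Alaska",
            "Chukotka", "Siberia", "Kola", "Bergen"]
n = len(STATIONS)

# relative[i] maps a local kind at station i to its closest relative
# at the next station (cyclically, so the last entry is Bergen -> London).
same = {"A": "A", "B": "B"}
swap = {"A": "B", "B": "A"}
relative = [same] * (n - 1) + [swap]


def carry_around(kind: str) -> str:
    """Follow closest relatives once around the globe, starting in London."""
    for step in relative:
        kind = step[kind]
    return kind


def global_namings():
    """All ways to pick one local kind per station so that the picks at
    neighbouring stations are closest relatives of each other."""
    return [
        pick for pick in product("AB", repeat=n)
        if all(relative[i][pick[i]] == pick[(i + 1) % n] for i in range(n))
    ]


print(carry_around("A"), global_namings())  # prints: B []
```

Locally everything is fine: at each station there are two kinds, and each has an obvious relative next door. It is only the attempt to pick one kind everywhere at once that fails, for all 512 candidate picks.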

Summary: global identification is not always possible.

We can also try paths. You stand in a circle. The person on your left has someone on their left; that person is also on your left, right? Go ahead along this path, and you'll discover that the person on your right is on your left. And you are, too. That's the problem with paths.
