Life, Programming, Everything

When I find something interesting and new, I post it here - that's mostly programming, of course, not everything.

Tuesday, October 04, 2016

Union of Functors and a Functor on Union

Union of Functors and a Functor on Union

Intro

On 10/3/2016 Paul Phillips posted an interesting question, https://twitter.com/extempore2/status/783161511724273664?cn=cmVwbHk%3D&refsrc=email - how are \/[F[A], G[A]] and F[A\/B]related. Here are my musings on this. In 2011 I already wrote this: http://vpatryshev.blogspot.com/2011/05/monad-genome-part-1.html - pretty much close to the topic; but it's better to just go into details.

Basics

Let's fix some category C; assume that it has all the necessary limits or colimits.

Union is a special case of a coproduct; it's a functor CCC that is right adjoint to diagonal Δ:C → CC. The diagonal maps an x to (x,x)

This diagonal actually can be defined like this: given 1+11, where 1 is a unit, a terminal category, its exponential, C1+1C1, is the same diagonal Δ.

Union of Two Endofunctors

An endofunctor on category C is a point in CC; and a union of two endofunctors has the following signature: CCCC → Cwhich is the same as (CC)C → CC, or C(1+1C → CC

In Scala its type can be written as (Bool => C => C) => C => C.  Here Bool is a cheap implementation of 1+1.

Since the union is calculated pointwise in this case, this functor, C(1+1C → CC, is a right adjoint to the diagonal  C→ C(1+1C.

Endofunctor following a Union

For the case where we first build a union and then apply a functor, we need three components: two objects and a functor. So we have CCCC → CC - here we map a triple (A,B,F) to F[A\/B].

Rewriting it, we have C1+1+C → CC- this is the signature of our operation. In other words, we have 
((Bool \/ C) => C) => C => C.


Tuesday, June 09, 2015

Name

Yesterday I encountered a bug.

We have a software that runs a browser via Selenium, scraping certain websites; we have a neat library for extracting this or that info etc.

Like, I find an element by selector, such that the element's content matches a regex, and the like. Or I find the whole array of elements satisfying certain predicates.

On the server side I can build predicates, using logical connectives.

The results of search are available, but how? We have a piece of code that for each element creates a unique css selector; using these selector I can access such elements many times, check if they are visible etc.

So far so good. Except that we are in the XXI century, in the age of AJAX and Web 2.0 (or is it 2.1 already?), and the page is changing all the time. So the path just does not work in these dynamic circumstances. It does not identify the element, it describes how we got there some time ago.

It's like saying: "to get there, walk past two houses, turn left and then turn right when you see a crow on a fence, and then walk to the house with the cat in the window". Unlike in movies, in real life not every quest ends in a success.

Of course one would ask: you found the element, fine, you have it now; use it. But my problem here is, we found the DOM element in JavaScript, now what? We control all this from the server, so we need to identify, across the process boundaries, what element we are talking about. One cheap solution would be using "our last found one", but this does not work well for collections.

Also, when the page reloads, all these things just disappear; the name/identifier does not identify anything; the path leads to some other place. It's like if your computer is suddenly replaced while you were editing your blog entry.

And in general, if there is some storage, and there is a client, the client should somehow be able to identify the resource from outside. So the naming should be a little bit more global.

Actually, a similar thing happens in directories in any windows/unix/macos: we have paths, not pointers to resources. A directory is just a namespace containing names pointing to resources (and other namespaces). Some of the names may be soft links, that is, aliases pointing into other namespaces to resources that may not even exist.

Hard links, on the other hand, point directly to existing resources. While the path(s) may change, the hard link still points to the same thing.

So, what I want to say, we have to disambiguate paths and names, or, rather, paths and identifiers. A path may lead nowhere; an identifier, at least in theory, references an existing entity. It would be great if an entity could hold its own identifier; in the databases there is a habit of having it... which is related to ORM, since once an artifact has an identifier, it is thought of as an object.

E.g. we have Organization and Person, each has an id; how about the relationship, Person -> Organization, should every record of such a relationship have an identifier? Opinions vary. From the theoretic (relational-theoretic, set-theoretic) point of view, this is just stupid; see why.

If we have a function f, how do we represent it in both set theory and a relational database? We provide a graph, a set (a table) consisting of pairs (x,f(x)), for each x. These pairs, points of the function graph, are not objects in the usual casual sense. In some theories they are, but mostly they are not. So why do we need to give identities in other cases?

There is one aspect of it that explains why. Imagine we have some kind of AI, and its knowledge base consists of such relations. Then each piece of knowledge is an entity; we can combine them, and build bigger chunks of knowledge out of it. This activity is similar to building relationships out of relationships, if we look at it from a mathematical point of view, but even so, how do we refer elements of a relationship? In this case they become entities and need an identification.

We can get an impression that if we give global identifiers to each imaginable entity, we are done and safe. I doubt that it is even possible. There's something like the Axiom of Choice that tells us it is possible, and there are examples where the Axiom of Choice does not hold (it only holds in theories that include the Axiom of Choice).

Here's an example, half anecdotal.

There are two kinds of seagulls in England, let's call them kind A and kind B. Since seagulls fly all around, they meet their relatives in Ireland and in Iceland and in Greenland etc. So in Ireland and in Iceland and in Greenland there are also two kinds of seagulls, A and B. They are not exactly the same as those living in London, but kind of close; both kinds slightly change with longitude. And as we fly around the globe, we see these seagulls, two kinds of them, slightly changing, but keeping pretty close similarity with their neighbors to the East and to the West; this function is continuous.

So when we fly around, get to Alaska, and then Chukotka, and then Siberia, then Kola peninsula, Norway... still two kinds, A and B. And then those gulls that live in, say, Bergen, they are also of two kinds, A and B, and they look (and are) pretty close to the kinds in London.

Except that the Bergen A is the close relative to the London B, and the Bergen B is the close relative to the London A.

What it says in mathematical terms, there is No Global Section. The example is from an old paper by Andre Scedrov. This is actually about topology, and the Axiom of Choice, and this is the example where AC does not hold even for the choice of 2.

Summary: global identification is not always possible.

We can also try paths. You stand in a circle. The person on the left of you has someone on the left; that person is also on the left of you, right? Go ahead along this path, and you'll discover that the person on your right is on your left. And you are, too. That's the problem with paths.

Sunday, June 10, 2012

Cake Pattern Practices

Intro

Cake Pattern is a Scala-specific solution to Dependency Injection problem. In Java, from where Scala has inherited a lot, DI problem is solved by either passing environment as a parameter (Env monad), or using specific ad-hoc tools, like Guice. Scala is a much more powerful language, and Cake Pattern is available here. In my opinion, Cake Pattern is the best. It is similar to passing parameters, but differently. The first publication was by Jonas Bonér in 2008, Real-World Scala: Dependency Injection (DI) I personally found the article kind of hard to read; hope this one is easier. The pattern itself is easy.

Real World Example

We have an application which depends on configuration and credential files, and talks to a server via some proprietary API. We want to work on this application. By working I mean freely refactoring it from its initial prototype state to a well-written piece of code, enriched with logging, statistics, status reporting, etc., and keep it alive, that is, modifiable. One cannot seriously do it using real world connections; tests take minutes to run, connections are brittle etc. So we want to be able to run logic without the need to connect to anything. And this includes also System.out output and System.exit() calls. How do we do it? Let's start with the "real world" part.

Abstracting The File System

The full source code can be found on github, here is a small piece.
import io.{Source => ioS}
import java.io.{PrintWriter, File}

trait FS {

  implicit def file(path: String)                       = new File(path)
  implicit def asFile(file: TextFile)                   = file.file
  implicit def textFile(file: File):       TextFile     = TextFile(file)
  implicit def textFile(path: String):     TextFile     = TextFile(file(path))
  def tempFile(prefix: String)                          = TextFile(File.createTempFile(prefix, "tmp"))

  def exists(file: File): Boolean = file.exists

  protected class Entry(val file: File) {
    lazy val canonicalFile = file.getCanonicalFile.getAbsoluteFile
    def parent = Folder(canonicalFile.getParentFile)
    def delete = file.delete
    override def toString = canonicalFile.toString
  }

  case class Folder(path: File) extends Entry(path) {
    def file(path: String): TextFile = TextFile((canonicalFile /: (path split "/")) (new File(_, _)))

  class TextFile(file: File) extends Entry(file) {
    def text = ioS.fromFile(file).mkString

    def text_=(s: String) {
      val out = new PrintWriter(file, "UTF-8")
      try{ out.print(s) } finally{ out.close }
    }
  }
}

object FS extends FS
To use it in your application, instead of "manually" finding/opening/closing files, one just has to write
import FS._
somewhere in the client code.

Mocking the File System

To avoid creating multiple files just to make sure the code works in various circumstances, we better mock the file system. Here's an example:
val mockFS = new FS {
    class MockFile(path: String, content: String) extends TextFile(new File(path)) {
      override def text = content
    }
    val files = new scala.collection.mutable.HashMap[String, MockFile]
    def add(file: MockFile) { files.put(file.getPath, file)}
    def file(pair: (String, String)) { add(new MockFile(pair._1, pair._2)) }

    file("badcredo.txt"           -> "once upon a midnight dreary while I pondered weak and weary")
    file("credoNoUser.txt"        -> """
    password = "Guess me if you can!"
    realm = fb
    """)
    file("credoBadUser.txt"        -> """
    password = "Guess me if you can!"
    realm = fb
    username=
    """)
    file("credoBadPassword.txt"        -> """
    password =
    realm = fb
    username=badpwd
    """)

    override def exists(file: File) = true // egg or chicken problem
    override def textFile(path: String) = files.getOrElse(path, throw new FileNotFoundException("what is this " + path + "?"))
  }
We can throw in tons of such cases, covering all the situations, which is less probable if you have to deal with dozens of actual files. Now good, but how can we replace a file system with this mock file system? This is how.

Eating The Cake

trait MyApp {
  val FileSystem
  /*...*/
  def login(cred: String) {
    val (userid, password, realm) = parse(FileSystem.textFile(cred).text)
    /*...*/
}

object MyApp {
  val FileSystem = FS
}
Here
MyApp
is a trait, with an accompanying object. In production circumstances we use the accompanying object, which instantiates FileSystem as FS, the object representing a regular file system. In the trait, FileSystem is represented as an abstract value; so if we extend MyApp in our test suite, we can pass our test file system:
val SUT extends MyApp {
  val FileSystem = mockFS
}
Now we can manipulate the app to our pleasure.

Intercepting Everything

In addition, in our test suite we can define this:
  trait Testable {
    val console = new StringWriter
    val out = new PrintWriter(console)
    def printed = console.toString
    class Exit(code: Int) extends Exception(code)
    def print(s: String) = out.print(s)
    def println(s: String) = out.println(s)
    def exit(code: Int) = throw new Exit(code);
And then write
val SUT extends MyApp with Testable {
  val FileSystem = mockFS
}
Provided that we did define the default implementations of println, exit etc, we are now in control of the application.

Rinse, Repeat

But what if MyApp uses another piece of code that talks to an external service? We have to do some instrumentation in that other piece of code, using the same cake pattern; the default implementation would be coming from the accompanying object, but in the case of test suite we can instantiate with something completely different, and pass it along to MyApp.

Questions?

Sunday, March 18, 2012

Miles Sabin's type negation in practice

There was a discussion recently on scala-lang regarding how to ensure that your function returns a non-null value. Here's a solution from Miles Sabin, with my comments.

the source


// Encoding for "A is not a subtype of B"
trait <:!<[A, B]

// Uses ambiguity to rule out the cases we're trying to exclude
implicit def nsub[A, B] : A <:!< B = null
implicit def nsubAmbig1[A, B >: A] : A <:!< B = null
implicit def nsubAmbig2[A, B >: A] : A <:!< B = null

// Type alias for context bound
type |¬|[T] = {
type λ[U] = U <:!< T
}


def foo[T, R : |¬|[Unit]#λ](t : T)(f : T => R) = f(t)

foo(23)(_ + 1) // OK
foo(23)(println) // Doesn't compile


So, how it works.

First we define the trait <:!< which looks like a negation of <:< - in Scala this notation means that one type can be, via implicits or whatever, converted into another.
Note that when you declare a type, and the type depends on two parameters, you can write the type in the infix form, e.g. String Map Int; in our case, instead of val x: <:!<[A, B] we can write val x: A <:!< B

E.g.

scala> def f(m: String Map Int) = Map("one" -> 1)
f: (m: Map[java.lang.String,Int])scala.collection.immutable.Map[java.lang.String,Int]


Why do we need a trait that does not specify any functionality? See below.

We define three implicits. The first one, implicit def nsub[A, B] : A <:!< B = null, is applicable in a case of any two types A and B; if we had just this one, the compiler would be never confused. Now we add two more to confuse the cat the compiler:

implicit def nsubAmbig1[A, B >: A] : A <:!< B = null
implicit def nsubAmbig2[A, B >: A] : A <:!< B = null


Their role is that whenever we encounter a context where <:!<[A, B] is required, and B is a supertype of A, the cat the compiler gets confused. It does not get confused if there is no such relationship between the two types. Namely, we get this error:

:27: error: ambiguous implicit values:
both method nsubAmbig1 in object $iw of type [A, B >: A]=> <:!<[A,B]
and method nsubAmbig2 in object $iw of type [A, B >: A]=> <:!<[A,B]
match expected type <:!<[Unit,Unit]


So, what do we do to use this feature? We declare a type function (seems like a new term) that is negation type for type T: whenever we apply this function to type U, it only allows to compile if U is not a subtype of T.

type |¬|[T] = {
type λ[U] = U <:!< T
}


This is a kind of a type trap; let's see how we use it.

Declare a function foo that takes first parameter of type T, and second parameter a function from T to R:


def foo[T, R](t : T)(f : T => R) = f(t)


We can call it as val n24 = foo(23)(_ + 1) or as foo(23)(println). Now how do we make sure that it does not take a function that returns Unit? We have to delimit the second type parameter, R, so that it's never a Unit or a subclass of Unit. This is how we do it:

def foo[T, R : |¬|[Unit]#λ](t : T)(f : T => R) = f(t)


Now the second example, foo(23)(println), won't compile. Ta-da!

Questions?

Wednesday, August 24, 2011

Either OOP or Liskov (choose one)

A while ago I read some Abadi-Cardelli, and illustrated what I learned by an example how Java does not have subsumptions principle (that is, if A is a subtype of B, an instance of A can be safely substituted everywhere where an instance of A is required).

So I had posted a blog entry with the example.

There's another post on this on blogger.

A deeper look into the issue can be found in "Is the Java Type System Sound? by Sophia Drossopoulou and Susan Eisenbach.
The paper shows that no, it is not, and also builds a sublanguage of Java where subsumption holds.

From this article I've got the following demonstration sample:


class A {}
class A1 extends A {}
class B {
char f(A x) { System.out.println("f(A." + x + ")"); return 'f'; }
int f(A1 y) { System.out.println("f(A1." + y + ")"); return 12345; }
void g(A1 z) { System.out.println("Called g"); System.out.println(f(z)); System.out.println("out of g"); }
void h(A u, A1 v) { System.out.println("Called h"); System.out.println(f(u)); System.out.println(f(v)); System.out.println("out of h"); }
}

@Test
public void testJavaSubsumption() {
new B().g(new A1());
new B().h(new A1(), new A1());
}


The output is here:

Called g
f(A1.pigi.examples.library.tests.QueriesTest$A1@6526804e)
12345
out of g
Called h
f(A.pigi.examples.library.tests.QueriesTest$A1@42b1b4c3)
f
f(A1.pigi.examples.library.tests.QueriesTest$A1@20d2906a)
12345
out of h


I wonder though how long will the Earth population hold to the popular belief that one can have both OOP, with reasonable enough "dispatch", and substitution principle. One cannot.

Questions?

P.S. Actually, the issue was discussed in details By Oleg Kiselyov a while ago: http://okmij.org/ftp/Computation/Subtyping/.

Saturday, July 30, 2011

Generic Data Model

Generic Data Model



Introduction



15 Years ago I had a problem designing a database that was supposed to hold data from various questionnaires; the questions were slightly varying every time and had no
regular structure, reflecting the weird ways of thinking of marketing people. The solution was to produce something more or less generic, storing “entities” and “attributes” separately.

(there was some discussion of xml and corba, now totally irrelevant)

Tables



The model consists of three tables: ENTITY, ATTRIBUTE, CONTEXT.


ENTITY Table



This is the main table in the database:

create table ENTITY (PK long, CLASS varchar, CREATED timestamp);

All the entities that we are going to have stored in our database have a representative in ENTITY table. An entity represents an object of our model. An object has an inalienable property, class. The term ‘type’ would probably suit better. An entity cannot change its class; actually, all the fields in entity record are read-only.

ATTRIBUTE Table



This table contains attributes of the entities. Records consist of name-value pairs and reference ENTITY table:

create table ATTRIBUTE (ENTITY long, NAME varchar, VALUE varchar);

The usage of this table is probably obvious. Note that if NAME is null, it means that the attribute contains the value of the entity itself. Two other fields are non-null.

CONTEXT Table



On one hand, this table can be considered as containing all the collections of the database; but on the other hand, the purpose of this table is wider. It contains all the naming contexts, and it can be used for navigation as well as for storing master-detail relationships.

create table CONTEXT (NAME varchar, OWNER long, MEMBER long, MEMBERID varchar);

Here NAME is the name of the context. If you are interested in collections, it is probably the name of collection owned by OWNER. OWNER is obviously the entity that owns the context, the “master”. MEMBER points to the member of the collection; in this case MEMBERID is irrelevant. But we can consider this as a Hash, in which case MEMBERID becomes a key, and MEMBER is the corresponding value. In the Naming Service paradigm, MEMBERID is the name of the entity MEMBER in the context NAME.

Conclusion



These days the idea of key-value pairs has become ubiquitous; it was not so in 1996 or in 2001 when I wrote this. So I'm dropping the argument in favor of this, now obvious, solution.

Friday, July 15, 2011

Chain Calls in Java - a better solution

A while ago I wrote "Java Solutions: inheriting chain calls"; it was about what to do if we want to make it possible to write calls like
return new StringBuffer().append("Now is ").append(new Date()).append(", and ticking...").toString();


and be able to subclass at the same time.

See, the problem is that if you setter returns this:

  MyBaseEntity withExcitingNewFeature(ExcitingNewFeature f) {
    features.add(f);
    return this;
  }


and then you subclass MyBaseEntity, like MyDerivedEntity extends MyBaseClass, you won't be able to write
  MyDerivedEntity mde = new MyDerivedEntity(entityId).withExcitingNewFeature(feature);


Since, well, the class is different.

So in my previous post I suggested to make MyBaseEntity type-dependent (that is, have a generic):
 public abstract class MyBaseEntity<T extends MyBaseEntity> 


and declare a method
   abstract T self();


So that every subclass declaration would look like this:
  public class MyDerivedEntity extends MyBaseEntity<MyDerivedEntity>


and would define

  MyDerivedEntity self() {
    return this;
  }


Yes, it works. And probably for 2008, with is now ancient history, when slow people slowly typed their slow code, not caring about the speed of development... in short, it was long ago.

Now there's a better solution. No need to redefine self() in all possible subclasses. That was just stupid. The only place where it should be defined is the base class:

  T self() {
    return (T) this;
  }


And no need to declare MyBaseEntity abstract.

There's also a bug in that old post; the generic type should have been bounded, as Ran noted in his comment.

Of course all setters should look like this:
  <T extends MyBaseEntity> T withExcitingNewFeature(ExcitingNewFeature f) {
    features.add(f);
    return self();
  }

Followers

Subscribe To My Podcast

whos.amung.us