I've finally dug up some information about how Alcatel-Lucent's "Enhanced Reality" solutions (which we previously discussed here and here) work, and the secret is video calls. When a user "scans" an image (in the case above, a movie poster), she is actually placing a video call to Alcatel-Lucent and streaming her camera input to its servers. Alcatel-Lucent's servers try to match the incoming image against a set of known "augmented" images, and then send back information, a ringtone, or, as in the case above, a video stream.
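Alcatel-Lucent hasn't said how its servers do the matching, so the following is only a minimal sketch of what that server-side step could look like. It swaps in a simple, well-known technique, an "average hash" (aHash) fingerprint compared by Hamming distance. It also assumes the video call has already been decoded into grayscale frames, represented as lists of 0-255 values. The catalog, the `AugmentedImage` type, the `max_distance` threshold and the `video://trailer` response are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical frame format: a grayscale image as rows of 0-255 ints,
# assumed to be at least HASH_SIZE pixels in each dimension.
Image = list[list[int]]

HASH_SIZE = 8  # hypothetical: an 8x8 grid gives a 64-bit fingerprint


def average_hash(img: Image, size: int = HASH_SIZE) -> int:
    """Block-average the image down to size x size cells, then emit one bit
    per cell: 1 if the cell is brighter than the mean of all cells."""
    h, w = len(img), len(img[0])
    cells = []
    for by in range(size):
        for bx in range(size):
            y0, y1 = by * h // size, (by + 1) * h // size
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


@dataclass
class AugmentedImage:
    name: str
    fingerprint: int
    response: str  # hypothetical: what to send back (video stream, ringtone, info)


def match_frame(frame: Image, catalog: list[AugmentedImage],
                max_distance: int = 10) -> Optional[AugmentedImage]:
    """Return the closest known image, or None if nothing is close enough."""
    fp = average_hash(frame)
    best = min(catalog, key=lambda item: hamming(fp, item.fingerprint), default=None)
    if best is None or hamming(fp, best.fingerprint) > max_distance:
        return None
    return best


if __name__ == "__main__":
    # A synthetic "poster": left-to-right gradient on top, reversed below.
    poster = [[3 * x if y < 32 else 190 - 3 * x for x in range(64)] for y in range(64)]
    catalog = [AugmentedImage("movie-poster", average_hash(poster), "video://trailer")]
    # Simulated camera frame: the same poster under slightly brighter lighting.
    frame = [[p + 10 for p in row] for row in poster]
    hit = match_frame(frame, catalog)
    print(hit.response if hit else "no match")
```

Because the hash only records whether each cell is brighter than the average, a uniform change in lighting leaves the fingerprint unchanged. That is why the brightened frame above still matches the stored poster. A production system would presumably need something far more robust to perspective, blur and partial occlusion, which is part of why this approach leans so heavily on server-side processing.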
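Obviously, this technique requires a fast, low-latency network connection (especially for the "textured effect" seen in the embedded clip), since the camera feed has to travel up to the servers and the response has to stream back in real time. It may also be difficult to scale, because every scan ties up a live video session plus server-side image matching. Still, at least in lab conditions, it looks wonderful.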