Using cloud recognition in combination with video drawables

Recently we received a couple of bug reports that video drawables are not visible while their audio track is playing. We asked a number of developers to send us their Architect Worlds in order to find out how our API is used and how this behaviour can occur. After evaluating these Architect Worlds we would like to share our findings here.

There are a couple of things to remember when using video drawables, especially in combination with cloud recognition.


Video drawables in general:

Video drawables start playback as soon as the `play()` function is called, even if their ARObject is not currently in the onEnterFieldOfVision state. This is expected behaviour. If you want the video to play only while it is visible to the user, please use the `onEnterFieldOfVision` and `onExitFieldOfVision` triggers of your `AR.Trackable2DObject` or `AR.GeoObject` to call the `play()`/`resume()` and `pause()` functions. You can find an example implementation in our Video - Playback States example.
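The following sketch shows this pattern. It assumes the older API generation that still exposes `onEnterFieldOfVision`/`onExitFieldOfVision`; the video URL, tracker file and target name are placeholders, and the tracker constructor name may differ depending on your SDK version.

```javascript
// Sketch: start/stop video playback based on the vision triggers.
// "assets/video.mp4", "assets/tracker.wtc" and "targetA" are placeholders.
var video = new AR.VideoDrawable("assets/video.mp4", 0.5);
var tracker = new AR.ClientTracker("assets/tracker.wtc");
var videoStarted = false;

var trackable = new AR.Trackable2DObject(tracker, "targetA", {
    drawables: { cam: [video] },
    onEnterFieldOfVision: function () {
        if (!videoStarted) {
            videoStarted = true;
            video.play(-1);   // start looping playback on the first appearance
        } else {
            video.resume();   // continue where the video was paused
        }
    },
    onExitFieldOfVision: function () {
        video.pause();        // stop rendering and audio while not visible
    }
});
```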


Another important point to mention is the `enabled` property. The current implementation simply disables the video rendering, but the audio track keeps playing in the background. So instead of only setting the `enabled` property to `false`, please also call the `pause()` function of `AR.VideoDrawable`.
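A minimal sketch of this, assuming you keep a reference to the `AR.VideoDrawable` (here called `video`):

```javascript
// Hide the video and stop its audio track together.
function hideVideo(video) {
    video.pause();          // stop audio playback
    video.enabled = false;  // stop rendering the video frames
}

// Re-enable rendering and continue playback.
function showVideo(video) {
    video.enabled = true;
    video.resume();
}
```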



Video drawables in combination with cloud recognition:

Cloud recognition just adds a small layer of complexity to the above paragraph. It's important to know that cloud recognition uses similar content to what can be found in a .wtc file. Each successful server response replaces the existing internal .wtc content with a new one. If an `AR.Trackable2DObject` was created with content A and new content B arrives at the client, the `AR.Trackable2DObject` can no longer receive `onEnterFieldOfVision` and `onExitFieldOfVision` events. That's the reason why all existing `AR.Trackable2DObject`s should be deleted whenever a new server response is received. The Cloud Recognition - Basic Recognition OnClick example demonstrates this in line 75. All augmentations are also deleted (lines 55 and 59) every time a new server response is received. This ensures that no augmentations or trackables are left over that are no longer usable.
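A sketch of such a recognition callback is shown below. It is only an illustration based on the older `AR.CloudTracker` API; `cloudTracker`, the `response.targetInfo.name` field and the asset path are assumptions you need to adapt to your own setup.

```javascript
// Sketch: delete trackables and augmentations before handling a new response.
// "cloudTracker" is assumed to be the AR.CloudTracker instance created elsewhere;
// the fields of "response" and the video path are assumptions.
var previousTrackables = [];
var previousAugmentations = [];

function onCloudRecognition(recognized, response) {
    // The new response already replaced the internal tracking data, so the
    // old trackables can no longer fire their events - remove them first.
    previousAugmentations.forEach(function (augmentation) {
        if (augmentation.pause) {
            augmentation.pause();   // stop a video that is loading or playing
        }
        augmentation.destroy();
    });
    previousTrackables.forEach(function (trackable) {
        trackable.destroy();
    });
    previousAugmentations = [];
    previousTrackables = [];

    if (recognized) {
        var video = new AR.VideoDrawable("assets/video.mp4", 0.5);
        var trackable = new AR.Trackable2DObject(cloudTracker, response.targetInfo.name, {
            drawables: { cam: [video] },
            onEnterFieldOfVision: function () { video.resume(); },
            onExitFieldOfVision: function () { video.pause(); }
        });
        video.play(-1);
        previousAugmentations.push(video);
        previousTrackables.push(trackable);
    }
}

// Trigger recognition, e.g. from a button click handler:
// cloudTracker.recognize(onCloudRecognition, function (error) { /* handle error */ });
```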


If augmentations and trackables are not deleted, they are still alive in the SDK, but since the trackable can no longer fire its events, its augmentations are no longer visible even if the reference image they belong to is in the current camera frame. As a consequence, the video drawable can no longer be started or stopped based on the state of the vision triggers.


If the video drawable already received a `play()` call and internally started loading its content before the tracking data was replaced by the next server response, it will start playing back its audio track as soon as enough video data is loaded. The `pause()` function could still be called, but no longer from the vision trigger (remember, the trackable is already disabled by the new tracking data). If the video drawable is deleted once the new tracking data arrives, this behaviour does not occur, because the object is deleted and therefore does not perform any loading or playback operation.


To summarise this paragraph: it's very important that all augmentations and trackables are deleted whenever a successful server response is received.


Below is a sequence diagram that shows a situation where a video drawable continues playing its audio track although its video frames are no longer rendered.

[Sequence diagram: client/server interactions leading to audio-only video playback]

What's happening:


The sequence diagram shows an Architect World that sends a camera frame to the server every time the user taps a button.


The first two times, target A is recognized. Target B is recognized when the third request is sent. When target A is recognized, a video drawable and a trackable are created. Target B has no augmentation defined (for simplification). Also noteworthy here is the fact that each response from the server invalidates the currently loaded tracking data (as described before).


The first client/server interaction simply starts a video drawable without any problems. Once the server response is processed, a video drawable is created and starts loading its content. Once enough video data is received, the video drawable starts rendering the video and playing back the audio track.


The second client/server interaction is still fine. It simply replaces the previous video drawable with a new one and calls its `play()` function (and again starts loading the same video data, because the deletion of the first one completely removed all of its already loaded video data).


The critical client/server interaction is the third one. The user initiated this server communication before the second response arrived at the client. So while the second video drawable was still loading its content, the third server response arrived at the client. This caused the trackable that was created with the second response to become unreachable. Its attached video drawable would never be visible, even if the reference image were in the current camera frame (because its reference image data was already overwritten by the third response). Because the second video drawable had already started its loading procedure, it starts audio playback as soon as enough video data is received at the client. The result is that the user can hear the audio track of the second video drawable but is unable to see the rendered frames, although the reference image for target A might be in the current camera frame.


To fix this problem, simply delete all augmentations and trackables as soon as a new server response is received.
