
Supporting zeromq for drake-visualizer #586

Open · rdeits opened this issue Jan 30, 2018 · 11 comments

Comments
@rdeits (Contributor) commented Jan 30, 2018

I'm still using drake-visualizer (in RemoteTreeViewer mode) as my daily visualizer, and it's great, but I'm increasingly running into issues trying to visualize meshes or point clouds that are too large for LCM. We could try to fix LCM to enable message chunking, but since the tree viewer is already using its own message encoding, it's really not getting any benefit from LCM at all. @patmarion I know we talked about using ZMQ in the past, and I think being able to use tcp or ipc would probably resolve the issues I'm having with LCM+udp. As I recall, the issue was needing to spawn a new event loop to handle the ZMQ requests, since the threading Python module doesn't work in Director. Is that still the case?

How would I go about making this happen?

@patmarion (Member)

I think threading can be used in Director successfully. Last year I wrote a class named TaskRunner that helps run threads in Director:

import time

from director.taskrunner import TaskRunner

def test():
    while True:
        print('on thread')
        time.sleep(1.0)

t = TaskRunner()
t.callOnThread(test)

(note, this class was only in spartan, but I just merged a PR to add it to director/master)

The tricky part with threads in Director is that the Qt C++ event loop has control, not the Python runtime. The Python thread starts but has limited opportunities to get scheduled, so your thread will be sluggish. The other problem is that interacting with C++ objects (like the VTK visualization window) is best done exclusively from the main Python thread.

The TaskRunner solves this problem by ensuring the Qt event loop pumps the Python runtime. Thread performance won't be quite as good as if you run your program directly with the Python interpreter, but it works pretty well for most purposes, and I think it works great if your thread just does blocking IO like zeromq / sockets.

Sample (written for Python 3):

import sys
import time
import zmq


def server():
    # REP socket: wait for a request, print it, and send back an acknowledgement
    context = zmq.Context.instance()
    sock = context.socket(zmq.REP)
    sock.bind('tcp://*:8089')

    while True:
        message = sock.recv().decode('utf-8')
        print('received message:', message)
        sock.send_string('ack ' + message)


def client():
    # REQ socket: send a request and block until the reply arrives
    context = zmq.Context.instance()
    sock = context.socket(zmq.REQ)
    sock.connect('tcp://localhost:8089')

    for i in range(100):
        message = 'message {}'.format(i)
        sock.send_string(message)
        response = sock.recv().decode('utf-8')
        print('response:', response)
        time.sleep(1.0)


if __name__ == '__main__':

    from director import taskrunner

    # run both loops on TaskRunner threads so the blocking IO doesn't stall
    # Director's main Qt event loop
    taskRunner = taskrunner.TaskRunner()
    taskRunner.callOnThread(server)
    taskRunner.callOnThread(client)

@rdeits (Contributor, Author) commented Jan 31, 2018

This is perfect, thanks! I'll try it out today.

@rdeits (Contributor, Author) commented Jan 31, 2018

Cool, it works! I'm getting some warnings when I try to actually create geometry from the taskrunner thread, though:

QPixmap: It is not safe to use pixmaps outside the GUI thread
QObject::setParent: Cannot set parent, new parent is in a different thread

Is that to be expected, given what you mentioned about interacting with vtk from the python main thread? The geometry does show up correctly, despite the warnings. My implementation is here: https://github.com/RobotLocomotion/director/compare/master...rdeits:treeviewer-zmq?expand=1

@rdeits (Contributor, Author) commented Jan 31, 2018

Wow, this is already amazing. Using ZeroMQ with MsgPack + msgpack-numpy for arrays, I can serialize, transmit, deserialize, and render 1,000,000 points in 50 ms.
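
(Roughly what that looks like, as a minimal sketch rather than the actual code in the branch linked above: the point array is packed with msgpack + msgpack-numpy and sent as a single zeromq frame. The message field names here are just illustrative.)

import numpy as np
import zmq
import msgpack
import msgpack_numpy

msgpack_numpy.patch()  # let msgpack pack/unpack numpy arrays transparently

points = np.random.rand(1000000, 3).astype(np.float32)

# serialize the point cloud into one binary payload
payload = msgpack.packb({'type': 'pointcloud', 'points': points}, use_bin_type=True)

# ship it as a single frame (socket setup as in the REQ/REP sample above)
context = zmq.Context.instance()
sock = context.socket(zmq.REQ)
sock.connect('tcp://localhost:8089')
sock.send(payload)
reply = sock.recv()

# the receiving side restores the numpy array with:
#   msg = msgpack.unpackb(sock.recv(), raw=False)
#   points = msg['points']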

@patmarion (Member)

You could ignore the warnings, but I think they are correctly identifying potential issues where your thread is interacting with objects owned by the main thread. Your thread and the main thread aren't running concurrently due to the way Python schedules things, but it could still be a problem.

If you want to avoid issues like this, a good strategy is to use your thread just for zeromq blocking IO, but do all the message processing on the main thread. The TaskRunner has a helper for this:

def threadFunction():
    # blocking IO on the worker thread (waitForMessage is a placeholder
    # for something like a zeromq recv)
    message = waitForMessage()
    # schedule the processing on the main thread and return immediately
    taskRunner.callOnMain(lambda: processMessage(message))

callOnMain returns instantly; all it does is schedule your function to be called on the main thread eventually. The TaskRunner uses a 60 Hz timer to periodically call these scheduled functions. Alternatively, you can manage your own timer to do periodic processing:

import time

from director.taskrunner import TaskRunner
from director.timercallback import TimerCallback

msgs = []

def processPendingMessages():
    # runs on the main thread; drain whatever the worker thread has queued
    while msgs:
        msg = msgs.pop()
        # ... process msg here (e.g. update the visualization) ...

def threadFunction():
    while True:
        time.sleep(1.0)
        msgs.append('message')

taskRunner = TaskRunner()

# produce messages on thread
taskRunner.callOnThread(threadFunction)

# periodically process messages on main
timer = TimerCallback(callback=processPendingMessages, targetFps=60)
timer.start()

@patmarion (Member)

btw, I fixed the CDash issue for director's travis-ci, so now travis-ci passes and uploads binaries to bintray again. If you need binaries with TaskRunner, they are:

https://bintray.com/patmarion/director/director/0.1.0-266-g071a233#files

@rdeits (Contributor, Author) commented Jan 31, 2018

Got it, thanks!

@rdeits (Contributor, Author) commented Feb 1, 2018

Ok, I've nearly got it all working properly. One issue I noticed is that sending a large number of draw commands bogs down and eventually crashes the whole Director app. I've traced it down to the fact that TaskRunner.callOnMain() calls self.timer.start() every time. That seems to cause some cumulative degradation of the app performance and eventually crashes all of Director. It looks like this is because TimerCallback.isActive() is always returning False inside the thread callback, so every call to start() results in a new call to self.timer.connect().

Removing the call to self.timer.start() inside callOnMain() totally resolves the issue, but I'm not sure if it's the right thing to do.
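
(For reference, a minimal stand-alone sketch, using plain PyQt5 rather than Director's actual TimerCallback, of why re-connecting on every start() piles up: each connect() adds another slot, so the callback fires more and more times per tick.)

import sys
from PyQt5.QtCore import QCoreApplication, QTimer

app = QCoreApplication(sys.argv)
timer = QTimer()
calls = []

def onTimer():
    calls.append(1)

# simulate re-connecting the callback on every "start", without checking
# whether the timer is already active
for _ in range(3):
    timer.timeout.connect(onTimer)  # duplicate connections accumulate
    timer.start(10)

QTimer.singleShot(55, app.quit)
app.exec_()
print('callback invocations:', len(calls))  # roughly 3x what a single connection gives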

@patmarion (Member)

I am not able to reproduce this issue, but I think there must be some issues with the implementation of callOnMain(); it's implemented in a more complicated way than it should be (using another class I wrote called asynctaskrunner).

I am going to recommend that you implement it without using callOnMain() and instead use the pattern that I showed in the 2nd example of this comment:

#586 (comment)

In that pattern, there are no timers starting and stopping, just one timer that is started once.

@patmarion (Member)

Can you try this diff:

index e00f64e..740bfd3 100644
--- a/src/python/director/taskrunner.py
+++ b/src/python/director/taskrunner.py
@@ -22,6 +22,7 @@ class TaskRunner(object):
         self.pendingTasks = []
         self.threads = []
         self.timer = TimerCallback(callback=self._onTimer, targetFps=1/self.interval)
+        self.timer.disableScheduledTimer()
 
         # call timer.start here to initialize the QTimer now on the main thread
         self.timer.start()

If you are building from source, don't forget to run make again after modifying the Python source.

@rdeits (Contributor, Author) commented Feb 2, 2018

Thanks! Both of your suggestions fixed the issue. I think I like the pattern from #586 (comment) a bit better, so I'll probably go with that.
