def: a person whose work provides necessary support to the primary activities of an organization, institution, or industry.
Create a social robot that conforms to very simple human social rules and recognizes social cues such as eye contact, facial expressions, speech and known objects.
The overall goal is to mimic simple human social interaction:
- Move in a humanly natural way, not as a jerky robot
- Respond to people and known objects with attention
- Remember important faces, objects and be able to reference them
- Recognize and respond to facial expressions (Happy, Sad)
- Socially listen to a multi-person conversation following who is talking
- Respond to eye contact by looking back at someone
- Represent a simple set of emotions based on sensor inputs (boredom, frustration, happiness, anger)
- Make my kids laugh
- ...
- Check out the code base
- Install all dependencies using the setup script
- Run python src/AC3.py to start the robot
- In your browser, connect to the robot's IP address to watch what is happening
Having gone through this before on multiple projects, the overall goal is to create a very easy-to-support set of hardware and software, so there are no hidden elements that will break or be forgotten in the future. The decisions below are designed to make it easily supportable, cost effective and understandable for new people.
- Hardware is off the shelf using hobby servos, standard power supplies, etc
- Complexity of mechanics is kept simple, allowing me to assemble it easily and replace broken parts
- Processing is a Raspberry Pi with standard daughter boards and minimal custom wiring
- Cameras don't try to push the limits, but instead use standard hardware and libraries
The system does all processing onboard, so no external computers are needed. It also has a full web interface, allowing for easy understanding of what is going on inside the system by going to the support URL.
- 2x - Raspberry Pi 3
One of the Raspberry Pis is for core processing and the other will be dedicated to environment processing. A third may be necessary if the combined load of controlling the robot and doing vision processing is too much.
One of the biggest challenges in embedded systems is being able to understand and interact with them successfully. Therefore, I am going to expose the key elements in a password protected web interface.
Here is the API documentation.
To change the password for the web server interface, run the command below in the /src/webserver directory.
python -c "import hashlib; import getpass; print(hashlib.sha512(getpass.getpass())).hexdigest()" > password.txt
The system will use two cameras to enable both full environment awareness and targeted vision. For environmental awareness, background subtraction is the most important step: knowing which elements matter and which are just walls. If a camera is moving on servos, it is very difficult to guess which pixels correspond to foreground or background without 3D pixels (maybe a future project :-) ). Therefore, by using a wide-angle static camera, standard background removal can strip away non-salient areas, color clustering can segment the image into elements, and those elements can then be grouped into people, objects, etc. (a rough OpenCV sketch of this pipeline follows the lists below).
- 1x - Raspberry Camera with 180 degree wide lens to track the entire range in front of the robot. It will be statically mounted on the front of the robot to give a fixed frame of reference for controlling gross movement. The image will be flattened and normalized to create a linear map of the environment from -90 degrees (left) to +90 degrees (right) and vertically from 0 degrees (flat) to 90 degrees (vertical); see the sketch after this list.
- 1x - Raspberry Camera with narrow lens mounted in the robot's eye, which is actuated by the servos to make direct eye contact and accurately track objects.
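As a concrete example of that angle mapping, here is a minimal sketch that converts a pixel in the flattened wide-angle image into the azimuth/elevation frame described above. The function name and the assumption that image rows run from 90 degrees at the top to 0 degrees at the bottom are illustrative, not taken from the code base.

```python
# Hypothetical helper: map a pixel in the flattened wide-angle image to angles.
def pixel_to_angles(x, y, width, height):
    azimuth = (x / float(width)) * 180.0 - 90.0    # -90 deg (left) .. +90 deg (right)
    elevation = (1.0 - y / float(height)) * 90.0   # 0 deg (flat) .. 90 deg (vertical)
    return azimuth, elevation

# Example: the center pixel of a 640x480 frame maps to roughly (0, 45) degrees.
print(pixel_to_angles(320, 240, 640, 480))
```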
- Background removal to identify salient areas around the robot
- Movement calculation to identify more-salient elements
- Face detection to identify key elements to look at
- Face recognition to allow specific faces to pop
- Bright color object detection to allow tracking of colored objects
- opencv k-means clustering
- python implementation
- Python OpenCV k-means
- meanshift and camshift allow for easier tracking of known blobs such as objects and faces
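As a rough illustration of the pipeline above, here is a minimal sketch that combines OpenCV background subtraction (MOG2) with k-means color clustering on frames from the static wide-angle camera. It assumes OpenCV 3+ (cv2.createBackgroundSubtractorMOG2) and a camera reachable through cv2.VideoCapture(0); it is a starting point, not the project's actual implementation.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)  # wide-angle camera index assumed
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=25)

while True:
    ok, frame = cap.read()
    if not ok:
        break

    # 1. Background removal: keep only the salient (foreground) pixels.
    mask = subtractor.apply(frame)
    foreground = cv2.bitwise_and(frame, frame, mask=mask)

    # 2. Color clustering: segment foreground pixels into K color groups,
    #    which can later be grouped into people, objects, etc.
    pixels = foreground[mask > 0].astype(np.float32)
    if len(pixels) > 100:  # skip nearly-empty frames
        criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
        _, labels, centers = cv2.kmeans(pixels, 4, None, criteria, 3,
                                        cv2.KMEANS_RANDOM_CENTERS)
        # centers now holds the dominant foreground colors for this frame.

    cv2.imshow("foreground", foreground)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```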
sudo apt-get install python-opencv libjpeg-dev
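Once OpenCV is installed, the meanshift/camshift tracking mentioned above can be sketched roughly as follows, assuming OpenCV 3+ (cv2.boxPoints). The hard-coded initial window is a placeholder where the real code would use a detected face or bright-object region.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
x, y, w, h = 200, 150, 80, 80  # placeholder window from a detector
track_window = (x, y, w, h)

# Build a hue histogram of the region so it can be back-projected each frame.
roi = frame[y:y + h, x:x + w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    rot_rect, track_window = cv2.CamShift(back_proj, track_window, term_crit)
    pts = np.int32(cv2.boxPoints(rot_rect))  # rotated box around the tracked blob
    cv2.polylines(frame, [pts], True, (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```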
Robotic head with 5 DOF, driven by a Raspberry Pi acting as the robot server and motion, video and sensor controller.
- neck rotate - Rotates the whole head left and right 180 degrees
- neck lean - Moves the head forward and backwards 30 degrees
- head tilt - Rotates the head up and down
- eye rotate - Rotates the eye left and right very quickly
- eye iris - Shrinks and enlarges the opening for the eye to show emotion
- 1x - Adafruit 16 channel servo controller for Raspberry Pi
- 4x - RobotGeek 180 degree servo
- 1x - RobotGeek Snapper Arm (For the bottom 3 servos, bearing and skeleton)
NOTE: Even though the controller's 12-bit output nominally spans 0 to 4095 on a 0 to 3.3 V control signal, driving the full range will damage these servos. The actual usable range on the RobotGeek servos is 150 to 600, so we need to map our positions into that window correctly.
From here: https://www.raspberrypi.org/forums/viewtopic.php?t=32826
At 50 Hz each frame is 20 ms, so one 12-bit tick is roughly 20 ms / 4096 ≈ 4.8 usec:
- 0.5 ms / 4.8 usec = 104, the number required by our program to position the servo at 0 degrees
- 1.5 ms / 4.8 usec = 312, the number required by our program to position the servo at 90 degrees
- 2.5 ms / 4.8 usec = 521, the number required by our program to position the servo at 180 degrees
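As a rough sketch of that mapping in code, assuming the legacy Adafruit_PCA9685 Python library for the 16-channel controller and treating 150-600 as the safe tick range quoted above (the channel number and helper name are illustrative):

```python
import Adafruit_PCA9685  # legacy Adafruit 16-channel servo HAT library (assumed)

SERVO_MIN = 150  # safe lower tick limit for the RobotGeek servos (from above)
SERVO_MAX = 600  # safe upper tick limit

def angle_to_ticks(degrees):
    """Map 0-180 degrees into the safe 150-600 tick window."""
    degrees = max(0.0, min(180.0, degrees))
    return int(SERVO_MIN + (degrees / 180.0) * (SERVO_MAX - SERVO_MIN))

pwm = Adafruit_PCA9685.PCA9685()
pwm.set_pwm_freq(50)                   # standard 50 Hz servo frame rate
pwm.set_pwm(0, 0, angle_to_ticks(90))  # center the servo on channel 0
```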
The controller only updates at 50 Hz, and the actual position control of the servos seems to be accurate to only about 0.5 degrees, which means the whole thing can jitter a LOT. To account for this, we need to adjust the interpolation algorithms.
A few things I have seen online:
- Add a 100uF cap across the power/ground to smooth surges in current so it doesn't jitter as much. I don't think this will matter as much with the larger power supply I have.
- It takes multiple commands before some servos actually start moving, so there can be even bigger delays than commanded.
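One way to adjust the interpolation for the ~0.5 degree resolution and the jitter described above is to add a deadband and limit how far each 50 Hz update can move. A minimal sketch, building on the angle_to_ticks() helper and pwm object from the earlier servo sketch (the class and parameter names are illustrative):

```python
DEADBAND_DEG = 0.5  # roughly the servo's effective positional resolution

class SmoothedServo(object):
    def __init__(self, pwm, channel, start_deg=90.0):
        self.pwm = pwm
        self.channel = channel
        self.current = start_deg

    def step_toward(self, target_deg, max_step_deg=2.0):
        # Called once per 50 Hz frame: move a limited amount toward the target
        # and skip commands smaller than the deadband so we don't cause jitter.
        delta = target_deg - self.current
        if abs(delta) < DEADBAND_DEG:
            return
        step = max(-max_step_deg, min(max_step_deg, delta))
        self.current += step
        self.pwm.set_pwm(self.channel, 0, angle_to_ticks(self.current))
```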
There are multiple
- Small speaker
There are two ways speech recognition can be implemented: local (Sphinx) or cloud based (Amazon, Google). Cloud-based recognition will always be more accurate, but there is a larger delay between speech and recognition. If local recognition is used, then a small vocabulary should be specified (see the sketch below).
- Python Speech Recognition w/ Sphinx
- pocketsphinx_continuous is a cross-platform application which can be executed to listen continuously for a vocabulary
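As a rough sketch of the local option, assuming the Python SpeechRecognition package with pocketsphinx installed; the keyword vocabulary is illustrative only.

```python
import speech_recognition as sr  # SpeechRecognition package (pocketsphinx assumed installed)

recognizer = sr.Recognizer()
mic = sr.Microphone()

# Small fixed vocabulary for local recognition, as suggested above.
# Each entry is (keyword, sensitivity 0..1); the words are placeholders.
KEYWORDS = [("hello", 1.0), ("look", 1.0), ("stop", 1.0)]

with mic as source:
    recognizer.adjust_for_ambient_noise(source)
    print("Listening...")
    audio = recognizer.listen(source, phrase_time_limit=5)

try:
    # Local recognition via CMU Sphinx; swap to recognizer.recognize_google(audio)
    # for cloud recognition at the cost of extra latency.
    text = recognizer.recognize_sphinx(audio, keyword_entries=KEYWORDS)
    print("Heard:", text)
except sr.UnknownValueError:
    print("Could not understand audio")
```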