My main focus was on identifying, installing, and running open source software for human voice synthesis and recognition. While voice synthesis is fairly straightforward, voice recognition has many issues.
Voice synthesis involves coding along with extensive, sometimes massive, data files that are collections of voice patterns capturing the pronunciation and accents of different English-speaking male and female speakers. Other languages are also available (German, Russian, French, Spanish, etc.), and additional resources allow you to submit a small list of typed words to an online tool that returns a few files: a text file that can be included in Festival, along with some other files. I am still working on testing this.
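As an illustration of how synthesized speech can be triggered from our own code, here is a minimal Python sketch that drives Festival from a script. It assumes the `festival` binary is installed and on the PATH, and uses Festival's Scheme `(SayText ...)` command in pipe mode; the helper names are our own.

```python
import subprocess

def saytext_form(text):
    """Wrap text in Festival's Scheme (SayText ...) command, escaping quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '(SayText "%s")' % escaped

def speak(text):
    """Pipe a SayText command into Festival running in pipe mode."""
    subprocess.run(["festival", "--pipe"],
                   input=saytext_form(text).encode(),
                   check=True)
```

Calling `speak("Hello, I am Caesar.")` should then produce audible speech, assuming a working sound setup; the same effect can be had from a shell with `echo '(SayText "Hello")' | festival --pipe`.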
Voice recognition is full of problems. One requirement for proper experimentation is a quiet room; second, a native English speaker would be a better choice for experimentation. Even so, this may be an issue later: if a different speaker talks to the robot, it may have a hard time understanding the commands. Another issue is that even under ideal conditions recognition accuracy is less than 80%, and typically below 50% for an inexperienced speaker. The open source software we will try to use is OpenEars, which also requires massive data files to compare incoming sound patterns with previously recognized patterns. It typically compares from 20,000 to 300,000 variations of pronunciation of sound combinations, and the process can take from a few seconds to a few minutes to turn spoken words into text.
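One practical way to cope with imperfect recognition is to snap the recognizer's (possibly garbled) transcription to the closest known command. This is a much simplified stand-in for the acoustic-model matching described above, not how OpenEars itself works; the sketch below uses Python's standard `difflib`, and the command list is hypothetical.

```python
import difflib

# Hypothetical set of commands the robot understands.
KNOWN_COMMANDS = [
    "turn on the lights",
    "turn off the lights",
    "check the weather",
    "open the browser",
]

def match_command(heard, commands=KNOWN_COMMANDS, cutoff=0.6):
    """Return the known command closest to the transcribed phrase,
    or None if nothing is similar enough to trust."""
    hits = difflib.get_close_matches(heard.lower(), commands, n=1, cutoff=cutoff)
    return hits[0] if hits else None
```

For example, a mangled transcription like "turn of the lite" still resolves to one of the lighting commands, while unrelated noise falls below the cutoff and is rejected.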
There is another option that people have experimented with: a personal assistant program called JARVIS (modeled after the AI Jarvis from "Iron Man"). It is a free program available on multiple platforms and can be freely extended (specifically the profile, voice commands, responses, and more in-depth coding of programs that run in the background). Projects people have attempted include home automation, where voice commands turn lights, electronic equipment, and electronic locks on and off, and perform various tasks on a computer, from checking the weather or a calendar to creating files and typing dictated text. With proper coding, software support, and porting to the cloud, it could potentially grow into a very decent personal assistant that resembles AI. I have been trying to expand the list of available Jarvis commands. With some success, I was able to make it open and close the web browser as well as give appropriate answers to my questions. Jarvis is not AI, but the background code can be sophisticated enough to mimic AI.
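The kind of command extension described above can be sketched as a simple dispatch table mapping recognized phrases to handler functions. This is our own illustration of the pattern, not Jarvis's actual plugin mechanism; the trigger phrases and handlers are hypothetical.

```python
import datetime
import webbrowser

def open_browser():
    """Open the default web browser to a start page."""
    webbrowser.open("https://duckduckgo.com")
    return "Opening the browser."

def tell_time():
    """Report the current time as a spoken-style string."""
    return datetime.datetime.now().strftime("It is %H:%M.")

# Command table: recognized trigger phrase -> handler returning a reply to speak.
COMMANDS = {
    "open the browser": open_browser,
    "what time is it": tell_time,
}

def handle(phrase):
    """Dispatch a recognized phrase; fall back to a polite default."""
    action = COMMANDS.get(phrase.strip().lower())
    return action() if action else "Sorry, I do not know that command."
```

New capabilities are then just new entries in `COMMANDS`, and the reply string can be fed back to the synthesizer to close the loop.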
Overall, I learned a lot about ways to manipulate sound and use Linux/Ubuntu to recognize and synthesize human voice commands.
Further development: I have to search for engineering or, most likely, computer science articles that may involve similar research using similar or different tools. Additionally, we have to make sure that the software functions properly and develop a more user-friendly environment. Finally, we should try to link the commands for gestures with voice reproduction. Voice recognition would be the next level of research once the above-mentioned goals are accomplished.
Lisa Ali and Michael Zitolo
We finally started working in Dr. Kapila’s Mechatronics Lab. There were so many different project (research) possibilities available in this lab. We chose to work with Caesar, the lab’s robot, and he’s getting an upgrade! The robot will be given arms driven by six servomotors, giving it the ability to pick up objects. Caesar’s eyes (cameras) are also being restructured so that they can turn independently of one another. We were given the task of helping these components of the robot communicate.
We started our research by reviewing prior literature on robotic arms, object detection from cameras, etc., and we watched many videos to help with the software part of our project. We spent a lot of time installing OpenCV, a library of programming functions used for real-time computer vision, and learning how to use it. Although this week involved a bit of a learning curve, by mid-week we were able to create a program that used one camera. By the end of the week, we had written a program that used two cameras, each configured to detect a different object.
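A two-camera detection loop of the kind described can be sketched roughly as follows. This is our own simplified illustration, assuming the objects are found by HSV color thresholding (one possible approach, not necessarily the exact one used); the mask helpers are plain NumPy, and the OpenCV capture loop assumes two cameras at device indices 0 and 1.

```python
import numpy as np

def hsv_mask(hsv, lo, hi):
    """Boolean mask of pixels whose HSV values fall within [lo, hi] per channel."""
    lo, hi = np.asarray(lo), np.asarray(hi)
    return np.all((hsv >= lo) & (hsv <= hi), axis=-1)

def mask_centroid(mask):
    """(x, y) centroid of the True pixels, or None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return (float(xs.mean()), float(ys.mean()))

def track_two_cameras():
    """Grab frames from two cameras and print each target's pixel centroid."""
    import cv2  # deferred so the helpers above run without OpenCV installed
    caps = [cv2.VideoCapture(0), cv2.VideoCapture(1)]
    # Hypothetical HSV ranges: a blue target for camera 0, a green one for camera 1.
    ranges = [((100, 150, 50), (130, 255, 255)),
              ((40, 70, 50), (80, 255, 255))]
    while True:
        for cap, (lo, hi) in zip(caps, ranges):
            ok, frame = cap.read()
            if not ok:
                continue
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            print(mask_centroid(hsv_mask(hsv, lo, hi)))
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
```

The centroid from each camera gives a per-image target position, which is the kind of signal the robot's arm and eye controllers would need to act on.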
On Friday everyone involved with the Mechatronics Lab had a group meeting with Dr. Kapila where we all shared what we’ve been working on and how much we’ve accomplished. It was nice to hear about everyone’s accomplishments and the struggles they faced along the way.
This past week saw a great deal of research done on the CAESAR robot and how it can be applied to teach emotional recognition skills to children on the autism spectrum. After an analysis of case studies about autism therapy robots and speaking to several companies that manufacture such robots, I drafted a report detailing how CAESAR can function with an accompanying app to demonstrate six universal facial expressions as determined by psychologist Paul Ekman. The report also discusses what the app looks like and how it is to be used, how the app and CAESAR can be tested with children who have varying levels of ASD concerns, and how CAESAR compares in cost and function to other available autism-therapy robots. The report concludes with an entrepreneurial component detailing whom CAESAR can be marketed to, along with incentives for new customers who might at first be uncomfortable using a robot.