Monday, August 16, 2010

Weekly Report 13

  • Take a couple of days off.
  • Continue adding other APIs and refining the work.
  • Provide support and help to anybody interested in this work.

Monday, August 9, 2010

Weekly Report 12

Status and Accomplishment
  • Completed porting all of my algorithms to C6accel. Added a new algorithm, cvCvtColor(), supporting the CV_RGB2GRAY operation. Added VLIB support to the C6accel library: cvIntegral() is implemented with VLIB_integralImage8(), and a flag controls its use. Users need to request VLIB access separately. Implemented chaining of the OpenCV APIs cvCvtColor() and cvSobel(); DSP_cvCvtColor_cvSobel() demonstrates this implementation. The scheme reduces Codec Engine overhead between API calls.
  • Worked on the documentation of the API and of the procedure for adding a new API to the existing library.
  • Worked on the application that demonstrates the use of these APIs.
  • Every API call now takes an almost constant ~380 usec to set up the asynchronous DSP function call. The ARM and DSP must be synchronized using DSP_cvSyncDSP() before the output is accessed. For an image size of 640x480 this can give a speedup of more than 10x for any algorithm, provided the application handles data dependencies wisely.
  • Waiting for C6accel tag to be created before releasing code for evaluation.
  • Refine the documentation.
  • Review the code.
  • Look into the DFT algorithm to avoid a race condition, as I can see the result only when CE_DEBUG=3.
  • Some suggestions on my issue with the DFT algorithm, as mentioned in the Plans section, would be helpful.
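
The RGB-to-gray conversion added to cvCvtColor() in this report is a weighted sum of the three channels. Below is a minimal sketch in plain C, not the C6accel code. The function name rgb_to_gray_u8, the R,G,B byte order and the 8-bit fixed-point weights 77, 150 and 29 (approximating 0.299, 0.587 and 0.114) are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert interleaved 8-bit RGB pixels to 8-bit gray.
 * The weights 77/256, 150/256 and 29/256 approximate 0.299, 0.587, 0.114
 * and sum to exactly 256, so white stays white. Adding 128 before the
 * shift rounds to the nearest integer. */
void rgb_to_gray_u8(const uint8_t *rgb, uint8_t *gray, size_t pixel_count)
{
    size_t i;

    for (i = 0; i < pixel_count; ++i) {
        uint32_t r = rgb[3 * i + 0];
        uint32_t g = rgb[3 * i + 1];
        uint32_t b = rgb[3 * i + 2];

        gray[i] = (uint8_t)((77u * r + 150u * g + 29u * b + 128u) >> 8);
    }
}
```

Integer weights keep the inner loop free of floating point, which matters on a fixed-point DSP. Chaining this step with Sobel inside one DSP call, as DSP_cvCvtColor_cvSobel() does, avoids paying the per-call setup cost twice.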

Monday, August 2, 2010

Weekly Report 11

Status and Accomplishment
  • OpenCV now allocates memory using CMEM in a contiguous region. This saves the overhead of copying the buffer: the same buffer allocated by OpenCV can be passed to the DSP. Functionality is fine, but when the main() process exits there is the following error message: 'CMEM Error: CMEM_exit() already called, check stderr output for earlier CMEM failure messages (possibly version mismatch).'
  • Most of the time was spent investigating the ASYNC DSP call. All DSP_OpenCV calls are now made ASYNC. There will now be two APIs: the native OpenCV one, which is synchronous and uses the ARM, and the ASYNC call to DSP_OpenCV. This allows tasks to execute in parallel and frees the ARM for other work. An API is provided to synchronize the DSP and the ARM. Setting up the ASYNC call takes only 274 us for a 320x240 monochrome image and 305 us for a 640x480 monochrome image, while synchronous processing with the native OpenCV Sobel 3x3 algorithm takes 2655 us and 8820 us respectively. This is a benefit of roughly 9.7x and 28.9x respectively for an algorithm, provided the task is scheduled properly with the latency of the DSP processing in mind.
  • I am now working with the C6accel library. Thanks to the C6accel team for their support and for providing some tweaks to fit my needs.
  • Work more on performance and provide benchmarks for the implemented algorithms.
  • Work on documentation.
  • Work on the application.
  • No blockers for now.
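
The benefit quoted in this report is the ratio between the synchronous ARM time and the time the ARM spends setting up the asynchronous call. The sketch below only restates that arithmetic. Both function names are hypothetical, and the overlap model (wall time equals setup plus the longer of DSP time and other ARM work) is a simplifying assumption, since the DSP processing time itself is not reported here.

```c
/* Hypothetical helper: how many times longer the synchronous ARM call
 * takes than setting up the asynchronous DSP call. */
double offload_ratio(double sync_arm_us, double async_setup_us)
{
    return sync_arm_us / async_setup_us;
}

/* Assumed overlap model: after the setup cost, the DSP and the ARM run
 * in parallel, so the wall time is the setup plus the longer of the two. */
double overlapped_wall_us(double async_setup_us, double dsp_us,
                          double other_arm_us)
{
    double longer = dsp_us > other_arm_us ? dsp_us : other_arm_us;

    return async_setup_us + longer;
}
```

With the timings above, offload_ratio(2655, 274) is about 9.7 and offload_ratio(8820, 305) is about 28.9. The ratio only translates into a real gain if the ARM has other useful work to do while the DSP is busy.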

Monday, July 26, 2010

Weekly Report 10

Status and Accomplishment
  • Tested the C6accel library. Its performance was similar to my implementation but still not better than the native OpenCV library. The relative performance measured for 16-bit Sobel was 4209167/4577332. With contiguous memory allocation it was 4209167/4498566.
  • Tried to remove some of the cache writebacks but could not see any difference in performance.
  • So, instead of using only the DSP for the algorithm, I split the task between the two processors by dividing the data to work on. I created two threads, one calling the ARM side and the other calling the DSP side. Performance improved slightly but was still not better than the native ARM code: this time it was 4209167/4467565. One half of the output image was visually dissimilar to the other half in terms of edge contrast. I will upload this picture.
  • I think the only way I can gain performance is by working on both processors, creating two APIs for the same function: one for the DSP and the other for the ARM.
  • Worked on the application part too. Started coding for it.
  • Instead of working only on the DSP, I am planning to use it for task offloading, by creating an asynchronous API and fetching the result later.
  • Look into performance and application.
  • Still not able to beat the ARM performance.

Monday, July 19, 2010

Weekly Report 9

Status and Accomplishment
  • Simplified the build instructions. Worked on the recipe to build the project.
  • Worked on UNIVERSAL_processAsync(). Since I was passing a whole buffer, the async call does not seem to work for this scheme, as it is uncertain which function will be called next. I am now breaking the buffer into chunks and working through it. But there is still confusion about the size of the chunks: for 7x7 Sobel I need to pass at least 7 rows, whereas for 3x3 I need at least 3, and for DFT 1 is enough.
  • Tried to make OpenCV allocate its memory in contiguous memory. I am getting a segmentation fault somewhere in Memory_alloc() and need to figure it out.
  • Try to come up with a solution for the async call and contiguous memory allocation.
  • Plan for the application part.
  • Since I did not hear further from Kitware, I incorporated the integration part into a makefile. Using this file, the integration and rebuild can be done.
  • Need to work on the above-mentioned problems and come up with the best solution.
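
The chunk-size question in this report comes down to how many extra rows a k x k kernel needs around each block of output rows. The following is a sketch of one way to plan the chunks, not the scheme actually used in the project. RowChunk and plan_row_chunk are hypothetical names, and the choice of kernel_size / 2 halo rows on each side is an assumption that fits odd kernel sizes such as 3, 5 and 7.

```c
#include <stddef.h>

/* Hypothetical description of one chunk of image rows. */
typedef struct {
    size_t first_row;  /* first input row to pass, halo included */
    size_t row_count;  /* number of input rows to pass */
    size_t out_first;  /* first output row produced by this chunk */
    size_t out_count;  /* number of output rows produced */
} RowChunk;

/* Plan chunk number `index` for an image with `image_rows` rows.
 * Returns 1 and fills *chunk, or 0 when the index is past the end of
 * the image or an argument is invalid. */
int plan_row_chunk(size_t image_rows, size_t rows_per_chunk,
                   size_t kernel_size, size_t index, RowChunk *chunk)
{
    size_t halo, out_first, out_count, first, last;

    if (rows_per_chunk == 0 || kernel_size == 0 || chunk == NULL)
        return 0;

    halo = kernel_size / 2;             /* 1 for 3x3, 3 for 7x7, 0 for 1 */
    out_first = index * rows_per_chunk;
    if (out_first >= image_rows)
        return 0;

    out_count = image_rows - out_first;
    if (out_count > rows_per_chunk)
        out_count = rows_per_chunk;

    /* Extend the input range by the halo, clamped to the image. */
    first = out_first > halo ? out_first - halo : 0;
    last = out_first + out_count + halo;    /* exclusive */
    if (last > image_rows)
        last = image_rows;

    chunk->first_row = first;
    chunk->row_count = last - first;
    chunk->out_first = out_first;
    chunk->out_count = out_count;
    return 1;
}
```

With this layout a single output row of a 7x7 Sobel needs 7 input rows and a 3x3 Sobel needs 3, matching the minimums noted above, while a row-wise DFT (kernel size 1) needs no halo at all.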

Monday, July 12, 2010

Weekly Report 8

Status and Accomplishment
  • Implemented the 2-D DFT algorithm. Currently the output of the DFT is scaled, as DSPLIB gives scaled output. I am planning to change the DFT and IDFT kernels in DSPLIB to be non-scaling, so that the overhead of scaling back in my algorithm is reduced. Reviewed the integral and Sobel algorithms.
  • Committed the source code and application examples for all the algorithms. The instructions were updated accordingly.
  • Integrated the library with the existing OpenCV 2.1; the newly built OpenCV library supports the DSP for 3 algorithms. The patch is provided in the patch sub-directory inside the trunk. However, because of the workaround for the CMake issue, some extra work is needed to build the new library. It is described in the build instructions.
  • It seems my instructions are a little complex, so I am planning to simplify them. Try to update the bitbake recipe that Koen wrote so that OpenCV can be rebuilt using the new patch.
  • Look into optimization. Try async-universalprocess.
  • Correspondence with Kitware regarding the CMake issue is still going on. For now, I am editing the makefiles generated by CMake to integrate the libraries.

Monday, July 5, 2010

Weekly Report 7

Status and Accomplishment
  • Looked further into performance factors. Looked into DMAN3 and ACPY3 and tried to implement them, but later gave up, considering that implementing these for internal buffers would not have much effect on performance.
  • Implemented cvIntegral and committed it. It works fine for an image depth of 8 bits. I need to look into its performance now.
  • Implemented the DFT algorithm. Since it involves floating-point normalization, float-to-Q15 conversion, scaling of the result, Q15-to-float conversion and unnormalization, I have doubts about its performance. This extra work comes from using the fixed-point C64x+ DSP. Implementing the 2-D DFT is taking a little more time than expected.
  • Apart from a couple of days at the beginning, my aim this week is to integrate the libraries with the OpenCV library. I gave up on it last week after working on it for almost a day. The main focus this week will be on integration. Besides that, I will also look into code clean-up and error checking.
  • As I am planning to look into integration this week, I may need some help with it. The main blocker last week was the CMake build system. Since then I have been looking into different issues, and I hope to solve it in the coming week.
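
The conversion steps listed for the DFT in this report (normalize, convert to Q15, convert back, unnormalize) can be sketched as follows. This is an illustration under stated assumptions, not the project's code and not a DSPLIB API: the names float_to_q15, q15_to_float and normalize_to_q15 are hypothetical, and normalizing by the peak absolute value is only one possible choice.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a float in [-1, 1) to Q15 (1 sign bit, 15 fraction bits),
 * rounding to nearest and saturating at the ends of the range. */
int16_t float_to_q15(float x)
{
    float scaled = x * 32768.0f;

    if (scaled >= 32767.0f)
        return 32767;
    if (scaled <= -32768.0f)
        return -32768;
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Convert a Q15 value back to float. */
float q15_to_float(int16_t q)
{
    return (float)q / 32768.0f;
}

/* Normalize `n` floats by their peak absolute value and store them as
 * Q15. Returns the peak, which the caller multiplies back in after
 * q15_to_float() to undo the normalization. */
float normalize_to_q15(const float *in, int16_t *out, size_t n)
{
    float peak = 0.0f;
    size_t i;

    for (i = 0; i < n; ++i) {
        float a = in[i] < 0.0f ? -in[i] : in[i];
        if (a > peak)
            peak = a;
    }
    if (peak == 0.0f)
        peak = 1.0f;    /* all-zero input: avoid dividing by zero */

    for (i = 0; i < n; ++i)
        out[i] = float_to_q15(in[i] / peak);
    return peak;
}
```

Each of these passes touches every sample, which is where the extra cost around the fixed-point DFT kernel comes from.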

Monday, June 28, 2010

Weekly Report 6

Status and Accomplishment
  • Worked on performance benchmarking of the Sobel algorithm. To my surprise, the performance was lower than that of the non-DSP OpenCV Sobel algorithm. I then changed the way Codec Engine is called. Earlier, the engine was opened and then closed each time the algorithm was called. Now it remains open throughout execution and is closed at the end, when all the processing is done. The performance improved and is close to the non-DSP OpenCV implementation. Processing and then displaying the video "tree.avi", which comes with the OpenCV examples, frame by frame with a 50 ms wait in between takes around 6 seconds, compared to 5 seconds with non-DSP. I am still looking into ways to boost the performance.
  • Extended the Sobel algorithm. It is now capable of working with 5x5 and 7x7 kernels.
  • I am currently working on implementing cvIntegral and extending the DFT. Tested the algorithm for calculating the integral image. Some more work is needed before it can be applied to images.
  • Looked into integrating my library with the existing OpenCV library. Made some modifications to the existing library so that it conditionally calls my library after some error checking and an environment-variable check. Had some issues with CMake, which I have mentioned under blockers.
  • Look further into performance hurdles and try to overcome them.
  • Expand the integral and DFT algorithms.
  • Look into their performance and compare it with the non-DSP OpenCV algorithms.
  • When trying to build the OpenCV library after integrating my library with the existing algorithms, the linker was not able to find my library. After spending almost a day on that, I gave up and moved on to other tasks. I will look into the details of the CMake build procedure and the changes needed once I am done with the other algorithms. Meanwhile, I plan to look into it only during free time.
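
For reference, the integral image that cvIntegral computes can be sketched in a few lines of plain C. This is a straightforward reference version, not the DSP implementation. The function name integral_u8 is hypothetical. The output layout, (height + 1) x (width + 1) with a zero first row and column, follows the convention cvIntegral uses for its sum image.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference implementation for 8-bit single-channel input.
 * `sum` must hold (height + 1) * (width + 1) elements. Entry (y, x) of
 * `sum` is the total of all source pixels above and to the left of
 * (y, x), so row 0 and column 0 are all zeros. */
void integral_u8(const uint8_t *src, size_t width, size_t height,
                 uint32_t *sum)
{
    size_t sum_width = width + 1;
    size_t x, y;

    for (x = 0; x < sum_width; ++x)
        sum[x] = 0;

    for (y = 0; y < height; ++y) {
        uint32_t row_total = 0;   /* running sum of the current row */

        sum[(y + 1) * sum_width] = 0;
        for (x = 0; x < width; ++x) {
            row_total += src[y * width + x];
            sum[(y + 1) * sum_width + (x + 1)] =
                sum[y * sum_width + (x + 1)] + row_total;
        }
    }
}
```

A 32-bit accumulator is enough here: even a 640x480 image of all-255 pixels sums to about 78 million.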

Sunday, June 20, 2010

Weekly Report 5

Status and Accomplishment
  • Separated the ARM-side library and the application. Tested with different images. Modified the Makefile to generate the ARM-side library and the app executable separately. The source is available online.
  • Wrote the build procedure, which is also available online.
  • Modified the library so that it can be applied to images captured or loaded using the OpenCV API. Tested the library with image frames from a webcam. The Sobel algorithm works well for images of type CV_8UC1 and CV_8SC1. Its output has the same format as the input, whereas the original OpenCV library changes the depth to 16 bits. I need to convert the image depth for seamless integration with OpenCV.
  • Benchmark the performance of my library and compare it with that of the non-DSP OpenCV library.
  • Integrate my library with the OpenCV library.
  • As my earlier DFT was 1-D, I need to work on it so that it can be applied to images.
  • So far there is no blocker, but I may have some problems while integrating my library with the OpenCV library. I still don't have a clear picture of how it is to be implemented.
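
The depth issue noted in this report has a simple cause: a 3x3 Sobel response on 8-bit data can reach +/-1020, which does not fit in 8 bits. The sketch below is a plain C reference for the horizontal derivative with 16-bit output. It is not the IMGLib kernel. The function name is hypothetical, and leaving the one-pixel border at zero is a simplification that differs from OpenCV's own border handling.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference: 3x3 Sobel x-derivative, 8-bit unsigned input,
 * 16-bit signed output. Kernel rows: [-1 0 1], [-2 0 2], [-1 0 1].
 * Border pixels are left at zero (an assumption made for brevity). */
void sobel3x3_dx_u8_to_s16(const uint8_t *src, size_t width, size_t height,
                           int16_t *dst)
{
    size_t i, x, y;

    for (i = 0; i < width * height; ++i)
        dst[i] = 0;
    if (width < 3 || height < 3)
        return;

    for (y = 1; y + 1 < height; ++y) {
        const uint8_t *up = src + (y - 1) * width;
        const uint8_t *mid = src + y * width;
        const uint8_t *down = src + (y + 1) * width;

        for (x = 1; x + 1 < width; ++x) {
            int right = up[x + 1] + 2 * mid[x + 1] + down[x + 1];
            int left = up[x - 1] + 2 * mid[x - 1] + down[x - 1];

            dst[y * width + x] = (int16_t)(right - left);
        }
    }
}
```

A vertical black-to-white edge gives a response of 4 * 255 = 1020 at the edge, so writing the result back into an 8-bit image would clip or wrap it.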

Sunday, June 13, 2010

Weekly Report 4

Status and Accomplishment
  • Successfully captured frames from a Logitech Webcam Pro 9000 using OpenCV library functions.
  • Tested the code for the 1-D DFT using DSPLib. Also wrote IUNIVERSAL code for Sobel 3x3, 5x5 and 7x7 using IMGLib. Tested sobel_3x3 and found it to work well. Wrote an ARM-side application that passes an image pointer after saving the image in a contiguous memory block, and saves the processed image afterwards.
  • Ran the QualiTI test on the IUNIVERSAL library package for the written code. Made some changes to comply with the XDAIS standard.
  • Separate the app and the ARM-side library, with a proper namespace, for integration with OpenCV.
  • Look into the OpenCV data structures and do the necessary data mangling so that I can process them using functions from DSPLib and IMGLib.
  • Further explore the webcam and process the images received from it using the IUNIVERSAL-based engine.
  • I have a working copy of OpenCV on my Beagleboard, which I got from the opkg installer. But I want to build one using OpenEmbedded. When I tried "bitbake opencv" I got an error almost at the end, which I have not been able to solve.
  • Capturing continuous frames from my webcam gives an error message about an invalid Huffman code. There is also periodic flickering in the image, which I believe is due to this error. I am not sure how to solve it. There are a few other errors as well.

Tuesday, June 8, 2010

Capturing Frames from webcams using OpenCV

After many unsuccessful tries, today I finally captured some frames from my USB webcam. The webcam I am using is a Logitech Webcam Pro 9000. Newer Linux kernels, such as 2.6.32, support UVC-compliant webcams. The webcam is detected as soon as you plug it into the self-powered USB hub. I could confirm that my webcam was detected by typing

root@beagleboard:~# lsusb
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 002: ID 0409:005a NEC Corp. HighSpeed Hub
Bus 002 Device 003: ID 413c:1002 Dell Computer Corp. Keyboard Hub
Bus 002 Device 004: ID 0461:4d15 Primax Electronics, Ltd Dell Optical Mouse
Bus 002 Device 005: ID 046d:0809 Logitech, Inc.
Bus 002 Device 006: ID 0b95:772a ASIX Electronics Corp.
Bus 002 Device 007: ID 413c:2002 Dell Computer Corp. SK-8125 Keyboard

and for further confirmation that the UVC module is loaded I typed the following

root@beagleboard:/media/work/opencv# lsmod
Module Size Used by
g_ether 24916 0
nfsd 241018 8
nfs_acl 2173 1 nfsd
exportfs 3092 1 nfsd
ipv6 249183 10
rfcomm 33488 0
hidp 11193 0
l2cap 30104 4 rfcomm,hidp
bluetooth 49221 3 rfcomm,hidp,l2cap
rfkill 15030 1 bluetooth
rtc_twl 4451 0
rtc_core 12599 1 rtc_twl
uvcvideo 55469 0
mailbox_mach 4183 0
mailbox 3609 1 mailbox_mach

I had to struggle a little to successfully capture frames from my webcam. I searched many resources and finally came across a discussion thread in which a group member pointed to the setup procedure given by Don Lewis. Many thanks to the group member and Don Lewis. It was a relief after so many unsuccessful tries. I tried to follow exactly what it said. While compiling the program I got the following warning, but there were no errors.

root@beagleboard:/media/work/opencv# gcc -I/usr/include/opencv -g Capture.c -o test -lml -lcvaux -lhighgui -lcv -lcxcore
In file included from /usr/include/opencv/cxcore.h:70,
from /usr/include/opencv/cv.h:58,
from Capture.c:5:
/usr/include/opencv/cxtypes.h: In function 'cvRound':
/usr/include/opencv/cxtypes.h:228: warning: incompatible implicit declaration of built-in function 'lrint'
root@beagleboard:/media/work/opencv#

However, execution failed with the following errors and messages on the console.

root@beagleboard:/media/work/opencv# ./test
libv4lconvert: warning more framesizes then I can handle!
libv4lconvert: warning more framesizes then I can handle!
[27996.920166] keyboard.c: can't emulate rawmode for keycode 212
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
libv4l2: error allocating conversion buffer
mmap: Cannot allocate memory
[27997.052886] uvcvideo: Failed to query (1) UVC control 4 (unit 1) : -32 (exp. 4).
HIGHGUI ERROR: V4L2: Failed to set control "10094850": Input/output error (value 305)
HIGHGUI WARNING: Setting property 10094850 through v4l2 failed. Trying with v4l1 .
HIGHGUI ERROR: V4L: property #10094850 is not supported
munmap: Invalid argument
munmap: Invalid argument
munmap: Invalid argument
munmap: Invalid argument
Unable to stop the stream.: Bad file descriptor
munmap: Invalid argument
munmap: Invalid argument
munmap: Invalid argument
munmap: Invalid argument
libv4lconvert: warning more framesizes then I can handle!
libv4lconvert: warning more framesizes then I can handle!
[27997.192169] keyboard.c: can't emulate rawmode for keycode 212
libv4l1: error allocating v4l1 buffer: Cannot allocate memory
HIGHGUI ERROR: V4L: Mapping Memmory from video source error: Invalid argument
HIGHGUI ERROR: V4L: Initial Capture Error: Unable to load initial memory buffers.
failed to get a video frame
OpenCV Error: Null pointer (NULL array pointer is passed) in cvGetMat, file /OE/
src/cxcore/cxarray.cpp, line 2376
terminate called after throwing an instance of 'cv::Exception'
what(): /OE/angstrom-dev/work/armv7a-angstrom-linux-gnueabi/opencv-2.1.0+svnr3058-r0/opencv/src/cxcore/cxarray.cpp:2376: error: (-27) NULL array pointer is passed in function cvGetMat

Initially I could not figure out what caused the abort. I tried another example, but the problem was still the same.
I then changed the "mem" argument in U-Boot. Initially I had set "mem=80M"; I changed it to "mem=128M", rebooted my Beagleboard and ran the test again. This time I could successfully capture frames, although I could still see some messages on my console:

root@beagleboard:/media/work/opencv# ./test
libv4lconvert: warning more framesizes then I can handle!
libv4lconvert: warning more framesizes then I can handle!
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
[ 138.964233] uvcvideo: Failed to query (1) UVC control 4 (unit 1) : -32 (exp.
HIGHGUI ERROR: V4L2: Failed to set control "10094850": Input/output error (value
HIGHGUI WARNING: Setting property 10094850 through v4l2 failed. Trying with v4l1
HIGHGUI ERROR: V4L: property #10094850 is not supported

I am still not sure how to fix these warnings. I searched for this issue and found many topics about it, but I have not yet come across a solution. For now I am able to capture some frames using my webcam, and I need to work further on adding this to my application. One of the images I captured using my webcam is the one at the top.

Monday, June 7, 2010

Weekly Report 3

Status and Accomplishment

  1. The start of the week was devoted to resolving the issue with JTAG debugging.
  2. Then I looked into the details of the IUNIVERSAL examples.
  3. Wrote code for a 1-D DFT using IUNIVERSAL, taking the FIR example and the BitBlit project as references. Had some issues linking with the DSPLib library files, which were resolved by the end of the week. I could see some output, but it does not conform to what I am supposed to get. Tried to debug it but had a few issues while debugging, which I have mentioned below. I need to look further into it.
  4. By the end of the week, I bought a Logitech webcam, and I am trying to figure out how to capture a frame. Tried a couple of times to capture frames, taking references from resources on the internet, but with no success.
  1. Look further into the code I have written and expand it to accommodate other functions from IMGLib.
  2. Learn debugging on multi-core.
  3. Interface the webcam and capture frames.
  1. Although I did not get much time to look into debugging, I had some issues with debugging on multicore. I tried to debug, using CCS, the code that I had compiled in Linux. I loaded the code, but I could only see assembly code, which was very tough for me to manage and keep track of. Also, I could not figure out how to debug, as the process on the DSP side depends on a signal from the ARM side.

Wednesday, June 2, 2010

Debugging with CCS4 using XDS510USB+ for Beagleboard.

It has been a while since I posted anything. For more than a week since the official start date for coding, I was stuck on JTAG debugging and later on the RTSC wizard. Since the problem is solved now, I think it is worth mentioning here; it may be useful for someone who would like to use CCS4 with an XDS510USB+ to debug what is going on in the Beagleboard. I will try to elaborate on the problem I faced and how I overcame it.

Problem: After I launched the debugger, I was able to connect to the Cortex-A8 of the Beagleboard but not to the DSP. Later, I was not able to load my program even when I could connect to the DSP.


Updated my CCS4 to Version

Used gel files: sdomap35xx_c64plus.gel for DSP


Whenever I tried to connect to the DSP of the Beagleboard, I got an error message containing "cpu clock...". I knew that the DSP was held in reset and had to be released from it, but the GEL script for that did not work for me. I could see 'Scripts->OMAP35xx Functions->C64xPlusRelease_FromReset', but whenever I executed it there was no result and no message on the console. So I modified one of the DSPLink examples, powered on the Beagleboard and loaded the modules needed to run the DSPLink example. I started CCS4 and started the debug session. On the Beagleboard, I executed the sample, which releases the DSP from reset and keeps it in an infinite loop. In Code Composer Studio, I then clicked 'Connect Target', and CCS4 connected to the DSP. But I was still not able to load my program. Trying to load it gave me different error messages, such as "...DSP is in wait-in-reset mode...", and sometimes others.

The solution was simply to wait. I was so desperate to test my DSP that I never bothered to wait. If I tried to play with the DSP and load my program before the example code finished executing (which took almost 2 minutes, as the DSP was in an infinite loop and the GPP side exited with a failed status), I could not do anything after that. Every attempt and trick was unsuccessful, because I always tried to load my program right away and did not wait that long. Once the example had finished executing, the wait was over: I could then load any program and step through it.

Also worth mentioning, though I don't know why: after I terminate the debug session, my Beagleboard is dead and I have to hard-reset it to bring it back to life.

Sunday, May 30, 2010

Weekly Report 2

Status and Accomplishments
  • Built and ran the DSPLink, Codec Engine and iUNIVERSAL examples.
  • Tested DSPLib for DFT on the 64x+ cycle-accurate simulator.
  • Tested the RTSC codec package wizard and the RTSC server wizard.
  • Looked into the iUNIVERSAL examples for better understanding.
  • Look further into the iUNIVERSAL API.
  • Modify and rebuild the iUNIVERSAL examples so that they use a few of the DSPLib and IMGLib functions.
  • Use CCS and JTAG for debugging.
  • Look into the xDAIS standard.
  • Was unable to debug using JTAG. Looking for help on JTAG debugging; I have also posted a request for help.
  • Was unable to generate a DSP executable using the RTSC server package wizard. I have also posted a request for help.

Wednesday, May 19, 2010

My First Test With SVN

Today, I did my first test with SVN (a revision control system). This is the procedure I followed.

# sudo apt-get install subversion
# svnadmin create --fs-type fsfs /media/disk/OE/dspacceleration
# cd /media/disk/OE/dspacceleration
# svn checkout opencv-dsp-acceleration
# svn import -m "Testing svn_first time" /media/disk/OE/work

I was prompted for a username and password. My Google account password did not work for this. I had to look up a separate password online and click on "password" to see it. Finally, the test file was imported.

Monday, May 17, 2010

Are you using OpenEmbedded based development inside VMware Linux??

If you are also using an OpenEmbedded-based development environment inside a virtual Linux machine, there are a few things you should pay attention to before starting development. I came across several issues, which I try to summarize here.

1) Make sure you have plenty of disk space to accommodate the build. Plenty does not mean a few gigabytes: it should be a minimum of 30GB, and adding some more will not hurt. I had allocated 20GB separately for OpenEmbedded, which caused havoc in the middle of the build.
--> To solve this issue, I had to allocate more space. I now have 50GB and things seem to be fine.
2) If you are using VMware Player (VMware Workstation is suggested), you cannot directly expand your disk unless you have version 3 or newer. You can go to the settings before launching your VMware Linux and expand your drive there. But you then still have to create or extend a partition inside the guest OS (Ubuntu in my case) using 'gparted' or a similar partitioning tool.
3) Don't use a symbolic link to your newly created disk space. Instead, start building only after you #cd OE_Base_Dir_On_New_Partition. Otherwise, you may end up with an error similar to
make: GNUMakefile: Too many levels of symbolic links
make: stat: GNUMakefile: Too many levels of symbolic links.
make: ***No rule to make target 'GNUMakefile'. Stop.
Fatal: oe_runmake failed.
There is an issue with the coreutils-native-7.2.-r1 package: GNUMakefile ends up symbolically linked to itself if a symbolic link is used within the build path.
4) Try to allocate more than 512 MB of memory when working with VMware. Otherwise you may end up with the following error
cc1: out of memory allocating XXXXXXX bytes after a total of YYYYYYYY bytes
--> Issue solved by increasing the memory size in the VMware settings.

As OpenEmbedded-based development generally takes a lot of time, plan and act ahead to save time!

GSoC 2010 : OpenCV DSP Acceleration

Warm greetings to all,
I am writing a blog for the first time. I knew that I had to start some day, and I think the day has come. I am not used to blogging, and I apologize in advance if I am not able to convey my ideas clearly.

This summer, the summer of 2010, I will be working on OpenCV DSP acceleration. My proposal for OpenCV DSP Acceleration was selected for Google Summer of Code 2010 under my mentoring organization. I would like to thank my mentors Leonardo Estevez, Katie Roberts-Hoffman and Luis Gustavo Lira for their decision to give their precious time this summer.

My Work
The goal of this project is to accelerate OpenCV (an open-source computer vision library) using the on-chip C64x+ DSP of the OMAP3530. I will be using a Beagleboard, which houses heterogeneous processors: an ARM Cortex-A8 and a TMS320C64x+ DSP. I am planning to port a few OpenCV APIs, such as cvDFT(), cvSobel(), cvAvgSdv() and cvIntegral(), to the DSP. These accelerated libraries will then be used to demonstrate the success of the project by building an application for template matching. The application will continuously capture images from a camera and, after processing each captured image, match it against stored templates.

If you think you can help me with fantastic ideas and suggestions, I will really appreciate that. I will be continuously posting my progress and details. Please, don't forget to check back later for updates and for your suggestions.