Monday, June 28, 2010

Weekly Report 6

Status and Accomplishments
  • Worked on performance benchmarking of the Sobel algorithm. To my surprise, the DSP version turned out to be slower than the non-DSP OpenCV Sobel implementation. I then changed the way the Codec Engine is called. Earlier, the Codec Engine was opened and closed around every call to the algorithm. Now it stays open throughout execution and is closed once at the end, when all the processing is done. Performance improved and is now close to the non-DSP OpenCV implementation: processing and displaying the video "tree.avi" (which ships with the OpenCV examples) frame by frame, with a 50 ms wait between frames, takes around 6 seconds compared to 5 seconds with non-DSP. I am still looking into ways to improve the performance further.
  • Extended the Sobel algorithm. It now works with 5x5 and 7x7 kernels.
  • I am currently working on implementing cvIntegral and extending DFT. Tested the algorithm for calculating the integral image. Some more work is needed before it can be applied to images.
  • Looked into integrating my library with the existing OpenCV library. Modified the existing library to conditionally call my library after some error checking and an environment-variable check. Had some issues with CMake, which I describe under Blockers.
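
The change in how the engine is opened and closed can be sketched in plain C. This is only an illustration of the pattern, not the actual library code: MockEngine, mock_open, mock_process and mock_close are hypothetical stand-ins and are not part of the Codec Engine API. The counter simply records how often the (assumed expensive) open step runs under each design.

```c
#include <stdio.h>

/* Hypothetical stand-in for an engine handle; NOT the Codec Engine API. */
typedef struct { int is_open; } MockEngine;

/* Counts how many times the (assumed costly) open step has run. */
static int g_open_calls = 0;

static void mock_open(MockEngine *e)  { e->is_open = 1; g_open_calls++; }
static void mock_close(MockEngine *e) { e->is_open = 0; }

/* Placeholder for per-frame work: 0 on success, -1 if the engine is closed. */
static int mock_process(MockEngine *e, int frame)
{
    (void)frame;
    return e->is_open ? 0 : -1;
}

/* Earlier design: open and close the engine around every frame. */
static int run_open_per_frame(int n_frames)
{
    MockEngine e = {0};
    int i;

    g_open_calls = 0;
    for (i = 0; i < n_frames; i++) {
        mock_open(&e);
        mock_process(&e, i);
        mock_close(&e);
    }
    return g_open_calls;   /* one open per frame */
}

/* Current design: open once, process every frame, close at the end. */
static int run_open_once(int n_frames)
{
    MockEngine e = {0};
    int i;

    g_open_calls = 0;
    mock_open(&e);
    for (i = 0; i < n_frames; i++) {
        mock_process(&e, i);
    }
    mock_close(&e);
    return g_open_calls;   /* a single open for the whole run */
}
```

For a clip of N frames, run_open_per_frame(N) pays the setup cost N times while run_open_once(N) pays it once, which is the saving described in the first item above.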
Plans
  • Look further into the performance hurdles and try to overcome them.
  • Expand the integral and DFT algorithms.
  • Look into their performance and compare it with the non-DSP OpenCV algorithms.
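
For reference, the integral image mentioned above can be sketched on the CPU in a few lines of C. This is a minimal single-channel, 8-bit version and not the DSP implementation; the function names integral_image and box_sum are hypothetical. It assumes the same layout cvIntegral uses for its sum output, which is one row and one column larger than the source, with the first row and column set to zero.

```c
#include <stddef.h>

/*
 * Compute the integral image of an 8-bit, single-channel image.
 * src: width * height pixels, row-major.
 * sum: (width + 1) * (height + 1) ints, row-major; row 0 and column 0 are zero.
 * After the call, sum[y][x] holds the sum of all src pixels in [0,x) x [0,y).
 */
static void integral_image(const unsigned char *src, int width, int height,
                           int *sum)
{
    int sw = width + 1;   /* row stride of the sum image */
    int x, y;

    for (x = 0; x <= width; x++) {
        sum[x] = 0;       /* zero top row */
    }
    for (y = 1; y <= height; y++) {
        int row = 0;      /* running sum of the current source row */
        sum[y * sw] = 0;  /* zero left column */
        for (x = 1; x <= width; x++) {
            row += src[(y - 1) * width + (x - 1)];
            sum[y * sw + x] = sum[(y - 1) * sw + x] + row;
        }
    }
}

/* Sum of the pixels in the rectangle [x0,x1) x [y0,y1), using four lookups. */
static int box_sum(const int *sum, int width,
                   int x0, int y0, int x1, int y1)
{
    int sw = width + 1;
    return sum[y1 * sw + x1] - sum[y0 * sw + x1]
         - sum[y1 * sw + x0] + sum[y0 * sw + x0];
}
```

Once the table is built, any rectangular sum costs four reads regardless of the rectangle size, which is what makes the integral image useful as a building block.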
Blockers
  • When trying to build the OpenCV library after integrating my library with the existing algorithm, the linker was not able to find my library. After spending almost a day on it, I gave up and moved on to other tasks. I will look into the details of the CMake build procedure and the changes it needs once I am done with the other algorithms. Meanwhile, I plan to look into it only during free time.

1 comment:

  1. To improve performance, you might try the asynchronous APIs (i.e., rather than UNIVERSAL_process(), consider queuing up multiple operations with UNIVERSAL_processAsync()/processWait()). http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/ce/latest_2_x/docs/html/group__ti__sdo__ce__universal___u_n_i_v_e_r_s_a_l.html

    Also, depending on your DSP-side algorithm's implementation, you may also get a lift by enabling the recently introduced Server.skelCachingPolicy feature and setting it to .WBINVALL. In some systems, with algs that manage large data buffers, this can improve performance. http://processors.wiki.ti.com/index.php/Codec_Engine_skelCachingPolicy
