Monday, August 16, 2010

Weekly Report 13

  • Take a couple of days off.
  • Continue adding other APIs and refining the existing work.
  • Provide support and help to anyone interested in this work.

Monday, August 9, 2010

Weekly Report 12

Status and Accomplishment
  • Completed porting all of my algorithms to C6accel. Added a new algorithm, cvCvtColor(), to support the CV_RGB2GRAY operation. Added VLIB support to the C6accel library: VLIB_integralImage8() is used for the cvIntegral() implementation, with a flag to control its use. Users need to request VLIB access separately. Implemented chaining of the OpenCV APIs cvCvtColor() and cvSobel(); DSP_cvCvtColor_cvSobel() demonstrates this implementation. This scheme reduces the codec-engine overhead between API calls.
  • Worked on documentation of the API and of the procedure for adding a new API to the existing library.
  • Worked on the application that demonstrates the use of these APIs.
  • All API calls now take an almost constant ~380 usec to set up the asynchronous DSP function call. The ARM and DSP must be synchronized with DSP_cvSyncDSP() before the output is accessed. This can give a performance boost of more than 10x for any algorithm on a 640x480 image, provided the application handles data dependencies wisely.
  • Waiting for a C6accel tag to be created before releasing the code for evaluation.
  • Refine the documentation.
  • Review the code.
  • Look into the DFT algorithm to avoid a race condition, as I can see the result only when CE_DEBUG=3.
  • Suggestions on my issue with the DFT algorithm, as mentioned in the Plans section, would be helpful.
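
The chaining idea from this report can be sketched in plain C. This is only an illustration of fusing the two steps into one call so that the intermediate gray image never has to cross the ARM/DSP boundary; it is not the source of DSP_cvCvtColor_cvSobel(). The function name chained_gray_sobel_x(), the fixed-point gray weights, and the zeroed border are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Fixed-point approximation of Y = 0.299 R + 0.587 G + 0.114 B.
 * The weights 77, 150 and 29 sum to 256, so the result fits in 8 bits. */
static uint8_t rgb_to_gray(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)((77u * r + 150u * g + 29u * b) >> 8);
}

/* Hypothetical fused routine: converts interleaved RGB to gray and applies
 * a 3x3 horizontal Sobel in a single call. Border pixels are set to 0 for
 * simplicity (an assumption; OpenCV handles borders differently).
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
int chained_gray_sobel_x(const uint8_t *rgb, int width, int height, int16_t *dst)
{
    if (!rgb || !dst || width <= 0 || height <= 0)
        return -1;

    size_t n = (size_t)width * (size_t)height;
    uint8_t *gray = malloc(n);
    if (!gray)
        return -1;

    /* Step 1: color conversion into a private intermediate buffer. */
    for (size_t i = 0; i < n; i++) {
        gray[i] = rgb_to_gray(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]);
        dst[i] = 0;
    }

    /* Step 2: Sobel x-derivative on the interior pixels. */
    for (int y = 1; y + 1 < height; y++) {
        for (int x = 1; x + 1 < width; x++) {
            const uint8_t *p = gray + (size_t)y * width + x;
            int gx = (p[-width + 1] - p[-width - 1])
                   + 2 * (p[1] - p[-1])
                   + (p[width + 1] - p[width - 1]);
            dst[(size_t)y * width + x] = (int16_t)gx;
        }
    }

    free(gray);
    return 0;
}
```

With two separate calls the gray image would be handed back to the caller and passed in again; in the fused form only the RGB input and the 16-bit output are exchanged.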

Monday, August 2, 2010

Weekly Report 11

Status and Accomplishment
  • OpenCV now allocates memory using CMEM in a contiguous region. This saves the overhead of copying the buffer: the same buffer allocated by OpenCV can be passed to the DSP. Functionality is fine, but when the main() process exits there is the following error message: 'CMEM Error: CMEM_exit() already called, check stderr output for earlier CMEM failure messages (possibly version mismatch).'
  • Most of the time went into investigating asynchronous (ASYNC) DSP calls. All DSP_OpenCV calls are now ASYNC. There will now be two APIs: the native OpenCV one, which is synchronous and runs on the ARM, and the ASYNC call to DSP_OpenCV. This allows tasks to execute in parallel and frees the ARM for other work. An API is provided to synchronize the DSP and the ARM. Setting up an ASYNC call takes only 274 us for a 320x240 monochrome image and 305 us for a 640x480 monochrome image, whereas synchronous processing with the native OpenCV 3x3 Sobel algorithm takes 2655 us and 8820 us respectively. That is a benefit of roughly 10x and >28x respectively for an algorithm, provided the task is scheduled properly with the latency of DSP processing in mind.
  • I am now working with the C6accel library. Thanks to the C6accel team for their support and for providing some tweaks to suit my needs.
  • Work more on performance and provide benchmarks for the implemented algorithms.
  • Work on documentation.
  • Work on the application.
  • No blockers as of now.
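
The arithmetic behind the ASYNC numbers in this report can be written as a small cost model. This is a rough sketch, not a measurement tool: the function names are made up for the example, and the DSP latency and the amount of other ARM work are hypothetical inputs that the report does not give.

```c
/* Ratio of the time the ARM spends in a synchronous native call to the
 * time it spends merely setting up the asynchronous DSP call. */
double arm_side_speedup(double sync_us, double async_setup_us)
{
    return sync_us / async_setup_us;
}

/* Wall-clock time when the ARM sets up the ASYNC call, does its own work
 * while the DSP runs, and then synchronizes. Both dsp_latency_us and
 * arm_work_us are hypothetical; the sync itself is assumed to be free. */
double overlapped_total_us(double async_setup_us, double dsp_latency_us,
                           double arm_work_us)
{
    double longer = dsp_latency_us > arm_work_us ? dsp_latency_us : arm_work_us;
    return async_setup_us + longer;
}
```

With the reported figures, arm_side_speedup(2655, 274) is about 9.7 and arm_side_speedup(8820, 305) is about 28.9. The second function shows why scheduling matters: the gain is only realized when the ARM has enough independent work to cover the DSP latency before it synchronizes.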

Monday, July 26, 2010

Weekly Report 10

Status and Accomplishment
  • Tested the C6accel library. Its performance was similar to that of my own implementation, but not better than the native OpenCV library. The relative performance measured was 4209167/4577332 for 16-bit Sobel. With contiguous memory allocation the performance was 4209167/4498566.
  • Tried removing some of the cache writebacks but could not see any difference in performance.
  • So, instead of using only the DSP for the algorithm, I split the task between the two processors by dividing the data to work on. I created two threads, one calling the ARM side and the other calling the DSP side. There was a slight improvement in performance, but it was still not better than the native ARM-side code. The performance achieved this time was 4209167/4467565. One half of the output image was visually dissimilar to the other half in terms of edge contrast. I will upload this picture.
  • I think the only way I can gain performance is by working on both processors, creating two APIs for the same function: one for the DSP and the other for the ARM.
  • Worked on the application part too. Started coding for it.
  • Instead of working only on the DSP, I am planning to use it for task offloading: creating an asynchronous API and fetching the result later.
  • Look into performance and the application.
  • Still not able to beat the ARM performance.
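
One detail a two-processor split has to get right is the seam between the halves: a k x k filter needs k/2 rows from the neighbouring half to produce the rows next to the boundary. The sketch below computes the row ranges for such a split. It is only an illustration under that assumption; the type and function names are hypothetical, and it is not claimed to explain the contrast difference seen in the output.

```c
typedef struct {
    int out_begin, out_end; /* rows this worker writes, end exclusive */
    int in_begin, in_end;   /* rows this worker must read, end exclusive */
} row_range;

/* Clamp a row index to the valid range [0, height]. */
static int clamp_row(int v, int height)
{
    return v < 0 ? 0 : (v > height ? height : v);
}

/* Split an image of `height` rows into a top and a bottom half for a
 * ksize x ksize filter. Each half reads ksize / 2 extra rows across the
 * seam so both workers see the same neighbourhood at the boundary. */
void split_rows_two_way(int height, int ksize, row_range *first, row_range *second)
{
    int halo = ksize / 2;
    int mid = height / 2;

    first->out_begin = 0;
    first->out_end = mid;
    first->in_begin = 0;
    first->in_end = clamp_row(mid + halo, height);

    second->out_begin = mid;
    second->out_end = height;
    second->in_begin = clamp_row(mid - halo, height);
    second->in_end = height;
}
```

For a 480-row image and a 3x3 kernel, the top half writes rows 0 to 239 but reads rows 0 to 240, and the bottom half writes rows 240 to 479 but reads from row 239.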

Monday, July 19, 2010

Weekly Report 9

Status and Accomplishment
  • Simplified the build instructions. Worked on the recipe to build the project.
  • Worked on UNIVERSAL_processAsync(). Since I was passing a whole buffer, the async call does not seem to work for this scheme, as it is uncertain which function will be called next. I am now breaking the buffer down into chunks and working through them. But there is still confusion about the size of the buffer: for 7x7 Sobel I need to pass at least 7 rows, whereas for 3x3 I need at least 3, and for DFT 1 is OK.
  • Tried to make OpenCV allocate its memory in contiguous memory. I am getting a seg fault somewhere in Memory_alloc() and need to figure it out.
  • Try to come up with a solution for the async call and contiguous memory allocation.
  • Plan for the application part.
  • Since I did not hear further from Kitware, I incorporated the integration part in a makefile. Using this file, the integration and re-build can be done.
  • Need to work on the above-mentioned problems and come up with the best solution.
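
The chunk-size question above comes down to simple row arithmetic. A minimal sketch, assuming a ksize x ksize filter that only produces output rows whose full neighbourhood is present, and treating the row-wise DFT as the ksize = 1 case; both function names are hypothetical.

```c
/* Minimum number of input rows needed to produce one output row:
 * 7 for 7x7 Sobel, 3 for 3x3 Sobel, 1 for a row-wise DFT. */
int min_rows_for_kernel(int ksize)
{
    return ksize;
}

/* Number of chunks needed to cover an image of `height` rows when each
 * chunk carries `chunk_rows` input rows. A chunk yields
 * chunk_rows - (ksize - 1) output rows, so consecutive chunks overlap by
 * ksize - 1 input rows. Returns -1 if a chunk is too small for the kernel. */
int chunk_count(int height, int chunk_rows, int ksize)
{
    if (ksize < 1 || height < 0 || chunk_rows < min_rows_for_kernel(ksize))
        return -1;

    int valid_rows = height - (ksize - 1);
    if (valid_rows <= 0)
        return 0;

    int rows_per_chunk = chunk_rows - (ksize - 1);
    return (valid_rows + rows_per_chunk - 1) / rows_per_chunk;
}
```

Passing the bare minimum of rows is the most expensive choice: with 3-row chunks a 480-row image needs 478 calls for 3x3 Sobel, while 16-row chunks cover 7x7 Sobel in 48 calls.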

Monday, July 12, 2010

Weekly Report 8

Status and Accomplishment
  • Implemented the 2-D DFT algorithm. Currently the output of the DFT is scaled, as DSPLIB gives scaled output. I am planning to change the DFT and IDFT kernels in DSPLIB to be non-scaling, so that the overhead of scaling back in my algorithm is reduced. Reviewed the integral and Sobel algorithms.
  • Committed the source code and application examples for all the algorithms. The instructions were updated accordingly.
  • Integrated the library with the existing OpenCV 2.1, and the newly built OpenCV library supports the DSP for 3 algorithms. The patch is provided in the patch sub-directory inside the trunk. However, due to the workaround for the CMake issue, there is some extra work to be done to build the new library. It is mentioned in the build instructions.
  • It seems my instructions are a little complex, so I am planning to simplify them. Try to update the bitbake recipe that Koen wrote so that OpenCV can be re-built using the new patch.
  • Look into optimization. Try async-universalprocess.
  • Correspondence with Kitware is still going on regarding the CMake issue. For now, I am editing the makefiles generated by CMake to integrate the libraries.
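
To make the scaling overhead concrete, here is a toy 4x4 example of a 2-D DFT built from a scaled 1-D kernel. It is a sketch under stated assumptions: the 1-D kernel is a hand-written 4-point DFT that divides its output by 4 (standing in for a scaled library kernel, not DSPLIB's actual scaling rule), and all names are hypothetical.

```c
#include <stddef.h>

typedef struct { float re, im; } cpx;

/* In-place 4-point DFT over v[0], v[stride], v[2*stride], v[3*stride].
 * The output is divided by 4 to mimic a kernel that returns scaled
 * results. The twiddle factors for N = 4 are 1, -i, -1 and i. */
static void dft4_scaled(cpx *v, size_t stride)
{
    cpx a = v[0], b = v[stride], c = v[2 * stride], d = v[3 * stride];

    v[0]          = (cpx){ (a.re + b.re + c.re + d.re) / 4,
                           (a.im + b.im + c.im + d.im) / 4 };
    v[stride]     = (cpx){ (a.re + b.im - c.re - d.im) / 4,
                           (a.im - b.re - c.im + d.re) / 4 };
    v[2 * stride] = (cpx){ (a.re - b.re + c.re - d.re) / 4,
                           (a.im - b.im + c.im - d.im) / 4 };
    v[3 * stride] = (cpx){ (a.re - b.im - c.re + d.im) / 4,
                           (a.im + b.re - c.im - d.re) / 4 };
}

/* 2-D DFT of a row-major 4x4 image: transform the rows, then the
 * columns, then undo the two 1/4 scalings with one extra pass. */
void dft2d_4x4(cpx *img)
{
    for (size_t r = 0; r < 4; r++)
        dft4_scaled(img + 4 * r, 1);
    for (size_t c = 0; c < 4; c++)
        dft4_scaled(img + c, 4);

    for (size_t i = 0; i < 16; i++) { /* the scaling-back overhead */
        img[i].re *= 16.0f;
        img[i].im *= 16.0f;
    }
}
```

The final loop is the extra work that a non-scaling kernel would make unnecessary.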

Monday, July 5, 2010

Weekly Report 7

Status and Accomplishment
  • Looked further into performance factors. Looked into DMAN3 and ACPY3 and tried to implement them, but later gave up, considering that implementing these for internal buffers would not have much effect on performance.
  • Implemented cvIntegral and committed it. It works fine for an image depth of 8 bits. I now need to look into its performance.
  • Implemented the DFT algorithm. Since it involves floating-point normalization, float-to-Q15 conversion, scaling of the result, Q15-to-float conversion, and unnormalization, I have doubts about its performance. These extra steps are needed because the C64x+ DSP is used. Implementing the 2-D DFT is taking a little more time than expected.
  • Apart from a couple of days at the beginning, my aim this week is to integrate the libraries with the OpenCV library. I gave this up last week after working on it for almost a day. The main focus this week will be on integration. Besides that, I will also look into code clean-up and error checking.
  • As I am planning to look into integration this week, I may need some help with this part. The main blocker last week was the CMake build system. Since then I have been looking into different issues and hope to solve it in the coming week.
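
The conversion steps listed for the DFT can be sketched as a round trip between float and Q15. This is a minimal illustration, assuming normalization by the largest absolute value in the buffer; the function names are hypothetical and the actual scheme used in the project may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a float in [-1, 1) to Q15 with rounding and saturation. */
int16_t float_to_q15(float x)
{
    float v = x * 32768.0f;
    int32_t q = (int32_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
    if (q > 32767)
        q = 32767;
    if (q < -32768)
        q = -32768;
    return (int16_t)q;
}

/* Convert a Q15 value back to float. */
float q15_to_float(int16_t q)
{
    return (float)q / 32768.0f;
}

/* Normalize `in` by its largest absolute value and convert to Q15.
 * Returns the scale factor needed to unnormalize later (1 if all zero). */
float normalize_to_q15(const float *in, size_t n, int16_t *out)
{
    float scale = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float a = in[i] < 0.0f ? -in[i] : in[i];
        if (a > scale)
            scale = a;
    }
    if (scale == 0.0f)
        scale = 1.0f;

    for (size_t i = 0; i < n; i++)
        out[i] = float_to_q15(in[i] / scale);
    return scale;
}

/* Convert Q15 data back to float and undo the normalization. */
void unnormalize_from_q15(const int16_t *in, size_t n, float scale, float *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = q15_to_float(in[i]) * scale;
}
```

Every element is touched twice on the way in and once more on the way out, which is the overhead the report is worried about on top of the transform itself.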