Status and Accomplishment
- Tested the c6accel library. Its performance was similar to my implementation, but no better than the native OpenCV library. The relative performance measured for the 16-bit Sobel was 4209167/4577332 (about 0.920). With contiguous memory allocation it was 4209167/4498566 (about 0.936).
- Tried removing some of the cache writebacks but could not see any difference in performance.
- So, instead of using only the DSP for the algorithm, I split the task between the two processors by dividing the data to work on. I created two threads, one calling the ARM side and the other calling the DSP side. There was a slight improvement in performance, but it is still not better than the native ARM-side code. The performance achieved this time was 4209167/4467565 (about 0.942). One half of the output image was visually dissimilar to the other half in terms of edge contrast. I will upload this picture.
- I think the only way I can gain performance is by working on both processors: creating two APIs for the same function, one for the DSP and the other for the ARM.
- Worked on the application part too and started coding for it.
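The row split between two threads described above can be sketched as follows. This is only an illustration in Python, not the project code: `sobel_x_rows` and `sobel_x_split` are hypothetical names, both workers run the same pure-Python 3x3 horizontal Sobel, and the "ARM" and "DSP" labels are just stand-ins for the two call paths. Both workers read from the whole source image, so the rows at the seam see their real neighbours.

```python
import threading


def sobel_x_rows(src, dst, row_start, row_end):
    """3x3 horizontal Sobel for output rows [row_start, row_end).

    Reads neighbours from the full image; coordinates are clamped at the borders.
    """
    h, w = len(src), len(src[0])

    def clamp(v, hi):
        return max(0, min(v, hi))

    for y in range(row_start, row_end):
        for x in range(w):
            acc = 0
            # Vertical smoothing weights 1, 2, 1 applied to the horizontal difference.
            for dy, weight in ((-1, 1), (0, 2), (1, 1)):
                yy = clamp(y + dy, h - 1)
                left = src[yy][clamp(x - 1, w - 1)]
                right = src[yy][clamp(x + 1, w - 1)]
                acc += weight * (right - left)
            dst[y][x] = acc


def sobel_x_split(src):
    """Process the top half and the bottom half in two threads."""
    h, w = len(src), len(src[0])
    dst = [[0] * w for _ in range(h)]
    mid = h // 2
    workers = [
        # Hypothetical stand-in for the ARM-side call (top half).
        threading.Thread(target=sobel_x_rows, args=(src, dst, 0, mid)),
        # Hypothetical stand-in for the DSP-side call (bottom half).
        threading.Thread(target=sobel_x_rows, args=(src, dst, mid, h)),
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return dst
```

Because both halves use the same kernel here, they agree by construction. The sketch does not explain the mismatch between the two halves noted above; it only shows the shape of the split.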
Plans
- Instead of working only on the DSP, I am planning to use it for task offloading: creating an asynchronous API and fetching the result later.
- Look into performance and application.
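The "submit now, fetch the result later" idea from the plan above could look roughly like this. It is a minimal sketch in Python using a thread pool, not the actual DSP interface: `AsyncOffloader`, `fake_dsp_running_sum` and `process_with_offload` are hypothetical names, and the "DSP" kernel is an ordinary function standing in for the remote call.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_dsp_running_sum(row):
    """Hypothetical stand-in for a DSP-side kernel: running sum of one row."""
    out, total = [], 0
    for v in row:
        total += v
        out.append(total)
    return out


class AsyncOffloader:
    """Hypothetical wrapper: submit work, get a handle, collect the result later."""

    def __init__(self):
        # One worker models a single DSP that handles one request at a time.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def submit(self, fn, *args):
        return self._pool.submit(fn, *args)

    def close(self):
        self._pool.shutdown(wait=True)


def process_with_offload(row_for_dsp, row_for_arm):
    offloader = AsyncOffloader()
    try:
        # Kick off the offloaded job; this returns immediately.
        handle = offloader.submit(fake_dsp_running_sum, row_for_dsp)
        # Meanwhile, do unrelated work on the calling ("ARM") side.
        arm_result = [2 * v for v in row_for_arm]
        # Fetch the offloaded result only when it is needed.
        dsp_result = handle.result()
    finally:
        offloader.close()
    return arm_result, dsp_result
```

The caller only blocks in `handle.result()`, so any gain depends on having enough independent work to do between the submit and the fetch.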
Blockers
- Still not able to beat the ARM performance.
Status and Accomplishment
- Simplified the build instructions. Worked on the recipe to build the project.
- Worked on UNIVERSAL_processAsync(). Since I was passing a whole buffer, the async call does not seem to work for this scheme, as it is uncertain which function will be called next. I am now breaking the buffer down into chunks and working through them. But there is still confusion about the size of the chunks: for a 7x7 Sobel I need to pass at least 7 rows, whereas for a 3x3 I need at least 3, and for the DFT 1 is OK.
- Tried to make OpenCV's memory allocation use contiguous memory. I am getting a segfault somewhere in Memory_alloc() and need to figure it out.
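One way to think about the chunk size question above is to plan chunks in terms of output rows and add the extra input rows the kernel needs on each side. The sketch below is only an assumption about how this could be organised, not what the project does: `plan_chunks` is a hypothetical helper, `kernel_rows` is the kernel height (7 for a 7x7 Sobel, 3 for a 3x3, 1 for a row-wise DFT), and input ranges are simply clipped at the image borders.

```python
def plan_chunks(height, kernel_rows, out_rows):
    """Return (in_start, in_end, out_start, out_end) row ranges, end-exclusive.

    Each chunk produces `out_rows` output rows (fewer for the last chunk) and
    needs kernel_rows // 2 extra input rows above and below them.
    """
    if kernel_rows < 1 or kernel_rows % 2 == 0:
        raise ValueError("kernel_rows must be a positive odd number")
    if out_rows < 1:
        raise ValueError("out_rows must be at least 1")

    halo = kernel_rows // 2
    chunks = []
    for out_start in range(0, height, out_rows):
        out_end = min(out_start + out_rows, height)
        # Clip the input window at the top and bottom of the image.
        in_start = max(0, out_start - halo)
        in_end = min(height, out_end + halo)
        chunks.append((in_start, in_end, out_start, out_end))
    return chunks
```

With one output row per chunk, a 7x7 kernel in the middle of the image gets a 7-row input window, and a kernel height of 1 gets a single row, which matches the minimums mentioned above. Larger `out_rows` values reduce how often the overlapping rows are passed again.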
Plans
- Try to come up with a solution for the async call and contiguous memory allocation.
- Plan for the application part.
Blockers
- Since I did not hear further from Kitware, I incorporated the integration part in a makefile. Using this file, the integration and rebuild can be done.
- Need to work on the above-mentioned problems and come up with the best solution.
Status and Accomplishment
- Implemented the 2-D DFT algorithm. Currently the output of the DFT is scaled, as DSPLIB gives scaled output. I am planning to change the DFT and IDFT kernels in DSPLIB to be non-scaling so that the overhead of scaling back in my algorithm is reduced. Reviewed the integral and Sobel algorithms.
- Committed the source code and application examples for all the algorithms. The instructions were updated accordingly.
- Integrated the library with the existing OpenCV 2.1, and the newly built OpenCV library supports the DSP for 3 algorithms. The patch is provided in the patch sub-directory inside the trunk. However, due to the workaround for the CMake issue, there is some extra work to be done to build the new library. It is mentioned in the build instructions.
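To illustrate the scaling overhead mentioned for the 2-D DFT, here is a small Python sketch of a row-column 2-D DFT built from a 1-D transform that returns scaled output. The scale factor is an assumption: `dft_1d_scaled` divides by the transform length purely as a stand-in for a scaled kernel, and it is not meant to reproduce what DSPLIB actually does. `dft_2d` is also a hypothetical name.

```python
import cmath


def dft_1d_scaled(x):
    """Naive 1-D DFT whose output is divided by len(x) (assumed scaling)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) / n
        for k in range(n)
    ]


def dft_2d(image, undo_scaling=True):
    """2-D DFT as 1-D DFTs over the rows, then over the columns."""
    rows, cols = len(image), len(image[0])
    row_pass = [dft_1d_scaled(r) for r in image]
    col_pass = [
        dft_1d_scaled([row_pass[r][c] for r in range(rows)]) for c in range(cols)
    ]
    # col_pass is indexed [column][row]; transpose back to [row][column].
    out = [[col_pass[c][r] for c in range(cols)] for r in range(rows)]
    if undo_scaling:
        # The two scaled passes divided by cols and rows; multiply back.
        scale = rows * cols
        out = [[v * scale for v in row] for row in out]
    return out
```

The `undo_scaling` step is the extra pass over the output that a non-scaling kernel would make unnecessary.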
Plans
- It seems my instructions are a little complex, so I am planning to simplify them. Try to update the bitbake recipe that Koen wrote so that OpenCV can be rebuilt using the new patch.
- Look into optimization. Try async-universalprocess.
Blockers
- Correspondence with Kitware is still going on regarding the CMake issue. For now, I am editing the makefiles generated by CMake for the integration of the libraries.
Status and Accomplishment
- Looked further into performance factors. Looked into DMAN3 and ACPY3 and tried to implement them, but later gave up, considering that implementing these for internal buffers would not have much effect on performance.
- Implemented cvIntegral and committed it at http://code.google.com/p/opencv-dsp-acceleration/source/checkout. It works fine for an image depth of 8 bits. I need to look into its performance now.
- Implemented the DFT algorithm. Since it involves floating-point normalization, floating-point to Q15 conversion, scaling the result, Q15 to floating-point conversion and unnormalization, I have doubts about its performance. This extra work comes from using the fixed-point C64x+ DSP. Implementing the 2-D DFT is taking a little more time than expected.
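The conversion steps listed for the DFT can be sketched as a round trip in Python. This is a simplified illustration under stated assumptions, not the project code: the helper names are hypothetical, normalization divides by the peak magnitude, Q15 values are rounded and saturated to the signed 16-bit range, and the fixed-point kernel and the result scaling step are left out.

```python
Q15_ONE = 1 << 15  # 32768; Q15 represents values in [-1, 1)


def float_to_q15(x):
    """Convert a float in [-1, 1] to a saturated Q15 integer."""
    v = int(round(x * Q15_ONE))
    return max(-Q15_ONE, min(Q15_ONE - 1, v))


def q15_to_float(q):
    """Convert a Q15 integer back to a float."""
    return q / Q15_ONE


def normalize(values):
    """Scale values into [-1, 1]; return the scaled list and the peak used."""
    peak = max(abs(v) for v in values) or 1.0
    return [v / peak for v in values], peak


def round_trip(values):
    """Float -> normalized -> Q15 -> float -> unnormalized."""
    normed, peak = normalize(values)
    q = [float_to_q15(v) for v in normed]
    # A fixed-point kernel (for example the DFT) would run on `q` here.
    return [q15_to_float(v) * peak for v in q]
```

Each of these passes touches every sample once before and once after the kernel, which is where the performance concern above comes from.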
Plans
- Apart from a couple of days at the beginning, my aim this week is to integrate the libraries with the OpenCV library. I gave up on it last week after working on it for almost a day. The main focus this week will be on integration. Besides that, I will also look into code clean-up and error checking.
Blockers
- As I am planning to look into integration this week, I may need some help with this part. The main blocker last week was the CMake build system. Since then I have been looking into different issues and look forward to solving it in the coming week.