X. W. NIU, J. FAN
Copyright © 2013 SciRes. OPJ
117
on the hotspot function are designed and mapped as a
co-processor. Experimental results show that the fast
Hadamard transform is a better candidate for hardware
acceleration: it consumes fewer on-chip resources, saves
76.52% of the energy and achieves an 8.09× speedup
compared with the pure software implementation. In future
work, an FPGA-based software profiling platform will be
set up to eliminate the drawbacks of hardware-based
profiling tools, and the system behavior when off-chip
memory is used for data storage will also be studied.