This article is part of the series Design and Architectures for Signal and Image Processing 2008.

Open Access Research Article

Design of a Real-Time Face Detection Parallel Architecture Using High-Level Synthesis

Nicolas Farrugia1, Franck Mamalet1*, Sébastien Roux1, Fan Yang2 and Michel Paindavoine2

Author Affiliations

1 MAchine to machine technologies Tangible Interactions expertiSe on devices Laboratory (MATIS), Orange Labs, 28 Chemin du Vieux Chêne, 38243 Meylan, France

2 Laboratory of Electronics Informatics Image (LE2i), Health-STIC Federative Research Institute (IFR100), Burgundy University-Engineer Science Center, 21078 Dijon, France

EURASIP Journal on Embedded Systems 2008, 2008:938256  doi:10.1155/2008/938256

The electronic version of this article is the complete one and can be found online at: http://jes.eurasipjournals.com/content/2008/1/938256

Received: 10 March 2008
Revisions received: 20 June 2008
Accepted: 12 November 2008
Published: 24 December 2008

© 2008 The Author(s).

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


We describe a high-level synthesis (HLS) implementation of a parallel architecture for face detection. The chosen face detection method is the well-known Convolutional Face Finder (CFF) algorithm, which consists of a pipeline of convolution operations. Starting from a dataflow model of the algorithm, we use a high-level synthesis tool to specify the local dataflows of our Processing Element (PE): inter-PE communication, fine scheduling of the successive convolutions, and memory distribution and bandwidth are all described in C. Using this approach, we explore several implementation alternatives to find a compromise between the processing speed and the area of the PE. We then build a parallel architecture composed of a ring of PEs and a FIFO memory, which constitutes a generic architecture capable of processing images of different sizes. A ring of 25 PEs running at 80 MHz can process 127 QVGA images per second or 35 VGA images per second.
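The throughput figures quoted above can be turned into a rough per-pixel cycle budget. The following is only a back-of-the-envelope sketch, not code from the article: it assumes the standard frame sizes (QVGA = 320×240, VGA = 640×480), which the abstract does not state, and the function names are hypothetical. The aggregate figure further assumes that all PEs of the ring are busy on every cycle, so it is best read as an upper bound.

```c
#include <assert.h>

/* Clock cycles available per input pixel when frames of the given
 * size are processed at the given rate.
 * clock_hz: PE clock frequency (80 MHz in the abstract).
 * width, height: frame size in pixels (assumed standard sizes).
 * fps: frames per second (127 for QVGA, 35 for VGA in the abstract). */
double cycles_per_pixel(double clock_hz, unsigned width,
                        unsigned height, unsigned fps)
{
    assert(width > 0 && height > 0 && fps > 0);
    double pixels_per_second = (double)width * height * fps;
    return clock_hz / pixels_per_second;
}

/* Aggregate PE-cycles per pixel for a ring of num_pes elements.
 * Assumption: every PE does useful work on every cycle, so this is
 * an upper bound on the work spent per pixel, not a measured value. */
double ring_cycles_per_pixel(double clock_hz, unsigned num_pes,
                             unsigned width, unsigned height,
                             unsigned fps)
{
    return num_pes * cycles_per_pixel(clock_hz, width, height, fps);
}
```

Under these assumptions, `cycles_per_pixel(80e6, 320, 240, 127)` is about 8.2 and `cycles_per_pixel(80e6, 640, 480, 35)` is about 7.4, so the ring as a whole has at most roughly 200 PE-cycles per pixel for the whole convolution pipeline at either resolution.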
