# ENHANCE IMPLEMENTATION OF EMBEDDED CONCURRENT DES FUNCTIONAL UNITS USING SPATIAL PARALLELISM APPROACH ON FPGA FOR BETTER THROUGHPUT

RANA KHAZAAL KHUDHAIR

UNIVERSITI MALAYSIA PERLIS 2015



# RANA KHAZAAL KHUDHAIR (1432321196)

A dissertation submitted in partial fulfilment of the requirements for the degree of Master of Science (Embedded System Design Engineering)

School of Computer and Communication Engineering **UNIVERSITI MALAYSIA PERLIS** 2015

**Author's full name:**

**Date of birth:**

**Title:**

**Academic Session:**

I hereby declare that the thesis becomes the property of Universiti Malaysia Perlis (UniMAP) and to be placed […]. This thesis is classified as:

| Classification | Description |
|----------------|-------------|
| CONFIDENTIAL   | (Contains confidential information under the Official Secret Act 1972)* |
| RESTRICTED     | (Contains restricted information as specified by the organization where research was done)* |
| OPEN ACCESS    | I agree that my thesis is to be made immediately available as […] open access (full text) |

I, the author, give permission to the UniMAP to reproduce this thesis in whole or in part for the purpose […] during a period of \_\_\_ years, if so requested above).

Certified by:

| SIGNATURE                   | SIGNATURE OF SUPERVISOR |
|-----------------------------|-------------------------|
| (NEW IC NO. / PASSPORT NO.) | NAME OF SUPERVISOR      |
| Date:                       | Date:                   |

# UNIVERSITI MALAYSIA PERLIS

# ACKNOWLEDGMENT

This work would not have been possible without the encouragement and support of so many people. This is my opportunity to thank you all.

I wish to express my deepest gratitude and appreciation to my supervisor, *Dr. Muataz S. Hameed*, for his guidance, suggestions, continuous support, and encouragement throughout the research work.

I would also like to thank the Dean of the School of Computer and Communication Engineering, *Prof. Dr. R. Badlishah Ahmad*, the Programme Chairman of Postgraduate Studies, *Dr. Phaklen Ehkan*, and all the lecturers of the school for their support.

Many people supported me during this work, and some deserve a special mention, particularly my husband, *Dr. Ahmed Azeez Ahmed*, for his invaluable support, encouragement, and love. I am grateful to my dear parents (*Dr. Khazaal Al-Janabi and Madam Wedad Muhammed*) and to my sisters and brothers for their support and encouragement. My special thanks go to my lovely daughters (*Husna and Asma*) for their patience and encouragement throughout my studies. I wish them all good luck and happiness in their lives.

Finally, I would like to thank our second country, *Malaysia*, for giving us safety, peace, and the chance to complete this study.


# **TABLE OF CONTENTS**

|                                                | PAGE |
|------------------------------------------------|------|
| THESIS DECLARATION                             | I    |
| ACKNOWLEDGMENT                                 | II   |
| TABLE OF CONTENTS                              | III  |
| LIST OF TABLES                                 | VI   |
| LIST OF FIGURES                                | VII  |
| LIST OF ABBREVIATIONS                          | X    |
| ABSTRAK                                        | XII  |
| ABSTRACT                                       | XIII |
| CHAPTER 1 INTRODUCTION                         | 1    |
| 1.1 Overview                                   | 1    |
| 1.2 Problem Statement                          | 4    |
| 1.3 Objectives                                 | 5    |
| 1.4 Scope                                      | 5    |
| 1.5 Dissertation Outlines                      | 5    |
| CHAPTER 2 LITERATURE REVIEW                    | 6    |
| 2.1 Introduction                               | 6    |
| 2.2 Parallel Computing Design                  | 6    |
| 2.2.1 Concepts of Parallel Computing           | 8    |
| 2.2.1.1 Machine Model                          | 9    |
| 2.2.1.2 Flynn's Classical Taxonomy             | 11   |
| 2.2.2 Scope of Parallel Computing Applications | 13   |
| 2.3 Spatial Parallelism                        | 15   |
| 2.4 Encryptions Techniques | 17 |
| 2.4.1 Overview of Cryptography | 17 |
| 2.4.2 The State of Cryptography | 18 |
| 2.4.3 Need of Cryptography | 18 |
| 2.4.4 Cryptography Techniques | 19 |
| 2.4.4.1 Symmetric Methods | 19 |
| 2.4.4.2 Asymmetric Forms | 20 |
| 2.5 Data Encryption Standard (DES) | 21 |
| 2.5.1 DES Algorithm History | 21 |
| 2.5.2 Steps of DES Algorithm Work | 23 |
| 2.5.3 How DES Works in Detail | 24 |
| 2.6 Design Challenges using FPGAs | 35 |
| 2.7 Critical Survey on Embedded Platforms using DES | 38 |
| 2.7.1 Digital Signal Processing Chip | 38 |
| 2.7.2 FPGA Technology | 39 |
| 2.7.3 Smart Card | 42 |
| 2.7.4 Graphic Processing Unit (GPU) | 43 |
| 2.8 Summary | 44 |
| CHAPTER 3 RESEARCH METHODOLOGY | 45 |
| 3.1 Introduction | 45 |
| 3.2 Design Overview | 45 |
| 3.3 Design Flow (Procedure) | 48 |
| 3.3.1 Design Entity (VHDL) | 49 |
| 3.3.2 Verification | 50 |
| 3.3.3 Synthesis | 50 |
| 3.3.4 Design Implementation | 50 |
| 3.4 Design Entry (Component Net list) | 51 |
| 3.5 Remarks and Notices | 64 |
| 3.6 FPGA Implementation | 66 |
| CHAPTER 4 RESULTS AND DISCUSSIONS | 70 |
| 4.1 Introduction | 70 |
| 4.2 Verification Results (for all Design parts) on CAD tool | 70 |
| 4.3 Board Testing | 83 |
| 4.4 Summary                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                                                                                                                                                                       | 90 |
| CHAPTER 5 CONCLUSIONS AND RECOMMANDATIONS                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
                                                                                                                                                                       | 91 |
| <ul> <li>4.3 Board Testing</li> <li>4.4 Summary</li> <li>CHAPTER 5 CONCLUSIONS AND RECOMMANDATIONS</li> <li>5.1 Conclusions</li> </ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                       | 91 |
| 5.2 Future Work                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                                                                                                                                       | 92 |
| CHAPTER 5 CONCLUSIONS AND RECOMMANDATIONS<br>5.1 Conclusions<br>5.2 Future Work<br>REFERENCES<br>Christeenis protected by official<br>Christeenis protected by officia | 93 |
| OTHIS                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
                                                                                                                                                                       |    |

# LIST OF TABLES

| NO.  | TITLE                                                                                   | PAGE |
|------|-----------------------------------------------------------------------------------------|------|
| 2.1  | Input Permute Block Pattern                                                             | 25   |
| 2.2  | Permuted Choice 1 (PC_1)                                                                | 26   |
| 2.3  | The Sub-Key Rotation Table                                                              | 26   |
| 2.4  | Permuted Choice 2 (PC_2)                                                                | 27   |
| 2.5  | E_Bit-Selection Table                                                                   | 29   |
| 2.6  | S-Box1                                                                                  | 30   |
| 2.7  | S-Boxes (S1...S8)                                                                       | 31   |
| 2.8  | P-Permuted Table                                                                        | 34   |
| 2.9  | Inverse Permutation IP<sup>-1</sup>                                                     | 34   |
| 2.10 | The Comparison of Different DES Algorithm Implementation by Different Embedded Platform | 44   |
| 4.1  | Implementation Report of DES Functional Units                                           | 83   |
| 4.2  | The First Obtained Output Data as a Cipher Text                                         | 84   |
| 4.3  | The Obtained Two Blocks of Output Data as a Cipher Text                                 | 86   |
| 4.4  | The Obtained Four Blocks of Output Data as a Cipher Text                                | 87   |
| 4.5  | The Comparison between the Present and Previous DES Algorithm Implementation            | 90   |

# LIST OF FIGURES

| NO.  | TITLE                                                                            | PAGE |
|------|----------------------------------------------------------------------------------|------|
| 2.1  | Exponential Growth of Supercomputing Power                                       | 8    |
| 2.2  | von Neumann Architecture                                                         | 9    |
| 2.3  | Single System Bus Evolution of the Architecture                                  | 11   |
| 2.4  | Matrix Defines the 4 Possible Classifications According to Flynn                 | 13   |
| 2.5  | Temporal and Spatial Parallelisms                                                | 17   |
| 2.6  | Simplified Model of Symmetric Key Cryptography                                   | 19   |
| 2.7  | Simplified Model of Asymmetric Key Cryptography                                  | 20   |
| 2.8  | DES Algorithm Data Flow                                                          | 24   |
| 2.9  | Creation of 16 Sub-Keys                                                          | 27   |
| 2.10 | Computation of $f$                                                               | 33   |
| 2.11 | FPGA Growth and Usage Trends                                                     | 36   |
| 2.12 | (a) DES Core Algorithm; (b) Single Round Expanded                                | 38   |
| 2.13 | TLV320AIC23 Codec Block Diagram (Courtesy Texas Instruments)                     | 39   |
| 2.14 | Full DES Design Schematic Generated by Xilinx ISE Tool                           | 40   |
| 2.15 | Voice Scrambling-Descrambling                                                    | 40   |
| 2.16 | DES Decryption Using Rolled Architecture                                         | 41   |
| 2.17 | DES Decryption Using Unrolled Architecture with Pipelining                       | 41   |
| 2.18 | Block Diagram for DES                                                            | 42   |
| 2.19 | Modified DES Algorithm Key Generation Process                                    | 43   |
| 2.20 | Sketch of NVIDIA GT200 Streaming Processors Array Architecture                   | 43   |
| 3.1  | Top Level Block Diagram                                                          | 46   |
| 3.2  | Concurrent DES Functional Units                                                  | 47   |
| 3.3  | Design Flow Chart                                                                | 49   |
| 3.4  | Structure of the Code                                                            | 51   |
| 3.5  | RTL View of DES Functional Unit                                                  | 52   |
| 3.6  | Initial Permutation                                                              | 53   |
| 3.7  | Key_Scheduling                                                                   | 54   |
| 3.8  | PC1 & PC2 Components                                                             | 55   |
| 3.9  | Round Function                                                                   | 56   |
| 3.10 | S-Boxes                                                                          | 57   |
| 3.11 | S1:S-Boxes                                                                       | 58   |
| 3.12 | desxor1:des_xor1                                                                 | 59   |
| 3.13 | pp:p_permutation                                                                 | 60   |
| 3.14 | desxor2:des_xor2                                                                 | 61   |
| 3.15 | 32 Bits Register                                                                 | 62   |
| 3.16 | Final Permutation                                                                | 62   |
| 3.17 | RAM 2-port                                                                       | 63   |
| 3.18 | RTL of the Clock Generation                                                      | 63   |
| 3.19 | Cyclone III FPGA Starter Board                                                   | 66   |
| 3.20 | Other Side of Cyclone III FPGA Starter Board                                     | 67   |
| 3.21 | Block Diagram of the NEEK Board                                                  | 67   |
| 3.22 | FPGA Architecture                                                                | 68   |
| 4.1A | Verification Result of Implemented Algorithm with 1<sup>st</sup> Input           | 71   |
| 4.1B | Verification Result of Implemented Algorithm with 1<sup>st</sup> Input           | 72   |
| 4.1C | Verification Result of Implemented Algorithm with 1<sup>st</sup> Input           | 73   |
| 4.1D | Verification Result of Implemented Algorithm with 1<sup>st</sup> Input           | 74   |
| 4.1E | Verification Result of Implemented Algorithm with 1<sup>st</sup> Input           | 75   |
| 4.2  | Clarified Section of Verification Result of Implemented Algorithm with 1<sup>st</sup> Input | 75 |
| 4.3A | Verification Result of Implemented Algorithm with 2<sup>nd</sup> Input           | 76   |
| 4.3B | Verification Result of Implemented Algorithm with 2<sup>nd</sup> Input           | 77   |
| 4.3C | Verification Result of Implemented Algorithm with 2<sup>nd</sup> Input           | 78   |
| 4.3D | Verification Result of Implemented Algorithm with 2<sup>nd</sup> Input           | 79   |
| 4.3E | Verification Result of Implemented Algorithm with 2<sup>nd</sup> Input           | 80   |
| 4.4  | Clarified Section of Verification Result of Implemented Algorithm with 2<sup>nd</sup> Input | 80 |
| 4.5  | Verification Result of Implemented Algorithm with Output                         | 81   |
| 4.6  | Timing Analysis of Concurrent DES Functional Units Implementation                | 81   |
| 4.7  | Compilation Report of Concurrent DES Functional Units                            | 82   |
| 4.8  | The Implementation Result of One DES Functional Unit                             | 85   |
| 4.9  | The Implementation Result of Two DES Functional Units Concurrently               | 86   |
| 4.10 | The Implementation Result of Four DES Functional Units Concurrently              | 88   |

### LIST OF ABBREVIATIONS

- AES Advanced Encryption Standard
- ANSI American National Standards Institute
- ASIC Application Specific Integrated Circuit
- ATM Automated Teller Machine
- CPU Central Processing Unit
- DES Data Encryption Standard
- DSP Digital Signal Processing Chip
- EFF Electronic Frontier Foundation
- ENIAC Electronic Numerical Integrator And Computer
- ep expansion permutation
- fp final permutation
- FPGA Field Programmable Gate Array
- GPU Graphics Processing Unit
- IBM International Business Machines
- IDB Input Data Buffer
- IDEA International Data Encryption Algorithm
- ip initial permutation
- JTAG Joint Test Action Group
- keysched key scheduling
- MIMD Multiple Instruction, Multiple Data
- MISD Multiple Instruction, Single Data
- NCD Network Computer Devices
- NGD Native Generic Database
- NGU Native Circuit Unit
- NIST National Institute of Standards and Technology
- NSA National Security Agency
- ODB Output Data Buffer
- PAR Place & Route program
- PC-1 Permuted Choice 1
- PC-2 Permuted Choice 2
- PCB Printed Circuit Board
- PLL Phase Locked Loop
- pp p-permutation
- RAM Random-Access Memory
- roundfunc round function
- RSA Rivest-Shamir-Adleman
- RTL Register Transfer Level
- S-Box Substitution Box
- SD Secure Digital card
- SIMD Single Instruction, Multiple Data
- SISD Single Instruction, Single Data
- SOC System On Chip
- UCF User Constraint File
- VHDL VHSIC Hardware Description Language
- VLSI Very Large Scale Integrated circuits
- WAN Wide Area Network

#### Meningkatkan Pelaksanaan Selaras Dengan DES Unit Rephrase Menggunakan Pendekatan Spatial Parallelisme di FPGA Untuk Dikendalikan Lebih Baik

#### ABSTRAK

Secara umum, keselamatan adalah berkenaan semua jenis maklumat dan data sistem. Kebanyakan piawaian keselamatan adalah terdiri daripada keterteraan hingga perdagangan dan komunikasi persendirian. Salah satu aspek penting untuk komunikasi yang selamat adalah kunci peribadi kriptografi. Baru-baru ini kebanyakan aplikasi dan piawaian keselamatan ditakrifkan kepada algoritma bebas, iaitu, membolehkan pilihan daripada satu set algoritma kriptografi untuk tujuan yang sama. Semenjak Piawaian Penyulitan Data (DES) adalah sistem kunci peribadi algoritma yang paling banyak digunakan, DES mempunyai peranan penting dalam aplikasi keselamatan. "Field Programmable Gate Arrays" (FPGA) adalah peranti perkakasan pembentukan semula fenomena menarik dalam pembangunan pembenaman. Dalam kajian ini, juga pelaksanaan DES dalam pengoptimuman algoritma yang telah dicapai melalui DES komponen unit replikasi hingga rephrase serentak DES unit berfungsi. Operasi ini telah dijalankan dengan menggunakan pendekatan keselarian spatial. Data input / output telah disimpan dalam RAM yang dipisahkan kepada dua komponen simpanan yang menyokong proses baca dan tulis serentak. Pendekatan ini adalah bagi mempercepatkan pemprosesan data. Tambahan pula, kekerapan yang disokong oleh lembaga telah disalin dari 50 sehingga 200 MHz dengan menggunakan "Phase Locked Loop" (PLL) untuk mengelakkan sebarang kelewatan pelaksanaan DES unit untuk berfungsi. Semua ini telah membawa kepada meningkatkan dan kepantasan pelaksanaan DES algoritma dan peningkatan daya pemprosesan. Reka bentuk dan pelaksanaan dilakukan pada papan siklon III FPGA NEEK.

#### Enhance Implementation of Embedded Concurrent DES Functional Units using Spatial Parallelism Approach on FPGA for Better Throughput

#### ABSTRACT

Security is a concern for all types of information and data systems, and security requirements range from military to commercial and private communications. One essential aspect of secure communications is private-key cryptography. Most recent security applications and standards are defined to be algorithm independent, that is, they allow a choice from a set of cryptographic algorithms for the same purpose. Since the Data Encryption Standard (DES) is still the most widely used private-key encryption algorithm, it has a significant role in security applications. Field Programmable Gate Arrays (FPGAs) are reconfigurable hardware devices that are of growing interest in embedded development. In the present work, the implementation of the DES algorithm has been optimized by replicating the DES unit components into four concurrent DES functional units, using a spatial parallelism approach. The input and output data are stored in separate RAMs, which are dual-port memories that support concurrent read and write operations; this speeds up the processing of data. Furthermore, the clock frequency supplied by the board has been multiplied from 50 MHz to 200 MHz using a Phase Locked Loop (PLL) to avoid any delay in the operation of the DES functional units. Together, these measures speed up the implementation of the DES algorithm and increase its throughput. The design and implementation are performed on the Altera Nios II Embedded Evaluation Kit (NEEK) board.

#### **CHAPTER 1**

#### **INTRODUCTION**

#### **1.1 Overview**

Security is a concern for all types of information and data systems. In past years, national security and military matters created the need for protected communications. Nowadays, the business and private sectors also have security requirements. Secure internet communications play an important role in electronic commerce, and many companies use firewalls to protect internal business information from competitors. In the private sector, many products are available to protect both telephone communications and email (Gaurav, 2012). One means of providing security in information and data systems is cryptography, the fundamental technique for protecting digital information. It is a mechanism used to prevent unauthorized access to a system's data, and it helps to provide accountability, fairness, accuracy and privacy. Cryptography has two main operations, encryption and decryption. Encryption transforms data into a form that is unreadable, and decryption recovers the original data. In principle, decryption can be carried out correctly only by the intended receiver(s). The "strength" or "security" of an encryption scheme depends on how well this assumption holds (Gaurav, 2012; Amandeep, 2010).

In recent years, as the volume of information and data has grown substantially, fast and secure cryptographic algorithms have evolved to counter the security threats that accompany digital data and information transactions. Security applications face an additional challenge because of their great diversity: cryptography demands not only highly secure algorithms, but some applications also require high performance while others require a small footprint (Srinivas, 2007). The best-known example among the symmetric ciphers is the Data Encryption Standard (DES) algorithm (O'Melia & Adam, 2010; Liu, 2007). As early as 1972, the National Institute of Standards and Technology (NIST) was aware of the possible threats to computer and communications data, and it launched a programme to develop a standard encryption algorithm. DES was released in 1976. It is presently the most widely used private-key algorithm and has become an important part of many standards, e.g. the Secure Sockets Layer, Automated Teller Machine (ATM) cell encryption, and various American National Standards Institute (ANSI) banking standards. DES remains significant and will continue to play a major role for several more years, even though it has not been reapproved.

New security applications and standards are defined to be algorithm independent. Therefore, for a given security service such as privacy, a number of different algorithms can be used interchangeably. This applies to public-key applications as well as private-key applications. Switching between cryptographic algorithms is fairly simple on software platforms but difficult on traditional hardware. On the other hand, hardware solutions offer higher speed and better security. A good solution to this problem is reconfigurable hardware. Implementing cryptographic algorithms on reconfigurable hardware offers major benefits over Very Large Scale Integrated (VLSI) circuits. VLSI implementations are fast, but they must be designed all the way from the behavioural description to the physical layout, and they require a costly and time-consuming fabrication process. Software implementations offer high flexibility, but they are not fast enough for applications in which time is a critical factor. Reconfigurable devices reduce the cost and time of the VLSI design and fabrication process, which makes them attractive. Furthermore, reconfigurable devices can be reprogrammed easily, which allows experiments with multiple architectures.

Field Programmable Gate Array (FPGA) devices are a modern form of reconfigurable hardware on which algorithms can be implemented. FPGA devices can therefore be used to build flexible, algorithm-independent applications: the same device may be used for various algorithms, regardless of the nature of those algorithms. In cryptography, many different encryption algorithms have been realised on FPGAs, and both public-key and private-key algorithms can be implemented on the same FPGA. The DES algorithm is organised in iterated rounds formed of many bit-level operations like "shift operations, substitutions, permutations, logical operations, etc." (Saqib et al. 2004). DES has been implemented on many platforms, although the characteristics of FPGAs make them well suited to efficient implementations. These platforms include software (O'Melia & Adam, 2010; Liu, 2007), VLSI (Arich & Eleuldj, 2002; Tiri & Verbauwhede, 2005; Weiwei et al. 2009) and reconfigurable hardware using FPGA devices (Mulani & Mane, 2014; Saqib et al. 2004; Raed et al. 2014).

In the present work, the DES implementation is optimised by replicating the DES unit components to form four concurrent DES functional units, using a spatial parallelism approach. The input and output data are stored in input/output buffers, which are dual-port memories that support simultaneous read and write operations and so help to speed up data processing. Moreover, the 50 MHz clock frequency supplied by the board is multiplied up to 200 MHz using a Phase Locked Loop (PLL) to avoid any delay in the operation of the DES functional units. Together, these measures enhance and speed up the implementation of the DES algorithm for better throughput.
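The combined effect of replicating the functional unit and raising the clock can be illustrated with a back-of-envelope calculation. The sketch below is not a measured result of this work: it assumes an idealised design in which each DES unit needs 16 clock cycles per 64-bit block (one round per cycle) and in which the buffers never stall the units. The function name and the cycle count are hypothetical and chosen only for illustration.

```python
# Idealised throughput estimate for replicated DES functional units.
# ASSUMPTION: each unit takes a fixed number of clock cycles per block and
# the input/output buffers never stall the units. Not a measured result.

BLOCK_BITS = 64  # DES block size in bits (fixed by the DES standard)


def des_throughput_mbps(units: int, clock_mhz: float, cycles_per_block: int) -> float:
    """Aggregate throughput in Mbit/s for `units` identical DES units."""
    blocks_per_microsecond = clock_mhz / cycles_per_block  # per unit
    return units * BLOCK_BITS * blocks_per_microsecond


# ASSUMPTION: one DES round per clock cycle, 16 rounds per block.
CYCLES_PER_BLOCK = 16

single_unit_50mhz = des_throughput_mbps(1, 50, CYCLES_PER_BLOCK)
four_units_200mhz = des_throughput_mbps(4, 200, CYCLES_PER_BLOCK)
ideal_speedup = four_units_200mhz / single_unit_50mhz

print(f"1 unit  @  50 MHz: {single_unit_50mhz:.0f} Mbit/s")
print(f"4 units @ 200 MHz: {four_units_200mhz:.0f} Mbit/s")
print(f"ideal speed-up:    {ideal_speedup:.0f}x")
```

Under these assumptions the two measures multiply: four units contribute a factor of 4 and the fourfold clock increase contributes another factor of 4. A real design would fall below this bound because of buffer access, control overhead and timing constraints.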

#### **1.2 Problem Statement**

Since the beginning of the 1980s, there has been huge growth in communication and information systems, e.g. wireless communications, electronic payment systems and other areas of communications. At the same time, the security requirements of communication and information systems have also grown. Mobile conversations, credit card numbers and bank transactions are a few examples of data threatened by unprotected communication infrastructure. Therefore, cryptography is the main tool for achieving the required security. The DES algorithm is one of the most important and most widely used private-key algorithms, and it is also part of many other standards, e.g. ATM cell encryption and various banking standards.

Many researchers have successfully implemented and analysed the DES algorithm on embedded platforms, e.g. Digital Signal Processing (DSP) chips, smart cards, Graphics Processing Units (GPU) and FPGA technology. However, in these implementations the DES algorithm processes only one block of plaintext at a time.

Researchers have turned to parallelism to improve the throughput of the DES algorithm, typically using an off-chip approach. Although off-chip parallelism is effective and increases throughput, it also increases design size, power consumption and cost. Therefore, to avoid off-chip parallelism, a platform that applies on-chip spatial parallelism to the DES algorithm is needed to increase throughput without the extra size, power and cost.

#### **1.3 Objectives**

The objectives of the work are:

i) To implement embedded concurrent DES functional units with the following specifications:

> a) A lower level of design complexity, a high operating frequency and minimal consumption of chip resources.

> b) Increased throughput by applying a spatial parallelism approach.

ii) To verify and evaluate the design performance of the system on FPGA using the original CAD tool and on-board testing.

#### 1.4 Scope

The scope of this project covers:

1. A new implementation of embedded DES functional units on FPGA by applying spatial parallelism.
2. Achieving a lower level of implementation complexity, a high operating frequency, lower chip resource usage and scalable throughput.

#### **1.5 Thesis Outlines**

This thesis comprises five chapters, including this overview. Chapter 2 presents two important concepts that are used in Chapter 3. It also describes the DES algorithm in detail and discusses the importance of embedded platforms and their challenges in comparison with classic hardware. Chapter 3 presents the methodology of the work and the tools used to carry out the project. Chapter 4 describes and discusses the results obtained by implementing the algorithm on the board. Chapter 5 presents the conclusion and future work.

#### **CHAPTER 2**

#### LITERATURE REVIEW

#### **2.1 Introduction**

Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel") (Almasi & Gottlieb, 1989). There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has been employed for many years, mainly in high-performance computing, but interest in it has grown lately due to the physical constraints preventing frequency scaling (Adve et al. 2008). The power consumption of computers has become a very important concern in the last few years. As a result, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors (Asanovic et al. 2006; Cristobal et al. 2014).

## 2.2 Parallel Computing Design

Traditionally, computer software has been written for serial computation. To solve a problem, an algorithm is constructed and implemented as a serial stream of instructions. These instructions are executed on the Central Processing Unit (CPU) of one computer. Only one instruction may execute at a time; after that instruction is finished, the next is executed (Blaise, 2007).

Parallel computing, on the other hand, uses multiple processing elements simultaneously to solve a problem. This is accomplished by breaking the problem into independent parts so that each processing element may execute its part of the algorithm simultaneously with the others. The processing elements are diverse and include resources such as a single computer with multiple processors, several networked computers, specialized hardware, or any combination of the above (Blaise, 2007). The role of parallelism in accelerating computing has been recognised for many decades. Its role in providing multiple data paths and increased access to storage has been important in commercial applications. The scalable performance and lower cost of parallel platforms are reflected in a wide variety of applications. Developing parallel software and hardware has traditionally been intensive in both time and effort. Viewed in the context of rapidly improving uniprocessor speeds, one is tempted to question the need for parallel computing. However, there are clear trends in hardware design which indicate that uniprocessor architectures may not be able to sustain the rate of performance increase in the future, as a result of a number of physical and computational limitations. In addition, the emergence of standardized parallel computing hardware, libraries and environments has significantly reduced the time to a parallel solution (Ananth et al., 2005).
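The decomposition into independent parts described above can be sketched in software. The example below is a minimal illustration, not the method used in this work: it splits a byte string into 8-byte blocks and hands each block to a pool of workers. The function `toy_transform` is a hypothetical placeholder (a simple XOR), not DES, and in CPython a thread pool shows the structure of the decomposition rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_transform(block: bytes, key: bytes = b"\x5a" * 8) -> bytes:
    """Placeholder per-block operation (XOR with a fixed key). NOT DES."""
    return bytes(b ^ k for b, k in zip(block, key))


def split_blocks(data: bytes, size: int = 8) -> list:
    """Break the problem into independent fixed-size parts."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def process_serial(blocks: list) -> list:
    """One block after another, as in serial computation."""
    return [toy_transform(b) for b in blocks]


def process_parallel(blocks: list, workers: int = 4) -> list:
    """Each worker handles its own block; map() keeps the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(toy_transform, blocks))


data = bytes(range(32))      # 32 bytes -> four 8-byte blocks
blocks = split_blocks(data)
serial_result = process_serial(blocks)
parallel_result = process_parallel(blocks)
print(parallel_result == serial_result)
```

Because no block depends on the result of another, the parallel version produces the same output as the serial one. This independence is the property that makes block-wise processing a natural fit for replicated hardware units.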

The main reasons for using parallel computing are to save time or money, solve more complex problems, provide concurrency, take advantage of non-local resources and make better use of the underlying parallel hardware. During the past twenty years, the trends indicated by ever faster networks, distributed systems, and multi-processor computer architectures clearly show that parallelism is the future of computing. In this same time period, there has been a greater than 500,000x increase in supercomputer performance, with no end currently in sight, as displayed in Fig. 2.1 (Blaise, 2007).

Figure 2.1: Exponential Growth of Supercomputing Power, as recorded by the TOP500 "Performance Development" chart (Blaise, 2007)

### 2.2.1 Concepts of Parallel Computing

Different approaches to parallel computing have been proposed since the 19<sup>th</sup> century, so the exact beginning of parallel computing is unknown. In its modern form, parallel computing started around the middle of the 1980s. During this period, parallel computers began to be programmed as truly parallel machines that could compete with the established supercomputers (Womble et al. 1999; Timothy, 2005).

#### 2.2.1.1 Machine Model

The von Neumann model is a computer architecture described in 1945 by J. von Neumann and others. It describes an electronic digital computer consisting of a CPU that contains an arithmetic logic unit and processor registers, a control unit that contains an instruction register and a program counter, a memory that stores both instructions and data, and I/O mechanisms, as displayed in Fig. 2.2 (Godfrey & Hendry, 1993).
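The interaction of these parts can be illustrated with a toy simulator. The sketch below is only an illustration under stated assumptions: the four-instruction set (`LOAD`, `ADD`, `STORE`, `HALT`) and the memory layout are hypothetical and do not correspond to any real machine. It shows the defining property of the model, namely that instructions and data share one memory, and the role of the program counter in the fetch-decode-execute cycle.

```python
def run(memory: list) -> list:
    """Execute a program stored in `memory` until HALT.

    ASSUMPTION: a hypothetical accumulator machine. Instructions are
    (opcode, address) tuples; data words are plain integers. Both live
    in the same list, as in the von Neumann model.
    """
    pc = 0   # program counter (control unit)
    acc = 0  # accumulator register (arithmetic logic unit)
    while True:
        opcode, address = memory[pc]  # fetch the next instruction
        pc += 1
        if opcode == "LOAD":          # decode and execute
            acc = memory[address]
        elif opcode == "ADD":
            acc += memory[address]
        elif opcode == "STORE":
            memory[address] = acc
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode: {opcode}")


program = [
    ("LOAD", 4),   # address 0: acc <- memory[4]
    ("ADD", 5),    # address 1: acc <- acc + memory[5]
    ("STORE", 6),  # address 2: memory[6] <- acc
    ("HALT", 0),   # address 3: stop
    2,             # address 4: data
    3,             # address 5: data
    0,             # address 6: result is written here
]

final_memory = run(program)
print(final_memory[6])
```

Only one instruction is executed per loop iteration, which is the serial behaviour that parallel designs aim to overcome.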



Virtually all computers have followed this basic design, which comprises four main components: memory, control unit, arithmetic logic unit and input/output. Together, these components perform all processing operations. Read/write random access memory is used to store both program instructions and data. Program instructions are coded data which tell the computer to do