Sunday, January 6, 2013

Compute Unified Device Architecture (CUDA)


With the advent of multicore CPUs and GPUs, processor chips have become parallel systems. A program only runs faster, however, if the software exploits the parallelism provided by the underlying multiprocessor architecture. Hence there is a strong need to design and develop software that uses multithreading, with each thread running concurrently on its own processor core, which can increase the speed of the program dramatically. Developing such scalable parallel applications requires a parallel programming model that supports a multicore programming environment.

CUDA stands for Compute Unified Device Architecture. It is a parallel computing platform and programming model released by NVIDIA in 2007. It is used to develop software for graphics processors, including a wide variety of general-purpose applications that are highly parallel in nature and run across hundreds of GPU processor cores.

CUDA programs are built around special functions called kernels. A kernel is a function that is invoked (launched) by the CPU and runs on the GPU. It is executed "n" times in parallel on the GPU by "n" threads, each of which typically works on a different piece of the data. CUDA also provides shared memory and barrier synchronization for threads that cooperate within the same thread block.
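To make the idea of a kernel concrete, here is a minimal sketch of an element-wise vector addition in CUDA C++. The names vecAdd, a, b, c and the sizes are my own choices for illustration, and the use of managed memory (cudaMallocManaged) is an assumption made to keep the example short; it needs a reasonably recent GPU and toolkit.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: runs on the GPU; each thread adds one pair of elements.
// (vecAdd is a hypothetical name chosen for this sketch.)
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global index of this thread
    if (i < n)                                      // skip threads past the end of the arrays
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);

    // Managed memory is accessible from both CPU and GPU (assumption: supported device).
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);

    for (int i = 0; i < n; ++i) {
        a[i] = (float)i;
        b[i] = 2.0f * i;
    }

    // Launch enough blocks of 256 threads to cover all n elements.
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vecAdd<<<blocks, threadsPerBlock>>>(a, b, c, n);

    // Kernel launches are asynchronous; wait before reading results on the CPU.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("c[10] = %.1f\n", c[10]);  // each c[i] should equal a[i] + b[i]

    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return 0;
}
```

The CPU calls vecAdd once, but the GPU runs the function body once per thread. The launch configuration between the triple angle brackets decides how many threads that is. A file like this would normally be compiled with NVIDIA's nvcc compiler.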

The CUDA parallel computing platform provides a small set of C and C++ language extensions for expressing fine-grained and coarse-grained data and task parallelism. Programmers can express this parallelism in high-level languages such as C, C++, and Fortran.
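The following sketch shows several of these extensions together: the __global__ qualifier, the <<<...>>> launch syntax, the __shared__ qualifier, and the __syncthreads() barrier. It sums an array in two levels. Independent thread blocks each handle one chunk of the input (the coarse-grained part), and the threads inside a block cooperate through shared memory (the fine-grained part). The names blockSum, tile, partial, and THREADS are hypothetical, and the final sum on the CPU is a simplification I chose to keep the example short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS 256  // threads per block; must be a power of two for this reduction

// Each block sums its own chunk of the input and writes one partial result.
__global__ void blockSum(const float *in, float *partial, int n)
{
    __shared__ float tile[THREADS];            // visible to all threads in this block

    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    tile[tid] = (i < n) ? in[i] : 0.0f;        // load one element per thread
    __syncthreads();                           // wait until the whole tile is loaded

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            tile[tid] += tile[tid + stride];
        __syncthreads();                       // finish this step before starting the next
    }

    if (tid == 0)
        partial[blockIdx.x] = tile[0];         // thread 0 writes the block's result
}

int main()
{
    const int n = 1 << 16;
    const int blocks = (n + THREADS - 1) / THREADS;

    float *in, *partial;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&partial, blocks * sizeof(float));
    for (int i = 0; i < n; ++i)
        in[i] = 1.0f;

    blockSum<<<blocks, THREADS>>>(in, partial, n);
    cudaDeviceSynchronize();                   // wait for the GPU before reading partial[]

    // Combine the per-block results on the CPU (a simplification for this sketch).
    float total = 0.0f;
    for (int b = 0; b < blocks; ++b)
        total += partial[b];
    printf("sum = %.0f (expected %d)\n", total, n);

    cudaFree(in);
    cudaFree(partial);
    return 0;
}
```

Blocks never wait on each other here, so the GPU is free to schedule them in any order across its cores. Only threads within the same block share the tile array and meet at the __syncthreads() barrier.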