OpenCL

Original author(s)	Apple Inc.
Developer(s)	Khronos Group
Stable release	1.1 / June 11, 2010
Operating system	Cross-platform
Type	GPGPU, API
License	Royalty Free
Website	www.khronos.org/opencl

OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. OpenCL includes a language (based on C99) for writing kernels (functions that execute on OpenCL devices), plus APIs that are used to define and then control the platforms. OpenCL provides parallel computing using task-based and data-based parallelism. Its architecture shares a range of computational interfaces with two competitors, Nvidia's Compute Unified Device Architecture (CUDA) and Microsoft's DirectCompute.
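
To make the kernel idea concrete, the following is a minimal plain-C sketch of data-based parallelism. It does not use the OpenCL API: the names `vec_add_item` and `run_range` are hypothetical, and the loop only stands in for what an OpenCL runtime would do when it applies one kernel body to every index of an index space, possibly in parallel.

```c
#include <stddef.h>

/* Hypothetical "kernel" body: computes exactly one output element.
   In OpenCL C the index would come from get_global_id(0). */
void vec_add_item(size_t gid, const float *a, const float *b, float *out)
{
    out[gid] = a[gid] + b[gid];
}

/* Sequential stand-in for launching the kernel over n work-items.
   Each iteration is independent, which is what allows a device
   to run them concurrently. */
void run_range(size_t n, const float *a, const float *b, float *out)
{
    for (size_t gid = 0; gid < n; ++gid)
        vec_add_item(gid, a, b, out);
}
```

Because no iteration reads a value written by another, the order of execution does not matter, which is the property a data-parallel kernel relies on.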

OpenCL gives any application access to the graphics processing unit (GPU) for non-graphical computing, extending the power of the GPU beyond graphics (general-purpose computing on graphics processing units, or GPGPU). OpenCL is analogous to the open industry standards OpenGL and OpenAL, for 3D graphics and computer audio, respectively. OpenCL is managed by the non-profit technology consortium Khronos Group.

History

OpenCL was initially developed by Apple Inc., which holds trademark rights, and refined into an initial proposal in collaboration with technical teams at AMD, IBM, Intel, and Nvidia. Apple submitted this initial proposal to the Khronos Group. On June 16, 2008 the Khronos Compute Working Group was formed^[1] with representatives from CPU, GPU, embedded-processor, and software companies. This group worked for five months to finish the technical details of the specification for OpenCL 1.0 by November 18, 2008.^[2] This technical specification was reviewed by the Khronos members and approved for public release on December 8, 2008.^[3]

OpenCL 1.0 was released with Mac OS X v10.6 ("Snow Leopard"). According to an Apple press release:^[4]

Snow Leopard further extends support for modern hardware with Open Computing Language (OpenCL), which lets any application tap into the vast gigaflops of GPU computing power previously available only to graphics applications. OpenCL is based on the C programming language and has been proposed as an open standard.

AMD decided to support OpenCL (and DirectX 11) instead of the now-deprecated Close to Metal in its Stream framework.^[5]^[6] RapidMind announced its adoption of OpenCL underneath its development platform, in order to support GPUs from multiple vendors with one interface.^[7] On December 9, 2008, Nvidia announced its intention to add full support for the OpenCL 1.0 specification to its GPU Computing Toolkit.^[8] On October 30, 2009, IBM released its first OpenCL implementation as a part of the XL compilers.^[9]

OpenCL 1.1 was ratified by the Khronos Group on June 14, 2010, and adds significant functionality for enhanced parallel programming flexibility and performance, including:

New data types including 3-component vectors and additional image formats;
Handling commands from multiple host threads and processing buffers across multiple devices;
Operations on regions of a buffer including read, write and copy of 1D, 2D or 3D rectangular regions;
Enhanced use of events to drive and control command execution;
Additional OpenCL C built-in functions such as integer clamp, shuffle and asynchronous strided copies;
Improved OpenGL interoperability through efficient sharing of images and buffers by linking OpenCL and OpenGL events.
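
As an illustration of the rectangular-region operations in the list above, here is a plain-C sketch of the addressing that a 2D rectangular copy implies. The function `copy_rect_2d` is hypothetical and is not an OpenCL call; OpenCL 1.1 exposes the operation through entry points such as `clEnqueueCopyBufferRect`, which also take a slice pitch for the 3D case. The sketch assumes byte-addressed buffers in which the element at column x of row y sits at offset `y * row_pitch + x`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies a rectangle of width_bytes x rows between
   two byte buffers, each with its own origin and row pitch (in bytes). */
void copy_rect_2d(const unsigned char *src, size_t src_x, size_t src_y,
                  size_t src_row_pitch,
                  unsigned char *dst, size_t dst_x, size_t dst_y,
                  size_t dst_row_pitch,
                  size_t width_bytes, size_t rows)
{
    for (size_t r = 0; r < rows; ++r) {
        /* Start of row r of the rectangle inside each buffer. */
        const unsigned char *s = src + (src_y + r) * src_row_pitch + src_x;
        unsigned char *d = dst + (dst_y + r) * dst_row_pitch + dst_x;
        memcpy(d, s, width_bytes);
    }
}
```

Keeping separate pitches for source and destination is what lets a sub-rectangle of a large buffer be packed into a small, tightly laid out one.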

The OpenCL specification is under development at Khronos, which is open to any interested company to join.

Implementation

On December 10, 2008, AMD and Nvidia held the first public OpenCL demonstration, a 75-minute presentation at SIGGRAPH Asia 2008. AMD showed a CPU-accelerated OpenCL demo explaining the scalability of OpenCL on one or more cores, while Nvidia showed a GPU-accelerated demo.^[10]^[11]

On March 26, 2009, at GDC 2009, AMD and Havok demonstrated the first working implementation of OpenCL, accelerating Havok Cloth on an AMD Radeon HD 4000 series GPU.^[12]

On April 20, 2009, Nvidia announced the release of its OpenCL driver and SDK to developers participating in its OpenCL Early Access Program.^[13]

On August 5, 2009, AMD unveiled the first development tools for its OpenCL platform as part of its ATI Stream SDK v2.0 Beta Program.^[14]

On August 28, 2009, Apple released Mac OS X Snow Leopard, which contains a full implementation of OpenCL.^[15]

OpenCL in Snow Leopard is initially supported on the ATI Radeon HD 4850, ATI Radeon HD 4870 and NVIDIA's GeForce 8600M GT, GeForce 8800 GS, GeForce 8800 GT, GeForce 8800 GTS, GeForce 9400M, GeForce 9600M GT, GeForce GT 120, GeForce GT 130, GeForce GTX 285, Quadro FX 4800, and Quadro FX 5600.^[16]

On September 28, 2009, NVIDIA released its own OpenCL drivers and SDK implementation.

On October 13, 2009, AMD released the fourth beta of the ATI Stream SDK 2.0, which provides a complete OpenCL implementation on both R700/R800 GPUs and SSE3-capable CPUs. The SDK is available for both Linux and Windows.^[17]

On November 26, 2009, NVIDIA released drivers for OpenCL 1.0 (rev 48).

The Apple,^[18] Nvidia,^[19] RapidMind^[20] and Gallium3D^[21] implementations of OpenCL are all based on the LLVM compiler technology and use the Clang compiler as their frontend.

On December 10, 2009, VIA released its first product supporting OpenCL 1.0, the ChromotionHD 2.0 video processor included in the VN1000 chipset.^[22]

On December 21, 2009, AMD released the production version of the ATI Stream SDK 2.0,^[23] which provides OpenCL 1.0 support for R800 GPUs and beta support for R700 GPUs.

On June 1, 2010, ZiiLABS released details of its first OpenCL implementation for the ZMS processor for handheld, embedded and digital home products.^[24]

On June 30, 2010, IBM released a fully conformant version of OpenCL 1.0.^[25]

Example

This example computes a fast Fourier transform (FFT):^[26]

// create a compute context with GPU device
context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);

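// get the list of devices in the context (a command queue needs a valid device, not NULL)
clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &cb);
devices = malloc(cb);
clGetContextInfo(context, CL_CONTEXT_DEVICES, cb, devices, NULL);

// create a command queue on the first device
queue = clCreateCommandQueue(context, devices[0], 0, NULL);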

// allocate the buffer memory objects
memobjs[0] = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(float)*2*num_entries, srcA, NULL);
memobjs[1] = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(float)*2*num_entries, NULL, NULL);

// create the compute program
program = clCreateProgramWithSource(context, 1, &fft1D_1024_kernel_src, NULL, NULL);

// build the compute program executable
clBuildProgram(program, 0, NULL, NULL, NULL, NULL);

// create the compute kernel
kernel = clCreateKernel(program, "fft1D_1024", NULL);

// choose the work sizes first: the local-memory argument sizes below depend on local_work_size
global_work_size[0] = num_entries;
local_work_size[0] = 64;

// set the args values (arguments 2 and 3 only reserve local memory, so their value is NULL)
clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&memobjs[0]);
clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&memobjs[1]);
clSetKernelArg(kernel, 2, sizeof(float)*(local_work_size[0]+1)*16, NULL);
clSetKernelArg(kernel, 3, sizeof(float)*(local_work_size[0]+1)*16, NULL);

// execute the kernel over a one-dimensional range of work-items
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, global_work_size, local_work_size, 0, NULL, NULL);
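
The work-size arithmetic behind this host code can be sketched in plain C. The helpers below are hypothetical and not part of the OpenCL API. They assume the OpenCL 1.x rule that the global work size must be evenly divisible by the work-group size when a local size is given, and they reproduce the local-memory size expression used in the `clSetKernelArg` calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: rounds global up to the next multiple of local,
   so that the launch satisfies the divisibility rule of OpenCL 1.x. */
size_t round_up_global(size_t global, size_t local)
{
    assert(local > 0);
    return ((global + local - 1) / local) * local;
}

/* Number of work-groups a launch is split into. */
size_t num_work_groups(size_t global, size_t local)
{
    assert(local > 0 && global % local == 0);
    return global / local;
}

/* Bytes reserved for each __local argument in the host code:
   sizeof(float) * (local_work_size[0] + 1) * 16. */
size_t local_arg_bytes(size_t local)
{
    return sizeof(float) * (local + 1) * 16;
}
```

With a work-group size of 64, each `__local` argument reserves room for 65 × 16 = 1040 floats. A kernel launched over a rounded-up range would also need a bounds check so that the padding work-items do nothing.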

The actual calculation (based on "Fitting FFT onto G80 Architecture"):^[27]

// This kernel computes FFT of length 1024. The 1024 length FFT is decomposed into 
// calls to a radix 16 function, another radix 16 function and then a radix 4 function 

__kernel void fft1D_1024 (__global float2 *in, __global float2 *out, 
                          __local float *sMemx, __local float *sMemy) { 
  int tid = get_local_id(0); 
  int blockIdx = get_group_id(0) * 1024 + tid; 
  float2 data[16]; 

  // starting index of data to/from global memory 
  in = in + blockIdx;  out = out + blockIdx; 
 
  globalLoads(data, in, 64); // coalesced global reads 
  fftRadix16Pass(data);      // in-place radix-16 pass 
  twiddleFactorMul(data, tid, 1024, 0); 

  // local shuffle using local memory 
  localShuffle(data, sMemx, sMemy, tid, (((tid & 15) * 65) + (tid >> 4))); 
  fftRadix16Pass(data);               // in-place radix-16 pass 
  twiddleFactorMul(data, tid, 64, 4); // twiddle factor multiplication 
 
  localShuffle(data, sMemx, sMemy, tid, (((tid >> 4) * 64) + (tid & 15))); 
 
  // four radix-4 function calls 
  fftRadix4Pass(data);      // radix-4 function number 1
  fftRadix4Pass(data + 4);  // radix-4 function number 2
  fftRadix4Pass(data + 8);  // radix-4 function number 3
  fftRadix4Pass(data + 12); // radix-4 function number 4

  // coalesced global writes 
  globalStores(data, out, 64); 
}
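
The helper functions called by the kernel (`globalLoads`, `fftRadix16Pass`, `fftRadix4Pass` and so on) are not shown in the listing. As a rough idea of what a radix-4 step computes, the following plain-C sketch performs an in-place 4-point forward DFT. The name `dft4` is hypothetical, and the original helper may differ in details such as output ordering or sign convention.

```c
#include <complex.h>

/* Hypothetical stand-in for one radix-4 step: in-place 4-point forward DFT,
   X[k] = sum over n of x[n] * exp(-2*pi*i*k*n/4).
   It needs no general twiddle factors, only additions and
   multiplications by +i or -i. Outputs are in natural order. */
void dft4(float complex x[4])
{
    float complex a = x[0], b = x[1], c = x[2], d = x[3];
    float complex s02 = a + c, d02 = a - c;   /* even-index sum and difference */
    float complex s13 = b + d, d13 = b - d;   /* odd-index sum and difference  */

    x[0] = s02 + s13;
    x[1] = d02 - I * d13;   /* multiplying by -i rotates by -90 degrees */
    x[2] = s02 - s13;
    x[3] = d02 + I * d13;
}
```

The kernel's decomposition into two radix-16 passes and one radix-4 pass covers the full transform length because 16 × 16 × 4 = 1024.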

A full, open-source implementation of an OpenCL FFT can be found on Apple's website.^[28]

References

[1] "Khronos Launches Heterogeneous Computing Initiative" (Press release). Khronos Group. 2008-06-16. Retrieved 2008-06-18.
[2] "OpenCL gets touted in Texas". MacWorld. 2008-11-20. Retrieved 2009-06-12.
[3] "The Khronos Group Releases OpenCL 1.0 Specification" (Press release). Khronos Group. 2008-12-08. Retrieved 2009-06-12.
[4] "Apple Previews Mac OS X Snow Leopard to Developers" (Press release). Apple Inc. 2008-06-09. Retrieved 2008-06-09.
[5] "AMD Drives Adoption of Industry Standards in GPGPU Software Development" (Press release). AMD. 2008-08-06. Retrieved 2008-08-14.
[6] "AMD Backs OpenCL, Microsoft DirectX 11". eWeek. 2008-08-06. Retrieved 2008-08-14.
[7] "HPCWire: RapidMind Embraces Open Source and Standards Projects". HPCWire. 2008-11-10. Retrieved 2008-11-11.
[8] "NVIDIA Adds OpenCL To Its Industry Leading GPU Computing Toolkit" (Press release). Nvidia. 2008-12-09. Retrieved 2008-12-10.
[9] "OpenCL Development Kit for Linux on Power". alphaWorks. 2009-10-30. Retrieved 2009-10-30.
[10] "OpenCL Demo, AMD CPU". 2008-12-10. Retrieved 2009-03-28.
[11] "OpenCL Demo, NVIDIA GPU". 2008-12-10. Retrieved 2009-03-28.
[12] "AMD and Havok demo OpenCL accelerated physics". PC Perspective. 2009-03-26. Retrieved 2009-03-28.
[13] "NVIDIA Releases OpenCL Driver To Developers". NVIDIA. 2009-04-20. Retrieved 2009-04-27.
[14] "AMD does reverse GPGPU, announces OpenCL SDK for x86". Ars Technica. 2009-08-05. Retrieved 2009-08-06.
[15] Dan Moren (2009-06-08). "Live Update: WWDC 2009 Keynote". macworld.com. MacWorld. Retrieved 2009-06-12.
[16] "Mac OS X Snow Leopard – Technical specifications and system requirements". Apple Inc. 2009-06-08. Retrieved 2009-08-25.
[17] "ATI Stream Software Development Kit (SDK) v2.0 Beta Program". Retrieved 2009-10-14.
[18] "Apple entry on LLVM Users page". Retrieved 2009-08-29.
[19] "Nvidia entry on LLVM Users page". Retrieved 2009-08-06.
[20] "Rapidmind entry on LLVM Users page". Retrieved 2009-10-01.
[21] "Zack Rusin's blog post about the Gallium3D OpenCL implementation". Retrieved 2009-10-01.
[22] http://www.via.com.tw/en/resources/pressroom/pressrelease.jsp?press_release_no=4327
[23] "ATI Stream SDK v2.0 with OpenCL™ 1.0 Support". Retrieved 2009-10-23.
[24] http://www.ziilabs.com/opencl
[25] "Khronos Group Conformant Products".
[26] "OpenCL" (PDF). SIGGRAPH2008. 2008-08-14. Retrieved 2008-08-14.
[27] "Fitting FFT onto G80 Architecture" (PDF). Vasily Volkov and Brian Kazian, UC Berkeley CS258 project report. May 2008. Retrieved 2008-11-14.
[28] "OpenCL on FFT". Apple. 16 Nov 2009. Retrieved 2009-12-07.

External links

Official website
OpenCL for NVIDIA (Download page)
OpenCL for AMD (Download page)
OpenCL for IBM Cell Broadband Engine (Download page)
gDEBugger CL - OpenCL Debugger, Profiler and Memory Analyzer for Windows, Linux and Mac OS X
Fixstars OpenCL Cross Compiler for SSE CPUs
OpenCL for S3 Chrome announcement
MinGW Package with OpenCL support by Gordon Taft
OpenCL-Z
GPCBenchmark - an OpenCL General Purpose Computing benchmark
OpenCL Studio An integrated development environment for OpenCL and OpenGL
PyOpenCL by Goncalo Carvalho
PyOpenCL by Andreas Klöckner
JOCL Java OpenCL Bindings by Michael Bien
JogAmp Java Bindings to OpenAL, OpenCL and OpenGL
CLyther - an extension to the OpenCL language
The Open Toolkit library cross-platform C# OpenGL, OpenAL and OpenCL wrapper for Mono/.Net
GPU Modeling and Development in OpenCL
Intro to GPGPU computing featuring OpenCL and CUDA examples
First public demonstration of OpenCL by NVIDIA on December 12, 2008 at Siggraph Asia
First public demonstration of OpenCL by AMD on December 12, 2008 at Siggraph Asia
First public demonstration of OpenCL by ZiiLABS on June 1, 2010 at Computex Asia
OpenCL: What you need to know – article published in Macworld, August 2008
HPCWire: OpenCL on the Fast Track
OpenCL Computing
OpenCL explained as part of a larger article on Snow Leopard (MacOS X 10.6), published at ArsTechnica in August 2009.
Introductory Tutorial to OpenCL, published at AMD Developer Central in August 2009
OpenCL Tutorial using OpenCL and Cloo - a tutorial using the C# bindings OpenCLTemplate and Cloo, covering everything from basic installation and first steps to real-life OpenCL examples
