Voxel time courses are looped over in MyTemporal. As exactly the same operations have to be performed for each voxel, it was decided to implement a CUDA kernel to parallelize the time series analysis. The CUDA implementation basically dedicates one thread to one voxel. It was implemented through a Matlab EXecutable (MEX) function (named MyTemporal_MEX) that calls the CUDA kernel.

In the Matlab profiling output presented above, 1000 voxels were processed, hence the 1000 calls to Temporal_TA. MyTemporal took 185.546 s to execute, against 10.157 s for its CUDA counterpart MyTemporal_MEX, i.e. the CUDA implementation performed more than 18 times faster.

Furthermore, Table 1 below compares the performance of the CUDA kernel (GPU) against the current (CPU) implementation with an increasing number of processed voxels (this time using timemarks in the code, with the Matlab profiler disabled, where both implementations are called). One can note that the CUDA kernel execution time is almost constant, regardless of the number of voxels being processed. A contrario, and as expected, the execution time of the current, purely sequential CPU-based implementation depends linearly on the number of voxels processed. In this test, only 1000 voxels were available; to fully assess the performance improvements brought by the GPU-based implementation, a larger dataset should be considered.

Table 1: CPU vs GPU performance comparison.
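The one-thread-per-voxel mapping described above can be sketched as a CUDA kernel. This is a minimal illustration, not the original MyTemporal_MEX code: the kernel name, the per-voxel operation (here a simple mean of the time course), and all parameter names are assumptions.

```cuda
// Hypothetical sketch of the one-thread-per-voxel design. The real kernel
// would apply the same temporal analysis as MyTemporal to each voxel; a
// running mean stands in for it here.
__global__ void temporal_kernel(const float *series, float *out,
                                int n_voxels, int n_timepoints)
{
    int v = blockIdx.x * blockDim.x + threadIdx.x;  // one thread <-> one voxel
    if (v >= n_voxels) return;                      // guard the last block

    float acc = 0.0f;
    for (int t = 0; t < n_timepoints; ++t)          // walk this voxel's time course
        acc += series[v * n_timepoints + t];
    out[v] = acc / n_timepoints;
}

// Launch with enough blocks to cover all voxels, e.g.:
//   int threads = 256;
//   int blocks  = (n_voxels + threads - 1) / threads;
//   temporal_kernel<<<blocks, threads>>>(d_series, d_out, n_voxels, n_timepoints);
```

Because every thread runs the same instruction stream over its own voxel, the kernel's wall-clock time is dominated by the launch and memory transfers rather than the voxel count, which is consistent with the near-constant GPU timings reported in Table 1.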