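A paper at the upcoming High-Performance Graphics 2011 is titled High-Performance Software Rasterization on GPUs, by Samuli Laine and Tero Karras of NVIDIA Research (yes, Efficient Sparse Voxel Octrees is theirs too). They built a software rasterization pipeline on the GPU with CUDA. The rasterization algorithm resembles that of Larrabee (and of the open-source software renderer SALVIA): primitives are first sorted into bins, then subdivided down to pixel-level granularity. The difference is that the subdivision step is split into separate Coarse and Fine stages, each run at a different thread granularity. Larrabee makes a similar split, but performs both parts back to back rather than as two independent stages.
If you are interested in rasterization algorithms, this paper is worth a read.
Authors' homepage
PDF download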
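To make the bin, coarse, fine structure from the rasterization post above more concrete, here is a minimal single-threaded C++ sketch. It is not the paper's CUDA implementation: the framebuffer, bin and tile sizes, and all function names (`rasterizeHierarchical`, `mayOverlap`, and so on) are assumptions chosen for illustration, and edge-inclusive coverage is used instead of a proper fill rule. Each stage only narrows the work for the next one, and the triangle lists keep submission order, so the result should match a brute-force per-pixel loop.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

struct Vec2 { int x, y; };          // vertex in integer pixel coordinates
struct Tri  { Vec2 v[3]; };

constexpr int W = 64, H = 64;       // framebuffer size (illustrative)
constexpr int BIN = 32, TILE = 8;   // bin and tile edge in pixels (illustrative)
static_assert(W % BIN == 0 && H % BIN == 0 && BIN % TILE == 0, "sizes must nest");

// Edge function in half-pixel units, so pixel centers (x + 0.5) stay integral.
long long edgeFn(Vec2 a, Vec2 b, long long px2, long long py2) {
    return (2LL * (b.x - a.x)) * (py2 - 2LL * a.y)
         - (2LL * (b.y - a.y)) * (px2 - 2LL * a.x);
}

// Reorder vertices so interior points have non-negative edge values.
// Returns false for a degenerate (zero-area) triangle.
bool orient(Tri& t) {
    long long area = edgeFn(t.v[0], t.v[1], 2LL * t.v[2].x, 2LL * t.v[2].y);
    if (area < 0) std::swap(t.v[1], t.v[2]);
    return area != 0;
}

// Exact test at the center of pixel (x, y); edges count as covered.
bool covers(const Tri& t, int x, int y) {
    long long px = 2LL * x + 1, py = 2LL * y + 1;
    for (int e = 0; e < 3; ++e)
        if (edgeFn(t.v[e], t.v[(e + 1) % 3], px, py) < 0) return false;
    return true;
}

// Conservative test: false only if some edge rejects all four box corners.
bool mayOverlap(const Tri& t, int x0, int y0, int size) {
    long long xs[2] = {2LL * x0, 2LL * (x0 + size)};
    long long ys[2] = {2LL * y0, 2LL * (y0 + size)};
    for (int e = 0; e < 3; ++e) {
        bool anyInside = false;
        for (long long cx : xs)
            for (long long cy : ys)
                anyInside |= edgeFn(t.v[e], t.v[(e + 1) % 3], cx, cy) >= 0;
        if (!anyInside) return false;
    }
    return true;
}

// Framebuffer holds (triangle index + 1) of the last triangle covering a pixel, 0 if none.
std::vector<int> rasterizeHierarchical(std::vector<Tri> tris) {
    constexpr int BX = W / BIN, BY = H / BIN, TX = W / TILE, TY = H / TILE;
    std::vector<std::vector<int>> bins(BX * BY), tiles(TX * TY);

    // Stage 1 (bin): assign each triangle to every bin its bounding box touches.
    for (int i = 0; i < (int)tris.size(); ++i) {
        Tri& t = tris[i];
        if (!orient(t)) continue;
        int minX = std::min({t.v[0].x, t.v[1].x, t.v[2].x});
        int maxX = std::max({t.v[0].x, t.v[1].x, t.v[2].x});
        int minY = std::min({t.v[0].y, t.v[1].y, t.v[2].y});
        int maxY = std::max({t.v[0].y, t.v[1].y, t.v[2].y});
        if (maxX <= 0 || maxY <= 0 || minX >= W || minY >= H) continue;
        int bx0 = std::max(minX, 0) / BIN, bx1 = std::min(maxX, W - 1) / BIN;
        int by0 = std::max(minY, 0) / BIN, by1 = std::min(maxY, H - 1) / BIN;
        for (int by = by0; by <= by1; ++by)
            for (int bx = bx0; bx <= bx1; ++bx)
                bins[by * BX + bx].push_back(i);
    }

    // Stage 2 (coarse): per bin, find which tiles each triangle may touch.
    for (int b = 0; b < BX * BY; ++b) {
        int bx = (b % BX) * BIN, by = (b / BX) * BIN;
        for (int i : bins[b])
            for (int ty = by; ty < by + BIN; ty += TILE)
                for (int tx = bx; tx < bx + BIN; tx += TILE)
                    if (mayOverlap(tris[i], tx, ty, TILE))
                        tiles[(ty / TILE) * TX + tx / TILE].push_back(i);
    }

    // Stage 3 (fine): per tile, run the exact per-pixel coverage test.
    std::vector<int> fb(W * H, 0);
    for (int t = 0; t < TX * TY; ++t) {
        int x0 = (t % TX) * TILE, y0 = (t / TX) * TILE;
        for (int i : tiles[t])
            for (int y = y0; y < y0 + TILE; ++y)
                for (int x = x0; x < x0 + TILE; ++x)
                    if (covers(tris[i], x, y)) fb[y * W + x] = i + 1;
    }
    return fb;
}

// Reference: test every pixel against every triangle, in submission order.
std::vector<int> rasterizeBruteForce(std::vector<Tri> tris) {
    std::vector<int> fb(W * H, 0);
    for (int i = 0; i < (int)tris.size(); ++i) {
        if (!orient(tris[i])) continue;
        for (int y = 0; y < H; ++y)
            for (int x = 0; x < W; ++x)
                if (covers(tris[i], x, y)) fb[y * W + x] = i + 1;
    }
    return fb;
}
```

In this sketch the three stages are plain loops; on a GPU each stage would be a natural place to change how work is mapped to threads, which is the motivation the post gives for keeping Coarse and Fine as separate stages.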
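CUDA 4.0 RC1 was available only to registered developers. Today NVIDIA released CUDA 4.0 RC2, which anyone can download.
For more on the new features in CUDA 4.0, see my post from last month: A Real Technical Analysis of CUDA 4.0.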
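Last week's post had only just mentioned NVIDIA's announcement of CUDA 4, and yesterday I received an email from NVIDIA saying that the CUDA 4.0 RC is available for download. Registered developers can find it at http://developer.nvidia.com/object/cuda_4_0_RC_downloads.html.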
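I had not planned to comment, but I happened to come across a so-called "new features analysis" on some website: a textbook case of an editor with no technical background pretending to understand the subject and writing advertorial copy. So I feel obliged to set the record straight here, so that readers in China are not misled by it.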
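The updates in CUDA 4.0 focus on three areas:
Easier porting of parallel programs
Faster multi-GPU programming
Better toolchain support
Easier porting of parallel programs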
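Before CUDA (and, to be fair, AMD's Stream) came along, porting a parallel program to the GPU meant writing shaders directly. The restrictions were many and the code was inflexible, so it was essentially a rewrite rather than a port. CUDA improved the situation somewhat. With CUDA 4.0, ...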
Today NVIDIA announced the upcoming 4.0 release of CUDA. While most of the major CUDA releases accompanied a new GPU architecture, 4.0 is a software-only release, but that doesn’t mean there aren’t a lot of new features. With this release, NVIDIA is aiming to lower the barrier to entry to parallel programming on GPUs, with new features including easier multi-GPU programming, a unified virtual memory address space, the powerful Thrust C++ template library, and automatic performance analysis in the Visual Profiler tool. Full details follow in the quoted press release below.
SANTA CLARA, ...
From http://developer.nvidia.com/object/gpu-ai-board-games.html
This technology preview is a snapshot of some internal research we have been working on and talking about at various conferences for the past couple years. The level of interest in GPU-accelerated AI has continued to grow, so we are making this (unsupported) snapshot available for developers who would like to experiment with the technology.
The software provided in this technology preview supports GPU accelerated game tree search of both the pruning and backtracking styles. While this technology primaril ...