http://www.techreport.com/discussions.x/19216
tl;dr = They're simply taking the piss with PhysX at this point.
Writing code that uses SSE instructions means hand-coding assembly (or compiler intrinsics, which amounts to much the same work) against the SSE instruction set. Because not all CPUs have SSE, you also need to write a slow path.
Because some CPUs run faster with different variants of SSE, you ideally need to provide multiple versions of each SSE function, one for every SSE instruction set that you want to support.
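To make the fast path / slow path split concrete, here is a minimal sketch in C. It is not how PhysX does it; the names add_scalar, add_sse, and pick_add are made up for illustration, and the runtime check assumes GCC or Clang on x86 (__builtin_cpu_supports). On any other compiler or CPU it simply falls back to the scalar loop.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h> /* SSE intrinsics */
#endif

/* Slow path: plain scalar loop that works on any CPU. */
static void add_scalar(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}

#if defined(__SSE__)
/* Fast path: four floats per iteration. Unaligned loads and stores are
 * used here so the sketch accepts any pointer. */
static void add_sse(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
    for (; i < n; ++i) { /* leftover elements */
        dst[i] = a[i] + b[i];
    }
}
#endif

typedef void (*add_fn)(float *, const float *, const float *, size_t);

/* Hypothetical dispatcher: choose an implementation once, then call
 * through the returned pointer. */
static add_fn pick_add(void) {
#if defined(__SSE__)
#if defined(__GNUC__)
    /* Assumption: GCC/Clang builtin for the runtime CPU check. */
    if (!__builtin_cpu_supports("sse")) {
        return add_scalar;
    }
#endif
    return add_sse;
#else
    return add_scalar;
#endif
}
```

Supporting more SSE variants means more functions like add_sse and more branches in pick_add, and every one of them has to produce the same results as the scalar loop.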
Also, SSE imposes restrictions on how data is laid out in memory. In particular, SSE loads and stores mostly require (to stay on the fast path) that your data be aligned on 16-byte (128-bit) boundaries. Unfortunately, there is no cross-platform malloc that guarantees alignment, so you have to write a custom allocator that over-allocates, accepting some internal fragmentation, so that it can hit the different alignment requirements.
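The over-allocating allocator described above can be sketched roughly like this, using nothing but plain malloc. The function names are hypothetical, and the sketch assumes the alignment is a power of two. Platform-specific calls such as posix_memalign or _aligned_malloc exist, but each only covers its own platform.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns a block aligned to `alignment`, which must
 * be a power of two. The block must be released with aligned_free_sketch. */
static void *aligned_malloc_sketch(size_t size, size_t alignment) {
    if (alignment == 0 || (alignment & (alignment - 1)) != 0) {
        return NULL; /* not a power of two */
    }
    /* Extra room: worst-case padding plus a slot for the raw pointer. */
    size_t extra = alignment - 1 + sizeof(void *);
    if (size > SIZE_MAX - extra) {
        return NULL; /* request would overflow */
    }
    void *raw = malloc(size + extra);
    if (raw == NULL) {
        return NULL;
    }
    /* Skip the bookkeeping slot, then round up to the next boundary. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);
    /* Remember what malloc returned, just below the aligned block. */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Frees a block obtained from aligned_malloc_sketch. */
static void aligned_free_sketch(void *p) {
    if (p != NULL) {
        free(((void **)p)[-1]);
    }
}
```

Every allocation pays up to alignment - 1 + sizeof(void *) bytes of overhead, which is the internal fragmentation mentioned above.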
Then you need to debug all of this, your slow path, and the GPU path. And after all of this, you're still going to be several orders of magnitude slower on the CPU than you are on the GPU when you have large numbers of objects.
It's not as simple as "lol fucking add threads and SSE dumbass".