Take advantage of SSE2 instructions in gaussian blur node
author Sergey Sharybin <sergey.vfx@gmail.com>
Fri, 13 Jun 2014 18:30:13 +0000 (00:30 +0600)
committer Sergey Sharybin <sergey.vfx@gmail.com>
Fri, 13 Jun 2014 18:38:07 +0000 (00:38 +0600)
commit a87fb34edaf1a10f5527b6dc8a506a1c9ecbc683
tree 06386145cbf7f9dcf6684b3a39722ed4d4e62c4d
parent b0708dd7189dfef21f7f9af5e98b0a7e1369e507

This gives around a 30% speedup for the gaussian blur node.
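
For illustration only, here is a minimal sketch of the kind of inner
loop that benefits from this: accumulating weighted RGBA float pixels
four channels at a time. The function name, the argument layout and the
normalization step are assumptions made for this example and are not
taken from the actual operation code.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 header; also provides the SSE float intrinsics used below */
#include <stdint.h>

/* Hypothetical helper: blends `count` RGBA float pixels from a
 * 16-byte aligned buffer using one weight per pixel, and writes the
 * normalized result to `out`. */
void sketch_blur_pixel_sse2(const float *rgba, const float *weights,
                            int count, float out[4])
{
    /* _mm_load_ps requires a 16-byte aligned address. */
    assert(((uintptr_t)rgba % 16) == 0);

    __m128 accum = _mm_setzero_ps();
    float weight_sum = 0.0f;

    for (int i = 0; i < count; i++) {
        /* One aligned load fetches R, G, B and A of pixel i. */
        __m128 pixel = _mm_load_ps(rgba + 4 * i);
        /* Broadcast the scalar weight to all four lanes. */
        __m128 weight = _mm_set1_ps(weights[i]);
        accum = _mm_add_ps(accum, _mm_mul_ps(pixel, weight));
        weight_sum += weights[i];
    }

    /* Normalize by the total weight, skipping the division if it is zero. */
    if (weight_sum != 0.0f) {
        accum = _mm_div_ps(accum, _mm_set1_ps(weight_sum));
    }

    /* `out` is not assumed to be aligned, so use an unaligned store. */
    _mm_storeu_ps(out, accum);
}
```

The aligned load is the reason the input buffer has to start on a
16-byte boundary; with unaligned data one would have to fall back to
_mm_loadu_ps.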

The implementation inside the node itself is pretty much
straightforward, but some additional things had to be implemented:

- Aligned malloc. It's needed to load data into SSE registers
  faster. Based on the aligned_malloc() from Libmv, with
  some additional trickery going on to support arbitrary
  alignment (this magic is needed because of MemHead).

  In practice only 16-byte alignment is supported because
  OSX lacks an aligned malloc with arbitrary alignment.
  Not a big deal for now because we only need 16-byte
  alignment at the moment. Could be tweaked further
  later.

- Memory buffers in the compositor are now aligned to 16 bytes.
  Should be harmless for non-SSE cases too, just mentioning.
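
To show why a header in front of each block complicates aligned
allocation, here is a minimal, portable sketch. It is not the
guardedalloc code: it swaps in a plain malloc() with over-allocation
instead of a platform aligned_malloc(), and BlockHead and the sketch_*
functions are hypothetical stand-ins (BlockHead plays the role of
MemHead). The point is that the pointer handed to the caller must be
aligned, not the start of the allocation, while the header must still
sit directly in front of it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-block header, standing in for MemHead. */
typedef struct BlockHead {
    size_t len; /* size requested by the caller */
    void *base; /* pointer returned by malloc(), needed by free() */
} BlockHead;

/* Returns `len` bytes whose address is a multiple of `alignment`,
 * with a BlockHead stored immediately before the returned pointer. */
void *sketch_malloc_aligned(size_t len, size_t alignment)
{
    /* Alignment must be a power of two and at least pointer-sized. */
    assert(alignment >= sizeof(void *) && (alignment & (alignment - 1)) == 0);

    /* Room for the header plus the worst-case padding. */
    size_t total = len + sizeof(BlockHead) + alignment - 1;
    if (total < len) {
        return NULL; /* size overflow */
    }

    unsigned char *base = malloc(total);
    if (base == NULL) {
        return NULL;
    }

    /* Skip the header, then round up to the next aligned address. */
    uintptr_t user = ((uintptr_t)base + sizeof(BlockHead) + alignment - 1) &
                     ~(uintptr_t)(alignment - 1);

    BlockHead *head = (BlockHead *)(user - sizeof(BlockHead));
    head->len = len;
    head->base = base;
    return (void *)user;
}

/* Reads back the size stored in the header. */
size_t sketch_alloc_len(const void *ptr)
{
    const BlockHead *head =
        (const BlockHead *)((const unsigned char *)ptr - sizeof(BlockHead));
    return head->len;
}

/* Frees a block obtained from sketch_malloc_aligned(). */
void sketch_free_aligned(void *ptr)
{
    if (ptr != NULL) {
        BlockHead *head = (BlockHead *)((unsigned char *)ptr - sizeof(BlockHead));
        free(head->base);
    }
}
```

The cost of this approach is up to alignment - 1 wasted bytes per
block, which is one reason to prefer a native aligned allocator where
the platform offers one.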

Reviewers: campbellbarton, lukastoenne, jbakker

Reviewed By: campbellbarton

CC: lockal
Differential Revision: https://developer.blender.org/D564
12 files changed:
intern/guardedalloc/MEM_guardedalloc.h
intern/guardedalloc/intern/mallocn.c
intern/guardedalloc/intern/mallocn_guarded_impl.c
intern/guardedalloc/intern/mallocn_intern.h
intern/guardedalloc/intern/mallocn_lockfree_impl.c
source/blender/compositor/intern/COM_MemoryBuffer.cpp
source/blender/compositor/operations/COM_BlurBaseOperation.cpp
source/blender/compositor/operations/COM_BlurBaseOperation.h
source/blender/compositor/operations/COM_GaussianXBlurOperation.cpp
source/blender/compositor/operations/COM_GaussianXBlurOperation.h
source/blender/compositor/operations/COM_GaussianYBlurOperation.cpp
source/blender/compositor/operations/COM_GaussianYBlurOperation.h