Merge pull request #4 in ~DAOM/physics-layout from more-data-layout-tricks to master
* commit '3919d312bd104548721e7bb2b60607d2681e4ac3':
AOSOA: Carefully re-work the separated subroutine headers for Intel
AOSOA: Add AOSOA layout to VERT_SEARCH, NASTY_EXPS and LU_SOLVER
AOSOA: Add an array-of-struct-of-array data layout under the LITE_LOOP
LU_SOLVER_COMPACT: Add new variant of LU_SOLVER with compact storage
Phys_driver: Fix printing of larger runtimes
PHYS: Separate...
Merge pull request #3 in ~DAOM/physics-layout from gnu-arm-c to master
* commit '4ad36a3319237ff0f2e12a8e79379cc4853ae962':
Plots: Several tweaks to the plotting infrastructure
Plot: Add new draft for arch/cc comparison plots
Plot: Separating plotting utilities into separate script
Benchmark: Plot NPROMA-sweep for multiple kernels
Benchmark: Adding some rudimentary plotting capabilities
C: Add missing return statement to silence warnings
ARM: Experime...
LU_SOLVER_COMPACT: Add new variant of LU_SOLVER with compact storage
We're explicitly breaking the vectorization in this one, but we are
stashing the individual matrix components closer together.
This is really just to satisfy my curiosity...
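The trade-off described above can be illustrated with two index mappings for a batch of small n-by-n matrices. This is only a sketch under assumptions: the function names, the argument order and the exact layouts are hypothetical and are not taken from LU_SOLVER or LU_SOLVER_COMPACT.

```c
#include <stddef.h>

/* Component-major storage: element (r, c) of every matrix is stored
 * contiguously over the matrix index i, so a loop over i has unit
 * stride and can vectorize. */
size_t lu_idx_vectorized(size_t i, size_t r, size_t c, size_t n, size_t nmat)
{
    return (r * n + c) * nmat + i;
}

/* Compact (matrix-major) storage: all n*n components of matrix i sit
 * next to each other, so one factorization touches a small contiguous
 * region, but a loop over i now has stride n*n. */
size_t lu_idx_compact(size_t i, size_t r, size_t c, size_t n)
{
    return i * n * n + r * n + c;
}
```

With n = 4 and 10 matrices, consecutive matrices are 1 element apart in the first mapping and 16 elements apart in the second, which is why the compact form gives up unit-stride access over the batch.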
Merge pull request #2 in ~DAOM/physics-layout from c-stream-example to master
* commit 'ee572f563f64deec870b35937e3a3166d7922af8':
C: Small tweaks and a bug-fix for C_CONTIG-BLOCKED
Cray: Adding Cray compiler options to benchmark setup
Phys_kernel: Unifying naming scheme and dropping obsolete routine
C: Add NASTY_EXPS translation
Fortran: Moving `in1 <- out` inside kernel and dropping nontemporal
VERT_SEARCH: Added C translation and marked indexing bug
NP...
Fortran: Moving `in1 <- out` inside kernel and dropping nontemporal
Assignment moved for comparison fairness, and the nontemporal pragma
seems to have less of an effect if pinning is done properly.
LITE_LOOP: Force parallel RNG array init and parallel out-copy
This avoids accidental NUMA issues when running single-socket on
dual-socket machines and fixes a slowdown due to sequential copies.
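The idea behind the parallel init can be sketched as follows, assuming OpenMP threading and the common first-touch page placement policy. The function names and the placeholder fill are hypothetical (the real code fills the array from an RNG), so this is an illustration rather than the LITE_LOOP implementation.

```c
#include <stddef.h>

/* Initialise the array with the same static schedule the compute loop
 * would use, so each page is first touched (and therefore placed) by
 * the thread that later works on it. The fill value is a deterministic
 * placeholder, not a real random number generator. */
void init_parallel(double *a, size_t n)
{
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; ++i)
        a[i] = (double)(i % 97) / 97.0;
}

/* Copy in parallel with the same schedule instead of a sequential
 * copy, so each thread reads and writes its own chunk. */
void copy_parallel(double *dst, const double *src, size_t n)
{
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; ++i)
        dst[i] = src[i];
}
```

Without OpenMP enabled the pragmas are ignored and both loops run sequentially with the same result, which makes it easy to compare the two cases.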
Driver: Add generic driver with contiguous and nproma layouts
The new driver now supports blocked and flat iterations over
contiguous or nproma (blocked) memory layouts.
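The two layouts and the blocked iteration can be sketched as below. This assumes nproma is the block length of the innermost (horizontal) dimension and that the number of points is a multiple of it; the function names and array shapes are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Flat index of point i at level k in a contiguous [nlev][npoints] array. */
size_t idx_contig(size_t i, size_t k, size_t npoints)
{
    return k * npoints + i;
}

/* Flat index of the same point in a blocked [nblocks][nlev][nproma] array. */
size_t idx_nproma(size_t i, size_t k, size_t nlev, size_t nproma)
{
    size_t block = i / nproma;  /* which block the point falls into */
    size_t lane = i % nproma;   /* position inside that block */
    return (block * nlev + k) * nproma + lane;
}

/* Blocked iteration over the nproma layout: outer loop over blocks,
 * unit-stride inner loop of length nproma. Returns the sum of all entries. */
double sum_blocked(const double *a, size_t npoints, size_t nlev, size_t nproma)
{
    assert(nproma > 0 && npoints % nproma == 0);
    size_t nblocks = npoints / nproma;
    double s = 0.0;
    for (size_t b = 0; b < nblocks; ++b)
        for (size_t k = 0; k < nlev; ++k)
            for (size_t j = 0; j < nproma; ++j)
                s += a[(b * nlev + k) * nproma + j];
    return s;
}
```

A flat iteration would instead loop over i from 0 to npoints and call one of the index functions for each point; the blocked form keeps each block's levels close together in memory.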