Description
Currently, our implementation of the matmul ufunc is intelligent, and is able to pass appropriate transpose flags to BLAS to handle transposed contiguous arrays.
For `A`, `B`, and `C` as contiguous 2D arrays, the inner loop is intelligent enough to map `np.matmul(B.T, A.T, out=C.T)` to `np.matmul(A, B, out=C)`:
numpy/numpy/core/src/umath/matmul.c.src
Lines 476 to 491 in 59a9752
    /* matrix @ matrix */
    if (i1blasable && i2blasable && o_c_blasable) {
        @TYPE@_matmul_matrixmatrix(ip1, is1_m, is1_n,
                                   ip2, is2_n, is2_p,
                                   op, os_m, os_p,
                                   dm, dn, dp);
    } else if (i1blasable && i2blasable && o_f_blasable) {
        /*
         * Use transpose equivalence:
         * matmul(a, b, o) == matmul(b.T, a.T, o.T)
         */
        @TYPE@_matmul_matrixmatrix(ip2, is2_p, is2_n,
                                   ip1, is1_n, is1_m,
                                   op, os_p, os_m,
                                   dp, dn, dm);
    } else {
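
For readers less familiar with the identity used in the second branch, here is a minimal Python sketch of the same transpose equivalence at the user level. The array names, shapes, and the random seed are illustrative assumptions, and the sketch only checks that the two calls agree numerically; it does not show which BLAS path is actually taken.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily for the example
A = rng.standard_normal((3, 4))  # C-contiguous
B = rng.standard_normal((4, 5))  # C-contiguous

# Plain product written into a C-contiguous output.
C = np.empty((3, 5))
np.matmul(A, B, out=C)

# Same product expressed through transposes: (A @ B).T == B.T @ A.T.
# C2.T is an F-contiguous view of C2, so the result lands in C2 itself.
C2 = np.empty((3, 5))
np.matmul(B.T, A.T, out=C2.T)

print(np.allclose(C, C2))
```

Both calls fill their buffers with the same values, which is why the inner loop can treat them as the same operation.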
However, when the `out` argument is omitted, the ufunc machinery pre-allocates `out` with "C" memory ordering, which is not the "F" ordering that `C.T` has. Ideally, we'd be able to allocate our array such that we can make `o_c_blasable` or `o_f_blasable` true as necessary.
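
A small sketch of how this looks from Python, with illustrative shapes and names that are assumptions rather than part of the report. It prints the layout flags of the automatically allocated result so they can be inspected on a given NumPy version, then shows the manual workaround of passing an F-ordered `out`.

```python
import numpy as np

A = np.ones((3, 4))
B = np.ones((4, 5))

# Both inputs are transposed views, i.e. F-contiguous.
auto = np.matmul(B.T, A.T)
# Inspect the layout the ufunc machinery chose for the output.
print(auto.flags.c_contiguous, auto.flags.f_contiguous)

# Manual workaround: allocate the output in "F" order ourselves,
# matching the layout that C.T would have.
manual = np.empty((5, 3), order="F")
np.matmul(B.T, A.T, out=manual)
print(manual.flags.f_contiguous)
```

The workaround requires the caller to know the result shape and the preferred layout in advance, which is what letting the ufunc take part in output allocation would avoid.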
As part of @seberg's ufunc work, it would be great if ufuncs could be involved in the output allocation machinery.