A common need whenever NumPy is used to mediate the Python level access to another library
is to wrap the memory that the library creates using its own allocator into a NumPy array. This allows easy Python-side manipulation of the data already available without requiring an un-necessary copy. Fundamentally this is easy to do using PyArray_SimpleNewFromData. The C-level calling syntax is
int nd;
npy_intp *dims
void *data;
arr = PyArray_SimpleNewFromData(nd, dims, typenum, data);
In this code block, nd is the number of dimensions, dims is a C-array of integers describing the number of elements in each dimension of the array, typenum is the simple data-type of the NumPy array (e.g. NPY_DOUBLE), and data is the pointer to the memory that has been previously allocated.
By default, the memory for the NumPy array will be interpreted as a C-ordered contiguous array. If you need more control over the data-type or the striding of the array, then you can also use PyArray_NewFromDescr.
This is the simple part and code like this has been possible in Numeric for more than a decade. The tricky part, however, is memory management. How does the memory get deallocated? The suggestions have always been something similar to “make sure the memory doesn’t get deallocated before the NumPy
array disappears.” This is nice advice, but not generally helpful as it basically just tells you to create a memory leak.
All that NumPy does internally is to un-set a flag on the array object indicating that the array doesn’t own its memory pointer and so NumPy won’t free the memory when the last reference to the array disappears.
The key to managing memory correctly is to recognize that every NumPy array that doesn’t own its own memory can also point to a “base” object from which it obtained the memory. This base object is usually another NumPy array or an object exposing the buffer protocol — but it can be any object (even one we create on the fly). This object is DECREF’d when the NumPy array is deallocated and if the NumPy array contains the only reference to the object, then it will also be deallocated when the NumPy array is deallocated.
Thus, a good way to manage memory from another allocator is to create an instance of a new Python type. You then store the pointer to the memory (and anything else you may need to call the deallocator correctly) in the instance. Finally, you call the deallocator in the tp_dealloc function of the new Python type you’ve created. Then, you point the base member of your new NumPy array to the new object you’ve created.
The concept is relatively simple, but there are enough moving parts that an example is probably useful. Let’s say I want to create an extension module that only uses NumPy arrays allocated on 16-byte boundaries (maybe I’m experimenting with some SIMD instructions). I want to use arrays whose data is allocated using the aligned allocator defined below (borrowed from a patch to NumPy by David Cournapeau):
#include <errno.h>
#define uintptr_t size_t
#define _NOT_POWER_OF_TWO(n) (((n) & ((n) - 1)))
#define _UI(p) ((uintptr_t) (p))
#define _CP(p) ((char *) p)
#define _PTR_ALIGN(p0, alignment)
((void *) (((_UI(p0) + (alignment + sizeof(void*)))
& (~_UI(alignment - 1)))))
/* pointer must sometimes be aligned; assume sizeof(void*) is a power of two */
#define _ORIG_PTR(p) (*(((void **) (_UI(p) & (~_UI(sizeof(void*) - 1)))) - 1))
static void *_aligned_malloc(size_t size, size_t alignment)
{
void *p0, *p;
if (_NOT_POWER_OF_TWO(alignment)) {
errno = EINVAL;
return ((void *) 0);
}
if (size == 0) {
return ((void *) 0);
}
if (alignment < sizeof(void *)) {
alignment = sizeof(void *);
}
/* including the extra sizeof(void*) is overkill on a 32-bit
machine, since malloc is already 8-byte aligned, as long
as we enforce alignment >= 8 ...but oh well */
p0 = malloc(size + (alignment + sizeof(void *)));
if (!p0) {
return ((void *) 0);
}
p = _PTR_ALIGN(p0, alignment);
_ORIG_PTR(p) = p0;
return p;
}
static void _aligned_free(void *memblock)
{
if (memblock) {
free(NPY_ALIGNED_ORIG_PTR(memblock));
}
}
Now, to create arrays using this allocator we just need to allocate the needed memory and use SimpleNewFromData. Then we create a new object encapsulating the deallocation and set this as the base object of the ndarray.
int nd=2
npy_intp dims[2]={10,20};
size_t size;
PyObject newobj, arr=NULL;
void *mymem;
size = PyArray_MultiplyList(dims, nd);
mymem = _aligned_malloc(size, 16);
arr = PyArray_SimpleNewFromData(nd, dims, NPY_DOUBLE, mymem);
if (arr == NULL) goto fail;
newobj = PyObject_New(_MyDeallocObject, &_MyDealloc_Type);
if (newobj == NULL) goto fail;
((_MyDeallocObject *)newobj)->memory = mymem;
PyArray_BASE(arr) = newobj;
fail:
_aligned_free(size);
Py_XDECREF(arr);
Now, all that is missing is the code to create the new Python Type. That code is
typedef struct {
PyObject_HEAD
void *memory;
} _MyDeallocObject;
static void
_mydealloc_dealloc(_MyDeallocObject *self)
{
_aligned_free(self->memory);
self->ob_type->tp_free((PyObject *)self);
}
static PyTypeObject _myDeallocType = {
PyObject_HEAD_INIT(NULL)
0, /*ob_size*/
"mydeallocator", /*tp_name*/
sizeof(_MyDeallocObject), /*tp_basicsize*/
0, /*tp_itemsize*/
_mydealloc_dealloc, /*tp_dealloc*/
0, /*tp_print*/
0, /*tp_getattr*/
0, /*tp_setattr*/
0, /*tp_compare*/
0, /*tp_repr*/
0, /*tp_as_number*/
0, /*tp_as_sequence*/
0, /*tp_as_mapping*/
0, /*tp_hash */
0, /*tp_call*/
0, /*tp_str*/
0, /*tp_getattro*/
0, /*tp_setattro*/
0, /*tp_as_buffer*/
Py_TPFLAGS_DEFAULT, /*tp_flags*/
"Internal deallocator object", /* tp_doc */
};
Don’t forget to add the following to the extension type initialization module in order to initialize the new Python type that has been created.
_MyDeallocType.tp_new = PyType_GenericNew;
if (PyType_Ready(&_MyDeallocType) < 0)
return;
This simple pattern should allow you to seamlessly integrate NumPy arrays with all kinds of memory allocation strategies. I think this pattern is common enough that we should probably add something to NumPy itself to make it easier to do this sort of thing in a few lines of code. Perhaps a new C-API call is justified with a new internal Python type that allows different allocators and deallocators to be used. Subscribe and post to the numpy-discussion@scipy.org list if you are interested in staying tuned.