Persistent array

In computer science, and more precisely in data structure, a persistent array is a persistent data structure with properties similar to a (non-persistent) array. That is then, after a value's update in a persistent array, there exists two persistent arrays. One persistent array in which the update is taken into account, and one which is equal to the array before the update.

Difference between persistent arrays and arrays

An array $\mathrm {ar} =[e_{0},\dots ,e_{n-1}]$ is a data structure, with a fixed number n of elements $e_{0},\dots ,e_{n-1}$ . It is expected that, given the array ar and an index $0\leq i<n$ , the value $e_{i}$ can be retrieved quickly. This operation is called a look-up. Furthermore, given the array ar, an index $0\leq i<n$ and a new value v, a new array ar2 with content $[e_{0},\dots ,e_{i-1},v,e_{i+1},\dots ,e_{n-1}]$ can be created quickly. This operation is called an update. The main difference between persistent and non-persistent arrays being that, in non-persistent arrays, the array ar is destroyed during the creation of ar2.

For example, consider the following pseudocode.

array = [0, 0, 0]
updated_array = array.update(0, 8)
other_array = array.update(1, 3)
last_array = updated_array.update(2, 5)

At the end of execution, the value of array is still [0, 0, 0], the value of update_array is [8, 0, 0], the value of other_array is [0, 3, 0] and the value of last_array is [0, 8, 5].

There exists two kinds of persistent arrays. A persistent array may be either partially or fully persistent. A fully persistent array may be updated an arbitrary number of times while a partially persistent array may be updated at most once. In our previous example, if array were only partially persistent, the creation of other_array would be forbidden, however, the creation of last_array would still be valid. Indeed, updated_array is an array distinct from array and has never been updated before the creation of last_array.

Implementations

Many implementations of persistent arrays exists. In this section, the positive natural number n will always be the size of the persistent array.

Three implementations are discussed below. The first ones are the easiest one to implement, while the last ones are the more efficient.

Using purely functional data structures

The simplest implementation of a fully persistent array of size n consist in using an arbitrary persistent map, whose entry are the numbers from 0 to n-1. Such a data structure may be, for example, a balanced tree. However, looking up an element in such a data structure would take logarithmic time. One of the main interest of array is that operations are executed in constant time. While it is impossible to create a data structures in which every operations takes constant time,[1] the following operations would allow look-up to be more efficient, at least on the last version of the structures.

Using an array, and a tree of modifications

A fully persistent array may be implemented using an array and the so-called Backer's trick[2] This implementation is used in the OCaml module parray.ml[3] by Jean-Christophe Filliâtre.

In order to define this implementation, a few other definitions must be given. An initial array is an array which is not generated by an update on another array. A child of an array ar is an array of the form ar.update(i,v), and ar is the parent of ar.update(i,v). A descendant of an array ar is either ar or the descendant of a child of ar. The initial array of an array ar is either ar if ar is initial, or it is the initial array of the parent of ar. That is, the initial array of ar is the unique array init such that $\mathrm {ar} =init.update(i_{0},v_{0}).\dots .update(i_{m},v_{m})$ , with ar initial and $i_{0},\dots ,i_{m}$ an arbitrary sequence of indexes and $v_{0},\dots ,v_{m}$ an arbitrary sequence of value. A family of array is thus a set of arrays containing an initial array and all of its descendants. Finally, the tree of a family of arrays is the tree whose nodes are the arrays, and with an edge e from ar to each of its child ar.update(i,v).

A persistent array using the Backer's trick consists into a pair with an actual array called array and the tree of arrays. This tree admits an arbitrary root - not necessarily the initial array. The root may be moved to an arbitrary node of the tree. Changing the root from root to an arbitrary node ar takes time proportional in the depth of ar. That is, in the distance between root and ar. Similarly, looking up a value takes time proportional to the distance between the array and the root of its family. Thus, if the same array ar may be look-up multiple time, it is more efficient to move the root to ar before doing the look-up. Finally updating an array only takes constant time.

Technically, given two adjacent arrays ar1 and ar2, with ar1 closer to the root than ar2, the edge from ar1 to ar2 is labelled by (i,ar2[i]), where i the only position whose value differ between ar1 and ar2.

Accessing an element i of an array ar is done as follows. If ar is the root, then ar[i] equals root[i]. Otherwise, let e the edge leaving ar toward the root. If the label of e is (i,v) then ar[i] equals v. Otherwise, let ar2 be the other node of the edge e. Then ar[i] equals ar2[i]. The computation of ar2[i] being done recursively using the same definition.

The creation of ar.update(i,v) consists in adding a new node ar2 to the tree, and an edge e from ar to ar2 labelled by (i,v).

Finally, moving the root to a node ar is done as follows. If ar is already the root, there is nothing to do. Otherwise, let e the edge leaving ar toward the current root, (i,v) its label and ar2 the other end of e. Moving the root to ar is done by first moving the root to ar2, changing the label of e to (i, ar2[i]), and changing array[i] to v.

Log-log-time

There exists an implementation of fully persistent arrays such that look-up and updates can be done in $O(\log \log(\min(n,m))$ -time, and space $O(m+n)$ , with m the number of arrays and n the number of element in an array.[1] This implementation is optimal for look-up, according to the so-called cell-probe model.

Note however that this implementation is far more complex than the two mentioned above, and thus won't be described in this article.

References

Straka e, Milan (2013). Functional Data Structures and Algorithms. Prague. pp. 87–90.
Fillâtre, Jean-Christophe; Conchon, Sylvain (2007). A Persistent Union-find Data Structure (PDF). New York, NY, USA: ACM. pp. 37–46. ISBN 978-1-59593-676-9.
Filliâtre, Jean-Christophe. "Persistent-array implementation".

.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

[Straka-1] Straka e, Milan (2013). Functional Data Structures and Algorithms. Prague. pp. 87–90.

[ML-2] Fillâtre, Jean-Christophe; Conchon, Sylvain (2007). A Persistent Union-find Data Structure (PDF). New York, NY, USA: ACM. pp. 37–46. ISBN 978-1-59593-676-9.

[3] Filliâtre, Jean-Christophe. "Persistent-array implementation".