[Subversion] View of /Trellis/Collections.txt

View of /Trellis/Collections.txt

Parent Directory

Revision Log

Revision: 2545 - (download)
Fri May 23 16:48:40 2008 UTC (15 years, 10 months ago) by pje
File size: 12422 byte(s)

Update docs for release

========================================= Event-Driven Collections with the Trellis ========================================= The ``peak.events.collections`` module includes some additional data structures for use with the Trellis:: >>> from peak.events import trellis, collections .. contents:: **Table of Contents** SortedSet --------- ``trellis.List`` objects have some inherent inefficiencies due to the wide variety of operations supported by Python lists. While ``trellis.Set`` and ``trellis.Dict`` objects update themselves in place by applying change logs, ``trellis.List`` has to use a copy-on-write strategy to manage updates, because there isn't any simple way to reduce operations like ``sort()``, ``reverse()``, ``remove()``, etc. to a meaningful change log. (That's why it only provides a simple ``changed`` flag.) So if you need to use large lists in an application, you may be better off using a different data structure design. That way, if you only need a subset of the list interface, you can implement a changelog-based structure. For example, the Trellis package includes a ``SortedSet`` type that maintains an index of items sorted by keys, with a cell that lists changed regions. A ``collections.SortedSet`` is a specialized component that lets you wrap a ``trellis.Set`` (or anything similar to one, that offers iteration plus ``added`` and ``removed`` cells) with a sort key, to get a list:: >>> myItems = trellis.Set([3,2,1]) >>> myIndex = collections.SortedSet(data=myItems) >>> class Viewer(trellis.Component): ... model = trellis.attr(None) ... ... trellis.perform() ... def view_it(self): ... if self.model is not None: ... print self.model >>> view = Viewer(model=myIndex) [1, 2, 3] You can change the ``sort_key`` attribute to re-sort the items:: >>> myIndex.sort_key = lambda x: -x [3, 2, 1] ``SortedSet`` objects also have a discrete rule attribute called ``changes``. It's normally empty, but during the recalculation triggered by a change to the underlying set, it will temporarily contain one or more ``(start, end, size)`` tuples, representing the effective changes to the old list. But as with all discrete or receiver attributes, you'll never see a value in it from non-rule code:: >>> myIndex.changes [] Only in rule code will you ever see it containing values, a moment before it becomes empty again:: >>> view.model = None # quiet, please >>> class Watcher(trellis.Component): ... trellis.perform() ... def dump(self): ... print myIndex.changes >>> watcher=Watcher() [] >>> myItems.remove(2) [(1, 2, 0)] [] >>> myIndex [3, 1] The ``changed`` values describe the changes made to the index contents. In this case, ``(1, 2, 0)`` means that the list slice ``1:2`` was reduced to length zero. Compare with the effect of re-inserting 2:: >>> myItems.add(2) [(1, 1, 1)] [] >>> myIndex [3, 2, 1] This value means the slice ``1:1`` was increased to length 1. In a sense, you can view the provided changes as being "set slice" operators. Note, too, that it's possible to have multiple slices if multiple items are added or deleted:: >>> def add_some(): ... myItems.add(0) ... myItems.add(4) >>> trellis.atomically(add_some) [(3, 3, 1), (0, 0, 1)] [] >>> myIndex [4, 3, 2, 1, 0] >>> add_some() As you can see, 1 item was inserted at position 3 in the old list, followed by 1 item being inserted at position 0. In other words, if you loop over the ``changes`` attribute and apply the slice operations in that order, you can successfully update another sequence to match the changed sequence. Finally, note that adjacent operations may be merged into single slices:: >>> def del_some(): ... myItems.remove(2) ... myItems.remove(3) >>> trellis.atomically(del_some) [(1, 3, 0)] [] >>> myIndex [4, 1, 0] And that insertion+deletion at the same position can lead to change slices that don't result in a net change to the number of rows:: >>> def add_and_del(): ... myItems.remove(1) ... myItems.add(2) >>> trellis.atomically(add_and_del) [(1, 2, 1)] [] >>> myIndex [4, 2, 0] >>> def add_and_del(): ... myItems.remove(2) ... myItems.add(1) >>> trellis.atomically(add_and_del) [(1, 2, 1)] [] >>> myIndex [4, 1, 0] And changing the sort key always results in a change slice for the entire index:: >>> myIndex.sort_key = lambda x:x [(0, 3, 3)] [] >>> myIndex [0, 1, 4] As does setting or clearing the ``reverse`` flag (which reverses the effect of the sort key):: >>> myIndex.reverse = True [(0, 3, 3)] [] >>> myIndex [4, 1, 0] >>> myItems.add(7) [(0, 0, 1)] [] >>> myIndex [7, 4, 1, 0] >>> myItems.add(2) [(2, 2, 1)] [] >>> myIndex [7, 4, 2, 1, 0] >>> myItems.remove(7) [(0, 1, 0)] [] >>> myItems.remove(2) [(1, 2, 0)] [] >>> myIndex [4, 1, 0] >>> myIndex.reverse = False [(0, 3, 3)] [] >>> myIndex [0, 1, 4] TODO: test changes to ``.data`` SubSet ------ The ``SubSet`` type lets you create a ``Set`` that is constrained to be a subset of another set. If items are removed from the ``base`` set, the same items are removed from the ``SubSet``:: >>> s1 = trellis.Set([1,2,3]) >>> ss = collections.SubSet([2], base = s1) >>> ss SubSet([2]) >>> s1.remove(2) >>> ss SubSet([]) Adding an item to the ``SubSet`` when the item is not in the ``base`` set has no effect:: >>> ss.add(4) >>> ss SubSet([]) But you *can* add items that are currently in the base set:: >>> ss.add(3) >>> ss SubSet([3]) >>> s1.remove(3) >>> ss SubSet([]) TODO: changes to ``base`` do not affect the current subset membership Observing --------- The ``Observing`` type is designed to assist in sending change notifications to GUI components that need to be told when a portion of the display is out of date. It works by taking a set of keys and a lookup function, and providing a ``changes`` attribute. Usually the lookup function will be an accessor on some trellis-ized data structure:: >>> data = trellis.List([1,2,3,4,5]) >>> o = collections.Observing(lookup_func = data.__getitem__) >>> def show(): ... print "Changes:", o.changes >>> c = trellis.Performer(show) Changes: {} The ``changes`` attribute is a dictionary (a regular dictionary, not a ``Dict``) that maps the observing instance's ``keys`` to ``(new, old)`` value tuples. Whenever a change occurs to either the ``keys`` or their corresponding values, it becomes non-empty, and then reverts to being empty:: >>> o.keys.update([1,2,4]) Changes: {1: (2, 2), 2: (3, 3), 4: (5, 5)} Changes: {} In the above example, the "new" and "old" values are the same, because the keys shown were all just added to the ``keys`` set. However, if a value changes for a key that's already present in ``keys``, the "new" and "old" values will be different:: >>> data[1] = -2 Changes: {1: (-2, 2)} Changes: {} >>> o.keys.remove(2) # removing key doesn't induce "changes" >>> o.keys.add(3) # but adding does Changes: {3: (4, 4)} Changes: {} >>> data.reverse() # and so does any change on the underlying data Changes: {1: (4, -2), 3: (-2, 4), 4: (1, 5)} Changes: {} The basic idea for using an ``Observing`` object is to update the ``keys`` to reflect the currently-visible row or column numbers (or other keys) of a data structure being displayed in a GUI. Then, a ``@perform`` rule that reads the ``changes`` can notify the GUI of any changes that happen to the underlying data structure. You can, of course, do this without an ``Observing`` instance, but recalculations would be prohibitively expensive if the underlying data set is large, compared to the portion being displayed. (E.g., when scrolling through an entire database.) So GUI applications that display grids or tables will typically use a ``SortedSet`` to find out about rows or columns being inserted, deleted, or replaced, and an ``Observing`` instance to find out about changes *within* the rows or columns. TODO: ``changes`` doesn't update if you initialize ``keys`` with a populated set. Hub (NEW in 0.7a2) ------------------ A ``collections.Hub`` is used for loosely-coupled many-to-many communications with flexible pattern matching -- aka "publish/subscribe" or "pub/sub" messaging:: >>> hub = collections.Hub() You can send messages into a hub by calling its ``put()`` method:: >>> hub.put(1, 2, 3) However, this does nothing unless there are rules using the hub's ``get()`` method to receive these messages:: >>> def watch_3_3(): ... for message in hub.get(None, None, 3): ... print message >>> watch_3_3 = trellis.Performer(watch_3_3) >>> hub.put(1, 2, 3) (1, 2, 3) The ``put()`` and ``get()`` methods both accept an arbitrary number of positional arguments, but ``get()`` will only match ``put()`` calls with the same number of arguments:: >>> hub.put('x', 'y') >>> hub.put(1, 2, 3, 4) And then, only if the non-``None`` arguments to ``get()`` match the corresponding arguments given to ``put``:: >>> hub.put(1, 2, 4) >>> hub.put(5, 4, 3) (5, 4, 3) You can of course have multiple rules monitoring the same hub:: >>> def watch_2_4(): ... for message in hub.get(2, 4, None): ... print "24:", message >>> watch_2_4 = trellis.Performer(watch_2_4) >>> hub.put(2,4,3) 24: (2, 4, 3) (2, 4, 3) >>> hub.put(2, 4, 4) 24: (2, 4, 4) And you can send more than one value in a single recalculation or atomic action, with the relative order of messages being preserved for each observer:: >>> def send_many(): ... hub.put(1, 2, 3) ... hub.put(2, 4, 4) ... hub.put(2, 4, 3) >>> trellis.atomically(send_many) 24: (2, 4, 4) 24: (2, 4, 3) (1, 2, 3) (2, 4, 3) Note, however, that all arguments to ``put()`` and ``get()`` must be hashable:: >>> hub.put(1, []) Traceback (most recent call last): ... TypeError: list objects are unhashable >>> hub.get(1, []) Traceback (most recent call last): ... TypeError: list objects are unhashable This is because hubs use a dictionary-based indexing system, that avoids the need to test every message against every observer's match pattern. Each active ``get()`` pattern is saved under an index, keyed by its rightmost non-``None`` value. Each value in a message is then looked up in this index, and then tested against that (hopefully small) subset of active patterns. For example, if we look at the contents of our sample hub's index, we can see that the ``(None, None, 3)`` match pattern is indexed under "position 2, value 3", and the ``(2, 4, None)`` pattern is indexed under "position 1, value 4":: >>> hub._index {(2, 3): {(None, None, 3): 1}, (1, 4): {(2, 4, None): 1}} This means that ``(2, 4, None)`` will only be checked for messages with a 4 in the second position, and ``(None, None, 3)`` will only be checked for messages with a 3 in the third position (which of course it will always match). So, for best performance in high-volume applications, make sure you design your messages to place "more distinct" fields further to the right. For example, if you have a small number of distinct message types, you should probably make the message type the first field, so that if a ``get()`` matches on both the message type and some more-distinctive field, it will be indexed only on the more-distinctive field, avoiding it being matched against every message of the desired type. (Unless of course, the ``get()`` is *supposed* to return all messages of the desired type!) In contrast, if you placed the message type as the last field, then any ``get()`` targeting a particular message type would incur a match-time penalty for *every* message of that type. Thus, you should place fields with fewer possible values more to the left, and fields with a larger number of possible values more to the right.

View of /Trellis/Collections.txt

ViewCVS and CVS Help