vendor/im-rc/src/lib.rs - toolchain/rustc - Git at Google

 // This Source Code Form is subject to the terms of the Mozilla Public
 // License, v. 2.0. If a copy of the MPL was not distributed with this
 // file, You can obtain one at http://mozilla.org/MPL/2.0/.

 //! # Immutable Data Structures for Rust
 //!
 //! This library implements several of the more commonly useful immutable data
 //! structures for Rust.
 //!
 //! ## What are immutable data structures?
 //!
 //! Immutable data structures are data structures which can be copied and
 //! modified efficiently without altering the original. The most uncomplicated
 //! example of this is the venerable [cons list][cons-list]. This crate offers a
 //! selection of more modern and flexible data structures with similar
 //! properties, tuned for the needs of Rust developers.
 //!
 //! Briefly, the following data structures are provided:
 //!
 //! * [Vectors][vector::Vector] based on [RRB trees][rrb-tree]
 //! * [Hash maps][hashmap::HashMap]/[sets][hashset::HashSet] based on [hash
 //!   array mapped tries][hamt]
 //! * [Ordered maps][ordmap::OrdMap]/[sets][ordset::OrdSet] based on
 //!   [B-trees][b-tree]
 //!
 //! ## Why Would I Want This?
 //!
 //! While immutable data structures can be a game changer for other
 //! programming languages, the most obvious benefit - avoiding the
 //! accidental mutation of data - is already handled so well by Rust's
 //! type system that it's just not something a Rust programmer needs
 //! to worry about even when using data structures that would send a
 //! conscientious Clojure programmer into a panic.
 //!
 //! Immutable data structures offer other benefits, though, some of
 //! which are useful even in a language like Rust. The most prominent
 //! is *structural sharing*, which means that if two data structures
 //! are mostly copies of each other, most of the memory they take up
 //! will be shared between them. This implies that making copies of an
 //! immutable data structure is cheap: it's really only a matter of
 //! copying a pointer and increasing a reference counter, where in the
 //! case of [`Vec`][std::vec::Vec] you have to allocate the same
 //! amount of memory all over again and make a copy of every element
 //! it contains. For immutable data structures, extra memory isn't
 //! allocated until you modify either the copy or the original, and
 //! then only the memory needed to record the difference.
 //!
 //! Another goal of this library has been the idea that you shouldn't
 //! even have to think about what data structure to use in any given
 //! situation, until the point where you need to start worring about
 //! optimisation - which, in practice, often never comes. Beyond the
 //! shape of your data (ie. whether to use a list or a map), it should
 //! be fine not to think too carefully about data structures - you can
 //! just pick the one that has the right shape and it should have
 //! acceptable performance characteristics for every operation you
 //! might need. Specialised data structures will always be faster at
 //! what they've been specialised for, but `im` aims to provide the
 //! data structures which deliver the least chance of accidentally
 //! using them for the wrong thing.
 //!
 //! For instance, [`Vec`][std::vec::Vec] beats everything at memory
 //! usage, indexing and operations that happen at the back of the
 //! list, but is terrible at insertion and removal, and gets worse the
 //! closer to the front of the list you get.
 //! [`VecDeque`][std::collections::VecDeque] adds a little bit of
 //! complexity in order to make operations at the front as efficient
 //! as operations at the back, but is still bad at insertion and
 //! especially concatenation. [`Vector`][vector::Vector] adds another
 //! bit of complexity, and could never match [`Vec`][std::vec::Vec] at
 //! what it's best at, but in return every operation you can throw at
 //! it can be completed in a reasonable amount of time - even normally
 //! expensive operations like copying and especially concatenation are
 //! reasonably cheap when using a [`Vector`][vector::Vector].
 //!
 //! It should be noted, however, that because of its simplicity,
 //! [`Vec`][std::vec::Vec] actually beats [`Vector`][vector::Vector] even at its
 //! strongest operations at small sizes, just because modern CPUs are
 //! hyperoptimised for things like copying small chunks of contiguous memory -
 //! you actually need to go past a certain size (usually in the vicinity of
 //! several hundred elements) before you get to the point where
 //! [`Vec`][std::vec::Vec] isn't always going to be the fastest choice.
 //! [`Vector`][vector::Vector] attempts to overcome this by actually just being
 //! an array at very small sizes, and being able to switch efficiently to the
 //! full data structure when it grows large enough. Thus,
 //! [`Vector`][vector::Vector] will actually be equivalent to
 //! [Vec][std::vec::Vec] until it grows past the size of a single chunk.
 //!
 //! The maps - [`HashMap`][hashmap::HashMap] and
 //! [`OrdMap`][ordmap::OrdMap] - generally perform similarly to their
 //! equivalents in the standard library, but tend to run a bit slower
 //! on the basic operations ([`HashMap`][hashmap::HashMap] is almost
 //! neck and neck with its counterpart, while
 //! [`OrdMap`][ordmap::OrdMap] currently tends to run 2-3x slower). On
 //! the other hand, they offer the cheap copy and structural sharing
 //! between copies that you'd expect from immutable data structures.
 //!
 //! In conclusion, the aim of this library is to provide a safe
 //! default choice for the most common kinds of data structures,
 //! allowing you to defer careful thinking about the right data
 //! structure for the job until you need to start looking for
 //! optimisations - and you may find, especially for larger data sets,
 //! that immutable data structures are still the right choice.
 //!
 //! ## Values
 //!
 //! Because we need to make copies of shared nodes in these data structures
 //! before updating them, the values you store in them must implement
 //! [`Clone`][std::clone::Clone].  For primitive values that implement
 //! [`Copy`][std::marker::Copy], such as numbers, everything is fine: this is
 //! the case for which the data structures are optimised, and performance is
 //! going to be great.
 //!
 //! On the other hand, if you want to store values for which cloning is
 //! expensive, or values that don't implement [`Clone`][std::clone::Clone], you
 //! need to wrap them in [`Rc`][std::rc::Rc] or [`Arc`][std::sync::Arc]. Thus,
 //! if you have a complex structure `BigBlobOfData` and you want to store a list
 //! of them as a `Vector<BigBlobOfData>`, you should instead use a
 //! `Vector<Rc<BigBlobOfData>>`, which is going to save you not only the time
 //! spent cloning the big blobs of data, but also the memory spent keeping
 //! multiple copies of it around, as [`Rc`][std::rc::Rc] keeps a single
 //! reference counted copy around instead.
 //!
 //! If you're storing smaller values that aren't
 //! [`Copy`][std::marker::Copy]able, you'll need to exercise judgement: if your
 //! values are going to be very cheap to clone, as would be the case for short
 //! [`String`][std::string::String]s or small [`Vec`][std::vec::Vec]s, you're
 //! probably better off storing them directly without wrapping them in an
 //! [`Rc`][std::rc::Rc], because, like the [`Rc`][std::rc::Rc], they're just
 //! pointers to some data on the heap, and that data isn't expensive to clone -
 //! you might actually lose more performance from the extra redirection of
 //! wrapping them in an [`Rc`][std::rc::Rc] than you would from occasionally
 //! cloning them.
 //!
 //! ### When does cloning happen?
 //!
 //! So when will your values actually be cloned? The easy answer is only if you
 //! [`clone`][std::clone::Clone::clone] the data structure itself, and then only
 //! lazily as you change it. Values are stored in tree nodes inside the data
 //! structure, each node of which contains up to 64 values. When you
 //! [`clone`][std::clone::Clone::clone] a data structure, nothing is actually
 //! copied - it's just the reference count on the root node that's incremented,
 //! to indicate that it's shared between two data structures. It's only when you
 //! actually modify one of the shared data structures that nodes are cloned:
 //! when you make a change somewhere in the tree, the node containing the change
 //! needs to be cloned, and then its parent nodes need to be updated to contain
 //! the new child node instead of the old version, and so they're cloned as
 //! well.
 //!
 //! We can call this "lazy" cloning - if you make two copies of a data structure
 //! and you never change either of them, there's never any need to clone the
 //! data they contain. It's only when you start making changes that cloning
 //! starts to happen, and then only on the specific tree nodes that are part of
 //! the change. Note that the implications of lazily cloning the data structure
 //! extend to memory usage as well as the CPU workload of copying the data
 //! around - cloning an immutable data structure means both copies share the
 //! same allocated memory, until you start making changes.
 //!
 //! Most crucially, if you never clone the data structure, the data inside it is
 //! also never cloned, and in this case it acts just like a mutable data
 //! structure, with minimal performance differences (but still non-zero, as we
 //! still have to check for shared nodes).
 //!
 //! ## Data Structures
 //!
 //! We'll attempt to provide a comprehensive guide to the available
 //! data structures below.
 //!
 //! ### Performance Notes
 //!
 //! "Big O notation" is the standard way of talking about the time
 //! complexity of data structure operations. If you're not familiar
 //! with big O notation, here's a quick cheat sheet:
 //!
 //! *O(1)* means an operation runs in constant time: it will take the
 //! same time to complete regardless of the size of the data
 //! structure.
 //!
 //! *O(n)* means an operation runs in linear time: if you double the
 //! size of your data structure, the operation will take twice as long
 //! to complete; if you quadruple the size, it will take four times as
 //! long, etc.
 //!
 //! *O(log n)* means an operation runs in logarithmic time: for
 //! *log<sub>2</sub>*, if you double the size of your data structure,
 //! the operation will take one step longer to complete; if you
 //! quadruple the size, it will need two steps more; and so on.
 //! However, the data structures in this library generally run in
 //! *log<sub>64</sub>* time, meaning you have to make your data
 //! structure 64 times bigger to need one extra step, and 4096 times
 //! bigger to need two steps. This means that, while they still count
 //! as O(log n), operations on all but really large data sets will run
 //! at near enough to O(1) that you won't usually notice.
 //!
 //! *O(n log n)* is the most expensive operation you'll see in this
 //! library: it means that for every one of the *n* elements in your
 //! data structure, you have to perform *log n* operations. In our
 //! case, as noted above, this is often close enough to O(n) that it's
 //! not usually as bad as it sounds, but even O(n) isn't cheap and the
 //! cost still increases logarithmically, if slowly, as the size of
 //! your data increases. O(n log n) basically means "are you sure you
 //! need to do this?"
 //!
 //! *O(1)** means 'amortised O(1),' which means that an operation
 //! usually runs in constant time but will occasionally be more
 //! expensive: for instance,
 //! [`Vector::push_back`][vector::Vector::push_back], if called in
 //! sequence, will be O(1) most of the time but every 64th time it
 //! will be O(log n), as it fills up its tail chunk and needs to
 //! insert it into the tree. Please note that the O(1) with the
 //! asterisk attached is not a common notation; it's just a convention
 //! I've used in these docs to save myself from having to type
 //! 'amortised' everywhere.
 //!
 //! ### Lists
 //!
 //! Lists are sequences of single elements which maintain the order in
 //! which you inserted them. The only list in this library is
 //! [`Vector`][vector::Vector], which offers the best all round
 //! performance characteristics: it's pretty good at everything, even
 //! if there's always another kind of list that's better at something.
 //!
 //! | Type | Algorithm | Constraints | Order | Push | Pop | Split | Append | Lookup |
 //! | --- | --- | --- | --- | --- | --- | --- |
 //! | [`Vector<A>`][vector::Vector] | [RRB tree][rrb-tree] | [`Clone`][std::clone::Clone] | insertion | O(1)* | O(1)* | O(log n) | O(log n) | O(log n) |
 //!
 //! ### Maps
 //!
 //! Maps are mappings of keys to values, where the most common read
 //! operation is to find the value associated with a given key. Maps
 //! may or may not have a defined order. Any given key can only occur
 //! once inside a map, and setting a key to a different value will
 //! overwrite the previous value.
 //!
 //! | Type | Algorithm | Key Constraints | Order | Insert | Remove | Lookup |
 //! | --- | --- | --- | --- | --- | --- |
 //! | [`HashMap<K, V>`][hashmap::HashMap] | [HAMT][hamt] | [`Clone`][std::clone::Clone] + [`Hash`][std::hash::Hash] + [`Eq`][std::cmp::Eq] | undefined | O(log n) | O(log n) | O(log n) |
 //! | [`OrdMap<K, V>`][ordmap::OrdMap] | [B-tree][b-tree] | [`Clone`][std::clone::Clone] + [`Ord`][std::cmp::Ord] | sorted | O(log n) | O(log n) | O(log n) |
 //!
 //! ### Sets
 //!
 //! Sets are collections of unique values, and may or may not have a
 //! defined order. Their crucial property is that any given value can
 //! only exist once in a given set.
 //!
 //! | Type | Algorithm | Constraints | Order | Insert | Remove | Lookup |
 //! | --- | --- | --- | --- | --- | --- |
 //! | [`HashSet<A>`][hashset::HashSet] | [HAMT][hamt] | [`Clone`][std::clone::Clone] + [`Hash`][std::hash::Hash] + [`Eq`][std::cmp::Eq] | undefined | O(log n) | O(log n) | O(log n) |
 //! | [`OrdSet<A>`][ordset::OrdSet] | [B-tree][b-tree] | [`Clone`][std::clone::Clone] + [`Ord`][std::cmp::Ord] | sorted | O(log n) | O(log n) | O(log n) |
 //!
 //! ## In-place Mutation
 //!
 //! All of these data structures support in-place copy-on-write
 //! mutation, which means that if you're the sole user of a data
 //! structure, you can update it in place without taking the
 //! performance hit of making a copy of the data structure before
 //! modifying it (this is about an order of magnitude faster than
 //! immutable operations, almost as fast as
 //! [`std::collections`][std::collections]'s mutable data structures).
 //!
 //! Thanks to [`Rc`][std::rc::Rc]'s reference counting, we are able to
 //! determine whether a node in a data structure is being shared with
 //! other data structures, or whether it's safe to mutate it in place.
 //! When it's shared, we'll automatically make a copy of the node
 //! before modifying it. The consequence of this is that cloning a
 //! data structure becomes a lazy operation: the initial clone is
 //! instant, and as you modify the cloned data structure it will clone
 //! chunks only where you change them, so that if you change the
 //! entire thing you will eventually have performed a full clone.
 //!
 //! This also gives us a couple of other optimisations for free:
 //! implementations of immutable data structures in other languages
 //! often have the idea of local mutation, like Clojure's transients
 //! or Haskell's `ST` monad - a managed scope where you can treat an
 //! immutable data structure like a mutable one, gaining a
 //! considerable amount of performance because you no longer need to
 //! copy your changed nodes for every operation, just the first time
 //! you hit a node that's sharing structure. In Rust, we don't need to
 //! think about this kind of managed scope, it's all taken care of
 //! behind the scenes because of our low level access to the garbage
 //! collector (which, in our case, is just a simple
 //! [`Rc`][std::rc::Rc]).
 //!
 //! ## Thread Safety
 //!
 //! The data structures in the `im` crate are thread safe, through
 //! [`Arc`][std::sync::Arc]. This comes with a slight performance impact, so
 //! that if you prioritise speed over thread safety, you may want to use the
 //! `im-rc` crate instead, which is identical to `im` except that it uses
 //! [`Rc`][std::rc::Rc] instead of [`Arc`][std::sync::Arc], implying that the
 //! data structures in `im-rc` do not implement [`Send`][std::marker::Send] and
 //! [`Sync`][std::marker::Sync]. This yields approximately a 20-25% increase in
 //! general performance.
 //!
 //! ## Feature Flags
 //!
 //! `im` comes with optional support for the following crates through Cargo
 //! feature flags. You can enable them in your `Cargo.toml` file like this:
 //!
 //! ```no_compile
 //! [dependencies]
 //! im = { version = "*", features = ["proptest", "serde"] }
 //! ```
 //!
 //! | Feature | Description |
 //! | ------- | ----------- |
 //! | [`proptest`](https://crates.io/crates/proptest) | Strategies for all `im` datatypes under a `proptest` namespace, eg. `im::vector::proptest::vector()` |
 //! | [`quickcheck`](https://crates.io/crates/quickcheck) | `Arbitrary` implementations for all `im` datatypes (not available in `im-rc`) |
 //! | [`rayon`](https://crates.io/crates/rayon) | parallel iterator implementations for `Vector` (not available in `im-rc`) |
 //! | [`serde`](https://crates.io/crates/serde) | `Serialize` and `Deserialize` implementations for all `im` datatypes |
 //!
 //! [std::collections]: https://doc.rust-lang.org/std/collections/index.html
 //! [std::collections::VecDeque]: https://doc.rust-lang.org/std/collections/struct.VecDeque.html
 //! [std::vec::Vec]: https://doc.rust-lang.org/std/vec/struct.Vec.html
 //! [std::string::String]: https://doc.rust-lang.org/std/string/struct.String.html
 //! [std::rc::Rc]: https://doc.rust-lang.org/std/rc/struct.Rc.html
 //! [std::sync::Arc]: https://doc.rust-lang.org/std/sync/struct.Arc.html
 //! [std::cmp::Eq]: https://doc.rust-lang.org/std/cmp/trait.Eq.html
 //! [std::cmp::Ord]: https://doc.rust-lang.org/std/cmp/trait.Ord.html
 //! [std::clone::Clone]: https://doc.rust-lang.org/std/clone/trait.Clone.html
 //! [std::clone::Clone::clone]: https://doc.rust-lang.org/std/clone/trait.Clone.html#tymethod.clone
 //! [std::marker::Copy]: https://doc.rust-lang.org/std/marker/trait.Copy.html
 //! [std::hash::Hash]: https://doc.rust-lang.org/std/hash/trait.Hash.html
 //! [std::marker::Send]: https://doc.rust-lang.org/std/marker/trait.Send.html
 //! [std::marker::Sync]: https://doc.rust-lang.org/std/marker/trait.Sync.html
 //! [hashmap::HashMap]: ./hashmap/struct.HashMap.html
 //! [hashset::HashSet]: ./hashset/struct.HashSet.html
 //! [ordmap::OrdMap]: ./ordmap/struct.OrdMap.html
 //! [ordset::OrdSet]: ./ordset/struct.OrdSet.html
 //! [vector::Vector]: ./vector/enum.Vector.html
 //! [vector::Vector::push_back]: ./vector/enum.Vector.html#method.push_back
 //! [rrb-tree]: https://infoscience.epfl.ch/record/213452/files/rrbvector.pdf
 //! [hamt]: https://en.wikipedia.org/wiki/Hash_array_mapped_trie
 //! [b-tree]: https://en.wikipedia.org/wiki/B-tree
 //! [cons-list]: https://en.wikipedia.org/wiki/Cons#Lists

 #![deny(unsafe_code)]
 #![cfg_attr(has_specialisation, feature(specialization))]
 #![cfg_attr(docs_rs_workaround, feature(slice_get_slice))]

 extern crate typenum;

 #[cfg(test)]
 #[macro_use]
 extern crate pretty_assertions;

 #[cfg(feature = "quickcheck")]
 extern crate quickcheck;

 #[cfg(any(test, feature = "proptest"))]
 #[macro_use]
 extern crate proptest;

 #[cfg(feature = "proptest")]
 proptest! {}

 #[cfg(any(test, feature = "serde"))]
 extern crate serde;
 #[cfg(test)]
 extern crate serde_json;

 #[cfg(all(threadsafe, any(test, feature = "rayon")))]
 extern crate rayon;

 #[cfg(test)]
 extern crate metrohash;

 mod config;
 mod nodes;
 mod sort;
 mod sync;
 mod util;

 #[macro_use]
 mod ord;
 pub use ord::map as ordmap;
 pub use ord::set as ordset;

 #[macro_use]
 mod hash;
 pub use hash::map as hashmap;
 pub use hash::set as hashset;

 #[macro_use]
 pub mod vector;

 pub mod iter;

 #[cfg(any(test, feature = "serde"))]
 pub mod ser;

 pub use nodes::sized_chunk as chunk;

 pub use hashmap::HashMap;
 pub use hashset::HashSet;
 pub use ordmap::OrdMap;
 pub use ordset::OrdSet;
 pub use vector::Vector;

 #[cfg(test)]
 mod test;

 /// Update a value inside multiple levels of data structures.
 ///
 /// This macro takes a [`Vector`][Vector], [`OrdMap`][OrdMap] or [`HashMap`][HashMap],
 /// a key or a series of keys, and a value, and returns the data structure with the
 /// new value at the location described by the keys.
 ///
 /// If one of the keys in the path doesn't exist, the macro will panic.
 ///
 /// # Examples
 ///
 /// ```
 /// # #[macro_use] extern crate im_rc as im;
 /// # use std::sync::Arc;
 /// # fn main() {
 /// let vec_inside_vec = vector![vector![1, 2, 3], vector![4, 5, 6]];
 ///
 /// let expected = vector![vector![1, 2, 3], vector![4, 5, 1337]];
 ///
 /// assert_eq!(expected, update_in![vec_inside_vec, 1 => 2, 1337]);
 /// # }
 /// ```
 ///
 /// [Vector]: ../vector/enum.Vector.html
 /// [HashMap]: ../hashmap/struct.HashMap.html
 /// [OrdMap]: ../ordmap/struct.OrdMap.html
 #[macro_export]
 macro_rules! update_in {
     ($target:expr, $path:expr => $($tail:tt) => *, $value:expr ) => {{
         let inner = $target.get($path).expect("update_in! macro: key not found in target");
         $target.update($path, update_in!(inner, $($tail) => *, $value))
     }};

     ($target:expr, $path:expr, $value:expr) => {
         $target.update($path, $value)
     };
 }

 /// Get a value inside multiple levels of data structures.
 ///
 /// This macro takes a [`Vector`][Vector], [`OrdMap`][OrdMap] or [`HashMap`][HashMap],
 /// along with a key or a series of keys, and returns the value at the location inside
 /// the data structure described by the key sequence, or `None` if any of the keys didn't
 /// exist.
 ///
 /// # Examples
 ///
 /// ```
 /// # #[macro_use] extern crate im_rc as im;
 /// # use std::sync::Arc;
 /// # fn main() {
 /// let vec_inside_vec = vector![vector![1, 2, 3], vector![4, 5, 6]];
 ///
 /// assert_eq!(Some(&6), get_in![vec_inside_vec, 1 => 2]);
 /// # }
 /// ```
 ///
 /// [Vector]: ../vector/enum.Vector.html
 /// [HashMap]: ../hashmap/struct.HashMap.html
 /// [OrdMap]: ../ordmap/struct.OrdMap.html
 #[macro_export]
 macro_rules! get_in {
     ($target:expr, $path:expr => $($tail:tt) => * ) => {{
         $target.get($path).and_then(|v| get_in!(v, $($tail) => *))
     }};

     ($target:expr, $path:expr) => {
         $target.get($path)
     };
 }

 #[cfg(test)]
 mod lib_test {
     #[test]
     fn update_in() {
         let vector = vector![1, 2, 3, 4, 5];
         assert_eq!(vector![1, 2, 23, 4, 5], update_in!(vector, 2, 23));
         let hashmap = hashmap![1 => 1, 2 => 2, 3 => 3];
         assert_eq!(
             hashmap![1 => 1, 2 => 23, 3 => 3],
             update_in!(hashmap, 2, 23)
         );
         let ordmap = ordmap![1 => 1, 2 => 2, 3 => 3];
         assert_eq!(ordmap![1 => 1, 2 => 23, 3 => 3], update_in!(ordmap, 2, 23));

         let vecs = vector![vector![1, 2, 3], vector![4, 5, 6], vector![7, 8, 9]];
         let vecs_target = vector![vector![1, 2, 3], vector![4, 5, 23], vector![7, 8, 9]];
         assert_eq!(vecs_target, update_in!(vecs, 1 => 2, 23));
     }

     #[test]
     fn get_in() {
         let vector = vector![1, 2, 3, 4, 5];
         assert_eq!(Some(&3), get_in!(vector, 2));
         let hashmap = hashmap![1 => 1, 2 => 2, 3 => 3];
         assert_eq!(Some(&2), get_in!(hashmap, &2));
         let ordmap = ordmap![1 => 1, 2 => 2, 3 => 3];
         assert_eq!(Some(&2), get_in!(ordmap, &2));

         let vecs = vector![vector![1, 2, 3], vector![4, 5, 6], vector![7, 8, 9]];
         assert_eq!(Some(&6), get_in!(vecs, 1 => 2));
     }
 }