R-Style Boolean Sequences in Swift
One of the great things about the R statistical programming language is its ability to operate on vectors of values. I got to thinking: wouldn’t it be great to have this capability in Swift? Well, Swift is nice that way. It gives you a lot of flexibility to morph the language.
We want to be able to say something like this:
[1, 2, 3, 2, 1, 2, 8] < 3
And this should return a sequence of boolean values:
[true, true, false, true, true, true, false]
Generating Sequences of Boolean Values
In order to create a sequence of boolean values, we can add an element-wise < operator to Sequence:
extension Sequence where Element: Comparable {
    // Compare every element with rhs, producing one Bool per element.
    static func <(_ lhs: Self, _ rhs: Element) -> [Bool] {
        lhs.map { $0 < rhs }
    }
}
This does the job, but materializes a boolean array of the same size as the source. To avoid that allocation, we can make it lazy and return a lazy sequence like this:
extension Sequence where Element: Comparable {
    // Same comparison, but evaluated on demand through a lazy view.
    static func <(_ lhs: Self, _ rhs: Element) -> LazyMapSequence<Self, Bool> {
        lhs.lazy.map { $0 < rhs }
    }
}
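To see what laziness buys, here is a small sketch that exercises the lazy operator above. The sample data and the unbounded range named naturals are illustrative choices, not part of the original gist. Because each comparison runs only when its element is requested, the operator even works on an infinite sequence, where the eager [Bool] version would never produce a result.

```swift
// Sketch: the lazy element-wise `<` from the post, exercised on a finite
// array and on an unbounded range (sample data is illustrative only).
extension Sequence where Element: Comparable {
    // Returns a lazy view; each comparison runs only when its element is read.
    static func <(_ lhs: Self, _ rhs: Element) -> LazyMapSequence<Self, Bool> {
        lhs.lazy.map { $0 < rhs }
    }
}

let data = [1, 2, 3, 2, 1, 2, 8]
// Materialize the lazy view into an array only when one is actually needed.
let flags = Array(data < 3)
print(flags)      // [true, true, false, true, true, true, false]

// An infinite source: only the first six comparisons are ever performed.
let naturals: PartialRangeFrom<Int> = (1...)
let firstSix = Array((naturals < 5).prefix(6))
print(firstSix)   // [true, true, true, true, false, false]
```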
Although it requires some boilerplate, we can extend this to all of the other comparison operators as well.
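One way to keep that boilerplate small is to route every operator through a single helper. The sketch below keeps the lazy return type and adds <=, > and >=. The helper name compareEach and the sample scores are hypothetical, not from the original gist; == and != could follow the same pattern in an extension constrained to Element: Equatable.

```swift
// Sketch: lazy element-wise comparison operators built on one shared helper.
// `compareEach` is a hypothetical name introduced for this example.
extension Sequence where Element: Comparable {
    // Lazily applies `op(element, rhs)` to every element of `lhs`.
    private static func compareEach(
        _ lhs: Self,
        _ rhs: Element,
        _ op: @escaping (Element, Element) -> Bool
    ) -> LazyMapSequence<Self, Bool> {
        lhs.lazy.map { op($0, rhs) }
    }

    static func <(_ lhs: Self, _ rhs: Element) -> LazyMapSequence<Self, Bool> {
        compareEach(lhs, rhs) { $0 < $1 }
    }

    static func <=(_ lhs: Self, _ rhs: Element) -> LazyMapSequence<Self, Bool> {
        compareEach(lhs, rhs) { $0 <= $1 }
    }

    static func >(_ lhs: Self, _ rhs: Element) -> LazyMapSequence<Self, Bool> {
        compareEach(lhs, rhs) { $0 > $1 }
    }

    static func >=(_ lhs: Self, _ rhs: Element) -> LazyMapSequence<Self, Bool> {
        compareEach(lhs, rhs) { $0 >= $1 }
    }
}

let scores = [55, 72, 90, 64]
print(Array(scores < 64))    // [true, false, false, false]
print(Array(scores <= 64))   // [true, false, false, true]
print(Array(scores > 64))    // [false, true, true, false]
print(Array(scores >= 64))   // [false, true, true, true]
```

The closure parameter has to be @escaping because the lazy map stores it and calls it later, when elements are read.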
Broadcasting a collection
Another great feature of R, and of other languages like Octave, is the ability to broadcast values. This gives you a way to deal with sequences of unequal lengths by “broadcasting” the values of the shorter sequence again and again as needed.
We can make a custom collection to do this:
struct Broadcasted<Base: Collection>: Collection where Base.Index: BinaryInteger {
    typealias Element = Base.Element
    typealias Index = Base.Index
    private var base: Base
    private var count_: Index   // number of elements to produce
    var startIndex: Index { base.startIndex }
    var endIndex: Index { count_ + base.startIndex }
    func index(after i: Index) -> Index { i + 1 }
    // Wrap the offset from startIndex around base.count so the values repeat.
    subscript(index: Index) -> Element {
        base[(index - startIndex) % Index(base.count) + startIndex]
    }
    init(_ base: Base, count: Index) {
        self.base = base
        self.count_ = count
    }
}
It requires the base collection’s index to be a BinaryInteger, so that the subscript can wrap around with the modulo operator and return the same values again and again.
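As a quick check, the sketch below copies the struct (with comments) and recycles a short array and an array slice. The sample values are made up; the slice case exercises a startIndex other than zero.

```swift
// Sketch: the Broadcasted collection from the post, plus two small checks.
// Sample data is illustrative only.
struct Broadcasted<Base: Collection>: Collection where Base.Index: BinaryInteger {
    typealias Element = Base.Element
    typealias Index = Base.Index

    private var base: Base
    private var count_: Index   // number of elements to produce

    var startIndex: Index { base.startIndex }
    var endIndex: Index { count_ + base.startIndex }

    func index(after i: Index) -> Index { i + 1 }

    // Offsets are taken relative to startIndex, so slices work too.
    subscript(index: Index) -> Element {
        base[(index - startIndex) % Index(base.count) + startIndex]
    }

    init(_ base: Base, count: Index) {
        self.base = base
        self.count_ = count
    }
}

let short = [1, 2, 3]
let recycled = Array(Broadcasted(short, count: 7))
print(recycled)       // [1, 2, 3, 1, 2, 3, 1]

// A slice whose indices start at 3 rather than 0.
let numbers = [10, 20, 30, 40, 50]
let tail = numbers[3...]
let recycledTail = Array(Broadcasted(tail, count: 5))
print(recycledTail)   // [40, 50, 40, 50, 40]
```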
Selection subscript
We can bring these together to let us select elements from a collection.
extension Collection {
    // Keep the elements whose (broadcast) selection flag is true.
    subscript<C>(selection: C) -> [Element] where C: Collection, C.Index: BinaryInteger, C.Element == Bool {
        zip(Broadcasted(selection, count: C.Index(self.count)), self)
            .filter(\.0).map(\.1)
    }
}
This lets us say something like this:
// values: [Double]
values[values < 3.2]
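Putting the pieces together, here is a self-contained sketch that runs the example end to end. It uses the eager [Bool] operator, writes the filter and map steps with closures instead of key paths (same behavior), and makes up a sample values array. The second selection shows R-style recycling: a two-element mask keeps every other element.

```swift
// Sketch: eager `<`, Broadcasted, and the selection subscript from the post,
// assembled so the `values[values < 3.2]` example runs. Sample data is made up.
extension Sequence where Element: Comparable {
    // Element-wise comparison against a single value.
    static func <(_ lhs: Self, _ rhs: Element) -> [Bool] {
        lhs.map { $0 < rhs }
    }
}

// Repeats `base` until `count` elements have been produced.
struct Broadcasted<Base: Collection>: Collection where Base.Index: BinaryInteger {
    typealias Element = Base.Element
    typealias Index = Base.Index

    private var base: Base
    private var count_: Index

    var startIndex: Index { base.startIndex }
    var endIndex: Index { count_ + base.startIndex }
    func index(after i: Index) -> Index { i + 1 }

    subscript(index: Index) -> Element {
        base[(index - startIndex) % Index(base.count) + startIndex]
    }

    init(_ base: Base, count: Index) {
        self.base = base
        self.count_ = count
    }
}

extension Collection {
    // Keeps the elements whose (recycled) mask entry is true.
    subscript<C>(selection: C) -> [Element]
        where C: Collection, C.Index: BinaryInteger, C.Element == Bool {
        zip(Broadcasted(selection, count: C.Index(self.count)), self)
            .filter { $0.0 }
            .map { $0.1 }
    }
}

let values: [Double] = [1.5, 4.0, 3.2, 0.5, 7.25, 2.0]
let small = values[values < 3.2]
print(small)        // [1.5, 0.5, 2.0]

// A shorter mask is recycled: [true, false] keeps every other element.
let mask = [true, false]
let everyOther = values[mask]
print(everyOther)   // [1.5, 3.2, 7.25]
```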
Future Improvements
The gist is here: https://gist.github.com/rayfix/b5f5641f0c34e88f9a178e9a57d39092
I think it’s a nice start, but there are some other things to look at. Our subscript operator allocates a new array. It might be better to return a slice into the original array. It could, for example, return a DiscontiguousSlice of the original collection, as specified in SE-0270. See:
https://github.com/apple/swift-se0270-range-set/blob/main/Sources/SE0270_RangeSet/DiscontiguousSlice.swift
It might also be nice to create boolean operators for lists of boolean values. Left for a future post. :)
UPDATES:
- The original post’s Broadcasted collection didn’t handle slices (non-zero start indexes) correctly. Fixed.
- Josh Homann suggested using an UnfoldSequence instead of a custom Broadcasted collection. This removes the requirement that the index type support modulo arithmetic, which is nice. Although it only uses Sequence API, it extends Collection, since iterating a Sequence more than once is not guaranteed to work.
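extension Collection {
    // Cycle through the collection endlessly by restarting its iterator.
    func repeatForever() -> UnfoldSequence<Element, Iterator> {
        sequence(state: self.makeIterator()) { iterator in
            // When the current pass is exhausted, start a new one.
            iterator.next() ?? {
                iterator = self.makeIterator()
                return iterator.next()
            }()
        }
    }
}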
[1,2,3].repeatForever().prefix(8).forEach { print($0) }
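Following Josh Homann’s suggestion, the selection subscript can be rebuilt on repeatForever() so that it no longer needs a BinaryInteger index. The sketch below writes repeatForever() with an if let instead of the nested closure (same behavior), reuses the eager < operator, and makes up the sample data. Two things fall out of this design. A mask with a non-integer index, such as a lazy map over a String, now works. An empty mask simply produces an empty result, whereas the modulo in Broadcasted would trap on a division by zero.

```swift
// Sketch: the selection subscript built on `repeatForever()` (the
// UnfoldSequence approach from the update) instead of Broadcasted.
extension Collection {
    // Cycles through the collection forever; ends immediately if it is empty.
    func repeatForever() -> UnfoldSequence<Element, Iterator> {
        sequence(state: makeIterator()) { iterator in
            if let next = iterator.next() { return next }
            iterator = self.makeIterator()   // start a fresh pass
            return iterator.next()
        }
    }

    // Keeps the elements whose recycled mask entry is true. zip stops when
    // `self` runs out, so the infinite mask sequence is safe to use.
    subscript<C>(selection: C) -> [Element]
        where C: Collection, C.Element == Bool {
        zip(selection.repeatForever(), self)
            .filter { $0.0 }
            .map { $0.1 }
    }
}

extension Sequence where Element: Comparable {
    // Eager element-wise comparison from the post.
    static func <(_ lhs: Self, _ rhs: Element) -> [Bool] {
        lhs.map { $0 < rhs }
    }
}

let values: [Double] = [1.5, 4.0, 3.2, 0.5, 7.25, 2.0]
let small = values[values < 3.2]
print(small)          // [1.5, 0.5, 2.0]

// The mask's index is String.Index, which is not a BinaryInteger.
let pattern = "10".lazy.map { $0 == "1" }
let everyOther = values[pattern]
print(everyOther)     // [1.5, 3.2, 7.25]

// An empty mask selects nothing instead of trapping.
let noneSelected = values[[Bool]()]
print(noneSelected)   // []
```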