One of the great things about the R statistical programming language is its ability to operate on vectors of values. I got to thinking, wouldn’t it be great to have this capability in Swift? Well, Swift is nice that way: it gives you a lot of flexibility to morph the language.

We want to be able to say something like this:

[1, 2, 3, 2, 1, 2, 8] < 3

And this should return a sequence of boolean values:

[true, true, false, true, true, true, false]

Generating Sequences of Boolean Values

To create a sequence of boolean values, we can add:

extension Sequence where Element: Comparable {
    static func <(_ lhs: Self, _ rhs: Element) -> [Bool] {
        lhs.map { $0 < rhs }
    }
}

This does the job, but materializes a boolean array of the same size as the source. To avoid that allocation, we can make it lazy and return a lazy sequence like this:

extension Sequence where Element: Comparable {
    static func <(_ lhs: Self, _ rhs: Element) -> LazyMapSequence<Self, Bool> {
        lhs.lazy.map { $0 < rhs }
    }
}

Although it requires some boilerplate, we can extend this to all of the other comparison operators as well.
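As a rough sketch of that boilerplate, the remaining ordering operators could follow the same pattern as the lazy < above. This block is illustrative rather than taken from the gist: the sample array xs is made up, and the equality operators (== and !=) are left out because they would need an Equatable constraint instead.

```swift
// Sketch only: the same lazy pattern as `<`, repeated for the other
// ordering operators. Nothing is evaluated until the result is iterated.
extension Sequence where Element: Comparable {
    static func <= (lhs: Self, rhs: Element) -> LazyMapSequence<Self, Bool> {
        lhs.lazy.map { $0 <= rhs }
    }

    static func > (lhs: Self, rhs: Element) -> LazyMapSequence<Self, Bool> {
        lhs.lazy.map { $0 > rhs }
    }

    static func >= (lhs: Self, rhs: Element) -> LazyMapSequence<Self, Bool> {
        lhs.lazy.map { $0 >= rhs }
    }
}

// Hypothetical sample data, reusing the values from the opening example.
let xs = [1, 2, 3, 2, 1, 2, 8]

print(Array(xs <= 2))  // [true, true, false, true, true, true, false]
print(Array(xs > 2))   // [false, false, true, false, false, false, true]
print(Array(xs >= 2))  // [false, true, true, true, false, true, true]
```

Wrapping the result in Array(...) is only there to force evaluation for printing; in normal use the lazy sequence can be consumed directly.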

Broadcasting a Collection

Another great feature of R, and of other languages like Octave, is the ability to broadcast values. This gives you a way to deal with sequences of unequal lengths by “broadcasting” (repeating) the values of the shorter sequence again and again as necessary.
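To make the idea concrete before building anything lazy, here is a minimal eager sketch of the behavior (R calls it recycling). The function name recycled(_:toCount:) is hypothetical and not part of the post's code; it simply wraps around the source array with modulo arithmetic.

```swift
// Hypothetical helper: eagerly repeat `values` until the result holds
// `count` elements, wrapping around to the start as often as needed.
func recycled<T>(_ values: [T], toCount count: Int) -> [T] {
    precondition(count >= 0, "count must not be negative")
    precondition(!values.isEmpty || count == 0, "cannot recycle an empty array")
    // Position i of the result reads position (i mod values.count) of the source.
    return (0..<count).map { values[$0 % values.count] }
}

print(recycled([true, false], toCount: 5))  // [true, false, true, false, true]
print(recycled([1, 2, 3], toCount: 8))      // [1, 2, 3, 1, 2, 3, 1, 2]
```

This version allocates the whole result up front, which is the cost a lazy collection avoids.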

We can make a custom collection to do this:

struct Broadcasted<Base: Collection>: Collection where Base.Index: BinaryInteger {

  typealias Element = Base.Element
  typealias Index = Base.Index

  private var base: Base
  private var count_: Index

  var startIndex: Index { base.startIndex }
  var endIndex: Index { count_ + base.startIndex }

  func index(after i: Index) -> Index { i + 1 }

  subscript(index: Index) -> Element {
    base[(index - startIndex) % Index(base.count) + startIndex]
  }

  init(_ base: Base, count: Index) {
    self.base = base
    self.count_ = count
  }
}

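It requires the collection to have an integer index that supports the modulo operation, so it can wrap around and read the same values again and again. Note that an empty base with a non-zero count will trap on the division by zero in the subscript.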

Selection Subscript

We can bring these together to let us select elements from a collection.

extension Collection {
    subscript<C>(selection: C) -> [Element] where C: Collection, C.Index: BinaryInteger, C.Element == Bool {
        zip(Broadcasted(selection, count: C.Index(self.count)), self)
            .filter(\.0).map(\.1)
    }
}

This lets us say something like this:

// values: [Double]

values[values < 3.2]

Future Improvements

The gist is here: https://gist.github.com/rayfix/b5f5641f0c34e88f9a178e9a57d39092

I think it’s a nice start, but there are some other things to look at. Our subscript operator allocates a new array. It might be better to return a slice into the original collection. It could, for example, return a DiscontiguousSlice of the original collection, as specified in SE-0270. See https://github.com/apple/swift-se0270-range-set/blob/main/Sources/SE0270_RangeSet/DiscontiguousSlice.swift

It might also be nice to create boolean operators for lists of boolean values. Left for a future post. :)
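Just to give a flavor of what that might look like, here is a tentative sketch that overloads & and | for sequences of Bool. The choice of these two operators is an assumption on my part, mirroring R's element-wise & and |, and this version stops at the shorter operand instead of broadcasting.

```swift
// Sketch only: element-wise AND / OR over two sequences of Bool.
// zip stops at the shorter operand, so no recycling happens here.
extension Sequence where Element == Bool {
    static func & <Other: Sequence>(lhs: Self, rhs: Other) -> [Bool]
        where Other.Element == Bool {
        zip(lhs, rhs).map { $0 && $1 }
    }

    static func | <Other: Sequence>(lhs: Self, rhs: Other) -> [Bool]
        where Other.Element == Bool {
        zip(lhs, rhs).map { $0 || $1 }
    }
}

// Hypothetical sample masks covering all four input combinations.
let a = [true, true, false, false]
let b = [true, false, true, false]

print(a & b)  // [true, false, false, false]
print(a | b)  // [true, true, true, false]
```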

UPDATES:

  • The original post’s Broadcasted collection didn’t handle slices (non-zero start indexes) correctly. Fixed.
  • Josh Homann suggested using an UnfoldSequence instead of a custom Broadcasted collection. This removes the requirement that the index type know about modulo arithmetic, which is nice. Although it only uses the Sequence API, it extends Collection because a Sequence is not guaranteed to support being iterated more than once.
extension Collection {
  func repeatForever() -> UnfoldSequence<Element, Iterator> {
    sequence(state: self.makeIterator()) { iterator in
      iterator.next() ?? {
        iterator = self.makeIterator()
        return iterator.next()
      }()
    }
  }
}

[1,2,3].repeatForever().prefix(8).forEach { print($0) }
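
To connect this back to the selection subscript, here is one way it might be rebuilt on top of repeatForever(), dropping the BinaryInteger requirement on the mask. This is only a sketch, not the code from the gist: the recycling: argument label is hypothetical (chosen so it does not clash with the unlabeled subscript above), the sample values are made up, and repeatForever() is restated with if let so the block compiles on its own.

```swift
extension Collection {
    // Restated from the update above so this sketch is self-contained.
    func repeatForever() -> UnfoldSequence<Element, Iterator> {
        sequence(state: makeIterator()) { iterator in
            if let next = iterator.next() { return next }
            iterator = self.makeIterator()  // exhausted: start over
            return iterator.next()          // nil only when the collection is empty
        }
    }

    // Hypothetical labeled subscript: keeps the elements whose (recycled)
    // mask value is true. zip ends when `self` runs out of elements.
    subscript<Mask: Collection>(recycling mask: Mask) -> [Element]
        where Mask.Element == Bool {
        zip(mask.repeatForever(), self).filter { $0.0 }.map { $0.1 }
    }
}

let values: [Double] = [1.5, 4.0, 2.5, 3.5, 0.5]

// The two-element mask repeats as true, false, true, false, true.
print(values[recycling: [true, false]])  // [1.5, 2.5, 0.5]
```

Because the mask is consumed through its iterator, any collection of Bool works here, including ones whose indices are not integers.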