Title: | Iterator Tools |
---|---|
Description: | Various tools for creating iterators, many patterned after functions in the Python itertools module, and others patterned after functions in the 'snow' package. |
Authors: | Steve Weston, Hadley Wickham |
Maintainer: | Steve Weston |
License: | GPL-2 |
Version: | 0.1-3 |
Built: | 2024-10-29 03:29:58 UTC |
Source: | https://github.com/cran/itertools |
The itertools package provides a variety of functions used to create iterators, as defined by REvolution Computing's iterators package.
Many of the functions are patterned after functions of the same name in the Python itertools module, including chain, product, izip, ifilter, etc.
In addition, a number of functions were inspired by utility functions in the snow package, such as isplitRows, isplitCols, and isplitIndices.
There are also several utility functions, contributed by Hadley Wickham, that aid in writing iterators. These include is.iterator, end_iteration, iteration_has_ended, and new_iterator.
More information is available on the following topics:
isplitVector | splits, or slices, a vector into shorter segments |
isplitCols | splits a matrix column-wise |
isplitRows | splits a matrix row-wise |
isplitIndices | iterate over “chunks” of indices from 1 to n |
chain | chain multiple iterators together into one iterator |
enumerate | create an enumeration from an iterator |
ichunk | create lists of values from an iterator to aid manual chunking |
ihasNext | add a hasNext method to an iterator |
ifilter | only return values for which a predicate function returns true |
ifilterfalse | only return values for which a predicate function returns false |
ilimit | limit, or truncate, an iterator |
ireadBin | reads from a binary connection |
irep | an iterator version of the rep function |
irepeat | a simple repeating value iterator |
izip | zip together multiple iterators |
product | zip together multiple iterators in cartesian product fashion |
recycle | recycle values from an iterator repeatedly |
timeout | iterate for a specified number of seconds |
is.iterator | indicates if an object is an iterator |
end_iteration | throws an exception to signal end of iteration |
iteration_has_ended | tests an exception to see if iteration has ended |
new_iterator | creates a new iterator object |
For a complete list of functions with individual help pages, use library(help="itertools").
Create an iterator that chains multiple iterables together.
chain(...)
... | The iterables to iterate over. |
# Iterate over two iterables
as.list(chain(1:2, letters[1:3]))
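Conceptually, chaining walks through each iterable in turn and moves on to the next one when the current one is exhausted. The base-R sketch below illustrates that idea with a closure over a list of vectors. `chain_sketch` is a hypothetical name; this is not the package's implementation, and it assumes the iterators-package convention of signalling exhaustion with a "StopIteration" error.

```r
# Hypothetical base-R sketch of chain(): walk each vector in turn.
# Assumes end of iteration is signalled by stop("StopIteration").
chain_sketch <- function(...) {
  sources <- list(...)
  s <- 1L  # index of the current source
  i <- 0L  # position within the current source
  function() {
    repeat {
      if (s > length(sources)) stop("StopIteration", call. = FALSE)
      if (i < length(sources[[s]])) {
        i <<- i + 1L
        return(sources[[s]][[i]])
      }
      # Current source exhausted: move on to the next one
      s <<- s + 1L
      i <<- 0L
    }
  }
}

# Drain the sketch into a list, stopping at the first error
next_value <- chain_sketch(1:2, letters[1:3])
values <- list()
repeat {
  v <- tryCatch(next_value(), error = function(e) e)
  if (inherits(v, "error")) break
  values[[length(values) + 1L]] <- v
}
str(values)
```

The draining loop treats any error as the end; real code should check the condition message first, which is what iteration_has_ended is for.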
Create an iterator that iterates over an iterable, returning the value in a list that includes an index.
enumerate(iterable)
iterable | Iterable to iterate over. |
# Create an enumeration of five random numbers
as.list(enumerate(rnorm(5)))
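For an in-memory vector, the same pairing of positions and values can be built eagerly with base R's `Map`. This sketch assumes each value issued by enumerate is a list with `index` and `value` components; check `str(nextElem(it))` on your installed version before relying on those names.

```r
# Eager base-R analogue of enumerate(): pair each element with its position.
# Assumption: the component names index/value match those used by itertools.
x <- c(10.5, 20.5, 30.5)
pairs <- Map(function(i, v) list(index = i, value = v), seq_along(x), x)
pairs[[2]]$index  # 2
pairs[[2]]$value  # 20.5
```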
hasNext is a generic function that indicates if the iterator has another element.
hasNext(obj, ...)
## S3 method for class 'ihasNext'
hasNext(obj, ...)
obj | an iterator object. |
... | additional arguments that are ignored. |
Logical value indicating whether the iterator has a next element.
it <- ihasNext(iter(c('a', 'b', 'c')))
while (hasNext(it))
  print(nextElem(it))
Create an iterator over an array.
iarray(X, MARGIN, ..., chunks, chunkSize, drop, idx=lapply(dim(X), function(i) TRUE))
X | Array to iterate over. |
MARGIN | Vector of subscripts to iterate over. Note that if the length of MARGIN is greater than one, the values generated by the iterator are themselves iterators. |
... | Used to force subsequent arguments to be specified by name. |
chunks | Number of elements that the iterator should generate. This can be a single value or a vector the same length as MARGIN. |
chunkSize | The maximum size of the chunks that the iterator should generate. This can be a single value or a vector the same length as MARGIN. |
drop | Should dimensions of length 1 be dropped in the generated values? It defaults to FALSE if either chunks or chunkSize is specified, and to TRUE otherwise. |
idx | List of indices used to generate a call object. |
# Iterate over matrices in a 3D array
x <- array(1:24, c(2,3,4))
as.list(iarray(x, 3))
# Iterate over subarrays
as.list(iarray(x, 3, chunks=2))
x <- array(1:64, c(4,4,4))
it <- iarray(x, c(2,3), chunks=c(1,2))
jt <- nextElem(it)
nextElem(jt)
jt <- nextElem(it)
nextElem(jt)
it <- iarray(x, c(2,3), chunks=c(2,2))
jt <- nextElem(it)
nextElem(jt)
nextElem(jt)
jt <- nextElem(it)
nextElem(jt)
nextElem(jt)
Create an iterator that iterates over another iterator until a specified function returns TRUE.
This can be useful for breaking out of a foreach loop, for example.
ibreak(iterable, finished)
iterable | Iterable to iterate over. |
finished | Function that returns a logical value. The iterator stops when this function returns TRUE. |
# See how high we can count in a tenth of a second
mkfinished <- function(time) {
  starttime <- proc.time()[3]
  function() proc.time()[3] > starttime + time
}
length(as.list(ibreak(icount(), mkfinished(0.1))))
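The stopping rule can be sketched in base R: consult `finished()` before producing each value, and stop as soon as it returns TRUE. `ibreak_sketch` and the call-counting predicate are hypothetical, chosen so that the result is deterministic, unlike the timing example; the sketch also assumes the "StopIteration" error convention and is not the package's code.

```r
# Hypothetical sketch of ibreak(): check finished() before each value.
ibreak_sketch <- function(next_value, finished) {
  function() {
    if (finished()) stop("StopIteration", call. = FALSE)
    next_value()
  }
}

# An endless counter, and a predicate that becomes TRUE on its 4th call
counter <- local({ n <- 0L; function() { n <<- n + 1L; n } })
finished <- local({ calls <- 0L; function() { calls <<- calls + 1L; calls > 3L } })

it <- ibreak_sketch(counter, finished)
out <- integer(0)
repeat {
  v <- tryCatch(it(), error = function(e) NULL)
  if (is.null(v)) break
  out <- c(out, v)
}
out  # 1 2 3
```

Because `finished()` is checked before the underlying iterator is advanced, no value is consumed and then discarded when the iterator stops.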
Create an iterator that issues lists of values from the underlying iterable. This is useful for manually “chunking” values from an iterable.
ichunk(iterable, chunkSize, mode='list')
iterable | Iterable to iterate over. |
chunkSize | Maximum number of values from iterable to include in each value issued by the resulting iterator. |
mode | Mode of the objects returned by the iterator. |
# Split the vector 1:10 into "chunks" with a maximum length of three
it <- ihasNext(ichunk(1:10, 3))
while (hasNext(it)) {
  print(unlist(nextElem(it)))
}
# Same as previous, but return integer vectors rather than lists
it <- ihasNext(ichunk(1:10, 3, mode='integer'))
while (hasNext(it)) {
  print(nextElem(it))
}
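For a vector that is already in memory, the chunks issued by `ichunk(1:10, 3, mode='integer')` can be reproduced eagerly with `split`, grouping consecutive elements by `ceiling(position / chunkSize)`. This assumes ichunk fills each chunk greedily from the underlying iterator, so that only the last chunk can be shorter than chunkSize.

```r
# Eager base-R analogue of ichunk(1:10, 3, mode='integer'):
# consecutive groups of at most 3 elements, assuming greedy filling.
x <- 1:10
chunk_size <- 3
chunks <- unname(split(x, ceiling(seq_along(x) / chunk_size)))
chunks  # 1:3, 4:6, 7:9, 10
```

The eager version needs the whole vector in memory, which is exactly what ichunk avoids when the source is a large or unbounded iterator.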
The ifilter and ifilterfalse functions create iterators that return a subset of the values of the specified iterable.
ifilter returns the values for which the pred function returns TRUE, and ifilterfalse returns the values for which the pred function returns FALSE.
ifilter(pred, iterable)
ifilterfalse(pred, iterable)
pred | A function that takes one argument and returns TRUE or FALSE. |
iterable | The iterable to iterate over. |
# Return the odd numbers between 1 and 10
as.list(ifilter(function(x) x %% 2 == 1, icount(10)))
# Return the even numbers between 1 and 10
as.list(ifilterfalse(function(x) x %% 2 == 1, icount(10)))
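The filtering rule is the same one implemented eagerly by base R's `Filter`, with `Negate` giving the complementary selection. The sketch below reproduces the values of the two examples on an in-memory vector; `is_odd` is a hypothetical helper name.

```r
# Eager base-R analogues of the ifilter/ifilterfalse examples
is_odd <- function(x) x %% 2 == 1      # hypothetical predicate helper
odds  <- Filter(is_odd, 1:10)          # same values as ifilter(is_odd, icount(10))
evens <- Filter(Negate(is_odd), 1:10)  # same values as ifilterfalse(is_odd, icount(10))
odds   # 1 3 5 7 9
evens  # 2 4 6 8 10
```

The iterator versions apply the predicate one value at a time, so they also work on sources such as icount() that are never materialized as a vector.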
ihasNext wraps an iterable in an iterator that supports the hasNext method, which indicates whether the iterator has another element.
ihasNext(iterable)
iterable | an iterable object, which could be an iterator. |
An ihasNext iterator that wraps the specified iterator and supports the hasNext method.
it <- ihasNext(c('a', 'b', 'c'))
while (hasNext(it))
  print(nextElem(it))
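Answering "is there another element?" requires one element of lookahead: the wrapper tries to fetch the next value ahead of time and buffers it until it is requested. The base-R sketch below illustrates that technique. `has_next_sketch`, its pair of functions, and the `src` source are hypothetical, they do not reflect how the ihasNext class stores its state, and the sketch assumes the "StopIteration" error convention.

```r
# Hypothetical lookahead wrapper: prefetch one value so has_next() can answer.
has_next_sketch <- function(next_value) {
  buffered <- FALSE  # is a prefetched value waiting?
  buffer <- NULL
  done <- FALSE      # has the source signalled the end?
  fill <- function() {
    if (!buffered && !done) {
      tryCatch({
        buffer <<- next_value()
        buffered <<- TRUE
      }, error = function(e) {
        # Only end-of-iteration is swallowed; other errors propagate
        if (!identical(conditionMessage(e), "StopIteration")) stop(e)
        done <<- TRUE
      })
    }
  }
  list(
    has_next = function() { fill(); buffered },
    next_elem = function() {
      fill()
      if (!buffered) stop("StopIteration", call. = FALSE)
      buffered <<- FALSE
      buffer
    }
  )
}

# A source over 'a', 'b', 'c' that signals StopIteration when exhausted
src <- local({
  v <- c("a", "b", "c"); i <- 0L
  function() {
    if (i >= length(v)) stop("StopIteration", call. = FALSE)
    i <<- i + 1L
    v[[i]]
  }
})

it <- has_next_sketch(src)
seen <- character(0)
while (it$has_next()) seen <- c(seen, it$next_elem())
seen  # "a" "b" "c"
```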
Create an iterator that returns at most a specified number of values from an iterable.
ilimit(iterable, n)
iterable | Iterable to iterate over. |
n | Maximum number of values to return. |
# Limit icount to only return three values
as.list(ilimit(icount(), 3))
Create an iterator to read binary data from a connection.
ireadBin(con, what='raw', n=1L, size=NA_integer_, signed=TRUE, endian=.Platform$endian, ipos=NULL)
con | A connection object or a character string naming a file or a raw vector. |
what | Either an object whose mode will give the mode of the vector to be read, or a character vector of length one describing the mode: one of “numeric”, “double”, “integer”, “int”, “logical”, “complex”, “character”, “raw”. Defaults to “raw”. |
n | integer. The (maximal) number of records to be read each time the iterator is called. |
size | integer. The number of bytes per element in the byte stream. The default, NA_integer_, uses the natural size. |
signed | logical. Only used for integers of sizes 1 and 2, when it determines if the quantity on file should be regarded as a signed or unsigned integer. |
endian | The endianness (“big” or “little”) of the target system for the file. Using “swap” forces swapping of endianness. |
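ipos | iterable. If not NULL, values from this iterable are used as positions to seek to in the connection before each read. |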
zz <- file("testbin", "wb")
writeBin(1:100, zz)
close(zz)
it <- ihasNext(ireadBin("testbin", integer(), 10))
while (hasNext(it)) {
  print(nextElem(it))
}
unlink("testbin")
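The iterator packages up repeated calls to base R's readBin. The sketch below writes the equivalent explicit loop, using a temporary file instead of "testbin"; it is an illustration of the reading pattern, not the package's code.

```r
# Base-R sketch: read a binary file of integers 10 at a time, the pattern
# that ireadBin(file, integer(), 10) exposes as an iterator.
path <- tempfile(fileext = ".bin")
con <- file(path, "wb")
writeBin(1:100, con)
close(con)

con <- file(path, "rb")
chunk_lengths <- integer(0)
total <- 0
repeat {
  x <- readBin(con, integer(), n = 10)
  if (length(x) == 0) break  # readBin returns a zero-length vector at end of file
  chunk_lengths <- c(chunk_lengths, length(x))
  total <- total + sum(x)
}
close(con)
unlink(path)
total  # 5050
```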
Create an iterator to read data frames from files.
ireaddf(filenames, n, start=1, col.names, chunkSize=1000)
filenames | Names of files containing column data. |
n | Number of elements to read from each column file. |
start | Element to start reading from. |
col.names | Names of the columns. |
chunkSize | Number of rows to read at a time. |
The irecord function records the values issued by a specified iterator to a file or connection object. The ireplay function returns an iterator that will replay those values. This is useful for iterating concurrently over multiple, large matrices or data frames that you can't keep in memory at the same time. These large objects can be recorded to files one at a time, and then be replayed concurrently using minimal memory.
irecord(con, iterable)
ireplay(con)
con | A file path or open connection. |
iterable | The iterable to record to the file. |
suppressMessages(library(foreach))
m1 <- matrix(rnorm(70), 7, 10)
f1 <- tempfile()
irecord(f1, iter(m1, by='row', chunksize=3))
m2 <- matrix(1:50, 10, 5)
f2 <- tempfile()
irecord(f2, iter(m2, by='column', chunksize=3))
# Perform a simple out-of-core matrix multiply
p <- foreach(col=ireplay(f2), .combine='cbind') %:%
  foreach(row=ireplay(f1), .combine='rbind') %do% {
    row %*% col
  }
dimnames(p) <- NULL
print(p)
all.equal(p, m1 %*% m2)
unlink(c(f1, f2))
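The record/replay idea can be sketched with base R's serialize and unserialize, which write and read one R object at a time on an open binary connection, so a replayer only ever holds one record in memory. The sketch writes an object count first so that the reader knows when to stop; that header and the helper names `record_sketch` and `replay_sketch` are assumptions for illustration, not necessarily the file format irecord uses.

```r
# Hypothetical sketch of recording values to a file and replaying them lazily.
record_sketch <- function(path, values) {
  con <- file(path, "wb")
  on.exit(close(con))
  serialize(length(values), con)       # header: number of records (assumed format)
  for (v in values) serialize(v, con)  # one serialized object per value
  invisible(path)
}

replay_sketch <- function(path) {
  con <- file(path, "rb")
  remaining <- unserialize(con)
  function() {
    if (remaining == 0L) stop("StopIteration", call. = FALSE)
    value <- unserialize(con)          # read only the next record
    remaining <<- remaining - 1L
    if (remaining == 0L) close(con)    # release the file after the last record
    value
  }
}

path <- tempfile()
record_sketch(path, list(1:3, "two", matrix(1:4, 2)))
next_value <- replay_sketch(path)
first <- next_value()
second <- next_value()
third <- next_value()
unlink(path)
```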
Create an iterator version of the rep function.
irep(iterable, times, length.out, each)
iterable | The iterable to iterate over repeatedly. |
times | A vector giving the number of times to repeat each element if the length is greater than one, or to repeat all the elements if the length is one. This behavior is less strict than rep, since the length of times does not need to equal the length of the iterable. |
length.out | non-negative integer. The desired length of the output iterator. |
each | non-negative integer. Each element of the iterable is repeated each times. |
unlist(as.list(irep(1:4, 2)))
unlist(as.list(irep(1:4, each=2)))
unlist(as.list(irep(1:4, c(2,2,2,2))))
unlist(as.list(irep(1:4, c(2,1,2,1))))
unlist(as.list(irep(1:4, each=2, length.out=4)))
unlist(as.list(irep(1:4, each=2, length.out=10)))
unlist(as.list(irep(1:4, each=2, times=3)))
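Since irep is modelled on rep, base R's rep applied to an ordinary vector is a useful reference for what the examples should print. The values below are what rep itself returns; treating them as the expected irep output assumes irep matches rep for these argument combinations.

```r
# Base rep() results for several of the irep() argument combinations
r1 <- rep(1:4, 2)                          # 1 2 3 4 1 2 3 4
r2 <- rep(1:4, each = 2)                   # 1 1 2 2 3 3 4 4
r3 <- rep(1:4, c(2, 1, 2, 1))              # 1 1 2 3 3 4
r4 <- rep(1:4, each = 2, length.out = 4)   # 1 1 2 2
r5 <- rep(1:4, each = 2, length.out = 10)  # 1 1 2 2 3 3 4 4 1 1
r6 <- rep(1:4, each = 2, times = 3)        # 1 1 2 2 3 3 4 4, three times over
```

Note that each is applied before times and length.out, which is why r5 wraps around to 1 1 at the end.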
Create an iterator that returns a value a specified number of times.
irepeat(x, times)
x | The value to return repeatedly. |
times | The number of times to repeat the value. Default value is infinity. |
# Repeat a value 10 times
unlist(as.list(irepeat(42, 10)))
The iRNGStream function creates an infinite iterator that calls nextRNGStream repeatedly, and iRNGSubStream creates an infinite iterator that calls nextRNGSubStream repeatedly.
iRNGStream(seed)
iRNGSubStream(seed)
seed | Either a single number to be passed to set.seed or a vector to be passed to nextRNGStream or nextRNGSubStream. |
it <- iRNGStream(313)
print(nextElem(it))
print(nextElem(it))
## Not run:
library(foreach)
foreach(1:3, rseed=iRNGSubStream(1970), .combine='c') %dopar% {
  RNGkind("L'Ecuyer-CMRG") # would be better to initialize workers only once
  assign('.Random.seed', rseed, pos=.GlobalEnv)
  runif(1)
}
## End(Not run)
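The stream functions themselves come from base R's parallel package. The sketch below shows those underlying calls directly: it switches to the L'Ecuyer-CMRG generator, takes the resulting 7-element seed, and advances it with nextRNGStream and nextRNGSubStream. It illustrates the parallel functions only; the exact seed handling inside iRNGStream(313) may differ.

```r
library(parallel)

# nextRNGStream() expects a 7-element L'Ecuyer-CMRG seed
RNGkind("L'Ecuyer-CMRG")
set.seed(313)
s0 <- get(".Random.seed", envir = globalenv())
s1 <- nextRNGStream(s0)       # start of the next independent stream
s2 <- nextRNGStream(s1)       # and the one after that
sub1 <- nextRNGSubStream(s1)  # a substream within the stream starting at s1
RNGkind("default")            # restore R's default generator
```

Each worker can then be given one of these seeds by assigning it to .Random.seed, as the foreach example does with the values from iRNGSubStream.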
is.iterator indicates if an object is an iterator.
end_iteration throws an exception to signal that there are no more values available in an iterator.
iteration_has_ended tests an exception to see if it indicates that iteration has ended.
new_iterator returns an iterator object.
is.iterator(x)
end_iteration()
iteration_has_ended(e)
new_iterator(nextElem, ...)
x | any object. |
e | a condition object. |
nextElem | a function object that takes no arguments. |
... | not currently used. |
# Manually iterate using the iteration_has_ended function to help
it <- iter(1:3)
tryCatch({
  stopifnot(is.iterator(it))
  repeat {
    print(nextElem(it))
  }
},
error=function(e) {
  if (!iteration_has_ended(e)) {
    stop(e)
  }
})
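These helpers support a simple convention: an iterator's next-element function signals exhaustion by raising an error whose message is "StopIteration". Assuming that convention, the base-R sketch below builds a countdown iterator as a closure and drains it, treating only that message as the end of iteration. `countdown` and `has_ended` are hypothetical names; they stand in for what new_iterator and iteration_has_ended provide.

```r
# Hypothetical closure-based iterator following the StopIteration convention
countdown <- function(from) {
  current <- from + 1L
  function() {
    if (current <= 1L) stop("StopIteration", call. = FALSE)
    current <<- current - 1L
    current
  }
}

# Assumed end-of-iteration test: compare the condition message
has_ended <- function(e) identical(conditionMessage(e), "StopIteration")

next_value <- countdown(3L)
collected <- integer(0)
tryCatch(
  repeat collected <- c(collected, next_value()),
  error = function(e) if (!has_ended(e)) stop(e)  # re-raise genuine errors
)
collected  # 3 2 1
```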
Create an iterator that splits a matrix into block columns.
You can specify either the number of blocks, using the chunks argument, or the maximum size of the blocks, using the chunkSize argument.
isplitCols(x, ...)
x | Matrix to iterate over. |
... | Passed as the second and subsequent arguments to the idiv function. Currently, idiv accepts either a value for chunks or chunkSize. |
An iterator that returns submatrices of x.
# Split a matrix into submatrices with a maximum of three columns
x <- matrix(1:30, 3)
it <- ihasNext(isplitCols(x, chunkSize=3))
while (hasNext(it)) {
  print(nextElem(it))
}
# Split the same matrix into five submatrices
it <- ihasNext(isplitCols(x, chunks=5))
while (hasNext(it)) {
  print(nextElem(it))
}
Create an iterator of chunks of indices from 1 to n.
You can specify either the number of pieces, using the chunks argument, or the maximum size of the pieces, using the chunkSize argument.
isplitIndices(n, ...)
n | Maximum index to generate. |
... | Passed as the second and subsequent arguments to the idiv function. Currently, idiv accepts either a value for chunks or chunkSize. |
An iterator that returns vectors of indices from 1 to n.
# Return indices from 1 to 17 in vectors no longer than five
it <- ihasNext(isplitIndices(17, chunkSize=5))
while (hasNext(it)) {
  print(nextElem(it))
}
# Return indices from 1 to 7 in four vectors
it <- ihasNext(isplitIndices(7, chunks=4))
while (hasNext(it)) {
  print(nextElem(it))
}
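The chunk sizes can be worked out by hand. The sketch below assumes the splitting follows the rule used by idiv in the iterators package: when only chunkSize is given, the number of chunks is ceiling(n / chunkSize), and each chunk then takes ceiling(remaining / chunks_left) indices. Under that assumption the pieces are as balanced as possible rather than filled up to chunkSize. `split_sizes` and `to_indices` are hypothetical helpers; compare their output with what your installed version prints.

```r
# Hypothetical helper: chunk sizes under the assumed idiv-style rule
split_sizes <- function(n, chunks, chunkSize) {
  if (missing(chunks)) chunks <- ceiling(n / chunkSize)
  sizes <- integer(0)
  while (chunks > 0 && n > 0) {
    m <- ceiling(n / chunks)  # share of the remaining indices for this chunk
    sizes <- c(sizes, as.integer(m))
    n <- n - m
    chunks <- chunks - 1
  }
  sizes
}

# Turn sizes into consecutive index vectors, like the values isplitIndices issues
to_indices <- function(sizes) {
  unname(split(seq_len(sum(sizes)), rep(seq_along(sizes), sizes)))
}

split_sizes(17, chunkSize = 5)  # 5 4 4 4
split_sizes(7, chunks = 4)      # 2 2 2 1
to_indices(split_sizes(7, chunks = 4))
```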
Create an iterator that splits a matrix into block rows.
You can specify either the number of blocks, using the chunks argument, or the maximum size of the blocks, using the chunkSize argument.
isplitRows(x, ...)
x | Matrix to iterate over. |
... | Passed as the second and subsequent arguments to the idiv function. Currently, idiv accepts either a value for chunks or chunkSize. |
An iterator that returns submatrices of x.
# Split a matrix into submatrices with a maximum of three rows
x <- matrix(1:100, 10)
it <- ihasNext(isplitRows(x, chunkSize=3))
while (hasNext(it)) {
  print(nextElem(it))
}
# Split the same matrix into five submatrices
it <- ihasNext(isplitRows(x, chunks=5))
while (hasNext(it)) {
  print(nextElem(it))
}
Create an iterator that splits a vector into smaller pieces.
You can specify either the number of pieces, using the chunks argument, or the maximum size of the pieces, using the chunkSize argument.
isplitVector(x, ...)
x | Vector to iterate over. Note that it doesn't need to be an atomic vector, so a list is acceptable. |
... | Passed as the second and subsequent arguments to the idiv function. Currently, idiv accepts either a value for chunks or chunkSize. |
An iterator that returns vectors of the same type as x with one or more elements from x.
# Split the vector 1:10 into "chunks" with a maximum length of three
it <- ihasNext(isplitVector(1:10, chunkSize=3))
while (hasNext(it)) {
  print(nextElem(it))
}
# Split the vector "letters" into four chunks
it <- ihasNext(isplitVector(letters, chunks=4))
while (hasNext(it)) {
  print(nextElem(it))
}
# Get the first five elements of a list as a list
nextElem(isplitVector(as.list(letters), chunkSize=5))
Create an iterator that iterates over multiple iterables, returning the values as a list.
izip(...)
... | The iterables to iterate over. |
# Iterate over two iterables of different sizes
as.list(izip(a=1:2, b=letters[1:3]))
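The example zips vectors of lengths 2 and 3. Assuming izip stops as soon as the shortest iterable is exhausted, as Python's zip does, an eager base-R equivalent truncates every input to the shortest length and collects the i-th element of each input into a named list. `zip_sketch` is a hypothetical helper, not part of the package.

```r
# Eager base-R analogue of izip(a = 1:2, b = letters[1:3]),
# assuming iteration stops at the shortest input.
zip_sketch <- function(...) {
  args <- list(...)
  n <- min(lengths(args))  # length of the shortest input
  lapply(seq_len(n), function(i) lapply(args, `[[`, i))
}

zipped <- zip_sketch(a = 1:2, b = letters[1:3])
str(zipped)  # two elements: list(a = 1L, b = "a") and list(a = 2L, b = "b")
```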
Create an iterator that returns values from multiple iterators in Cartesian product fashion. That is, they are combined in the manner of nested for loops.
product(...)
... | Named iterables to iterate over. The right-most iterables change more quickly, like an odometer. |
# Simulate a doubly-nested loop with a single while loop
it <- ihasNext(product(a=1:3, b=1:2))
while (hasNext(it)) {
  x <- nextElem(it)
  cat(sprintf('a = %d, b = %d\n', x$a, x$b))
}
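Because the right-most iterable changes fastest, product(a=1:3, b=1:2) visits combinations in the same order as nested for loops with a outside and b inside. Base R's expand.grid varies its first argument fastest instead, so listing b first and then reordering the columns reproduces the odometer order. This is an eager illustration of the ordering, not how product is implemented.

```r
# Order produced by nested loops: a outer, b inner
nested <- NULL
for (a in 1:3) for (b in 1:2) nested <- rbind(nested, c(a = a, b = b))

# expand.grid varies its first argument fastest, so put b first
grid <- expand.grid(b = 1:2, a = 1:3)[, c("a", "b")]
grid  # rows (1,1), (1,2), (2,1), (2,2), (3,1), (3,2)
```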
Create an iterator that recycles a specified iterable.
recycle(iterable, times=NA_integer_)
iterable | The iterable to recycle. |
times | integer. Number of times to recycle the values in the iterator. The default value of NA_integer_ means to recycle forever. |
# Recycle over 'a', 'b', and 'c' three times
unlist(as.list(recycle(letters[1:3], 3)))
Create an iterator that iterates over another iterator for a specified period of time, and then stops. This can be useful when you want to search for something, or run a test for a while, and then stop.
timeout(iterable, time)
iterable | Iterable to iterate over. |
time | The time interval to iterate for, in seconds. |
# See how high we can count in a tenth of a second
length(as.list(timeout(icount(), 0.1)))
Create an object that contains a combiner function.
writedf.combiner(filenames)
filenames | Names of files to write column data to. |