An important feature in R is the ability for you to build your own
functions. This extensibility allows you to automate repetitive tasks or
run a series of commands with different input for each repetition. The
basic form for an R function is:

myFunctionName <- function( ) {

>MY CODE GOES HERE<

}

Once you run the function declaration, it is available for your use
during your current R session. This behavior is slightly different from
the behavior of functions in R packages. Once an R package is loaded,
you have immediate access to all of the package's functions. Your functions
are only available once you run the function declaration, and they are
only available in your current session. RStudio enables you to override
this behavior. You can save your R session when you close RStudio and it
will resume that session, with its entire environment [including your
functions] available, when you restart RStudio.

Here is a function that creates a vector of integers from 1 to n, where
the user will choose the value of n in the function call:

integerVector <- function(n) {

v <- c(1:n)

v

}

The name of the new function is **integerVector( )**. This
function has one numeric parameter **n** that is used in
the function to set the upper limit of the integer sequence in the
output vector. The first line of code builds the vector using the R
combine function **c( )**. The final line outputs the
resulting vector. This final line is necessary if you want to output the
results of the code inside of your function for use outside the
function.

That last sentence is an important concept to understand. All of the
data objects that are inside your function are not accessible outside
your function. The data object v in **integerVector( )** only
exists while **integerVector( )** executes. If you want to use the
result of the computation in your function, you must output that data.
The last line of **integerVector( )** does this.
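A minimal sketch of this scoping behavior follows. It repeats the definition of integerVector( ) from the text and then checks whether an object named v is visible after the call. The names result and vExistsOutside are only for illustration, and the check assumes that no other object named v already exists in your session.

```r
# Same definition as in the text.
integerVector <- function(n) {
  v <- c(1:n)
  v
}

result <- integerVector(3)

# v was local to the function call, so it is not visible here
# (assuming your session has no other object named v).
vExistsOutside <- exists("v", inherits = FALSE)

print(result)          # the vector returned by the function
print(vExistsOutside)  # FALSE in a fresh session
```

Only the value on the last line of the function survives the call, and only because it was assigned to result.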

Once you execute this declaration, you can use integerVector( ) in your R session.
The function call integerVector(10) will result in the output

1 2 3 4 5 6 7 8 9 10

You can save the output of integerVector( ) by simply assigning
its output to an R object like this:

x <- integerVector(10)

Now the object x contains the vector 1 2 3 4 5 6 7 8 9 10.

It is important to remember to use unique names for your functions. If
you accidentally choose a function name that matches the name of an
existing R function, your function will mask the existing function,
and calls to that name will run your version for the rest of your current session. While you
may want to compute a result differently than the existing R function,
it is still better to use your own unique function name. This naming
scheme will ensure that you have access to both functions, not simply
your custom function.
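The sketch below illustrates the naming problem with a hypothetical custom function that reuses the name of base R's range( ). The object names values, masked, original and restored are assumptions made for the example. It also shows that the base version can still be reached with the base:: prefix, and that removing your function makes the plain name refer to the base version again.

```r
# Hypothetical example: a custom range() that hides base R's range().
range <- function(x) {
  max(x) - min(x)
}

values <- c(2, 9, 4)

masked   <- range(values)        # your version: the spread, 7
original <- base::range(values)  # base R version: 2 9

rm(range)                        # delete your version
restored <- range(values)        # base R version again: 2 9
```

Choosing a distinct name in the first place, for example valueSpread( ), avoids the need for the prefix or the cleanup step.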

Suppose you like using integerVector( ) and use it often to create new
vectors. After using it for a while, you notice that you often create
vectors with the values 1 to 10. You can modify integerVector( ) so that it will
create a vector containing the values 1 to 10 as a default operation,
while still letting you set another maximum value when you need it. You can do this
by declaring a default input argument value. The default input argument
will be used unless you designate another value.

Here is a modified integerVector( ) that has a default value of 10:

integerVectorDef <- function(n = 10) {

v <- c(1:n)

v

}

After you run this function declaration, entering integerVectorDef( ) will output the vector
1 2 3 4 5 6 7 8 9 10
and entering integerVectorDef(5) will output the vector 1 2 3 4 5.
Your new function integerVectorDef( ) assumes a default input value of
10, unless you designate another value. If you run integerVector( ) without an input
argument, you will get the error message: Error
in integerVector() : argument "n" is missing, with no default
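The following sketch puts both versions side by side. It uses tryCatch( ) and conditionMessage( ), which are standard R functions, to capture the error text instead of stopping the script. The object names withDefault, withValue and errorText are assumptions made for the example.

```r
# Both definitions as given in the text.
integerVector <- function(n) {
  v <- c(1:n)
  v
}

integerVectorDef <- function(n = 10) {
  v <- c(1:n)
  v
}

withDefault <- integerVectorDef()   # no argument, so n = 10 is used
withValue   <- integerVectorDef(5)  # the supplied value replaces the default

# integerVector() has no default, so calling it without n is an error.
# tryCatch() captures the error message instead of halting the script.
errorText <- tryCatch(
  integerVector(),
  error = function(e) conditionMessage(e)
)
print(errorText)
```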

Notice that the default value declaration n = 10 uses the
assignment symbol **=** and not the symbol **<-**. This is an
important distinction. Either symbol can be used to assign a value or
the output of a function to an R object. The same is not true for
function arguments. You must always use **=** when you assign a
value to function input arguments. This is a good time to develop the
habit of always using **<-** when you assign a value to an object
and **=** when assigning a value to a function argument. This habit
can prevent errors in your R scripts.
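To see why the habit matters, the sketch below shows what happens when **<-** slips into a function call. R does not reject it. Instead the value is passed by position, and an object with that name is created in your workspace as a side effect. The object names a, b, nAfterEquals and nAfterArrow are assumptions made for the example.

```r
integerVectorDef <- function(n = 10) {
  v <- c(1:n)
  v
}

# Start clean so the checks below are meaningful.
if (exists("n", inherits = FALSE)) rm(n)

a <- integerVectorDef(n = 3)                   # n exists only inside the call
nAfterEquals <- exists("n", inherits = FALSE)  # FALSE

b <- integerVectorDef(n <- 3)                  # also creates n in your workspace
nAfterArrow <- exists("n", inherits = FALSE)   # TRUE
```

Both calls return the same vector, but the second leaves a stray object n behind, which can silently change the result of later code that uses that name.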

Here is a more complex function:

select10 <- function(n) {

v <- c() # create an empty vector

for(i in 1:n) { # repeat n times

v <- c(v, sample(1:10, size = 1)) # add a random value to the vector

} # end loop

v # output the vector

}

The function select10( ) above creates a vector of n random numbers in the range of 1 to 10. The
function creates an empty vector v. It then selects a number between 1 and
10 randomly, using the R function sample( ), and adds it to the
vector. It repeats this step n times. Once the vector is complete, select10( )
outputs the final vector.

You can modify select10( ) so that it creates a vector of **size**
numbers between **mini** and **maxi**. [We will avoid using min( ) and max( )
since they are standard R function names]. The new function
might look like this:

selectAny <- function(size, mini, maxi) {

v <- c() # create an empty vector

for(i in 1:size) { # repeat size times

v <- c(v, sample(mini:maxi, size = 1)) # add a random value to the vector

} # end loop

v # output the vector

}

This new function **selectAny( )** does the same computational steps
as **select10( )**,
but the function arguments of **selectAny( )** provide an
interface to designate the number of values and the minimum and maximum
values that can be selected. Both of these functions use a **for( )** loop to repeat
an operation, and you can control the number of repeated operations with
a function argument, as shown in **selectAny( )**.
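For comparison, sample( ) can draw all of the values in a single call, so the loop is not strictly required. The sketch below is an alternative written under that assumption, not the approach used in the text. The name selectAnyVec is hypothetical. The argument replace = TRUE is chosen so that values can repeat, as they can in the loop version.

```r
# Hypothetical loop-free variant of selectAny().
selectAnyVec <- function(size, mini, maxi) {
  # Draw `size` values at once; replace = TRUE lets values repeat.
  sample(mini:maxi, size = size, replace = TRUE)
}

set.seed(42)                      # make the random draws repeatable
draws <- selectAnyVec(20, 3, 8)   # 20 values between 3 and 8
print(draws)
```

One caveat applies to both versions. When mini equals maxi and the value is 1 or greater, mini:maxi has length one, and sample( ) then draws from 1 up to that number instead of returning the single value.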

This is a good time to explain the arguments of a for( ) loop. The first
argument **i** is a variable that can be used inside the loop. It
takes each value of the loop sequence [in this case 1:size] in turn, advancing to the next value on each
pass through the loop. There is no requirement to use the loop variable.
The example above does not use it. The second argument of the loop is
the sequence of values that the loop steps through. The loop will repeat once
for each item in the sequence. A sequence of 1:10 will repeat 10 times [once for each
value]. You can use any start and end value for a loop sequence. The
sequence 5:1
will repeat 5 times [once for each value] and assign **i** the values 5 4 3 2 1.
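The short sketch below records the value of the loop variable on each pass through a 5:1 sequence. It also shows a consequence of the same counting-down behavior: 1:0 is the sequence 1 0, so a loop over it runs twice rather than zero times. The standard function seq_len( ) avoids this. The object names visited, passes and passesSafe are assumptions made for the example.

```r
visited <- c()
for (i in 5:1) {
  visited <- c(visited, i)  # record the value i has on this pass
}
print(visited)              # 5 4 3 2 1

# 1:0 counts down (1, 0), so this loop body runs twice.
passes <- 0
for (i in 1:0) {
  passes <- passes + 1
}

# seq_len(0) is an empty sequence, so this loop body never runs.
passesSafe <- 0
for (i in seq_len(0)) {
  passesSafe <- passesSafe + 1
}
```

This matters for functions such as selectAny( ) if they might ever be called with a size of 0.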

Create and run the function **selectAny( )**. Test it using
various values for the input arguments size, mini and maxi.

Functions can also be used to simulate tasks. Here is an example:

# simulate rolling a pair of 6-sided dice x times

dice6Data <- function(x) {

d1 <- c() # a numeric vector for the first die

d2 <- c() # a numeric vector for the second die

ds <- c() # a numeric vector for the sum of both dice

for(i in 1:x) { # roll the dice x times

d1 <- c(d1, sample(1:6, size = 1)) # add a random value to the vector

d2 <- c(d2, sample(1:6, size = 1)) # add a random value to the vector

} # end loop

ds <- d1 + d2 # add the two dice

v <- cbind(d1, d2, ds) # combine the three vectors into a matrix with three columns

v <- as.data.frame(v) # convert v into a data frame

names(v) <- c("die1", "die2", "sum") # assign column titles

v # output the results

}
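As a rough illustration of how such a simulation might be used, the sketch below rolls the dice 1000 times and tabulates the totals. To keep the block self-contained it defines a compact variant under the hypothetical name dice6DataVec, which draws all rolls at once with sample( ) and returns the same three columns. The seed value and the object names rolls and sumCounts are assumptions made for the example.

```r
# Hypothetical compact variant of dice6Data(); same column names as in the text.
dice6DataVec <- function(x) {
  die1 <- sample(1:6, size = x, replace = TRUE)  # first die, x rolls
  die2 <- sample(1:6, size = x, replace = TRUE)  # second die, x rolls
  data.frame(die1 = die1, die2 = die2, sum = die1 + die2)
}

set.seed(1)                    # make the rolls repeatable
rolls <- dice6DataVec(1000)

print(head(rolls))             # the first six rolls
sumCounts <- table(rolls$sum)  # how often each total occurred
print(sumCounts)
```

With enough rolls, totals near 7 should appear most often and totals of 2 and 12 least often.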

Because functions help you automate tasks, you can develop functions to
compute anything that is not already included in an R package. An
example of this would be the mode statistical metric. In descriptive
statistics, the mode is the most frequent value in a set of values. This
statistic is normally used to describe discrete [integer] sets of
values. R does not have a standard function to calculate the mode. The **mode( )**
function built into R returns the data type of a data object, not its
statistical mode. You can remedy this by writing your own function. Here
is an example:

# statistical mode function - R mode() does something else

statMode <- function(v) { # input is a vector

result <- as.numeric(names(which(table(v) == max(table(v))))) # find the most frequent element in the vector

return(result) # output the result

}

When given a data object containing a vector of integers, **statMode( )** will return the
most frequent value or values [if there is a tie]. Notice that statMode( )
uses the command **return( )** to return the result of the function's computation.
This command is another way to output the
results of your function.
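The sketch below tries the function on one vector with a single mode and one with a tie. It is a lightly restructured version of statMode( ) that computes the frequency table once and relies on the last line as the output, to show that this gives the same result as an explicit return( ). The object names counts, single and tied are assumptions made for the example.

```r
statMode <- function(v) {
  counts <- table(v)                               # tally each distinct value
  as.numeric(names(which(counts == max(counts))))  # value(s) with the highest tally
}

single <- statMode(c(1, 2, 2, 3, 3, 3))  # 3 occurs most often
tied   <- statMode(c(4, 4, 7, 7, 9))     # 4 and 7 tie

print(single)  # 3
print(tied)    # 4 7
```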

So, to summarize this discussion, user-defined functions in R allow you
to repeat operations with each call to your function, simulate
a process for analysis, and automate functionality not normally
included in R or any of its packages.