Blood Pressure Database Example
Tasks covered:
Working with .RData files [save and load]
Creating and working with functions
Working with date data types
Working with date sequences
Plotting multiple data series on a single chart
Annotating a chart axis with dates and displaying the dates perpendicular to the axis
This example R script demonstrates how to use an R data frame as a
database. It includes code to create an initial blood pressure database
and two functions: one to add new blood pressure entries to the
database, and another to chart the data in the database. Code to create
two example 52-week blood pressure databases is also included to help
demonstrate the use of the two functions.
This example demonstrates several interesting skills with R. The code shows
how to use dates in a data frame, how to show dates along a chart axis,
how to use date data to position a chart legend, how to plot multiple
lines [data and reference] on the same chart, and the use of color, line
type, and line width in a chart.
Create an initial blood pressure database
The first block of code can be used to create an initial entry in your
database.
##########################################################################
# use this code to create your own bp database - skip this if you are
# simply going to run the example
# add new data entries with add_bp_entry( )
# chart the database with view_bp_chart( )
# create the blood pressure database as a data frame [use this code to
# create a new database]
bp_db <- as.data.frame(cbind(0, 0, 0)) # create the initial database entry
bp_db[,1] <- Sys.Date() # insert the current date into the first column
names(bp_db) <- c('Date','Systolic', 'Diastolic') # assign column titles
save(bp_db, file = 'bp_db.RData') # save the database
# end of the creation of your initial database
##########################################################################
The code creates an initial entry in a data frame containing zeros for all of the
values. The current date is inserted into the first column. Column
titles are assigned to the data frame, and the data frame is saved as an
.RData file. This data frame can be used as a blood pressure database
with the functions add_bp_entry( ) and view_bp_chart( ).
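The save and load round trip described above can be tried without touching a real database file. The following is a minimal sketch, not part of the original script: it builds the same one-row data frame directly with data.frame( ), which is an alternative to the cbind( ) approach, writes it to a temporary file, and reads it back. The name tmp_file is illustrative only.

```r
# a sketch: build the initial entry directly, then save and reload it
bp_db <- data.frame(Date = Sys.Date(), Systolic = 0, Diastolic = 0)

tmp_file <- tempfile(fileext = ".RData") # hypothetical temporary location
save(bp_db, file = tmp_file)             # write the data frame to disk

rm(bp_db)                                # remove it from the session
loaded <- load(tmp_file)                 # load( ) restores the object by name
print(loaded)                            # the names of the restored objects
str(bp_db)                               # the Date column keeps its Date class
```

Because load( ) restores objects under the names they were saved with, the data frame comes back as bp_db without any assignment.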
Example blood pressure databases
The next block of code creates two example blood pressure databases. These
are helpful in testing the functionality of the two project functions
that will be discussed below.
##########################################################################
# use this code to create the example bp databases
# create the example databases [these are used in the example below]
# add new data entries with add_bp_entry( )
# chart the database with view_bp_chart( )
# create example data
dateSeq <- seq(as.Date("2017-06-03"), by = "week", length.out = 52) # a sequence of 52 weekly dates
sysSeq <- seq(120, 138) # a set of systolic bp values
diaSeq <- seq(68, 85) # a set of diastolic bp values
# a function to create our example bp sequences
sampleSeq <- function(s, x) {
  r <- c() # start with an empty result
  for (i in 1:x) { # repeat x times
    r <- rbind(r, sample(s, 1)) # add a random value from s to r
  }
  r # return r with x elements
}
# create bob's bp database [cbind( ) will insert the dates incorrectly at first]
# generate example bp values
s <- sampleSeq(sysSeq, 52) # a sequence of 52 random systolic bp values
d <- sampleSeq(diaSeq, 52) # a sequence of 52 random diastolic bp values
bob_bp_db <- as.data.frame(cbind(dateSeq,s,d)) # combine the data into a data frame
# cbind( ) coerces the dates to numeric values
bob_bp_db[,1] <- dateSeq # fix the dates in the first column
names(bob_bp_db) <- c('Date','Systolic','Diastolic') # assign column titles
save(bob_bp_db, file = 'bob_bp_db.RData') # save the database
# create carol's bp database [cbind( ) will insert the dates incorrectly at first]
# generate example bp values
s <- sampleSeq(sysSeq, 52) # a sequence of 52 random systolic bp values
d <- sampleSeq(diaSeq, 52) # a sequence of 52 random diastolic bp values
carol_bp_db <- as.data.frame(cbind(dateSeq,s,d)) # combine the data into a data frame
# cbind( ) coerces the dates to numeric values
carol_bp_db[,1] <- dateSeq # fix the dates in the first column
names(carol_bp_db) <- c('Date','Systolic','Diastolic') # assign column titles
save(carol_bp_db, file = 'carol_bp_db.RData') # save the database
# end of the creation of the example databases
##########################################################################
The first three lines of the code generate three sequences: a weekly date
sequence spanning 52 weeks, a sequence of systolic blood pressure
values from 120 to 138, and a sequence of diastolic blood pressure
values from 68 to 85.
# create example data
dateSeq <- seq(as.Date("2017-06-03"), by = "week", length.out = 52) # a sequence of 52 weekly dates
sysSeq <- seq(120, 138) # a set of systolic bp values
diaSeq <- seq(68, 85) # a set of diastolic bp values
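As a quick check on the date sequence, the sketch below inspects dateSeq with a few base R functions. None of these checks appear in the original script; they are only meant to show what seq( ) produces when it is given a Date and a step of one week.

```r
dateSeq <- seq(as.Date("2017-06-03"), by = "week", length.out = 52)
class(dateSeq)                    # "Date"
length(dateSeq)                   # 52
unique(as.numeric(diff(dateSeq))) # 7: every step is seven days
dateSeq[52] - dateSeq[1]          # the span between the first and last reading
```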
The next section of code defines a function that we will use to generate
the series of systolic and diastolic values for each weekly reading in
our example databases.
# a function to create our example bp sequences
sampleSeq <- function(s, x) {
  r <- c() # start with an empty result
  for (i in 1:x) { # repeat x times
    r <- rbind(r, sample(s, 1)) # add a random value from s to r
  }
  r # return r with x elements
}
The function sampleSeq has two input arguments: a sequence of values and
the desired number of values in the output. The first line of code
creates an empty object, named r, where the generated values will be
stored. Then the function loops for x iterations. In each loop, the
input sequence is randomly sampled for one value and that value is
appended to r with rbind( ). After the loop is complete, r is returned.
Because it was built with rbind( ), r is a one-column matrix with x
rows.
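The loop above can also be expressed as a single call, because sample( ) accepts a size argument. This sketch is an alternative, not the function used in the script, and the name sampleSeqVec is hypothetical. Note that it returns a plain vector rather than the one-column structure that rbind( ) builds.

```r
# a sketch of a vectorized alternative to sampleSeq( )
sampleSeqVec <- function(s, x) {
  # draw x positions with replacement, then look the values up in s;
  # indexing by position avoids sample( )'s special handling of a length-one s
  s[sample(length(s), x, replace = TRUE)]
}

set.seed(1)                           # make the example repeatable
v <- sampleSeqVec(seq(120, 138), 52)
length(v)                             # 52
range(v)                              # always within 120 to 138
```

Sampling with replacement matches the loop, where each call to sample(s, 1) is independent of the previous ones.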
The next two code blocks create the two example blood pressure databases.
The blocks are identical except that one creates a database called
bob_bp_db and the other creates a database called carol_bp_db, so only
the first block will be described here.
# create bob's bp database [cbind( ) will insert the dates incorrectly at first]
# generate example bp values
s <- sampleSeq(sysSeq, 52) # a sequence of 52 random systolic bp values
d <- sampleSeq(diaSeq, 52) # a sequence of 52 random diastolic bp values
bob_bp_db <- as.data.frame(cbind(dateSeq,s,d)) # combine the data into a data frame
# cbind( ) coerces the dates to numeric values
bob_bp_db[,1] <- dateSeq # fix the dates in the first column
names(bob_bp_db) <- c('Date','Systolic','Diastolic') # assign column titles
save(bob_bp_db, file = 'bob_bp_db.RData') # save the database
The function sampleSeq( ) is used to generate two 52-element sequences,
one for the systolic blood pressure values and another for the diastolic
values. These two sequences are combined with the weekly dates into the
data frame bob_bp_db using cbind( ). Unfortunately, cbind( ) builds a
numeric matrix, so the date values are coerced into plain numbers before
as.data.frame( ) creates our data frame. To repair this, the date
sequence is reassigned to the first column of bob_bp_db. Now the first
column of the data frame is populated correctly with our sequence of
weekly dates. The column titles of Date, Systolic, and Diastolic are
added to the data frame. Finally, the data frame is saved as an RData
file. This process is repeated for carol_bp_db.
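The coercion performed by cbind( ) is easy to see on a small scale. The sketch below uses three made-up readings [the values are illustrative, not taken from the example databases] and compares the cbind( ) route with building the data frame directly through data.frame( ), which is an alternative the original script does not use.

```r
dates <- seq(as.Date("2017-06-03"), by = "week", length.out = 3)
s <- c(121, 130, 125) # made-up systolic values
d <- c(70, 82, 75)    # made-up diastolic values

m <- cbind(dates, s, d)
class(m[, 1])         # "numeric": the Date class is lost inside the matrix

# data.frame( ) keeps each column's own class, so no repair step is needed
db <- data.frame(Date = dates, Systolic = s, Diastolic = d)
class(db$Date)        # "Date"
```

The numbers that cbind( ) leaves behind are the internal representation of the dates, counted in days from 1970-01-01.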
Manage the blood pressure databases [add and view the data]
The next block of code creates the two functions for the blood pressure
database. The function add_bp_entry( ) adds a new entry [current date,
systolic BP, diastolic BP] to the database and saves it. The function
view_bp_chart( ) charts the data in a selected database.
##########################################################################
# the bp database functions
add_bp_entry <- function(DB, systolic, diastolic) { # add an entry to the db
  if(DB == 'bob') {
    load('bob_bp_db.RData') # open the database
    new_row <- as.data.frame(cbind(0, systolic, diastolic)) # create a new row
    new_row[,1] <- Sys.Date() # insert the current date in the first column
    names(new_row) <- c('Date','Systolic', 'Diastolic') # add column titles
    result <- rbind(bob_bp_db, new_row) # add the new row to the database
    bob_bp_db <- result
    save(bob_bp_db, file = 'bob_bp_db.RData') # save the database
  } else {
    load('carol_bp_db.RData') # open the database
    new_row <- as.data.frame(cbind(0, systolic, diastolic)) # create a new row
    new_row[,1] <- Sys.Date() # insert the current date in the first column
    names(new_row) <- c('Date','Systolic', 'Diastolic') # add column titles
    result <- rbind(carol_bp_db, new_row) # add the new row to the database
    carol_bp_db <- result
    save(carol_bp_db, file = 'carol_bp_db.RData') # save the database
  }
}
view_bp_chart <- function(DB) { # chart the bp data
  if(DB == "bob") {
    load('bob_bp_db.RData') # open the database
    # get the first date in the database to position the legend correctly
    startDate <- bob_bp_db[1,1]
    # plot a line chart of the systolic bp [note that xaxt='n' suppresses the x-axis values]
    plot(bob_bp_db$Systolic ~ bob_bp_db$Date, col='red', type = 'l',
         ylim = c(50,145), xaxt='n',
         main = 'Bob\'s Blood Pressure', xlab = 'Date', ylab = 'BP')
    # add dates as the x-axis values perpendicular [las = 2] to the axis
    axis(1, bob_bp_db$Date, labels = format(bob_bp_db$Date, "%b-%d"),
         las = 2, cex.axis = .7)
    # add the diastolic bp line
    lines(bob_bp_db$Diastolic ~ bob_bp_db$Date, col = 'green')
  } else {
    load('carol_bp_db.RData') # open the database
    # get the first date in the database to position the legend correctly
    startDate <- carol_bp_db[1,1]
    # plot a line chart of the systolic bp [note that xaxt='n' suppresses the x-axis values]
    plot(carol_bp_db$Systolic ~ carol_bp_db$Date, col='red', type = 'l',
         ylim = c(50,145), xaxt='n',
         main = 'Carol\'s Blood Pressure', xlab = 'Date', ylab = 'BP')
    # add dates as the x-axis values perpendicular [las = 2] to the axis
    axis(1, carol_bp_db$Date, labels = format(carol_bp_db$Date, "%b-%d"),
         las = 2, cex.axis = .7)
    # add the diastolic bp line
    lines(carol_bp_db$Diastolic ~ carol_bp_db$Date, col = 'green')
  }
  grid() # show a grid
  abline(h = 120, lty = 3, col = 'red') # normal systolic bp line
  abline(h = 140, lty = 3, lwd = 2, col = 'red') # borderline systolic bp line
  abline(h = 80, lty = 3, col = 'green') # normal diastolic bp line
  abline(h = 90, lty = 3, lwd = 2, col = 'green') # borderline diastolic bp line
  # add a legend
  legend(as.Date(startDate), 115, c('Systolic','Diastolic'), lty = c(1,1),
         col = c('red','green'))
}
# end of the database functions
##########################################################################
The function add_bp_entry( ) will add a daily entry to the selected blood
pressure database.
add_bp_entry <- function(DB, systolic, diastolic) { # add an entry to the db
  if(DB == 'bob') {
    load('bob_bp_db.RData') # open the database
    new_row <- as.data.frame(cbind(0, systolic, diastolic)) # create a new row
    new_row[,1] <- Sys.Date() # insert the current date in the first column
    names(new_row) <- c('Date','Systolic', 'Diastolic') # add column titles
    result <- rbind(bob_bp_db, new_row) # add the new row to the database
    bob_bp_db <- result
    save(bob_bp_db, file = 'bob_bp_db.RData') # save the database
  } else {
    load('carol_bp_db.RData') # open the database
    new_row <- as.data.frame(cbind(0, systolic, diastolic)) # create a new row
    new_row[,1] <- Sys.Date() # insert the current date in the first column
    names(new_row) <- c('Date','Systolic', 'Diastolic') # add column titles
    result <- rbind(carol_bp_db, new_row) # add the new row to the database
    carol_bp_db <- result
    save(carol_bp_db, file = 'carol_bp_db.RData') # save the database
  }
}
This function has three input arguments: the database to add the new
entry to, the systolic blood pressure value, and the diastolic blood
pressure value. The function is separated into two duplicate sections.
The first section adds an entry into bob_bp_db, and the second section
adds an entry into carol_bp_db. Note that any value of DB other than
'bob' selects carol_bp_db. Since the second section is essentially a
mirror of the first, this description will only cover the first section.
The first code line loads the database file so it can be edited inside
this function. Once add_bp_entry( ) ends, the loaded copy is no longer
available. In the next line of code, a new data frame is created
containing 0, the input systolic value, and the input diastolic value
using as.data.frame( ) and cbind( ). The first value of this data frame
is replaced with the current date using Sys.Date( ), and the column
titles [Date, Systolic, and Diastolic] are added with the function
names( ). It is necessary to have matching column titles so that the new
data frame can be added to the existing database. R will raise an error
if the column titles do not match. Finally, the new row is added to the
database with rbind( ) and the updated database is saved to an RData
file. These steps are repeated in the second section for carol_bp_db.
The command add_bp_entry('bob', 121, 72) will add a new entry to the
database bob_bp_db containing today's date, a systolic blood pressure
value of 121, and a diastolic blood pressure value of 72.
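Since both sections of add_bp_entry( ) perform the same steps, the duplication could be avoided by building the object and file names from the DB argument. The following is only a sketch of that idea under stated assumptions: the function name add_bp_entry_any and its dir argument are hypothetical and do not appear in the original script, and the usage lines work on a throwaway copy in a temporary directory rather than on the real example databases.

```r
# a sketch: one code path for any database name (hypothetical helper)
add_bp_entry_any <- function(DB, systolic, diastolic, dir = ".") {
  db_name <- paste0(DB, "_bp_db")                     # e.g. "bob_bp_db"
  db_file <- file.path(dir, paste0(db_name, ".RData"))
  env <- new.env()                                    # private place to load into
  load(db_file, envir = env)                          # open the database
  new_row <- data.frame(Date = Sys.Date(),
                        Systolic = systolic,
                        Diastolic = diastolic)        # names match the database
  assign(db_name, rbind(get(db_name, envir = env), new_row), envir = env)
  save(list = db_name, file = db_file, envir = env)   # save under the same name
  invisible(get(db_name, envir = env))                # return the updated data
}

# usage against a throwaway copy in a temporary directory
tmp_dir <- tempdir()
bob_bp_db <- data.frame(Date = as.Date("2017-06-03"),
                        Systolic = 120, Diastolic = 70)
save(bob_bp_db, file = file.path(tmp_dir, "bob_bp_db.RData"))
updated <- add_bp_entry_any("bob", 121, 72, dir = tmp_dir)
nrow(updated) # 2
```

Loading into a separate environment keeps the database object out of the way of the function's own variables, and get( ) and assign( ) let the object name be chosen at run time.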
It is important to understand that load( ) places the database in the
function's own environment, so it is only available inside this function
and not in your main R session. Even if the database is active in your
current R session, the copy on disk will be opened, updated, and saved
without altering the version in your main R session.
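This scoping behavior can be demonstrated with a small stand-in database. The sketch below is not part of the original script; the names demo_db, peek, and tmp_file are made up for the illustration.

```r
tmp_file <- tempfile(fileext = ".RData") # hypothetical temporary file
demo_db <- data.frame(x = 1)
save(demo_db, file = tmp_file)           # the file now holds x = 1

demo_db$x <- 99                          # change the copy in the session

peek <- function(path) {
  load(path)                             # loads into this function's environment
  demo_db$x                              # the value stored on disk
}

peek(tmp_file)                           # 1
demo_db$x                                # still 99 in the session
```

The session copy is untouched because load( ), by default, creates its objects in the environment it was called from, which here is the body of peek( ).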
The function view_bp_chart( ) creates a line chart of the systolic and
diastolic blood pressure values in the selected blood pressure database.
The function consists of three sections. The first section charts
bob_bp_db. The second section charts carol_bp_db. The third section adds
common details to either chart. The details of how the charts are
defined and presented are worth reviewing and understanding.
view_bp_chart <- function(DB) { # chart the bp data
  if(DB == 'bob') {
    load('bob_bp_db.RData') # open the database
    # get the first date in the database to position the legend correctly
    startDate <- bob_bp_db[1,1]
    # plot a line chart of the systolic bp [note that xaxt='n' suppresses the x-axis values]
    plot(bob_bp_db$Systolic ~ bob_bp_db$Date, col='red', type = 'l',
         ylim = c(50,145), xaxt='n',
         main = 'Bob\'s Blood Pressure', xlab = 'Date', ylab = 'BP')
    # add dates as the x-axis values perpendicular [las = 2] to the axis
    axis(1, bob_bp_db$Date, labels = format(bob_bp_db$Date, "%b-%d"),
         las = 2, cex.axis = .7)
    # add the diastolic bp line
    lines(bob_bp_db$Diastolic ~ bob_bp_db$Date, col = 'green')
  } else {
    load('carol_bp_db.RData') # open the database
    # get the first date in the database to position the legend correctly
    startDate <- carol_bp_db[1,1]
    # plot a line chart of the systolic bp [note that xaxt='n' suppresses the x-axis values]
    plot(carol_bp_db$Systolic ~ carol_bp_db$Date, col='red', type = 'l',
         ylim = c(50,145), xaxt='n',
         main = 'Carol\'s Blood Pressure', xlab = 'Date', ylab = 'BP')
    # add dates as the x-axis values perpendicular [las = 2] to the axis
    axis(1, carol_bp_db$Date, labels = format(carol_bp_db$Date, "%b-%d"),
         las = 2, cex.axis = .7)
    # add the diastolic bp line
    lines(carol_bp_db$Diastolic ~ carol_bp_db$Date, col = 'green')
  }
  grid() # show a grid
  abline(h = 120, lty = 3, col = 'red') # normal systolic bp line
  abline(h = 140, lty = 3, lwd = 2, col = 'red') # borderline systolic bp line
  abline(h = 80, lty = 3, col = 'green') # normal diastolic bp line
  abline(h = 90, lty = 3, lwd = 2, col = 'green') # borderline diastolic bp line
  # add a legend
  legend(as.Date(startDate), 115, c('Systolic','Diastolic'), lty = c(1,1),
         col = c('red','green'))
}
The function view_bp_chart( ) has one input argument that identifies
which blood pressure database (bob or carol) to chart. The designated
RData file is opened. The first date entry is saved so the chart legend
can be properly positioned later in the function. A chart is plotted for
the systolic blood pressure values with the command
plot(bob_bp_db$Systolic ~ bob_bp_db$Date, col='red', type = 'l',
     ylim = c(50,145), xaxt='n',
     main = 'Bob\'s Blood Pressure', xlab = 'Date', ylab = 'BP')
Let's examine each argument of this plot( ) function call. The first
argument [bob_bp_db$Systolic ~ bob_bp_db$Date] designates bob_bp_db$Date
as the x-axis and bob_bp_db$Systolic as the y-axis of our chart. This
argument uses the R model formula notation. The format for this notation
is response ~ predictor. For the plot( ) function, the predictor defines
the x-axis variable and the response defines the y-axis variable. The
second argument [col='red'] sets the color of the plot to red. The third
argument [type = 'l'] sets the plot type to line [the default type is
'p', which draws points]. The fourth argument [ylim = c(50,145)] sets
the limit values for the y-axis. R normally sets the axis limits based
on the maximum and minimum values of the data being plotted. If you want
specific limits, as we do here, you must designate them using the ylim
[or xlim] arguments, giving the minimum and maximum values of the range
you want your chart to show. In this example we are designating specific
minimum and maximum values for the y-axis so we can set the location for
our chart legend later. The fifth argument [xaxt='n'] suppresses the
default x-axis index values. We will add custom labels in another
command, and we must suppress the default axis values to prevent
conflicting duplicate values. The sixth argument
[main = 'Bob\'s Blood Pressure'] adds the chart title. Notice how the
apostrophe inside the single-quoted string is escaped with a backslash
[\']. The final two arguments [xlab = 'Date', ylab = 'BP'] assign the
labels for the x-axis and y-axis.
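The effect of these arguments can be checked on a few made-up readings. The sketch below is not part of the original script: the data values are invented, and pdf(NULL) is used only so that the example draws to a graphics device that writes no file.

```r
# a sketch with made-up data; pdf(NULL) opens a device that writes no file
pdf(NULL)
dates <- seq(as.Date("2017-06-03"), by = "week", length.out = 4)
sys <- c(121, 130, 125, 128)

plot(sys ~ dates, type = 'l', col = 'red', ylim = c(50, 145), xaxt = 'n',
     main = 'Bob\'s Blood Pressure', xlab = 'Date', ylab = 'BP')
usr <- par('usr')        # the plotted region: x1, x2, y1, y2
usr[3:4]                 # slightly wider than ylim because of default padding
invisible(dev.off())     # close the device

'Bob\'s' == "Bob's"      # TRUE: both spellings give the same string
```

Without the ylim argument the y-axis would shrink to fit only the four systolic values, so a legend placed at a fixed y value could land outside the visible region.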
All of the remaining commands use functions that add content to the
current chart rather than creating a new chart. The axis( ) command adds
a customized x-axis to our chart.
axis(1, bob_bp_db$Date, labels =
format(bob_bp_db$Date, "%b-%d"), las = 2, cex.axis = .7)
The first argument in the axis( ) function identifies which side of the
chart this function will customize. The possible argument values are:
1 = below, 2 = left, 3 = above, and 4 = right. The second argument
[bob_bp_db$Date] identifies the data that will define the axis index
values. The third argument [labels = format(bob_bp_db$Date, "%b-%d")]
defines the format for the index labels. This example uses a month-day
format [abbreviated month name and day of the month]. The fourth
argument [las = 2] defines the index label orientation in relation to
the axis. The argument values identify whether the labels are parallel
(=0) or perpendicular (=2) to the axis. The final argument
[cex.axis = .7] is the magnification of the axis annotation relative to
cex, where cex is the number indicating the amount by which plotting
text and symbols should be scaled relative to the default: 1 is the
default, 1.5 is 50% larger, 0.5 is 50% smaller, etc. In this example,
the axis index labels are 70% of normal size.
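The format strings passed to format( ) can be explored on their own before they are used as axis labels. This short sketch uses two dates from the example sequence; the alternative format strings are shown for comparison only and are not used in the original script.

```r
d <- as.Date(c("2017-06-03", "2017-06-10"))
format(d, "%m-%d")   # "06-03" "06-10": numeric month and day
format(d, "%b-%d")   # abbreviated month name, spelled per the current locale
format(d, "%Y")      # "2017" "2017": four-digit year
```

Because format( ) returns character strings, its result can be passed directly to the labels argument of axis( ).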
The next command adds another line plot to our chart. This line will
chart Bob's diastolic blood pressure.
lines(bob_bp_db$Diastolic ~ bob_bp_db$Date, col =
'green')
The lines( ) function uses two arguments in this example. The first
argument [bob_bp_db$Diastolic ~ bob_bp_db$Date] uses the R model formula
notation: bob_bp_db$Diastolic is identified as the response and
bob_bp_db$Date is identified as the predictor. Notice that this function
uses the same predictor variable as our plot( ) function above. This
ensures that the added line shares the x-axis positions of the line in
the main chart. The second argument [col = 'green'] sets the color of
the new line to green. This completes the section of commands for Bob's
individual chart and concludes the portion of the function creating the
main blood pressure chart. The same code is repeated for the other
database if it is selected for charting. The next six commands add
generic content to either chart.
The final block of code adds common features to the existing blood
pressure chart. The function grid( ) adds a light grid to the background
of the chart area. The function abline( ) is used four times. The first
use inserts a horizontal dotted red line at the maximum normal systolic
blood pressure value of 120. The second use inserts a heavy horizontal
dotted red line at the borderline high systolic blood pressure value of
140. The third use inserts a horizontal dotted green line at the maximum
normal diastolic blood pressure value of 80. The fourth use inserts a
heavy horizontal dotted green line at the borderline high diastolic
blood pressure value of 90. These reference lines help a viewer
understand what the chart values mean in the context of normal and high
blood pressure values.
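The four reference lines can also be drawn from a small table of thresholds, since abline( ) accepts a vector for h and recycles col and lwd across the lines. This is a sketch of an alternative layout, not the approach used in the script; the data frame name ref is hypothetical, and pdf(NULL) is used only so the example writes no file.

```r
# a sketch: keep the thresholds from the script in one table
ref <- data.frame(h   = c(120, 140, 80, 90),
                  lwd = c(1, 2, 1, 2),   # heavier line for the borderline values
                  col = c('red', 'red', 'green', 'green'),
                  stringsAsFactors = FALSE)

pdf(NULL)                                # a device that writes no file
plot(c(0, 1), c(50, 145), type = 'n', xlab = '', ylab = 'BP') # empty chart
abline(h = ref$h, lty = 3, lwd = ref$lwd, col = ref$col)      # all four lines
invisible(dev.off())
```

Keeping the values in one place makes it harder for the code and its description to drift apart when a threshold is changed.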
The final command adds a legend to the chart so the viewer knows what
the charted lines represent.
legend(as.Date(startDate),115,
c('Systolic','Diastolic'),lty = c(1,1), col = c('red','green'))
The first two arguments of legend( ) [as.Date(startDate),115] identify
the x-y location of the upper left corner of the legend box. These
values are expressed in terms of the x and y axis values. In this
example, the x coordinate is the date saved when the database was loaded
at the beginning of the function view_bp_chart( ) [the first date in the
selected database]. The y coordinate is a blood pressure value. The
value 115 was chosen so the legend is well placed vertically on the
chart. The third argument [c('Systolic','Diastolic')] designates the
text labels for the legend items. These labels are stored in a character
vector created with the function c( ). The fourth argument
[lty = c(1,1)] designates that the legend items should be shown as solid
lines. The final argument [col = c('red','green')] designates the color
for each legend item.
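As an alternative to explicit x-y coordinates, legend( ) also accepts a position keyword such as 'bottomleft' in place of the coordinate pair. The sketch below illustrates that option on an empty chart; it is not how the script positions its legend, and pdf(NULL) is used only so the example writes no file.

```r
pdf(NULL)                                # a device that writes no file
plot(c(0, 1), c(50, 145), type = 'n', xlab = '', ylab = 'BP') # empty chart
# a keyword replaces the x-y pair, so no date or y value is needed
lg <- legend('bottomleft', c('Systolic', 'Diastolic'),
             lty = c(1, 1), col = c('red', 'green'))
names(lg)                                # "rect" "text"
invisible(dev.off())
```

The value returned by legend( ) describes where the box and its labels were drawn, which can help when fine-tuning a placement by coordinates.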
Here is what the chart looks like for the example database bob_bp_db.RData.
This little project is a good demonstration of several very useful R
skills and techniques.