Title: | Miscellaneous Useful Functions Including LaTeX Tables, Kalman Filtering, QQplots with Simulation-Based Confidence Intervals, Linear Regression Diagnostics and Development Tools |
---|---|
Description: | Implements various utilities, including functions for LaTeX tables, the Kalman filter, QQ-plots with simulation-based confidence intervals, linear regression diagnostics, web scraping, development tools, relative risks and odds ratios, and GARCH(1,1) forecasting. |
Authors: | Benjamin M. Taylor [aut, cre] |
Maintainer: | Benjamin M. Taylor <[email protected]> |
License: | GPL-3 |
Version: | 1.5-10 |
Built: | 2025-02-11 03:26:38 UTC |
Source: | https://github.com/cran/miscFuncs |
A function to print a welcome message on loading package
.onAttach(libname, pkgname)
libname |
libname argument |
pkgname |
pkgname argument |
...
A function to convert decimal to binary
bin(n)
n |
a non-negative integer |
the binary representation stored in a vector.
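As an illustration of the conversion, here is a minimal base-R sketch using repeated division by 2. The helper name `to_binary` is hypothetical and this is not the package's implementation; in particular, returning the most significant bit first is an assumption.

```r
# Hypothetical helper (not the package's bin()): decimal to binary digits.
to_binary <- function(n) {
  stopifnot(length(n) == 1, n >= 0, n == floor(n))
  if (n == 0) return(0L)
  bits <- integer(0)
  while (n > 0) {
    bits <- c(n %% 2, bits)  # prepend the remainder so the MSB ends up first
    n <- n %/% 2
  }
  as.integer(bits)
}

to_binary(10)
```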
A function to
colour_legend(palette, suffix = "", dir = ".")
palette |
X |
suffix |
X |
dir |
X |
...
A function to compute Taylor's correlation coefficient ;-)
cor_taylor(X)
X |
a numeric matrix with number of rows bigger than the number of columns |
Taylor's correlation coefficient, a number between 0 and 1 expressing the amount of dependence between multiple variables.
A function to
cospulse(x, tau = pi)
x |
X |
tau |
pulse duration |
...
Generic function for model diagnostics.
dplot(mod, ...)
mod |
an object |
... |
additional arguments |
method dplot
Function for producing diagnostic plots for linear models. Points are identified as outliers, as having high leverage, or as having high influence. The QQ plot has a confidence band. A plot of leverage versus fitted values is given. The plot of Studentised residuals versus leverage includes, along with the standard thresholds (at Cook's distance 0.5 and 1), an additional band highlighting influential observations, whose Cook's distance exceeds 8/(n-2p), where n is the number of observations and p is the number of parameters. By default, observations are declared outliers if their standardised residuals exceed 2, and are declared as having high leverage if their leverage exceeds 2p/n.
## S3 method for class 'lm' dplot( mod, pch = 19, outlier.threshold = 2, leverage_threshold = function(n, p) { return(2 * p/n) }, influence_threshold = function(n, p) { return(8/(n - 2 * p)) }, ibands = c(0.5, 1), ... )
mod |
an object of class 'lm' |
pch |
the type of point to use, passed to 'plot', the default being 19 |
outlier.threshold |
threshold on standardised residuals to declare an outlier, default is 2 |
leverage_threshold |
threshold on leverage to be classed as "high leverage", a function of (n,p), the default being 2p/n |
influence_threshold |
threshold on influence to be classed as "high influence", a function of (n,p), the default being 8/(n-2p) |
ibands |
thresholds at which to display Cook's distance on the Studentised residuals vs leverage plot. Default is at 0.5 and 1 |
... |
additional arguments, not used as yet |
...
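The default thresholds described above can be reproduced with base-R diagnostics. This is only a sketch: it uses the built-in `cars` data, takes absolute values of the standardised residuals (an assumption about how the outlier rule is applied), and does not draw the plots that dplot produces.

```r
# Flag outliers, high-leverage and high-influence points with the default rules.
mod <- lm(dist ~ speed, data = cars)
n <- nobs(mod)            # number of observations
p <- length(coef(mod))    # number of parameters

rstd <- rstandard(mod)        # standardised residuals
lev  <- hatvalues(mod)        # leverage
cook <- cooks.distance(mod)   # Cook's distance

flags <- data.frame(
  outlier        = abs(rstd) > 2,
  high_leverage  = lev > 2 * p / n,
  high_influence = cook > 8 / (n - 2 * p)
)

# Show only the observations flagged by at least one rule.
flags[rowSums(flags) > 0, ]
```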
A function to perform one iteration of the EKF (extended Kalman filter). Currently UNDER DEVELOPMENT.
EKFadvance( obs, oldmean, oldvar, phi, phi.arglist, psi, psi.arglist, W, V, loglik = FALSE, na.rm = FALSE )
obs |
observations |
oldmean |
old mean |
oldvar |
old variance |
phi |
Function computing a Taylor Series approximation of the system equation. Can include higher (ie 2nd order and above) terms. |
phi.arglist |
arguments for function phi |
psi |
Function computing a Taylor Series approximation of the observation equation. Can include higher (ie 2nd order and above) terms. |
psi.arglist |
arguments for function psi |
W |
system noise matrix |
V |
observation noise matrix |
loglik |
whether or not to compute the pseudo-likelihood |
na.rm |
logical, whether or not to handle NAs. Default is FALSE. Set to TRUE if there are any missing values in the observed data. |
list containing the new mean and variance, and if specified, the likelihood
A function to forecast forwards using MCMC samples from the bayesGARCH function from the bayesGARCH package.
fcastGARCH(y, parmat, l)
y |
vector of log-returns used in fitting the model via bayesGARCH |
parmat |
a matrix of MCMC samples from the bayesGARCH function e.g. "out$chain1" where "out" is the output of the fitted model and "chain1" is the desired chain |
l |
number of lags to forecast forward |
Suggest thinning MCMC samples to get, say 1000, posterior samples (this can be done post-hoc)
See also the function lr2fact for converting log-returns to a factor. Apply this to the output of fcastGARCH in order to undertake forecasting on the scale of the original series (i.e. not the log returns). Quantiles may be computed across the MCMC iterations and then all one needs to do is to multiply the result by the last observed value in the original series (again, not the log returns)
forecast log returns and also forecast y
A function to generate roxygen templates for generic functions and associated methods.
generic(gen, methods = NULL, sp = 3, oname = "obj")
gen |
character string giving the name of an S3 generic. |
methods |
character vector: a list of methods for which to provide templates |
sp |
the amount of space to put in between functions |
oname |
name of the generic object |
roxygen text printed to the console.
A function to create harmonic terms ready for a harmonic regression model to be fitted.
genharmonic( df, tname, base, num, sinfun = sin, cosfun = cos, sname = "s", cname = "c", power = FALSE )
df |
a data frame |
tname |
a character string, the name of the time variable. Note this variable will be converted using the function as.numeric |
base |
the period of the first harmonic e.g. for harmonics at the sub-weekly level, one might set base=7 if time is measured in days |
num |
the number of harmonic terms to return |
sinfun |
function to compute sin-like components in model. Default is sin, but alternatives include sintri, or any other periodic function defined on [0,2pi] |
cosfun |
function to compute cos-like components in model. Default is cos, but alternatives include costri, or any other periodic function defined on [0,2pi] offset to sinfun by pi/2 |
sname |
the prefix of the sin terms, default 's' returns variables 's1', 's2', 's3' etc. |
cname |
the prefix of the cos terms, default 'c' returns variables 'c1', 'c2', 'c3' etc. |
power |
logical, if FALSE (the default) it will return the standard Fourier series with sub-harmonics at 1, 1/2, 1/3, 1/4 of the base periodicity. If TRUE, a power series will be used instead, with harmonics at 1, 1/2, 1/4, 1/8 etc. of the base periodicity. |
a data frame with the time variable in numeric form and the harmonic components
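To make the construction concrete, here is a minimal base-R sketch of harmonic terms. The helper `harmonic_terms` is hypothetical and not the package's code; it assumes the k-th term has the form sin(2 pi k t / base) and cos(2 pi k t / base), with k = 1, 2, 3, ... for the standard series and k = 1, 2, 4, 8, ... when `power = TRUE`.

```r
# Hypothetical helper (not genharmonic itself): sine/cosine regressors.
harmonic_terms <- function(t, base, num, power = FALSE) {
  t <- as.numeric(t)
  k <- if (power) 2^(seq_len(num) - 1) else seq_len(num)
  # outer() always returns a length(t) x num matrix
  s  <- outer(t, k, function(tt, kk) sin(2 * pi * kk * tt / base))
  cc <- outer(t, k, function(tt, kk) cos(2 * pi * kk * tt / base))
  colnames(s)  <- paste0("s", seq_len(num))
  colnames(cc) <- paste0("c", seq_len(num))
  data.frame(t = t, s, cc)
}

# Two weeks of daily time points with a weekly base period.
h <- harmonic_terms(0:13, base = 7, num = 2)
head(h)
```

The resulting columns could then be used as regressors, for example in a formula such as `y ~ s1 + c1 + s2 + c2`.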
A function to generate basis vectors for integrated Fourier series.
genIntegratedharmonic( df, t1name, t2name, base, num, sname = "bcoef", cname = "acoef", power = FALSE )
df |
a data frame containing a numeric time variable of interest |
t1name |
a character string, the name of the variable in df containing the start time of the intervals |
t2name |
a character string, the name of the variable in df containing the end time of the intervals |
base |
the fundamental period of the signal, e.g. if it repeats over 24 hours and time is measured in hours, then put 'base = 24'; if the period is 24 hours but time is measured in days, then use 'base = 1' |
num |
number of sin and cosine terms to compute |
sname |
character string, name for cosine terms in Fourier series (not integrated) |
cname |
character string, name for sine terms in Fourier series (not integrated) |
power |
legacy functionality, not used here |
If the non-integrated Fourier series is:
f(t) = sum_k [ a_k sin(2 pi k t / P) + b_k cos(2 pi k t / P) ]
then
int_t1^t2 f(s) ds = sum_k [ a_k (P/(2 pi k))*(cos(2 pi k t1 / P) - cos(2 pi k t2 / P)) +
b_k (P/(2 pi k))*(sin(2 pi k t2 / P) - sin(2 pi k t1 / P)) ]
where P is the fundamental period, or 'base', as referred to in the function arguments
a data frame containing the start and end time vectors, together with the sin and cosine terms
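The closed-form basis terms in the formula above can be checked numerically. The sketch below compares them with `integrate()` for one arbitrary choice of P, k, t1 and t2; the specific values are illustrative only.

```r
# Compare the closed-form integrated basis terms with numerical integration.
P <- 24; k <- 2; t1 <- 3; t2 <- 10

# Multiplies a_k: integral of sin(2 pi k s / P) over [t1, t2]
sin_basis <- (P / (2 * pi * k)) * (cos(2 * pi * k * t1 / P) - cos(2 * pi * k * t2 / P))
# Multiplies b_k: integral of cos(2 pi k s / P) over [t1, t2]
cos_basis <- (P / (2 * pi * k)) * (sin(2 * pi * k * t2 / P) - sin(2 * pi * k * t1 / P))

num_sin <- integrate(function(s) sin(2 * pi * k * s / P), t1, t2)$value
num_cos <- integrate(function(s) cos(2 * pi * k * s / P), t1, t2)$value

# Both differences should be negligible.
c(sin_basis - num_sin, cos_basis - num_cos)
```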
A function used in web scraping. Used to simplify the searching of HTML strings for information.
getstrbetween(linedata, start, startmark, endmark, include = FALSE)
linedata |
a string |
start |
integer, where to start looking in linedata |
startmark |
character string. a pattern identifying the start mark |
endmark |
character string. a pattern identifying the end mark |
include |
include the start and end marks? |
the first string after start and between the start and end marks
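A base-R sketch of the same idea follows. The helper `str_between` is hypothetical and not the package's implementation; it treats the marks as fixed strings rather than regular expressions and returns NA when a mark is not found, both of which are assumptions.

```r
# Hypothetical helper (not getstrbetween itself): first substring between marks.
str_between <- function(linedata, start, startmark, endmark, include = FALSE) {
  tail_str <- substr(linedata, start, nchar(linedata))
  s <- regexpr(startmark, tail_str, fixed = TRUE)
  if (s == -1) return(NA_character_)
  # Text following the start mark
  after <- substr(tail_str, s + nchar(startmark), nchar(tail_str))
  e <- regexpr(endmark, after, fixed = TRUE)
  if (e == -1) return(NA_character_)
  inner <- substr(after, 1, e - 1)
  if (include) paste0(startmark, inner, endmark) else inner
}

html <- "<td>alpha</td><td>beta</td>"
str_between(html, 1, "<td>", "</td>")    # search from the beginning
str_between(html, 10, "<td>", "</td>")   # search from character 10 onwards
```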
A function to return the lat/lon coordinates of towns in the UK from Wikipedia. Does not always work. Sometimes the county has to be specified too.
getwikicoords(place, county = NULL, rmslash = TRUE)
place |
character, the name of the town |
county |
character, the county it is in |
rmslash |
remove slash from place name. Not normally used. |
The lat/lon coordinates from Wikipedia
A function used in the forecasting of GARCH(1,1) models
hCreate(pars, y, T = length(y))
pars |
parameters for the GARCH model, these would come from an MCMC run |
y |
vector of log returns |
T |
this is the length of y; allow this to be pre-computed |
vector of h's
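For orientation, the standard GARCH(1,1) conditional variance recursion is h(t) = alpha0 + alpha1*y(t-1)^2 + beta*h(t-1). The sketch below implements that recursion in base R. The helper `garch_h`, the ordering of `pars` as (alpha0, alpha1, beta) and the starting values h(0) = y(0) = 0 are assumptions for illustration, not a description of what hCreate does internally.

```r
# Hypothetical helper: GARCH(1,1) conditional variances for log returns y.
# pars is assumed to be c(alpha0, alpha1, beta); h0 is an assumed start value.
garch_h <- function(pars, y, h0 = 0) {
  alpha0 <- pars[1]; alpha1 <- pars[2]; beta <- pars[3]
  n <- length(y)
  h <- numeric(n)
  h[1] <- alpha0 + beta * h0   # no y(0) is available, so it is taken as zero
  for (t in seq_len(n)[-1]) {
    h[t] <- alpha0 + alpha1 * y[t - 1]^2 + beta * h[t - 1]
  }
  h
}

garch_h(c(0.1, 0.2, 0.7), c(1, 2, 3))
```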
A function to compute one step of the Kalman filter. Embed in a loop to run the filter on a set of data.
KFadvance( obs, oldmean, oldvar, A, B, C, D, E, F, W, V, marglik = FALSE, log = TRUE, na.rm = FALSE )
obs |
Y(t) |
oldmean |
mu(t-1) |
oldvar |
Sigma(t-1) |
A |
matrix A |
B |
column vector B |
C |
matrix C |
D |
matrix D |
E |
column vector E |
F |
matrix F |
W |
state noise covariance |
V |
observation noise covariance |
marglik |
logical, whether to return the marginal likelihood contribution from this observation |
log |
whether or not to return the log of the likelihood contribution. |
na.rm |
logical, whether or not to handle NAs. Default is FALSE. Set to TRUE if there are any missing values in the observed data. |
The model is: (note that Y and theta are COLUMN VECTORS)
theta(t) = A*theta(t-1) + B + C*W (state equation)
Y(t) = D*theta(t) + E + F*V (observation equation)
W and V are the covariance matrices of the state and observation noise. Prior is normal,
N(mu(t-1),Sigma(t-1))
Result is the posterior, N(mu(t),Sigma(t)), together with the likelihood contribution Prob(Y(t)|Y(t-1))
list containing the new mean and variance, and if specified, the likelihood
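The update for the model above can be sketched in a few lines of base R. This is a textbook Kalman step under the assumption that the state noise enters as C*w with w ~ N(0, W) and the observation noise as F*v with v ~ N(0, V). The helper `kf_step` is hypothetical; it omits the likelihood and NA handling that KFadvance offers.

```r
# Hypothetical helper: one predict/update step of the Kalman filter.
kf_step <- function(obs, oldmean, oldvar, A, B, C, D, E, F, W, V) {
  # Predict theta(t) given data up to t-1
  pred_mean <- A %*% oldmean + B
  pred_var  <- A %*% oldvar %*% t(A) + C %*% W %*% t(C)
  # One-step-ahead forecast of Y(t)
  fc_mean <- D %*% pred_mean + E
  fc_var  <- D %*% pred_var %*% t(D) + F %*% V %*% t(F)
  # Update with the new observation
  gain <- pred_var %*% t(D) %*% solve(fc_var)
  list(mean = pred_mean + gain %*% (obs - fc_mean),
       var  = pred_var - gain %*% D %*% pred_var)
}

# Local-level example: theta(t) = theta(t-1) + w, Y(t) = theta(t) + v
I1 <- matrix(1); Z <- matrix(0)
mu <- Z; Sigma <- matrix(100)   # vague prior
for (yt in c(1.1, 0.9, 1.3)) {
  st <- kf_step(matrix(yt), mu, Sigma, I1, Z, I1, I1, Z, I1,
                W = matrix(0.1), V = matrix(1))
  mu <- st$mean; Sigma <- st$var
}
c(mean = mu, var = Sigma)
```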
A function to compute one step of the Kalman filter with second order AR state evolution. Embed in a loop to run the filter on a set of data.
KFadvanceAR2( obs, oldmean, oldermean, oldvar, oldervar, A, A1, B, C, D, E, F, W, V, marglik = FALSE, log = TRUE, na.rm = FALSE )
obs |
Y(t) |
oldmean |
mu(t-1) |
oldermean |
mu(t-2) |
oldvar |
Sigma(t-1) |
oldervar |
Sigma(t-2) |
A |
matrix A |
A1 |
matrix A1 |
B |
column vector B |
C |
matrix C |
D |
matrix D |
E |
column vector E |
F |
matrix F |
W |
state noise covariance |
V |
observation noise covariance |
marglik |
logical, whether to return the marginal likelihood contribution from this observation |
log |
whether or not to return the log of the likelihood contribution. |
na.rm |
logical, whether or not to handle NAs. Default is FALSE. Set to TRUE if there are any missing values in the observed data. |
The model is: (note that Y and theta are COLUMN VECTORS)
theta(t) = A*theta(t-1) + A1*theta(t-2) + B + C*W (state equation)
Y(t) = D*theta(t) + E + F*V (observation equation)
W and V are the covariance matrices of the state and observation noise. Priors are normal,
N(mu(t-1),Sigma(t-1)) and N(mu(t-2),Sigma(t-2))
Result is the posterior, N(mu(t),Sigma(t)), together with the likelihood contribution Prob(Y(t)|Y(t-1))
list containing the new mean and variance, and if specified, the likelihood
A function to print KFfit and KFparest templates to the console. See vignette("miscFuncs") for more information
KFtemplates()
Just prints to the console. This can be copied and pasted into a text editor for further manipulation.
A function to format text or numeric variables using scientific notation for LaTeX documents.
latexformat(x, digits = 3, scientific = -3, ...)
x |
a numeric, or character |
digits |
see ?format |
scientific |
see ?format |
... |
other arguments to pass to the function format |
...
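To illustrate the kind of conversion involved, the sketch below turns a number into LaTeX scientific notation using `format()` and string substitution. The helper `sci_to_latex` is hypothetical, and the exact output format of latexformat may differ.

```r
# Hypothetical helper: number -> LaTeX string in scientific notation.
sci_to_latex <- function(x, digits = 3) {
  s <- format(x, digits = digits, scientific = TRUE)  # e.g. "1.23e-05"
  mant <- sub("e.*$", "", s)                # mantissa, text before "e"
  expo <- as.integer(sub("^.*e", "", s))    # exponent, text after "e"
  paste0("$", mant, "\\times 10^{", expo, "}$")
}

cat(sci_to_latex(0.0000123), "\n")
```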
A very useful function to create a LaTeX table from a matrix. Rounds numeric entries and also replaces small numbers with standard index form equivalents.
latextable( x, digits = 3, scientific = -3, colnames = NULL, rownames = NULL, caption = NULL, narep = " ", laststr = "", intable = TRUE, manualalign = NULL, file = "", ... )
x |
a matrix, or object that can be coerced to a matrix. x can include mixed character and numeric entries. |
digits |
see help file for format |
scientific |
see help file for format |
colnames |
optional column names; set to NULL (default) to automatically use the column names of x. NOTE! If row names are present, colnames must include an entry for the row-names column, i.e. it should be a vector of length the number of columns of x plus 1. |
rownames |
optional row names set to NULL (default) to automatically use row names of x |
caption |
optional caption, not normally used |
narep |
string giving replacement for NA entries in the matrix |
laststr |
string to write at end, eg note the double backslash!! |
intable |
output in a table environment? |
manualalign |
manual align string e.g. 'ccc' or 'l|ccc' |
file |
connection to write to, default is "" which writes to the console; see ?write for further details |
... |
additional arguments passed to format |
To get a backslash to appear, use a double backslash
Just copy and paste the results into your LaTeX document.
prints the LaTeX table to screen, so it can be copied into reports
latextable(as.data.frame(matrix(1:4,2,2)))
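For readers without the package to hand, the following base-R sketch shows the general shape of such output. The helper `matrix_to_tabular` is hypothetical and much simpler than latextable (no row names, captions or NA handling). Note how each LaTeX backslash is written as a double backslash inside R strings.

```r
# Hypothetical helper: print a bare-bones LaTeX tabular for a matrix.
matrix_to_tabular <- function(x, digits = 3) {
  x <- as.matrix(x)
  rows <- apply(x, 1, function(r) paste(format(r, digits = digits), collapse = " & "))
  lines <- c(
    paste0("\\begin{tabular}{", strrep("c", ncol(x)), "}"),
    # Header row only when column names exist
    if (!is.null(colnames(x))) paste0(paste(colnames(x), collapse = " & "), " \\\\ \\hline"),
    paste0(rows, " \\\\"),
    "\\end{tabular}"
  )
  cat(lines, sep = "\n")
  invisible(lines)
}

out <- matrix_to_tabular(matrix(1:4, 2, 2))
```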
Apply this to the output of fcastGARCH in order to undertake forecasting on the scale of the original series (i.e. not the log returns). Quantiles may be computed across the MCMC iterations and then all one needs to do is to multiply the result by the last observed value in the original series (again, not the log returns)
lr2fact(mod)
mod |
the output of fcastGARCH |
the multiplicative factors.
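The conversion from forecast log returns to the original scale can be sketched as follows. The matrix `lr` is simulated stand-in data, with rows as MCMC iterations and columns as forecast lags; that layout, the use of plain (not percentage) log returns and the value of `last_price` are all assumptions, and the structure of the real fcastGARCH output may differ.

```r
set.seed(1)
# Stand-in for forecast log returns: 1000 MCMC iterations x 5 forecast lags.
lr <- matrix(rnorm(1000 * 5, sd = 0.01), nrow = 1000)

# Multiplicative factors: exponentiated cumulative log returns along each row.
fact <- exp(t(apply(lr, 1, cumsum)))

# Quantiles across MCMC iterations, then rescale by the last observed value.
last_price <- 100   # hypothetical last value of the original series
q <- apply(fact, 2, quantile, probs = c(0.025, 0.5, 0.975))
q * last_price
```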
A function to generate a roxygen template for a method of a generic S3 function. Normally, this would be called from the function generic, see ?generic
method(meth, gen, oname = "obj")
meth |
character, the name of the method |
gen |
character, the name of the associated generic |
oname |
name of object |
a roxygen template for the method.
A function to print details of the 2 by 2 table for use with the function twotwoinfo.
print22()
prints the names of the arguments of the function twotwoinfo to screen in their correct place in the 2 by 2 table
A function to compare quantiles of a given vector against quantiles of a specified distribution. The function also outputs simulation-based confidence intervals. There are options for zeroing the plot (rather than visualising a diagonal line, which can be difficult to interpret) and for standardising (so that the varying uncertainty around each quantile appears equal to the eye).
qqci( x, rfun = NULL, y = NULL, ns = 100, zero = FALSE, standardise = FALSE, qts = c(0.025, 0.975), llwd = 2, lcol = "red", xlab = "Theoretical", ylab = "Sample", alpha = 0.02, cicol = "black", cilwd = 1, ... )
x |
a vector of values to compare |
rfun |
a function accepting a single argument to generate samples from the comparison distribution, the default is rnorm |
y |
an optional vector of samples to compare the quantiles against. In the case this is non-null, the function rfun will be automatically chosen as bootstrapping y with replacement and sample size the same as the length of x. You must specify exactly one of rfun or y. |
ns |
the number of simulations to generate: the more simulations, the more accurate the confidence bands. Default is 100 |
zero |
logical, whether to zero the plot across the x-axis. Default is FALSE |
standardise |
logical, whether to standardise so that the variance around each quantile is made constant (this can help in situations where the confidence bands appear very tight in places) |
qts |
vector of probabilities giving which sample-based empirical quantiles to add to the plot. Default is c(0.025,0.975) |
llwd |
positive numeric, the width of line to plot, default is 2 |
lcol |
colour of line to plot, default is red |
xlab |
character, the label for the x-axis |
ylab |
character, the label for the y-axis |
alpha |
controls transparency of samples (coloured blue) |
cicol |
colour of confidence band lines, default is black |
cilwd |
width of confidence band lines, default is 1 |
... |
additional arguments to pass to matplot |
Produces a QQ-plot with simulation-based confidence bands
qqci(rnorm(1000))
qqci(rnorm(1000),zero=TRUE)
qqci(rnorm(1000),zero=TRUE,standardise=TRUE)
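The idea behind simulation-based bands can be sketched in base R: simulate many samples of the same size from the reference distribution, sort each one, and take pointwise quantiles of the order statistics. This is a simplified illustration against the standard normal, not the implementation used by qqci; the use of `ppoints()` for the theoretical quantiles is an assumption.

```r
set.seed(42)
x  <- rnorm(200)   # data to assess
n  <- length(x)
ns <- 100          # number of simulated samples

theo <- qnorm(ppoints(n))               # theoretical quantiles
sims <- replicate(ns, sort(rnorm(n)))   # n x ns matrix of sorted samples
band <- apply(sims, 1, quantile, probs = c(0.025, 0.975))   # 2 x n envelope

plot(theo, sort(x), xlab = "Theoretical", ylab = "Sample")
abline(0, 1, col = "red", lwd = 2)
lines(theo, band[1, ])
lines(theo, band[2, ])

# Proportion of sample quantiles falling inside the pointwise band.
mean(sort(x) >= band[1, ] & sort(x) <= band[2, ])
```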
A function to build and check packages where documentation has been compiled with roxygen. Probably only works in Linux.
roxbc(name, checkflags = "--as-cran")
name |
package name |
checkflags |
string giving optional check flags to R CMD check, default is --as-cran |
builds and checks the package
A function to build packages where documentation has been compiled with roxygen. Probably only works in Linux.
roxbuild(name)
name |
package name |
builds the package
A function to generate roxygen documentation templates for functions for example,
roxtext(fname)
fname |
the name of a function as a character string or as a direct reference to the function |
would generate a template for this function. Note that functions with default arguments that include quotes will currently throw an error; just delete these bits from the string, and it should work.
minimal roxygen template
A function to
sinpulse(x, tau = pi)
x |
X |
tau |
pulse duration |
...
A function to time an operation in R
timeop(expr)
expr |
an expression to evaluate |
The time it took to evaluate the expression in seconds
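A comparable timing can be obtained with base R's `system.time()`. The wrapper `time_expr` below is hypothetical, and reporting the elapsed (wall-clock) component is an assumption about which time is of interest.

```r
# Hypothetical helper: elapsed seconds taken to evaluate an expression.
time_expr <- function(expr) {
  # expr is evaluated lazily inside system.time()
  unname(system.time(expr)["elapsed"])
}

elapsed <- time_expr(Sys.sleep(0.2))
elapsed
```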
A function to compute and display information about 2 by 2 tables for copying into LaTeX documents. Computes odds ratios and relative risks together with confidence intervals for a 2 by 2 table and prints to screen in LaTeX format. The function will try to fill in any missing values from the 2 by 2 table. Type print22() at the console to see what each argument refers to.
twotwoinfo( e1 = NA, u1 = NA, o1t = NA, e2 = NA, u2 = NA, o2t = NA, et = NA, ut = NA, T = NA, lev = 0.95, LaTeX = TRUE, digits = 3, scientific = -3, ... )
e1 |
type print22() at the console |
u1 |
type print22() at the console |
o1t |
type print22() at the console |
e2 |
type print22() at the console |
u2 |
type print22() at the console |
o2t |
type print22() at the console |
et |
type print22() at the console |
ut |
type print22() at the console |
T |
type print22() at the console |
lev |
confidence level for the confidence intervals. Default is 0.95 |
LaTeX |
whether to print the 2 by 2 information as LaTeX text to the screen, including the table, odds ratio, relative risk and confidence intervals |
digits |
see ?format |
scientific |
see ?format |
... |
other arguments passed to function format |
Computes odds ratios and relative risks together with confidence intervals for 2 by 2 table and prints to screen in LaTeX format.
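As a worked example of the quantities involved, the sketch below computes an odds ratio and a relative risk with Wald-type confidence intervals on the log scale. The cell counts and the names `a`, `b`, `c2`, `d` are hypothetical, and the interval method is an assumption; twotwoinfo may compute its intervals differently.

```r
# Hypothetical 2 by 2 table.
a  <- 20; b <- 80   # exposed:   with outcome, without outcome
c2 <- 10; d <- 90   # unexposed: with outcome, without outcome
lev <- 0.95
z <- qnorm(1 - (1 - lev) / 2)

# Odds ratio with a Wald interval on the log scale
or    <- (a * d) / (b * c2)
or_ci <- exp(log(or) + c(-1, 1) * z * sqrt(1/a + 1/b + 1/c2 + 1/d))

# Relative risk with a Wald interval on the log scale
rr    <- (a / (a + b)) / (c2 / (c2 + d))
rr_ci <- exp(log(rr) + c(-1, 1) * z * sqrt(1/a - 1/(a + b) + 1/c2 - 1/(c2 + d)))

rbind(odds_ratio    = c(estimate = or, lower = or_ci[1], upper = or_ci[2]),
      relative_risk = c(estimate = rr, lower = rr_ci[1], upper = rr_ci[2]))
```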
A function to generate a Van der Corput sequence of numbers.
vdc(n)
n |
the length of the sequence |
Van der Corput sequence of length n
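For reference, a Van der Corput sequence is obtained by reversing the base-b digits of each index about the radix point. The sketch below is a base-R illustration; the helper `van_der_corput`, the default base 2 and the choice to start at index 1 are assumptions and may not match vdc exactly.

```r
# Hypothetical helper: first n terms of the Van der Corput sequence.
van_der_corput <- function(n, base = 2) {
  vapply(seq_len(n), function(i) {
    x <- 0
    denom <- 1
    while (i > 0) {
      denom <- denom * base
      x <- x + (i %% base) / denom   # mirror the current digit
      i <- i %/% base
    }
    x
  }, numeric(1))
}

van_der_corput(5)
```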
Function to calculate the variance inflation factor for each variable in a linear regression model.
vif(mod)
mod |
an object of class 'lm' |
...
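The variance inflation factor for predictor j is 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. The base-R sketch below computes this for a model fitted to the built-in `mtcars` data; it is an illustration for numeric predictors only and is not the package's implementation.

```r
mod <- lm(mpg ~ wt + hp + disp, data = mtcars)
X <- model.matrix(mod)[, -1, drop = FALSE]   # predictors without the intercept

# Regress each predictor on all the others and convert R^2 to a VIF.
vifs <- sapply(colnames(X), function(v) {
  others <- X[, colnames(X) != v, drop = FALSE]
  r2 <- summary(lm(X[, v] ~ others))$r.squared
  1 / (1 - r2)
})
vifs
```

The same values are given by the diagonal of the inverse correlation matrix, `diag(solve(cor(X)))`.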
A function to perform forecasting of the series, used by fcastGARCH
yhIterate(i, current, pars, eps, omega)
i |
the index of the forward lags |
current |
current matrix of (y,h) |
pars |
parameters for the GARCH model, these would come from an MCMC run |
eps |
matrix of Gaussian noise, dimension equal to number of MCMC iterations by the number of forecast lags |
omega |
matrix of Inverse Gamma noise, dimension equal to number of MCMC iterations by the number of forecast lags |
two column matrix containing forecast y (1st column) and updated h (2nd column)