Title: | HMM-Based Model for Genotyping and Cross-Over Identification |
---|---|
Description: | Our method integrates information from all sequenced samples, thus avoiding loss of alleles due to low coverage. Moreover, it increases the statistical power to uncover sequencing or alignment errors <doi:10.1093/plphys/kiad191>. |
Authors: | Rafael Campos-Martin [cre] |
Maintainer: | Rafael Campos-Martin <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.1.0 |
Built: | 2025-02-22 04:11:54 UTC |
Source: | https://github.com/rfael0cm/rtiger |
The autosome chromosome lengths for Arabidopsis thaliana.
Rafael Campos-Martin
Obtain number of Cross-Over events per sample and chromosome.
calcCOnumber(object)
object |
an RViterbi object. |
A matrix with the number of CO events for each of the m samples and n chromosomes.
data("fittedExample")
co.num = calcCOnumber(myDat)
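The counting idea can be illustrated without the package. The following is a minimal base-R sketch, assuming that a CO event corresponds to a change of state between consecutive markers in a decoded path. The `paths` list, the state labels, and the helper `count_state_changes` are hypothetical and are not part of RTIGER, whose internal representation may differ.

```r
# Hypothetical decoded paths: one list per sample, one character vector per chromosome.
paths <- list(
  sampleA = list(Chr1 = c("hom1", "hom1", "het", "het", "hom2"),
                 Chr2 = c("het", "het", "het")),
  sampleB = list(Chr1 = c("hom2", "hom2", "hom2", "hom2", "hom2"),
                 Chr2 = c("hom1", "het", "hom1"))
)

# Assumption: every switch between consecutive states counts as one event.
count_state_changes <- function(states) {
  length(rle(states)$lengths) - 1L
}

# The inner sapply gives one count per chromosome; transposing puts
# samples in rows and chromosomes in columns.
co_toy <- t(sapply(paths, function(chrs) sapply(chrs, count_state_changes)))
print(co_toy)
```

Run-length encoding keeps the helper short: the number of runs minus one equals the number of state switches along a path.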
Function for developers. It runs one EM step.
dev(psi, rigidity = NULL, nstates = 3, transition = NULL, start = NULL)
psi |
list of psi probabilities. |
rigidity |
Rigidity value. |
nstates |
Number of states. |
transition |
transition matrix |
start |
initial probabilities |
List with updated probabilities.
Call Julia code to fit the values
fit(rtigerobj, max.iter, eps, trace, all = TRUE, random = FALSE, specific = FALSE, nsamples = 20, post.processing = TRUE)
rtigerobj |
an RTIGER object. |
max.iter |
maximum number of iterations to be performed by the EM algorithm. |
eps |
difference threshold at which to halt the EM algorithm. |
trace |
logical value whether to trace the changes in the parameters along the iterations. |
all |
logical value whether to use all data to fit the model. |
random |
if all is FALSE, use random samples. |
specific |
if all is FALSE, use specific samples. |
nsamples |
if random is TRUE, how many samples to use. |
post.processing |
logical value, whether to run the post-processing step. |
RTIGER object
## Not run:
data("fittedExample")
sourceJulia()
myfit = fit(myDat, max.iter = 2, eps = 0.01, trace = TRUE, all = TRUE,
            random = FALSE, specific = FALSE, nsamples = 20,
            post.processing = TRUE)
## End(Not run)
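How `max.iter` and `eps` interact can be sketched in plain R. This is only an illustration of the two stopping criteria: the function `iterate_until` and the toy update are hypothetical, the update is not an EM step, and measuring the change as the largest absolute difference is an assumption rather than the package's documented rule.

```r
# Generic iteration with the two stopping criteria described above.
# `update` is a hypothetical stand-in for one EM step.
iterate_until <- function(update, init, max.iter = 50, eps = 0.01) {
  current <- init
  for (i in seq_len(max.iter)) {
    proposal <- update(current)
    # Assumption: the change is the largest absolute parameter difference.
    change <- max(abs(proposal - current))
    current <- proposal
    if (change < eps) {
      return(list(value = current, num.iter = i, converged = TRUE))
    }
  }
  # max.iter was reached before the change dropped below eps.
  list(value = current, num.iter = max.iter, converged = FALSE)
}

# Toy update (not an EM step): a contraction whose fixed point is 2.
toy_update <- function(x) 0.5 * x + 1

res <- iterate_until(toy_update, init = 10, max.iter = 50, eps = 0.01)
print(res)
```

With a small `max.iter`, as in the example call above, the loop may stop before the `eps` criterion is met, which is why the sketch reports whether convergence was reached.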
Load data
generateObject(experimentDesign = NULL, nstates = 3, rigidity = NULL, seqlengths = NULL, verbose = TRUE)
experimentDesign |
a data frame that contains at least one column with the file paths (the column must be named files) and another with a shorter name to be used inside the function. |
nstates |
the number of states to be fitted in the model. A standard setting would use 3 states (Homozygous1, Heterozygous, and Homozygous2). |
rigidity |
an integer number specifying the rigidity parameter to be used. |
seqlengths |
a named vector with the chromosome lengths of the organism that the user is working with. |
verbose |
logical value. Whether to print info messages. |
RTIGER object
data("ATseqlengths")
path = system.file("extdata", package = "RTIGER")
files = list.files(path, full.names = TRUE)
nam = sapply(list.files(path), function(x) unlist(strsplit(x, split = "[.]"))[1])
expDesign = data.frame(files = files, name = nam)
names(ATseqlengths) = paste0("Chr", 1:5)
myres = generateObject(experimentDesign = expDesign, seqlengths = ATseqlengths, rigidity = 10)
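Before loading data it can help to check the two inputs by hand. The sketch below is a hypothetical helper, not an RTIGER function. It assumes the design columns are called `files` and `name`, as in the example above, and that `seqlengths` must carry a name for every chromosome. The file paths and lengths are made up.

```r
# Hypothetical helper, not part of RTIGER: basic sanity checks on the inputs.
check_design <- function(design, seqlengths) {
  stopifnot(is.data.frame(design))
  missing_cols <- setdiff(c("files", "name"), colnames(design))
  if (length(missing_cols) > 0) {
    stop("experimentDesign lacks column(s): ",
         paste(missing_cols, collapse = ", "))
  }
  if (is.null(names(seqlengths)) || any(names(seqlengths) == "")) {
    stop("seqlengths must be a fully named vector")
  }
  invisible(TRUE)
}

# Made-up inputs for illustration only.
toy_design <- data.frame(files = c("dir/s1.txt", "dir/s2.txt"),
                         name = c("s1", "s2"))
toy_lengths <- c(ChrA = 1000000, ChrB = 650000)

ok <- check_design(toy_design, toy_lengths)
print(ok)
```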
A fitted example using three of our own Arabidopsis samples. More information in publication:
Rafael Campos-Martin
Find the optimum R value for a given data set.
optimize_R(object, max_rigidity = 2^9, average_coverage = NULL, crossovers_per_megabase = NULL, save_it = FALSE, savedir = NULL)
object |
an RTIGER object |
max_rigidity |
R values will be explored up to the value given in this parameter. Default = 2^9 |
average_coverage |
For conservative results, set it to the lowest average coverage of a sample in your experiment, or even to the lowest average coverage in a (sufficiently large) region in one of your samples. The lower the value, the more conservative (higher) our estimates of the false-positive segment rates. If it is not provided, it will be computed as the average of all data points. |
crossovers_per_megabase |
For conservative results, set it to the highest ratio of a sample in your experiment. The higher the value, the more conservative (higher) our estimates of the false-positive segment rates. If it is not provided, it will be computed as the average of all samples. |
save_it |
logical value indicating whether the results should be saved. The plots might be complicated to interpret; we suggest reading the manuscript to understand them (https://doi.org/10.1093/plphys/kiad191). |
savedir |
if results are saved, in which directory. |
A value with the optimum rigidity for the data set.
data("fittedExample")
bestR = optimize_R(myDat)
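The conservative choices described for `average_coverage` and `crossovers_per_megabase` amount to taking a minimum and a maximum over samples. A minimal sketch with made-up per-sample summaries follows. All numbers and variable names are hypothetical, and how such summaries are obtained from real data is not shown here.

```r
# Hypothetical per-sample summaries (made-up numbers, for illustration only).
mean_coverage <- c(sampleA = 3.2, sampleB = 1.4, sampleC = 2.7)
co_count      <- c(sampleA = 9, sampleB = 14, sampleC = 11)
genome_mb     <- 120  # assumed genome size in megabases

# Conservative choices as described above:
# the lowest average coverage and the highest CO rate among samples.
conservative_coverage <- min(mean_coverage)
conservative_co_rate  <- max(co_count / genome_mb)

print(conservative_coverage)
print(conservative_co_rate)
```

Values obtained this way could then be supplied through the `average_coverage` and `crossovers_per_megabase` arguments instead of relying on the defaults, which average over all data points or samples.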
Plot the number of Cross-Over events per sample and chromosome.
plotCOs(object, file = NULL)
object |
an RViterbi object. |
file |
file where to save the plot for CO numbers |
a plot
data("fittedExample")
co.num = calcCOnumber(myDat)
Load, Fit, and plot
RTIGER(expDesign, rigidity=NULL, outputdir=NULL, nstates = 3, seqlengths = NULL, eps=0.01, max.iter=50, autotune = FALSE, max_rigidity = 2^9, average_coverage = NULL, crossovers_per_megabase = NULL, trace = FALSE, tiles = 4e5, all = TRUE, random = FALSE, specific = FALSE, nsamples = 20, post.processing = TRUE, save.results = TRUE, verbose = TRUE)
expDesign |
a data frame that contains at least one column with the file paths (the column must be named files) and another with a shorter name to be used inside the function. |
rigidity |
an integer number specifying the rigidity parameter to be used. |
outputdir |
a character string that specifies the directory in which to save the results from the function. |
nstates |
the number of states to be fitted in the model. A standard setting would use 3 states (Homozygous1, Heterozygous, and Homozygous2). |
seqlengths |
a named vector with the chromosome lengths of the organism that the user is working with. |
eps |
the threshold for the difference in parameter values between the previous and the current iteration below which the EM algorithm stops. |
max.iter |
maximum number of iterations of the EM algorithm before stopping, in case eps has not been reached. |
autotune |
Logical value indicating whether the R value should be tuned by our algorithm. This takes longer, as it needs a first training with the rigidity value provided by the user, after which the optimization step is carried out. Finally, a training using the optimum R is performed and the results for the optimum R are returned. |
max_rigidity |
If autotune is TRUE, R values will be explored up to the value given in this parameter. Default = 2^9 |
average_coverage |
If autotune is TRUE, for conservative results set it to the lowest average coverage of a sample in your experiment, or even to the lowest average coverage in a (sufficiently large) region in one of your samples. The lower the value, the more conservative (higher) our estimates of the false-positive segment rates. If it is not provided, it will be computed as the average of all data points. |
crossovers_per_megabase |
If autotune is TRUE, for conservative results set it to the highest ratio of a sample in your experiment. The higher the value, the more conservative (higher) our estimates of the false-positive segment rates. If it is not provided, it will be computed as the average of all samples. |
trace |
logical value. Whether or not to keep track of the parameters for the HMM along the iterations. Default FALSE |
tiles |
length of the tiles by which the genome will be segmented in order to compute the ratio of COs in the complete dataset. |
all |
Logical value. Whether to use the complete data set to fit the rHMM. Default TRUE |
random |
Logical value. Choose randomly a subset of the complete dataset to fit the rHMM. Default FALSE |
specific |
Logical value to specify which samples to take. |
nsamples |
if random is TRUE, how many samples should be taken randomly. |
post.processing |
Logical value. Whether to run an extra step that fine-maps the segment borders. Default TRUE |
save.results |
Logical value, whether to generate and save the plots and igv files. |
verbose |
Logical, whether to print info to console. |
Matrix m x n. M number of samples and N chromosomes.
RTIGER object
## Not run:
data("ATseqlengths")
sourceJulia()
path = system.file("extdata", package = "RTIGER")
files = list.files(path, full.names = TRUE)
nam = sapply(list.files(path), function(x) unlist(strsplit(x, split = "[.]"))[1])
expDesign = data.frame(files = files, name = nam)
names(ATseqlengths) = paste0("Chr", 1:5)
myres = RTIGER(expDesign = expDesign,
               outputdir = "/home/campos/Documents/outputjulia/",
               seqlengths = ATseqlengths,
               rigidity = 4,
               max.iter = 2,
               trace = FALSE,
               save.results = TRUE)
## End(Not run)
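The `tiles` argument segments the genome into windows of a fixed length. One way such a segmentation could look is sketched below in base R. The helper `make_tiles` and the chromosome lengths are hypothetical, and the package may handle tile boundaries differently.

```r
# Hypothetical chromosome lengths in base pairs (not real data).
toy_lengths <- c(ChrA = 1000000, ChrB = 650000)
tile_size <- 4e5  # same value as the default of the tiles argument

# Split one chromosome into consecutive windows; the last one may be shorter.
make_tiles <- function(len, size) {
  starts <- seq(1, len, by = size)
  ends <- pmin(starts + size - 1, len)
  data.frame(start = starts, end = ends)
}

# One data frame of windows per chromosome.
tiles <- lapply(toy_lengths, make_tiles, size = tile_size)
print(tiles)
```

CO counts per window, divided by the number of samples, would then give a ratio of COs along the genome, which is the purpose the `tiles` description names.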
This class is a generic container for RTIGER analysis
matobs
Nested lists. The first level is a list of samples. For each sample there are 5 matrices that contain the allele counts for each position.
params
a list with the parameters after training.
info
List with phenotypic data of the samples.
Viterbi
List of chromosomes with the Viterbi path per sample.
Probabilities
Computed probabilities for the EM algorithm.
num.iter
Number of iterations needed to stop the EM algorithm.
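The shape of such a container can be sketched with a small S4 class. This is an illustration only: `ToyRTIGER` is a hypothetical class, it assumes the names listed above are S4 slots, and the slot types chosen here are guesses rather than the package's actual definitions.

```r
library(methods)

# Hypothetical stand-in class mirroring the slot names listed above.
# Slot types are assumptions made for this sketch.
setClass("ToyRTIGER",
         slots = c(matobs = "list",
                   params = "list",
                   info = "list",
                   Viterbi = "list",
                   Probabilities = "list",
                   num.iter = "numeric"))

# Unspecified slots are filled with empty prototypes of their type.
obj <- new("ToyRTIGER", num.iter = 12)

print(slotNames(obj))  # names of the slots
print(obj@num.iter)    # slot access with the @ operator
```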
Installs the needed packages in JULIA to run the EM algorithm for rHMM.
setupJulia(JULIA_HOME = NULL)
JULIA_HOME |
the folder that contains the julia binary. If not set, JuliaCall will look at the global option JULIA_HOME; if the global option is not set, JuliaCall will then look at the environment variable JULIA_HOME; if it is still not found, JuliaCall will try to use the julia in the path. |
empty
Function needed before using RTIGER() function. It loads the scripts in Julia that fit the rHMM.
sourceJulia()
empty