Title: | Quantification of Fate Bias in Multipotent Progenitors |
---|---|
Description: | Application of 'FateID' allows computation and visualization of cell fate bias for multi-lineage single cell transcriptome data. Herman, J.S., Sagar, Grün D. (2018) <DOI:10.1038/nmeth.4662>. |
Authors: | Dominic Grün <[email protected]> |
Maintainer: | Dominic Grün <[email protected]> |
License: | GPL-3 |
Version: | 0.2.2 |
Built: | 2024-11-02 04:13:47 UTC |
Source: | https://github.com/dgrun/fateid |
This function computes dimensional reduction representations to a specified number of dimensions using a number of different algorithms: t-SNE, cmd, diffusion maps, umap
compdr( x, z = NULL, m = c("tsne", "cmd", "umap"), k = 2, tsne.perplexity = 30, umap.pars = NULL, seed = 12345 )
compdr( x, z = NULL, m = c("tsne", "cmd", "umap"), k = 2, tsne.perplexity = 30, umap.pars = NULL, seed = 12345 )
x |
expression data frame with genes as rows and cells as columns. Gene IDs should be given as row names and cell IDs should be given as column names. This can be a reduced expression table only including the features (genes) to be used in the analysis. |
z |
Matrix containing cell-to-cell distances to be used in the fate bias computation. Default is |
m |
a vector of dimensional reduction representations to be computed. The following representations can be computed: |
k |
vector of integers representing the dimensions for which the dimensional reduction representations will be computed. Default value is |
tsne.perplexity |
positive number. Perplexity used in the t-SNE computation. Default value is 30. |
umap.pars |
umap parameters. See umap package, |
seed |
integer seed for initialization. If equal to |
A two-dimensional list with the dimensional reduction representation stored as data frames as components. Component names for the first dimension are given by one of the following algorithms:
cmd |
classical multidimensional scaling computed by the |
tsne |
t-SNE map computed by the |
umap |
umap computed by the |
Component names of the second dimension are a concatenation of a capital D and an integer number of the dimension. There is one component for each dimension in k
.
x <- intestine$x dr <- compdr(x,z=NULL,m="cmd",k=2,tsne.perplexity=30) plot(dr[["cmd"]][["D2"]],pch=20,col="grey")
x <- intestine$x dr <- compdr(x,z=NULL,m="cmd",k=2,tsne.perplexity=30) plot(dr[["cmd"]][["D2"]],pch=20,col="grey")
This function performs differential expression analysis between two sets of single cell transcriptomes. The inference is based on a noise model or relies on the DESeq2
approach.
diffexpnb( x, A, B, DESeq = FALSE, method = "pooled", norm = FALSE, vfit = NULL, locreg = FALSE, ... )
diffexpnb( x, A, B, DESeq = FALSE, method = "pooled", norm = FALSE, vfit = NULL, locreg = FALSE, ... )
x |
expression data frame with genes as rows and cells as columns. Gene IDs should be given as row names and cell IDs should be given as column names. This can be a reduced expression table only including the features (genes) to be used in the analysis. This input has to be provided if |
A |
vector of cell IDs corresponding column names of |
B |
vector of cell IDs corresponding column names of |
DESeq |
logical value. If |
method |
either "per-condition" or "pooled". If DESeq is not used, this parameter determines, if the noise model is fitted for each set separately ("per-condition") or for the pooled set comprising all cells in |
norm |
logical value. If |
vfit |
function describing the background noise model. Inference of differentially expressed genes can be performed with a user-specified noise model describing the expression variance as a function of the mean expression. Default value is |
locreg |
logical value. If |
... |
additional arguments to be passed to the low level function |
If DESeq
equals TRUE
, the function returns the output of DESeq2. In this case list of the following two components is returned:
cds |
object returned by the DESeq2 function |
res |
data frame containing the results of the DESeq2 analysis. |
Otherwise, a list of three components is returned:
vf1 |
a data frame of three columns, indicating the mean |
vf2 |
a data frame of three columns, indicating the mean |
res |
a data frame with the results of the differential gene expression analysis with the structure of the |
x <- intestine$x y <- intestine$y v <- intestine$v tar <- c(6,9,13) fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL) thr <- .3 A <- rownames(fb$probs)[fb$probs[,"t6"] > .3] B <- rownames(fb$probs)[fb$probs[,"t13"] > .3] de <- diffexpnb(v,A=A,B=B)
x <- intestine$x y <- intestine$y v <- intestine$v tar <- c(6,9,13) fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL) thr <- .3 A <- rownames(fb$probs)[fb$probs[,"t6"] > .3] B <- rownames(fb$probs)[fb$probs[,"t13"] > .3] de <- diffexpnb(v,A=A,B=B)
This function computes fate biases for single cells based on expression data from a single cell sequencing experiment. It requires a clustering partition and a target cluster representing a commited state for each trajectory.
fateBias( x, y, tar, z = NULL, minnr = NULL, minnrh = NULL, adapt = TRUE, confidence = 0.75, nbfactor = 5, use.dist = FALSE, seed = NULL, nbtree = NULL, verbose = FALSE, ... )
fateBias( x, y, tar, z = NULL, minnr = NULL, minnrh = NULL, adapt = TRUE, confidence = 0.75, nbfactor = 5, use.dist = FALSE, seed = NULL, nbtree = NULL, verbose = FALSE, ... )
x |
expression data frame with genes as rows and cells as columns. Gene IDs should be given as row names and cell IDs should be given as column names. This can be a reduced expression table only including the features (genes) to be used in the analysis. |
y |
clustering partition. A vector with an integer cluster number for each cell. The order of the cells has to be the same as for the columns of x. |
tar |
vector of integers representing target cluster numbers. Each element of |
z |
Matrix containing cell-to-cell distances to be used in the fate bias computation. Default is |
minnr |
integer number of cells per target cluster to be selected for classification (test set) in each iteration. For each target cluster, the |
minnrh |
integer number of cells from the training set used for classification. From each training set, the |
adapt |
logical. If |
confidence |
real number between 0 and 1. See |
nbfactor |
positive integer number. Determines the number of trees grown for each random forest. The number of trees is given by the number of columns of th training set multiplied by |
use.dist |
logical value. If |
seed |
integer seed for initialization. If equal to |
nbtree |
integer value. If given, it specifies the number of trees for each random forest explicitely. Default is |
verbose |
logical. If |
... |
additional arguments to be passed to the low level function |
The bias is computed as the ratio of the number of random forest votes for a trajectory and the number of votes for the trajectory with the second largest number of votes. By this means only the trajectory with the largest number of votes will receive a bias >1. The siginifcance is computed based on counting statistics on the difference in the number of votes. A significant bias requires a p-value < 0.05. Cells are assigned to a trajectory if they exhibit a significant bias >1 for this trajectory.
A list with the following three components:
probs |
a data frame with the fraction of random forest votes for each cell. Columns represent the target clusters. Column names are given by a concatenation of |
votes |
a data frame with the number of random forest votes for each cell. Columns represent the target clusters. Column names are given by a concatenation of |
tr |
list of vectors. Each component contains the IDs of all cells on the trajectory to a given target cluster. Component names are given by a concatenation of |
rfl |
list of randomForest objects for each iteration of the classification. |
trall |
vector of cell ids ordered by the random forest iteration in which they have been classified into one of the target clusters. |
x <- intestine$x y <- intestine$y tar <- c(6,9,13) fb <- fateBias(x,y,tar,minnr=5,minnrh=20,adapt=TRUE,confidence=0.75,nbfactor=5) head(fb$probs)
x <- intestine$x y <- intestine$y tar <- c(6,9,13) fb <- fateBias(x,y,tar,minnr=5,minnrh=20,adapt=TRUE,confidence=0.75,nbfactor=5) head(fb$probs)
This function discards lowly expressed genes from the expression data frame.
filterset(x, n = NULL, minexpr = 2, minnumber = 1)
filterset(x, n = NULL, minexpr = 2, minnumber = 1)
x |
expression data frame with genes as rows and cells as columns. Gene IDs should be given as row names and cell IDs should be given as column names. |
n |
ordered vector of cell IDs to be included. Cell IDs need to be column names of |
minexpr |
positive real number. This is the minimum expression required for at least |
minnumber |
positive integer number. This is the minimum number of cells in which a gene needs to be expressed at least at a level of |
Reduced expression data frame with genes as rows and cells as columns in the same order as in n
.
This function produces a scatter plot of the expression levels of two genes. It allows plotting cells of selected clusters and permits highlighting of the fate bias.
gene2gene( x, y, g1, g2, clusters = NULL, fb = NULL, tn = NULL, col = NULL, tp = 1, plotnum = TRUE, seed = 12345 )
gene2gene( x, y, g1, g2, clusters = NULL, fb = NULL, tn = NULL, col = NULL, tp = 1, plotnum = TRUE, seed = 12345 )
x |
expression data frame with genes as rows and cells as columns. Gene IDs should be given as row names and cell IDs should be given as column names. This can be a reduced expression table only including the features (genes) to be used in the analysis. |
y |
clustering partition. A vector with an integer cluster number for each cell. The order of the cells has to be the same as for the columns of x. |
g1 |
gene id corresponding to a valid row names of x. Expression of gene |
g2 |
gene id corresponding to a valid row names of x. Expression of gene |
clusters |
vector of valid cluster ids. Expression is displayed for cells in any of the clusters contained in |
fb |
fateBias object returned by the function |
tn |
name of a target cluster, i. e. concatenation of a |
col |
optional vector of valid color names for all clusters in |
tp |
Transparency of points in the plot. Default value is 1, i. e. non-transparent. |
plotnum |
logical value. If |
seed |
integer number. Random seed for determining colour scheme. Default is 12345. |
None
x <- intestine$x y <- intestine$y v <- intestine$v tar <- c(6,9,13) fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL) gene2gene(v,y,"Muc2__chr7","Apoa1__chr9") gene2gene(v,y,"Muc2__chr7","Apoa1__chr9",fb=fb,tn="t6",plotnum=FALSE)
x <- intestine$x y <- intestine$y v <- intestine$v tar <- c(6,9,13) fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL) gene2gene(v,y,"Muc2__chr7","Apoa1__chr9") gene2gene(v,y,"Muc2__chr7","Apoa1__chr9",fb=fb,tn="t6",plotnum=FALSE)
This function performs a feature selection based on the inference of differentially expressed genes between each target cluster and all remaining cells.
getFeat(x, y, tar, fpv = 0.05, ...)
getFeat(x, y, tar, fpv = 0.05, ...)
x |
expression data frame with genes as rows and cells as columns. Gene IDs should be given as row names and cell IDs should be given as column names. This can be a reduced expression table only including the features (genes) to be used in the analysis. |
y |
clustering partition. A vector with an integer cluster number for each cell. The order of the cells has to be the same as for the columns of x. |
tar |
vector of integers representing target cluster numbers. Each element of |
fpv |
p-value cutoff for calling differentially expressed genes. This is a cutoff for the Benjamini-Hochberg corrected false discovery rate. Default value is 0.05. |
... |
additional arguments to be passed to the low level function |
The function determines differentially expressed between the cells in each of the target clusters in comparison to the remaining cells by using diffexpnb
function.
A filtered expression table with features extracted based on differentially expressed genes.
x <- intestine$x y <- intestine$y tar <- c(6,9,13) xf <- getFeat(x,y,tar,fpv=.05)
x <- intestine$x y <- intestine$y tar <- c(6,9,13) xf <- getFeat(x,y,tar,fpv=.05)
This function performs an inference of a cell type partition based on the expression of marker genes.
getPart(x, FMarker, fthr = NULL, n = 25)
getPart(x, FMarker, fthr = NULL, n = 25)
x |
expression data frame with genes as rows and cells as columns. Gene IDs should be given as row names and cell IDs should be given as column names. This can be a reduced expression table only including the features (genes) to be used in the analysis. |
FMarker |
list of vectors of gene IDs corresponding to valid rownames of |
fthr |
vector of real positive numbers. This vector has to have the same length as the list |
n |
positive integer number. For each component of |
A list with the following three components:
part |
A vector with a partitioning, i. e. cluster assignment for each cell. |
tar |
A vector with the numbers of target clusters. Cluster 1 comprises all cells with no enrichment of marker genes. The remaining clusters correspond to cell types up-regulating the markers in the list |
thr |
A vector with expression threshold values applied for each component in the list |
x <- intestine$x y <- intestine$y FMarker <- list(c("Defa20__chr8","Defa24__chr8"), "Clca3__chr3", "Alpi__chr1") xf <- getPart(x,FMarker,fthr=NULL,n=5)
x <- intestine$x y <- intestine$y FMarker <- list(c("Defa20__chr8","Defa24__chr8"), "Clca3__chr3", "Alpi__chr1") xf <- getPart(x,FMarker,fthr=NULL,n=5)
This function computes a topological ordering of pseudo-temporal expression profiles of all genes by using 1-dimensional self-organizing maps.
getsom(x, nb = 1000, alpha = 0.5)
getsom(x, nb = 1000, alpha = 0.5)
x |
expression data frame with genes as rows and cells as columns. Gene IDs should be given as row names and cell IDs should be given as column names. The pseudo-temporal expression profile of each gene is defined by the order of cell IDs, i. e. columns, in |
nb |
positive integer number. Number of nodes of the self-organizing map. Default value is 1000. |
alpha |
positive real number. Pseudo-temporal expression profiles are derived by a local regression of expression values across the ordered cells using the function |
A list of the following three components:
som |
a |
x |
pseudo-temporal expression profiles, i. e. the input expression data frame |
zs |
data frame of z-score transformed pseudo-temporal expression profiles. |
x <- intestine$x y <- intestine$y v <- intestine$v tar <- c(6,9,13) fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL) dr <- compdr(x,z=NULL,m="cmd",k=2,tsne.perplexity=30) pr <- prcurve(y,fb,dr,k=2,m="cmd",trthr=0.4,start=NULL) n <- pr$trc[["t6"]] fs <- filterset(v,n,minexpr=2,minnumber=1) s1d <- getsom(fs,nb=1000,alpha=.5)
x <- intestine$x y <- intestine$y v <- intestine$v tar <- c(6,9,13) fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL) dr <- compdr(x,z=NULL,m="cmd",k=2,tsne.perplexity=30) pr <- prcurve(y,fb,dr,k=2,m="cmd",trthr=0.4,start=NULL) n <- pr$trc[["t6"]] fs <- filterset(v,n,minexpr=2,minnumber=1) s1d <- getsom(fs,nb=1000,alpha=.5)
This function extracts all genes with an importance value for classifying cells into a given target cluster exceeding a given threshold for at least one of the random forest iterationns.
impGenes(fb, tn, ithr = 0.02, zthr = 2)
impGenes(fb, tn, ithr = 0.02, zthr = 2)
fb |
fateBias object returned by the function |
tn |
name of a target cluster, i. e. concatenation of a |
ithr |
positive real number. Threshold for the required importance measure (mean decrease in accuracy of classification upon removal, see randomForest) to include a gene into the output as important feature for classying cells in |
zthr |
positive real number. Threshold for the required z-score of the importance measure (importance divided by the standard deviation of importance) to include a gene into the output as important feature for classying cells in |
The function returns a list of two elements.
d |
a data frame with mean importance values for all genes surviving the filtering by |
d |
a data frame with the standard deviation of importance values for all genes surviving the filtering by |
The function produces a heatmap of d
with hierarchical clustering of the rows using the function pheatmap
from the pheatmap package.
x <- intestine$x y <- intestine$y tar <- c(6,9,13) fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL) k <- impGenes(fb,"t6",ithr=.02,zthr=2)
x <- intestine$x y <- intestine$y tar <- c(6,9,13) fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL) k <- impGenes(fb,"t6",ithr=.02,zthr=2)
This dataset contains gene expression values, i. e. transcript counts, of 278 intestinal epithelial cells, and additional information on different cell types in this sample.
intestine
intestine
A list of the following 5 components:
data frame with genes as rows and cells as columns. This reduced data frame only contains expression of the most variable genes.
vector containing a clustering partition of the 278 cells into different cell types
data frame with genes as rows and cells as columns. This data frame contains expression of all genes.
vector containing colour values for all clusters in y
None
Grün et al. (2016) Cell Stem Cell 19(2): 266-77 (PubMed)
This is a plotting function for visualizing the output of the diffexpnb
function as MA plot.
plotdiffgenesnb( x, pthr = 0.05, padj = TRUE, lthr = 0, mthr = -Inf, Aname = NULL, Bname = NULL, show_names = TRUE, ... )
plotdiffgenesnb( x, pthr = 0.05, padj = TRUE, lthr = 0, mthr = -Inf, Aname = NULL, Bname = NULL, show_names = TRUE, ... )
x |
output of the function |
pthr |
real number between 0 and 1. This number represents the p-value cutoff applied for displaying differentially expressed genes. Default value is 0.05. The parameter |
padj |
logical value. If |
lthr |
real number between 0 and Inf. Differentially expressed genes are displayed only for log2 fold-changes greater than |
mthr |
real number between -Inf and Inf. Differentially expressed genes are displayed only for log2 mean expression greater than |
Aname |
name of expression set |
Bname |
name of expression set |
show_names |
logical value. If |
... |
Additional arguments for function |
None
x <- intestine$x y <- intestine$y v <- intestine$v tar <- c(6,9,13) fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL) thr <- .3 A <- rownames(fb$probs)[fb$probs[,"t6"] > .3] B <- rownames(fb$probs)[fb$probs[,"t13"] > .3] de <- diffexpnb(v,A=A,B=B) plotdiffgenesnb(de,pthr=.05)
x <- intestine$x y <- intestine$y v <- intestine$v tar <- c(6,9,13) fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL) thr <- .3 A <- rownames(fb$probs)[fb$probs[,"t6"] > .3] B <- rownames(fb$probs)[fb$probs[,"t13"] > .3] de <- diffexpnb(v,A=A,B=B) plotdiffgenesnb(de,pthr=.05)
This function allows plotting pseudo-temporal expression profiles for single genes or groups of genes.
plotexpression( x, y, g, n, logsc = FALSE, col = NULL, name = NULL, cluster = FALSE, alpha = 0.5, types = NULL, cex = 3, ylim = NULL, map = TRUE, leg = TRUE, seed = 12345, ylab = NULL )
plotexpression( x, y, g, n, logsc = FALSE, col = NULL, name = NULL, cluster = FALSE, alpha = 0.5, types = NULL, cex = 3, ylim = NULL, map = TRUE, leg = TRUE, seed = 12345, ylab = NULL )
x |
expression data frame with genes as rows and cells as columns. Gene IDs should be given as row names and cell IDs should be given as column names. |
y |
clustering partition. A vector with an integer cluster number for each cell. The order of the cells has to be the same as for the columns of |
g |
a gene ID corresponding to one of the rownames of |
n |
ordered vector of cell IDs to be included. Cell IDs need to be column names of |
logsc |
logical value. If |
col |
optional vector of valid color names for all clusters in |
name |
optional character string. This argument corresponds to a title for the plot. Default value is |
cluster |
logical value. If |
alpha |
positive real number. Pseudo-temporal expression profiles are derived by a local regression of expression values across the ordered cells using the function |
types |
optional vector with IDs for different subsets of cells in |
cex |
size of data points. Default value is 3. |
ylim |
vector of two numerical values: lower and upper limit of values shown on the y-axis. Default value is |
map |
logical. If |
leg |
logical. If |
seed |
integer number. Random seed for determining colour scheme. Default is 12345. |
ylab |
Optional label for the y-axis. Default is |
None
x <- intestine$x y <- intestine$y v <- intestine$v fcol <- intestine$col tar <- c(6,9,13) fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL) dr <- compdr(x,z=NULL,m="cmd",k=2,tsne.perplexity=30) pr <- prcurve(y,fb,dr,k=2,m="cmd",trthr=0.4,start=NULL) n <- pr$trc[["t6"]] fs <- filterset(v,n,minexpr=2,minnumber=1) s1d <- getsom(fs,nb=1000,alpha=.5) ps <- procsom(s1d,corthr=.85,minsom=3) # plot average profile of all genes of node 1 in the self-organizing map g <- names(ps$nodes)[ps$nodes == 1] plotexpression(v,y,g,n,col=fcol,name="Node 1",cluster=FALSE,alpha=.5,types=NULL)
x <- intestine$x y <- intestine$y v <- intestine$v fcol <- intestine$col tar <- c(6,9,13) fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL) dr <- compdr(x,z=NULL,m="cmd",k=2,tsne.perplexity=30) pr <- prcurve(y,fb,dr,k=2,m="cmd",trthr=0.4,start=NULL) n <- pr$trc[["t6"]] fs <- filterset(v,n,minexpr=2,minnumber=1) s1d <- getsom(fs,nb=1000,alpha=.5) ps <- procsom(s1d,corthr=.85,minsom=3) # plot average profile of all genes of node 1 in the self-organizing map g <- names(ps$nodes)[ps$nodes == 1] plotexpression(v,y,g,n,col=fcol,name="Node 1",cluster=FALSE,alpha=.5,types=NULL)
This function allows plotting loess-smoothed pseudo-temporal expression profiles for groups of genes. To display gene expression profiles on the same scale, row sums are normalized to one.
plotexpressionProfile( x, y, g, n, logsc = FALSE, col = NULL, name = NULL, cluster = FALSE, alpha = 0.5, lwd = 1, ylim = NULL, seed = 12345, ylab = NULL )
plotexpressionProfile( x, y, g, n, logsc = FALSE, col = NULL, name = NULL, cluster = FALSE, alpha = 0.5, lwd = 1, ylim = NULL, seed = 12345, ylab = NULL )
x |
expression data frame with genes as rows and cells as columns. Gene IDs should be given as row names and cell IDs should be given as column names. |
y |
clustering partition. A vector with an integer cluster number for each cell. The order of the cells has to be the same as for the columns of |
g |
a gene ID corresponding to one of the rownames of |
n |
ordered vector of cell IDs to be included. Cell IDs need to be column names of |
logsc |
logical value. If |
col |
optional vector of valid color names used for the profiles of all genes in |
name |
optional character string. This argument corresponds to a title for the plot. Default value is |
cluster |
logical value. If |
alpha |
positive real number. Pseudo-temporal expression profiles are derived by a local regression of expression values across the ordered cells using the function |
lwd |
line width of profiles. Default value is 1. |
ylim |
vector of two numerical values: lower and upper limit of values shown on the y-axis. Default value is |
seed |
integer number. Random seed for determining colour scheme. Default is 12345. |
ylab |
Optional label for the y-axis. Default is |
None
x <- intestine$x y <- intestine$y v <- intestine$v fcol <- intestine$col tar <- c(6,9,13) fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL) dr <- compdr(x,z=NULL,m="cmd",k=2,tsne.perplexity=30) pr <- prcurve(y,fb,dr,k=2,m="cmd",trthr=0.4,start=NULL) n <- pr$trc[["t6"]] fs <- filterset(v,n,minexpr=2,minnumber=1) s1d <- getsom(fs,nb=1000,alpha=.5) ps <- procsom(s1d,corthr=.85,minsom=3) # plot average profile of all genes of node 1 in the self-organizing map g <- sample(names(ps$nodes)[ps$nodes == 1],5) plotexpressionProfile(v,y,g,n,col=fcol,name="Node 1",alpha=.2)
x <- intestine$x y <- intestine$y v <- intestine$v fcol <- intestine$col tar <- c(6,9,13) fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL) dr <- compdr(x,z=NULL,m="cmd",k=2,tsne.perplexity=30) pr <- prcurve(y,fb,dr,k=2,m="cmd",trthr=0.4,start=NULL) n <- pr$trc[["t6"]] fs <- filterset(v,n,minexpr=2,minnumber=1) s1d <- getsom(fs,nb=1000,alpha=.5) ps <- procsom(s1d,corthr=.85,minsom=3) # plot average profile of all genes of node 1 in the self-organizing map g <- sample(names(ps$nodes)[ps$nodes == 1],5) plotexpressionProfile(v,y,g,n,col=fcol,name="Node 1",alpha=.2)
This function plots a dimensional reduction representation using the output of the compdr
function as input. It allows display of the input clusters as well as color coding of fate bias probabilities and gene expression.
plotFateMap( y, dr, x = NULL, g = NULL, n = NULL, prc = FALSE, logsc = FALSE, k = 2, m = "cmd", kr = NULL, col = NULL, fb = NULL, trthr = NULL, start = NULL, tp = 1, seed = 12345, ... )
plotFateMap( y, dr, x = NULL, g = NULL, n = NULL, prc = FALSE, logsc = FALSE, k = 2, m = "cmd", kr = NULL, col = NULL, fb = NULL, trthr = NULL, start = NULL, tp = 1, seed = 12345, ... )
y |
clustering partition. A vector with an integer cluster number for each cell. The order of the cells has to be the same as for the columns of x. |
dr |
list of dimensional reduction representations returned by the function |
x |
expression data frame with genes as rows and cells as columns. Gene IDs should be given as row names and cell IDs should be given as column names. This can be a reduced expression table only including the features (genes) to be used in the analysis. This input has to be provided if |
g |
either the name of one of the trajectories from |
n |
optional character string. This argument corresponds to a title for 2-dimensional plots. Default value is |
prc |
logical. If |
logsc |
logical. If |
k |
integer number for the dimension to be used. This dimension has to be present in |
m |
name of the dimensional reduction algorithms to be used for the principal curve computation. One of |
kr |
integer vector. If |
col |
optional vector of valid color names for all clusters in |
fb |
fateBias object returned by the function |
trthr |
real value representing the threshold of the fraction of random forest votes required for the inclusion of a given cell for the computation of the principal curve. If |
start |
integer number representing a specified starting cluster number for all trajectories, i. e. a common progenitor cluster. The argument is optional. Default value is |
tp |
Transparency of points in the plot to allow better visibility of the principal curves. Default value is 1, i. e. non-transparent. |
seed |
integer number. Random seed for determining colour scheme. Default is 12345. |
... |
additional arguments to be passed to the low level function |
If fb
is provided as input argument and prc
equals TRUE
then the output corresponds to the output of prcurve
. Otherwise, only ouput is generated is g
equals E. In this case a vector of fate bias entropies for all cells is given.
x <- intestine$x y <- intestine$y # v contains all genes (no feature selection like in x) v <- intestine$v fcol <- intestine$fcol tar <- c(6,9,13) fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL) dr <- compdr(x,z=NULL,m="cmd",k=2,tsne.perplexity=30) # plot principal curves pr <- plotFateMap(y,dr,k=2,prc=TRUE,m="cmd",col=fcol,fb=fb,trthr=0.25,start=NULL,tp=.5) # plot expression of gene Apoa1__chr9 plotFateMap(y,dr,x=v,g="Apoa1__chr9",prc=FALSE,k=2,m="cmd",col=intestine$fcol)
x <- intestine$x y <- intestine$y # v contains all genes (no feature selection like in x) v <- intestine$v fcol <- intestine$fcol tar <- c(6,9,13) fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL) dr <- compdr(x,z=NULL,m="cmd",k=2,tsne.perplexity=30) # plot principal curves pr <- plotFateMap(y,dr,k=2,prc=TRUE,m="cmd",col=fcol,fb=fb,trthr=0.25,start=NULL,tp=.5) # plot expression of gene Apoa1__chr9 plotFateMap(y,dr,x=v,g="Apoa1__chr9",prc=FALSE,k=2,m="cmd",col=intestine$fcol)
This function allows plotting of normalized or z-score transformed pseudo-temporal expression profiles and permits highlighting of partitioning along the x-axis and the y-axis
plotheatmap( x, xpart = NULL, xcol = NULL, xlab = TRUE, xgrid = FALSE, ypart = NULL, ycol = NULL, ylab = TRUE, ygrid = FALSE, cex = 1 )
plotheatmap( x, xpart = NULL, xcol = NULL, xlab = TRUE, xgrid = FALSE, ypart = NULL, ycol = NULL, ylab = TRUE, ygrid = FALSE, cex = 1 )
x |
data frame with input data to show. Columns will be displayed on the x-axis and rows on the y-axis in the order given in |
xpart |
optional vector with integer values indicating partitioning of the data points along the x-axis. For instance, |
xcol |
optional vector with valid color names. The number of components has to be equal to the number of different values on |
xlab |
logical value. If |
xgrid |
logical value. If |
ypart |
optional vector with integer values indicating partitioning of the data points along the y-axis. For instance, |
ycol |
optional vector with valid color names. The number of components has to be equal to the number of different values on |
ylab |
logical value. If |
ygrid |
logical value. If |
cex |
positive real number. Size of axis labels. Default is 1. |
None
x <- intestine$x y <- intestine$y v <- intestine$v fcol <- intestine$col tar <- c(6,9,13) fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL) dr <- compdr(x,z=NULL,m="cmd",k=2,tsne.perplexity=30) pr <- prcurve(y,fb,dr,k=2,m="cmd",trthr=0.4,start=NULL) n <- pr$trc[["t6"]] fs <- filterset(v,n,minexpr=2,minnumber=1) s1d <- getsom(fs,nb=1000,alpha=.5) ps <- procsom(s1d,corthr=.85,minsom=3) plotheatmap(ps$all.e,xpart=y[n],xcol=fcol,ypart=ps$nodes,xgrid=FALSE,ygrid=TRUE,xlab=FALSE)
x <- intestine$x y <- intestine$y v <- intestine$v fcol <- intestine$col tar <- c(6,9,13) fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL) dr <- compdr(x,z=NULL,m="cmd",k=2,tsne.perplexity=30) pr <- prcurve(y,fb,dr,k=2,m="cmd",trthr=0.4,start=NULL) n <- pr$trc[["t6"]] fs <- filterset(v,n,minexpr=2,minnumber=1) s1d <- getsom(fs,nb=1000,alpha=.5) ps <- procsom(s1d,corthr=.85,minsom=3) plotheatmap(ps$all.e,xpart=y[n],xcol=fcol,ypart=ps$nodes,xgrid=FALSE,ygrid=TRUE,xlab=FALSE)
This function computes a principal curve for a given dimensional reduction representation which is specified by component names of an object returned by compdr
using the princurve package.
prcurve(y, fb, dr, k = 2, m = "cmd", trthr = NULL, start = NULL, ...)
prcurve(y, fb, dr, k = 2, m = "cmd", trthr = NULL, start = NULL, ...)
y |
clustering partition. A vector with an integer cluster number for each cell. The order of the cells has to be the same as for the columns of x. |
fb |
fateBias object returned by the function |
dr |
list of dimensional reduction representations returned by the function |
k |
integer number for the dimension to be used. This dimension has to be present in |
m |
name of the dimensional reduction algorithms to be used for the principal curve computation. One of |
trthr |
real value representing the threshold of the fraction of random forest votes required for the inclusion of a given cell for the computation of the principal curve. If |
start |
integer number representing a specified starting cluster number for all trajectories, i. e. a common progenitor cluster. The argument is optional. Default value is |
... |
additional arguments to be passed to the low level function |
The function computes a principal curve for each differentiation trajectory by considering only cells that are assigned to the trajectory with a significant fate bias >1 or at least trthr
of the random forest votes, respectively.
For simulateneous computation and plotting of the principal curve, see function plotFateMap
.
A list of the following two components:
pr |
A list of principal curve objects produced by the |
trc |
A list of ordered cell IDs for each trajectory in |
x <- intestine$x y <- intestine$y tar <- c(6,9,13) fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL) dr <- compdr(x,z=NULL,m="cmd",k=2,tsne.perplexity=30) pr <- prcurve(y,fb,dr,k=2,m="cmd",trthr=0.25,start=NULL)
x <- intestine$x y <- intestine$y tar <- c(6,9,13) fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL) dr <- compdr(x,z=NULL,m="cmd",k=2,tsne.perplexity=30) pr <- prcurve(y,fb,dr,k=2,m="cmd",trthr=0.25,start=NULL)
This function processes the self-organizing maps produced by the function getsom
.
procsom(s1d, corthr = 0.85, minsom = 3)
procsom(s1d, corthr = 0.85, minsom = 3)
s1d |
output of function |
corthr |
correlation threshold, i. e. a real number between 0 and 1. The z-score of the average normalized pseudo-temporal expression profiles within each node of the self-organizing map is computed, and the correlation of these z-scores between neighbouring nodes is computed. If the correlation is greater than |
minsom |
positive integer number. Nodes of the self-organizing map with less than |
A list of the following seven components:
k |
vector of Pearson's correlation coefficient between node |
nodes |
vector with assignment of genes to nodes of the final self-organizing map (after merging). Components are node numbers and component names are gene IDs. |
nodes.e |
data frame with average normalized pseudo-temporal expression profile for each node, ordered by node number. |
nodes.z |
data frame with z-score transformed average normalized pseudo-temporal expression profile for each node, ordered by node number. |
all.e |
data frame with normalized pseudo-temporal expression profile for all genes in the self-organizing map, ordered by node number. |
all.z |
data frame with z-score transformed normalized pseudo-temporal expression profile for all genes in the self-organizing map, ordered by node number. |
all.b |
data frame with binarized pseudo-temporal expression profile for all genes in the self-organizing map, ordered by node number. Expression is 1 in cells with z-score > 1 and -1 in cells with z-score < -1, and 0 otherwise. |
x <- intestine$x y <- intestine$y v <- intestine$v tar <- c(6,9,13) fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL) dr <- compdr(x,z=NULL,m="cmd",k=2,tsne.perplexity=30) pr <- prcurve(y,fb,dr,k=2,m="cmd",trthr=0.4,start=NULL) n <- pr$trc[["t6"]] fs <- filterset(v,n,minexpr=2,minnumber=1) s1d <- getsom(fs,nb=1000,alpha=.5) ps <- procsom(s1d,corthr=.85,minsom=3)
x <- intestine$x y <- intestine$y v <- intestine$v tar <- c(6,9,13) fb <- fateBias(x,y,tar,z=NULL,minnr=5,minnrh=10,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL) dr <- compdr(x,z=NULL,m="cmd",k=2,tsne.perplexity=30) pr <- prcurve(y,fb,dr,k=2,m="cmd",trthr=0.4,start=NULL) n <- pr$trc[["t6"]] fs <- filterset(v,n,minexpr=2,minnumber=1) s1d <- getsom(fs,nb=1000,alpha=.5) ps <- procsom(s1d,corthr=.85,minsom=3)
This function attempts to reassign additional cells in the dataset to one of the target clusters.
reclassify( x, y, tar, z = NULL, clthr = 0.75, nbfactor = 5, use.dist = FALSE, seed = NULL, nbtree = NULL, q = 0.9, ... )
reclassify( x, y, tar, z = NULL, clthr = 0.75, nbfactor = 5, use.dist = FALSE, seed = NULL, nbtree = NULL, q = 0.9, ... )
x |
expression data frame with genes as rows and cells as columns. Gene IDs should be given as row names and cell IDs should be given as column names. This can be a reduced expression table only including the features (genes) to be used in the analysis. |
y |
clustering partition. A vector with an integer cluster number for each cell. The order of the cells has to be the same as for the columns of x. |
tar |
vector of integers representing target cluster numbers. Each element of |
z |
Matrix containing cell-to-cell distances to be used in the fate bias computation. Default is |
clthr |
real number between zero and one. This is the threshold for the fraction of random forest votes required to assign a cell not contained within the target clusters to one of these clusters. The value of this parameter should be sufficiently high to only reclassify cells with a high-confidence assignment. Default value is 0.9. |
nbfactor |
positive integer number. Determines the number of trees grown for each random forest. The number of trees is given by the number of columns of th training set multiplied by |
use.dist |
logical value. If |
seed |
integer seed for initialization. If equal to |
nbtree |
integer value. If given, it specifies the number of trees for each random forest explicitely. Default is |
q |
real value between zero and one. This number specifies a threshold used for feature selection based on importance sampling. A reduced expression table is generated containing only features with an importance larger than the q-quantile for at least one of the classes (i. e. target clusters). Default value is 0.75. |
... |
additional arguments to be passed to the low level function |
The function uses random forest based supervised learning to assign cells not contained in the target clusters to one of these clusters. All cells not within any of the target clusters which receive a fraction of votes larger than clthr
for one of the target clusters will be reassigned to this cluster. Since this function is developed to reclassify cells only if they can be assigned with high confidence, a high value of clthr
(e. g. > 0.75) should be applied.
A list with the following three components:
part |
A vector with the revised cluster assignment for each cell in the same order as in the input argument |
rf |
The random forest object generated for the reclassification, with enabled importance sampling (see randomForest). |
xf |
A filtered expression table with features extracted based on the important samples, only features with an importance larger than the q-quantile are for at least one of the classes are retained. |
x <- intestine$x y <- intestine$y tar <- c(6,9,13) rc <- reclassify(x,y,tar,z=NULL,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL,q=.9)
x <- intestine$x y <- intestine$y tar <- c(6,9,13) rc <- reclassify(x,y,tar,z=NULL,nbfactor=5,use.dist=FALSE,seed=NULL,nbtree=NULL,q=.9)