Post

SPOTlight: Deconvolution of VISIUM data from singlecell reference

MEGENA: Multiscale embedded gene co-expression network analysis

논문 정보

Song, Won-Min, and Bin Zhang. “Multiscale embedded gene co-expression network analysis.” PLoS computational biology 11.11 (2015): e1004574. pdf download

Installation

Prerequisite

  • R version > 4.3
    dependency package impute , bioconductor ver.3.17부터 가능한데 이게 R 4.3부터 가능

impute

1
BiocManager::install("impute")

intall MEGENA

1
2
3
4
library(devtools);
install_github("songw01/MEGENA");
# OR
install.packages("MEGENA") #이거

⚠️ 아직 계속해서 업데이트중
최신버전 사용하고 싶으면, github코드로 사용

How to run MEGENA

Prepare inputs

1
2
3
4
5
6
7
8
9
10
11
12
13
library(Seurat)
options(Seurat.object.assay.version = "v5")

GEX.seurOBJ <- readRDS("GEX.seurOBJ_annotated.rds")

# by celltypes
for(i in levels(GEX.seurOBJ$Annot.L2)){
  celltype.obj <- subset(GEX.seurOBJ,subset=Annot.L2==i)
  data_matrix <- celltype.obj[["RNA"]]$data
  data_matrix <- data_matrix[rowSums(data_matrix)>0,]
  #data_matrix <- LayerData(celltype.obj, assay="RNA", layer="data")
  write.table(data_matrix, file=paste0('data_matrix_',i,'.tsv'), quote=FALSE, sep='\t')
  }

MEGENA input으로 준비하는 matrix에 대해서,..

RNA vs. SCT assay? Counts vs Data layer?
어떤 데이터를 사용해야할지 고민이 많았는데, 결론부터 말하면 어떤 것을 쓰던 결과는 동일하게 나온다
assy 종류, layer 종류 모두 상관없음.

run MEGENA

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99

cellType_list <- c("CD4 T","CD8 T","Proliferating T","NK","B","Plasma","CD14 Monocyte","CD16 Monocyte","Monocyte","cDC","pDC")

library(MEGENA)
anyNA <- function(x) any(is.na(x))
source("recap_source.R")
source("check_matrix.R")

wk <- "/sc/arion/projects/scMultiscale/chos14/Project01_sysFlu/04_MEGENA/"  # working directory within MEGENA directory

args=commandArgs(trailingOnly=TRUE)

FDR.cutoff <- 0.5
q.pr    <- 0.05       # filter of low responding probes based on quantiles
n.cores <- 16

# input parameters
doPar <- TRUE;
method = "spearman"
#FDR.cutoff = 0.05
module.pval = 0.05
hub.pval = 0.05

#################################################################
i = cellType_list[as.numeric(args[1])]
print(as.numeric(args[1]))

setwd(wk)

df <- paste0("woP03_data_matrix_",i,".tsv")  # Expression file, rows: probes; colums: samples; 1st col: probe-names
print(df)

# variable required: df, wk, FDR.cutoff, n.cores 
data.file <- df   # presuming that we use the full file name
annot.col <- 1 ## number of annotation columns in the data matrix
annot     <- 1 ## column with annotation to be used
wkdir <- paste0(wk,i,"_FDR",FDR.cutoff,"mpi")

###########
# create output folder
dir.create(wkdir)

# load data
data.Df <- read.delim(file = data.file,sep = "\t",header = T)
data.mat <- as.matrix(data.Df);
v <- cv.calculation(data.mat)
res <- filter_low_quality_genes(data.mat,min.read = 1E-320,min.cv = quantile(v,probs=q.pr),min.sprop = 0.05)

datExpr <- res$expr.matrix

 message("\n\nPrint some statistics:")
range(v)
 message("Quantile prob: ", q.pr)
quantile(v,probs=q.pr)
dim(data.mat)
dim(datExpr)

# set working folder
setwd(wkdir)

#####
# calculate correlation
message("Calculate correlation")
# cut short
ijw <- calculate.correlation(datExpr = datExpr,doPerm = 10,method = method,FDR.cutoff = FDR.cutoff)

write.table(ijw,paste0("ijw_",i,".txt"),,sep = "\t",row.names = F,col.names = T,quote = F)

 message("Calculate PFN")
 if (doPar & getDoParWorkers() == 1){
   cl <- parallel::makeCluster(n.cores)
   registerDoParallel(cl)

  # check how many workers are there
  cat(paste("number of cores to use:",getDoParWorkers(),"\n",sep = ""))
 }

# calculate PFN (no)
el <- calculate.PFN(ijw[,1:3],doPar = doPar,num.cores = n.cores)

 message("Write files")
write.table(el,file = paste0("MEGENA_Network_",i,".txt"),sep = "\t",row.names = F,col.names = T,quote = F)

# do clustering
g <- graph.data.frame(el,directed = F)

MEGENA.output <- do.MEGENA(g,
 mod.pval = module.pval,hub.pval = hub.pval,remove.unsig = TRUE,
 min.size = 10,max.size = vcount(g)/2,
 doPar = TRUE,num.cores = n.cores,n.perm = 100,
 save.output = TRUE)

m <- MEGENA.ModuleSummary(MEGENA.output, max.size=50000)
output.geneSet.file(m$modules, "multiscale_significant.modules.txt")
write.table(MEGENA.output$node.summary, file="multiscale_nodeSummary.txt", row.names=F, quote=F, sep="\t")

save(MEGENA.output,file = paste0("MEGENA_output_",i,".RData"))

rm("MEGENA.output","g","el","ijw","m")

이부분은 optional, server에 parallel job으로 넣고 싶어서 작성함.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
#BSUB -J megena_sc[1-11]
#BSUB -n 24
#BSUB -L /bin/bash
#BSUB -a openmpi
#BSUB -q premium
#BSUB -P acc_scMultiscale
#BSUB -R rusage[mem=100000]
#BSUB -R himem
#BSUB -W 96:00
#BSUB -o %J.out.%I

source ~/.bash_profile

cd $LS_SUBCWD
echo $LS_SUBCWD

module load openmpi
#get the job index of an array job
jobID=$LSB_JOBID

#run script
export R_LIBS_USER=/sc/arion/projects/scMultiscale/chos14/source/R-4.3.1/library
 
Rscript /sc/arion/projects/scMultiscale/chos14/Project01_sysFlu/04_MEGENA/megen
a.R ${LSB_JOBINDEX}

date

Outputs

중간에 만든 wkdir 에 가보면

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
.
├── cluster_stats.png
├── * correlation_FDR_table.txt
├── Data_Correlation.txt
├── ijw_CD4\ T.txt
├── majority_vote.png
├── * MEGENA_Network_CD4\ T.txt
├── MEGENA_output_CD4\ T.RData
├── multiscale_clusters.RData
├── multiscale_hubAnalysis.RData
├── multiscale_nodeSummary.txt
├── * multiscale_significant.modules.txt
├── pfg_el.RData
├── pvalue_distributions.png
├── scalecluster_heatmap.png
└── scalewise_hub_tileplot.png

* 표시한 파알들 주목.

주의사항

  • filter_low_quality_genes() 이후 ngenes: xxx 갯수 확인.
    적어도 몇천 genes는 되어야 괜찮은 사이즈의 module을 얻을 수 있음.
  • correlation_FDR_table.txt
    FDR값이 계속 decreasing인지 확인. 중간에 다시 올라가진 않는지 확인하자.
  • MEGENA_Network.txt
    weight == correlation coefficient, 두 유전자가 얼마만큼의 correlation을 갖고 co-expressed되는지 알 수 있다. highest value가 어느정도인지 확인하자.
  • multiscale_significant.modules.txt
    최종 결과로 나온 significant module. 적어도 100개 이상이어야..

References

  • https://rpubs.com/grrompala/656670
  • https://github.com/songw01/AD_scRNAseq_companion/blob/master/scripts/Section_K_Gene_Network_Analysis.Rmd
  • chrome-extension://efaidnbmnnnibpcajpcglclefindmkaj/https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/ModulePreservation/Tutorials/MiniTutorial-MouseLiver.pdf
  • https://cran.r-project.org/web/packages/MEGENA/vignettes/MEGENA_pipeline_10062016.html
This post is licensed under CC BY 4.0 by the author.

© Subin Cho. Some rights reserved.

Using the Chirpy theme for Jekyll.