PopGeneticsVis: a toolkit for whole-genome resequencing and bisulfite sequencing downstream visualization in population genetics.
./PopGeneticsVis/data/cytosine_report/JJ_G2_1.cytosine_report.txt
./PopGeneticsVis/data/cytosine_report/JJ_G2_vs_HN_G2_DMR_CpG_2k.txt
./PopGeneticsVis/data/cytosine_report/JJ_G2_vs_HN_G2_DMR_CpG_2k_Sig_Anno.txt
PopGeneticsVis: a toolkit for whole-genome resequencing and bisulfite sequencing downstream visualization in population genetics.
PopGeneticsVis is a comprehensive toolkit specifically designed for the downstream visualization of whole - genome resequencing and bisulfite sequencing in the context of population genetics. It serves as an essential resource for researchers delving into population - level genetic analysis, enabling them to effectively visualize and interpret the complex data generated from these sequencing techniques. This toolkit provides a set of tools and functions that streamline the process of transforming raw sequencing data into meaningful visual representations, facilitating a deeper understanding of genetic patterns, variations, and epigenetic modifications within populations.
# 1. R: >= v4.3.0
# 2. Packages
# 2.1 CRAN
cran_packages <- c("argparse", "CMplot", "dplyr", "tidyr", "stringr", "data.table", "ggplot2", "ggpubr")
install.packages(cran_packages)
# 2.2 Bioconductor
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
bioconductor_packages <- c("methylKit", "GenomicFeatures", "rtracklayer", "GenomicRanges")
BiocManager::install(bioconductor_packages)
https://github.com/benben-miao/PopGeneticsVis/releases
# 1. Clone
git clone https://github.com/benben-miao/PopGeneticsVis.git
# 2. Test
Rscript ./PopGeneticsVis/bin/plink_pca_vis.R --help
usage: ./PopGeneticsVis/bin/plink_pca_vis.R [-h] --eigenvec EIGENVEC --eigenval EIGENVAL
--sample_pop SAMPLE_POP --select_pops SELECT_POPS
[--x_pc X_PC] [--y_pc Y_PC]
[--point_size POINT_SIZE]
[--point_alpha POINT_ALPHA]
PCA Visualization for PLINK PCA Eigenvec
optional arguments:
-h, --help show this help message and exit
--eigenvec EIGENVEC PLINK PCA results [pca.eigenvec].
--eigenval EIGENVAL PLINK PCA results [pca.eigenval].
--sample_pop SAMPLE_POP
Table with samples in col1, pops in col2
[sample_pop.txt].
--select_pops SELECT_POPS
Select populations ['GroupA,GroupB'].
--x_pc X_PC Which PC for X Axis [1].
--y_pc Y_PC Which PC for Y Axis [2].
--point_size POINT_SIZE
Points size [4].
--point_alpha POINT_ALPHA
Points alpha [0.8].
PLINK (short for Whole-Genome Association Analysis Toolset) is one of the most commonly used tools in genome-wide association studies (GWAS) and population genetics research. Its built-in Principal Component Analysis (PCA) function can efficiently process large-scale genotype data, which is used to correct for population stratification or reveal the genetic structure of samples.
Rscript ./PopGeneticsVis/bin/plink_pca_vis.R --help
usage: ./PopGeneticsVis/bin/plink_pca_vis.R [-h] --eigenvec EIGENVEC --eigenval EIGENVAL
--sample_pop SAMPLE_POP --select_pops SELECT_POPS
[--x_pc X_PC] [--y_pc Y_PC]
[--point_size POINT_SIZE]
[--point_alpha POINT_ALPHA]
PCA Visualization for PLINK PCA Eigenvec
optional arguments:
-h, --help show this help message and exit
--eigenvec EIGENVEC PLINK PCA results [pca.eigenvec].
--eigenval EIGENVAL PLINK PCA results [pca.eigenval].
--sample_pop SAMPLE_POP
Table with samples in col1, pops in col2
[sample_pop.txt].
--select_pops SELECT_POPS
Select populations ['GroupA,GroupB'].
--x_pc X_PC Which PC for X Axis [1].
--y_pc Y_PC Which PC for Y Axis [2].
--point_size POINT_SIZE
Points size [4].
--point_alpha POINT_ALPHA
Points alpha [0.8].
Rscript \
./PopGeneticsVis/bin/plink_pca_vis.R \
--eigenvec ./PopGeneticsVis/data/plink_pca/pca.eigenvec \
--eigenval ./PopGeneticsVis/data/plink_pca/pca.eigenval \
--sample_pop ./PopGeneticsVis/data/plink_pca/samples_pops.txt \
--select_pops "JJ_G2,HN_G2" \
--x_pc 1 --y_pc 2 \
--point_size 4 --point_alpha 0.8
./PopGeneticsVis/data/plink_pca/pca.eigenvec
head -n 10 ./PopGeneticsVis/data/plink_pca/pca.eigenvec
#FID IID PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
HN_G2_1 HN_G2_1 0.04043 -0.0253789 0.0367277 0.00133413 -0.100252 -0.0905802 0.0113605 -0.0137787 -0.00668236 0.0167026
HN_G2_10 HN_G2_10 0.0212835 0.00859226 -0.0506521 -0.023103 0.0270852 -0.0357936 0.00354596 -0.020354 1.16024e-05 0.0141656
HN_G2_11 HN_G2_11 0.0127448 0.00981976 -0.067746 -0.0214325 -0.0301917 -0.0183634 -0.0200866 -0.00592332 -0.0141271 0.14919
HN_G2_12 HN_G2_12 0.0172671 0.022466 -0.0228558 0.0474678 -0.05247 0.0371412 -0.0215446 -0.0304079 0.0416183 0.0017122
HN_G2_13 HN_G2_13 0.0292439 0.0168136 -0.00497563 0.0267906 -0.0675681 0.0251754 -0.00228862 -0.0429644 0.0618089 -0.0117392
HN_G2_14 HN_G2_14 0.0158522 0.0359309 -0.0488923 0.0203213 0.00240803 -0.012922 0.0210997 -0.0185264 0.0167203 0.0338862
HN_G2_15 HN_G2_15 0.00289866 -0.00861881 -0.0898324 0.0208598 0.0670552 -0.173035 -0.035361 -0.0510791 -0.0171244 -0.0316206
HN_G2_16 HN_G2_16 0.0265133 0.00541896 -0.045698 -0.0219957 0.0469065 0.0547635 0.0421172 -0.0974258 -0.0106709 0.0287779
HN_G2_17 HN_G2_17 0.0250258 0.0229324 -0.0335188 0.00408936 -0.00744732 -0.0204986 0.023825 -0.010836 0.069014 0.0448165
./PopGeneticsVis/data/plink_pca/pca.eigenval
cat ./PopGeneticsVis/data/plink_pca/pca.eigenval
9.91596
7.34118
5.66223
4.90547
4.37931
4.00635
3.89641
3.54522
3.51559
3.16454
./PopGeneticsVis/data/plink_pca/samples_pops.txt
head -n 10 ./PopGeneticsVis/data/plink_pca/samples_pops.txt
HN_G2_1 HN_G2
HN_G2_2 HN_G2
HN_G2_3 HN_G2
HN_G2_4 HN_G2
HN_G2_5 HN_G2
HN_G2_6 HN_G2
HN_G2_7 HN_G2
HN_G2_8 HN_G2
HN_G2_9 HN_G2
HN_G2_10 HN_G2
In population genetics research, Tajima’s D test is one of the core methods for evaluating the deviation of DNA sequence polymorphism from the neutral evolution model, and VCFtools (Variant Call Format tools) is a commonly used tool for quickly calculating Tajima’s D values based on VCF files (a standard format for storing genetic variation data).
usage: ./PopGeneticsVis/bin/vcftools_tajimad_vis.R [-h] --tajimad_win TAJIMAD_WIN --select_pops SELECT_POPS
[--box_outliers BOX_OUTLIERS]
VCFtools TajimaD Violin Box Vis
optional arguments:
-h, --help show this help message and exit
--tajimad_win TAJIMAD_WIN
VCFtools TajimaD with ordered POP [all.Tajima.D].
--select_pops SELECT_POPS
Select populations ['GroupA,GroupB'].
--box_outliers BOX_OUTLIERS
Show box outliers uppercase [FALSE].
Rscript \
./PopGeneticsVis/bin/vcftools_tajimad_vis.R \
--tajimad_win ./PopGeneticsVis/data/vcftools_tajimad/all.Tajima.D \
--select_pops "JJ_G2,HN_G2" \
--box_outliers TRUE
./PopGeneticsVis/data/vcftools_tajimad/all.Tajima.D
head -n 10 ./PopGeneticsVis/data/vcftools_tajimad/all.Tajima.D
CHROM BIN_START N_SNPS TajimaD POP
11 10000 4 -1.51583 HN_G2
11 20000 0 nan HN_G2
11 30000 3 -0.504447 HN_G2
11 40000 0 nan HN_G2
11 50000 0 nan HN_G2
11 60000 0 nan HN_G2
11 70000 0 nan HN_G2
11 80000 0 nan HN_G2
11 90000 0 nan HN_G2
In PLINK (a commonly used tool for genome-wide association studies), Heterozyg (Heterozygosity) is a core indicator for measuring the genetic diversity of samples or loci. It reflects the degree to which an individual carries two different alleles (such as A/T) at a specific locus, or the frequency of the distribution of heterozygous alleles at a locus in a population.
usage: ./PopGeneticsVis/bin/plink_heterozyg_vis.R [-h] --het_data HET_DATA --sample_pop SAMPLE_POP --select_pops
SELECT_POPS [--point_size POINT_SIZE]
PLINK Heterozyg Violin Box Vis
optional arguments:
-h, --help show this help message and exit
--het_data HET_DATA PLINK Heterozyg results [het.het].
--sample_pop SAMPLE_POP
Table with IID in col1 and Population in col2
[sample_pop.txt].
--select_pops SELECT_POPS
Select populations ['GroupA,GroupB'].
--point_size POINT_SIZE
Point size [2].
Rscript \
./PopGeneticsVis/bin/plink_heterozyg_vis.R \
--het_data ./PopGeneticsVis/data/plink_het/het.het \
--sample_pop ./PopGeneticsVis/data/plink_het/samples_pops.txt \
--select_pops "JJ_G2,HN_G2" \
--point_size 2
./PopGeneticsVis/data/plink_het/het.het
head -n 10 ./PopGeneticsVis/data/plink_het/het.het
#FID IID O(HOM) E(HOM) OBS_CT F
HN_G2_1 HN_G2_1 19048512 1.81455e+07 20585181 0.370148
HN_G2_10 HN_G2_10 19053719 1.804e+07 20467745 0.417557
HN_G2_11 HN_G2_11 18950164 1.79458e+07 20366876 0.414843
HN_G2_12 HN_G2_12 19064469 1.79607e+07 20383072 0.455651
HN_G2_13 HN_G2_13 19222145 1.8105e+07 20549246 0.457051
HN_G2_14 HN_G2_14 18908338 1.8013e+07 20444064 0.368286
HN_G2_15 HN_G2_15 19256769 1.80381e+07 20474025 0.50029
HN_G2_16 HN_G2_16 19041670 1.80191e+07 20453626 0.420034
HN_G2_17 HN_G2_17 19309606 1.81539e+07 20603448 0.471799
./PopGeneticsVis/data/plink_het/samples_pops.txt
head -n 10 ./PopGeneticsVis/data/plink_het/samples_pops.txt
HN_G2_1 HN_G2
HN_G2_2 HN_G2
HN_G2_3 HN_G2
HN_G2_4 HN_G2
HN_G2_5 HN_G2
HN_G2_6 HN_G2
HN_G2_7 HN_G2
HN_G2_8 HN_G2
HN_G2_9 HN_G2
HN_G2_10 HN_G2
VCFtools (Variant Call Format tools) is a core set of tools for processing VCF (Variant Call Format) files in population genetics, while FST (Fixation Index) is a key indicator for measuring the degree of genetic differentiation between populations. The combination of the two can efficiently analyze the genetic differences between different populations at the genomic level, and is widely used in fields such as population structure research, domestication site localization, and adaptive evolution analysis.
usage: ./PopGeneticsVis/bin/cmplot_fst_xpclr.R [-h] [--method {FST,XPCLR}] --fst_xpclr FST_XPCLR
[--plot_type {m,c,d,q}]
CMplot for FST or XPCLR results annotation
optional arguments:
-h, --help show this help message and exit
--method {FST,XPCLR} SNP selection methods [FST].
--fst_xpclr FST_XPCLR
FST or XPCLR results with anno [fst.anno].
--plot_type {m,c,d,q}
Plot type m(Manhattan), c(Circle), d(Density), q(QQ-
plot) [m].
Rscript \
./PopGeneticsVis/bin/cmplot_fst_xpclr.R \
--method FST \
--fst_xpclr ./PopGeneticsVis/data/vcftools_fst/FST_JJ_G2_vs_HN_G2.windowed.weir.fst.chr.anno \
--plot_type d
Rscript \
./PopGeneticsVis/bin/cmplot_fst_xpclr.R \
--method FST \
--fst_xpclr ./PopGeneticsVis/data/vcftools_fst/FST_JJ_G2_vs_HN_G2.windowed.weir.fst.chr.anno \
--plot_type m
Rscript \
./PopGeneticsVis/bin/cmplot_fst_xpclr.R \
--method FST \
--fst_xpclr ./PopGeneticsVis/data/vcftools_fst/FST_JJ_G2_vs_HN_G2.windowed.weir.fst.chr.anno \
--plot_type c
./PopGeneticsVis/data/vcftools_fst/FST_JJ_G2_vs_HN_G2.windowed.weir.fst.chr.anno
head -n 10 ./PopGeneticsVis/data/vcftools_fst/FST_JJ_G2_vs_HN_G2.windowed.weir.fst.chr.anno
CHROM BIN_START BIN_END N_VARIANTS WEIGHTED_FST MEAN_FST anno_type gene_id
chr11 10001 20000 7 0.0436644 0.0239114 promoter,gene,UTR5,CDS,exon,intron,UTR3 HdF041849(p),HdF041849(g)
chr11 15001 25000 7 0.0436644 0.0239114 promoter,gene,UTR5,exon,intron HdF041849(p),HdF041849(g)
chr11 25001 35000 4 0.0150391 0.0161973 intergenic
chr11 30001 40000 4 0.0150391 0.0161973 intergenic
chr11 55001 65000 1 -0.00444874 -0.00444874 intergenic
chr11 60001 70000 1 -0.00444874 -0.00444874 intergenic
chr11 270001 280000 15 0.0511696 0.0366771 gene,UTR5,CDS,exon,intron,UTR3 HdF041844(g)
chr11 275001 285000 15 0.0511696 0.0366771 gene,UTR5,exon,intron HdF041844(g)
chr11 300001 310000 8 0.0526581 0.0448139 intergenic
./PopGeneticsVis/data/xpclr_xpclr/JJ_G2_vs_HN_G2.xpclr.chr.anno
XPCLR (Cross-Population Composite Likelihood Ratio) is a commonly used analytical tool in population genetics for detecting selection signals, and it is particularly suitable for studying differences in genomic regions among different populations caused by natural selection. Its core principle is to calculate the “composite likelihood ratio” for each locus on the genome by comparing the differences in allele frequencies between two populations and incorporating linkage disequilibrium (LD) information, thereby identifying regions that may be affected by positive selection (or other selective pressures). Such regions usually exhibit population differentiation signals significantly higher than the genomic background level.
head -n 10 ./PopGeneticsVis/data/xpclr_xpclr/JJ_G2_vs_HN_G2.xpclr.chr.anno
chrom start stop pos_start pos_stop modelL nullL sel_coef nSNPs nSNPs_avail xpclr xpclr_norm anno_type gene_id
chr1 110001 120000 116131 119955 -35.0884425294439 -40.1089557348041 1e-04 16 16 10.0410264107203 -0.0638126714140551 intergenic
chr1 115001 125000 116131 124815 -66.45676465399 -66.885874161588 5e-05 26 26 0.858219015196113 -0.465779523577906 intergenic
chr1 120001 130000 121746 129981 -78.2152330276062 -88.4911108160588 1e-04 31 31 20.5517555769051 0.396282431847874 intergenic
chr1 125001 135000 125362 134705 -109.377249276282 -109.686346377196 1e-05 41 41 0.618194201828032 -0.476286334242127 intergenic
chr1 130001 140000 130307 139709 -108.695950408831 -115.431457733638 2e-04 50 50 13.4710146496142 0.0863311262244798 intergenic
chr1 135001 145000 135498 144985 -158.605704897143 -169.662861526333 4e-04 65 65 22.1143132583807 0.464681600587198 intergenic
chr1 140001 150000 141304 149997 -154.800677159078 -154.800677159078 0 54 54 0 -0.503347075759572 intergenic
chr1 145001 155000 145791 154842 -81.9384742813194 -82.1444387953709 5e-05 31 31 0.411929028102946 -0.485315355435985 intergenic
chr1 150001 160000 150028 159925 -44.0246980239117 -50.7171763729386 4e-04 19 19 13.3849566980539 0.0825640381867617 intergenic
Rscript \
./PopGeneticsVis/bin/cmplot_fst_xpclr.R \
--method XPCLR \
--fst_xpclr ./PopGeneticsVis/data/xpclr_xpclr/JJ_G2_vs_HN_G2.xpclr.chr.anno \
--plot_type d
Rscript \
./PopGeneticsVis/bin/cmplot_fst_xpclr.R \
--method XPCLR \
--fst_xpclr ./PopGeneticsVis/data/xpclr_xpclr/JJ_G2_vs_HN_G2.xpclr.chr.anno \
--plot_type m
Rscript \
./PopGeneticsVis/bin/cmplot_fst_xpclr.R \
--method XPCLR \
--fst_xpclr ./PopGeneticsVis/data/xpclr_xpclr/JJ_G2_vs_HN_G2.xpclr.chr.anno \
--plot_type c
cytosine_report (cytosine report) is a core file related to WGBS (Whole Genome Bisulfite Sequencing) data analysis. It is mainly used to store key information about cytosine sites in DNA methylation research, supporting subsequent methylation level visualization and Differential Methylation Region (DMR) analysis.
usage: ./PopGeneticsVis/bin/cytosine_ratio_vis.R [-h] --cytosine_report CYTOSINE_REPORT
[--sample_num SAMPLE_NUM]
Methyl Ratio Vis for SytosineReport
options:
-h, --help show this help message and exit
--cytosine_report CYTOSINE_REPORT
Cytosine report from Bismark/MethylDackel
[./cytosine_report.txt].
--sample_num SAMPLE_NUM
Methyl site number for visualization [1000000].
Rscript \
./PopGeneticsVis/bin/cytosine_ratio_vis.R \
--cytosine_report ./PopGeneticsVis/data/cytosine_report/HN_G2_1.cytosine_report.txt \
--sample_n 1000000
./PopGeneticsVis/data/cytosine_report/HN_G2_1.cytosine_report.txt
head -n 10 ./PopGeneticsVis/data/cytosine_report/HN_G2_1.cytosine_report.txt
mith1tg000589c 2 + 0 18 CHH CCC
mith1tg000589c 3 + 0 19 CHH CCC
mith1tg000589c 4 + 0 19 CHH CCC
mith1tg000589c 5 + 0 21 CHG CCG
mith1tg000589c 6 + 0 25 CG CGT
mith1tg000589c 7 - 0 0 CG CGG
mith1tg000589c 14 + 0 27 CG CGC
mith1tg000589c 15 - 0 0 CG CGT
mith1tg000589c 16 + 1 28 CHH CCC
mith1tg000589c 17 + 1 29 CHH CCT
usage: ./PopGeneticsVis/bin/methylkit_dmr.R [-h] --control CONTROL --treated TREATED --in_dir
IN_DIR --out_dir OUT_DIR [--threads THREADS]
[--nrep NREP] [--level {DMC,DMR,DMP}]
[--context {CpG,CHG,CHH}] [--mincov MINCOV]
[--win_size WIN_SIZE] [--step_size STEP_SIZE]
[--corr_method {pearson,kendall,spearman}]
[--adjust_method {SLIM,holm,hochberg,hommel,bonferroni,BH,BY,fdr,none,qvalue}]
[--test_method {F,Chisq,fast.fisher,midPval}]
[--qvalue_cutoff QVALUE_CUTOFF]
[--meth_cutoff METH_CUTOFF] [--gtf GTF]
[--promoter_up PROMOTER_UP]
[--promoter_down PROMOTER_DOWN]
[--plot_width PLOT_WIDTH]
[--plot_height PLOT_HEIGHT]
MethylKit Analysis Workflow
options:
-h, --help show this help message and exit
--control CONTROL Control group name.
--treated TREATED Treated group name.
--in_dir IN_DIR Input directory [./].
--out_dir OUT_DIR Output directory [./].
--threads THREADS Threads, more CPUs more Mems need [1].
--nrep NREP Samples replicates each group [3].
--level {DMC,DMR,DMP}
Methylation level [DMR].
--context {CpG,CHG,CHH}
Methylation context [CpG].
--mincov MINCOV Minimum coverage [10].
--win_size WIN_SIZE Window size for DMR (bp) [2000].
--step_size STEP_SIZE
Step size for DMR (bp) [2000].
--corr_method {pearson,kendall,spearman}
Correlation method [pearson].
--adjust_method {SLIM,holm,hochberg,hommel,bonferroni,BH,BY,fdr,none,qvalue}
P-value adjustment method [SLIM].
--test_method {F,Chisq,fast.fisher,midPval}
Statistical test method [Chisq].
--qvalue_cutoff QVALUE_CUTOFF
Q-value cutoff [0.05].
--meth_cutoff METH_CUTOFF
Methylation difference cutoff [25].
--gtf GTF GTF/GFF of genome [NULL].
--promoter_up PROMOTER_UP
Promoter upstream distance (bp) [2000]
--promoter_down PROMOTER_DOWN
Promoter downstream distance (bp) [200]
--plot_width PLOT_WIDTH
Plot width (inch) [10.00].
--plot_height PLOT_HEIGHT
Plot height (inch) [10.00].
genome_gtf="/path_to_gtf/genome.gtf"
Rscript \
./PopGeneticsVis/bin/methylkit_dmr.R \
--control JJ_G2 \
--treated HN_G2 \
--in_dir ./PopGeneticsVis/data/cytosine_report/ \
--out_dir ./PopGeneticsVis/data/cytosine_report/ \
--threads 1 \
--nrep 3 \
--level DMR \
--context CpG \
--mincov 10 \
--win_size 2000 \
--step_size 2000 \
--corr_method pearson \
--adjust_method SLIM \
--test_method Chisq \
--qvalue_cutoff 0.05 \
--meth_cutoff 25 \
--gtf ${genome_gtf} \
--promoter_up 2000 \
--promoter_down 200 \
--plot_width 10.00 \
--plot_height 6.18
./PopGeneticsVis/data/cytosine_report/JJ_G2_1.cytosine_report.txt
head -n 10 ./PopGeneticsVis/data/cytosine_report/JJ_G2_1.cytosine_report.txt
chr2 1 + 0 0 CHG CTG
chr2 3 - 0 2 CHG CAG
chr2 5 + 0 0 CHH CAT
chr2 8 + 0 0 CHG CTG
chr2 10 - 2 1 CHG CAG
chr2 12 + 0 0 CHH CTC
chr2 14 + 0 0 CHH CCT
chr2 15 + 0 0 CHH CTC
chr2 17 + 0 0 CHH CTC
chr2 19 + 0 0 CHH CTC
./PopGeneticsVis/data/cytosine_report/JJ_G2_vs_HN_G2_DMR_CpG_2k.txt
head -n 10 ./PopGeneticsVis/data/cytosine_report/JJ_G2_vs_HN_G2_DMR_CpG_2k.txt
chr start end strand pvalue qvalue meth.diff
chr1 94001 96000 * 0.359806000563359 0.387704326219981 0.152798303668958
chr1 96001 98000 * 0.470816921695121 0.456792909828252 -0.0848647129274156
chr1 98001 100000 * 0.000872125098979439 0.0030624412141788 -0.51115564900546
chr1 100001 102000 * 0.14767572092135 0.21438828041915 0.144018353504898
chr1 102001 104000 * 0.96580550253059 0.6773458835649 -0.00530964695794603
chr1 104001 106000 * 0.797705926039805 0.614400356560997 -0.0539165519981705
chr1 106001 108000 * 0.00835614419137723 0.0219464554487716 1.44288691348377
chr1 110001 112000 * 1.30699444113323e-06 8.11073044542023e-06 1.90364567269138
chr1 112001 114000 * 0.000328168930580941 0.00127959885901375 0.694565505097805
head -n 10 ./PopGeneticsVis/data/cytosine_report/JJ_G2_vs_HN_G2_DMR_CpG_2k_Sig.txt
chr start end strand pvalue qvalue meth.diff
chr1 466001 468000 * 0.00415711849409238 0.0120671167427604 -27.3170731707317
chr1 660001 662000 * 4.04113325784691e-09 3.62260475171775e-08 -33.3792540435737
chr1 1454001 1456000 * 8.93917792092073e-12 1.08386220427317e-10 -26.5563101301641
chr1 2750001 2752000 * 2.71610876608368e-129 5.71042756213395e-127 25.0474473214157
chr1 3428001 3430000 * 2.04907237864629e-10 2.14432834251086e-09 -46.3312368972746
chr1 3604001 3606000 * 2.13785433427291e-09 1.9841040640047e-08 37.8574305275876
chr1 4488001 4490000 * 3.46668153028929e-25 1.05232051309453e-23 -28.1260194808221
chr1 4934001 4936000 * 0.00170753601022128 0.00555652046196457 26.3565891472868
chr1 5294001 5296000 * 1.09733414650907e-08 9.28758216013033e-08 -58.5365853658537
./PopGeneticsVis/data/cytosine_report/JJ_G2_vs_HN_G2_DMR_CpG_2k_Sig_Anno.txt
head -n 10 ./PopGeneticsVis/data/cytosine_report/JJ_G2_vs_HN_G2_DMR_CpG_2k_Sig_Anno.txt
chr start end strand pvalue qvalue meth.diff anno_genes anno_promoters
chr1 466001 468000 * 0.00415711849409238 0.0120671167427604 -27.3170731707317 HdF051575 NA
chr1 660001 662000 * 4.04113325784691e-09 3.62260475171775e-08 -33.3792540435737 HdF051575 NA
chr1 1454001 1456000 * 8.93917792092073e-12 1.08386220427317e-10 -26.5563101301641 HdF027216 NA
chr1 2750001 2752000 * 2.71610876608368e-129 5.71042756213395e-127 25.0474473214157 NA NA
chr1 3428001 3430000 * 2.04907237864629e-10 2.14432834251086e-09 -46.3312368972746 HdF027234 NA
chr1 3604001 3606000 * 2.13785433427291e-09 1.9841040640047e-08 37.8574305275876 HdF027242 NA
chr1 4488001 4490000 * 3.46668153028929e-25 1.05232051309453e-23 -28.1260194808221 HdF027261 NA
chr1 4934001 4936000 * 0.00170753601022128 0.00555652046196457 26.3565891472868 HdF027275 NA
chr1 5294001 5296000 * 1.09733414650907e-08 9.28758216013033e-08 -58.5365853658537 NA NA
MethylKit Context is a core analytical dimension in epigenetics (especially in DNA methylation analysis) based on the methylKit toolkit (commonly used for whole-genome bisulfite sequencing, WGBS, data analysis). It focuses on the distribution patterns and quantitative statistics of DNA methylation in different genomic contexts.
usage: ./PopGeneticsVis/bin/context_pie_vis.R [-h] --subjects_parts SUBJECTS_PARTS [--col_num COL_NUM]
Multi-Pie row: subjects; col: parts
optional arguments:
-h, --help show this help message and exit
--subjects_parts SUBJECTS_PARTS
Row: subjects; Col: parts [./subjects_parts.txt].
--col_num COL_NUM Columns/Pies in row [3].
Rscript \
./PopGeneticsVis/bin/context_pie_vis.R \
--subjects_parts ./PopGeneticsVis/data/methyl_context/context_stats.txt \
--col_num 3
./PopGeneticsVis/data/methyl_context/context_stats.txt
cat ./PopGeneticsVis/data/methyl_context/context_stats.txt
Subject CpG CHG CHH
JJ_G2_HN_G2 79674962 111393332 399114039
DMRs 602647 607166 609247
DMRs_Sig 5925 11 15
MethylKit is an open-source bioinformatics toolkit based on the R language, specifically designed for processing high-throughput methylation sequencing data (such as WGBS, RRBS, MeDIP-seq, etc.) and analyzing differential methylation. Among these, Differentially Methylated Regions (DMRs) are one of its core analysis targets — that is, specific regions in the genome where there are significant differences in methylation levels between different sample groups (e.g., disease group vs. normal group, treatment group vs. control group). These regions are key entry points for 解析 the biological functions regulated by methylation (such as gene expression, cell differentiation, and disease occurrence).
usage: ./PopGeneticsVis/bin/methylkit_dmr_violin.R [-h] --methylkit_dmr METHYLKIT_DMR
[--violin_scale {area,count,width}]
[--violin_width VIOLIN_WIDTH]
[--point_size POINT_SIZE]
Meth.Diff Hyper & Hypo Violin Plot
options:
-h, --help show this help message and exit
--methylkit_dmr METHYLKIT_DMR
MethylKit DMR results (chr, start, end, ...,
meth.diff) [./methylkit_dmr.txt].
--violin_scale {area,count,width}
Violin area and width [count].
--violin_width VIOLIN_WIDTH
Violin width percent [1].
--point_size POINT_SIZE
Point size [1.5].
Rscript \
./PopGeneticsVis/bin/methylkit_dmr_violin.R \
--methylkit_dmr ./PopGeneticsVis/data/methylkit_dmr/JJ_G2_vs_HN_G2_DMR_CpG_2k_Sig.txt \
--violin_scale count \
--violin_width 1 \
--point_size 1.5
Rscript \
./PopGeneticsVis/bin/methylkit_dmr_violin.R \
--methylkit_dmr ./PopGeneticsVis/data/methylkit_dmr/JJ_G2_vs_HN_G2_DMR_CHG_2k_Sig.txt \
--violin_scale count \
--violin_width 1 \
--point_size 1.5
Rscript \
./PopGeneticsVis/bin/methylkit_dmr_violin.R \
--methylkit_dmr ./PopGeneticsVis/data/methylkit_dmr/JJ_G2_vs_HN_G2_DMR_CHH_2k_Sig.txt \
--violin_scale count \
--violin_width 1 \
--point_size 1.5
./PopGeneticsVis/data/methylkit_dmr/JJ_G2_vs_HN_G2_DMR_CpG_2k_Sig.txt
head -n 10 ./PopGeneticsVis/data/methylkit_dmr/JJ_G2_vs_HN_G2_DMR_CpG_2k_Sig.txt
chr start end strand pvalue qvalue meth.diff
chr1 466001 468000 * 0.00415711849409238 0.0120671167427604 -27.3170731707317
chr1 660001 662000 * 4.04113325784691e-09 3.62260475171775e-08 -33.3792540435737
chr1 1454001 1456000 * 8.93917792092073e-12 1.08386220427317e-10 -26.5563101301641
chr1 2750001 2752000 * 2.71610876608368e-129 5.71042756213395e-127 25.0474473214157
chr1 3428001 3430000 * 2.04907237864629e-10 2.14432834251086e-09 -46.3312368972746
chr1 3604001 3606000 * 2.13785433427291e-09 1.9841040640047e-08 37.8574305275876
chr1 4488001 4490000 * 3.46668153028929e-25 1.05232051309453e-23 -28.1260194808221
chr1 4934001 4936000 * 0.00170753601022128 0.00555652046196457 26.3565891472868
chr1 5294001 5296000 * 1.09733414650907e-08 9.28758216013033e-08 -58.5365853658537