The qc_by_percentile() function performs quality control
on BAM files and generates a log2-transformed read count heatmap for
CRF-CRF pairs. This function can also accept BED files as input for
visualization purposes without filtering (set
filtered_percentile = 0 and plot = TRUE).
What this function does: This function implements a
percentile-based quality control strategy for CRF pair data. It
calculates read counts for all CRF pairs from input BAM files, then
filters out pairs with the lowest read counts based on a user-defined
percentile threshold. For example, setting
filtered_percentile = 0.25 removes the bottom 25% of CRF
pairs ranked by read count, retaining only pairs with sufficient
sequencing depth for downstream analysis.
The function generates a lower-triangle heatmap displaying log2-transformed read counts for all CRF pairs. Pairs that pass the quality control threshold are shown with a blue-white-red color gradient, while filtered pairs are displayed in grey to clearly distinguish them from high-quality data. Optional CRF grouping allows users to organize and annotate CRFs by biological categories (e.g., histone modifications, transcription factors), which are displayed as colored blocks along the heatmap margins. All QC results, including read count tables and visualizations, are automatically saved to the specified output directory.
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
file_paths |
character vector | Yes | Vector of BAM or BED file paths for QC and visualization | file_paths = c("sample1.bam", "sample2.bam") |
out_dir |
character | No (default: “./”) | Directory to store output TSV files and PDF | out_dir = "./results" |
filtered_percentile |
numeric | No (default: 0.25) | Numeric value between 0 and 1 to filter the lowest-read-count samples | filtered_percentile = 0.25 |
plot |
logical | No (default: TRUE) | If TRUE, generates heatmap visualization | plot = TRUE |
split_pair_by |
character | No (default: “-”) | Delimiter used to split CRF pairs into individual CRF names | split_pair_by = "-" |
group_csv |
character | No (default: NULL) | Optional file path to CSV defining CRF groupings and categories for annotation | group_csv = "crf_groups.csv" |
crf_col |
character | No (default: “crf”) | Column name in group_csv for CRF identifiers |
crf_col = "crf" |
category_col |
character | No (default: “category”) | Column name in group_csv for group categories |
category_col = "category" |
The function generates the following output files in the specified
out_dir:
all_read_count.tsv
| file | read_count |
|---|---|
| H3K27ac-H3K4me3 | 60573 |
| H3K4me3-H3K4me3 | 940240 |
| H3K4me1-H3K4me3 | 432292 |
| H3K4me3-H3K9me3 | 415540 |
| H3K27me3-H3K4me3 | 574643 |
| … | |
| H3K9me2-H3K9me3 | 201788 |
filtered_read_count.tsv
filtered_percentile parameterqc_heatmap.pdf
(if plot = TRUE)
group_csv provided, includes categorical annotations
with colored blocks

# BAM files (typical use case for QC):
input_path <- "./dir_for_bam_files"
bam_files <- list.files(path = input_path, pattern = "\\.bam$", recursive = TRUE, full.names = TRUE)
# Optional: Create grouping CSV
crfs <- c("H3K4me1", "H3K4me3", "H3K9ac", "H3K9me2", "H3K9me3", "H3K27ac", "H3K27me3", "H3K36me3")
categories <- c(rep("Active", 3), rep("Repressive", 3), rep("Other", 2))
group_df <- data.frame(crf = crfs, category = categories)
write.csv(group_df, "crf_groups.csv", row.names = FALSE)
# Run QC with grouping
library(multiEpiCore)
qc_by_percentile(
file_paths = bam_files,
out_dir = "output/qc_results",
filtered_percentile = 0.25,
plot = TRUE,
group_csv = "crf_groups.csv"
)
The function automatically detects whether input files are BAM or BED format based on file extensions.
The filtered_percentile parameter removes samples
with the lowest read counts (e.g., 0.25 removes the bottom
25%).
If filtered_path is provided, it overrides the
percentile-based filtering for determining significant pairs in the
heatmap.
The heatmap displays only the lower triangle of the matrix, with filtered pairs shown in grey.
Peak counts are log2-transformed (log2(count + 1)) for visualization.
When group_csv is provided, tags are sorted by
category and displayed with colored block annotations.
The function requires the following R packages:
ComplexHeatmap, latex2exp,
circlize, and grid.
All file paths in file_path and
filtered_path must exist, or the function will
fail.
If output_path_dir is NULL, outputs are saved to the
directory containing the first input file.
The frag_decomposition() function performs a complete
fragment length analysis pipeline including quality control, valley
detection, and visualization for nucleosome positioning analysis from
BAM files.
What this function does:
Performs quality control filtering on BAM files based on coverage percentile
Computes fragment lengths from filtered paired-end BAM files
Detects two local minimum valleys for nucleosome fragment classification
Generates fragment length histograms with valley cutoff lines
Creates summary statistics report (valley positions, min/max fragment lengths)
Produces fragment decomposition data with percentages
Generates bar plots showing fragment distribution across samples
| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
file_path |
character vector | Yes | Paths to BAM files to analyze | file_path = c("sample1.bam", "sample2.bam") |
save_dir |
character | Yes | Directory path to save all output files and plots | save_dir = "./fragment_analysis" |
frag_decomp_file |
character | No | Path to generated fragment decomposition file (returned by function) | Auto-generated |
filtered_percentile |
numeric | No | Percentile threshold for quality control filtering (default: 0.25, removes bottom 25%) | filtered_percentile = 0.25 |
dens_reso |
numeric | No | Resolution for kernel density estimation in valley detection (default: 2^15 = 32768) | dens_reso = 2^15 |
target_pair_mapping_df |
data frame | No | Optional data frame for mapping sample names to display names (default: NULL) | target_pair_mapping_df = df |
density_kernel |
character | No | Kernel type for density estimation (default: “gaussian”) | density_kernel = "gaussian" |
valley1_range |
numeric vector | No | Search range for first valley in bp (default: c(73, 221)) | valley1_range = c(73, 221) |
valley2_range |
numeric vector | No | Search range for second valley in bp (default: c(222, 368)) | valley2_range = c(222, 368) |
The function generates the following output files in the specified
save_dir:
premerge_all-qc_frag_lens.RData
premerge_frag_hist.pdf

summary_report.csv
local_min1: First valley position (bp)local_min2: Second valley position (bp)min_frag_len: Minimum fragment length (bp)max_frag_len: Maximum fragment length (bp)
premerge_frag_hist_with_cutoff.pdf

*_frag_decomp_with_perc.tsv
subnucleo,
monomer, dimer+
save_dir/barplots/*.pdf

library(multiEpiCore)
frag_decomposition(file_path = bam_files_vector, save_dir = "output/frag_decomposition")
## Use in bash
## change with your length of scen_list
#SBATCH --array=0-1
module load samtools
# setup (change)
thres_name="valley-V-qc"
scen_list=( "T" "V" ) # input dir name
scen_simple_name_list=( "T" "V" ) # output name
target_list=( "IgG_control" "H3K36me3" "H3K4me1"... )
load_root_dir="./data_align" # directory path for scen_list
save_root_raw_dir="output/frag_decomposition/split_bam"
# main function
run_frag_decomposition() {
scen=${scen_list[${SLURM_ARRAY_TASK_ID}]}
scen_simple_name=${scen_simple_name_list[${SLURM_ARRAY_TASK_ID}]}
save_root_dir=${save_root_raw_dir}/${thres_name}
mkdir -p "${save_root_dir}"
load_dir=${load_root_dir}/${scen}
mixed_dir=${save_root_dir}/${scen_simple_name}_mixed
subneucleo_dir=${save_root_dir}/${scen_simple_name}_sub
mononeucleo_dir=${save_root_dir}/${scen_simple_name}_mono
dineucleo_dir=${save_root_dir}/${scen_simple_name}_di
mkdir -p "${mixed_dir}" "${subneucleo_dir}" "${mononeucleo_dir}" "${dineucleo_dir}"
cp -r ${load_dir}/* "${mixed_dir}/"
for target_1 in "${target_list[@]}"; do
for target_2 in "${target_list[@]}"; do
if [ "${target_1}" \< "${target_2}" ] || [ "${target_1}" = "${target_2}" ]; then
mixed_filename="${target_1}-${target_2}.bam"
mixed_dir_filename=${load_dir}/${mixed_filename}
if [[ ! -f "${mixed_dir_filename}" ]]; then
echo "Warning: ${mixed_dir_filename} not found, skip."
continue
fi
samtools view -h ${mixed_dir_filename} | awk 'substr($0,1,1)=="@" || ($9>=24 && $9<=126) || ($9<=-24 && $9>=-126)' | samtools view -b > ${subneucleo_dir}/${mixed_filename}
samtools view -h ${mixed_dir_filename} | awk 'substr($0,1,1)=="@" || ($9>=127 && $9<=297) || ($9<=-127 && $9>=-297)' | samtools view -b > ${mononeucleo_dir}/${mixed_filename}
samtools view -h ${mixed_dir_filename} | awk 'substr($0,1,1)=="@" || ($9>=298 && $9<=800) || ($9<=-298 && $9>=-800)' | samtools view -b > ${dineucleo_dir}/${mixed_filename}
fi
done
done
}
run_frag_decomposition