dFORCE: Single molecule multimodal timing of in vivo mRNA synthesis

A reproducible, end-to-end bioinformatic resource for the 2025 dFORCE manuscript, available on bioRxiv.
The code here is not production software; it serves as an archive of the pre-processing and analysis performed for dFORCE.
Paths in the Nextflow pipeline and bash scripts will need to be updated for your environment.

Repository layout

rna-biogenesis-maps_publication/
├── ROGUE1_minimal/              # utility for classifying splicing state and handling pre-mRNA sequencing data 
│   └── R1.py
│
├── dFORCE_nextflow/             # nextflow pipeline to process raw dFORCE POD5 data using Dorado, minimap2 and ROGUE1 
│   ├── basecall.nf             
│   ├── align_genome.nf          # align dFORCE reads to a genome using minimap2 
│   ├── run_ROGUE1*.nf           # run R1 in first-pass or second-pass mode 
│   ├── main_workflow.nf         
│   └── nextflow.config
│
├── downstream_analysis/         # commands to filter annotations and run nextflow pipelines (in .sh scripts) and run figure-specific analysis (in .ipynb)
│   ├── 0_dFORCE_preprocess_reads_*.sh        
│   ├── dFORCE_fig{1..6}_*.ipynb              
│   ├── dFORCE_fig3_m6A_preprocessing.sh      
│   ├── dFORCE_fig3_m6A_isoform_plots.R       
│   ├── dFORCE_fig6_m6A_preprocessing.sh      
│   └── … (extra notebooks & helpers)
│
└── igv/                                       # IGV to SVG utility for making publication-grade genome browser plots 
    └── IGV_to_svg.bat

Quick-start guide

Full pipeline and indexing from RNA004 POD5 input

The dFORCE Nextflow pipeline basecalls and aligns the data, then summarises it in multiple 'R1' read summary files, which are used for subsequent analysis.
The workflow for these steps is included in downstream_analysis/0_dFORCE_preprocess_reads_*.sh for the relevant species/sample.
R1 first runs in first-pass mode, which classifies RNA processing relative to a filtered GTF annotation.
dFORCE uses these data to curate a 'second-pass' isoform index, which enables R1 to be run in second-pass mode.
Most dFORCE analysis is based on these second-pass outputs, except the initial biotype classification and PCA analysis.
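To make "classifying RNA processing relative to an annotation" concrete, here is a minimal sketch of how a single read could be scored against a single annotated intron. It is an illustration only and is not taken from R1.py: the function name `classify_intron`, the category labels, and the `min_retained_fraction` threshold are all hypothetical, and coordinates are assumed to be 0-based, half-open.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def classify_intron(blocks: List[Interval], intron: Interval,
                    min_retained_fraction: float = 0.5) -> str:
    """Classify one read against one annotated intron (hypothetical scheme).

    blocks: aligned segments of the read on the reference, sorted by start.
    intron: annotated intron interval on the same reference and strand.
    """
    i_start, i_end = intron

    # The read must extend past both intron boundaries to be informative.
    if not blocks or blocks[0][0] >= i_start or blocks[-1][1] <= i_end:
        return "not_covered"

    # Spliced: two consecutive aligned blocks flank the intron exactly.
    for (_, left_end), (right_start, _) in zip(blocks, blocks[1:]):
        if left_end == i_start and right_start == i_end:
            return "spliced"

    # Unspliced: a sufficient fraction of the intron is covered by aligned bases.
    covered = sum(max(0, min(e, i_end) - max(s, i_start)) for s, e in blocks)
    if covered / (i_end - i_start) >= min_retained_fraction:
        return "unspliced"

    # Anything else (e.g. a shifted junction) is left unresolved.
    return "ambiguous"
```

A second-pass index matters for a scheme like this because the outcome depends entirely on which introns are in the annotation: the same read can be "spliced" against one isoform and "ambiguous" against another.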

Pre-processing pipeline in detail

The 0_dFORCE_preprocess_reads_*.sh scripts implement a two-pass indexing workflow:

| Stage | Purpose | Key outputs |
|---|---|---|
| (a) GTF filtering | Remove non-Ensembl/HAVANA transcripts; retain miRBase genes | `annotation/filtered_annotation.gtf` |
| (b) First-pass Nextflow | GPU basecall and align every replicate to the genome; run ROGUE1 once | `first_pass/<sample>/.bam`, `_first_pass.txt` |
| (c) Merge & re-score | Merge total-RNA BAMs; select best isoform per gene with `filter_gtf_using_totalRNA_alignments.py` | `second_pass_index/filter_gtf/new_anno.gtf` |
| (d) Poly(A) clustering | Cluster pre-mRNA-A sites (`cluster_polyA_sites.py`) and adjust 3′ UTRs | BED + GTF cluster files in `annotation_with_clusters/` |
| (e) Second-pass Nextflow | Re-run each sample with `--R1_index <annotation_with_clusters>` to refine read classification | `second_pass/*second_pass.txt`, updated BAMs |
| (f) Optional “unfiltered” run | Repeat first-pass using the vanilla Ensembl 113 GTF for biotype plots | `ens113_unfiltered_first_pass/` |
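Stages (a) and (d) can be sketched in a few lines of Python. Both functions below are assumptions made for illustration and do not reproduce the repository's scripts: `filter_gtf_lines` assumes that filtering is done on the GTF source column (column 2) with the source names listed in `KEEP_SOURCES`, and `cluster_sites` assumes a simple greedy single-linkage clustering with a hypothetical `max_gap` of 25 nt, for sites on one chromosome and strand.

```python
from typing import Iterable, Iterator, List

# Assumed source-column values to keep; check these against your own GTF.
KEEP_SOURCES = frozenset({"ensembl", "havana", "ensembl_havana", "mirbase"})


def filter_gtf_lines(lines: Iterable[str],
                     keep_sources: frozenset = KEEP_SOURCES) -> Iterator[str]:
    """Yield header lines and records whose source column is in keep_sources."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep header/comment lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed records (GTF has 9 tab-separated columns)
        if fields[1].lower() in keep_sources:
            yield line


def cluster_sites(positions: Iterable[int], max_gap: int = 25) -> List[List[int]]:
    """Group sorted positions; a site joins the current cluster if it lies
    within max_gap of that cluster's last site, otherwise it starts a new one."""
    clusters: List[List[int]] = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters
```

For example, `cluster_sites([100, 105, 130, 500, 510])` groups the first three sites together and the last two together. In practice the clustering would be run per chromosome and strand, and the filtered GTF would be written back to disk line by line.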

Figure reproduction

Most figure notebooks read the TSV/BED/BAM artefacts produced by the human or mouse preprocessing scripts.

PATH conventions and best practice

Absolute HPC paths are left in the example scripts so that reviewers can replay our December 2024 production runs. Change them to your own scratch/project space as needed; the first ~20 lines of every shell script contain all path variables.
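Before editing a script, it can help to list the path variables that need changing. The helper below is a hypothetical convenience and is not part of the repository: it assumes path variables are plain `NAME=/absolute/path` assignments (optionally quoted or prefixed with `export`) within the first 20 lines.

```python
import re
from typing import Dict

# Matches e.g.  OUT=/scratch/x   export REF="/data/genome.fa"   TMP='/tmp/run'
ASSIGN = re.compile(r"""^\s*(?:export\s+)?([A-Za-z_]\w*)=(["']?)(/[^"'\s]*)\2""")


def list_path_variables(script_text: str, head_lines: int = 20) -> Dict[str, str]:
    """Return {variable: absolute path} for assignments in the script header."""
    found: Dict[str, str] = {}
    for line in script_text.splitlines()[:head_lines]:
        match = ASSIGN.match(line)
        if match:
            found[match.group(1)] = match.group(3)
    return found
```

Reading a script with `open(path).read()` and passing the text to `list_path_variables` gives a checklist of values to point at your own scratch or project space. Commented-out assignments and non-path values such as thread counts are ignored.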

Citing and licensing

All original code is released under the MIT Licence; third-party tools retain their own licences. Cite this repository as:

Sethi A.J. et al. (2025) dFORCE https://github.com/compRNA/dFORCE
