Create Genome Index file to map sequencing data

Posted Sep 3, 2022 Updated Mar 29, 2023

By Dewan Shrestha

Several tools are available to map sequencing data to a reference genome, and all of them need index files to be generated first. Here are some of these tools and the syntax used to generate their index files.

DNA Sequence

Burrows-Wheeler Alignment Tool (bwa)

  
bwa index hg38.fa
samtools faidx hg38.fa
cut -f1,2 hg38.fa.fai > hg38.sizes
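
To see what the last command extracts: a .fai file is a tab-separated table whose first two columns are the sequence name and its length in bases. Here is a minimal sketch using a made-up two-sequence index, so it runs without samtools. The file names toy.fa.fai and toy.sizes and all values in them are assumptions for illustration, not real hg38 data.

```shell
# Toy .fai with columns: name, length, byte offset, bases per line, bytes per line.
# The values are invented for illustration only.
printf 'chr1\t1000\t6\t60\t61\n'   >  toy.fa.fai
printf 'chr2\t500\t1029\t60\t61\n' >> toy.fa.fai

# Keep only the name and length columns, as in the hg38 command above.
cut -f1,2 toy.fa.fai > toy.sizes
cat toy.sizes
```

The resulting two-column file is the chromosome sizes format that many downstream tools expect.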

Bowtie2

bowtie2-build hg38.fa index_filename
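
bowtie2-build writes several files that share the prefix given as index_filename. The sketch below is one way to check that a build finished, assuming the small-index layout of six files ending in .bt2 (very large genomes get .bt2l files instead). The helper name missing_bt2_files is hypothetical, and the demonstration uses empty placeholder files rather than a real index.

```shell
# Print the index files that are missing for a given prefix.
# Assumes the small-index ".bt2" suffix; large indexes use ".bt2l" instead.
missing_bt2_files() {
    prefix=$1
    for suffix in 1 2 3 4 rev.1 rev.2; do
        [ -e "$prefix.$suffix.bt2" ] || echo "$prefix.$suffix.bt2"
    done
}

# Demonstration with empty placeholder files instead of a real index.
# The rev.2 file is left out on purpose so the check has something to report.
demo_dir=$(mktemp -d)
for suffix in 1 2 3 4 rev.1; do
    : > "$demo_dir/index_filename.$suffix.bt2"
done

missing_bt2_files "$demo_dir/index_filename"
```

For a real index, pass the same prefix you gave to bowtie2-build; no output means all six files are present.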

RNA Sequence

Spliced Transcripts Alignment to a Reference (STAR)

  
STAR --runThreadN 20 --runMode genomeGenerate --genomeDir index_files/star_2.5.3a --genomeFastaFiles hg38.fa --sjdbGTFfile gencode.v41.annotation.gtf --sjdbOverhang 99

Set --sjdbOverhang to the read length minus 1. The value 99 used above corresponds to 100 bp reads.
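
If you are not sure of the read length, it can be measured from the FASTQ file itself. This is a minimal sketch, assuming an uncompressed FASTQ with four lines per record. The file toy_reads.fastq and its two reads are invented for illustration.

```shell
# Build a toy FASTQ with two 100 bp reads (invented data standing in for real reads).
read_seq=$(head -c 100 /dev/zero | tr '\0' 'A')
read_qual=$(head -c 100 /dev/zero | tr '\0' 'I')
printf '@r1\n%s\n+\n%s\n@r2\n%s\n+\n%s\n' \
    "$read_seq" "$read_qual" "$read_seq" "$read_qual" > toy_reads.fastq

# Sequence lines are every 4th line starting at line 2; keep the longest length seen.
max_len=$(awk 'NR % 4 == 2 && length($0) > m { m = length($0) } END { print m + 0 }' toy_reads.fastq)

# The overhang is the longest read length minus 1.
overhang=$((max_len - 1))
echo "--sjdbOverhang $overhang"
```

For gzipped reads, decompress to a stream first (for example with gzip -dc) and pipe that into the same awk command.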

kallisto

kallisto doesn’t align sequencing reads to the genome; it performs pseudoalignment against an indexed transcriptome to quantify transcripts. You can download the cDNA FASTA sequences from the Ensembl site.

kallisto index -i index_filename hg38.cdna.fa

example: index_filename = hg38_index.idx

Reference

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4631051/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3322381/
