In RNA-seq gene expression analysis, we encounter different expression units such as RPM, RPKM, FPKM, TPM, and raw read counts. More often than not, the simple methods used to calculate these units from mapped sequencing data are confusing to interpret.
You have probably seen many posts on these normalization issues and the frustration they cause among readers.
What do different normalized expression units do?
Expression units provide a digital measure of transcript abundance. Normalized expression units are needed to remove technical biases in sequencing data, such as sequencing depth (greater sequencing depth yields higher read counts for genes expressed at the same level) and gene length (genes expressed at the same level produce different read counts depending on their length; the longer the gene, the higher the read count).
There are three metrics (RPKM, FPKM, and TPM) that aim to normalize for sequencing depth and gene length. Here’s how you calculate RPKM:
- Count the total reads in the sample and divide that number by 1,000,000; this is our “per million” scaling factor.
- Divide the read counts by the “per million” scaling factor. This normalizes for sequencing depth, giving you reads per million (RPM).
- Divide the RPM values by the length of the gene in kilobases. This gives you RPKM.
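The three steps above can be sketched in Python as follows. This is a minimal illustration, assuming per-gene read counts and gene lengths (in bp) for one sample are already available as dictionaries that cover all mapped reads; the function name `rpkm` and the toy gene names and numbers are hypothetical, and real pipelines usually work on count matrices produced by a quantification tool.

```python
def rpkm(counts, lengths_bp):
    """Compute RPKM for every gene in one sample.

    counts: dict mapping gene name -> number of reads mapped to that gene
    lengths_bp: dict mapping gene name -> gene length in base pairs
    Assumes the counts cover all mapped reads in the sample.
    """
    # Step 1: total mapped reads / 1,000,000 = the "per million" scaling factor
    per_million = sum(counts.values()) / 1_000_000
    result = {}
    for gene, count in counts.items():
        # Step 2: normalize for sequencing depth -> reads per million (RPM)
        rpm = count / per_million
        # Step 3: normalize for gene length in kilobases -> RPKM
        result[gene] = rpm / (lengths_bp[gene] / 1_000)
    return result
```

For example, hypothetical counts of 500, 1,500 and 8,000 reads for genes of 2,000, 1,000 and 4,000 bp give RPKM values of about 25,000, 150,000 and 200,000. The values are large only because this toy library has just 10,000 reads in total.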
Gene expression units and calculation
RPM (reads per million)
RPM = (number of reads mapped to gene × 10^6) / total number of mapped reads
- It does not take transcript length into account for normalization.
- It is suitable for sequencing protocols in which the number of reads produced does not depend on gene length.
RPKM (Reads per kilobase per million)
RPKM = (number of reads mapped to gene × 10^3 × 10^6) / (total number of mapped reads × gene length in bp)
Here, 10^3 normalizes for gene length (converting base pairs to kilobases), and 10^6 is the sequencing depth factor (per million mapped reads).
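As a worked example with hypothetical numbers, suppose 500 reads map to a 2,000 bp gene in a sample with 10,000,000 mapped reads:

```latex
\begin{aligned}
\text{RPM}  &= \frac{500 \times 10^{6}}{10^{7}} = 50 \\
\text{RPKM} &= \frac{500 \times 10^{3} \times 10^{6}}{10^{7} \times 2000}
             = \frac{5 \times 10^{11}}{2 \times 10^{10}} = 25
\end{aligned}
```

This matches the step-wise route: RPM divided by the gene length in kilobases, 50 / 2 = 25.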
FPKM (fragments per kilobase per million mapped reads) is similar to RPKM and is used in particular for paired-end RNA-seq studies. In paired-end RNA-seq, two reads (left and right) are sequenced from the same DNA fragment. When we map paired-end data, either both reads or only one read from a high-quality fragment will map to the reference.
RPKM was made for single-end RNA-seq, where each read corresponds to a single fragment that has been sequenced. FPKM was designed for paired-end RNA-seq, where two reads correspond to a single fragment or, if one read in a pair does not map, one read may correspond to a single fragment.
To avoid confusion and double counting, FPKM counts fragments (whether both reads or only a single read mapped) rather than individual reads.
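A minimal Python sketch of fragment counting followed by FPKM. It assumes each mapped read has already been assigned to a gene and is given as a hypothetical `(fragment_id, gene)` pair, that both mates of a fragment map to the same gene, and that the total number of mapped fragments equals the sum over genes. The function names and input format are illustrative only and do not reflect any specific tool.

```python
from collections import defaultdict


def count_fragments(read_hits):
    """Count fragments per gene from paired-end read assignments.

    read_hits: iterable of (fragment_id, gene) pairs, one per mapped read.
    A fragment is counted once per gene whether one or both of its mates
    mapped there, so the two reads of a pair are not double counted.
    """
    fragments_per_gene = defaultdict(set)
    for fragment_id, gene in read_hits:
        fragments_per_gene[gene].add(fragment_id)
    return {gene: len(ids) for gene, ids in fragments_per_gene.items()}


def fpkm(fragment_counts, lengths_bp):
    """FPKM = fragments x 10^3 x 10^6 / (total mapped fragments x gene length in bp).

    Assumes the total number of mapped fragments is the sum over genes.
    """
    total = sum(fragment_counts.values())
    return {
        gene: n * 1e3 * 1e6 / (total * lengths_bp[gene])
        for gene, n in fragment_counts.items()
    }


# Hypothetical reads: f1 has both mates on geneA, f2 only one mapped mate,
# f3 both mates on geneB. Counting reads would give 3 and 2.
hits = [("f1", "geneA"), ("f1", "geneA"), ("f2", "geneA"),
        ("f3", "geneB"), ("f3", "geneB")]
counts = count_fragments(hits)  # {'geneA': 2, 'geneB': 1}
print(fpkm(counts, {"geneA": 2000, "geneB": 1000}))
```

Storing fragment IDs in a set is what prevents the two mates of one fragment from being counted twice.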
- RPKM takes gene length into account for normalization.
- RPKM is suitable for sequencing protocols in which the number of reads depends on gene length.
- RPKM is used for single-end RNA-seq studies (FPKM for paired-end RNA-seq data).
- RPKM/FPKM may be biased in the detection of differentially expressed genes because the total of the normalized values differs from sample to sample (Bullard et al., 2010).
TPM (transcripts per million)
TPM = A × (1 / ∑A) × 10^6
where A = (total reads mapped to gene × 10^3) / gene length in bp, and ∑A is the sum of A over all genes in the sample.
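The TPM formula translates directly into code. This is a sketch assuming hypothetical dictionary inputs (reads per gene and gene length in bp) for a single sample; `tpm` is an illustrative name, not a library function.

```python
def tpm(counts, lengths_bp):
    """Compute TPM for every gene in one sample.

    counts: dict mapping gene name -> reads mapped to that gene
    lengths_bp: dict mapping gene name -> gene length in base pairs
    """
    # A = reads mapped to gene x 10^3 / gene length in bp (length-normalized counts)
    a = {gene: counts[gene] * 1e3 / lengths_bp[gene] for gene in counts}
    # Sum of A over all genes in the sample
    total_a = sum(a.values())
    # TPM = A x (1 / sum(A)) x 10^6
    return {gene: value / total_a * 1e6 for gene, value in a.items()}


# Hypothetical toy sample
example = tpm({"geneA": 500, "geneB": 1500, "geneC": 8000},
              {"geneA": 2000, "geneB": 1000, "geneC": 4000})
print(round(sum(example.values())))  # 1000000
```

Because every A value is divided by the same sample-wide sum, the TPM values of a sample always add up to 10^6.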
- TPM takes gene length into account for normalization.
- TPM was developed as an alternative to RPKM because RPKM measurements are inconsistent among samples (Wagner et al., 2012).
- TPM is suitable for sequencing protocols in which the number of reads depends on gene length.
TPM has an intuitive interpretation in terms of transcript abundance. As the name suggests, if you were to sequence one million full-length transcripts, TPM is the number of transcripts of type i you would have seen, given the abundances of the other transcripts in the sample. The last part, the “given”, is important.
In other words, the only difference when you calculate TPM is that you normalize for gene length first and for sequencing depth second. The implications of this difference, however, are significant.
With TPM, the sum of all TPM values is the same (10^6) in every sample. This makes it possible to compare the proportion of reads assigned to a gene across samples (StatQuest, 2015).
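To see the practical consequence of the order of operations, the sketch below computes RPKM and TPM for two hypothetical samples containing the same three genes, where the second sample was sequenced roughly twice as deeply. The counts, gene names, and function names are made up for illustration.

```python
def rpkm(counts, lengths_bp):
    # Depth first: divide by the "per million" scaling factor, then by length in kb
    per_million = sum(counts.values()) / 1e6
    return {g: (c / per_million) / (lengths_bp[g] / 1e3) for g, c in counts.items()}


def tpm(counts, lengths_bp):
    # Length first: reads per kilobase (RPK), then divide by the "per million" factor of the RPKs
    rpk = {g: c / (lengths_bp[g] / 1e3) for g, c in counts.items()}
    per_million = sum(rpk.values()) / 1e6
    return {g: v / per_million for g, v in rpk.items()}


# Hypothetical gene lengths (bp) and read counts for two samples
lengths = {"geneA": 2000, "geneB": 4000, "geneC": 1000}
sample1 = {"geneA": 10, "geneB": 12, "geneC": 30}
sample2 = {"geneA": 20, "geneB": 25, "geneC": 60}

for name, sample in [("sample1", sample1), ("sample2", sample2)]:
    print(name,
          "RPKM total:", round(sum(rpkm(sample, lengths).values()), 1),
          "TPM total:", round(sum(tpm(sample, lengths).values()), 1))
```

With these numbers, both TPM totals are 1,000,000 by construction, while the RPKM totals come out at about 730,769 and 726,190. The same RPKM value therefore does not represent the same proportion of reads in the two samples.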
Bullard, J. H., Purdom, E., Hansen, K. D., & Dudoit, S. (2010). Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics, 11, 94. https://doi.org/10.1186/1471-2105-11-94
StatQuest. (2015, July 9). RPKM, FPKM and TPM, clearly explained. StatQuest. https://statquest.org/rpkm-fpkm-and-tpm-clearly-explained/
Wagner, G. P., Kin, K., & Lynch, V. J. (2012). Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples. Theory in Biosciences, 131(4), 281-285. https://doi.org/10.1007/s12064-012-0162-3