Scientists use math to increase accuracy of data analysis results for biomedical research - Genetics News

Kyoto: Since scientists first mapped the entire human genome, attention has turned to the question of how cells use this master copy of genetic instructions. It is known that when genes are turned on, parts of the DNA sequences in the cell nucleus are copied into shorter chain-like molecules, RNA, which carry the instructions for producing molecules essential to cell-specific survival and function.

Understanding the patterns of RNAs in a cell can show which genes are active and allow researchers to infer what the cell is doing. RNA sequencing, which measures RNA using massively parallel DNA sequencers, has become a standard technique over the last decade. More recently, rapid technological advances have made it possible to sequence RNA from thousands of individual cells in parallel, accelerating progress in biomedical science. But quantifying RNA from such small amounts of material poses big technical challenges. Even with state-of-the-art equipment, single-cell RNA sequencing data contain significant detection errors, including the so-called “dropout effect,” in which RNA from a gene that is actually expressed goes undetected and is recorded as zero. Furthermore, small errors in the measurements of a large number of genes can quickly add up, so that useful information is lost in the noise.
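To make sampling noise and dropouts more concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the read count of a gene in one cell is a Poisson draw around that gene's true expression level. This is a common simplifying model of sampling noise, not necessarily the one the Kyoto team uses, and the function names `poisson_sample` and `dropout_rate` are hypothetical. Under this assumption, a gene that is genuinely expressed at a low level is still recorded as zero in most cells.

```python
import math
import random


def poisson_sample(rng: random.Random, lam: float) -> int:
    """Draw one Poisson(lam) count with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def dropout_rate(lam: float, n_cells: int, seed: int = 0) -> float:
    """Fraction of cells in which a gene with true mean `lam` is observed as zero.

    Hypothetical helper: every cell truly expresses the gene, so every zero is a dropout.
    """
    rng = random.Random(seed)
    zeros = sum(poisson_sample(rng, lam) == 0 for _ in range(n_cells))
    return zeros / n_cells


if __name__ == "__main__":
    for lam in (0.1, 0.5, 2.0):
        observed = dropout_rate(lam, n_cells=10_000)
        expected = math.exp(-lam)  # Poisson probability of observing a zero count
        print(f"true mean {lam:>4}: zero fraction {observed:.3f} (model predicts {expected:.3f})")
```

Under this model the probability of a zero is e^(-λ), so a gene with a true mean of 0.1 counts per cell reads as zero in roughly 90% of cells even though every cell expresses it.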

Now, a team from Kyoto University’s Institute for the Advanced Study of Human Biology (WPI-ASHBi) has developed a new mathematical method that removes noise from single-cell RNA sequencing data, allowing clear signals to be extracted. The method reduces random sampling noise in the data, enabling an accurate and comprehensive understanding of a cell’s activity. The research was recently published in the journal Life Sciences Alliance.

The paper’s lead author, Yusuke Imoto of ASHBi, explains: “Each gene represents a different dimension in RNA sequencing data, which means tens of thousands of dimensions need to be collected and analyzed across many cells. Even the smallest noise in one dimension can have a huge impact on downstream data analyses and cause potentially important signals to be missed. This is what we call the ‘curse of dimensionality.’”
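Imoto’s point can be illustrated with a toy calculation. The sketch below is hypothetical and is not RECODE: it assumes two cells whose true profiles differ in only five genes, and it adds small, independent Gaussian noise (standard deviation `sigma`) to every gene. The names `noisy_distance` and `compare` are made up for this example. Because the noise contributions add up across genes, the measured distance between the two cells grows with the number of genes while the true difference stays fixed.

```python
import math
import random


def noisy_distance(true_a, true_b, sigma, rng):
    """Euclidean distance between two profiles after adding independent Gaussian noise to every gene."""
    total = 0.0
    for a, b in zip(true_a, true_b):
        diff = (a + rng.gauss(0.0, sigma)) - (b + rng.gauss(0.0, sigma))
        total += diff * diff
    return math.sqrt(total)


def compare(n_genes, sigma=0.1, n_signal=5, effect=1.0, seed=0):
    """Return (true distance, noisy distance) for two cells that differ only in `n_signal` genes."""
    rng = random.Random(seed)
    true_a = [0.0] * n_genes
    true_b = [effect if i < n_signal else 0.0 for i in range(n_genes)]
    signal = math.sqrt(n_signal) * effect  # true distance, independent of n_genes
    observed = noisy_distance(true_a, true_b, sigma, rng)
    return signal, observed


if __name__ == "__main__":
    for d in (10, 1_000, 20_000):
        signal, observed = compare(d)
        print(f"{d:>6} genes: true distance {signal:.2f}, noisy distance {observed:.2f}")
```

Under these assumptions the expected squared distance is roughly signal² + 2·d·σ². With σ = 0.1 and 20,000 genes, the noise term (2 × 20,000 × 0.01 = 400) swamps the true signal (5), even though each gene’s noise is small.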

To break the curse of dimensionality, the Kyoto team developed a new noise reduction method, RECODE, which stands for “resolve the curse of dimensionality,” to remove random sampling noise from single-cell RNA sequencing data. RECODE applies high-dimensional statistical theories to obtain accurate results, even for genes expressed at very low levels.

The team first tested their method on data from a widely studied cell population, human peripheral blood cells. They confirmed that RECODE resolves the curse of dimensionality, revealing individual gene expression patterns close to their expected values.

Next, when compared with other state-of-the-art analysis methods, RECODE provided much more faithful representations of gene activation. RECODE is also easier to use than other methods because it does not require parameter tuning or rely on machine learning.

Finally, the team tested RECODE on a complex dataset of mouse embryonic cells that contained many different cell types with unique patterns of gene expression. While other methods muddied the results, RECODE clearly resolved gene expression levels, even for rare cell types.

Imoto concludes: “Analysis of single-cell RNA sequencing data remains a technical challenge and is still a developing technique, but our RECODE algorithm is a step toward revealing the true behavior of single cells. With our contribution, single-cell RNA sequencing data analysis could become a powerful research tool with massive implications in many biological fields.”

Tomonori Nakamura, a senior author of the study and a biologist at ASHBi and Kyoto University’s Hakubi Center for Advanced Study, adds: “By unlocking the true power of single-cell RNA sequencing, RECODE will enable researchers to discover previously unidentified rare cell types, which will lead to the development of new fields of basic scientific research, as well as clinical research and drug discovery.”

RECODE calculation programs (Python/R code and a desktop application) are available on GitHub (

Story Source:

Materials provided by Kyoto University. Note: Content may be edited for style and length.
