On manuscript, Hu et al., Xing Lab
EMT Pathways and Hidden Complexity
This study challenges a fundamental assumption in single-cell biology: that low-dimensional visualizations faithfully represent the underlying cellular dynamics. Dimensionality reduction is useful for visualizing high-dimensional data, but it can distort the underlying biological processes and erase important nuance.
Using epithelial-to-mesenchymal transition (EMT) as a test case, the authors reveal critical insights about how cells change their identity and the computational methods we use to study these processes (pseudotime analysis, optimal transport).
Another class of methods integrates dynamical information with single-cell gene expression (gex) data for trajectory inference. Although a single-cell measurement is only a snapshot, RNA velocity exploits the ratio of unspliced to spliced RNA to infer the direction in which each cell's expression is changing.
Example: dynamo (Qiu et al.) takes velocity measurements as input and uses ML to construct a vector field across the expression landscape, like a flow map showing how cells move through different states. This embeds static and dynamic information together, creating a more holistic picture. However, these dynamics-inference methods are still run on a low-dimensional manifold, which may not faithfully separate distinguishable cell states. In essence, more emphasis needs to be placed on higher-dimensional analysis (more than 20 dimensions).
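To make the unspliced:spliced idea concrete, here is a minimal sketch of the commonly described steady-state formulation, where a gene's velocity is its unspliced count minus a fitted ratio (gamma) times its spliced count. The function names and toy counts are made up for illustration, and the plain least-squares fit is a simplification; this is not the pipeline used in the paper.

```python
def fit_gamma(unspliced, spliced):
    """Least-squares slope through the origin of unspliced vs. spliced counts.

    Under the steady-state assumption u = gamma * s, so the slope estimates
    gamma (degradation rate relative to splicing rate) for one gene.
    """
    numerator = sum(u * s for u, s in zip(unspliced, spliced))
    denominator = sum(s * s for s in spliced)
    return numerator / denominator


def velocity(u, s, gamma):
    """Velocity of one gene in one cell: positive means expression is rising."""
    return u - gamma * s


# Toy counts (assumed) for one gene across three cells taken to be at steady state.
steady_u = [1.0, 2.0, 3.0]
steady_s = [2.0, 4.0, 6.0]
gamma = fit_gamma(steady_u, steady_s)

# A cell with excess unspliced RNA is inferred to be up-regulating the gene;
# a cell with a deficit is inferred to be down-regulating it.
rising = velocity(3.0, 2.0, gamma)
falling = velocity(0.5, 4.0, gamma)
```

The sign of the velocity is what gives each cell an arrow in expression space, even though each cell is measured only once.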
This study combines low- and high-dimensional methods to study the dynamics of EMT using scRNA-seq data. EMT is an attractive system because of its involvement in many processes, including development, wound healing, tissue fibrosis and cancer metastasis. During EMT, epithelial cells lose epithelial traits and acquire mesenchymal ones: they detach from the basement membrane (attachment to it is characteristic of epithelial cells), elongate and become more motile. The process is regulated by a gene regulatory network (GRN) involving transcription factors and microRNAs.
In oral cancer cells, TGF-β has been shown to induce EMT together with G1-phase cell cycle arrest. In kidney fibrosis, by contrast, Snai1-induced partial EMT has been linked to G2/M-phase cell cycle arrest. So basically, the point at which cells stop dividing during EMT seems to be context-dependent: different cell types or conditions may trigger EMT through different checkpoints, and there may be no one universal EMT mechanism.
This leads to the main question: does EMT proceed through a single path or through multiple paths? Previous pseudotime analyses of human mammary epithelial MCF10A cells and NSCLC cells concluded that EMT proceeds along a one-dimensional continuum. However, live-cell imaging studies in a human lung carcinoma epithelial cell line, A549, found two parallel paths connecting the initial epithelial state to the final mesenchymal state.
This paper uses dynamo to perform vector field analyses on scRNA-seq data of MCF10A cells treated with increasing concentrations of TGF-β. The authors constructed a pipeline to obtain a cell-cycle-informed representation, which was then used to predict two cell-cycle-associated EMT paths, one through the G1/S checkpoint and one through the G2/M checkpoint. These paths were confirmed in high-dimensional space and by live-cell imaging and 4i staining.
Two Paths, Not One
The researchers analyzed three scRNA-seq datasets:
- Static MCF10A dataset: Human mammary epithelial cells treated with increasing TGF-β concentrations (0-800 pM) for 14 days
- Time-course MCF10A dataset: Cells treated with 200 pM TGF-β over 3 days
- Time-course A549 dataset: Lung cancer cells treated with 400 pM TGF-β over 7 days
The authors developed a four-step method to explicitly capture cell cycle progression:
- Input to Revelio: Use raw spliced mRNA counts to identify cell cycle-dependent variations. This assumes a cylindrical shape in high-dimensional space.
- Coordinate extraction: Apply an iterative finite-temperature string method to obtain a 1D cyclic cell cycle coordinate (G1→S→G2→M→G1). This is done by tracing a closed path around the cylinder.
- Linearization: Identify the division point, cut the circle there, and unroll the coordinate onto a linear x-axis.
- 2D representation: Pair the cell cycle coordinate with an EMT-related dimension as the y-axis. This gives a 2D plot where x-axis = cell-cycle position and y-axis = EMT progression.
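The last three steps can be sketched roughly as follows. This is only an illustration of the idea: a plain polar angle in an assumed 2D cell-cycle plane stands in for the string method, and the cell values, the division point, and all function names are hypothetical.

```python
import math


def cyclic_coordinate(c1, c2):
    """Map a point in a 2D cell-cycle plane to a phase in [0, 1)."""
    angle = math.atan2(c2, c1) % (2.0 * math.pi)
    return angle / (2.0 * math.pi)


def linearize(phase, division_phase):
    """Cut the circle at the division point so that it maps to 0."""
    return (phase - division_phase) % 1.0


# Hypothetical cells: (cell-cycle component 1, component 2, EMT score).
cells = [(1.0, 0.0, 0.1), (0.0, 1.0, 0.4), (-1.0, 0.0, 0.7), (0.0, -1.0, 0.9)]
division_phase = 0.25  # assumed location of cell division on the circle

# Each cell becomes (x, y) = (linear cell-cycle position, EMT progression).
embedding = [
    (linearize(cyclic_coordinate(c1, c2), division_phase), emt)
    for c1, c2, emt in cells
]
```

Cutting at the division point matters because it places the discontinuity of the unrolled axis where a real biological discontinuity (one cell becoming two) already exists.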
Using the dynamo framework, they:
- Constructed continuous vector fields from RNA velocity data
- Performed trajectory simulations in high-dimensional space (30+ dimensions)
- Applied transition path analysis to identify mean pathways
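For intuition about what a trajectory simulation through a vector field involves, here is a minimal sketch using forward-Euler integration. The two-dimensional toy field (steady drift along x, relaxation of y toward 1) is invented for illustration and is not the field learned by dynamo, whose own integrator may differ.

```python
def simulate(field, start, dt=0.01, steps=500):
    """Forward-Euler integration of dx/dt = field(x) from `start`."""
    state = list(start)
    path = [tuple(state)]
    for _ in range(steps):
        velocity = field(state)
        state = [x + dt * v for x, v in zip(state, velocity)]
        path.append(tuple(state))
    return path


def toy_field(state):
    """Hypothetical 2D field: constant drift in x, relaxation of y toward 1."""
    x, y = state
    return [1.0, 1.0 - y]


path = simulate(toy_field, start=(0.0, 0.0))
end_x, end_y = path[-1]
```

The same loop works in any number of dimensions, since the state is just a list.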
In standard 2D UMAP visualizations of untreated MCF10A cells, the vector field shows two stable fixed points, yet one of them sits among actively proliferating cells. This is biologically impossible, as proliferating cells should show circular dynamics and not convergent flows. The circulation present in the 3D cylindrical vector field is lost on projection to 2D, which creates artificial convergent regions.
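One way such an artifact can arise is shown in the toy construction below, which is not the paper's analysis. Cells rotate around a circular cross-section of the cylinder, so every cell is moving, but once one coordinate is discarded, cells from the front and back of the circle overlap and their velocities cancel on averaging. The constants and bin layout are assumptions chosen for illustration.

```python
import math

OMEGA = 1.0      # assumed angular speed around the cell cycle
N_CELLS = 360    # cells spread evenly around a unit circle (cylinder cross-section)
N_BINS = 10      # bins along the single axis kept after projection

bin_sums = [0.0] * N_BINS
bin_counts = [0] * N_BINS
speeds = []

for i in range(N_CELLS):
    theta = 2.0 * math.pi * (i + 0.5) / N_CELLS
    x, y = math.cos(theta), math.sin(theta)
    vx, vy = -OMEGA * y, OMEGA * x  # pure rotation: every cell is moving
    speeds.append(math.hypot(vx, vy))

    # Projection keeps x and discards y, so cells on the front and back of
    # the circle land in the same bin with opposite vx.
    b = min(int((x + 1.0) / 2.0 * N_BINS), N_BINS - 1)
    bin_sums[b] += vx
    bin_counts[b] += 1

# Average velocity seen in each bin of the projected view.
projected_vx = [s / c for s, c in zip(bin_sums, bin_counts)]
```

In the full space every cell has speed OMEGA, while the averaged projected velocities are essentially zero, so the projected view suggests cells that have stopped moving when they are in fact cycling.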
They found:
- G1/S path: Cells arrest at the G1/S checkpoint, then undergo EMT. This was the dominant path, ~90-92% of trajectories.
- G2/M path: Cells arrest at the G2/M checkpoint, then undergo EMT. This accounted for ~8-10% of trajectories. What I found cool is that gex shifts from G2/M-like to S-phase-like to G1/S-like during EMT.
Note: Low TGF-β MCF10A: Both paths present (~92% G1/S, ~8% G2/M); High TGF-β MCF10A: Only G1/S path detected; A549 cells: Both paths present (~60% G1/S, ~40% G2/M).
Both paths converge on a similar final state, a G1/S-like mesenchymal cell, but take different routes to get there. Really interesting paper: it tackles both the open question of whether EMT proceeds through one path or two and the distortions introduced by dimensionality reduction.
Questions
- The cell cycle coordinate pipeline assumes a cylindrical manifold geometry. Would this hold universally for other cell types and gex datasets? If it does not, would explicit cell cycle descriptions be needed?
- Is identifying the molecular mechanisms that determine which EMT path a cell takes a goal of the paper?
- I'm curious about the temporal dynamics of the pathways; the G2/M path involves mitotic skipping, so would cells on this path take longer to complete EMT?
- Could mathematical modeling predict pathway ratios based on cellular parameters or environmental conditions? That is, the opposite direction from what dynamo does?