How can I include the discoveries described below in my dissertation without having data?
Please click here (http://singledrug.com/media/Temporal_Analysis_of_Aging_description.pdf) to see the latest draft of my dissertation.
I plan to add the following:
I found that Osh6 ranks very high among lifespan-extending genes. That is why I want to take all the genes that share Osh6's molecular function and run an expression similarity analysis in SPELL (Serial Pattern of Expression Levels Locator; see spell.yeastgenome.org/).
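As a rough illustration of the query step, the sketch below ranks candidate genes by their average Pearson correlation with a set of query genes across time points. It is a simplified guilt-by-association score, assuming a plain dictionary of expression profiles; it is not SPELL's actual algorithm, which, as I understand it, additionally weights each dataset by how coherently the query genes are co-expressed in it. Apart from OSH6, the gene names are hypothetical placeholders and all values are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_by_query(profiles, query):
    """Rank every non-query gene by its mean correlation with the query genes."""
    scores = {
        gene: sum(pearson(prof, profiles[q]) for q in query) / len(query)
        for gene, prof in profiles.items()
        if gene not in query
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical profiles over 6 time points; LIPID1, UNK1, UNK2 are placeholders.
profiles = {
    "OSH6":   [1.0, 2.0, 4.0, 2.0, 1.0, 0.5],
    "LIPID1": [1.1, 2.2, 3.9, 2.1, 0.9, 0.6],
    "UNK1":   [0.9, 1.8, 4.2, 1.9, 1.1, 0.4],
    "UNK2":   [4.0, 2.0, 1.0, 2.0, 4.0, 5.0],
}
ranking = rank_by_query(profiles, {"OSH6", "LIPID1"})
print(ranking)  # UNK1 (co-expressed) first, UNK2 (anti-correlated) last
```

Averaging over the query set rather than using a single seed gene makes the score less sensitive to one noisy profile.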
I hope to find genes of unknown function that could be involved in the same molecular function. If I find them, I will validate them with my time series plots and a Transcription Factor Binding Site (TFBS) similarity analysis. That way I hope to predict the function of an unknown gene in lipid metabolism. I will also try the opposite approach, i.e. take genes of unknown function, look for groups of similarly expressed genes, and check whether these belong to the same GO term.
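For the opposite approach, one common way to decide whether the co-expressed neighbours of an unknown gene point to a GO term is a hypergeometric (one-sided Fisher) enrichment test. A minimal sketch, assuming the neighbour list has already been obtained (e.g. the top 20 correlated genes); the gene identifiers, the 40-gene term and the neighbour count are hypothetical, and only the genome size (about 5,500 genes) comes from the text.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes out of N, of which K carry the GO term."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


def go_enrichment(neighbors, go_members, genome_size):
    """Overlap count and enrichment p-value of a GO term among co-expressed neighbours."""
    hits = len(set(neighbors) & set(go_members))
    return hits, hypergeom_sf(hits, genome_size, len(go_members), len(neighbors))


# Hypothetical: 20 nearest neighbours of an unknown gene, 8 of them in a 40-gene term.
neighbors = [f"G{i}" for i in range(20)]
go_term = [f"G{i}" for i in range(8)] + [f"X{i}" for i in range(32)]
hits, p = go_enrichment(neighbors, go_term, 5500)
print(hits, p)
```

When many GO terms are tested for the same gene, the p-values should be corrected for multiple testing (e.g. Bonferroni or Benjamini-Hochberg) before a function is proposed.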
I added 19 pages to the second chapter. I was able to further improve the accuracy of my machine learning algorithm for predicting yeast lifespan by adding more lipid-related features. In my last draft the average prediction error was ±5 replications; now it is less than ±3 replications for the 50th percentile (median) of the replicative lifespan. To my knowledge, nobody has published research on using machine learning and feature selection to predict replicative age.
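A "±N replications" figure is easiest to compare across drafts if it is always computed the same way. A minimal sketch, assuming a leave-one-out cross-validated mean absolute error and a k-nearest-neighbour regressor as a stand-in model (not the algorithm described in the dissertation); the strains, feature values and lifespans are made up. Comparing the error of different feature subsets in this way is one simple form of feature selection.

```python
from math import dist


def knn_predict(train_X, train_y, x, k):
    """Predict median RLS as the mean of the k nearest training strains (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return sum(train_y[i] for i in order[:k]) / k


def loo_mae(X, y, features, k=3):
    """Leave-one-out mean absolute error (in replications) using only `features`."""
    Xs = [[row[j] for j in features] for row in X]
    errors = []
    for i in range(len(Xs)):
        pred = knn_predict(Xs[:i] + Xs[i + 1:], y[:i] + y[i + 1:], Xs[i], k)
        errors.append(abs(pred - y[i]))
    return sum(errors) / len(errors)


# Hypothetical strains: feature 0 tracks lifespan, feature 1 is uninformative.
X = [[0, 90], [1, 0], [2, 80], [3, 10], [4, 70],
     [5, 20], [6, 60], [7, 30], [8, 50], [9, 40]]
y = [20, 22, 24, 26, 28, 30, 32, 34, 36, 38]
print(loo_mae(X, y, [0], k=2), loo_mae(X, y, [0, 1], k=2))
```

In this toy case, dropping the uninformative feature lowers the error, which is the effect feature selection is meant to capture. With a real model, the feature selection itself must be repeated inside each cross-validation fold, or the reported error will be optimistic.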
If you have any ideas on how I could improve my dissertation, I would be very thankful for any feasible suggestions and inspiration. I have mainly been using Python and R for my analyses.
I found that the expression of ribosomal RNA and ribosomal proteins most closely follows the cell cycle, and that knocking them out can extend the yeast's lifespan significantly. I want to use the different periodically recurring, highly fluctuating gene expression oscillation patterns to discover the functions of genes whose functions are not yet known. I am interested in this because a friend of mine showed me a cancer dataset that looked very similar. Since cancer cells resemble yeast in that they keep dividing, so that their age can be expressed in replications, we could use the same guilt-by-association approach: infer the functions of genes, or at least how Transcription Factors (TFs) control their expression, from genes with already known functions that have similar temporal expression patterns.
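Comparing periodic patterns between genes, or between yeast and a cancer dataset, could start with estimating each gene's dominant period. A sketch using a plain discrete Fourier transform periodogram, assuming evenly spaced samples; the 10-minute spacing and 90-minute period in the example are illustrative values only. Unevenly spaced series would need something like a Lomb-Scargle periodogram instead.

```python
import cmath
import math


def periodogram(series, dt):
    """(period, power) pairs from a plain DFT of an evenly sampled, mean-centered series."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    pairs = []
    for k in range(1, n // 2 + 1):  # skip k = 0, which only carries the mean
        coef = sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(centered))
        pairs.append((n * dt / k, abs(coef) ** 2))
    return pairs


def dominant_period(series, dt):
    """Period (in the same unit as dt) with the highest spectral power."""
    return max(periodogram(series, dt), key=lambda pair: pair[1])[0]


# Hypothetical gene sampled every 10 min for 270 min (three 90-min cycles).
dt = 10.0
series = [math.sin(2 * math.pi * i * dt / 90) + 0.3 for i in range(27)]
print(dominant_period(series, dt))  # 90.0
```

Only periods of the form (total duration)/k can be resolved, so a series covering just a few cycles gives a coarse period estimate; phase, which the same DFT coefficient provides, may matter as much as period when grouping genes.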
When analyzing the time series plots of the 62 yeast chaperones, I discovered that their expression trajectories can easily be clustered into 7 expression patterns, which are consistent with the 7 classes of yeast chaperones. Had I only taken the average centroid of all 62 chaperones, I would have been misled: expression is very homogeneous within each class but very different between the 7 classes, so no unknown chaperone would have a trajectory that correlates highly with the overall average pattern. That approach therefore could not be used to discover new, currently unknown chaperones.
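The centroid problem can be reproduced with two toy classes. A sketch assuming two hypothetical classes with sine- and cosine-shaped profiles, standing in for two of the seven chaperone classes: a new class-A-like gene correlates perfectly with its own class centroid but only at about 0.71 with the pooled centroid, so a correlation threshold on the pooled centroid could miss it.

```python
from math import sin, cos, pi, sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def centroid(profiles):
    """Point-wise mean of several expression profiles."""
    return [sum(column) / len(column) for column in zip(*profiles)]


t = range(12)  # 12 hypothetical samples covering one cell cycle
class_a = [[a * sin(2 * pi * i / 12) + b for i in t] for a, b in [(1, 0), (2, 1), (1.5, -1)]]
class_b = [[a * cos(2 * pi * i / 12) + b for i in t] for a, b in [(1, 2), (3, 0), (0.5, 1)]]
candidate = [2 * sin(2 * pi * i / 12) + 5 for i in t]  # a new, unlabelled class-A-like gene

r_class = pearson(candidate, centroid(class_a))
r_pooled = pearson(candidate, centroid(class_a + class_b))
print(round(r_class, 3), round(r_pooled, 3))  # 1.0 0.707
```

With more classes whose shapes differ, the pooled centroid can drift even further from every individual class, which is why per-class centroids are the safer reference.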
I intended to elucidate the different ways in which we age with a very similar clustering algorithm, which I developed on time series curves spanning three subsequent yeast cell cycles and consisting of 36 mRNA measurements, each about 25 minutes apart. This allows me to capture very steep mRNA expression fluctuations within the only 90 minutes of the cell cycle, including very brief but rapid expression changes that last not much longer than 1/12th of a cell cycle, of which I found surprisingly many.
I discovered that inferring gene functions from highly correlated time series curves only works if mRNA and protein concentrations are measured at least 10 times per cell cycle; otherwise a very brief but very steep peak or drop in expression can be missed. Any attempt to group genes into highly coordinated and correlated expression patterns that define them as a functional unit must fail if the temporal resolution is coarser than 1/10th of the cell cycle, i.e. if the time between subsequent mRNA, protein, metabolite, microbiome and other omics measurements is too long to capture extremely brief but very steep (up to 100-fold) periodic expression changes. This would inevitably mislead us into grouping genes into illusory functional units, even though their functions, mechanisms of action and expression control are so different (even if only for 25 minutes) that they could never form closely coordinated functional units.
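The sampling argument can be checked with a small simulation. A sketch assuming a hypothetical transcript with a Gaussian pulse (about 100-fold, roughly 1/12 of a 90-minute cycle wide) on a flat baseline, sampled every 7.5 minutes (1/12 of the cycle) versus every 30 minutes (1/3 of the cycle). The pulse position, width and height are made-up parameters.

```python
from math import exp


def expression(t, center=40.0, width=3.0, height=99.0):
    """Hypothetical transcript level at t minutes: baseline 1 plus a brief ~100-fold pulse."""
    return 1.0 + height * exp(-((t - center) ** 2) / (2 * width ** 2))


def sampled_peak(interval, duration=90.0):
    """Highest level observed when sampling every `interval` minutes from t = 0."""
    times = [i * interval for i in range(int(duration / interval) + 1)]
    return max(expression(t) for t in times)


for interval in (7.5, 30.0):
    print(interval, round(sampled_peak(interval), 1))
```

With these parameters the 7.5-minute grid records a peak of roughly 70-fold, while the 30-minute grid sees almost nothing above baseline, so the two genes' sampled profiles would differ for reasons unrelated to biology.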
I am fascinated by the idea of thinking of genes as functional units, because this will eventually allow us to discover the still-missing partners in crime based on the strikingly high correlations between the time series trajectories of genes that form functional units.
In fact, I lost the first 6 months because the time series curves of genes sharing a GO term (especially a molecular function) did not appear to be any more correlated with one another than with the unrelated genes in the rest of the genome. The reason was that I analyzed lifespan mRNA and protein concentration measurements taken between 4 and 12 hours apart. Although I found up to 50 time series measurements spanning the yeast lifespan, I could not generate co-expression networks or regulatory networks of expression, translation and degradation, which I tried very hard to reproduce. I wanted to develop methods for building gene, protein and metabolite interaction networks that would allow causal inferences about the mechanisms of aging, i.e. to distinguish the drivers (causes) of the aging process from the genes whose expression changes as a consequence, in reaction to the gradual physiological decline that will kill all of us as we age. We must focus on stopping the causes of aging from functioning the way they have devilishly evolved, instead of wasting time on preventing the consequences, because we cannot stop the consequences unless we eliminate their driving causes. At that time I was not aware of all the genes that fluctuate periodically with the cell cycle, bouncing between very low and very high extremes. We must distinguish these fluctuations before we can tell apart genes that form functional units from those whose expression patterns exclude them from forming highly correlated, interdependent functional units.
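The observation that GO-term members were no more correlated with each other than with the rest of the genome can be made quantitative by comparing the mean pairwise correlation inside a gene set with the mean correlation between its members and all other genes. A sketch on hypothetical toy profiles (GO1 to GO3 and OTHER1 and OTHER2 are placeholders); on real data, a permutation test with random gene sets of the same size would be needed to judge whether the gap is meaningful.

```python
from itertools import combinations
from math import sin, cos, pi, sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def within_vs_background(profiles, members):
    """Mean pairwise correlation inside a gene set vs. mean correlation to all other genes."""
    within = [pearson(profiles[a], profiles[b]) for a, b in combinations(members, 2)]
    between = [pearson(profiles[g], profiles[o])
               for g in members for o in profiles if o not in members]
    return sum(within) / len(within), sum(between) / len(between)


t = range(12)  # 12 hypothetical samples within one cell cycle
profiles = {
    "GO1": [sin(2 * pi * i / 12) for i in t],
    "GO2": [2 * sin(2 * pi * i / 12) + 1 for i in t],
    "GO3": [0.5 * sin(2 * pi * i / 12) - 1 for i in t],
    "OTHER1": [cos(2 * pi * i / 12) for i in t],
    "OTHER2": [cos(4 * pi * i / 12) + 3 for i in t],
}
within, background = within_vs_background(profiles, ["GO1", "GO2", "GO3"])
print(round(within, 3), round(background, 3))
```

Running the same comparison on coarsely sampled lifespan data and on densely sampled cell cycle data would show directly whether the within-term advantage disappears at low temporal resolution.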
So one of the most important lessons I had to learn the hard way in this research is that the interval between measurements must be shorter than the briefest windows in which gene, protein and metabolite concentrations reach their extremes. Otherwise, any kind of inference based on similarity must fail, because it cannot pick up the extremely short but highly function-defining deviations in expression between different genes.
I plotted all lifespan time series datasets that divided the life of the yeast into at least 4 and up to 50 roughly equal parts, because I kept hoping that I would eventually identify aging-specific trends, markers and predictors, i.e. genes whose expression rises or declines as aging progresses, so that we could stop it before it is too late for us. I almost went insane: the more I plotted the time series curves of the same genes from different microarray datasets, the more confused I became, because my plots looked as if a kindergarten child had taken colored pens, one for each dataset, and drawn completely uncorrelated trajectories. Out of the roughly 5,500 genes on the Affymetrix Yeast 2 microarray chip, I found only a single gene whose expression seemed to decline gradually as the yeast approached the end of its life. At first I was proud, because I thought I had finally discovered something useful. But I was not even certain that I could claim this gene as a marker of aging that predicts age progression and remaining lifespan. If concentrations decline gradually across all of the over 50 lifespan datasets (whose measurement intervals exceeded a cell cycle period) for only one out of over 5,500 genes, this common trend could easily be due to random chance, and this gene has not been found to affect the rate of aging in any way.
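Whether one declining gene out of about 5,500 could be chance can be approached by scoring each gene's monotonic trend against time within a dataset, obtaining a p-value per gene, and correcting for testing thousands of genes. A sketch assuming Spearman correlation with sampling order, a permutation p-value and Benjamini-Hochberg adjustment; these are standard choices swapped in for illustration, not the analysis used in the dissertation, and the two series are made up.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            result[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return result


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    return pearson(ranks(x), ranks(y))


def trend_pvalue(series, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a monotonic trend over sampling order."""
    rng = random.Random(seed)
    time = list(range(len(series)))
    observed = abs(spearman(time, series))
    shuffled = list(series)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(time, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # rank of p-value i in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical lifespan series from one dataset: a declining gene and a flat, noisy one.
declining = [9.0, 8.7, 8.1, 7.9, 7.2, 6.8, 6.1, 5.5]
noisy = [5.0, 5.4, 4.8, 5.1, 5.3, 4.9, 5.2, 5.0]
pvals = [trend_pvalue(declining), trend_pvalue(noisy)]
print(benjamini_hochberg(pvals))
```

Applied to all genes, the adjustment shows whether a single apparent trend survives correction; consistency of the sign of the trend across independent datasets is a separate, complementary check.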
By January 2017 I had plotted all lifespan time series yeast datasets, which destroyed my few remaining hopes of ever graduating, because I could not find any significant difference along any similarity dimension, in particular none that would let me hypothesize a common function, and no sign of the highly correlated, periodically recurring, extremely rapid oscillation patterns of almost everything important to life: changes in the abundance of mRNAs, proteins and metabolites, external stimuli, signaling patterns, concentration and proton gradients, ribosomal coverage of mRNA, the composition of translationally biased ribosomal subunits and rRNA sequences, membrane composition, enzymatic activities and electromagnetic interactions. We have no chance to ever understand life and defeat death as long as we remain unaware of these periodic fluctuations and erroneously hold on to 0 as the reference frame against which we compare every change we can observe.
I discovered by pure chance that I must only use datasets with short time intervals between subsequent measurements of any life-indicating dimension. I had become so desperate that I plotted every time series dataset I could find in NCBI GEO or ArrayExpress, until only cell cycle datasets were left. I had not considered cell cycle datasets useful for finding ways to reverse aging, because their measurements were taken at 4.6- to 35-minute intervals, up to 81 times, but unfortunately only for a few hours, never exceeding 3 cell cycles. After I plotted my first cell cycle dataset, I suddenly noticed that genes believed to work together had obviously more similar time series trajectories than all the other genes that experiments had shown not to participate in the same function or process. I made simulations in R to show how gradually increasing the interval between subsequent measurements causes distinct graphical features, such as local maxima, minima, inflection and saddle points, or distinct changes in the slopes connecting measurements, to be lost completely, thus eliminating the landmarks without which we can never group genes correctly.
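A simulation of this kind can also be sketched in Python. Assuming a hypothetical gene whose profile combines a 90-minute cell cycle component with a faster 15-minute component, the code counts the local maxima and minima that survive at different sampling intervals over three cycles. At 30-minute spacing the fast component is sampled at the same phase every time and vanishes completely, a form of aliasing.

```python
from math import sin, pi


def profile(t):
    """Hypothetical gene (t in minutes): 90-min cell cycle wave plus a 15-min fast component."""
    return sin(2 * pi * t / 90) + 0.5 * sin(2 * pi * t / 15)


def count_extrema(values):
    """Number of interior local maxima and minima in a sampled series."""
    return sum(
        1
        for a, b, c in zip(values, values[1:], values[2:])
        if (b > a and b > c) or (b < a and b < c)
    )


def sample(interval, duration=270.0):
    """Sample the profile every `interval` minutes over three 90-min cycles."""
    return [profile(i * interval) for i in range(int(duration / interval) + 1)]


counts = {interval: count_extrema(sample(interval)) for interval in (2.5, 10.0, 20.0, 30.0)}
print(counts)
```

Counting extrema is only one landmark; the same loop could track inflection points or slope changes by comparing second differences instead.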
I also noticed that even 20-minute intervals would be too long and would inevitably make us lose the landmark features needed to discriminate correctly between groups of genes that we need to manipulate as a unit if we ever want a chance to take full and permanent control of all aspects of our lives, because otherwise we cannot survive.
Otherwise, the time series trajectories of completely independent, unrelated or even antagonistically functioning genes would look too similar to one another to give us even the slightest chance of correctly inferring obscure gene functions and common mechanisms of regulation. Others have published that they succeeded in determining correctly which genes (nodes) should be connected by edges because their expression patterns are much more correlated with one another than with any of the genes that do not participate in the same specific functions.
Before I started plotting cell cycle data of extremely high temporal resolution, I could not imagine ever finding a way to connect genes belonging to the same pathway, molecular function, biological process or other essential life process that requires highly coordinated expression teamwork. Just by looking at their plots over time, I would never have grouped them together, despite being aware of the experimental proof of their coordination.
I am wondering how people could ever have succeeded in constructing gene co-expression or regulatory networks from subsequent measurements when the intervals between them exceed 1/10th of the cell cycle. I am worried about unnecessarily slow progress if people retain a linear concept of life and keep building co-expression, protein or regulatory networks from data lacking essential periodic information, i.e. the repetitive, recurring changes experienced by all forms of life. Datasets whose intervals between subsequent measurements exceed the critical length of 1/10th of the cell cycle inevitably lose the briefest but nevertheless most relevant criteria needed to avoid accidentally grouping functionally unrelated genes into the same GO term. I think this is a very serious problem, because many of the smaller GO terms seem to consist of genes following two different dominant expression patterns. Based on my analysis, I feel that many GO terms still need to be subdivided into smaller, more specific and independent functions, because their member genes seem to belong to two different groups. But since our understanding is still so limited that GO terms are changed, retired and redefined on an ongoing basis, I cannot trust or rely on them. Genes are still being lumped together into a single GO term, even though the term should be split until all of its member genes have been proven essential for performing the task in which they are believed to be involved.
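A quick check on whether a GO term contains two expression groups is to split its members around the two least-correlated genes. A sketch with hypothetical profiles (A1 to A3 and B1 to B3 are placeholder gene names); this seeded two-way split is a simple stand-in for proper clustering such as hierarchical clustering on 1 - correlation, and a split should only be trusted if it is stable across independent high-resolution datasets.

```python
from itertools import combinations
from math import sin, cos, pi, sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def split_in_two(profiles):
    """Seed two groups with the least-correlated gene pair, then assign each gene to the closer seed."""
    genes = list(profiles)
    seed_a, seed_b = min(
        combinations(genes, 2),
        key=lambda pair: pearson(profiles[pair[0]], profiles[pair[1]]),
    )
    group_a, group_b = [], []
    for gene in genes:
        r_a = pearson(profiles[gene], profiles[seed_a])
        r_b = pearson(profiles[gene], profiles[seed_b])
        (group_a if r_a >= r_b else group_b).append(gene)
    return group_a, group_b


t = range(12)  # 12 hypothetical samples within one cell cycle
go_term = {
    "A1": [sin(2 * pi * i / 12) for i in t],
    "A2": [3 * sin(2 * pi * i / 12) + 2 for i in t],
    "A3": [0.7 * sin(2 * pi * i / 12) for i in t],
    "B1": [cos(2 * pi * i / 12) for i in t],
    "B2": [2 * cos(2 * pi * i / 12) - 1 for i in t],
    "B3": [0.4 * cos(2 * pi * i / 12) + 4 for i in t],
}
print(split_in_two(go_term))  # (['A1', 'A2', 'A3'], ['B1', 'B2', 'B3'])
```

A term whose members all correlate strongly with one another would still be forced into two groups by this heuristic, so the correlation between the two seeds should be reported alongside the split.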
The uncertainty about the correct, but most likely still flawed, grouping of functionally unrelated genes into the same GO terms makes it hard for me to trust my similarity measure. My expression data make me feel insecure whenever my similarity measure does not group all genes of a GO term together because of two distinct differences between what are presumably two functionally unrelated, independently operating groups of genes, which have nevertheless evolved to function like a single unit of well-coordinated team members addressing completely different needs of the cell to which they belong.