Behold The VEGI database! This week’s tip of the week highlights a paper that provides thousands of data points. But they aren’t in the paper. You can access the results in a special browser installation they created to accompany that paper. I’ll show you where.
I know some people get tired of YAGS (Yet Another Genome Syndrome). But each time a new genome is released, I’m still really pleased and appreciative of the folks who did that work. I also get excited because I know it’s going to be really useful later, in comparison with other genomes and with further analyses of the conserved sequences–and the divergent ones too. And when one of those papers comes along, I can immerse myself for days in that data.
One of those papers was released last week. “An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions” compares a number of plant genomes, and generates some really cool stats as well as new sequence data. As a bonus, it comes with a UCSC Genome Browser installation where you can look around for more.
They compared a model organism (Arabidopsis), one you might be eating (Chinese cabbage), some extremophile plants related to these (including a salt-tolerant one), and more. They also chose 3 species to sequence themselves, ones that made sense within the lineage and would help the analysis. Plants have some genome duplications that complicate the comparisons, and they picked some with and without certain duplication events. In total, they compared nine genomes. This is such a cool opportunity and a great lineup to consider stuff we know pretty well like A. thaliana, and to look at related plants with very different lifestyles, and see what makes them similar and what makes them different–exactly what all these sequenced genomes should be helping us to do. But even this work is just a starting point for more biology still–there’s great stuff to be explored when researchers start to focus on their genes and pathways of interest and the regulation of them.
My particular favorite part was the analysis in figure 3. They show the breakdown of sites under selection across types of genomic elements. Coding sequences have the largest fraction, of course, at 77%. But I really liked the way they illustrated the non-coding selection data. There’s so much opportunity in the non-coding conserved regions to learn what’s driving development, responses to challenges, and more. (PS: there’s more of this kind of breakdown in the supplement.)
But as I keep saying–this data is not in the papers anymore. They can publish a summary of the work, and they can give huge supplemental data files (in this case 11 supplemental figures and 9 supplemental tables), but to really get into the results you need access to the data from the repositories or–my favorite case–a project-specific browser created to accompany the work.