Saturday, 15 March 2025

A genome assembly and annotation for the Australian alpine skink Bassiana duperreyi using long-read technologies

Hanrahan BJ, Alreja K, Reis ALM, Chang JK, Dissanayake DSB, Edwards RJ, Bertozzi T, Hammond JM, O’Meally D, Deveson IW, Georges A, Waters P & Patel HR (accepted): A genome assembly and annotation for the Australian alpine skink Bassiana duperreyi using long-read technologies. G3 jkaf046, DOI: 10.1093/g3journal/jkaf046 [G3] [PubMed]

Abstract

The eastern three-lined skink (Bassiana duperreyi) inhabits the Australian high country in the southeast of the continent, including Tasmania. It is a distinctive oviparous species because it undergoes sex reversal (from XX genotypic females to phenotypic males) at low incubation temperatures. We present a chromosome-scale genome assembly of a Bassiana duperreyi XY male individual, constructed using PacBio HiFi and ONT long reads scaffolded using Illumina Hi-C data. The genome assembly length is 1.57 Gbp with a scaffold N50 of 222 Mbp, N90 of 26 Mbp, 200 gaps and 43.10% GC content. Most (95%) of the assembly is scaffolded into 6 macrochromosomes, 8 microchromosomes and the X chromosome, corresponding to the karyotype. Fragmented Y chromosome scaffolds (n=11 > 1 Mbp) were identified using Y-specific contigs generated by genome subtraction. We identified two novel alpha-satellite repeats of 187 bp and 199 bp in the putative centromeres that did not form higher-order repeats. The genome assembly exceeds the standard recommended by the Earth Biogenome Project: 0.02% false expansions, 99.63% k-mer completeness, 94.66% complete single-copy BUSCO genes and an average 98.42% of transcriptome data mappable to the genome assembly. The mitochondrial genome (17,506 bp) and the model rDNA repeat unit (15,154 bp) were assembled. The B. duperreyi genome assembly has high completeness for a skink and will provide a resource for research focused on sex determination and thermolabile sex reversal, as an oviparous foundation species for studies of the evolution of viviparity, and for other comparative genomics studies of the Scincidae.
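The N50 and N90 figures quoted in the abstract are standard contiguity metrics: a scaffold N50 of 222 Mbp means that scaffolds of at least 222 Mbp together contain at least half of the 1.57 Gbp assembly (N90 is the same idea at 90%). Below is a minimal sketch of that textbook definition, for illustration only; it is not the software used in the paper, and the function name `nx` and the toy lengths are hypothetical.

```python
def nx(lengths, percent):
    """Return the Nx contiguity statistic (e.g. percent=50 for N50).

    Sequences are sorted longest first and their lengths summed until the
    running total reaches `percent`% of the total assembly length; the length
    of the sequence that crosses that threshold is the Nx value.
    """
    if not lengths:
        raise ValueError("no sequence lengths supplied")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length


# Toy scaffold lengths in bp (hypothetical values, not from the paper).
scaffolds = [100, 80, 50, 40, 30]
print(nx(scaffolds, 50))  # N50: 80
print(nx(scaffolds, 90))  # N90: 40
```

Because the statistic depends on the whole length distribution, a few very long chromosome-scale scaffolds push N50 up sharply, which is why it is usually reported alongside N90 and the gap count.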

Friday, 14 March 2025

Chromosome-level genome assembly of the spangled emperor, Lethrinus nebulosus (Forsskål 1775)

The first data note from the Ocean Genomes project is now out in Scientific Data: a first-in-family genome from the emperor breams.

Parata L, Anstiss L, de Jong E, Doran A, Edwards RJ, Newman SJ, Payet SD, Skepper CL, Wakefield CB, OceanOmics Centre, OceanOmics Division & Corrigan S (2025): Chromosome-level genome assembly of the spangled emperor, Lethrinus nebulosus (Forsskål 1775). Scientific Data 12:435. DOI: 10.1038/s41597-025-04690-w.

Abstract

Spangled emperor, Lethrinus nebulosus (Forsskål 1775), is a tropical marine fish of economic and cultural importance throughout the Indo-West Pacific. It is one of the most targeted recreational fishes in the Gascoyne Coast Bioregion of Western Australia, where it serves as an indicator species for recreational fishing. Here, we present a highly accurate, near-gapless, chromosome-level, haplotype-phased reference genome assembly of L. nebulosus (Lethrinus nebulosus (Spangled Emperor) genome, fLetNeb1.1; PRJNA1074345), the first for the species and the first high-quality genome representative of the family Lethrinidae. The 1.09 Gb genome was assembled from PacBio HiFi and Dovetail Omni-C proximity ligation sequencing data. The contig N50 is 21–24 Mbp and BUSCO completeness is greater than 99%. A preliminary gene annotation identified 24,583 genes, with the predicted transcriptome achieving a BUSCO completeness score of 99.1%. This resource will facilitate genomic studies to inform the sustainable management of L. nebulosus and other Lethrinids.

Thursday, 13 February 2025

Small but mitey: a gapless telomere-to-telomere assembly of an unidentified mite with a streamlined genome

A few years ago, we accidentally sequenced an interesting mite while making the Rhodamnia argentea reference genome. We don’t really know what it is, but thanks to nanopore sequencing and low repeat content, we produced a gapless telomere-to-telomere assembly! As an unfunded passion project, it’s been ticking along in the background since then, but it is now published in Genome Biology and Evolution.

One of the most interesting aspects of the paper is the genome reduction: despite being a complete nuclear genome, the assembly is under 35 megabases! This is a couple of orders of magnitude smaller than many other arachnid genomes.

We are yet to do a comprehensive analysis, but just looking at the core set of expected arachnid genes, as defined by BUSCO (i.e. single-copy genes that are expected to be present in most arachnid genomes), revealed that about a third of them were missing. (This low completeness was partly responsible for the time taken to fully recognise and deal with the contamination.) Tellingly, the closest sequenced relatives of this mite also have reduced genomes and are missing many of the same BUSCO genes, revealing a long history of gene loss.

The proportion of “Duplicated” BUSCO genes is also surprisingly high, at 4.5%. These are genuine duplications, with consistent diploid read depths and many of the pairs present on both chromosomes. It will be interesting to see whether these duplications are replacing some of the functions lost along with the other genes, or represent novel genes underpinning such a specialised lifestyle. As this was an unfunded passion project, investigating the full annotation was beyond the scope of this paper, but get in touch if you would be interested in doing this!
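The read-depth reasoning above can be made concrete. If a gene is genuinely duplicated, each assembled copy should attract roughly the full single-copy (diploid) read depth; if the assembly has instead kept both haplotypes of one locus as separate “copies”, the reads are split between them and each copy shows roughly half that depth. Below is a minimal sketch of that logic, assuming a single-copy depth has already been estimated elsewhere (for example from complete single-copy BUSCO genes). The function name, the rounding rule and the depths are hypothetical illustrations, not the method used in the paper.

```python
def classify_duplicate(copy_depths, single_copy_depth):
    """Label a BUSCO 'Duplicated' gene as a true or false duplication.

    copy_depths: mean read depth over each assembled copy of the gene.
    single_copy_depth: expected read depth for a single-copy (diploid) locus.
    Hypothetical rule: the summed depth, divided by the single-copy depth and
    rounded, estimates how many copies exist in the genome itself.
    """
    if single_copy_depth <= 0:
        raise ValueError("single_copy_depth must be positive")
    genomic_copies = round(sum(copy_depths) / single_copy_depth)
    # >= 2 genomic copies: biological duplication; otherwise the extra copy
    # is likely an assembly artefact (e.g. a retained alternative haplotype).
    label = "true" if genomic_copies >= 2 else "false"
    return label, genomic_copies


# Hypothetical depths, with single-copy regions at 40x.
print(classify_duplicate([41, 39], 40))  # ('true', 2)
print(classify_duplicate([21, 19], 40))  # ('false', 1)
```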

Edwards RJ, Chen SH, Halliday B & Bragg JG (2025): Small but mitey: a gapless telomere-to-telomere assembly of an unidentified mite with a streamlined genome. Genome Biology and Evolution Feb 13. [Gen Biol Evol] [PubMed]

Abstract

A draft assembly of the rainforest tree Rhodamnia argentea Benth. (malletwood, Myrtaceae) revealed contaminating DNA sequences that most closely matched those from mites in the family Eriophyidae. Eriophyoid mites are plant parasites that often induce galls or other deformities on their host plants. They are notable for their small size (averaging 200 μm), distinctive four-legged body structure, and heavily streamlined genomes, which are among the smallest known of all arthropods. Contaminating mite sequences were assembled into a high-quality gapless telomere-to-telomere nuclear genome. The entire genome was assembled on two fully contiguous chromosomes, capped with a novel TTTGG or TTTGGTGTTGG telomere sequence, and exhibited clear signs of genome reduction (34.5 Mbp total length, 68.6% arachnid Benchmarking Universal Single-Copy Ortholog completeness). Phylogenomic analysis confirmed that this genome is that of a previously unsequenced eriophyoid mite. Despite its unknown identity, this complete nuclear genome provides a valuable resource to investigate invertebrate genome reduction.

Monday, 27 January 2025

A reference genome for the eastern bettong (Bettongia gaimardi)

Silver LW, Edwards RJ, Neaves L, Manning A, Hogg CJ & Banks S (2025): A reference genome for the eastern bettong (Bettongia gaimardi) [version 2; peer review: 3 approved]. F1000Research 13:1544. [F1000Res] [PubMed]

Abstract

The eastern or Tasmanian bettong (Bettongia gaimardi) is one of four extant bettong species and is listed as ‘Near Threatened’ by the IUCN. We sequenced short-read data on the 10x system to generate a reference genome 3.46 Gb in size, with a contig N50 of 87.36 kb and a scaffold N50 of 2.93 Mb. Additionally, we used GeMoMa to provide an accompanying annotation for the reference genome. The generation of a reference genome for the eastern bettong provides a vital resource for the conservation of the species.

Tuesday, 17 December 2024

Parental assigned chromosomes for cultivated cacao provides insights into genetic architecture underlying resistance to vascular streak dieback

As a big fan of chocolate, I was delighted to have the opportunity to help out with a cacao genome. This study presents the first diploid, fully scaffolded, and parentally phased genome resource for Theobroma cacao L. to provide insights into the genetic architecture underlying resistance and susceptibility to vascular streak dieback (VSD), a significant threat to cacao production in Southeast Asia and Melanesia. By analyzing NLR gene clusters and other disease response gene candidates in proximity to informative QTLs, the research identifies structural variants within NLRs inherited from resistant and susceptible parents, offering potential breeding targets for VSD resistance.

Tobias PA, Downs J, Epaina P, Singh G, Park RF, Edwards RJ, Brugman E, Zulkifli A, Muhammad J, Purwantara A & Guest DI (2024): Parental assigned chromosomes for cultivated cacao provides insights into genetic architecture underlying resistance to vascular streak dieback. The Plant Genome doi: 10.1002/tpg2.20524. [Plant Genome] [bioRxiv] [PubMed]

Abstract

Diseases of Theobroma cacao L. (Malvaceae) disrupt cocoa bean supply and economically impact growers. Vascular streak dieback (VSD), caused by Ceratobasidium theobromae, is a new encounter disease of cacao currently confined to Southeast Asia and Melanesia. Resistance to VSD has been tested with large progeny trials in Sulawesi, Indonesia, and in Papua New Guinea with the identification of informative quantitative trait loci (QTLs). Using a VSD susceptible progeny tree (clone 26), derived from a resistant and susceptible parental cross, we assembled the genome to chromosome-level and discriminated alleles inherited from either resistant or susceptible parents. The parentally phased genomes were annotated for all predicted genes and then specifically for resistance genes of the nucleotide-binding site leucine-rich repeat class (NLR). On investigation, we determined the presence of NLR clusters and other potential disease response gene candidates in proximity to informative QTLs. We identified structural variants within NLRs inherited from the parents. We present the first diploid, fully scaffolded, and parentally phased genome resource for T. cacao L. and provide insights into the genetics underlying resistance and susceptibility to VSD.

#AusEvol2024 - Depth-based correction of gene duplications and losses in genome assemblies

The Australasian Evolution Society conference has always been one of my favourites, due to its laid-back culture of inclusivity and kindness. (And low cost!) It therefore feels quite fitting that my last conference as an Aussie academic was AES2024.

This talk was a bit of an update from my AES2021 presentation. It showcased some of the latest additions to DepthKopy, including depth-based copy number correction of genome features, such as rDNA genes, repeat families, or multicopy genes. These additions include a feature that classifies multicopy “Duplicated” genes identified by BUSCO as true (biological) or false (artefactual) duplicates. TL;DR: analysis of draft genome assemblies for 45 species of fish across five different depths/qualities indicates that DepthKopy can correct the copy number and total length of multicopy features to within 10% of the true value. (The lower-quality raw assemblies ranged from a 30% underestimate to a 60% overestimate.)
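The principle behind depth-based correction is that reads from every genomic copy of a feature still map somewhere in the assembly, so collapsed copies show up as inflated read depth and spurious extra copies as reduced depth. Below is a minimal sketch of that arithmetic, assuming per-copy mean read depths and a single-copy depth are already available. It illustrates the idea only; it is not DepthKopy's code or interface, and the function names and numbers are hypothetical.

```python
def corrected_length(features, single_copy_depth):
    """Estimate the true total length (bp) of a multicopy feature family.

    features: (assembled_length_bp, mean_read_depth) for each assembled copy.
    Each base contributes depth / single_copy_depth genomic copies, so collapsed
    copies (inflated depth) are scaled up and spurious duplicates (reduced
    depth) are scaled down.
    """
    return sum(length * depth for length, depth in features) / single_copy_depth


def corrected_copy_number(features, single_copy_depth, unit_length):
    """Convert the corrected total length into copies of one repeat unit."""
    return corrected_length(features, single_copy_depth) / unit_length


# Hypothetical rDNA example: two 10 kb units assembled, each at 400x,
# where single-copy regions sit at 40x.
rdna = [(10_000, 400), (10_000, 400)]
print(corrected_length(rdna, 40))               # 200000.0 bp
print(corrected_copy_number(rdna, 40, 10_000))  # 20.0 copies
```

In this toy case the raw assembly reports 2 copies (20 kb), whereas the read depth suggests roughly 20 copies (200 kb), the kind of collapse that is typical of tandem arrays such as rDNA.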

This will be most important when low-quality draft genomes are included in a comparative genomics analysis. However, even the best genome assemblies appear to have some “collapsed” or duplicated loci where the copy number in the assembly does not accurately reflect the copy number in the genome. DepthKopy is useful for exploring the magnitude of such disparities, and can help to identify and correct specific discrepancies in genes or features of interest.

Tuesday, 3 December 2024

So long, Ocean Genomes... and thanks for all the fish!

After a successful couple of years, today was my last day at UWA. I am proud of the team that I helped to build at the Minderoo OceanOmics Centre at UWA, and of what we have accomplished together. The UWA Oceans Institute has been a fantastic place to work. It's been exciting to see Ocean Genomes grow from a concept with a largely empty lab to a fully-fledged genome factory capable of generating multiple high-quality genomes a week. The associated publications should hopefully follow soon, and I look forward to completing some exciting ongoing collaborations, and to continued collaboration with the team, as an Oceans Institute adjunct.

Developments in DNA sequencing technology over the past few years have been immense, but the most impressive part for me has been witnessing the laboratory technical team optimising the sample preparations for sequencing. Everything gets so much harder when you move from human samples (the focus of most methods development and testing) into non-model organisms, and I am convinced that the quality of genomes we’ve been producing is in large part due to the quality of the DNA going into the sequencers.

I am now looking for my next challenge and am officially Open For Work. We’ll be moving back to Dublin at the end of January. If you are based in Ireland and need an experienced interdisciplinary problem solver with broad expertise across bioinformatics and biomolecular science, please get in touch! Academic and non-academic opportunities are welcome.