
Get started with microbiomedataset
Xiaotao Shen xiaotao.shen@outlook.com
2026-03-04
Source:vignettes/microbiomedataset.Rmd
microbiomedataset.Rmdmicrobiomedataset provides a rectangular data container
for microbiome analysis. Its design follows the core idea of
massdataset: keep the abundance matrix and all related
metadata synchronized, then make common processing steps explicit and
traceable.
Load the demo dataset
library(microbiomedataset)
data("global_patterns", package = "microbiomedataset")
object <- global_patterns
object
#> --------------------
#> microbiomedataset version: 0.99.1
#> --------------------
#> 1.expression_data:[ 19216 x 26 data.frame]
#> 2.sample_info:[ 26 x 8 data.frame]
#> 3.variable_info:[ 19216 x 8 data.frame]
#> 4.sample_info_note:[ 8 x 2 data.frame]
#> 5.variable_info_note:[ 8 x 2 data.frame]
#> --------------------
#> Processing information (extract_process_info())
#> create_microbiome_dataset ----------
#> Package Function.used Time
#> 1 microbiomedataset create_microbiome_dataset() 2022-07-11 01:56:13The package demo object already contains the three core data layers:
dim(object@expression_data)
#> [1] 19216 26
head(object@sample_info[, c("sample_id", "SampleType", "class")])
#> sample_id SampleType class
#> 1 CL3 Soil Subject
#> 2 CC1 Soil Subject
#> 3 SV1 Soil Subject
#> 4 M31Fcsw Feces Subject
#> 5 M11Fcsw Feces Subject
#> 6 M31Plmr Skin Subject
head(object@variable_info[, c("variable_id", "Kingdom", "Phylum", "Genus")])
#> variable_id Kingdom Phylum Genus
#> 1 549322 Archaea Crenarchaeota <NA>
#> 2 522457 Archaea Crenarchaeota <NA>
#> 3 951 Archaea Crenarchaeota Sulfolobus
#> 4 244423 Archaea Crenarchaeota <NA>
#> 5 586076 Archaea Crenarchaeota <NA>
#> 6 246140 Archaea Crenarchaeota <NA>Understand the object design
microbiome_dataset keeps these layers aligned:
-
expression_data: a feature-by-sample abundance matrix. -
sample_info: sample metadata keyed bysample_id. -
variable_info: feature metadata keyed byvariable_id. - Optional microbiome-specific attachments:
otu_tree,taxa_tree, andref_seq. - Explicit tree link metadata:
otu_tree_linkandtaxa_tree_link.
The object structure is:
flowchart LR
A["expression_data"] --> B["variable_info"]
A --> C["sample_info"]
B --> D["otu_tree_link"]
B --> E["taxa_tree_link"]
D --> F["otu_tree"]
E --> G["taxa_tree"]
B --> H["ref_seq"]
The built-in validator checks that these parts remain synchronized.
check_microbiome_dataset_class(object)
#> [1] TRUEExplore the basic accessors
nrow(object)
#> variables
#> 19216
ncol(object)
#> samples
#> 26
head(extract_sample_info(object))
#> sample_id Primer Final_Barcode Barcode_truncated_plus_T Barcode_full_length
#> 1 CL3 ILBC_01 AACGCA TGCGTT CTAGCGTGCGT
#> 2 CC1 ILBC_02 AACTCG CGAGTT CATCGACGAGT
#> 3 SV1 ILBC_03 AACTGT ACAGTT GTACGCACAGT
#> 4 M31Fcsw ILBC_04 AAGAGA TCTCTT TCGACATCTCT
#> 5 M11Fcsw ILBC_05 AAGCTG CAGCTT CGACTGCAGCT
#> 6 M31Plmr ILBC_07 AATCGT ACGATT CGAGTCACGAT
#> SampleType Description class
#> 1 Soil Calhoun South Carolina Pine soil, pH 4.9 Subject
#> 2 Soil Cedar Creek Minnesota, grassland, pH 6.1 Subject
#> 3 Soil Sevilleta new Mexico, desert scrub, pH 8.3 Subject
#> 4 Feces M3, Day 1, fecal swab, whole body study Subject
#> 5 Feces M1, Day 1, fecal swab, whole body study Subject
#> 6 Skin M3, Day 1, right palm, whole body study Subject
head(extract_variable_info(object))
#> variable_id Kingdom Phylum Class Order Family
#> 1 549322 Archaea Crenarchaeota Thermoprotei <NA> <NA>
#> 2 522457 Archaea Crenarchaeota Thermoprotei <NA> <NA>
#> 3 951 Archaea Crenarchaeota Thermoprotei Sulfolobales Sulfolobaceae
#> 4 244423 Archaea Crenarchaeota Sd-NA <NA> <NA>
#> 5 586076 Archaea Crenarchaeota Sd-NA <NA> <NA>
#> 6 246140 Archaea Crenarchaeota Sd-NA <NA> <NA>
#> Genus Species
#> 1 <NA> <NA>
#> 2 <NA> <NA>
#> 3 Sulfolobus Sulfolobusacidocaldarius
#> 4 <NA> <NA>
#> 5 <NA> <NA>
#> 6 <NA> <NA>
extract_expression_data(object)[1:6, 1:6]
#> CL3 CC1 SV1 M31Fcsw M11Fcsw M31Plmr
#> 549322 0 0 0 0 0 0
#> 522457 0 0 0 0 0 0
#> 951 0 0 0 0 0 0
#> 244423 0 0 0 0 0 0
#> 586076 0 0 0 0 0 0
#> 246140 0 0 0 0 0 0Objects saved with an older package version can be upgraded to the current schema:
updated_object <- update_microbiome_dataset(object)
updated_object@version
#> [1] "0.99.1"Use tidy filtering without breaking the object
Activate sample_info or variable_info, then
use familiar dplyr verbs.
Run the most common first steps
Transform counts to relative abundance:
relative_object <- transform_counts(object, method = "relative")
colSums(relative_object@expression_data)[1:5]
#> CL3 CC1 SV1 M31Fcsw M11Fcsw
#> 1 1 1 1 1Aggregate features to a taxonomic rank:
genus_object <-
summarise_taxa(
genus_defined,
taxonomic_rank = "Genus",
what = "sum_intensity"
)
dim(genus_object@expression_data)
#> [1] 983 26
head(genus_object@variable_info[, c("variable_id", "Genus")])
#> variable_id Genus
#> 1 Sulfolobus Sulfolobus
#> 2 Cenarchaeum Cenarchaeum
#> 3 Nitrosopumilus Nitrosopumilus
#> 4 CandidatusNitrososphaera CandidatusNitrososphaera
#> 5 Natronococcus Natronococcus
#> 6 Natronorubrum NatronorubrumExport tidy long tables:
feature_table <- melt_features(object, relative = TRUE)
taxa_table <- melt_taxa(object, taxonomic_rank = "Phylum", relative = TRUE)
head(feature_table[, c("sample_id", "variable_id", "abundance")])
#> sample_id variable_id abundance
#> 1 CL3 549322 0
#> 2 CC1 549322 0
#> 3 SV1 549322 0
#> 4 M31Fcsw 549322 0
#> 5 M11Fcsw 549322 0
#> 6 M31Plmr 549322 0
head(taxa_table[, c("sample_id", "Phylum", "abundance")])
#> sample_id Phylum abundance
#> 1 AQC1cm ABY1_OD1 0.0005138174
#> 2 AQC1cm AC1 0.0021409059
#> 3 AQC1cm AD3 0.0002569087
#> 4 AQC1cm Acidobacteria 1.2521730194
#> 5 AQC1cm Actinobacteria 2.2601114984
#> 6 AQC1cm Armatimonadetes 0.0690228049Where to go next
The rest of the tutorial set is organized by task:
-
import_data: create objects and convert from other ecosystems. -
preprocess: filtering, transformation, and synchronization-safe verbs. -
merge: taxonomic aggregation and long-table export. -
relative_intensity: diversity and ordination analysis. -
filtering: trees, sequences, and interoperability. -
microbiome_workflow: a compact end-to-end workflow.