
Create and import microbiome datasets
Xiaotao Shen xiaotao.shen@outlook.com
2026-03-04
Source: vignettes/import_data.Rmd
This article shows the supported ways to obtain a microbiome_dataset object: loading the packaged demo data, building one from rectangular tables, and converting to and from phyloseq.
library(microbiomedataset)
data("global_patterns", package = "microbiomedataset")

Start from the packaged demo object
global_patterns
#> --------------------
#> microbiomedataset version: 0.99.1
#> --------------------
#> 1.expression_data:[ 19216 x 26 data.frame]
#> 2.sample_info:[ 26 x 8 data.frame]
#> 3.variable_info:[ 19216 x 8 data.frame]
#> 4.sample_info_note:[ 8 x 2 data.frame]
#> 5.variable_info_note:[ 8 x 2 data.frame]
#> --------------------
#> Processing information (extract_process_info())
#> create_microbiome_dataset ----------
#> Package Function.used Time
#> 1 microbiomedataset create_microbiome_dataset() 2022-07-11 01:56:13
This is the fastest way to learn the object structure:
head(global_patterns@sample_info[, c("sample_id", "SampleType")])
#> sample_id SampleType
#> 1 CL3 Soil
#> 2 CC1 Soil
#> 3 SV1 Soil
#> 4 M31Fcsw Feces
#> 5 M11Fcsw Feces
#> 6 M31Plmr Skin
head(global_patterns@variable_info[, c("variable_id", "Kingdom", "Phylum")])
#> variable_id Kingdom Phylum
#> 1 549322 Archaea Crenarchaeota
#> 2 522457 Archaea Crenarchaeota
#> 3 951 Archaea Crenarchaeota
#> 4 244423 Archaea Crenarchaeota
#> 5 586076 Archaea Crenarchaeota
#> 6 246140 Archaea Crenarchaeota

Create an object from rectangular data
At minimum, a new object needs:
- expression_data: rows are features and columns are samples.
- sample_info: first column must be sample_id.
- variable_info: first column must be variable_id.
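Before calling the constructor, it can help to confirm that the three tables follow this layout. The sketch below uses base R only, with made-up toy tables. The helper check_inputs() is hypothetical and not part of microbiomedataset. It also assumes that the column names of expression_data match sample_id and that its row names match variable_id, which this article does not state explicitly.

```r
# Toy inputs that follow the layout described above (all values are made up).
expression_toy <- data.frame(
  s1 = c(10, 0, 3),
  s2 = c(2, 5, 0),
  row.names = c("otu_1", "otu_2", "otu_3")
)
sample_info_toy <- data.frame(
  sample_id = c("s1", "s2"),
  group = c("case", "control"),
  stringsAsFactors = FALSE
)
variable_info_toy <- data.frame(
  variable_id = c("otu_1", "otu_2", "otu_3"),
  Kingdom = c("Bacteria", "Bacteria", "Archaea"),
  stringsAsFactors = FALSE
)

# Hypothetical pre-flight check, not a microbiomedataset function.
check_inputs <- function(expression_data, sample_info, variable_info) {
  c(
    # Stated in the list above: the ID column must come first.
    sample_id_first = identical(colnames(sample_info)[1], "sample_id"),
    variable_id_first = identical(colnames(variable_info)[1], "variable_id"),
    # Assumption: expression columns are named by sample_id,
    # and expression rows are named by variable_id, in the same order.
    samples_match = identical(
      colnames(expression_data), as.character(sample_info$sample_id)
    ),
    variables_match = identical(
      rownames(expression_data), as.character(variable_info$variable_id)
    )
  )
}

# Returns a named logical vector, one entry per check.
check_inputs(expression_toy, sample_info_toy, variable_info_toy)
```

Any FALSE entry in the result points at the table that needs fixing before it is passed on.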
expression_data <- global_patterns@expression_data[1:20, 1:6]
sample_info <-
  global_patterns@sample_info[1:6, c("sample_id", "SampleType", "class")]
variable_info <-
  global_patterns@variable_info[
    1:20,
    c("variable_id", "Kingdom", "Phylum", "Class", "Order",
      "Family", "Genus", "Species")
  ]
mini_object <- create_microbiome_dataset(
  expression_data = expression_data,
  sample_info = sample_info,
  variable_info = variable_info
)
mini_object
#> --------------------
#> microbiomedataset version: 0.99.1
#> --------------------
#> 1.expression_data:[ 20 x 6 data.frame]
#> 2.sample_info:[ 6 x 3 data.frame]
#> 3.variable_info:[ 20 x 8 data.frame]
#> 4.sample_info_note:[ 3 x 2 data.frame]
#> 5.variable_info_note:[ 8 x 2 data.frame]
#> --------------------
#> Processing information (extract_process_info())
#> create_microbiome_dataset ----------
#> Package Function.used Time
#> 1 microbiomedataset create_microbiome_dataset() 2026-03-04 20:43:20

Convert to and from phyloseq
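The package interoperates with phyloseq: convert2phyloseq() turns a microbiome_dataset into a phyloseq object, and convert2microbiome_dataset() converts it back.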
phyloseq_object <- convert2phyloseq(global_patterns)
phyloseq_object
#> phyloseq-class experiment-level object
#> otu_table() OTU Table: [ 19216 taxa and 26 samples ]
#> sample_data() Sample Data: [ 26 samples by 7 sample variables ]
#> tax_table() Taxonomy Table: [ 19216 taxa by 7 taxonomic ranks ]
#> phy_tree() Phylogenetic Tree: [ 19216 tips and 19215 internal nodes ]
roundtrip_object <- convert2microbiome_dataset(phyloseq_object)
dim(roundtrip_object@expression_data)
#> [1] 19216 26
head(roundtrip_object@variable_info[, c("variable_id", "Kingdom", "Phylum")])
#> variable_id Kingdom Phylum
#> 1 549322 Archaea Crenarchaeota
#> 2 522457 Archaea Crenarchaeota
#> 3 951 Archaea Crenarchaeota
#> 4 244423 Archaea Crenarchaeota
#> 5 586076 Archaea Crenarchaeota
#> 6 246140 Archaea Crenarchaeota
check_microbiome_dataset_class(roundtrip_object)
#> [1] TRUE

Inspect imported schema
During creation and conversion, taxonomy columns are standardized to the canonical ranks:
colnames(roundtrip_object@variable_info)
#> [1] "variable_id" "Kingdom" "Phylum" "Class" "Order"
#> [6] "Family" "Genus" "Species"
This standardization is what allows downstream taxonomy-aware verbs to work consistently across input sources.
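The same schema check can be done programmatically on any variable_info-style table. The following is a minimal base R sketch on a made-up toy table. The helper missing_ranks() is hypothetical and not part of microbiomedataset, and the list of canonical ranks is taken from the column names printed above.

```r
# Canonical rank columns, as printed in the colnames() output above.
canonical_ranks <- c("Kingdom", "Phylum", "Class", "Order",
                     "Family", "Genus", "Species")

# Hypothetical helper, not a microbiomedataset function:
# returns the canonical ranks that are absent from a table's columns.
missing_ranks <- function(variable_info, ranks = canonical_ranks) {
  setdiff(ranks, colnames(variable_info))
}

# Toy taxonomy table with only two rank columns (values are made up).
toy_variable_info <- data.frame(
  variable_id = c("otu_1", "otu_2"),
  Kingdom = c("Archaea", "Bacteria"),
  Phylum = c("Crenarchaeota", "Firmicutes"),
  stringsAsFactors = FALSE
)

missing_ranks(toy_variable_info)
#> [1] "Class"   "Order"   "Family"  "Genus"   "Species"
```

An empty result would mean that every canonical rank column is present in the table.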