
This article shows the supported entry points for getting data into a microbiome_dataset object: the packaged demo data, rectangular tables, and phyloseq objects.

library(microbiomedataset)
data("global_patterns", package = "microbiomedataset")

Start from the packaged demo object

global_patterns
#> -------------------- 
#> microbiomedataset version: 0.99.1 
#> -------------------- 
#> 1.expression_data:[ 19216 x 26 data.frame]
#> 2.sample_info:[ 26 x 8 data.frame]
#> 3.variable_info:[ 19216 x 8 data.frame]
#> 4.sample_info_note:[ 8 x 2 data.frame]
#> 5.variable_info_note:[ 8 x 2 data.frame]
#> -------------------- 
#> Processing information (extract_process_info())
#> create_microbiome_dataset ---------- 
#>             Package               Function.used                Time
#> 1 microbiomedataset create_microbiome_dataset() 2022-07-11 01:56:13

Looking at the first rows of sample_info and variable_info is the fastest way to learn the object structure:

head(global_patterns@sample_info[, c("sample_id", "SampleType")])
#>   sample_id SampleType
#> 1       CL3       Soil
#> 2       CC1       Soil
#> 3       SV1       Soil
#> 4   M31Fcsw      Feces
#> 5   M11Fcsw      Feces
#> 6   M31Plmr       Skin
head(global_patterns@variable_info[, c("variable_id", "Kingdom", "Phylum")])
#>   variable_id Kingdom        Phylum
#> 1      549322 Archaea Crenarchaeota
#> 2      522457 Archaea Crenarchaeota
#> 3         951 Archaea Crenarchaeota
#> 4      244423 Archaea Crenarchaeota
#> 5      586076 Archaea Crenarchaeota
#> 6      246140 Archaea Crenarchaeota
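
Since variable_info has one row per feature in expression_data (19216 in both here), feature counts can be summarised by any taxonomy column. The base R sketch below uses small made-up tables rather than global_patterns and collapses feature counts to phylum level with rowsum(). It illustrates the table layout only and is not a microbiomedataset function.

```r
# Toy tables with the same layout as a microbiome_dataset (values are made up)
expression_data <- data.frame(
  S1 = c(10, 0, 5, 3),
  S2 = c(2, 8, 0, 1),
  row.names = c("otu1", "otu2", "otu3", "otu4")
)
variable_info <- data.frame(
  variable_id = c("otu1", "otu2", "otu3", "otu4"),
  Phylum = c("Firmicutes", "Bacteroidetes", "Firmicutes", "Bacteroidetes"),
  stringsAsFactors = FALSE
)

# Rows of expression_data and variable_info must describe the same features
stopifnot(identical(rownames(expression_data), variable_info$variable_id))

# Sum the counts of all features that share a phylum, sample by sample
phylum_counts <- rowsum(as.matrix(expression_data), group = variable_info$Phylum)
phylum_counts
```

rowsum() sorts the groups, so the result has one row per phylum in alphabetical order and one column per sample.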

Create an object from rectangular data

At minimum, a new object needs:

  1. expression_data: rows are features and columns are samples.
  2. sample_info: first column must be sample_id.
  3. variable_info: first column must be variable_id.

expression_data <- global_patterns@expression_data[1:20, 1:6]

sample_info <-
  global_patterns@sample_info[1:6, c("sample_id", "SampleType", "class")]

variable_info <-
  global_patterns@variable_info[
    1:20,
    c("variable_id", "Kingdom", "Phylum", "Class", "Order",
      "Family", "Genus", "Species")
  ]

mini_object <- create_microbiome_dataset(
  expression_data = expression_data,
  sample_info = sample_info,
  variable_info = variable_info
)

mini_object
#> -------------------- 
#> microbiomedataset version: 0.99.1 
#> -------------------- 
#> 1.expression_data:[ 20 x 6 data.frame]
#> 2.sample_info:[ 6 x 3 data.frame]
#> 3.variable_info:[ 20 x 8 data.frame]
#> 4.sample_info_note:[ 3 x 2 data.frame]
#> 5.variable_info_note:[ 8 x 2 data.frame]
#> -------------------- 
#> Processing information (extract_process_info())
#> create_microbiome_dataset ---------- 
#>             Package               Function.used                Time
#> 1 microbiomedataset create_microbiome_dataset() 2026-03-04 20:43:20
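
When the tables come from your own files rather than from global_patterns, it can help to check their alignment before calling create_microbiome_dataset(). The sketch below builds three toy tables in base R with made-up values. The alignment rules it checks (expression_data columns matching sample_id, rows matching variable_id, in the same order) are an assumption about what a consistent object looks like. The package call is skipped when microbiomedataset is not installed.

```r
# Toy inputs with made-up values: 3 features (rows) x 2 samples (columns)
expression_data <- data.frame(
  S1 = c(12, 0, 4),
  S2 = c(3, 7, 1),
  row.names = c("otu1", "otu2", "otu3")
)

# First column must be sample_id; "class" mirrors the example above
sample_info <- data.frame(
  sample_id = c("S1", "S2"),
  SampleType = c("Soil", "Feces"),
  class = c("Subject", "Subject"),
  stringsAsFactors = FALSE
)

# First column must be variable_id, followed by the taxonomy ranks
variable_info <- data.frame(
  variable_id = c("otu1", "otu2", "otu3"),
  Kingdom = "Bacteria",
  Phylum = c("Firmicutes", "Bacteroidetes", "Firmicutes"),
  Class = NA_character_, Order = NA_character_, Family = NA_character_,
  Genus = NA_character_, Species = NA_character_,
  stringsAsFactors = FALSE
)

# Assumed alignment rules: expression_data columns are the samples and
# its rows are the features, in the same order as the annotation tables
stopifnot(
  identical(colnames(expression_data), sample_info$sample_id),
  identical(rownames(expression_data), variable_info$variable_id)
)

# Build the object only if the package is installed
if (requireNamespace("microbiomedataset", quietly = TRUE)) {
  toy_object <- microbiomedataset::create_microbiome_dataset(
    expression_data = expression_data,
    sample_info = sample_info,
    variable_info = variable_info
  )
}
```

Running the stopifnot() checks first means a mismatch between the tables shows up as a plain base R error before any package code runs.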

Convert to and from phyloseq

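The package converts between microbiome_dataset and phyloseq objects in both directions: convert2phyloseq() exports a microbiome_dataset, and convert2microbiome_dataset() imports a phyloseq object.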

phyloseq_object <- convert2phyloseq(global_patterns)
phyloseq_object
#> phyloseq-class experiment-level object
#> otu_table()   OTU Table:         [ 19216 taxa and 26 samples ]
#> sample_data() Sample Data:       [ 26 samples by 7 sample variables ]
#> tax_table()   Taxonomy Table:    [ 19216 taxa by 7 taxonomic ranks ]
#> phy_tree()    Phylogenetic Tree: [ 19216 tips and 19215 internal nodes ]
roundtrip_object <- convert2microbiome_dataset(phyloseq_object)

dim(roundtrip_object@expression_data)
#> [1] 19216    26
head(roundtrip_object@variable_info[, c("variable_id", "Kingdom", "Phylum")])
#>   variable_id Kingdom        Phylum
#> 1      549322 Archaea Crenarchaeota
#> 2      522457 Archaea Crenarchaeota
#> 3         951 Archaea Crenarchaeota
#> 4      244423 Archaea Crenarchaeota
#> 5      586076 Archaea Crenarchaeota
#> 6      246140 Archaea Crenarchaeota
check_microbiome_dataset_class(roundtrip_object)
#> [1] TRUE
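
check_microbiome_dataset_class() confirms that the result is a valid object, but not that the abundance values are unchanged. One way to compare values is sketched below with a hypothetical helper, same_abundances(), which is not part of microbiomedataset. It reorders one table by the row and column names of the other before comparing, in case a conversion changes the ordering.

```r
# Hypothetical helper (not part of microbiomedataset): TRUE when two
# feature x sample tables hold the same values once rows and columns
# are put in the same order
same_abundances <- function(a, b, tolerance = 1e-8) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  if (is.null(rownames(a)) || is.null(colnames(a))) {
    stop("both tables need row and column names")
  }
  if (!identical(dim(a), dim(b)) ||
      !setequal(rownames(a), rownames(b)) ||
      !setequal(colnames(a), colnames(b))) {
    return(FALSE)
  }
  # Reorder b to follow a's row and column names, then compare numerically
  b <- b[rownames(a), colnames(a), drop = FALSE]
  isTRUE(all.equal(a, b, tolerance = tolerance))
}

# Toy check: same values, rows listed in a different order
original <- matrix(c(1, 2, 3, 4), nrow = 2,
                   dimnames = list(c("otu1", "otu2"), c("S1", "S2")))
shuffled <- original[c("otu2", "otu1"), ]
same_abundances(original, shuffled)
```

With the objects above, the comparison would be same_abundances(global_patterns@expression_data, roundtrip_object@expression_data).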

Inspect imported schema

During creation and conversion, taxonomy columns are standardized to the canonical ranks:

colnames(roundtrip_object@variable_info)
#> [1] "variable_id" "Kingdom"     "Phylum"      "Class"       "Order"      
#> [6] "Family"      "Genus"       "Species"

This standardization is what allows downstream taxonomy-aware verbs to work consistently across input sources.
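
The package's own standardization code is not shown here. As a rough illustration of the idea only, the hypothetical helper standardize_ranks() below matches rank columns case-insensitively, adds any missing rank as NA, and returns variable_id followed by the seven canonical ranks in order. This is an assumption about the kind of normalization involved, not a description of microbiomedataset internals.

```r
# Canonical rank order, as listed by colnames(roundtrip_object@variable_info)
canonical_ranks <- c("Kingdom", "Phylum", "Class", "Order",
                     "Family", "Genus", "Species")

# Hypothetical helper, NOT the microbiomedataset implementation
standardize_ranks <- function(variable_info, ranks = canonical_ranks) {
  # Map lower-cased input names back to the original column names
  lookup <- setNames(colnames(variable_info), tolower(colnames(variable_info)))
  out <- data.frame(variable_id = as.character(variable_info$variable_id),
                    stringsAsFactors = FALSE)
  for (rank in ranks) {
    source_col <- lookup[tolower(rank)]
    out[[rank]] <- if (is.na(source_col)) {
      NA_character_  # rank absent from the input: fill with NA
    } else {
      as.character(variable_info[[source_col]])
    }
  }
  out
}

raw_taxonomy <- data.frame(
  variable_id = c("otu1", "otu2"),
  kingdom = c("Bacteria", "Bacteria"),
  PHYLUM = c("Firmicutes", "Proteobacteria"),
  genus = c("Bacillus", NA),
  stringsAsFactors = FALSE
)
standardize_ranks(raw_taxonomy)
```

With these toy columns, the result keeps the Kingdom, Phylum and Genus values and fills Class, Order, Family and Species with NA.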