--- title: "Feature List and Roadmap" vignette: > %\VignetteIndexEntry{Feature List and Roadmap} %\VignetteEncoding{UTF-8} %\VignetteEngine{quarto::html} knitr: opts_chunk: collapse: true comment: "#>" --- This document outlines the features of the **{laminr}** package and the roadmap for future development. ## Features ### Connect to an instance * [x] Connect to a LaminDB instance (`connect()`). * [x] Handle authentication and authorization. * [ ] Connect to a LaminDB instance without needing to install the `lamin_cli` Python package. ### Query & search * [x] **Query exactly one record** (`Registry$get(...)`): Fetch a single record by ID. * [ ] **Query sets of records** (`Registry$filter()`): Fetch multiple records based on filters. - [x] `$df()`: Returns a data frame with each record in a row. - [ ] `$all()`: Returns all records as a `QuerySet`. - [ ] `$one()`: Return exactly one record. - [ ] `$one_or_none()`: Return one record or `NULL`. * [ ] **Leverage relationships when querying** (`Artifact$filter(created_by__handle__startswith = "testuse")$df()`): Query records based on relationships. * [ ] **Comparators**: Use comparators in filters. - [ ] `and`: Example: `Artifact$filter(suffix = ".jpg", created_by = user)` - [ ] `less than` / `greater than`: Example: `Artifact$filter(size__lt = 1e4)` - [ ] `in`: Example: `Artifact$filter(suffix_in = [".jpg", ".fastq.gz"])` - [ ] `order by`: Example: `Artifact$filter().order_by("created_at")` - [ ] `contains`: Example: `Artifact$filter(name__contains = "test")` - [ ] `startswith`: Example: `Artifact$filter(name__startswith = "test")` - [ ] `or`: Example: `...` - [ ] `not`: Example: `...` * [ ] **Search for records** (`Registry$search(...)`): Search for records based on a query string. * [ ] **Pagination**: Support pagination for large query results. * [ ] **Field lookups**: Provide convenient functions for looking up field values (e.g., `Artifact$lookup("description")`). ### Manage data & metadata * [ ] **Create artifacts**: Create new artifacts from various data sources (e.g., files, data frames, in-memory objects). - [x] `$from_df()`: Create an artifact from a data frame. - [x] `$from_path()`: Create an artifact from a path. - [x] `$from_anndata()`: Create an artifact from an `AnnData`. * [x] **Save artifacts**: Save artifacts to LaminDB with appropriate metadata. * [ ] **Load artifacts**: Load artifacts from LaminDB into R: - [x] `csv`: Load a data frame from a CSV file. - [ ] `fcs`: Load flow cytometry data. - [x] `h5ad`: Load an AnnData from an HDF5 file. - [ ] `h5mu`: Load a MuData from an HDF5 file. - [x] `html`: Load content from an HTML file. - [x] `jpg`: Load an image from JPG. - [x] `json`: Load data from a JSON file. - [x] `parquet`: Load a data frame from a Parquet file. - [x] `png`: Load an image from PNG. - [x] `rds`: Load an R object from an RDS file. - [x] `svg`: Load an image from SVG. - [x] `tsv`: Load a data frame from a TSV file. - [x] `yaml`: Load data from a YAML file. - [ ] `zarr`: Load an AnnData from a Zarr store. * [ ] **Cache artifacts**: Cache artifacts locally for faster access: - [x] `s3`: Interact with S3 storage. - [ ] `gcp`: Interact with Google Cloud Storage. * [ ] **Version artifacts**: Create new versions of artifacts. * [x] **Delete artifacts**: Delete an existing artifact. * [ ] **Manage artifact metadata**: Add, update, and delete artifact metadata. * [ ] **Work with collections**: Create, manage, and query collections of artifacts. * [ ] **Stream backed artifacts**: Connect to file-backed artifacts (`$open`). - [x] `tiledbsoma`: Stream TileDB-SOMA objects ### Track notebooks & scripts * [x] **Track code execution**: Automatically track the execution of R scripts and notebooks. * [ ] **Capture run context**: Record information about the execution environment (e.g., package versions, parameters). * [x] **Link code to artifacts**: Associate code execution with generated artifacts. - [x] Link to artifacts loaded from other instances * [ ] **Visualize data lineage**: Create visualizations of data lineage and dependencies. * [x] **Finalize tracking**: End and save a run. ### Curate datasets * [ ] **Validate data**: Validate data against predefined schemas or constraints. * [ ] **Standardize data**: Apply standardization rules to ensure data consistency. * [ ] **Annotate data**: Add annotations and labels to data. * [ ] **Use the Curator class**: Implement the `Curator` class for a streamlined curation workflow. ### Access public ontologies * [ ] **Access ontology data**: Fetch data from public ontologies (e.g., gene names, protein IDs). * [ ] **Search ontologies**: Search for entities within ontologies. * [ ] **Use ontology terms in queries**: Use ontology terms to filter and query data. * [ ] **Manage ontology versions**: Access different versions of ontologies. ### Manage biological registries * [ ] **Create and manage records in bionty registries**: Add, update, and delete records for genes, proteins, cell types, etc. * [ ] **Utilize hierarchical relationships**: Navigate and query based on parent-child relationships in ontologies. * [ ] **Manage synonyms**: Add and use synonyms for biological entities. ### Manage schema modules * [x] **List available modules**: Retrieve a list of available modules in an instance. * [x] **Access module registries**: Access registries within specific modules. * [ ] **(Advanced) Create custom modules**: Define and register custom schema modules. ### Transfer data * [x] **Upload data**: Upload data files to LaminDB storage. * [x] **Download data**: Download data files from LaminDB storage. * [ ] **(Advanced) Support zero-copy data transfer**: Implement efficient data transfer mechanisms. ## Roadmap ### Version 0.1.0 A first version of the package that allows users to: * Connect to a LaminDB instance. * List all records in a registry. * Fetch one record by ID or UID. * Cache S3 artifacts locally. * Load AnnData artifacts. ### Version 0.2.0 * Implement basic data and metadata management features (create, save, load and delete artifacts). * Expand support for different data formats. * Implement code tracking. ### Version 0.3.0 * Track input artifacts. * Support for more storage backends using a reticulate Python backend. ### Version 0.4.0 * Expand support for different storage backends. * Expand query functionality with comparators, relationships, and pagination. * Implement data lineage visualization. * Introduce data curation features (validation, standardization, annotation). * Enhance support for bionty registries and ontology interactions. * Connect to TileDB-SOMA artifacts. ### Future versions * Implement advanced features like custom module creation and zero-copy data transfer. * Continuously improve performance, usability, and documentation.