---
title: "Feature List and Roadmap"
vignette: >
  %\VignetteIndexEntry{Feature List and Roadmap}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{quarto::html}
knitr:
  opts_chunk: 
    collapse: true
    comment: "#>"
---

This document outlines the features of the **{laminr}** package and the roadmap for future development.

## Features

### Connect to an instance

* [x] Connect to a LaminDB instance (`connect()`).
* [x] Handle authentication and authorization.
* [ ] Connect to a LaminDB instance without needing to install the `lamin_cli` Python package.

### Query & search

* [x] **Query exactly one record** (`Registry$get(...)`): Fetch a single record by ID.
* [ ] **Query sets of records** (`Registry$filter()`): Fetch multiple records based on filters.
  - [x] `$df()`: Returns a data frame with each record in a row.
  - [ ] `$all()`: Returns all records as a `QuerySet`.
  - [ ] `$one()`: Returns exactly one record.
  - [ ] `$one_or_none()`: Returns one record or `NULL`.
* [ ] **Leverage relationships when querying** (`Artifact$filter(created_by__handle__startswith = "testuse")$df()`): Query records based on relationships.
* [ ] **Comparators**: Use comparators in filters.
  - [ ] `and`: Example: `Artifact$filter(suffix = ".jpg", created_by = user)`
  - [ ] `less than` / `greater than`: Example: `Artifact$filter(size__lt = 1e4)`
  - [ ] `in`: Example: `Artifact$filter(suffix__in = c(".jpg", ".fastq.gz"))`
  - [ ] `order by`: Example: `Artifact$filter()$order_by("created_at")`
  - [ ] `contains`: Example: `Artifact$filter(name__contains = "test")`
  - [ ] `startswith`: Example: `Artifact$filter(name__startswith = "test")`
  - [ ] `or`: Example: `...`
  - [ ] `not`: Example: `...`
* [ ] **Search for records** (`Registry$search(...)`): Search for records based on a query string.
* [ ] **Pagination**: Support pagination for large query results.
* [ ] **Field lookups**: Provide convenient functions for looking up field values (e.g., `Artifact$lookup("description")`).
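
The comparator examples above follow a `field__comparator` naming convention, with relationships chained by further double underscores. As a rough illustration only, the sketch below shows how such an argument name could be split into a field path and a comparator in plain R. `parse_lookup()` and its list of comparators are hypothetical and not part of **{laminr}**; the package may handle these names differently.

```r
# Hypothetical helper (NOT part of laminr): split a filter argument name
# such as "created_by__handle__startswith" into a field path and a comparator.
parse_lookup <- function(arg_name,
                         comparators = c("lt", "gt", "in", "contains", "startswith")) {
  # Fields and comparators are separated by a double underscore
  parts <- strsplit(arg_name, "__", fixed = TRUE)[[1]]
  last <- parts[length(parts)]

  if (length(parts) > 1 && last %in% comparators) {
    list(path = parts[-length(parts)], comparator = last)
  } else {
    # No recognised suffix: treat the whole name as an exact-match filter
    list(path = parts, comparator = "exact")
  }
}

size_lookup <- parse_lookup("size__lt")
relation_lookup <- parse_lookup("created_by__handle__startswith")
plain_lookup <- parse_lookup("suffix")
```

Under these assumptions, `size__lt` resolves to the field `size` with the comparator `lt`, while `created_by__handle__startswith` walks from `created_by` to `handle` before applying `startswith`.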

### Manage data & metadata

* [ ] **Create artifacts**: Create new artifacts from various data sources (e.g., files, data frames, in-memory objects).
  - [x] `$from_df()`: Create an artifact from a data frame.
  - [x] `$from_path()`: Create an artifact from a path.
  - [x] `$from_anndata()`: Create an artifact from an `AnnData`.
* [x] **Save artifacts**: Save artifacts to LaminDB with appropriate metadata.
* [ ] **Load artifacts**: Load artifacts from LaminDB into R:
  - [x] `csv`: Load a data frame from a CSV file.
  - [ ] `fcs`: Load flow cytometry data.
  - [x] `h5ad`: Load an AnnData from an HDF5 file.
  - [ ] `h5mu`: Load a MuData from an HDF5 file.
  - [x] `html`: Load content from an HTML file.
  - [x] `jpg`: Load an image from JPG.
  - [x] `json`: Load data from a JSON file.
  - [x] `parquet`: Load a data frame from a Parquet file.
  - [x] `png`: Load an image from PNG.
  - [x] `rds`: Load an R object from an RDS file.
  - [x] `svg`: Load an image from SVG.
  - [x] `tsv`: Load a data frame from a TSV file.
  - [x] `yaml`: Load data from a YAML file.
  - [ ] `zarr`: Load an AnnData from a Zarr store.
* [ ] **Cache artifacts**: Cache artifacts locally for faster access:
  - [x] `s3`: Interact with S3 storage.
  - [ ] `gcp`: Interact with Google Cloud Storage.
* [ ] **Version artifacts**: Create new versions of artifacts.
* [x] **Delete artifacts**: Delete an existing artifact.
* [ ] **Manage artifact metadata**: Add, update, and delete artifact metadata.
* [ ] **Work with collections**: Create, manage, and query collections of artifacts.
* [ ] **Stream backed artifacts**: Connect to file-backed artifacts (`$open`).
  - [x] `tiledbsoma`: Stream TileDB-SOMA objects.
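
The formats listed under **Load artifacts** are keyed by file suffix, which suggests a loader chosen from the suffix of the cached file. The following is a minimal sketch of that idea using only base R readers for three of the formats. `load_by_suffix()` is a hypothetical name, and this is not how **{laminr}** is known to implement loading.

```r
# Hypothetical sketch (NOT laminr's implementation): pick a reader
# based on the file suffix, using base R functions only.
load_by_suffix <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = utils::read.csv(path),
    tsv = utils::read.delim(path),
    rds = readRDS(path),
    stop("No loader sketched for suffix: ", ext)
  )
}

# Round-trip a small data frame through a temporary CSV file
tmp <- tempfile(fileext = ".csv")
utils::write.csv(data.frame(x = 1:3), tmp, row.names = FALSE)
loaded <- load_by_suffix(tmp)
```

Unsupported suffixes fail with an explicit error here, which mirrors the unchecked entries in the list.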

### Track notebooks & scripts

* [x] **Track code execution**: Automatically track the execution of R scripts and notebooks.
* [ ] **Capture run context**: Record information about the execution environment (e.g., package versions, parameters).
* [x] **Link code to artifacts**: Associate code execution with generated artifacts.
  - [x] Link to artifacts loaded from other instances.
* [ ] **Visualize data lineage**: Create visualizations of data lineage and dependencies.
* [x] **Finalize tracking**: End and save a run.

### Curate datasets

* [ ] **Validate data**: Validate data against predefined schemas or constraints.
* [ ] **Standardize data**: Apply standardization rules to ensure data consistency.
* [ ] **Annotate data**: Add annotations and labels to data.
* [ ] **Use the Curator class**: Implement the `Curator` class for a streamlined curation workflow.

### Access public ontologies

* [ ] **Access ontology data**: Fetch data from public ontologies (e.g., gene names, protein IDs).
* [ ] **Search ontologies**: Search for entities within ontologies.
* [ ] **Use ontology terms in queries**: Use ontology terms to filter and query data.
* [ ] **Manage ontology versions**: Access different versions of ontologies.

### Manage biological registries

* [ ] **Create and manage records in bionty registries**: Add, update, and delete records for genes, proteins, cell types, etc.
* [ ] **Utilize hierarchical relationships**: Navigate and query based on parent-child relationships in ontologies.
* [ ] **Manage synonyms**: Add and use synonyms for biological entities.

### Manage schema modules

* [x] **List available modules**: Retrieve a list of available modules in an instance.
* [x] **Access module registries**: Access registries within specific modules.
* [ ] **(Advanced) Create custom modules**: Define and register custom schema modules.

### Transfer data

* [x] **Upload data**: Upload data files to LaminDB storage.
* [x] **Download data**: Download data files from LaminDB storage.
* [ ] **(Advanced) Support zero-copy data transfer**: Implement efficient data transfer mechanisms.

## Roadmap

### Version 0.1.0

A first version of the package that allows users to:

* Connect to a LaminDB instance.
* List all records in a registry.
* Fetch one record by ID or UID.
* Cache S3 artifacts locally.
* Load AnnData artifacts.

### Version 0.2.0

* Implement basic data and metadata management features (create, save, load, and delete artifacts).
* Expand support for different data formats.
* Implement code tracking.

### Version 0.3.0

* Track input artifacts.
* Support more storage backends through a reticulate-based Python backend.

### Version 0.4.0

* Expand support for different storage backends.
* Expand query functionality with comparators, relationships, and pagination.
* Implement data lineage visualization.
* Introduce data curation features (validation, standardization, annotation).
* Enhance support for bionty registries and ontology interactions.
* Connect to TileDB-SOMA artifacts.

### Future versions

* Implement advanced features like custom module creation and zero-copy data transfer.
* Continuously improve performance, usability, and documentation.