
Accelerating Tumor Segmentation Model Development with Concentriq Embeddings

By Proscia AI R&D Team | October 1, 2024

Developing AI models for pathology has traditionally been a resource-intensive process, requiring complex data handling and significant computational power. Large whole slide images (WSIs) demand massive storage, and the manual steps of processing and transforming this data into a format usable for model development are cumbersome and time-intensive. With the rise of foundation models trained on huge corpora of images, the initial steps of model development have been dramatically streamlined.

In this tutorial, we’ll explore how Proscia’s Concentriq® Embeddings streamlines the use of foundation models to transform the way data scientists and researchers approach AI development projects. We demonstrate the power and simplicity of building a tumor segmentation model with Concentriq Embeddings on a standard laptop, without the need for expensive GPU infrastructure or the logistical complexities associated with handling terabytes of training data. With Concentriq Embeddings, data scientists can efficiently generate WSI embeddings using multiple foundation models, enabling faster, more efficient AI development (Figure 1).

Figure 1. Workflow to generate embeddings using Concentriq Embeddings.

In the following sections, we showcase how Concentriq Embeddings accelerates the AI innovation path from concept to execution, making sophisticated model development accessible and expedient.

Tumor Segmentation Using Concentriq Embeddings

In this tutorial, we’ll use the CAMELYON17 dataset, a collection of WSIs in TIFF format from five medical centers in the Netherlands, with lesion-level annotations provided for 100 slides. This dataset is ideal for demonstrating how to quickly build a tumor segmentation model using high-resolution tile embeddings from Concentriq Embeddings.

We’ll illustrate how to:

  1. Generate embeddings at 1 micron per pixel (mpp, approx. 10X) using the DINOv2 model.
  2. Load embeddings and labels.
  3. Define and train a simple multi-layer perceptron (MLP) model in PyTorch.
  4. Evaluate patch-level performance.
  5. Visualize predictions with heatmaps.

We take this deliberately simple approach (no convolutional networks to train or fine-tune, just a lightweight classifier on top of precomputed embeddings) to show 1) the power of embeddings derived from a foundation model, even one trained only on natural images, and 2) how scientists with basic programming skills can leverage Concentriq Embeddings to achieve significant results.
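To give a sense of what step 3 looks like in practice, here is a minimal sketch of a patch-level MLP classifier. The embedding dimension (assumed here to be 1024, matching a DINOv2 ViT-L backbone), the hidden size, and the training settings are illustrative assumptions rather than the exact configuration used in this tutorial.

import torch
import torch.nn as nn

# Minimal patch-level classifier sketch. EMBED_DIM is an assumption
# (e.g., 1024 for a DINOv2 ViT-L backbone); set it to the dimensionality
# of the embeddings returned by Concentriq Embeddings.
EMBED_DIM = 1024

class TileMLP(nn.Module):
    def __init__(self, embed_dim=EMBED_DIM, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, 1),  # one logit per tile: tumor vs. non-tumor
        )

    def forward(self, x):
        return self.net(x)

model = TileMLP()
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

Because each tile is represented by a single embedding vector, the per-tile predictions from a classifier like this can later be stitched back together into a slide-level heatmap for segmentation.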

import cv2
import imageio
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
from PIL import Image

from utils.client import ClientWrapper as Client
from utils import utils

Image.MAX_IMAGE_PIXELS = None  # disable Pillow's decompression-bomb check so very large images can be opened

# Concentriq credentials and endpoint are read from environment variables
email = os.getenv("CONCENTRIQ_EMAIL")
pwd = os.getenv("CONCENTRIQ_PASSWORD")
endpoint = os.getenv("CONCENTRIQ_ENDPOINT_URL")

# To use CPU instead of GPU, set the `device` parameter to "cpu"
ce_api_client = Client(url=endpoint, email=email, password=pwd, device=0)
ce_api_client

<utils.client.ClientWrapper at 0x7f2031d25f60>

Generating Embeddings

Now let’s embed the CAMELYON17 repository (stored on our Concentriq instance as repo ID 2784) at 1mpp resolution using the default (DINOv2) model, and print out the ticket ID.
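The exact call depends on the Concentriq Embeddings client; the snippet below is only a rough sketch of the shape of the request, using hypothetical method and attribute names (embed_repo, ticket_id) that are assumptions rather than the documented API.

# Hypothetical sketch; `embed_repo` and `ticket_id` are illustrative names,
# not the documented Concentriq Embeddings client API.
REPO_ID = 2784   # CAMELYON17 repository on our Concentriq instance
ticket = ce_api_client.embed_repo(repo_id=REPO_ID, mpp=1.0, model="dinov2")
print(ticket.ticket_id)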

The embeddings are returned in a compressed safetensors format, reducing the file size by a factor of 256 compared to the original WSIs, making them manageable even on standard hardware.
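Once the job completes and the files are downloaded, each slide's tile embeddings can be loaded with the safetensors library. A minimal sketch of step 2 follows; the file path and tensor key are assumptions about the download layout.

from safetensors.torch import load_file

# Load one slide's tile embeddings from a downloaded safetensors file.
# The file path and tensor key below are illustrative assumptions.
tensors = load_file("embeddings/patient_004_node_4.safetensors")
print(tensors.keys())                # inspect the tensors stored in the file
embeddings = tensors["embeddings"]   # e.g., shape (num_tiles, embed_dim)
print(embeddings.shape)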
