Paper (version 2 from 2022-10-05)
The paper introduces a new algorithm called CEBRA which is “a new dimensionality reduction method that can be applied jointly to behavioral and neural recordings to reveal meaningful lower dimensional neural population dynamics.” So from this sentence it appears to compete with the likes of t-SNE and UMAP but has a narrower focus than those.
The paper is aimed at the neuroscience domain, where I am not the best person to offer an opinion. My background is in dimensionality reduction, so I’ll concentrate on the method itself.
CEBRA is a contrastive learning method that has two different modes of operation. Instead of relying on data augmentations for neural data, they go back to a setup like the one in Contrastive Predictive Coding by van den Oord et al. (2019). In the time-driven mode, the positive samples for a given reference point are chosen by proximity in the time domain.
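To make the time-driven idea concrete, here is a minimal NumPy sketch of picking positives by temporal proximity and scoring them with an InfoNCE-style loss. It is not the authors' implementation: the function names, the `max_offset` window, and the `temperature` parameter are my own assumptions for illustration, and the paper may sample positives and negatives differently.

```python
import numpy as np

def sample_time_positives(n_samples, ref_idx, max_offset, rng):
    """For each reference index, pick a positive index at most max_offset steps away.

    Hypothetical helper: near the ends of the recording the index is clipped,
    so a positive can coincide with its reference there.
    """
    offsets = rng.integers(1, max_offset + 1, size=len(ref_idx))
    signs = rng.choice([-1, 1], size=len(ref_idx))
    return np.clip(ref_idx + signs * offsets, 0, n_samples - 1)

def info_nce(ref, pos, neg, temperature=1.0):
    """InfoNCE-style loss on embeddings.

    ref, pos: (batch, dim) arrays, neg: (n_neg, dim) array.
    Rows are assumed to be L2-normalized, so dot products are cosine similarities.
    """
    pos_logits = np.sum(ref * pos, axis=1, keepdims=True) / temperature  # (batch, 1)
    neg_logits = ref @ neg.T / temperature                               # (batch, n_neg)
    logits = np.concatenate([pos_logits, neg_logits], axis=1)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    # Log-probability that the positive is picked among positive + negatives.
    log_prob = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return -log_prob.mean()
```

In a training loop, one would embed the neural data at the reference, positive, and (randomly drawn) negative time points with the same encoder and minimize this loss with respect to the encoder's parameters.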
The other mode, called hypothesis-driven mode, lets you specify the labels used to find positive pairs: samples whose labels are in close proximity are treated as positives. So if you used the same labels as for your downstream task (think of class labels for image classification), you would arrive at Supervised Contrastive Learning as introduced by Khosla et al. (2021). The important thing to note here is that the labels are usually not ground-truth labels but auxiliary information (for instance, here they encode the position of the mouse, while the input to the network is the neuron spikes).
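As a rough illustration of label-based positive selection, the sketch below picks, for each reference sample, a positive among its nearest neighbours in label space (for example, the position of the mouse). The helper name and the neighbourhood size `k` are hypothetical choices of mine, and this nearest-neighbour rule is only one plausible way to realise "close in label space", not necessarily the sampling scheme used in the paper.

```python
import numpy as np

def sample_label_positives(labels, ref_idx, rng, k=5):
    """For each reference, pick a positive among its k nearest neighbours in label space.

    labels: (n_samples,) or (n_samples, label_dim) array of auxiliary labels.
    ref_idx: integer indices of the reference samples.
    """
    labels = np.asarray(labels, dtype=float).reshape(len(labels), -1)
    pos_idx = np.empty(len(ref_idx), dtype=int)
    for i, r in enumerate(ref_idx):
        dist = np.linalg.norm(labels - labels[r], axis=1)
        dist[r] = np.inf                 # never pick the reference itself
        nearest = np.argsort(dist)[:k]   # indices of the k closest labels
        pos_idx[i] = rng.choice(nearest)
    return pos_idx
```

With discrete class labels, all samples of the same class sit at distance zero from each other, which is what brings this close to the supervised contrastive setting mentioned above.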
Surprisingly, despite featuring those two modes prominently, they also have a third mode that is a hybrid of the two. It can take more than one dataset (along with behavior data) and embed them in a shared latent space. I would have liked some more exploration of this concept, as it seems interesting to me.
They claim that CEBRA can also help with removing batch effects from datasets. I would love to read more about that; I guess it ties into the previous point about the hybrid mode.
All in all, it seems like a fairly standard contrastive learning setup applied to neural data. In its current state the paper is written a bit confusingly, though I also don’t know all of the terminology of that domain.
Why are the “position only” embeddings in Figure 2b and c different? I would expect them to be fit with the same loss, so they should look more similar.