
kilosort's Introduction

Kilosort2 has been released! Try it out here.

Fast spike sorting for hundreds of channels

Implements an integrated template matching framework for detecting and clustering spikes from multi-channel electrophysiological recordings. Very fast when a GPU is available, but it can also run on the CPU. Described in this NIPS paper:

Pachitariu M, Steinmetz NA, Kadir S, Carandini M and Harris KD (2016). Fast and accurate spike sorting of high-channel count probes with Kilosort. Advances In Neural Information Processing Systems. 4448-4456 https://papers.nips.cc/paper/6326-fast-and-accurate-spike-sorting-of-high-channel-count-probes-with-kilosort

The following preprint contains slightly more information:

Pachitariu M, Steinmetz NA, Kadir S, Carandini M and Harris KD (2016). Kilosort: realtime spike-sorting for extracellular electrophysiology with hundreds of channels. bioRxiv dx.doi.org/10.1101/061481, link.

Installation

If you are running on the GPU, you must run mexGPUall in the CUDA folder after setting up mexcuda in Matlab. Steps for achieving this in Windows 10 are below; see the Docs folder for more details and other operating systems.

Windows 10 installation

  1. Install matlab R2018a from www.mathworks.com. You can use older versions of matlab but see notes below.
  2. Install cuda 9.0 from https://developer.nvidia.com/cuda-90-download-archive. This version of CUDA works with Matlab R2018a. For older versions of matlab, skip this for now, and at a later stage it will tell you what version of CUDA you need. Then search on google or the NVIDIA site for that version.
  3. Install visual studio 2013 community from https://www.visualstudio.com/vs/older-downloads/. Install this after Matlab; otherwise Matlab may not find the compiler, in which case you won't see it in the mex setup list (see below). If Visual Studio was already installed before Matlab, you should be able to reinstall it on top of the existing installation.
  4. Now in matlab, choose the compiler:
>> mex -setup C++ % gives an output like below:

To choose a different C++ compiler, select one from the following:
MinGW64 Compiler (C++)  mex -setup:'C:\Program Files\MATLAB\R2018a\bin\win64\mexopts\mingw64_g++.xml' C++
MinGW64 Compiler with Windows 10 SDK or later (C++)  mex -setup:'C:\Program Files\MATLAB\R2018a\bin\win64\mexopts\mingw64_g++_sdk10+.xml' C++
Microsoft Visual C++ 2013  mex -setup:'C:\Program Files\MATLAB\R2018a\bin\win64\mexopts\msvcpp2013.xml' C++

% you want to pick the 2013 edition, so copy and paste the corresponding instruction. For me that's: 
>> mex -setup:'C:\Program Files\MATLAB\R2018a\bin\win64\mexopts\msvcpp2013.xml' C++
  5. Compile Kilosort
>> cd C:\my\github\directory\KiloSort\CUDA % your kilosort repository directory
>> mexGPUall % should get a bunch of warnings plus "MEX completed successfully", several times. 
% at this stage it may tell you that you need a different version of cuda for older matlab installations. 

For more about mexcuda installation, see these instructions.

Verifying installation and test run

You can verify that the code has been installed correctly by running master_eMouse inside the eMouse folder. See readme_eMouse.txt first. You can also use these scripts to understand how to pass the right settings into Kilosort (these will depend on your probe, channel map configuration, etc.) and what you should be seeing in Phy during manual cleanup of Kilosort results. There are many parameters of the simulation which you can tweak to make it harder or easier, and perhaps more similar to your own data.
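
As a minimal sketch of the test run (assuming the repository was cloned to the example path used in the installation steps above; adjust the paths to your setup):

% Sketch: verify the installation on simulated data.
addpath(genpath('C:\my\github\directory\KiloSort'))   % your Kilosort repository directory
cd('C:\my\github\directory\KiloSort\eMouse')
master_eMouse   % simulates data, sorts it, and writes results you can inspect in Phy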

General instructions for running Kilosort

  1. Make a copy of master_file_example_MOVEME.m and \configFiles\StandardConfig_MOVEME.m and put them in the directory with your data.
  2. Generate a channel map file for your probe using \configFiles\createChannelMap.m as a starting point.
  3. Edit the config file with desired parameters. You should at least set the file paths (ops.fbinary, ops.fproc (this file will not exist yet - kilosort will create it), and ops.root), the sampling frequency (ops.fs), the number of channels in the file (ops.NchanTOT), the number of channels to be included in the sorting (ops.Nchan), the number of templates you want kilosort to produce (ops.Nfilt), and the location of your channel map file (ops.chanMap).
  4. Edit master_file so that the paths at the top (lines 3-4) point to your local copies of those github repositories, and so that the configuration file is correctly specified (lines 6-7).

To understand the parameters that can be adjusted in Kilosort, please refer to the example configuration files. The description of each parameter is inline with its assigned (default) setting, which you can change.
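
As a rough sketch of the fields listed above (every value below is a placeholder and must be replaced with your own paths and numbers):

% Sketch of the minimum config fields described in step 3; all values are placeholders.
ops.fbinary  = 'C:\data\myRecording.dat';  % raw binary file to be sorted
ops.fproc    = 'C:\data\temp_wh.dat';      % does not exist yet; Kilosort creates it
ops.root     = 'C:\data';                  % output directory
ops.fs       = 30000;                      % sampling frequency (Hz)
ops.NchanTOT = 32;                         % number of channels in the file
ops.Nchan    = 32;                         % number of channels included in the sorting
ops.Nfilt    = 64;                         % number of templates Kilosort should produce
ops.chanMap  = 'C:\data\chanMap.mat';      % channel map file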

Integration with Phy GUI

Kilosort provides a results structure called "rez", where the first column of rez.st contains the spike times and the second column the cluster identities. However, the best way to use this software is together with Phy, which provides a manual clustering interface for refining the results of the algorithm.

NOTE that you need to use a special branch of Phy with Kilosort. Instructions in Docs/phy_installation_with_templates.txt

You also need to install npy-matlab, to provide read/write functions from Matlab to Python, because Phy is written in Python.

Detailed instructions for interpreting results are provided here.

NOTE that if you run the auto-merge feature ("merge_posthoc2"), then the fifth column of rez.st contains the final cluster identities. This grouping is then exported to Phy and interpreted the same way a manual merge is interpreted. This means that you can still use Phy's functionality to break the cluster back up.
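
As a minimal sketch of reading these columns in Matlab (following the column descriptions above and below):

% Sketch: spike times and cluster labels from rez.st, per the column descriptions in this README.
spikeTimes = rez.st(:,1);        % in samples; divide by ops.fs to get seconds
clusterIDs = rez.st(:,2);        % cluster/template identities from the sorting
if size(rez.st, 2) >= 5
    finalIDs = rez.st(:,5);      % post auto-merge identities (only after merge_posthoc2)
end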

Matlab output structures

Kilosort is best used in conjunction with Phy. The .npy and .csv output files can then be loaded back into Matlab, following these general instructions: https://github.com/kwikteam/phy-contrib/blob/master/docs/template-gui.md. For a full example, see the tutorial with Neuropixels recordings obtained with a single probe and with two simultaneous probes.

However, in some situations you might need to use the Matlab results structures. Here is an explanation of these variables, available inside the struct called "rez"

xc, yc: x and y coordinates of each channel on the probe, in the order of channels provided in the channel map (default is linear, 1:1:nChannels).

connected: whether a channel in the original binary dat is "connected", or "active". Inactive channels are ignored.

Wrot: cross-channel whitening matrix. Wrot * high_pass_filtered_data = post_data, where post_data is the postprocessed data on which the Kilosort algorithm is applied.

WrotInv: the matrix inverse of Wrot, so that WrotInv * post_data = high_pass_filtered_data.

ops: keeps all the configuration settings provided by the user, and cumulative information added throughout the Kilosort steps.

st: first column is the spike time in samples, second column is the spike template, third column is the extracted amplitude, and fifth column is the post auto-merge cluster (if you run the auto-merger).

mu: mean amplitude for each template

U: low-rank components of the spatial masks for each template

W: low-rank components of the temporal masks for each template

dWU: average of a subset of spikes corresponding to each template. The low-rank decomposition of this matrix results in W and U.

Wraw: the spike templates, un-whitened by the operation Wraw(:,:,n) = WrotInv' * (U(:,n,:) * W(:,n,:)'), for each template n.
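
A sketch of that operation in Matlab (assuming U is channels x templates x rank and W is samples x templates x rank, so the singleton template dimension has to be squeezed out; check the sizes in your own rez):

% Sketch: rebuild un-whitened templates from the low-rank factors, per the formula above.
% Assumes U is (nChannels x nTemplates x nRank) and W is (nSamples x nTemplates x nRank).
nT   = size(rez.U, 2);
Wraw = zeros(size(rez.U, 1), size(rez.W, 1), nT);
for n = 1:nT
    Un = squeeze(rez.U(:, n, :));                % nChannels x nRank
    Wn = squeeze(rez.W(:, n, :));                % nSamples  x nRank
    Wraw(:, :, n) = rez.WrotInv' * (Un * Wn');   % nChannels x nSamples
end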

simScore: correlation between all pairs of templates.

cProj: projections of each detected spike onto the principal components of the channels corresponding to the spike's assigned template. The channel order for each template is available in iNeigh.

iNeigh: for each template, the channels with largest amplitudes are indexed in order (default 12). This indexing is used to sort coefficients in cProj. Notice this is a fundamentally sparse scheme: only the top channels for each template are stored.

cProjPC: projections of each detected spike onto the top templates most similar to the spike's assigned template. The nearest-template order for each template is available in iNeighPC.

iNeighPC: for each template, the other templates with the largest similarity are indexed in order (default 12). This indexing is used to sort coefficients in cProjPC. Notice this is a fundamentally sparse scheme: only the closest templates for each template are stored.

Questions

Please create an issue for bugs / installation problems.

Licence

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

kilosort's People

Contributors

adehad, galenlynch, kkatlowitz, marius10p, mspacek, nsteinme, sasen


kilosort's Issues

Channel map

Hi, first of all thank you for this software. I managed to set it up on a Windows 10 notebook with an NVIDIA Quadro K2000M and it is running extremely fast.

My question is whether I defined my channel map correctly. I am recording with a 16-channel probe with a similar shape to the 32-channel Buzsaki probe. I used the following values: xcoords = [-35 34 -33 32 -31 30 -29 28 -27 26 -25 24 -23 22 -21 20], ycoords = [310 300 290 280 270 260 250 240 230 220 210 200 190 180 170 160], kcoords = ones(Nchannels,1). After Kilosort has finished I then use the phy template-gui to manually cluster my data. The arrangement of the channels in the waveform view is right, though the depth of the channels is inverted. I believe this might be a result of my parameters. I will try some different parameters, but is there maybe an example of a probe so that I can figure it out easily?

Best wishes
Hendrik
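
For reference, a minimal sketch of a channel map file in the spirit of \configFiles\createChannelMap.m (the field names below are assumptions based on that template; verify them against your copy before use):

% Sketch of a 16-channel map file; field names follow the createChannelMap template
% and should be checked against \configFiles\createChannelMap.m in your version.
Nchannels   = 16;
connected   = true(Nchannels, 1);
chanMap     = 1:Nchannels;
chanMap0ind = chanMap - 1;
xcoords     = [-35 34 -33 32 -31 30 -29 28 -27 26 -25 24 -23 22 -21 20];   % from the post above
ycoords     = [310 300 290 280 270 260 250 240 230 220 210 200 190 180 170 160];
kcoords     = ones(Nchannels, 1);   % shank / group index
fs          = 30000;                % sampling frequency (placeholder)
save('chanMap.mat', 'chanMap', 'chanMap0ind', 'connected', 'xcoords', 'ycoords', 'kcoords', 'fs')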

make runKilosort a function rather than a script

This would allow, for instance, calling it on an array of files; that is currently not conveniently possible due to the clearvars command. The user could of course edit runKilosort directly, but then they would have to merge their changes whenever you (for instance) add a parameter. Turning it into a function would be cleaner - variables you intend to keep around that aren't currently in rez could be optional return arguments.

xcoords/ycoords can go undefined in preprocessData

Lines 57 & 58 in preprocessData() expect the variables xcoords & ycoords to exist, but it looks like there is only one branch (~isempty(ops.chanMap) && ischar(ops.chanMap)) that generates them (by loading ops.chanMap).

I'm running into this as I'm working on #41. Instead of re-generating ops from scratch, I'm loading it from rez.

Handle multiple files

It would be really useful if several files can be dealt with as one continuous recording without the need to concatenate the files.

reporting un-matched events

Events of high enough magnitude that didn't match any template should be reported. At its simplest, it could just tell you how many of these there were (on each channel, maybe), so you know if it's a problem. As a second stage, it could report the event times and peak channels of all of these, perhaps even adding them to the clustering with an extra template, with a flat shape, for each channel or local group of channels. This way these events could be easily visualized in phy.

It might be tricky though - one might expect lots of residuals (events left after subtracting some spike) to be unmatched, but these would look worrying to the user since they will see the raw data, without the first spike subtracted.

post hoc merge fails

I know the example master file says that the post hoc merging of templates is still under construction; however, I was hoping I could get some help getting it working, or an explanation of where to start a pull request to fix it.

The error I'm getting is:

Error using <
Matrix dimensions must agree.

Error in merging_score (line 22)
steps = sum(hs1(1:m1)<mmax/troughToPeakRatio) + ...

Error in merge_posthoc2 (line 89)
            mo = merging_score(f1old - f2old, f1new-f2new, ops.fracse);

Error in master (line 112)
rez = merge_posthoc2(rez);

and seems to be caused by l1 being positive in merging_score, causing b2 to be empty and everything to fail. I don't have a good enough sense of what the variables mean to know what should be happening here.

eMouse cannot be run without a GPU

Description
I'm a bit new to MATLAB, so I apologize if this issue is due to a lack of experience. However, if that's the case it might be nice to expand the documentation to cover this issue.

In master_eMouse.m, there's a useGPU flag, which works to avoid using the GPU when making the data. However, the flag doesn't prevent optimizePeaks.m from being run, which does require a GPU. Is there a similar function that doesn't require a GPU that could be swapped in for optimizePeaks, or is the program meant to be used on a device with a GPU?

I saw that ops.initialize could be used to prevent optimizePeaks from being run, but based on the comments it seemed like that would prevent the data from make_eMouseData from being used in the analysis.

Any insight you can offer would be greatly appreciated.

Error Message

There is a problem with the CUDA driver or with this GPU device. Be sure that you have a
supported GPU and that the latest driver is installed.

Error in optimizePeaks (line 51)
U = gpuArray(uBase(ind_filt, :))';

Error in fitTemplates (line 32)
        WUinit = optimizePeaks(ops,uproj);%does a scaled kmeans

Error in master_eMouse (line 25)
rez                = fitTemplates(rez, DATA, uproj);  % fit templates iteratively

Caused by:
    The CUDA driver could not be loaded. The library name used was
    '/usr/local/cuda/lib/libcuda.dylib'. The error was:
    dlopen(/usr/local/cuda/lib/libcuda.dylib, 10): image not found

automatic setting of number of channels

Marius' note:

It needs to know how many total channels there are so it loads them correctly from the bin file. In the past I left this as a user-defined option because there were cases where we ran it without a prb file.... But I think these days the prb file is a requirement for loading into Phy, so it needs to be made for any type of probe anyway. So I should really just determine the number of channels from the prb file.

Double counting?

I think it would be helpful if one could change the snippet length to avoid cases like the one shown below. This happens a lot in IC responses to speech. Also, I'm confused by the spike at zero in the cross-correlogram ...


Spike detection in dataset with electrical stimulation

Hi, I have a dataset recorded with a multichannel electrode with 16 channels. Simultaneously with the recording, electrical stimulation was applied in close proximity to the recording electrode. This electrical stimulation is clearly visible in the raw data on all channels. I wonder whether these stimulation artifacts make it possible at all to detect spikes in the data. Do you have any opinions on that?

Large amplitude spikes classified incorrectly into multiple clusters

This issue is best illustrated with a picture (see attached)

Kilosort originally gave me two units (yellow and green in the attached plot), but I noticed that there was an unphysiologically high 0 ms correlation. I then noticed that the green unit showed two distinct clusters which, when manually split, revealed the source of the high correlation (lavender cluster). Notice in the attached cross-correlations that the lavender and yellow clusters show many 0 ms correlations and share a refractory period: they are the same spikes. Lavender and green do not share a refractory period: they must originate from different neurons.

This issue happens every time there is a large amplitude unit (or any large amplitude noise, actually) and is easily spotted by identifying unusually high 0 ms correlations without any jitter, and bimodal waveform clusters that, when split, reveal the absence of a common refractory period. This manual stage is, in my opinion, essential for the software to be usable for our purposes, and easily doubles the manual sorting time compared to the klustakwik pipeline. Failing to inspect and correct for this issue, unfortunately, takes the best signal (large amplitude spikes) and distributes it across many units, making it the worst signal for looking at the correlations between single units.

I understand the root of the problem: variance and mean scale together, so large amplitude spikes are the most variable meaning that the residuals will be highest when looking for template (mis)matches. Perhaps it is possible to adjust the inclusion threshold according to spike amplitude?

thank you
sam
double_spike.pdf

Indexing of numpy output is unclear

@rswilliamson and I have been using KiloSort and Phy for spike sorting over at @Polley-Lab.

I have been working from this example's comments to correctly align templates and spike times to concatenated raw data.

Direct examination of the spike_templates.npy and spike_clusters.npy arrays in Python suggests that, contrary to the comment in the above file, they store 1-indexed indices as values:

In [30]: spike_templates = np.load('spike_templates.npy')
In [34]: spike_templates.min()
Out[34]: 1
In [35]: spike_templates.max()
Out[35]: 127

In [36]: spike_clusters_clu = np.load('spike_clusters.npy')
In [37]: spike_clusters_clu.min()
Out[37]: 1
In [38]: spike_clusters_clu.max()
Out[38]: 167

If this is the case, then this code is probably inconsistent, because clu and ss are unchanged from their numpy representation but spikeTemplates has 1 added to all its values, and doing both of those things at the same time doesn't seem correct.

%% Load in the results from Phy and plot statistics about the good clusters
    clu = readNPY(fullfile(penDir,  'spike_clusters.npy'));
    ss = readNPY(fullfile(penDir,  'spike_times.npy'));

    load(fullfile(penDir,'rez.mat')); %the raw templates are in rez.Wraw
    raw_templates = rez.Wraw;
    clear rez;

    spikeTemplates = readNPY(fullfile(penDir,  'spike_templates.npy')) + 1; % note: zero-indexed

    [cids, cgs] = readClusterGroupsCSV(fullfile(penDir,  'cluster_groups.csv'));

    %% Plot the good clusters with their filtered raw data and templates.
    goodClusters = cids(cgs==2);

Confusingly, we get reasonable-looking output when plotting, for each cluster, the template(s) that was/were used versus the averaged data across each channel.

Also confusing, the output of this function, when assessing cluster quality and separation index, does not list a clusterID of 0 either.

I might be over-thinking this, but I'm now confused about whether or not there's a fencepost error in my processing code.
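
One quick way to check the indexing base directly, as a sketch (using readNPY from npy-matlab and the penDir variable from the code above):

% Sketch: check whether the saved template indices are 0- or 1-based before adding 1.
st = readNPY(fullfile(penDir, 'spike_templates.npy'));
if min(st) == 0
    spikeTemplates = double(st) + 1;   % 0-based on disk: shift to Matlab's 1-based indexing
else
    spikeTemplates = double(st);       % already 1-based: adding 1 again would be off by one
end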

Redefining principal components

Hi

I was wondering if you have a script for calculating the predefined PCs. You mentioned in the config file that these should be recomputed for new data sets. Is this just the weight matrix for a sample of representative spikes? Are the PCs computed on raw data or filtered data (which filter)?

Many many thanks,
Mark

Extract relevant channels of each cluster

Hello Marius,

I am trying to figure out how to retrieve the information that the phy template-gui is using to create the WaveformView with the blue and grey channels.

I cannot seem to find a way to tell which channels are the relevant ones for each cluster (the ones that phy paints blue).

I assume since phy is using only the info in the .npy files that kilosort generates that this info is there somewhere.

Could you help me on this?

Thanks

Best practices for concatenating?

In our lab (as in many others), a recording session consists of data split across a series of files. In our case, each file corresponds to a different task/stimulus sequence. Often, we want to look at the response of a given unit across those tasks. It seems like the best way to do this with KiloSort is to concatenate the files before sorting units.

Is this indeed what people are doing? Are there any best practices? For example:

  • Should data be filtered before or after concatenation assuming there will be some degree of offset between the voltages at the end of one file and the beginning of another (could be negligible or large if a high-frequency low-amplitude event has occurred)?

  • Should files be concatenated in the order that they were recorded?

  • Can files that you may not need for analysis be excluded, even if this leaves a large time gap between files?

  • Any tips as to how to speed the concatenation itself?

    Thanks!
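
For the mechanics of the concatenation itself, a minimal sketch (assuming every file is a flat int16 binary with the same channel count and channel ordering; filenames are placeholders):

% Sketch: byte-level concatenation of int16 .dat files into one recording.
files  = {'session1.dat', 'session2.dat', 'session3.dat'};   % in recording order
fidOut = fopen('concatenated.dat', 'w');
for k = 1:numel(files)
    fidIn = fopen(files{k}, 'r');
    while ~feof(fidIn)
        buff = fread(fidIn, 1e6, '*int16');   % read in chunks to limit memory use
        fwrite(fidOut, buff, 'int16');
    end
    fclose(fidIn);
end
fclose(fidOut);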

doesn't run on linux because of "memory" function?

I get an error right at the start because it says "memory" is not available on this platform. You should be able to reproduce this by logging into martinotti.cortexlab.net, opening matlab, and doing:

>> cd /mnt/zserver/Data/multichanspikes/DimitarCerebellum/
>> ls
100_CH.dat  batches  forPRBimecToOE.mat  runKS.m

>> addpath('/home/nick/data/code/KiloSort/')
>> runKS
Warning: Name is nonexistent or not a directory: C
> In path (line 109)
  In addpath (line 86)
  In runKS (line 1)
Warning: Name is nonexistent or not a directory: \CODE\GitHub\KiloSort
> In path (line 109)
  In addpath (line 86)
  In runKS (line 1)
Error using memory
Function MEMORY is not available on this platform.

Error in load_data_buff (line 26)
    dmem         = memory;

Error in runKS (line 39)
load_data_buff; % loads data into RAM + residual data on SSD

Only uses gpuDevice(1)

In @gentnerlab we are attempting to run KiloSort on a server with multiple GPUs.
We have to ration the use of each GPU as multiple lab members use the different GPUs at the same time.

It appears that KiloSort overrides any previous gpuDevice setting by explicitly calling gpuDevice(1). We'd like for there to be a setting to tell KiloSort which gpuDevice it should be using.
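
As an illustration of the requested behaviour (ops.gpuDevice below is a hypothetical option, not an existing Kilosort setting):

% Sketch of the requested behaviour; ops.gpuDevice is hypothetical, not part of Kilosort.
if isfield(ops, 'gpuDevice')
    gpuDevice(ops.gpuDevice);   % respect a user-selected GPU
else
    gpuDevice(1);               % current hard-coded default
end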

Whitened data NOT written to disk

Within preprocessData.m, the fwrite call for the whitened data is embedded in a for loop (line 207) and an if statement (lines 278-283; see below).

for ibatch = 1:Nbatch
[...]
    if ibatch<=Nbatch_buff
        DATA(:,:,ibatch) = gather_try(datr);
    else
        datcpu  = gather_try(int16(datr));
        fwrite(fidW, datcpu, 'int16');
    end
[...]
end

When Nbatch_buff == Nbatch, the fwrite command doesn't actually get executed. In practice, this means that the file on disk, temp_wh.dat, is sized 0 bytes. Is this a problem?

Post-processing "matrix dimensions" error

Hi

Thanks for this package, it's working great so far. I've been running it on shorter recordings without problem.

I ran it on a 100GB+ recording and it exited with some matrix error. However, as far as I can see the .npy spiketimes and unit IDs are correct, perhaps the .mat file is messed up?

No rush on this, just wanted to report it.

Thanks!
catubc


>> master_eMouse

pathToYourConfigFile =

/media/cat/12TB/in_vivo/tim/cat/2017_01_31_barrel_ephys_ophys/sort_alltrack/kilosort

Time 0s. Loading raw data...
Time 699s. Channel-whitening filters computed.
Time 699s. Loading raw data and applying filters...
Time 2113.56. Whitened data written to disk...
Time 2113.56. Preprocessing complete!
Time 2128s. Optimizing templates ...
Time 3781.23, batch 37661/37680, mu 25.14, neg-err 1575953.918397, NTOT 2312151, n100 8575, n200 2277, n300 0, n400 0
Time 3784s. Running the final template matching pass...
Time 6854.59, batch 6201/6280, NTOT 3860480
Error using <
Matrix dimensions must agree.

Error in merging_score (line 23)
sum(hs2(1:m2)<mmax/troughToPeakRatio);

Error in merge_posthoc2 (line 89)
mo = merging_score(f1old - f2old, f1new-f2new, ops.fracse);

Error in master_eMouse (line 57)
rez = merge_posthoc2(rez);

java.lang.IllegalArgumentException: Comparison method violates its general contract!
at java.util.TimSort.mergeHi(Unknown Source)
at java.util.TimSort.mergeAt(Unknown Source)
at java.util.TimSort.mergeForceCollapse(Unknown Source)
at java.util.TimSort.sort(Unknown Source)
at java.util.TimSort.sort(Unknown Source)
at java.util.Arrays.sort(Unknown Source)
at sun.awt.datatransfer.DataTransferer.setToSortedDataFlavorArray(Unknown Source)
at sun.awt.datatransfer.DataTransferer.getFlavorsForFormatsAsArray(Unknown Source)
at sun.awt.dnd.SunDropTargetContextPeer.getTransferDataFlavors(Unknown Source)
at sun.awt.datatransfer.TransferableProxy.getTransferDataFlavors(Unknown Source)
at java.awt.dnd.DropTargetContext$TransferableProxy.getTransferDataFlavors(Unknown Source)
at com.mathworks.widgets.grouptable.transfer.TransferUtils.unwrapObjectList(TransferUtils.java:84)
at com.mathworks.widgets.grouptable.transfer.TransferController.isDragToSelectTransferable(TransferController.java:383)
at com.mathworks.widgets.grouptable.transfer.TransferController.access$200(TransferController.java:48)
at com.mathworks.widgets.grouptable.transfer.TransferController$DropListener.dragOver(TransferController.java:236)
at com.mathworks.widgets.grouptable.transfer.TransferController$DropListener.dragEnter(TransferController.java:227)
at java.awt.dnd.DropTarget.dragEnter(Unknown Source)
at sun.awt.dnd.SunDropTargetContextPeer.processEnterMessage(Unknown Source)
at sun.awt.X11.XDropTargetContextPeer.processEnterMessage(Unknown Source)
at sun.awt.dnd.SunDropTargetContextPeer$EventDispatcher.dispatchEnterEvent(Unknown Source)
at sun.awt.dnd.SunDropTargetContextPeer$EventDispatcher.dispatchEvent(Unknown Source)
at sun.awt.dnd.SunDropTargetEvent.dispatch(Unknown Source)
at java.awt.Component.dispatchEventImpl(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
at java.awt.LightweightDispatcher.trackMouseEnterExit(Unknown Source)
at java.awt.LightweightDispatcher.processDropTargetEvent(Unknown Source)
at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
at java.awt.Container.dispatchEventImpl(Unknown Source)
at java.awt.Window.dispatchEventImpl(Unknown Source)
at java.awt.Component.dispatchEvent(Unknown Source)
at java.awt.EventQueue.dispatchEventImpl(Unknown Source)
at java.awt.EventQueue.access$200(Unknown Source)
at java.awt.EventQueue$3.run(Unknown Source)
at java.awt.EventQueue$3.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source)
at java.awt.EventQueue$4.run(Unknown Source)
at java.awt.EventQueue$4.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(Unknown Source)
at java.awt.EventQueue.dispatchEvent(Unknown Source)
at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.awt.EventDispatchThread.run(Unknown Source)


0 similarity score even for large clusters

Hello,

When I look at the results of Kilosort in Phy, I get similarity scores of 0 even for clusters with >1000 spikes. Please help.

Thanks,

Best wishes,

Nick Lesica

mexMPmuFEAT returns empty

I occasionally get the following error:

Time   0s. Loading raw data... 
Time   7s. Channel-whitening filters computed. 
Time   7s. Loading raw data and applying filters... 
Time 10.44. Whitened data written to disk... 
Time 10.44. Preprocessing complete!
Time  11s. Optimizing templates ...
Time 19.09, batch 121/132, mu 0.00, neg-err NaN, NTOT 288, n100 3, n200 3, n300 3, n400 3
Time  20s. Running the final template matching pass...
Time 20.44, batch 1/22,  NTOT 0

Index exceeds matrix dimensions.

Error in fullMPMU (line 205)
[~, isort]      = sort(st3(:,1), 'ascend');

Error in testrun (line 20)
rez                = fullMPMU(rez, DATA);% extract final spike times (overlapping extraction)

This seems to be a result of mexMPmuFEAT returning empty variables, specifically "st".

At first I thought that this error was a result of spikes not being found, but it seems to occur in the optimization stage, not the fitting stage. So I tried to trace it back, to see if the error was really a result of something in fitTemplates.m.

I discovered that "dWU" in fitTemplates.m was full of NaNs, which I suspect (but could be wrong) is caused by "uproj" being empty. Is this possible?

I'm using the "fromData" initialization option; in my hands it seems that "uproj" is set full of zeros in line 204 of preprocessData and then emptied on line 287. When I don't use the "fromData" initialization option, I do not get the error above. Any thoughts on why this is happening?

Note, with the "fromData" initialization option, "if isproc(ibatch)" on line 208 of preprocessData is always returning true, so lines 270-275 are not run.

Waveforms

I have a few questions about plotting waveforms in MATLAB after the data is sorted. I am pasting Figure 2 from Pachitariu M. et al., 2016, for reference.

(1) Does KiloSort output waveforms that correspond to each spike? If so, which .npy file and/or rez field contains these? If not, is it best to use the timestamps and retrieve the data directly from the source data file?
(2) Can the PC and template outputs of KiloSort be used to create the red traces in the figure below? If so, how?

Pachitariu M, Steinmetz NA, Kadir S, Carandini M and Harris KD (2016). Kilosort: realtime spike-sorting for extracellular electrophysiology with hundreds of channels. bioRxiv dx.doi.org/10.1101/061481

Force removal of template?

I'm interested in experimenting with something. Would it be straightforward to kill a cluster manually, then re-run part of the sorting algorithm to resort that cluster's spikes among the remaining templates? One special case would be where I suspect that a particular cluster may actually be several neurons firing together (also see #10). Would it be possible to remove the template for that neuron manually, then rerun the step which determines overlapping spikes (from the remaining templates)?

HDF5 Import (MultiChannel raw data) and file merge

Hello,
I have raw data from a 60 electrode MultiChannelSystem. Those come in a special MultiChannel format ".msrd". MultiChannel provides a tool to convert those files to HDF5 format.

Is it possible to import those HDF5 format files into Kilosort?
AND
Is it possible to import multiple files and analyze them all together as a merged file? Since those merged files tend to become very big, it would be even better to get some kind of unit file out of every analysis. This file could then be used when analyzing subsequent files, to check whether the same units on the same electrodes are still present (in addition to possible new ones) or gone.

Greetings

Kilosort fails for nchan=1

I've been trying to run kilosort on some single channel data.
FitTemplates line 43 fails because squeezing with two singleton dimensions yields a matrix with differently ordered dimensions.
There may also be some additional nchan=1 bugs (because trying to fix the squeeze issue myself resulted in further downstream issues).

Having trouble installing on Ubuntu 16.04

Could you help walk me through the process? I installed cuda via
sudo apt install nvidia-cuda-dev

I've tried to follow the instructions but when I run mexGPUall I get:

mexGPUall
Warning: Version 7.0 of the CUDA toolkit could not be found. If installed, set MW_NVCC_PATH environment variable to location of nvcc
compiler.

In mexGPUall (line 12)
Error using mex
No supported compiler or SDK was found. For options, visit http://www.mathworks.com/support/compilers/R2015b/glnxa64.html.

Error in mexGPUall (line 12)
mex -largeArrayDims mexWtW.cu

multi-neuron clusters

This seems to happen when there are two neurons that fire so frequently that there are many examples of near-coincident events:


In this case they are particularly "non-local", but I'm not sure that is really their primary feature; they could just as well happen with two rapidly-firing neurons on the same channel - I saw that once and will post it here if I see it again.

Spike2 format (.smrx) or .mat to .dat or .bin

Hello,
I'm starting to use your software and I really like how it works.
The problem is that my data is acquired using CED Spike2 and stored in .smrx; their software allows me to export it in .mat format, but not in .dat or .bin, and I don't know how your software reads the .dat or .bin files.
If you can explain to me how to create the .bin (or .dat) data file from a .mat file, I will really appreciate it.

Thank you :)
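
As a rough sketch of such a conversion (assuming the exported .mat file contains a channels x samples voltage matrix, and assuming Kilosort reads the binary as int16 samples interleaved across channels; variable and file names are placeholders):

% Sketch: write a Matlab matrix out as a flat int16 binary file for Kilosort.
% Assumes 'data' is channels x samples; rescale before casting if your values are in volts.
S    = load('myRecording.mat');          % exported from Spike2 (placeholder filename)
data = S.data;                           % placeholder variable name, channels x samples
fid  = fopen('myRecording.bin', 'w');
fwrite(fid, int16(data(:)), 'int16');    % column-major: all channels of sample 1, then sample 2, ...
fclose(fid);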

Spike Window

Would it be possible to make the number of samples to take around each spike into a parameter? Currently it is set to 61; at 20 kHz sampling that is about 3 ms.

cpuMPmuFEAT returns empty sts

Hello,
I am trying to analyze 128 channel data using only a cpu at the moment (useGPU = 0, ops.initialize = 'no'). I am getting an index exceeds matrix dimensions error in fullMPMU (line 204) because the variable st3 is empty. This seems to be because the function cpuMPmuFEAT is returning an empty st variable but I cannot figure out why. Any help you could provide would be greatly appreciated.

Simplify mexcuda compilation on linux systems?

I noticed that I can compile the CUDA files by removing the .xml files and changing mex * to mexcuda * in lines 9-11 of mexGPUall.m. eMouse ran fine afterwards. Running Matlab R2016b on a linux cluster w/ CUDA 7.5 loaded up.
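
A sketch of that change (only mexWtW.cu and mexMPmuFEAT.cu are named elsewhere on this page; the same substitution would apply to the remaining .cu files in mexGPUall.m):

% Sketch of the Linux-friendly variant of mexGPUall.m described above:
% replace the mex calls (and their .xml options files) with mexcuda.
mexcuda -largeArrayDims mexWtW.cu
mexcuda -largeArrayDims mexMPmuFEAT.cu
% ...and likewise for the other .cu files compiled by mexGPUall.m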

Spike detected twice?

Is it possible that two templates match the very same spike, therefore duplicating it? I have a few clusters that look like that (see screenshot). They have similar waveforms and cross-correlograms, plus a cross-correlogram with a double peak at +/- 0.4 ms (bins are 0.2 ms in the screenshot).

The slightly annoying point is that it seems to be only partially a duplicate, so I cannot just mark one of the clusters as noise and move on with the other. Should I just reduce the number of clusters to detect (ops.Nfilt) to try to avoid that?

issues with ops.nt0

Thank you very much for adding this. If I increase the value from 61, I run into errors with the hardcoded values for removing peaks in isolated_peaks.m and for setting the spike peak time in get_PCproj.m when I initialize from data. What do you think about having these values adjust with ops.nt0? I also run into an error with CUDA if I try to make the samples per spike larger than a certain size.
I tried to set ops.nt0 = 97, dt in get_PCproj.m to 41, and in isolated_peaks.m I changed line 10 to
peaks([1:40 end-80:end], :) = 0;

No error if I keep ops.nt0 in the 70s.

Pre-processing filter

Line 63 in preprocessData.m reads:
[b1, a1] = butter(3, ops.fshigh/ops.fs, 'high');
I think should be
[b1, a1] = butter(3, ops.fshigh/ops.fs*2, 'high');
Because it's normalized to Nyquist, not sampling rate.
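
Written with the Nyquist normalization made explicit, the suggested fix is equivalent to the following (the filtering line is only a usage sketch; 'dat' is a placeholder variable):

% Equivalent to the suggested fix: butter() expects cutoffs normalized to Nyquist (fs/2).
[b1, a1] = butter(3, ops.fshigh / (ops.fs/2), 'high');
datr = filter(b1, a1, double(dat));   % usage sketch; 'dat' stands in for a raw data batch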

Duplicated Spike Times

I have some clusters (n~5) that have the exact same spike time replicated (~10 redundant), though with slightly different amplitudes.
I'm not sure if the patch is to throw these out post hoc, to assume that the algorithm is suggesting that there are two simultaneous waveforms within my sampling resolution, or if this needs to be changed inside the algorithm.

Vigi
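
If throwing out exact duplicates post hoc turns out to be the right approach, a minimal sketch (assuming spike times and cluster labels are in the first and second columns of rez.st, as described in the README section above):

% Sketch: drop spikes that share an identical (time, cluster) pair, keeping the first occurrence.
st  = rez.st(:, 1);                             % spike times in samples
clu = rez.st(:, 2);                             % cluster identities
[~, keep] = unique([st clu], 'rows', 'stable');
dedup = rez.st(keep, :);                        % rez.st with exact within-cluster duplicates removed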

Kilosort output spike_clusters.npy

At the moment, the spike_clusters.npy file is only created when Phy is run for the first time. It might be useful to have kilosort create this file so that preliminary analysis code can be run on all clusters prior to manual curation.

-Ross
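
As a stopgap, a minimal sketch (assuming npy-matlab's readNPY/writeNPY are on the path) that seeds spike_clusters.npy from the template assignments, which should match what Phy produces on its first run:

% Sketch: create spike_clusters.npy before Phy has ever been run,
% by copying the template assignments (assumes npy-matlab is on the path).
spikeTemplates = readNPY('spike_templates.npy');
writeNPY(spikeTemplates, 'spike_clusters.npy');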

Handle very large datasets quickly

I record 128 channels x 30 hours. You mentioned it should go faster for me than it does.

Things that were mentioned in a discussion by us:

  • Assess time issue yourself by creating a simulated file of 30 hours 128 channel recordings
  • Consider Nick S's suggestion to: "adaptively change the scaling of the runtime with the number of spikes"

master_eMouse - error message CUDA_ERROR_ILLEGAL_ADDRESS

I have some issues trying to build with the proper NVIDIA CUDA compiler and wonder whether anyone has had the same issue. I tried to run mexGPUall using CUDA 7.5 in various Matlab versions (2015a, 2015b, 2016a) and it only seems to have worked in 2016a, since I get the message below when I run mexGPUall. However, when I then try to run master_eMouse I still get an error message about CUDA_ERROR_ILLEGAL_ADDRESS (please see below).

Is the mexGPUall compilation done properly if one gets the message below?
And what might be the cause of the illegal address?
I followed all steps described in readme_win_linux.txt and wonder whether there is anything else that has to be done.

(Windows 7, GeForce GTX 1080, CUDA 7.5, Visual Studio Community 2013)
NVIDIA driver 378.49, release date 1/23/2017

gpu =
CUDADevice with properties:
Name: 'GeForce GTX 1080'
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 8
ToolkitVersion: 7.5000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 8.5899e+09
MultiprocessorCount: 20
ClockRateKHz: 1809500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1

2016a:

mexGPUall
Building with 'NVIDIA CUDA Compiler'.
C:/Users/SRL_KiloSort/Documents/KiloSort/CUDA/mexMPmuFEAT.cu(98): warning: variable "NTOT" was declared but never referenced
C:/Users/SRL_KiloSort/Documents/KiloSort/CUDA/mexMPmuFEAT.cu(144): warning: variable "Ci" was declared but never referenced
...
MEX completed successfully.

Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS

In make_eMouseData (line 6)
In master_eMouse (line 21)
Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS
In make_eMouseData (line 6)
In master_eMouse (line 21)
Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS
In make_eMouseData (line 6)
In master_eMouse (line 21)
Error using gpuArray/filter
An unexpected error occurred during CUDA execution. The CUDA error was:
an illegal memory access was encountered

Error in my_conv2 (line 47)
S1 = filter(gaus, 1, cat(1, S1, zeros([tmax, dsnew2(2)])));

Error in make_eMouseData (line 71)
dat = my_conv2(dat, [tsmooth chsmooth], [1 2]);

Error in master_eMouse (line 21)
make_eMouseData(fpath, useGPU);

Drifting issue

Hello,
I am spike sorting data collected with an O3 Neuropixel probe during acute experiment.

During some of my recording sessions the brain tissue moved in respect to the probe.
The outcome is that during the recording you observe a variation in the size of the spikes fired by a given neuron and in which recording sites they are.

Often what Kilosort does in this situation is simply to split the same unit into multiple clusters over time, and all I have to do is to merge them.

However, there are situations in which it is not so obvious which clusters belong to the same unit, or in which the drift affects so many units that the manual merging becomes time consuming. I was wondering whether Kilosort has an automatic way to correct for this problem, or whether there is a combination of initial parameters that helps with this.

Thank you
Dario

test performance for correctly detecting neurons with small numbers of spikes.

I.e. create a ground-truth file with a cluster inserted at several different amplitudes/positions. Pick one or more of these amplitude-position combinations and create GT files with progressively fewer numbers of spikes.

One thing that might make this sort of thing convenient would be to store all of the SVD-reconstructed waveforms in a library, such that they could quickly be called up and inserted into the raw data on the fly (rather than computing them and creating a whole new GT file on the hard drive). Let me know if that would be useful; I could do it. An even quicker shortcut, of course, would be to simply take a generic waveform shape and insert it into the file - this would probably be fine as a first pass, e.g. in this case for diagnosing whether there are any fundamental problems with small numbers of spikes.
