
imagestackalignator's Introduction

ImageStackAlignator

Implementation of Google's Handheld Multi-Frame Super-Resolution algorithm (from the Pixel 3 and Pixel 4 cameras)

This project implements the algorithm presented in “Handheld Multi-Frame Super-Resolution” by Wronski et al. from Google Research (https://doi.org/10.1145/3306346.3323024).

The paper contains several errors and inaccuracies, which is why an implementation is not straightforward. One can only wonder why and how this could have passed the peer review process at ACM; the authors also never replied to my emails asking for clarification. Nevertheless, I managed to get a working implementation that is capable of reproducing the results, so my assumptions and guesses about what the authors actually meant seem not to be too far off. As neither the authors nor ACM (the publisher) seem to be interested in having a corrected version of the paper, I will go through the individual steps and try to explain the reasoning behind my implementation. I also improved things here and there compared to the version in the paper, and I will point out what I changed.

The algorithm is mainly implemented in CUDA, embedded in a framework able to read RAW files coming from my camera, a PENTAX K3. DNG files are also supported, so many raw files coming from mobile phones should work, and most DNG files converted from other camera manufacturers should also work, but I couldn’t test every file. The RAW file reading routines are based on DCRAW and the Adobe DNG SDK; just for my own fun I implemented everything in C# a while ago instead of using a ready-made library. As the GUI is based on WPF, you need a Windows PC with a powerful NVIDIA GPU to play with the code. The main limitation for the GPU is memory: the more you have, the better. With an older GeForce TITAN with 6GB of RAM, I can align 50 images with 24 megapixels each.

The algorithm / overview

I will describe the individual steps and my modifications of the algorithm based on the following figure taken from the original paper: Overview

  • a) RAW Input burst: With a Google Pixel 3 or 4 (or an Android app accessing the camera API) one can get a burst of RAW files in a sort of movie mode, meaning no physical interaction is necessary to record the next image. Here instead, I simply rely on RAW images that are either taken manually or in burst mode on an SLR. The major difference is that we have much larger movements in the latter case due to the increased time gap between the single frames. I compensate for this by using a global pre-alignment before running the actual algorithm.

  • b) and c) This part is described in section 5.1 of the paper, and many things don’t fit together here. Given that this section is sort of the main contribution of the paper, I can’t understand how it can be so wrong.
    The first inaccuracy concerns the number of frames used in this step. They write: “We compute the kernel covariance matrix by analyzing every frame’s local gradient structure tensor”. Together with the figure above, which shows multiple frames at step b), this indicates that a structure tensor (for each pixel) is computed for each frame independently of all the other frames. Yet a bit earlier they write: “Instead, we evaluate them at the final resampling grid positions”. Given that in an ideal situation all frames should be perfectly aligned on the final resampling grid, and that this final resampling grid is just the reference image, i.e. the one frame that doesn’t move, the only reasonable way to compute the local gradients and kernels is to use only the reference frame. In this implementation, only the reference image is used to compute the reconstruction kernels. Doing this for every frame doesn’t make any sense in my eyes.
    The idea of the reconstruction kernel is to use as many sample points in one frame as possible to reconstruct a pixel in the final image with a reduced noise level, but without destroying image features. To do this, one uses the information of the local gradients to compute a structure tensor, which indicates whether the pixel shows an edge-like feature, a more point-like feature, or just a plain area without any structure. For a fine point no neighboring pixels can be used to reconstruct the final pixel, but a line or edge can be averaged along the direction of the edge, and in a flat area everything can be averaged together.
    The eigenvalues of the structure tensor can be used to determine the “edginess” of a pixel. For some weird reason the authors claim that the simple ratio λ1/λ2, with λ1 and λ2 being the eigenvalues of the structure tensor, corresponds to that information, and even more, they claim that the ratio is in the range [0..1]. They explicitly write that λ1 is the dominant eigenvalue, thus λ2 < λ1, which is why the ratio λ1/λ2 is always > 1. In the common literature the “edginess” indicator, called the coherence measure, is given by (λ1 − λ2) / (λ1 + λ2), which in its normalized form gives values in the range [0..1]. Using this instead of λ1/λ2, e.g. in the equation for the term A in the supplementary material, works well and seems to be the desired value.
    For me there’s no reason to limit oneself to a reduced-resolution source image to compute the local structure tensors. Instead of using a sub-sampled image as done in the paper, I use a slightly low-pass filtered full-resolution image coming from a default debayering process and then converted to grayscale. Instead of using tiles of 3x3 and computing the structure tensor only once per tile, I Gaussian-smooth the gradients and compute a structure tensor for each pixel at full resolution, which gives me much more control over the final result (see the sketch below).
    Further, I chose the size for the kernel as 5x5 and not 3x3. Why? Because I can... There’s no real reasoning behind that choice apart from seeing what happens (mainly nothing).
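    To make the above more concrete, here is a minimal sketch of how such a per-pixel structure-tensor analysis and kernel construction can look. This is not the actual repository kernel: function and parameter names are illustrative, the anisotropy factor uses the coherence measure instead of λ1/λ2 as discussed above, and the noise-dependent blending with Dth, Dtr and kDenoise from the supplementary material is omitted for brevity. The smoothed gradient products Ixx, Ixy, Iyy are assumed to be precomputed (Gaussian-blurred Ix·Ix, Ix·Iy, Iy·Iy).

// Sketch only, not the repository kernel: per-pixel structure tensor,
// coherence measure and anisotropic kernel shape.
__global__ void kernelFromStructureTensor(
    const float* Ixx, const float* Ixy, const float* Iyy,
    float4* kernelParams,              // per pixel: (k1, k2, cosTheta, sinTheta)
    int width, int height,
    float kDetail, float kShrink, float kStretch)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    int idx = y * width + x;

    float a = Ixx[idx], b = Ixy[idx], c = Iyy[idx];

    // Eigenvalues of the symmetric 2x2 structure tensor [a b; b c]
    float root    = sqrtf((a - c) * (a - c) + 4.0f * b * b);
    float lambda1 = 0.5f * (a + c + root);   // dominant eigenvalue
    float lambda2 = 0.5f * (a + c - root);

    // Orientation of the dominant eigenvector (edge normal / gradient direction)
    float theta = 0.5f * atan2f(2.0f * b, a - c);

    // Coherence measure in [0..1], used instead of the paper's lambda1/lambda2
    float coherence = 0.0f;
    if (lambda1 + lambda2 > 1e-12f)
        coherence = (lambda1 - lambda2) / (lambda1 + lambda2);

    // Anisotropy factor and the two kernel standard deviations
    float A  = 1.0f + sqrtf(coherence);
    float k1 = kDetail * kStretch * A;    // spread along the edge
    float k2 = kDetail / (kShrink * A);   // spread across the edge

    kernelParams[idx] = make_float4(k1, k2, cosf(theta), sinf(theta));
}

    The resulting (k1, k2, θ) triple then parameterizes the anisotropic Gaussian that is used later in the accumulation sketch further below.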

  • d) This part is the same as published in “Burst photography for high dynamic range and low-light imaging on mobile cameras” by Hasinoff et al., where the authors use an L2 difference to find displacements of patches at several scales, and at the final full-resolution scale they use an L1 difference to find a displacement of only one remaining pixel. This is mainly due to implementation details and to boost performance. The shift of each tile is only computed once, namely the shift from one frame to the reference frame. This is prone to outliers and errors. Here in this implementation, on the other hand, we don’t have the restrictions of a mobile phone environment, we have a big GPU available, so let’s get things done right.
    As already mentioned, I perform a global pre-alignment, which means that every frame is aligned as a whole towards the first frame of the stack using cross-correlation. Further, I also search for small rotations by choosing the rotation with the highest cross-correlation score. There’s no magic in here, just a simple brute-force approach. This also gives us the final reference frame, as we can choose as reference the frame that minimizes the overall global shifts.
    Once all frames are globally aligned, we move on to patch tracking, potentially on multiple resolution levels. For all levels I stick to the L2 norm using cross-correlation, the same way Hasinoff et al. do on the lower levels.
    Instead of computing the shift only from one frame to the reference frame (independently for each tile), one can also make use of further dependencies: the movement of a tile from frame 1 to frame 2, plus the movement from frame 2 to frame 3, should equal the direct movement from frame 1 to frame 3. This principle was published e.g. by Yifan Cheng in "Electron counting and beam-induced motion correction enable near-atomic-resolution single-particle cryo-EM", doi:10.1038/nmeth.2472. Measuring multiple dependent shifts and then solving for the sequential shifts from one frame to the next allows one to detect and reject outliers and to obtain a substantially better displacement field.
    To obtain sub-pixel displacements I use the same interpolation method as described by Hasinoff et al., even though I believe that spline interpolation should give better results. Because I wanted to try it out, I implemented it and stuck to it here, given that only half-pixel precision is needed, and only for super-resolution (no down-scaling in my implementation). A small sketch of such a sub-pixel peak fit follows after this list item.
    Because of the outlier rejection during alignment, I can also omit the M threshold of equation 7 in the paper.
    Additionally, if patch tracking doesn’t find a reasonable peak to determine the shift, it falls back to zero shift. Thus, either the shift from the previous level is kept unchanged, or the shift from the global pre-alignment is used as a last fallback. As this usually happens in flat areas without features, this seems to be a reasonable way to go.
    Another aspect not mentioned at all in any of these Google Research papers is the effect of high-pass filtering the images. Obviously, for good alignment one applies some Gaussian smoothing prior to patch tracking to reduce noise. But images also contain a lot of low-frequency noise, I assume mainly due to some read-out characteristics and read-out noise. Especially for long-exposure images I saw an increased low-frequency fixed pattern that made precise tracking impossible. I therefore include a high-pass filter prior to tracking.
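    As referenced above, here is a minimal sketch of a sub-pixel peak refinement on the L2 cost surface. Hasinoff et al. fit a full bivariate quadratic to the 3x3 neighborhood of the minimum; the simplified, separable 1D parabola fit below only illustrates the idea, and all names are illustrative rather than taken from the repository.

// Sketch only: refine an integer-shift minimum (cx, cy) found in a
// (windowSize x windowSize) patch of L2 distances to sub-pixel precision
// by fitting a parabola through the minimum and its two neighbors per axis.
__device__ float2 refineSubPixel(const float* cost, int windowSize, int cx, int cy)
{
    float2 subShift = make_float2(0.0f, 0.0f);
    float c0 = cost[cy * windowSize + cx];

    // x direction: parabola through (c-, c0, c+), vertex in (-0.5, 0.5)
    if (cx > 0 && cx < windowSize - 1)
    {
        float cm = cost[cy * windowSize + cx - 1];
        float cp = cost[cy * windowSize + cx + 1];
        float denom = cm - 2.0f * c0 + cp;
        if (fabsf(denom) > 1e-12f)
            subShift.x = 0.5f * (cm - cp) / denom;
    }

    // y direction
    if (cy > 0 && cy < windowSize - 1)
    {
        float cm = cost[(cy - 1) * windowSize + cx];
        float cp = cost[(cy + 1) * windowSize + cx];
        float denom = cm - 2.0f * c0 + cp;
        if (fabsf(denom) > 1e-12f)
            subShift.y = 0.5f * (cm - cp) / denom;
    }

    return subShift;
}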

  • e) and f) Here things start to be described relatively well in section 5.2, but then in the text just before equation 6, I’m getting lost. I am not familiar with the term “Wiener shrinkage”, and in the referenced paper by Kuan et al. the term “shrinkage” isn’t mentioned once. I assume that sort of a Wiener filter is used, as one can see in equation 6. Also, the description for σ_md is wonderful: “We obtain σ_md and d_md through a series of Monte Carlo simulations for different brightness levels...”. The authors don't give any further explanation... Well, what they actually meant, I guess, is: obtain σ_md from the noise model described in Foi et al., a paper mentioned just a section before, σ_md²(x) = α·x + β, with α and β being the model parameters and x the brightness value (I ignore clipping in this implementation). To determine these two model parameters one can take several photos of a blank surface with varying illumination or exposure times. Then, assuming that all pixels in the image should have the same value (constant illumination of a flat surface), all deviations from the image mean value are noise with a certain standard deviation. Having enough measurements, one can fit the model parameters to these measurements. With some fantasy one could call this measurement a Monte Carlo simulation.
    And for d_md I haven’t figured out what it is supposed to represent; at least in equation 6 I don’t see how it fits.
    Consider two patches with some mean value, both affected by noise with the same standard deviation σ_md. Their distance is then the difference of the two, affected by noise with a standard deviation of σ_md·√2. The same holds for the measured distance d_ms. Assuming the real distance between the patches is small (otherwise there is no need for the Wiener filtering step), one can replace d_ms by σ_ms·√2 and d_md by σ_md·√2. Putting this into equation 6, the √2 terms cancel out and d_ms is simply Wiener filtered by σ_ms and σ_md, two values that we have already determined. This is what I have done in my implementation, and it works.
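    A minimal sketch of this reading of equation 6 is given below. It assumes σ_md comes from the Foi et al. noise model evaluated at the local brightness and σ_ms is the locally measured standard deviation; function and parameter names are illustrative, and the exact scaling in the repository code may differ.

// Sketch only: signal-dependent noise sigma and the Wiener-style shrinkage
// of the measured patch distance.
__device__ float noiseSigma(float brightness, float alpha, float beta)
{
    // Foi et al. noise model (clipping ignored): sigma^2(x) = alpha * x + beta
    return sqrtf(fmaxf(alpha * brightness + beta, 0.0f));
}

__device__ float wienerFilteredDistance(float d_ms, float sigma_ms, float sigma_md)
{
    // Keep the measured distance when the measured variation dominates what the
    // noise model alone would explain; shrink it towards zero otherwise.
    float s2m = sigma_ms * sigma_ms;
    float s2n = sigma_md * sigma_md;
    return d_ms * s2m / (s2m + s2n + 1e-12f);
}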

  • g) and h) Having all the previous building blocks together, the actual merging part is straightforward. In the super-resolution case I restrict myself to only the central part of the unscaled image, because for a 24 megapixel image this would get to be a bit too much data... I make heavy use of the NPPi library that comes with CUDA, where the maximum image size seems to be restricted to 2 gigabytes. The core of the merge, a kernel- and robustness-weighted average, is sketched below.
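    Below is a minimal sketch of this accumulation for one frame and one color plane: every output pixel gathers samples from a small neighborhood, weighted by the anisotropic reconstruction kernel of the reference frame (reusing the (k1, k2, cos, sin) layout from the earlier sketch) and by the per-pixel robustness. The final image is numerator/denominator once all frames have been accumulated. This is a simplification of the repository kernels; names, indexing and the handling of the Bayer pattern and of the super-resolution grid are illustrative only.

// Sketch only: kernel- and robustness-weighted accumulation of one frame.
__global__ void accumulateFrame(
    const float* frame,            // aligned samples of this frame
    const float* robustness,       // per-pixel robustness R in [0..1]
    const float4* kernelParams,    // (k1, k2, cosTheta, sinTheta) from the reference
    float* numerator, float* denominator,
    int width, int height, int radius)   // radius = 2 for a 5x5 kernel
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    int idx = y * width + x;

    float4 kp = kernelParams[idx];
    float k1 = fmaxf(kp.x, 1e-6f);   // sigma along the edge
    float k2 = fmaxf(kp.y, 1e-6f);   // sigma across the edge
    float c = kp.z, s = kp.w;        // edge-normal direction

    float num = 0.0f, den = 0.0f;
    for (int dy = -radius; dy <= radius; dy++)
    {
        for (int dx = -radius; dx <= radius; dx++)
        {
            int sx = min(max(x + dx, 0), width - 1);
            int sy = min(max(y + dy, 0), height - 1);

            // Rotate the offset into the edge-aligned coordinate system:
            // u is across the edge (small sigma k2), v is along the edge (k1)
            float u =  c * dx + s * dy;
            float v = -s * dx + c * dy;

            // Anisotropic Gaussian weight times the robustness of the sample
            float w = expf(-0.5f * (u * u / (k2 * k2) + v * v / (k1 * k1)));
            w *= robustness[sy * width + sx];

            num += w * frame[sy * width + sx];
            den += w;
        }
    }
    numerator[idx]   += num;
    denominator[idx] += den;
}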

And finally, as a little fun fact: despite the fact that I had to re-invent the wheel at some important steps, using assumptions about what could possibly be meant, the parameter ranges given in the paper's supplementary material work perfectly. My guesses thus seem to be right. But once more: why do Wronski et al. publish so many errors, given that they obviously know better? One doesn't replace the coherence measure by λ1/λ2 by mistake...

The application: Aligning an image stack

  1. Click on “Select images...” to open a few RAW images. If you click on a file in the list you can see a simple debayered preview image in the left image pane. Note that image size and exposure time should be identical in all files (this is not verified...)!
    app1
  2. The pre-align tab allows setting the following parameters:
    Sigma LP: the standard deviation for a Gaussian blur filter (in real space)
    High pass: The cut-off frequency for a high pass filter in Fourier space as radius (maximum is thus 0.5, but limited to reasonable values here).
    HP sigma: Standard deviation of the blur of the previous cut-off frequency.
    Clear axis: Sets the pixels with the given distance to the main axis to zero in Fourier space. This was for some testing and shouldn’t be used.
    Green channel only: Create the B/W image only using the green channel and not a weighted average of RGB. Might help in case of heavy chromatic aberrations.
    Rot range and incr.: the range (+/- the value) and search increment for the rotational search. Pentax cameras measure an absolute roll angle during acquisition, which is why this search only tries to determine a small offset. For other cameras the values must be chosen larger.
    app21 Clicking on Test values or Test CC shows the selected image filtered with these parameters or performs a cross-correlation check where one image is intentionally shifted by 5 pixels and this shift must be found by cross-correlation.
    Finally click on Compute shifts. Browsing through the files shows them now including pre-alignment.
    app2
  3. Having done the pre-alignment, we move on to patch tracking. First, we define the patch sizes, scaling factors and maximum allowed shift per frame.
    Reference chooses the reference image to use. By default this is the one determined in the step before. Block size is the size of a block used to measure the displacements. Choosing the strategy Full cancels out the block size parameter, as the entire series is one big block. But memory is restricted, and you might run out of memory if choosing Full for large image stacks. Only on reference is the simple tracking routine as used by Google Research. On reference block groups images of block size around the reference frame, and every frame is compared to each frame in the block. Blocks measures distances only between frames inside a moving block around the current frame; the reference frame is neglected here. Threshold: if the minimum L2 distance plus the threshold is larger than the maximum L2 distance of the same patch (for a different shift), the found shift is set to zero. This filters out flat areas. After Track patches one can inspect the result by clicking on Show tracked patches. By switching between the reference frame and another frame one can verify the correct tracking. The checkbox Include pre-alignment adds the pre-alignment to the shift vector and removes it from the image.
    Note that Max. Shift has to be given as the desired value + 1, as the largest shift in the displacement / cross-correlation map cannot be used due to the interpolation. For a value of 2 the largest possible shift is thus 1.5 pixels.
    app3
  4. Accumulation:
    Sigma ST: standard deviation used to blur the derivatives when computing the structure tensor.
    Sigma LP: standard deviation of the blur to apply before computing the derivatives. (This time no high-pass filter is applied.)
    Dth, Dtr, kDetail, kDenoise are parameters as described in the paper. In short: on the left side for low noise pictures, on the right side for noisy pictures.
    Iteration LK: how many iterations of Lucas-Kanade optical flow to perform for the final precise alignment (a sketch of a single Lucas-Kanade step follows after this list).
    Window LK: The window size for Lucas Kanade.
    Min. Det. LK: determinant threshold for Lucas-Kanade. If the matrix in Lucas-Kanade has an eigenvalue smaller than the threshold, no shift is applied.
    Erode size: Size of the erosion kernel in the uncertainty mask.
    After clicking on Prepare accumulation one can inspect the reconstruction kernel for every pixel by moving the mouse over the image.
    Clicking then on Accumulate adds the selected images to the final result. This way one can add one image after the other, but also all images at once by selecting them all together. This helps debugging...
    If Clear results is set, the final image buffer is cleared before every new image is added.
    Super resolution activates the super resolution feature, resampling the final image at double the original resolution. When changing this flag, Prepare accumulation must be done again!
    app4
  5. Finally, the last tab takes the result buffer from the step before, applies the chosen tone curve and color settings, and allows saving the result as a 16-bit TIFF image.
    app5
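As referenced in step 4, here is a minimal sketch of a single Lucas-Kanade step for one patch, illustrating the role of the Window LK and Min. Det. LK parameters. This is not the repository code: Ix and Iy are the spatial derivatives of the reference patch, It is the temporal difference to the (already roughly aligned) frame, and all names are illustrative.

// Sketch only: one Lucas-Kanade iteration for a single patch.
__device__ float2 lucasKanadeStep(
    const float* Ix, const float* Iy, const float* It,
    int windowSize, float minDet)
{
    // Build the 2x2 normal equations G * d = b over the window
    float gxx = 0, gxy = 0, gyy = 0, bx = 0, by = 0;
    for (int i = 0; i < windowSize * windowSize; i++)
    {
        gxx += Ix[i] * Ix[i];
        gxy += Ix[i] * Iy[i];
        gyy += Iy[i] * Iy[i];
        bx  -= Ix[i] * It[i];
        by  -= Iy[i] * It[i];
    }

    // Reject ill-conditioned patches (flat or noise-only areas)
    float det = gxx * gyy - gxy * gxy;
    if (det < minDet)
        return make_float2(0.0f, 0.0f);

    // Solve the 2x2 system for the incremental shift
    float invDet = 1.0f / det;
    return make_float2((gyy * bx - gxy * by) * invDet,
                       (gxx * by - gxy * bx) * invDet);
}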

Some results

New York scene - 5 frames

Reference frame decoded with Adobe Camera RAW:

newyorkFrame3

Merge of 5 frames using this implementation (The applied tone curves differ a little):

newyorkMerged

Reference frame decoded with Adobe Camera RAW (Crop 1:1):

newyorkFrame3Crop1

Merge of 5 frames using this implementation (Crop 1:1):

newyorkMergedCrop1

Super-Resolution test chart with a Samsung Galaxy S10e (20 frames)

Area Cropped:

superRes

Merge of 20 images in super-resolution mode:

SuperResCrop

Developed DNG (reference frame) using Adobe Camera RAW (resized by factor 2 using bicubic interpolation):

SuperResAdobe

Out of camera JPEG (resized by factor 2 using bicubic interpolation):

samsung

Night sky at Grand Canyon (34 frames with 10 second exposure each)

grandCanyon


imagestackalignator's Issues

Exception unhandled with ManagedCuda

Hi,

I built the three projects successfully and generated each PTX file in the correct location, but it throws the following exception when I execute it:

System.NullReferenceException: 'Object reference not set to an instance of an object.'

at ImageStackAlignatorController.cs, line 1004 CUmodule modDebayer = _ctx.LoadModulePTX("DeBayerKernels.ptx");

and output:

Exception thrown: 'ManagedCuda.CudaException' in ManagedCuda.dll
'PEFStudioDX.exe' (CLR v4.0.30319: PEFStudioDX.exe): Loaded 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\Remote Debugger\x64\Runtime\Microsoft.VisualStudio.Debugger.Runtime.dll'. 
Exception thrown: 'System.NullReferenceException' in PEFStudioDX.exe
An unhandled exception of type 'System.NullReferenceException' occurred in PEFStudioDX.exe
Object reference not set to an instance of an object. 

Does this mean that loading the PTX file failed?

(I'm developing in Visual Studio 2019 with CUDA Toolkit 10.1 and .NET Framework 4.7.2.)

Error loading DNG files from Fuji

I converted Fuji RAF files to DNG using Adobe DNG Converter.
The program crashes when trying to load them.
How hard would it be to add support for Fuji's non-standard 6x6 sensor pattern?

Do quality analysis mentioned in Google's paper with TIFF and PNG photos

Hi Michael, thanks for your code.
I want to run the quality analysis from Section 6.1 of Google's paper with your code, but the photo datasets they use (Kodak and McMaster) seem to only offer TIFF and PNG pictures.
Is there any way to accomplish the analysis, for example by transforming TIFF to DNG or something?

ManagedCuda.NPP.NPPException:“Device allocation error”

Hi Michael, thanks for the helpful work.
I ran into an error and have not gotten the final result. I finished steps 1, 2 and 3 successfully, but when clicking Prepare accumulation, it shows the exception: ManagedCuda.NPP.NPPException: “Device allocation error”.
Could you please help me solve this problem?
Thank you so much~

ManagedCuda exception

Hello, could you help me with this exception?

ManagedCuda.CudaException: 'ErrorInvalidHandle: This indicates that a resource handle passed to the API call was not valid. Resource handles are opaque types like CUstream and CUevent.'

It was raised here:
public float RunSafe(NPPImage_32fC1 imgIn, NPPImage_32fC3 imgOut, float3 blackPoint, float3 scale)

I'm running on an NVIDIA RTX A3000 GPU with 6GB of memory.

Example RAW images

Would it be possible for you to share some sample RAW images, e.g. the 5 New York images shown in the README? Thanks!

How to create synthetic image bursts to run Google's Handheld Multi-Frame Super-Resolution algorithm?

Hi! Thank you so much for sharing your implementation and corrections of Google's Handheld Multi-Frame Super-Resolution algorithm. I struggled with this paper for a long time...

I was going to try this algorithm on my computer, but unfortunately I don't have a Google Pixel or an SLR to get a burst of RAW pictures. And I found that section 6.1 of the paper mentions that they created synthetic image bursts by:
generate a set of random offsets (Bivariate Gaussian distribution with a standard deviation of two pixels)
resample the image using nearest-neighbor interpolation
create a Bayer mosaic (discard 2/3 color channels)

But I failed to get the expected results... I just wonder whether you created such synthetic image bursts and whether you'd like to post the code for creating them on your GitHub? :)~

Is it possible to run the code in Ubuntu?

Hi @kunzmi, your great code is implemented in C#. I found that Microsoft provides a C#/.NET runtime that runs on Ubuntu.

My question is, can ManagedCUDA and the code in this repository be run in Ubuntu with those .NET SDKs? Have you tried to run the code in Ubuntu? Would it be slow to run the code using the .NET Framework?

Your ManagedCUDA wraps many low-level CUDA functions. Is there an alternative using C++?
I found that converting the ManagedCUDA parts to a C++ version would be a huge project for me.

If convenient, could you please give some advice? Thanks.

Run on iOS using C++

Hi,

Is it possible to run this code on-device in C++ on iOS with RAW DNG files? Does it have any specific hardware limitations that prevent it from being run on other platforms? I don't have a GPU since I use an old MacBook, so I could not try the demo.

Pipeline & Result Comparison

Thanks for the open-source implementation; I also have my own implementation under development that hasn't been made public yet.
May I ask some questions about the details?
After kernel reconstruction, I add one more alignment with respect to the base frame, but it doesn't appear in the paper's pipeline, and it causes some information loss due to the alignment/registration. Do you keep this step?

Have you tried comparing with other demosaicing methods, e.g. VNG etc.? I created some synthetic data from Kodak and McMaster, but the performance is not as good as they report in the paper, e.g. in SSIM or PSNR.

Thanks, Hao!

about max and min in RobustnessModell.cu

In RobustnessModell.cu, lines 67 to 70,

			maxShift.x = fmaxf(s.x, shiftf.x);
			maxShift.y = fmaxf(s.y, shiftf.y);
			minShift.x = fminf(s.x, shiftf.x);
			minShift.y = fminf(s.y, shiftf.y);

It seems that you intend to find the minimum and maximum of s, but the code doesn't do that.
Maybe the code below is what you originally intended:

			maxShift.x = fmaxf(s.x, maxShift.x);
			maxShift.y = fmaxf(s.y, maxShift.y);
			minShift.x = fminf(s.x, minShift.x);
			minShift.y = fminf(s.y, minShift.y);

Does this approach need the camera parameters?

Hi @kunzmi , I found that in your code, there is a file named ExtraCameraProfiles.xml to get the camera profiles.
Is this a necessity for this approach?

If we just have several images without the camera parameters, is this method still suitable for multi-frame super-resolution?

a PTX JIT compilation failed

Hello, I am a beginner in CUDA. I downloaded your code and tried to run it, but I ran into some problems.

My laptop environment is VS2017(professional)+ GTX960M+CUDA10.0+DirectX12.

When I start the “TestGoogleSuperRes.sln” file, it reports the error “ManagedCuda.CudaException: ErrorInvalidPtx: This indicates that a PTX JIT compilation failed.” (located at “CUmodule modDebayer = _ctx.LoadModulePTX("DeBayerKernels.ptx");”). The _ctx cannot be evaluated correctly.

Then I tried to debug each project. When I started a new instance of the “Kernels” project alone, I got the error “fail to find the file Kernels in PEFStudioDX”. Does this matter? And is the “PTX JIT compilation failed” error related to this situation?

I would be grateful if you could let me know what caused the above two situations and could help me solve them.

Best wishes.

“System.IO.FileNotFoundException”( ManagedCuda.dll )
“ManagedCuda.CudaException”(ManagedCuda.dll )
ErrorInvalidPtx: This indicates that a PTX JIT compilation failed.

Gradient structure tensor size

The paper says that the gradient is computed at a quarter of the full resolution (in section 5.1.2: “we create a single pixel from a 2 × 2 Bayer quad by combining four different color channels together. This way, we can operate on single channel luminance images and perform the computation at a quarter of the full resolution cost and with improved signal-to-noise ratio”), but the code allocation size is still RawWidth*RawHeight (the snippet in prepareAccumulation: _structureTensor4 = new NPPImage_32fC4(_pefFiles[refImage].RawWidth, _pefFiles[refImage].Height)). Can you tell me what the difference is between your realization and the description in the paper? Thanks!

color points

There are no colored points in the original night image, but after reading it into the project colored points appear. Have you ever encountered this situation?

5 merged "New York scene" looks worse than output from Adobe Camera RAW

There's less noise, but it looks like everything has been smoothed out. It is significantly harder to make out fine detail such as pavement lines, borders between windows, etc. Is this just a case where there aren't enough frames, or where the SNR is simply too low (in the paper, figure 21 is the only direct comparison between classic demosaicing and their method, but it's quite a bit brighter than the "New York Scene")? Of course, it's unlikely this algorithm is better in all cases, but I'm still curious as to why.

On a side note, can you explain more about Dth, Dtr, kDetail, kDenoise? I actually don't see them in the paper...

how to populate ExtraCameraProfiles.xml

Hi Michael @kunzmi,
I've tried to test run this project on a set of converted DNG raw files (from Canon's CR2).
I'm getting a null exception right after loading a few images: basically at the point of getting a value from the null profile (absent in ExtraCameraProfiles.xml):
DNGfile.cs L306:

ExtraCameraProfile profile = profiles.GetProfile(make, uniqueModel); 
// profile is null for everything besides PENTAX K-3

Question: what is the suggested way to generate or get one for my camera?

Thank you very much!

The system cannot find the file specified

Hello!
First of all I want to say great work, very thorough. If you ever get an explanation of why the eigenvalue "division" is implemented in the way described in the report, I would be interested to know the reasoning.

But now to my question. I have never worked with either C# or CUDA, so perhaps this is a beginner's mistake. I have installed CUDA 10.2, and if I open the project in VS2019 and try to build, I get 2 successful builds and one skipped; is that the way it should be? If I try to run it after I've built the solution, I get an error message that says "Unable to start program ...\PEFStudioDX\Kernels the system cannot find the file specified.".

Cross correlation checking issue

Hi and thanks for the implementation and the detailed explanations on the original article - you definitely helped me a lot.

At the cross-correlation validation there is a minor bug that returns the 5-pixel shift check as true even when the result is sometimes false.
ImageStackAlignatorController.cs - TestCC():
if (sx1-sx2 != 5 && sy1-sy2 != 5) -> should be || instead of &&.

regards, Eyal

do you think it has a chance to run in *nix?

Hi @kunzmi
Do you think this project has a reasonable chance of running in Mono on Linux?
One obvious requirement is that all the DirectX stuff is stripped out.
How about ManagedCuda and the other dependencies? Would they survive and stay functional on *nix (say Ubuntu)?

Could you please help to check this exception?

Hi, thanks for sharing this code!

I'm trying to run the code on either the New York images or in TestMode, but it always generates exceptions. The system is Windows 10 and the CUDA version is 11.2. The GPU is a GTX 1060 Super (6GB).

If it is convenient for you, could you please give some advice? Thanks a lot!

ManagedCuda.CudaException
  HResult=0x80131500
  Message=ErrorInvalidHandle: This indicates that a resource handle passed to the API call was not valid. Resource handles are opaque types like CUstream and CUevent.
  Source=ManagedCuda
  StackTrace:
   at ManagedCuda.CudaKernel.Run(Object[] parameters)
   at PEFStudioDX.ComputeDerivatives2Kernel.RunSafe(NPPImage_32fC1 imgSource, NPPImage_32fC1 Ix, NPPImage_32fC1 Iy) in F:\wy\ImageStackAlignator\PEFStudioDX\OpticalFlowKernels.cs:line 139
   at PEFStudioDX.ImageStackAlignatorController.PrepareAccumulation() in F:\wy\ImageStackAlignator\PEFStudioDX\ImageStackAlignatorController.cs:line 1852
   at PEFStudioDX.MainWindow.PrepareAccumulationBtn_Click(Object sender, RoutedEventArgs e) in F:\wy\ImageStackAlignator\PEFStudioDX\MainWindow.xaml.cs:line 265
   at System.Windows.EventRoute.InvokeHandlersImpl(Object source, RoutedEventArgs args, Boolean reRaised)
   at System.Windows.UIElement.RaiseEventImpl(DependencyObject sender, RoutedEventArgs args)
   at System.Windows.Controls.Primitives.ButtonBase.OnClick()
   at System.Windows.Controls.Button.OnClick()
   at System.Windows.Controls.Primitives.ButtonBase.OnMouseLeftButtonUp(MouseButtonEventArgs e)
   at System.Windows.RoutedEventArgs.InvokeHandler(Delegate handler, Object target)
   at System.Windows.RoutedEventHandlerInfo.InvokeHandler(Object target, RoutedEventArgs routedEventArgs)
   at System.Windows.EventRoute.InvokeHandlersImpl(Object source, RoutedEventArgs args, Boolean reRaised)
   at System.Windows.UIElement.ReRaiseEventAs(DependencyObject sender, RoutedEventArgs args, RoutedEvent newEvent)
   at System.Windows.UIElement.OnMouseUpThunk(Object sender, MouseButtonEventArgs e)
   at System.Windows.RoutedEventArgs.InvokeHandler(Delegate handler, Object target)
   at System.Windows.RoutedEventHandlerInfo.InvokeHandler(Object target, RoutedEventArgs routedEventArgs)
   at System.Windows.EventRoute.InvokeHandlersImpl(Object source, RoutedEventArgs args, Boolean reRaised)
   at System.Windows.UIElement.RaiseEventImpl(DependencyObject sender, RoutedEventArgs args)
   at System.Windows.UIElement.RaiseTrustedEvent(RoutedEventArgs args)
   at System.Windows.Input.InputManager.ProcessStagingArea()
   at System.Windows.Input.InputManager.ProcessInput(InputEventArgs input)
   at System.Windows.Input.InputProviderSite.ReportInput(InputReport inputReport)
   at System.Windows.Interop.HwndMouseInputProvider.ReportInput(IntPtr hwnd, InputMode mode, Int32 timestamp, RawMouseActions actions, Int32 x, Int32 y, Int32 wheel)
   at System.Windows.Interop.HwndMouseInputProvider.FilterMessage(IntPtr hwnd, WindowMessage msg, IntPtr wParam, IntPtr lParam, Boolean& handled)
   at System.Windows.Interop.HwndSource.InputFilterMessage(IntPtr hwnd, Int32 msg, IntPtr wParam, IntPtr lParam, Boolean& handled)
   at MS.Win32.HwndWrapper.WndProc(IntPtr hwnd, Int32 msg, IntPtr wParam, IntPtr lParam, Boolean& handled)
   at MS.Win32.HwndSubclass.DispatcherCallbackOperation(Object o)
   at System.Windows.Threading.ExceptionWrapper.InternalRealCall(Delegate callback, Object args, Int32 numArgs)
   at System.Windows.Threading.ExceptionWrapper.TryCatchWhen(Object source, Delegate callback, Object args, Int32 numArgs, Delegate catchHandler)
   at System.Windows.Threading.Dispatcher.LegacyInvokeImpl(DispatcherPriority priority, TimeSpan timeout, Delegate method, Object args, Int32 numArgs)
   at MS.Win32.HwndSubclass.SubclassWndProc(IntPtr hwnd, Int32 msg, IntPtr wParam, IntPtr lParam)
   at MS.Win32.UnsafeNativeMethods.DispatchMessage(MSG& msg)
   at System.Windows.Threading.Dispatcher.PushFrameImpl(DispatcherFrame frame)
   at System.Windows.Application.RunDispatcher(Object ignore)
   at System.Windows.Application.RunInternal(Window window)
   at PEFStudioDX.App.Main()

Loaded images do not display in GUI

I am able to compile, open the program, and load the sample image files, but the images do not display in the GUI even after double clicking on the loaded file name. (Tried CUDA 10.1 and 10.2 and Visual Studio 2019 and 2022 with the same result.)

Hovering the mouse over the area where the image should be suggests that the image information is retained, as the box at the top with the image coordinates changes colors for different coordinates, yet no image is displayed.
image

Any suggestions? Thanks!

How does the super resolution work?

Hello Michael.
I have been reading your code recently and am confused about how the super resolution works.
In file DeBayerKernels.cu,
lines 398 to 402 write

float posX = ((float)x + 0.5f + dimX / 2) / 2.0f / dimX;
float posY = ((float)y + 0.5f + dimY / 2) / 2.0f / dimY;

float4 kernel = tex2D<float4>(kernelParam, posX, posY);// *(((const float3*)((const char*)kernelParam + (y / 2 + dimY / 4) * strideKernelParam)) + (x / 2 + dimX / 4));
float2 shift = tex2D<float2>(shifts, posX, posY);// *(((const float2*)((const char*)shifts + (y / 2 + dimY / 4) * strideShift)) + (x / 2 + dimX / 4));

I think this will result in pixels in a 2x2 neighborhood of the final high-resolution image getting the same kernel and shift.

Similarly, lines 414 to 423 write

int ppsx = x + px + sx + dimX / 2;
int ppsy = y + py + sy + dimY / 2;
int ppx = x + px + dimX / 2;
int ppy = y + py + dimY / 2;

ppsx = min(max(ppsx/2, 0 + dimX / 4), dimX/2 - 1 + dimX / 4);
ppsy = min(max(ppsy/2, 0 + dimY / 4), dimY/2 - 1 + dimY / 4);

ppx = min(max(ppx / 2, 0 + dimX / 4), dimX / 2 - 1 + dimX / 4);
ppy = min(max(ppy / 2, 0 + dimY / 4), dimY / 2 - 1 + dimY / 4);

This makes four pairs of (x, y) get the same (ppsx, ppsy) and (ppx, ppy). I think this will result in four pixels in the final image pointing to the same pixel in the input low-resolution image.

As four pixels in a 2x2 neighborhood get the same kernel, shift, uncertainty mask and raw image data, I think this will make them have the same value in the final image.

What point am I missing in your code that achieves the higher resolution?

Besides, it seems the values of the 4th channel of the variable _structureTensor4 are always zero; it is instantiated at line 1839 in the file ImageStackAlignatorController.cs. What is the difference between _structureTensor4 and _structureTensor?

any update on aligning CUDA versions?

Quick context:
This project requires CUDA 10.1 for building the CUDA kernels.
The dependencies in ManagedCuda (the NPP DLLs specifically) rely on 10.0.
The latest published ManagedCuda is 10.0.

@kunzmi any update on bumping all of that to a single CUDA version? Let's say 10.2? :)

Question only: provenance of code?

Hi,
I'm curious whether you were able to get the original code from the Google authors as a starting point, or did you develop everything from scratch? Do you know if the Google authors released any code and if so, where to find it?

I will compare your analysis with their paper and see if I can shed further light on what you've discovered.

Thanks,
Max Buchheit

tuning parameter k2

Hi,

  1. I'm curious that in the paper k2h = kDetail / (kShrink * A), but in your implementation k2h = kDetail / kShrink * A. Why did you modify it?
  2. From the formula derivation, does [c,s]T correspond to k1 and [s,-c]T to k2? In your implementation they seem to be reversed.

regards max

Version Update

Can you update the solution for DirectX 12 and Visual Studio 2022?
I tried to make it work but still have problems with CUDA, SlimDX, etc.
