Speed on Laptop RTX 2060 #16

Closed
stevealexrs opened this issue Oct 28, 2023 · 13 comments

Comments

@stevealexrs

Is it normal that it runs at around 2 MH/s on an RTX 2060 laptop GPU? I am using the default options from the README.

@stevealexrs
Author

I tried the original profanity tool, and it runs at around 200 MH/s.

@ytrezq

ytrezq commented Nov 4, 2023

@stevealexrs: it seems the author is using 32-bit integers for the multiprecision arithmetic, and in an algorithmically naive way (compared to GMP on CPU).
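
To make the remark about 32-bit limbs concrete, here is a minimal sketch of schoolbook addition on a 256-bit value split into 32-bit limbs. It is not taken from the tool's source; the `U256` alias and the `addU256` function are hypothetical names chosen for illustration. With 32-bit limbs a 256-bit value needs eight limbs instead of the four it would need with 64-bit limbs, so schoolbook multiplication needs 64 partial products instead of 16.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 256-bit integer: eight 32-bit limbs, least significant first.
using U256 = std::array<std::uint32_t, 8>;

// Schoolbook addition: add limb by limb and propagate the carry.
// Returns the final carry (1 if the sum overflowed 256 bits, else 0).
inline std::uint32_t addU256(const U256& a, const U256& b, U256& out) {
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const std::uint64_t sum = std::uint64_t{a[i]} + b[i] + carry;
        out[i] = static_cast<std::uint32_t>(sum);  // keep the low 32 bits
        carry = sum >> 32;                         // 0 or 1
    }
    return static_cast<std::uint32_t>(carry);
}
```

Whether this limb size is actually what limits the speed here is not something the thread establishes; the sketch only shows what "32-bit multiprecision arithmetic" means in practice.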

@stevealexrs
Author

stevealexrs commented Nov 4, 2023

After troubleshooting for a while, the speed doesn't even appear anymore. The fan is running but the progress bar isn't there. I removed all files and cloned the repo again, but the problem persists.

I am using Windows, and it requires two changes to compile: the unsigned int change and removing the whole CL_DEVICE_TOPOLOGY_AMD block in getUniqueDeviceIdentifier.

It doesn't work on my dual-boot Ubuntu Linux either. Someone With Linux

@ytrezq

ytrezq commented Nov 4, 2023

@stevealexrs: Windows has a different ABI, and the difference is larger than the one between Linux and OS X. I think ints are 16-bit… I have a running Linux binary that detects nothing, even with the command line from the README.

We don't even know which OpenCL version is being used, as it needs to be declared outside Apple platforms. A compiler warning is generated for OpenCL version 3, but it states that the version should be declared.
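
As a side note on integer widths: with the mainstream 64-bit compilers on Windows, Linux and macOS, `int` is 32 bits wide, and the type whose width differs between these ABIs is `long`. The sketch below uses a hypothetical helper, `bitWidth`, to show one way of checking this at compile time and why fixed-width types sidestep the question. It makes no assumption about the tool's own code.

```cpp
#include <climits>
#include <cstdint>

// Width in bits of a type on the platform this is compiled for.
template <typename T>
constexpr int bitWidth() {
    return static_cast<int>(sizeof(T) * CHAR_BIT);
}

// Fixed-width types have the same size on every ABI.
static_assert(bitWidth<std::uint32_t>() == 32, "uint32_t is always 32 bits");
static_assert(bitWidth<std::uint64_t>() == 64, "uint64_t is always 64 bits");

// `long` is the type that differs: 32 bits on 64-bit Windows (LLP64),
// 64 bits on 64-bit Linux and macOS (LP64).
constexpr bool longIs64Bit() {
    return bitWidth<long>() == 64;
}
```

Code that must build on all three systems is usually easier to port when it spells out `std::uint32_t` or `std::uint64_t` rather than relying on `unsigned int` or `unsigned long`.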

@stevealexrs
Author

stevealexrs commented Nov 5, 2023

> After troubleshooting for a while, the speed doesn't even appear anymore. The fan is running but the progress bar isn't there. I removed all files and cloned the repo again, but the problem persists.

The progress bar was shown on Windows on my first run. My laptop has a mux switch that lets me choose between dGPU only or both iGPU and dGPU (hybrid mode). The original setting was hybrid mode. Later I switched to dGPU-only mode to see if the performance was different. Then I switched back to hybrid mode, but the progress bar disappeared. It reappeared after I switched to dGPU-only mode again. Weird bug.

@ytrezq

ytrezq commented Nov 5, 2023

> After troubleshooting for a while, the speed doesn't even appear anymore. The fan is running but the progress bar isn't there. I removed all files and cloned the repo again, but the problem persists.
>
> The progress bar was shown on Windows on my first run. My laptop has a mux switch that lets me choose between dGPU only or both iGPU and dGPU (hybrid mode). The original setting was hybrid mode. Later I switched to dGPU-only mode to see if the performance was different. Then I switched back to hybrid mode, but the progress bar disappeared. It reappeared after I switched to dGPU-only mode again. Weird bug.

No. You'll just switch to a GPU where memory wasn't initialized (initialization step).

@stevealexrs
Author

> No. You'll just switch to a GPU where memory wasn't initialized (initialization step).

I use the --skip argument when there are 2 GPUs.
When I run the same command, profanity.x64 --benchmark --skip 0, on both the original profanity and this brute-force tool, the original prints this and continues to run:

Mode: benchmark
Target: Address
Devices:
  GPU1: NVIDIA GeForce RTX 2060, 6442123264 bytes available, 30 compute units (precompiled = yes)

Initializing OpenCL...
  Creating context...OK
  Loading kernel from binary...OK
  Building program...OK

Initializing devices...
  This should take less than a minute. The number of objects initialized on each
  device is equal to inverse-size * inverse-multiple. To lower
  initialization time (and memory footprint) I suggest lowering the
  inverse-multiple first. You can do this via the -I switch. Do note that
  this might negatively impact your performance.

  GPU1 initialized

Initialization time: 3 seconds
Running...
  Always verify that a private key generated by this program corresponds to the
  public key printed by importing it to a wallet of your choice. This program
  like any software might contain bugs and it does by design cut corners to
  improve overall performance.

Total: 195.446 MH/s - GPU1: 195.446 MH/s

while this brute-force tool prints the same thing minus the last line (Total: 195.446 MH/s - GPU1: 195.446 MH/s) and stops. Maybe the --skip argument is somehow broken under certain conditions.

@stevealexrs
Author

> while this brute-force tool prints the same thing minus the last line (Total: 195.446 MH/s - GPU1: 195.446 MH/s) and stops. Maybe the --skip argument is somehow broken under certain conditions.

I figured out the cause of this bug: the author changed the printSpeed() code quite a bit, and it skips the GPU with m_index = 0 (why??). Initially my NVIDIA GPU was index 0 and the Intel GPU was index 1, which is why the speed was shown; then at some point Intel became GPU0 and NVIDIA became GPU1. I'm not sure of the exact flow, but it works after I remove the skip code.
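
As a rough illustration of how an index filter in a reporting loop can hide the output of a device that is actually working, here is a minimal sketch. It does not reproduce the tool's printSpeed() code, and the thread leaves open how m_index maps to the GPU labels. The `Device` struct, its `speedMHs` field, and the `formatSpeed` function are hypothetical; only the name `m_index` comes from the comment above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-device record; only m_index is named in the thread.
struct Device {
    int m_index;      // device index
    double speedMHs;  // measured speed in MH/s
};

// Builds the status line. With skipIndexZero set, the device with index 0
// is left out of the report, mimicking the filter described above.
std::string formatSpeed(const std::vector<Device>& devices, bool skipIndexZero) {
    double total = 0.0;
    bool any = false;
    std::ostringstream perDevice;
    for (const Device& d : devices) {
        if (skipIndexZero && d.m_index == 0) continue;  // the suspicious filter
        total += d.speedMHs;
        perDevice << " - GPU" << d.m_index << ": " << d.speedMHs << " MH/s";
        any = true;
    }
    if (!any) return "";  // every device was filtered out: nothing is printed
    std::ostringstream line;
    line << "Total: " << total << " MH/s" << perDevice.str();
    return line.str();
}
```

In this sketch, a run whose only active device has index 0 produces an empty line when the filter is on, even though the device keeps hashing. That matches the symptom of a busy fan with no speed output, under the stated assumptions.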

@ytrezq

ytrezq commented Nov 6, 2023

> while this brute-force tool prints the same thing minus the last line (Total: 195.446 MH/s - GPU1: 195.446 MH/s) and stops. Maybe the --skip argument is somehow broken under certain conditions.
>
> I figured out the cause of this bug: the author changed the printSpeed() code quite a bit, and it skips the GPU with m_index = 0 (why??). Initially my NVIDIA GPU was index 0 and the Intel GPU was index 1, which is why the speed was shown; then at some point Intel became GPU0 and NVIDIA became GPU1. I'm not sure of the exact flow, but it works after I remove the skip code.

Please submit a pull request.

@stevealexrs
Author

I'm not sure whether there are any side effects, so I will wait until I have successfully run the whole thing. It will take me around 80 hours to run all 8 batches with the default options. I don't understand why the RTX 2060 (mobile) is so slow compared to the Apple M1 Pro. The author says it took him 26 minutes to brute-force his friend's key.

@stevealexrs
Author

The Apple silicon Mac GPU uses unified memory, while the NVIDIA dGPU uses external memory (only 6 GB dedicated). Is it possible that this slows it down?

@ytrezq

ytrezq commented Nov 8, 2023

> The Apple silicon Mac GPU uses unified memory, while the NVIDIA dGPU uses external memory (only 6 GB dedicated). Is it possible that this slows it down?

Not sure it uses unified memory. For those types of devices, while the memory is shared physically, it's partitioned at the firmware level (on Intel you can even configure the split).
And anyway, memory is normally initialized directly on the GPU by the GPU.

@ytrezq

ytrezq commented Nov 8, 2023

Back to the original issue: I may have found the cause, or an additional problem.

Doesn't this

#define CL_DEVICE_PCI_BUS_ID_NV 0x4008

look like a hardcoded value that won't work on every system?

No idea how to fix it, though…
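
For context, as far as the public OpenCL extension headers go, `0x4008` is the enum token that NVIDIA's `cl_nv_device_attribute_query` extension assigns to `CL_DEVICE_PCI_BUS_ID_NV`. It names which attribute to ask `clGetDeviceInfo` for and is not a per-machine value. Devices from other vendors typically reject the query with an error, so the portable approach is to check the return code. The sketch below models that pattern without linking against OpenCL: `DeviceInfoQuery` is a hypothetical stand-in for a call to `clGetDeviceInfo`, and `pciBusId` is an illustrative helper, not a function from the tool.

```cpp
#include <cstddef>
#include <cstdint>

// Token value as published in cl_ext.h; defined here only if the header
// in use does not already provide it.
#ifndef CL_DEVICE_PCI_BUS_ID_NV
#define CL_DEVICE_PCI_BUS_ID_NV 0x4008
#endif

// Hypothetical stand-in for clGetDeviceInfo(device, param, size, value, nullptr):
// returns 0 (CL_SUCCESS) on success and a negative error code otherwise.
using DeviceInfoQuery = int (*)(std::uint32_t param, std::size_t size, void* value);

// Asks the driver for the PCI bus id. Returns false, leaving `bus` untouched
// by this helper, when the driver does not accept the token.
inline bool pciBusId(DeviceInfoQuery query, std::int32_t& bus) {
    return query(CL_DEVICE_PCI_BUS_ID_NV, sizeof bus, &bus) == 0;
}
```

A caller can fall back to another identifier, such as the device name or its position in the enumeration order, whenever `pciBusId` returns false.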

stevealexrs closed this as not planned on May 27, 2024