Use unified memory for scalar indexing of permutation matrices (#313)
Co-authored-by: Tim Besard <[email protected]>
Showing 1 changed file with 7 additions and 5 deletions.
ff7c7eb
@JuliaRegistrator register
Registration pull request created: JuliaRegistries/General/116466
Tip: Release Notes
Did you know you can add release notes too? Just add Markdown-formatted text underneath the comment after the text "Release notes:" and it will be added to the registry PR. If TagBot is installed, it will also be added to the release that TagBot creates.
To add them here, just re-invoke and the PR will be updated.
Tagging
After the above pull request is merged, it is recommended that a tag be created on this repository for the registered package version.
This will be done automatically if the Julia TagBot GitHub Action is installed, or it can be done manually through the GitHub interface or with git on the command line.
Metal Benchmarks
| Benchmark suite | Current (ns) | Previous (ns) | Ratio |
| --- | --- | --- | --- |
| private array/construct | 26687.5 | 23715.25 | 1.13 |
| private array/broadcast | 465979.5 | 474145.5 | 0.98 |
| private array/random/randn/Float32 | 993270.5 | 994125 | 1.00 |
| private array/random/randn!/Float32 | 632166.5 | 644458.5 | 0.98 |
| private array/random/rand!/Int64 | 568500 | 569958 | 1.00 |
| private array/random/rand!/Float32 | 583500 | 606250 | 0.96 |
| private array/random/rand/Int64 | 880458 | 831750 | 1.06 |
| private array/random/rand/Float32 | 844333.5 | 897625 | 0.94 |
| private array/copyto!/gpu_to_gpu | 614333 | 660666 | 0.93 |
| private array/copyto!/cpu_to_gpu | 739479 | 555208 | 1.33 |
| private array/copyto!/gpu_to_cpu | 599208 | 709417 | 0.84 |
| private array/accumulate/1d | 1447750.5 | 1430125 | 1.01 |
| private array/accumulate/2d | 1496375 | 1499500 | 1.00 |
| private array/iteration/findall/int | 2263917 | 2210520.5 | 1.02 |
| private array/iteration/findall/bool | 1989875 | 2041209 | 0.97 |
| private array/iteration/findfirst/int | 1678000 | 1704833 | 0.98 |
| private array/iteration/findfirst/bool | 1663625 | 1645334 | 1.01 |
| private array/iteration/scalar | 2393834 | 2430625 | 0.98 |
| private array/iteration/logical | 3431520.5 | 3432895.5 | 1.00 |
| private array/iteration/findmin/1d | 1794125 | 1763667 | 1.02 |
| private array/iteration/findmin/2d | 1403416 | 1353479 | 1.04 |
| private array/reductions/reduce/1d | 805792 | 730853.5 | 1.10 |
| private array/reductions/reduce/2d | 704146 | 709708 | 0.99 |
| private array/reductions/mapreduce/1d | 815812.5 | 800041 | 1.02 |
| private array/reductions/mapreduce/2d | 716666.5 | 713125 | 1.00 |
| private array/permutedims/4d | 943959 | 949333 | 0.99 |
| private array/permutedims/2d | 938875 | 930958 | 1.01 |
| private array/permutedims/3d | 1005416.5 | 1018708.5 | 0.99 |
| private array/copy | 862875 | 582583 | 1.48 |
| latency/precompile | 4407793041 | 4403995333 | 1.00 |
| latency/ttfp | 6915521687.5 | 6895957979 | 1.00 |
| latency/import | 726643917 | 723655188 | 1.00 |
| integration/metaldevrt | 749270.5 | 757604 | 0.99 |
| integration/byval/slices=1 | 1557959 | 1623541 | 0.96 |
| integration/byval/slices=3 | 8832020.5 | 8853854 | 1.00 |
| integration/byval/reference | 1611291 | 1573521 | 1.02 |
| integration/byval/slices=2 | 2583750 | 2624459 | 0.98 |
| kernel/indexing | 476584 | 455583 | 1.05 |
| kernel/indexing_checked | 441500 | 461916 | 0.96 |
| kernel/launch | 10875 | 10875 | 1 |
| metal/synchronization/stream | 19208 | 19250 | 1.00 |
| metal/synchronization/context | 19750 | 19791 | 1.00 |
| shared array/construct | 23756.916666666664 | 23972.166666666668 | 0.99 |
| shared array/broadcast | 469584 | 478708 | 0.98 |
| shared array/random/randn/Float32 | 1020166 | 987500 | 1.03 |
| shared array/random/randn!/Float32 | 634458 | 641062.5 | 0.99 |
| shared array/random/rand!/Int64 | 572000 | 576520.5 | 0.99 |
| shared array/random/rand!/Float32 | 593208.5 | 592333.5 | 1.00 |
| shared array/random/rand/Int64 | 742792 | 870458 | 0.85 |
| shared array/random/rand/Float32 | 898812.5 | 935229 | 0.96 |
| shared array/copyto!/gpu_to_gpu | 659667 | 546667 | 1.21 |
| shared array/copyto!/cpu_to_gpu | 94458 | 94125 | 1.00 |
| shared array/copyto!/gpu_to_cpu | 84333 | 84208 | 1.00 |
| shared array/accumulate/1d | 1418250 | 1434979 | 0.99 |
| shared array/accumulate/2d | 1500167 | 1497729 | 1.00 |
| shared array/iteration/findall/int | 1939666 | 1971125 | 0.98 |
| shared array/iteration/findall/bool | 1746333 | 1777500 | 0.98 |
| shared array/iteration/findfirst/int | 1413458 | 1410291 | 1.00 |
| shared array/iteration/findfirst/bool | 1374750 | 1388708 | 0.99 |
| shared array/iteration/scalar | 189167 | 189562.5 | 1.00 |
| shared array/iteration/logical | 3212770.5 | 3205291 | 1.00 |
| shared array/iteration/findmin/1d | 1481709 | 1479229 | 1.00 |
| shared array/iteration/findmin/2d | 1379250 | 1373083.5 | 1.00 |
| shared array/reductions/reduce/1d | 659583 | 616666 | 1.07 |
| shared array/reductions/reduce/2d | 706354 | 716854.5 | 0.99 |
| shared array/reductions/mapreduce/1d | 620667 | 686417 | 0.90 |
| shared array/reductions/mapreduce/2d | 704958.5 | 710584 | 0.99 |
| shared array/permutedims/4d | 963438 | 960250 | 1.00 |
| shared array/permutedims/2d | 939020.5 | 925458.5 | 1.01 |
| shared array/permutedims/3d | 1003520.5 | 1015208.5 | 0.99 |
| shared array/copy | 880541 | 598354.5 | 1.47 |
This comment was automatically generated by workflow using github-action-benchmark.
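Each benchmark entry above lists two timings in nanoseconds followed by their ratio, and the ratio matches the first timing divided by the second. As a rough sketch of how to read those numbers, the Python snippet below recomputes the ratio for three entries and flags any that exceed a cutoff. It assumes the first timing belongs to the current commit and the second to the previous one, which is how github-action-benchmark usually orders them. The `THRESHOLD` value and the helper names are hypothetical and are not taken from the workflow configuration.

```python
# Timings in nanoseconds, copied from three entries of the results above.
# Tuple order follows the listing: (first timing, second timing).
# Assumption: first = current commit, second = previous commit.
timings_ns = {
    "private array/copy": (862875, 582583),
    "private array/copyto!/gpu_to_cpu": (599208, 709417),
    "kernel/launch": (10875, 10875),
}

# Hypothetical cutoff for illustration; the real workflow's alert
# threshold is not stated in the comment.
THRESHOLD = 1.2


def ratio(first_ns, second_ns):
    """Return first / second; values above 1 mean the first timing is slower."""
    return first_ns / second_ns


# Round to two decimals, the precision used in the listing.
ratios = {name: round(ratio(a, b), 2) for name, (a, b) in timings_ns.items()}

# Entries whose ratio exceeds the cutoff.
flagged = [name for name, r in ratios.items() if r > THRESHOLD]

for name, r in ratios.items():
    marker = "  <-- above threshold" if name in flagged else ""
    print(f"{name}: {r:.2f}{marker}")
```

With this cutoff, only `private array/copy` is flagged among the three entries. Single-run timing differences of a few percent are usually noise, so a ratio near 1.00 should not be read as a real change.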