Showing 1 - 4 results of 4 for search '(( binary hastv driven optimization algorithm ) OR ( library based gpu optimization algorithm ))', query time: 0.42s
  1. Data_Sheet_1_Fast Simulation of a Multi-Area Spiking Network Model of Macaque Cortex on an MPI-GPU Cluster.PDF by Gianmarco Tiddia (10824118)

    Published 2022
    “…NEST GPU is a GPU library written in CUDA-C/C++ for large-scale simulations of spiking neural networks, which was recently extended with a novel algorithm for remote spike communication through MPI on a GPU cluster. …”
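    The remote spike communication itself lives inside NEST GPU's CUDA/MPI internals, but the pattern it accelerates (each MPI rank exchanging the spikes it generated with every other rank once per communication step) can be sketched with mpi4py. A generic illustration of that exchange pattern, not the paper's algorithm; spikes_for_rank is a hypothetical placeholder:

        # Generic MPI spike-exchange step for a distributed spiking network:
        # each rank sends the spikes destined for every other rank and receives
        # the spikes targeting its local neurons. Illustrative sketch only;
        # NEST GPU packs and unpacks these buffers on the GPU itself.
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        def spikes_for_rank(dest: int) -> list[int]:
            """Hypothetical placeholder: IDs of local spikes whose targets live on rank `dest`."""
            return [rank * 100 + dest]  # dummy payload

        # One communication step: all-to-all exchange of per-destination spike lists.
        send_bufs = [spikes_for_rank(dest) for dest in range(size)]
        recv_bufs = comm.alltoall(send_bufs)  # recv_bufs[src] holds spikes sent by rank `src`

        for src, spikes in enumerate(recv_bufs):
            if src != rank and spikes:
                pass  # deliver remote spikes to local targets here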
  2. An Ecological Benchmark of Photo Editing Software: A Comparative Analysis of Local vs. Cloud Workflows by Pierre-Alexis DELAROCHE (22092572)

    Published 2025
    “…Technical Architecture Overview

    Computational Environment Specifications

    Our experimental infrastructure leverages a heterogeneous multi-node computational topology encompassing three distinct hardware abstraction layers:

    Node Configuration Alpha (Intel-NVIDIA Heterogeneous Architecture)
    Processor: Intel Core i7-12700K (Alder Lake microarchitecture)
    - 12-core hybrid architecture (8 P-cores + 4 E-cores)
    - Base frequency: 3.6 GHz, Max turbo: 5.0 GHz
    - Cache hierarchy: 32KB L1I + 48KB L1D per P-core, 12MB L3 shared
    - Instruction set extensions: AVX2, AVX-512, SSE4.2
    - Thermal design power: 125W (PL1), 190W (PL2)
    Memory Subsystem: 32GB DDR4-3200 JEDEC-compliant DIMM
    - Dual-channel configuration, ECC-disabled
    - Memory controller integrated within CPU die
    - Peak theoretical bandwidth: 51.2 GB/s
    GPU Accelerator: NVIDIA GeForce RTX 3070 (GA104 silicon)
    - CUDA compute capability: 8.6
    - RT cores: 46 (2nd gen), Tensor cores: 184 (3rd gen)
    - Memory: 8GB GDDR6 @ 448 GB/s bandwidth
    - PCIe 4.0 x16 interface with GPU Direct RDMA support

    Node Configuration Beta (AMD Zen3+ Architecture)
    Processor: AMD Ryzen 7 5800X (Zen 3 microarchitecture)
    - 8-core monolithic design, simultaneous multithreading enabled
    - Base frequency: 3.8 GHz, Max boost: 4.7 GHz
    - Cache hierarchy: 32KB L1I + 32KB L1D per core, 32MB L3 shared
    - Infinity Fabric interconnect @ 1800 MHz
    - Thermal design power: 105W
    Memory Subsystem: 16GB DDR4-3600 overclocked configuration
    - Dual-channel with optimized subtimings (CL16-19-19-39)
    - Memory controller frequency: 1800 MHz (1:1 FCLK ratio)
    GPU Accelerator: NVIDIA GeForce GTX 1660 (TU116 silicon)
    - CUDA compute capability: 7.5
    - Memory: 6GB GDDR5 @ 192 GB/s bandwidth
    - Turing shader architecture without RT/Tensor cores

    Node Configuration Gamma (Intel Raptor Lake High-Performance)
    Processor: Intel Core i9-13900K (Raptor Lake microarchitecture)
    - 24-core hybrid topology (8 P-cores + 16 E-cores)
    - P-core frequency: 3.0 GHz base, 5.8 GHz max turbo
    - E-core frequency: 2.2 GHz base, 4.3 GHz max turbo
    - Cache hierarchy: 36MB L3 shared, Intel Smart Cache technology
    - Thermal velocity boost with thermal monitoring
    Memory Subsystem: 64GB DDR5-5600 high-bandwidth configuration
    - Quad-channel topology with advanced error correction
    - Peak theoretical bandwidth: 89.6 GB/s
    GPU Accelerator: NVIDIA GeForce RTX 4080 (AD103 silicon)
    - Ada Lovelace architecture, CUDA compute capability: 8.9
    - RT cores: 76 (3rd gen), Tensor cores: 304 (4th gen)
    - Memory: 16GB GDDR6X @ 716.8 GB/s bandwidth
    - PCIe 4.0 x16 with NVLink-ready topology

    Instrumentation and Telemetry Framework

    Power Consumption Monitoring Infrastructure

    Our energy profiling subsystem employs a multi-layered approach to capture granular power consumption metrics across the entire computational stack:
    - Hardware Performance Counters (HPC): Intel RAPL (Running Average Power Limit) interface for CPU package power measurement with sub-millisecond resolution
    - GPU Telemetry: NVIDIA Management Library (NVML) API for real-time GPU power draw monitoring via PCIe sideband signaling
    - System-level PMU: Performance Monitoring Unit instrumentation leveraging MSR (Model Specific Register) access for architectural event sampling
    - Network Interface Telemetry: SNMP-based monitoring of NIC power consumption during cloud upload/download phases

    Temporal Synchronization Protocol

    All measurement vectors utilize high-resolution performance counters (HPET) with nanosecond precision timestamps, synchronized via Network Time Protocol (NTP) to ensure temporal coherence across distributed measurement points. …”
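    Both power interfaces named in the snippet are scriptable; a minimal sampling sketch, assuming Linux's powercap sysfs for RAPL and the pynvml bindings (path, device index, and interval are illustrative, not the authors' instrumentation):

        # Sample CPU package power via the Linux RAPL sysfs counter and GPU power
        # via NVML (pip install nvidia-ml-py). Counter wraparound is ignored for brevity.
        import time
        import pynvml

        RAPL_ENERGY = "/sys/class/powercap/intel-rapl:0/energy_uj"  # package-0 energy counter

        def read_rapl_uj() -> int:
            """Read the cumulative CPU package energy counter in microjoules."""
            with open(RAPL_ENERGY) as f:
                return int(f.read())

        pynvml.nvmlInit()
        gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

        e0, t0 = read_rapl_uj(), time.monotonic()
        time.sleep(1.0)  # sampling interval
        e1, t1 = read_rapl_uj(), time.monotonic()

        cpu_watts = (e1 - e0) / 1e6 / (t1 - t0)  # microjoules -> joules -> watts
        gpu_watts = pynvml.nvmlDeviceGetPowerUsage(gpu) / 1000.0  # milliwatts -> watts
        print(f"CPU package: {cpu_watts:.1f} W, GPU: {gpu_watts:.1f} W")
        pynvml.nvmlShutdown()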
  3. Aluminum alloy industrial materials defect by Ying Han (20349093)

    Published 2024
    “…Install PyTorch based on your system. For Windows/Linux users with a CUDA GPU:

    conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge

    Install the necessary libraries:
    - scikit-learn: conda install -c anaconda scikit-learn=0.24.1
    - astropy: conda install astropy=4.2.1
    - pandas: conda install -c anaconda pandas=1.2.4
    - Matplotlib: conda install -c conda-forge matplotlib=3.5.3
    - scipy: conda install scipy=1.10.1

    Repeatability: for PyTorch, it is well known that there is no guarantee of fully reproducible results between PyTorch versions, individual commits, or different platforms. …”
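    Within a single PyTorch version and platform, run-to-run variation can at least be constrained with the documented determinism switches; a minimal sketch using standard torch APIs (the seed value is arbitrary):

        # Constrain PyTorch nondeterminism on a fixed version/platform.
        # Reproducibility across versions or hardware is still not guaranteed.
        import random
        import numpy as np
        import torch

        SEED = 42  # arbitrary
        random.seed(SEED)
        np.random.seed(SEED)
        torch.manual_seed(SEED)  # seeds CPU and all CUDA devices
        torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
        torch.backends.cudnn.benchmark = False     # disable the nondeterministic autotuner
        torch.use_deterministic_algorithms(True)   # raise if an op has no deterministic impl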
  4. LinearSolve.jl: because A\b is not good enough by Christopher Rackauckas (9197216)

    Published 2022
    “…Short list: LU, QR, SVD, RecursiveFactorization.jl (pure Julia, and the fastest?), GPU-offload LU, UMFPACK, KLU, CG, GMRES, Pardiso, ...…”
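    LinearSolve.jl's API is Julia, but the point behind that list (one backslash hides many solver choices) carries to any ecosystem. As a swapped-in SciPy illustration (not LinearSolve.jl's API) of dense LU, sparse direct (SuperLU), and iterative GMRES applied to the same system:

        # Three solver choices for the same Ax = b: dense LU with factor reuse,
        # sparse LU via SuperLU, and iterative GMRES. SciPy analogue, not LinearSolve.jl.
        import numpy as np
        import scipy.linalg as la
        import scipy.sparse as sp
        import scipy.sparse.linalg as spla

        rng = np.random.default_rng(0)
        A = rng.standard_normal((100, 100)) + 100 * np.eye(100)  # well-conditioned
        b = rng.standard_normal(100)

        lu, piv = la.lu_factor(A)             # dense LU: factor once, reuse per RHS
        x_dense = la.lu_solve((lu, piv), b)

        A_csc = sp.csc_matrix(A)              # sparse direct and iterative paths
        x_sparse = spla.splu(A_csc).solve(b)  # SuperLU factorization
        x_gmres, info = spla.gmres(A_csc, b)  # info == 0 means GMRES converged

        assert np.allclose(x_dense, x_sparse) and info == 0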