Algorithms & Optimization Engineer
Listed on 2025-12-02
IT/Tech
AI Engineer, Machine Learning/ML Engineer, Hardware Engineer
Multicore Ware is a global software solutions and products company headquartered in San Jose, CA, USA. With offices worldwide, it serves clients and partners in the North America, EMEA, and APAC regions. Started by a group of researchers, Multicore Ware has grown to serve its clients and partners in HPC and cloud computing, GPUs, multicore and multithreaded CPUs, DSPs, FPGAs, and a variety of AI hardware accelerators.
Multicore Ware was founded by a team of researchers that wanted a better way to program for heterogeneous architectures. With the advent of GPUs and the increasing prevalence of multi-core, multi-architecture platforms, our clients were struggling with the difficulties of using these platforms efficiently.
We started as a bootstrapped services company and have since expanded our portfolio to span products and services related to compilers, machine learning, video codecs, image processing, and augmented/virtual reality. Our hardware expertise has expanded with our team; we now employ experts in HPC and cloud computing, GPUs, DSPs, FPGAs, and mobile and embedded platforms. We specialize in accelerating software and algorithms, so if your code targets a multi‑core, heterogeneous platform, we can help.
Job Description
We are seeking a talented engineer to implement and optimize machine learning, computer vision, and numeric libraries for target hardware architectures, including CPUs, GPUs, DSPs, and other accelerators. Your expertise will be instrumental in enabling efficient, high-performance execution of algorithms on these hardware platforms.
Key Responsibilities
- Implement and optimize machine learning, computer vision, and numeric libraries for target hardware architectures, including CPUs, GPUs, DSPs, and other accelerators.
- Work closely with software and hardware engineers to ensure optimal performance on target platforms.
- Implement low‑level optimizations, including algorithmic modifications, parallelization, vectorization, and memory access optimizations, to fully leverage the capabilities of the target hardware architectures.
- Work with customers to understand their requirements and implement libraries to meet their needs.
- Develop performance benchmarks and conduct performance analysis to ensure the optimized libraries meet the required performance targets.
- Stay current with the latest advancements in machine learning, computer vision, and high-performance computing.
Qualifications
- More than 4 years of experience in algorithm development, porting, optimization, and testing.
- Proficient in programming languages such as C/C++, CUDA, OpenCL, or other relevant languages for hardware optimization.
- Hands‑on experience with hardware architectures, including CPUs, GPUs, DSPs, and accelerators, and familiarity with their programming models and optimization techniques.
- Knowledge of parallel computing, SIMD instructions, memory hierarchies, and caches.
- Experience with performance analysis tools and methodologies for profiling and optimization.
- Knowledge of deep learning frameworks and techniques is a plus.
- Strong problem‑solving skills and ability to work independently or within a team.