Agentic AI, LLM Evaluation, and Systems Research Internship Job, Princeton area, New Jersey, USA, Research/Development

Position: Agentic AI, LLM Evaluation, and Trustworthy Systems Research Internship
Job Family: Research & Predevelopment

Req: 510552

Agentic AI, LLM Evaluation, and Trustworthy Systems Research Internship

Here at Siemens, we take pride in enabling sustainable
progress through technology. We do this by empowering customers and
combining the real and digital worlds, improving how we live, work, and move
today and for the next generation. We know that the only way a business thrives is if our people are
thriving. That's why we always put our people first. Our global, diverse team
would be happy to support you and challenge you to grow in new ways.

Siemens Research & Predevelopment (RPD) is the
central R&D department of Siemens and plays a key role in shaping the
future of our products. RPD acts as a strategic partner to support the
executive units of Siemens. Consequently, the main research focus is on future
technologies for industry, infrastructure, mobility, and healthcare. In this
context, we are looking for an intern to support our Software Systems and
Processes team in Princeton, NJ by researching and developing scalable intelligent
systems using LLMs and semantic technologies.

Transform the everyday with us!

Are you passionate about ensuring the reliability and
robustness of cutting-edge AI systems? We're looking for an innovative PhD
intern to join our team and contribute to groundbreaking research focused on
implementing a Verification and Validation (V&V) framework for multi-agent
systems.

Modern software is rapidly moving from static
applications to agentic AI systems that plan, reason, call tools, coordinate
across agents, and adapt over multiple steps. As these LLM-powered systems
enter industrial workflows, the critical challenge is no longer only building
capable agents; it is evaluating, verifying, and validating that they behave
reliably, safely, and transparently in complex, uncertain environments. In this
internship, you will research and prototype next-generation methods for LLM and
multi-agent system evaluation, including benchmarks, guardrails, failure-mode
analysis, runtime monitoring, formal methods, and testing technologies. You
will help advance trustworthy AI for real-world industrial software systems
where robustness, explainability, and dependable performance matter.

The internship provides a unique opportunity to contribute
to innovative industrial applications while being mentored by experienced
professionals in an international setting.

On-site work in Princeton, NJ is preferred for this role,
for a hands-on and collaborative experience; however, remote candidates will be
considered. The position is a full-time
role for at least 3 months with the possibility of extension.

Key Responsibilities

+ Research, design, and prototype V&V methods for multi-agent and agentic AI systems, with emphasis on reliability, safety, repeatability, explainability, and robustness under uncertain operating conditions.

+ Develop evaluation harnesses, benchmarks, and test scenarios for LLM-based agents, including tool use, multi-step reasoning, orchestration, failure-mode analysis, and adversarial or edge-case behavior.

+ Implement proof-of-concept prototypes in Python using modern AI and agent frameworks, formal methods, testing technologies, and retrieval-augmented or knowledge-grounded architectures where appropriate.

+ Investigate verification strategies such as model checking, property-based testing, fuzz testing, static or dynamic analysis, runtime monitoring, guardrails, and trace-based observability for complex intelligent systems.

+ Collaborate with researchers and engineers to define milestones, run experiments, analyze results, and translate research insights into scalable industrial software concepts.

+ Document findings, contribute to scientific publications or technical reports, and present results clearly to internal and external technical audiences.

Basic Qualifications

+ Currently enrolled in a PhD program in Computer Science, Artificial Intelligence, Machine Learning, Software Engineering, Formal Methods, or a closely related technical field.

+ 3+ years of research or hands-on experience in AI, machine learning, generative AI, software engineering, formal methods, autonomous systems,…
