- Microsoft
- Greater Seattle Area, WA, USA
- https://romanlutz.github.io
- in/romanlutz
- @romanlutz.bsky.social
Starred repositories
Repository for "Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models"
A tool that validates academic paper references
An extremely fast Python type checker and language server, written in Rust.
This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our v…
[ICLR'26 Oral] RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
Benchmarking LLM agents on Cyber Threat Investigation.
Simple Prompt Injection Kit for Evaluation and Exploitation
Recursively scan a Python module and export numpydoc docstrings to JSON
Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
A simple screen parsing tool towards pure vision based GUI agent
AI Red Teaming playground labs to run AI Red Teaming trainings including infrastructure.
Gather metrics on issues/prs/discussions such as time to first response, count of issues opened, closed, etc.
A Text-Based Environment for Interactive Debugging
Library for building WebSocket servers and clients in Python
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
A Comprehensive Assessment of Trustworthiness in GPT Models
This repository curates a collection of monthly white papers focused on the latest LLM attacks and defenses.
Creating a non-player character in a game backed by generative AI that will stay focused on its goals
Results and Analysis of Single-Turn Crescendo Attacks (STCA) on Large Language Models: Evaluating vulnerabilities in content moderation through adversarial techniques.
A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.
Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line …
Test Software for the Characterization of AI Technologies