Job Description:
Requirements / Experience:
- Provide support for Broadcom's AI infrastructure, which is based on VKS and runs multiple LLMs
- Manage day-to-day operations for the AI environment and provide end-user support for issues, questions, and clarifications
- Strong knowledge of the AI domain
- Strong Python Proficiency: Deep experience in Python, as it's the primary language for most AI/ML frameworks and scripting.
- LLM API Integration: Proven experience integrating with major Large Language Model (LLM) APIs, such as those from Anthropic (Claude models) and Google (Gemini).
- AI Orchestration Frameworks: Hands-on experience with frameworks like LangChain, project Goose, and Google Agent Builder, and with building multi-step AI workflows and agents.
- RAG (Retrieval-Augmented Generation): Practical knowledge of building RAG pipelines, including generating embeddings and using vector databases. Knowledge of reranker models.
- Prompt Engineering: A strong understanding of how to design, test, and refine effective prompts to get reliable, accurate, and consistent outputs.
- Backend & API Development: Experience building and maintaining REST APIs (using frameworks like Flask, FastAPI, or Django) to serve AI-powered endpoints to other internal systems.
- Database Knowledge: Proficiency with SQL (e.g., PostgreSQL, MySQL) and/or NoSQL databases for storing logs, user data, and operational metrics.
- Core DevOps Skills: Familiarity with Git, Docker, and CI/CD pipelines to properly test and deploy your AI applications.
- Model Context Protocol (MCP): Experience developing and running MCP servers. This is critical for building a standardized bridge between our AI agents and our internal operational tools, databases, and APIs, enabling the AI to safely perform actions on the operations team's behalf.
Nice to have:
- Experience fine-tuning smaller, open-source models (e.g., Llama 3, Mistral) for specific tasks
- Familiarity with simple frontend or app-building tools (like Streamlit or Gradio) to quickly build web-based UIs for the tools you create
- Knowledge of cloud provider AI services (Google Vertex AI & Google AI suite)
- Strong knowledge of Linux operating systems and VKS Kubernetes platform
- Willingness to support end-user issue resolution and to fulfill service and change requests in line with standard processes and procedures
- Works with multiple IT teams to meet the “Keep the Lights On” goal for the AI environment
- Looks for opportunities to improve day-to-day operations by implementing proactive monitoring and management and by automating routine tasks
- Develops processes and checklists for handling projects and subsequent operations tasks. Oversees upkeep of processes, procedures, and technical documentation. Makes improvements by incorporating lessons learned from recent issues, security compliance reports, upgrades, and patches.
- Self-driven, with the ability to work in a fast-moving environment.
- Coaches and mentors junior team members
Qualifications:
- Bachelor’s degree in Computer Science, Information Systems Management, Computer Engineering, Mathematics, or a related discipline, or equivalent work experience.
- 5+ years of experience in Information Technology
- 2+ years of experience in AI infrastructure, architecture, and design, specifically in highly virtualized environments
- Strong knowledge of AI orchestration frameworks, LLM API integrations, RAG, prompt engineering, backend API integrations, and database integrations
- Must be self-motivated with excellent teamwork, interpersonal, communication, presentation, and organizational skills
- Ability to work effectively with clients, management, support team members and staff members in a fast-paced environment
- Deep technical knowledge of Unix/Linux and virtualization technologies, especially VMware and Nutanix
- Scripting Skills (Shell, Perl, Python)
- Strong problem analysis and troubleshooting skills
- Must be proficient in hardware/OS monitoring concepts and automation frameworks
- Applies broad concepts and theories to achieve innovative and cost-effective solutions to complex problems
- Knowledge of networking and other OS technologies is a plus
- Proactive management techniques, exposure to automation, and scripting skills are a plus
- Determines own priorities, both tactical and strategic
Additional Job Description:
Compensation and Benefits
The annual base salary range for this position is $81,000 - $130,000.
This position is also eligible for a discretionary annual bonus in accordance with relevant plan documents, and equity in accordance with equity plan documents and equity award agreements.
Broadcom offers a competitive and comprehensive benefits package: medical, dental, and vision plans, 401(k) participation including company matching, Employee Stock Purchase Program (ESPP), Employee Assistance Program (EAP), company-paid holidays, paid sick leave, and vacation time. The company follows all applicable laws for Paid Family Leave and other leaves of absence.
Broadcom is proud to be an equal opportunity employer. We will consider qualified applicants without regard to race, color, creed, religion, sex, sexual orientation, national origin, citizenship, disability status, medical condition, pregnancy, protected veteran status or any other characteristic protected by federal, state, or local law. We will also consider qualified applicants with arrest and conviction records consistent with local law.
If you are located outside the USA, please be sure to provide a home address, as it will be used for future correspondence.