InfStones

Blockchain Site Reliability Engineer

Texas Full Time
Job Position: Blockchain Site Reliability Engineer
Location: Dallas, TX, USA (Remote Acceptable - USA Applicants Only)
 
About Company
InfStones is an advanced, enterprise-grade Platform as a Service (PaaS) blockchain infrastructure provider trusted by the top blockchain companies in the world. InfStones’ AI-based infrastructure provides developers worldwide with a rugged, powerful node management platform alongside an easy-to-use API. With over 20,000 nodes supported on over 80 blockchains, InfStones gives developers all the control they need - reliability, speed, efficiency, security, and scalability - for cross-chain DeFi, NFT, GameFi, and decentralized application development. Its customers include Binance, CoinList, BitGo, OKX, Chainlink, Polygon, Harmony, and KuCoin, among a hundred others. InfStones is dedicated to developing the next evolution of a better world through limitless Web3 innovation.

To date, InfStones has raised over $110 million in capital and is backed by SoftBank, GGV Capital, Susquehanna International Group (SIG), Dragonfly Capital, Qiming Venture Partners, Plug and Play, and many other renowned institutional investors. InfStones is proud to offer medical, vision, dental, and short-term and long-term disability insurance, a 401(k) plan with company matching, FSA, and other benefits to all full-time employees, along with flexible paid time off, sick days, and holidays.

If you enjoy being on the cutting edge of technology, we encourage you to apply!
 
Job Description
As a Blockchain Site Reliability Engineer (SRE), you will be responsible for ensuring the reliability, availability, and performance of blockchain nodes and related infrastructure. You’ll monitor, troubleshoot, and resolve incidents in production environments, while also building automation tools to improve efficiency and reduce operational risks.

This role requires strong Linux system expertise, solid on-call and incident response experience, and the ability to work under pressure to quickly restore services. You’ll also collaborate with protocol engineers and open-source communities to ensure smooth upgrades and long-term system stability.
 
Key Responsibilities
1. Deploy, monitor, and maintain blockchain nodes across multiple networks.
2. Ensure system reliability and uptime by actively managing incidents, troubleshooting, and resolving node failures.
3. Develop automation and maintenance tools (using Golang, Shell, Python, etc.) to streamline operations.
4. Build and maintain monitoring, alerting, and logging systems to proactively detect and address issues.
5. Collaborate with engineering teams and solution architects on reliability improvements and incident prevention.
6. Participate in the on-call rotation to provide timely incident response and resolution.

Qualifications
1. Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
2. Strong Linux system administration skills (networking, performance tuning, debugging, security).
3. Expertise in at least one mainstream programming language (e.g., Golang, Python, JavaScript, Rust), with good programming skills and habits.
4. Experience with monitoring/alerting tools (e.g., Prometheus, Grafana, ELK).
5. Strong problem-solving skills and the ability to respond quickly under pressure.
6. Solid technical documentation skills.

Preferred (Nice to Have)
1. Hands-on experience with blockchain node deployment, maintenance, and upgrades.
2. Familiarity with mainstream blockchain protocols (e.g., Ethereum, Cosmos, Polkadot, Solana).
3. Experience with containerization/orchestration tools (Docker, Kubernetes).
4. Knowledge of smart contracts, Web3 RPC, or Solidity.