Building Jarvis: An AI Butler for Minecraft

How a simple idea turned into a full-featured AI companion plugin


I've been running a Minecraft server for a while now, and like a lot of server owners, I've always wanted something smarter than the usual command blocks and basic NPC plugins. Something that could actually understand what I wanted. Not "click this sign to teleport" — I mean just... talking to it. Naturally. Like you'd talk to an assistant.

That itch turned into a project called Jarvis, and this is the story of building it.


Where It Started

The original concept was simple: spawn an NPC that follows you around and does stuff when you ask. Mine some ores, fight mobs, carry your loot. The kind of thing you'd see in a Minecraft movie and think "why doesn't this actually exist."

The technical foundation was pretty clear from the start — Citizens for the NPC layer, and one of the modern AI APIs for the "understanding" part. The tricky part was everything in between.

The first version hooked into OpenAI's API and could parse basic commands. Say "summon" in chat, Jarvis appears. Say "mine" and he'd start digging. Primitive, but it worked. More importantly, it felt like something.


Teaching Him to Mine

Mining turned out to be deceptively hard to get right.

The naive approach — find the nearest ore, walk to it, break it — falls apart almost immediately in practice. What happens when pathfinding fails? What if another player's NPC is in the way? What if the ore is behind a wall? What if he gets stuck in a 1-block gap?

We ended up building a proper state machine with five phases:

SEARCHING → MOVING → MINING → COLLECTING → RETURNING

Each phase has its own logic, timeout handling, and fallback behavior. If pathfinding fails, he teleports. If a block won't break after a few attempts, he marks it and moves on. If he's been in one phase too long, he resets. It took a few iterations to get the transitions right, but once it clicked, the mining felt genuinely competent.

Then came the inventory problem. Jarvis would come back from a mining run with his pockets full of... cobblestone. And dirt. And gravel. He'd been faithfully picking up every item that dropped during navigation, which meant the actual ore was buried under a pile of junk. The fix was a junk filter — a defined set of worthless materials he simply ignores during collection. Seems obvious in hindsight.

The other mining bug that took a while to track down was ore targeting. If you said "mine diamond," he'd run off and mine the nearest ore — which was usually coal. The issue was that the ore selection wasn't filtering at all. The fix was building a keyword map that translates natural language ("netherite," "ancient debris," "debris") into sets of Minecraft material types, which then gets passed as a filter into the search routine. Now "mine diamond" actually mines diamond.


The AI Layer

One of the things I'm genuinely proud of here is the multi-backend AI setup. Jarvis doesn't lock you into one provider. He supports:

  • OpenAI (GPT-4o and others)
  • Anthropic Claude
  • xAI Grok
  • Google Gemini
  • Ollama (local models, no API key required)

They all fail over to each other automatically. If OpenAI returns an error, Jarvis tries Claude. If Claude's down, he tries Grok. You can configure the order however you like. For a server plugin where reliability matters, that kind of resilience is worth the extra wiring.

The natural language layer works by sending a structured prompt to whichever AI is active, asking it to return a JSON object with an action type and parameters. Something like:

{                                                                                                                                                                                                                                                                                                                                          
    "action": "mine",                                       
    "parameters": { "ore": "diamond" },                                                                                                                                                                                                                                                                                                                                                                                                     
    "response": "Right away, sir. I'll get on that immediately."                                                                                                                                                                                                                                                                                                                                                                            
  }

The response field is what Jarvis says back to you — and that's where the personality comes in. Getting the system prompt right so the AI returns valid JSON and writes something witty took more iteration than I expected. But it's one of those things that makes the whole experience feel alive.


The Command Interface

Early on, all commands were single-word: /jarvis mine, /jarvis summon. That was fine until users started trying natural things like /jarvis set the time to night and getting "not a valid command" errors.

The fix was to route any unrecognized subcommand straight to the natural language parser. If the first word after /jarvis isn't a known command, it gets passed to the AI as-is. So /jarvis I need some diamonds works exactly like typing it in chat would. The command becomes a natural language interface, not just a keyword dispatcher.


The Butler Expansion

After the core was solid, the question became: what else should a digital butler do?

The answer turned out to be: quite a lot.

We built out a full suite of admin actions that Jarvis can execute on request. Things like giving players items, changing game modes, setting weather, adjusting difficulty, broadcasting announcements, clearing mobs, saving the world. All of these route through a confirmation system — dangerous actions show a clickable [Confirm] [Cancel] prompt before anything happens. The AI describes what it's about to do in plain English, you confirm, it runs. No accidental surprises.

The console command execution feature was the one that felt most powerful (and most dangerous). You can tell Jarvis to run a command and he'll do it — but only after making you click Confirm. Every console action is gated.

We also built a player request system. Regular players can ask Jarvis for items — "can I get 64 iron ingots?" — and instead of executing immediately, it queues the request and pings online admins. Admins see a list and approve or deny with one click. It's a nice middle ground between "give players everything" and "make them open a ticket."

Then there were the ambient butler behaviors:

  • Auto-greet — when a player joins, Jarvis fires off an AI-generated welcome message.
  • Death commentary — when someone dies, Jarvis provides commentary broadcast to the server. It's usually pretty funny.
  • TPS monitor — Jarvis watches server performance and quietly alerts admins in-game if things start to lag. No more finding out the server was struggling for an hour because nobody was watching the console.

The Bugs That Humbled Us

A few moments in development were genuinely humbling.

The right-click menu regression was a good one. Everything was working, then suddenly right-clicking Jarvis did nothing. Turns out the event listener was using a standard Bukkit event — but Citizens NPC clicks get intercepted by the Citizens API before Bukkit ever sees them. Switch to the Citizens-native click event, problem solved. Sometimes the framework is smarter than you are.

The mining break detection bug was subtler. The code was telling Jarvis to break a block, then scheduling a check one tick later to see if it worked. In practice, the block-breaking call is synchronous — the block becomes air on the same tick you call it. So the one-tick delay meant Jarvis would sometimes try to break the same block twice, report failure, and give up on ores he'd already successfully mined. Removing the delay and checking immediately fixed the whole thing.

The lesson: read the API docs carefully, but also just test the actual behavior. Assumptions about async vs sync will get you every time.


Where We Are Now

Version 0.0.9 is the current release. The full source is on GitHub with a proper README covering installation, configuration, and the full command reference.

What started as "wouldn't it be cool if the NPC could mine" turned into a legitimately useful server tool. Jarvis can manage your world, entertain your players, handle item requests, watch performance, and hold an actual conversation. He's not perfect — no AI-driven system is — but he's reliable, extensible, and genuinely fun to use.

There's more to build. Smarter pathfinding, better build planning, more quest variety, maybe a web dashboard for the request queue. But that's for future versions.

For now, he's a pretty good butler.


Jarvis is open source under the MIT license. Requires Citizens 2, Java 17, and at least one AI API key (or a local Ollama install). Grab the latest release at github.com/iamgadgetman/jarvis.

Comments

Popular Posts