We need to wait for RedKit – which I hope we'll get, one of these days. And I'm hoping that RedKit will give me a way of opening a socket to a helper program.
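Nothing is known yet about what RedKit will expose, so the following is only a minimal sketch of the helper-program side. It assumes the game script can open a local TCP socket, send one line of player text, and read one line back. The line-per-message protocol, `answer`, `start_helper` and `ask_helper` are all hypothetical. `answer` just echoes until there's a real pipeline behind it.

```python
import socket
import socketserver
import threading


def answer(player_text: str) -> str:
    """Hypothetical stand-in for the whole dialogue pipeline."""
    return f"You said: {player_text}"


class DialogueHandler(socketserver.StreamRequestHandler):
    """One request per line in, one reply per line out, until the game disconnects."""

    def handle(self) -> None:
        for raw in self.rfile:  # iterating the buffered reader yields one line at a time
            text = raw.decode("utf-8").strip()
            self.wfile.write((answer(text) + "\n").encode("utf-8"))


def start_helper(host: str = "127.0.0.1", port: int = 0) -> socketserver.ThreadingTCPServer:
    """Run the helper on a background thread; port 0 asks the OS for any free port."""
    server = socketserver.ThreadingTCPServer((host, port), DialogueHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask_helper(host: str, port: int, text: str) -> str:
    """Client side: roughly what the game script would have to do, if RedKit allows it."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall((text + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reply:
            return reply.readline().rstrip("\n")
```

`ask_helper` plays the part of the game script here. If RedKit ends up offering pipes or HTTP instead of raw sockets, only the transport changes.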
The mod I want to build is generic speech – so that the NPCs you meet in the street can hold an actual conversation. The voices will have to be done with text-to-speech technology, and if it's going to be a free mod (which it will have to be), I can't use Lyrebird voices. But I hope I can make it not too uncanny-valley.
Basically, an NPC needs to be able to answer questions about:
- The weather;
- Local streets, vendors, places of interest (particularly how to get to these);
- Local crime situation/gangs/areas of particular danger;
- Recent events in the game world (much the same as you'd get on the TV news).
Some NPCs should additionally be able to talk about aspects of game lore – for example, old folk may remember the events of 2020, and so on.
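To make "what this NPC should be expected to know" concrete, here's a minimal sketch of topic-scoped knowledge. It assumes each fact in the game-knowledge database is tagged with a topic and, optionally, a district. `Topic`, `Fact`, `NpcProfile`, `facts_for`, the age cut-off and the example district names are illustrative assumptions, not anything the game defines.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class Topic(Enum):
    WEATHER = auto()
    DIRECTIONS = auto()  # streets, vendors, places of interest
    CRIME = auto()       # gangs, dangerous areas
    NEWS = auto()        # recent events in the game world
    LORE = auto()        # e.g. the events of 2020


# Topics any street NPC can talk about.
EVERYDAY = frozenset({Topic.WEATHER, Topic.DIRECTIONS, Topic.CRIME, Topic.NEWS})
LORE_MIN_AGE = 65  # hypothetical cut-off for "old enough to remember"


@dataclass(frozen=True)
class Fact:
    topic: Topic
    district: Optional[str]  # None means known city-wide
    text: str


@dataclass
class NpcProfile:
    name: str
    age: int
    district: str
    extra_topics: frozenset = field(default_factory=frozenset)  # e.g. a fixer who knows lore

    def knows(self, topic: Topic) -> bool:
        if topic in EVERYDAY or topic in self.extra_topics:
            return True
        return topic is Topic.LORE and self.age >= LORE_MIN_AGE


def facts_for(npc: NpcProfile, facts: list[Fact]) -> list[Fact]:
    """Keep facts on topics this NPC knows, about their own district or city-wide."""
    return [f for f in facts if npc.knows(f.topic) and f.district in (None, npc.district)]
```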
I don't know, at this stage, how Cyberpunk is modelling faction hostility – I know that gangs are supposed to be more or less hostile to you depending on how you've interacted with them in the past, but how this works and even whether it actually works I don't yet know. But if faction hostility is modelled then whether an NPC is willing to answer your questions has got to vary with faction hostility.
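Since it isn't known how (or whether) the game tracks faction standing, this is only a sketch assuming a hypothetical per-faction hostility score from 0 to 100. `Stance`, `stance_towards_player`, the thresholds and the middle "grudging" tier are all assumptions.

```python
from __future__ import annotations

from enum import Enum
from typing import Mapping, Optional


class Stance(Enum):
    COOPERATE = "cooperate"
    GRUDGING = "grudging"  # answers, but curtly
    REFUSE = "refuse"      # falls back to canned brush-offs


# Hypothetical thresholds on a hypothetical 0-100 hostility scale.
GRUDGING_AT = 40
REFUSE_AT = 70


def stance_towards_player(npc_faction: Optional[str], hostility: Mapping[str, int]) -> Stance:
    """Decide how willing an NPC is to talk, given the player's standing with their faction."""
    if npc_faction is None:  # unaffiliated civilians are treated as neutral
        return Stance.COOPERATE
    score = hostility.get(npc_faction, 0)  # no record means no grudge
    if score >= REFUSE_AT:
        return Stance.REFUSE
    if score >= GRUDGING_AT:
        return Stance.GRUDGING
    return Stance.COOPERATE
```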
Obviously stage one of this is to put up a menu of questions you can ask or things you can say: you pick one, and the NPC responds to it.
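A sketch of stage one, assuming each menu option maps to a named intent. The prompts, intent names, `render_menu` and `choose` are hypothetical. Resolving menu picks to the same intents a parser would later produce means stage two only has to replace the front end.

```python
from __future__ import annotations

# Each option pairs what the player sees with a hypothetical intent name;
# the intents are what the rest of the pipeline would actually handle.
MENU = [
    ("What's the weather doing?", "ask_weather"),
    ("How do I get to Jig Jig Street?", "ask_directions"),
    ("Is it safe round here?", "ask_crime"),
    ("What's been happening lately?", "ask_news"),
]


def render_menu() -> list[str]:
    """Numbered lines to show the player."""
    return [f"{number}. {prompt}" for number, (prompt, _) in enumerate(MENU, start=1)]


def choose(number: int) -> str:
    """Map a 1-based menu choice to its intent."""
    if not 1 <= number <= len(MENU):
        raise ValueError(f"no menu option {number}")
    return MENU[number - 1][1]
```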
But stage two, which is what I really want to get to, is, you just talk into a microphone and the NPC you are looking at talks back.
The pipeline goes like this:
- speech to text;
- decision whether to co-operate based on faction hostility;
- rough parser engine to derive a query – more on this below;
- query database of game knowledge;
- filter results based on what this NPC should be expected to know;
- generate text using an appropriate dialect phrasebook for the NPC;
- text to speech, ideally (but obviously not in a free mod) using a Lyrebird voice modelled on the voice actor who speaks this NPC's existing lines.
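The stages in that list can be wired together roughly like this. It's a sketch, not an implementation: `DialoguePipeline` and every stage function are hypothetical and passed in as plain callables, so each can be stubbed until RedKit and the real components exist. The two canned lines stand in for the library of fallback text.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class DialoguePipeline:
    """Each field is one stage from the list, in order; all are placeholders."""
    speech_to_text: Callable[[bytes], str]
    will_cooperate: Callable[[Any], bool]            # faction-hostility check
    parse: Callable[[str], Optional[dict]]           # None means "couldn't parse"
    query: Callable[[dict], list]                    # game-knowledge database
    filter_known: Callable[[Any, list], list]        # what this NPC would know
    generate: Callable[[Any, dict, list], str]       # dialect phrasebook
    text_to_speech: Callable[[str], bytes]
    canned_refusal: str = "Not talking to you, choom."
    canned_confused: str = "Say what?"

    def respond(self, npc: Any, audio: bytes) -> bytes:
        text = self.speech_to_text(audio)
        if not self.will_cooperate(npc):
            return self.text_to_speech(self.canned_refusal)
        parsed = self.parse(text)
        if parsed is None:  # fall back to canned text
            return self.text_to_speech(self.canned_confused)
        results = self.filter_known(npc, self.query(parsed))
        return self.text_to_speech(self.generate(npc, parsed, results))
```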
So the key to this is being able to parse what the player says into something the NPC can respond to. Obviously there are going to be things you can't parse, and you have to have a library of bits of canned text to deal with those. But I used to write opportunistic parsers for text adventure games, thirty-mumble years ago, and by the time text adventure games were going out of fashion we'd got quite good at it. The parser doesn't need anything like a full understanding of the player's language to handle things like "how do I get to Jig Jig Street?" or "where can I get shotgun shells round here?". Add some simple modelling of greeting protocols and you've got something which might make for a much more immersive city.
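As an illustration of the opportunistic style (a minimal sketch, not a reconstruction of those old text-adventure parsers), this looks for a few trigger phrases anywhere in an utterance and captures what follows, ignoring everything it doesn't recognise. It assumes speech-to-text has already produced plain text. `Query`, `PATTERNS`, `FILLER`, `parse` and the intent names are hypothetical, and the pattern list would need to grow a lot.

```python
from __future__ import annotations

import re
from typing import NamedTuple, Optional


class Query(NamedTuple):
    intent: str
    subject: Optional[str] = None


# Ordered (pattern, intent) pairs: the first match wins, so greetings come last
# and "hey, where's ..." is still read as a directions question.
PATTERNS = [
    (re.compile(r"\bwhere (?:can|do) i (?:get|buy|find)\s+(?P<subject>.+)"), "vendor"),
    (re.compile(r"\b(?:how (?:do|can) i get to|where's|where is|which way to|directions to)\s+(?P<subject>.+)"), "directions"),
    (re.compile(r"\b(?:weather|rain|raining|smog)\b"), "weather"),
    (re.compile(r"\b(?:gangs?|safe|dangerous|trouble)\b"), "crime"),
    (re.compile(r"\b(?:hi|hey|hello|yo)\b"), "greeting"),
]

# Trailing noise that shouldn't end up in the subject.
FILLER = re.compile(r"\s+(?:round here|around here|near here|from here|please)$")


def normalise(utterance: str) -> str:
    """Lower-case, turn punctuation (except apostrophes) into spaces, collapse spaces."""
    text = re.sub(r"[^a-z0-9' ]+", " ", utterance.lower())
    return re.sub(r"\s+", " ", text).strip()


def parse(utterance: str) -> Optional[Query]:
    """Return the first recognised intent, or None so the caller can use canned text."""
    text = normalise(utterance)
    for pattern, intent in PATTERNS:
        match = pattern.search(text)
        if match:
            subject = match.groupdict().get("subject")
            if subject:
                subject = FILLER.sub("", subject).strip()
            return Query(intent, subject or None)
    return None
```

Anything that comes back as `None` is where the canned text takes over.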
The other end of it is that a Voodoo Boy doesn't talk like an Aldecaldo, and a corpo doesn't talk like a streetkid. So you have to parse whatever you get back from the database into a semantic structure, then turn that semantic structure back into text using what I'm calling a dialect phrasebook: essentially a library of sentence templates in the dialect of the particular NPC. You'd probably need a dozen of these phrasebooks to give NPC responses reasonable variety.
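A minimal sketch of a dialect phrasebook, assuming the semantic structure is a dictionary holding an intent plus slot values. `PHRASEBOOKS`, `realise`, the dialect keys, templates and slot names are hypothetical; a real phrasebook would need many more intents and templates per dialect.

```python
from __future__ import annotations

import random

# Hypothetical phrasebooks: dialect -> intent -> interchangeable sentence templates.
# Slot names in braces must match keys in the semantic structure.
PHRASEBOOKS = {
    "corpo": {
        "directions": [
            "{place} is approximately {blocks} blocks {heading} of here.",
            "Proceed {heading} for {blocks} blocks. {place} will be on your {side}.",
        ],
    },
    "streetkid": {
        "directions": [
            "{place}? Go {heading}, {blocks} blocks. Can't miss it, choom.",
            "Head {heading} {blocks} blocks. {place}'s on your {side}.",
        ],
    },
}


def realise(meaning: dict, dialect: str, rng: random.Random) -> str:
    """Pick a template for this dialect and intent, then fill its slots from the meaning."""
    templates = PHRASEBOOKS[dialect][meaning["intent"]]
    # str.format ignores unused keys such as "intent"; a missing slot raises KeyError.
    return rng.choice(templates).format(**meaning)
```

Passing the random generator in explicitly keeps output reproducible for testing while still varying the phrasing in play.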
This all may seem like a mad idea, and it is. I think I can build it (provided I can call out from RedKit's scripting to a helper program of my own); but until I've built it in the context of a large open world game like Cyberpunk, I won't really know whether it will be fun to play with.