use Chatbot::Eliza; # see below for details
This program is a faithful implementation of the program described by Weizenbaum. It uses a simplified script language (devised by Charles Hayden). The content of the script is the same as Weizenbaum's.
This module encapsulates the Eliza algorithm in the form of an object. This should make the functionality easy to incorporate in larger programs.
The current version of Chatbot::Eliza.pm is available on CPAN:
http://www.perl.com/CPAN-local/modules/by-module/Chatbot/
perl Makefile.PL
make test
make
make install
This will copy Eliza.pm to your perl library directory for use by all perl scripts. You probably must be root to do this, unless you have installed a personal copy of perl.
use Chatbot::Eliza;
$mybot = new Chatbot::Eliza;
$mybot->command_interface;
You can also customize certain features of the session:
$myotherbot = new Chatbot::Eliza;
$myotherbot->name( "Hortense" );
$myotherbot->debug( 1 );
$myotherbot->command_interface;
These lines set the name of the bot to be ``Hortense'' and turn on the debugging output.
When creating an Eliza object, you can specify a name and an alternative scriptfile:
$bot = new Chatbot::Eliza "Brian", "myscript.txt";
If you don't specify a script file, then the Eliza module will initialize the new Eliza object with a default script that the module contains within itself.
You can use any of the internal functions in a calling program. The code below takes an arbitrary string and retrieves the reply from the Eliza object:
my $string = "I have too many problems.";
my $reply = $mybot->transform( $string );
You can easily create two bots, each with a different script, and see how they interact:
use Chatbot::Eliza;
my ($harry, $sally, $he_says, $she_says);
$sally = new Chatbot::Eliza "Sally", "histext.txt";
$harry = new Chatbot::Eliza "Harry", "hertext.txt";
$he_says = "I am sad.";
# Seed the random number generator.
srand( time ^ ($$ + ($$ << 15)) );
# This loop runs until the program is interrupted.
while (1) {
    $she_says = $sally->transform( $he_says );
    print $sally->name, ": $she_says \n";
    $he_says = $harry->transform( $she_says );
    print $harry->name, ": $he_says \n";
}
Mechanically, this works well. However, it critically depends on the actual script data. Having two mock Rogerian therapists talk to each other usually does not produce any sensible conversation, of course.
After each call to the transform() method, the debugging output for that transformation is stored in a variable called $debug_text.
my $reply = $mybot->transform( "My foot hurts" );
my $debugging = $mybot->debug_text;
This feature is always available, even if the instance's $debug variable is set to 0.
%reasm_for_memory is similar to %reasmblist, except that these rules are only invoked when a user comment is being retrieved from memory. These contain comments such as ``Earlier you mentioned that...,'' which are only appropriate for remembered comments. Rules in the script must be specially marked in order to be included in this list rather than %reasmblist. The default script only has a few of these rules.
my $chatterbot = new Chatbot::Eliza;
new() creates a new Eliza object. This method also calls the internal _initialize() method, which in turn calls the parse_script_data() method, which initializes the script data.
my $chatterbot = new Chatbot::Eliza 'Ahmad', 'myfile.txt';
The Eliza object defaults to the name ``Eliza'', and it contains default script data within itself. However, using the syntax above, you can specify an alternative name and an alternative script file.
See the method parse_script_data() for a description of the format of the script file.
$chatterbot->command_interface;
command_interface() opens an interactive session with the Eliza object, just like the original Eliza program.
If you want to design your own session format, then you can write your own while loop and your own functions for prompting for and reading user input, and use the transform() method to generate Eliza's responses. (Note: you do not need to invoke preprocess() and postprocess() directly, because these are invoked from within the transform() method.)
But if you're lazy and you want to skip all that, then just use command_interface(). It's all done for you.
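A custom session loop of the kind described above might look like the following minimal sketch. The helper run_session() and the StubBot package are hypothetical and not part of Chatbot::Eliza; the stub only stands in for an Eliza object so that the sketch runs without the module installed. The quit check is our own and does not use the module's internal _testquit().

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Chatbot::Eliza object. The loop below only
# needs an object that provides transform().
package StubBot;
sub new       { my ($class) = @_; return bless {}, $class }
sub transform { my ( $self, $text ) = @_; return "Tell me more about: $text" }

package main;

# run_session() is a hypothetical helper. It reads lines from $in, writes
# one reply per line to $out, and stops on "bye", "quit", or end of input.
# It returns the number of replies given.
sub run_session {
    my ( $bot, $in, $out ) = @_;
    my $turns = 0;
    while ( defined( my $line = <$in> ) ) {
        chomp $line;
        next unless $line =~ /\S/;                 # skip blank lines
        last if $line =~ /^\s*(?:bye|quit)\b/i;    # our own quit check
        print {$out} "> ", $bot->transform($line), "\n";
        $turns++;
    }
    return $turns;
}

# Demo with in-memory filehandles.
my $input = "I am sad.\n\nMy foot hurts\nbye\nignored\n";
open my $in, '<', \$input or die "cannot read input: $!";
my $buffer = '';
open my $out, '>', \$buffer or die "cannot open buffer: $!";
my $turns = run_session( StubBot->new, $in, $out );
close $out;
print $buffer;
```

With the real module, you would pass a Chatbot::Eliza object together with \*STDIN and \*STDOUT instead of the stub and the in-memory filehandles.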
During an interactive session invoked using command_interface(), you can enter the word ``debug'' to toggle debug mode on and off. You can also enter the keyword ``memory'' to invoke the _debug_memory() method and print out the contents of the Eliza instance's memory.
$string = preprocess($string);
preprocess() applies simple substitution rules to the input string. Mostly this is to catch variations in spelling, misspellings, contractions and the like.
preprocess() is called from within the transform() method. It is applied to user-input text before any other processing, and before a reassembly statement has been selected.
It uses the hash %pre, which is created during the parse of the script.
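The kind of substitution described above can be sketched as a word-by-word lookup. This is only an illustration under stated assumptions: substitute_words() is a hypothetical helper, the table entries are examples, and the module's own preprocess() may work differently.

```perl
use strict;
use warnings;

# Illustrative substitution table. The real %pre is built from the
# "pre:" lines of the script.
my %pre = (
    dont       => "don't",
    cant       => "can't",
    equivalent => 'alike',
);

# substitute_words() is a hypothetical helper: it replaces each whole word
# that has an entry in the table and leaves every other word untouched.
sub substitute_words {
    my ( $text, $table ) = @_;
    my @words = split ' ', $text;
    return join ' ',
      map { exists $table->{ lc($_) } ? $table->{ lc($_) } : $_ } @words;
}

print substitute_words( "I dont think they are equivalent", \%pre ), "\n";
# prints: I don't think they are alike
```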
$string = postprocess($string);
postprocess() applies simple substitution rules to the reassembly rule. This is where all the ``I'''s and ``you'''s are exchanged. postprocess() is called from within the transform() function.
It uses the hash %post, created during the parse of the script.
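Exchanging pronouns needs some care, because ``I'' maps to ``you'' and ``you'' maps back to ``I''. The following sketch, which assumes an illustrative table and a hypothetical helper named swap_words(), looks up each word exactly once so that a swapped word is never swapped back. It is not the module's own code.

```perl
use strict;
use warnings;

# Illustrative table. The real %post is created when the script is parsed.
my %post = (
    i    => 'you',
    you  => 'I',
    my   => 'your',
    your => 'my',
);

# swap_words() is a hypothetical helper. The substitution visits each word
# once, so "you" -> "I" is not undone by the "i" -> "you" entry.
sub swap_words {
    my ( $text, $table ) = @_;
    $text =~ s{(\w+)}{ exists $table->{ lc($1) } ? $table->{ lc($1) } : $1 }ge;
    return $text;
}

print swap_words( "my foot hurts when i walk", \%post ), "\n";
# prints: your foot hurts when you walk
```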
if ($self->_testquit($user_input) ) { ... }
_testquit() detects words like ``bye'' and ``quit'' and returns true if it finds one of them as the first word in the sentence. These words are listed in the script, under the keyword ``quit''.
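The first-word test described above can be sketched as follows. The helper is_quit() is a hypothetical stand-in for the internal _testquit() method, and the word list is illustrative rather than taken from the default script.

```perl
use strict;
use warnings;

# Illustrative quit words. The real list comes from the "quit" entries
# in the script.
my @quit_words = qw(bye goodbye quit);

# is_quit() is a hypothetical helper: it returns 1 only when the first
# word of the input is one of the quit words, and 0 otherwise.
sub is_quit {
    my ( $input, @words ) = @_;
    return 0 unless $input =~ /^\s*(\w+)/;
    my $first   = lc $1;
    my %is_quit = map { lc($_) => 1 } @words;
    return $is_quit{$first} ? 1 : 0;
}

print is_quit( "Bye for now", @quit_words ), "\n";    # prints: 1
print is_quit( "I said bye",  @quit_words ), "\n";    # prints: 0
```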
$self->_debug_memory()
_debug_memory() is a special function which returns the contents of Eliza's memory stack.
$reply = $chatterbot->transform( $string, $use_memory );
transform() applies transformation rules to the user input string. It invokes preprocess(), does transformations, then invokes postprocess(). It returns the transformed output string, called $reasmb.
The algorithm embedded in the transform() method has three main parts:
transform() takes two parameters. The first is the string we want to transform. The second is a flag which indicates where this string came from. If the flag is set, then the string has been pulled from memory, and we should use reassembly rules appropriate for that. If the flag is not set, then the string is the most recent user input, and we can use the ordinary reassembly rules.
The memory flag is only set when the transform() function is called recursively. The mechanism for setting this parameter is embedded in the transform() method itself. If the flag is set inappropriately, it is ignored.
Eliza's memory holds up to $max_memory_size (default: 5) user input strings. Eliza remembers any comment when it matches a decomposition rule for which there are any reassembly rules for memory. In the script, such reassembly rules are marked with the keyword ``reasm_for_memory''.
If the transform() method fails to find any appropriate decomposition rule for a user's comment, and if there are any comments inside the memory array, then Eliza may elect to ignore the most recent comment and instead pull out one of the strings from memory. In this case, the transform() method is called recursively with the memory flag.
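A bounded memory of this kind can be sketched with a plain array. The helpers remember() and recall() are hypothetical, and the choice to discard and recall the oldest comment first is an assumption made for the sketch; the text above does not say which remembered string the module picks.

```perl
use strict;
use warnings;

my $max_memory_size = 5;    # the default quoted in the text above
my @memory;

# remember() is a hypothetical helper: it stores a comment and keeps at
# most $max_memory_size of them, discarding the oldest first.
sub remember {
    my ($comment) = @_;
    push @memory, $comment;
    shift @memory while @memory > $max_memory_size;
    return scalar @memory;
}

# recall() is a hypothetical helper: it removes and returns the oldest
# remembered comment, or undef when the memory is empty.
sub recall {
    return shift @memory;
}

remember("my comment number $_") for 1 .. 7;
print scalar(@memory), " comments kept; oldest is '$memory[0]'\n";
# prints: 5 comments kept; oldest is 'my comment number 3'
```

A caller would try the ordinary decomposition rules first and fall back to recall() only when none of them match and the memory is not empty.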
Honestly, I am not sure exactly how this memory functionality was implemented in the original Eliza program. Hopefully this implementation is not too far from Weizenbaum's.
$self->parse_script_data;
$self->parse_script_data( $script_file );
parse_script_data() is invoked from the _initialize() method, which is called from the new() function. However, you can also call this method at any time against an already-instantiated Eliza instance. In that case, the new script data is added to the old script data. The old script data is not deleted.
You can pass a parameter to this function, which is the name of the script file, and it will read in and parse that file. If you do not pass any parameter to this method, then it will read the data embedded at the end of the module as its default script data.
If you pass the name of a script file to parse_script_data(), and that file is not available for reading, then the module dies.
Each line in the script file can specify a key, a decomposition rule, or a reassembly rule.
key: remember 5
  decomp: * i remember *
    reasmb: Do you often think of (2) ?
    reasmb: Does thinking of (2) bring anything else to mind ?
  decomp: * do you remember *
    reasmb: Did you think I would forget (2) ?
    reasmb: What about (2) ?
    reasmb: goto what
pre: equivalent alike
synon: belief feel think believe wish
The number after the key specifies the rank. If a user's input contains the keyword, then the transform() function will try to match one of the decomposition rules for that keyword. If one matches, then it will select one of the reassembly rules at random. The number (2) here means ``use whatever set of words matched the second asterisk in the decomposition rule.''
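The matching step described above can be sketched by turning a decomposition rule into a regular expression in which each asterisk is a capture group. The helpers decomp_to_regex() and reassemble() are hypothetical, and the sketch is not the module's own code. It also skips the pronoun exchange that postprocess() would apply to the result.

```perl
use strict;
use warnings;

# decomp_to_regex() is a hypothetical helper: it turns a rule such as
# "* i remember *" into a case-insensitive regex. Each asterisk becomes
# a capture group and each literal word must match as a whole word.
sub decomp_to_regex {
    my ($decomp) = @_;
    my @parts = map { $_ eq '*' ? '(.*?)' : '\b' . quotemeta($_) . '\b' }
      split ' ', $decomp;
    my $pattern = join '\s*', @parts;
    return qr/^\s*$pattern\s*$/i;
}

# reassemble() is a hypothetical helper: "(2)" in the rule is replaced by
# the text that matched the second capture group.
sub reassemble {
    my ( $reasmb, @captures ) = @_;
    $reasmb =~ s{\((\d+)\)}{ $captures[ $1 - 1 ] // '' }ge;
    return $reasmb;
}

my @reasmb = (
    'Do you often think of (2) ?',
    'Does thinking of (2) bring anything else to mind ?',
);

my $regex    = decomp_to_regex('* i remember *');
my @captures = "Sometimes I remember my old house" =~ $regex;

# Pick one of the reassembly rules at random, as the text describes.
my $reply = @captures
  ? reassemble( $reasmb[ int rand @reasmb ], @captures )
  : undef;
print "$reply\n" if defined $reply;
```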
If you specify a list of synonyms for a word, then you should use a ``@'' when you use that word in a decomposition rule:
  decomp: * i @belief i *
    reasmb: Do you really think so ?
    reasmb: But you are not sure you (3).
Otherwise, the script will never check to see if there are any synonyms for that keyword.
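One way to picture the ``@'' marker is as a placeholder that expands into an alternation of the listed synonyms. The helper expand_synonyms() below is hypothetical and is not taken from the module. Treating the alternation as a capturing group is an assumption, chosen because it is consistent with the example above, where (3) refers to the text matched by the second asterisk.

```perl
use strict;
use warnings;

# From the script line "synon: belief feel think believe wish".
my %synon = ( belief => [qw(belief feel think believe wish)] );

# expand_synonyms() is a hypothetical helper: it replaces "@word" in a
# decomposition rule with a group listing that word's synonyms. A word
# without a synonym list expands to a group containing only itself.
sub expand_synonyms {
    my ( $decomp, $synon ) = @_;
    $decomp =~ s{\@(\w+)}{
        my $list = $synon->{$1} || [$1];
        '(' . join( '|', map { quotemeta($_) } @$list ) . ')'
    }gex;
    return $decomp;
}

print expand_synonyms( '* i @belief i *', \%synon ), "\n";
# prints: * i (belief|feel|think|believe|wish) i *
```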
A reassembly rule should be marked with reasm_for_memory rather than reasmb when it is only appropriate for a user's comment that has been extracted from memory.
key: my 2
  decomp: * my *
    reasm_for_memory: Let's discuss further why your (2).
    reasm_for_memory: Earlier you said your (2).
    reasm_for_memory: But your (2).
    reasm_for_memory: Does that have anything to do with the fact that your (2) ?
The parse_script_data() function parses each line and splits the ``entry'' and ``entrytype'' portions of each line into two variables, $entry and $entrytype.
Next, it uses the string $entrytype to determine what kind of content to expect in the $entry variable, if anything, and parses it accordingly. In some cases, there is no second level of key-value pair, so the function does not even bother to isolate or create $key and $value.
$key is always a single word. $value can be null, or one single word, or a string composed of several words, or an array of words.
Based on all these entries and keys and values, the function creates two giant hashes: %decomplist, which holds the decomposition rules for each keyword, and %reasmblist, which holds the reassembly phrases for each decomposition rule. It also creates %keyranks, which holds the ranks for each key.
Six other structures are created: the hashes %reasm_for_memory, %pre, %post, and %synon, and the arrays @initial and @final.
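The parsing described above can be sketched in a much reduced form. The function parse_script() is hypothetical and handles only the key, decomp, and reasmb entry types. Joining the key and the decomposition rule to index the reassembly phrases is an assumption of this sketch, not a statement about how the module stores its data.

```perl
use strict;
use warnings;

# parse_script() is a hypothetical, simplified parser. It returns
# references to three hashes: key ranks, decomposition rules per key,
# and reassembly phrases per "key:decomposition rule" pair.
sub parse_script {
    my ($text) = @_;
    my ( %keyranks, %decomplist, %reasmblist );
    my ( $key, $decomp );
    for my $line ( split /\n/, $text ) {
        # Split "entrytype: entry" and skip lines that do not fit.
        next unless $line =~ /^\s*(\w+)\s*:\s*(.*?)\s*$/;
        my ( $entrytype, $entry ) = ( $1, $2 );
        if ( $entrytype eq 'key' ) {
            my $rank;
            ( $key, $rank ) = split ' ', $entry;
            $keyranks{$key} = defined $rank ? $rank : 0;
            undef $decomp;    # a new key starts a new group of rules
        }
        elsif ( $entrytype eq 'decomp' && defined $key ) {
            $decomp = $entry;
            push @{ $decomplist{$key} }, $decomp;
        }
        elsif ( $entrytype eq 'reasmb' && defined $decomp ) {
            push @{ $reasmblist{"$key:$decomp"} }, $entry;
        }
    }
    return ( \%keyranks, \%decomplist, \%reasmblist );
}

my $script = <<'END';
key: remember 5
  decomp: * i remember *
    reasmb: Do you often think of (2) ?
    reasmb: Does thinking of (2) bring anything else to mind ?
  decomp: * do you remember *
    reasmb: Did you think I would forget (2) ?
END

my ( $ranks, $decomps, $reasmbs ) = parse_script($script);
print "rank of 'remember': $ranks->{remember}\n";
print "decomposition rules: ", scalar @{ $decomps->{remember} }, "\n";
# prints: rank of 'remember': 5
# prints: decomposition rules: 2
```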
Implements the classic Eliza algorithm by Prof. Joseph Weizenbaum. Script format devised by Charles Hayden.