
A test indexing program

Text Index

Last time we created a class to build a word index:
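One way to picture such a class is as a map from each word to the line numbers where it occurs. The following is only a minimal sketch under that assumption: the name `WordIndex` and its methods `add` and `lookup` are hypothetical and may differ from the class written last time.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical word index: maps each word to the set of line numbers
// on which it appears. Class and member names are assumptions.
class WordIndex {
public:
    // Record that `word` occurs on line `line`.
    // Using a set means a repeated (word, line) pair is stored once.
    void add(const std::string& word, int line) {
        entries[word].insert(line);
    }

    // Return the line numbers for `word`, or an empty set if it is absent.
    std::set<int> lookup(const std::string& word) const {
        auto it = entries.find(word);
        if (it == entries.end()) return {};
        return it->second;
    }

private:
    std::map<std::string, std::set<int>> entries;
};
```

A `std::map` keeps the words in sorted order, which is convenient if the whole index is ever printed.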



Parsing a document into words

Now we want to build an index of an actual text. To do this we need to read every word from the text and place it in the index.

Here is a program that reads a text file word by word:


We could build an index from this. What is the problem?

Parse a line into strings

Here is a program that takes a line and turns it into words using a class called stringstream.
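As a rough sketch of the idea, assuming the line is already held in a `std::string`: the hypothetical helper `splitLine` wraps the line in a `std::stringstream` and extracts words from it with `>>`.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split one line of text into whitespace-separated words.
std::vector<std::string> splitLine(const std::string& line) {
    std::stringstream ss(line);   // the stream reads from a copy of the string
    std::vector<std::string> words;
    std::string word;
    while (ss >> word) {
        words.push_back(word);
    }
    return words;
}
```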


Note that a stringstream reads from a string as though it were a stream.

Now we can get the line numbers!
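One possible way to recover line numbers, sketched under the assumption that the text is read one line at a time with `std::getline` and each line is then split with a `std::stringstream`. The function name `readWordsWithLines` is hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Read a stream line by line and pair every word with its 1-based line number.
std::vector<std::pair<std::string, int>> readWordsWithLines(std::istream& in) {
    std::vector<std::pair<std::string, int>> result;
    std::string line;
    int lineNumber = 0;
    while (std::getline(in, line)) {
        ++lineNumber;                 // blank lines still count as lines
        std::stringstream ss(line);
        std::string word;
        while (ss >> word) {
            result.emplace_back(word, lineNumber);
        }
    }
    return result;
}
```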

Remove punctuation

Our solution still includes punctuation, which messes up the index. Let's look at code to fix this:
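A minimal sketch of one approach, assuming it is acceptable to drop every character that `std::ispunct` classifies as punctuation. The helper name `removePunctuation` is hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Return a copy of `word` with all punctuation characters removed.
std::string removePunctuation(std::string word) {
    // remove_if shifts the kept characters forward; erase trims the tail.
    word.erase(std::remove_if(word.begin(), word.end(),
                              [](unsigned char c) { return std::ispunct(c) != 0; }),
               word.end());
    return word;
}
```

This simple rule also removes apostrophes and hyphens inside words, so "don't" becomes "dont". Whether that is acceptable depends on how the index will be searched.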


Convert to lower case

Won't we want "The", "the", and "THE" to all share the same index entry? How can we do that?
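One common answer is to convert every word to lower case before it goes into the index. The sketch below assumes plain ASCII text and uses `std::transform` with `std::tolower`; the helper name `toLower` is hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Return a lower-case copy of `word` (ASCII letters only).
std::string toLower(std::string word) {
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) {
                       // Cast through unsigned char to avoid undefined
                       // behavior for negative char values.
                       return static_cast<char>(std::tolower(c));
                   });
    return word;
}
```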

Create a File Read class

Create a class to open a file and return the words and line numbers one at a time, with the punctuation removed and the words converted to lower case.
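A possible shape for such a class is sketched below. Everything about the interface is an assumption made for illustration: the name `FileReader`, the `isOpen` check, and a `next` method that fills in a word and its line number and returns `false` at end of file.

```cpp
#include <cctype>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical reader: hands back one cleaned word at a time
// together with the 1-based line number it came from.
class FileReader {
public:
    explicit FileReader(const std::string& filename) : file(filename) {}

    bool isOpen() const { return file.is_open(); }

    // Store the next word and its line number in the arguments.
    // Returns false once the file is exhausted (or was never opened).
    bool next(std::string& word, int& line) {
        std::string raw;
        while (true) {
            // Drain the words remaining on the current line first.
            while (lineStream >> raw) {
                word = clean(raw);
                if (!word.empty()) {      // skip tokens that were all punctuation
                    line = lineNumber;
                    return true;
                }
            }
            // Current line is used up: fetch the next one.
            std::string text;
            if (!std::getline(file, text)) return false;
            ++lineNumber;
            lineStream.clear();           // reset the end-of-stream flags
            lineStream.str(text);
        }
    }

private:
    // Drop punctuation and convert to lower case (ASCII only).
    static std::string clean(const std::string& raw) {
        std::string out;
        for (unsigned char c : raw) {
            if (!std::ispunct(c)) out += static_cast<char>(std::tolower(c));
        }
        return out;
    }

    std::ifstream file;
    std::istringstream lineStream;
    int lineNumber = 0;
};
```

A caller would typically loop with `while (reader.next(word, line))` and add each pair to the index.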

Index program

Create a program to index the text, and repeatedly ask the user which word they wish to search for.

Use: https://repl.it/@JimSkon/IndexBuildShakeStart

A more complete version: https://repl.it/@JimSkon/IndexBuildShakePart

Topic revision: r4 - 2019-11-19 - JimSkon