A text indexing program
Text Index
Last time we created a class to build a word index:
https://repl.it/@JimSkon/IndexBuild
Parse
Parsing a document into words
Now we want to build an index of an actual text. In order to do this we need to read all the words from a text, and place them in the index.
Here is a program that reads a text file word by word:
https://repl.it/@JimSkon/IndexPoemStart
We could build an index from this. What is the problem?
Parse a line into strings
Here is a program that takes a line and turns it into words using a class called stringstream.
https://repl.it/@JimSkon/ParseLineIntoString
Note that a stringstream reads from a string as though it were an input stream, such as a file or cin.
Now we can get the line numbers!
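A minimal sketch of the idea, assuming we read the text one line at a time with getline and then split each line with an istringstream. The function name wordsWithLines and the choice to number lines from 1 are assumptions made for illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: pair each word with the 1-based number of its line.
std::vector<std::pair<std::string, int>> wordsWithLines(std::istream& in) {
    std::vector<std::pair<std::string, int>> result;
    std::string line;
    int lineNumber = 0;
    while (std::getline(in, line)) {      // one line per iteration
        ++lineNumber;
        std::istringstream words(line);   // treat the line as a stream
        std::string word;
        while (words >> word) {
            result.push_back({word, lineNumber});
        }
    }
    return result;
}
```

Blank lines produce no words but still advance the line counter, so the numbers match the lines of the original file.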
Remove punctuation
Our solution leaves punctuation attached to the words, which messes up the index. Let's look at code to fix this:
https://repl.it/@JimSkon/RemovePunctuation
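One possible approach, which may differ from the linked code, is to copy every character that std::ispunct does not flag as punctuation. The function name removePunctuation is hypothetical.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: return a copy of word with all punctuation dropped.
std::string removePunctuation(const std::string& word) {
    std::string cleaned;
    for (unsigned char c : word) {        // unsigned char keeps ispunct well defined
        if (!std::ispunct(c)) {
            cleaned += static_cast<char>(c);
        }
    }
    return cleaned;
}
```

This simple version also strips apostrophes and hyphens, so "don't" becomes "dont". Whether that is acceptable depends on how the index should treat such words.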
Convert to lower case:
Won't we want "The" and "the" and "THE" to all share the same index entry? How can we do that?
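One way to do this, sketched here as an assumption rather than as the course solution, is to convert every word to lower case with std::tolower before it goes into the index. The function name toLower is hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: return a lower-case copy of word.
std::string toLower(std::string word) {
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return word;
}
```

With this in place, "The", "the", and "THE" all become "the" and land under the same key.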
Create a File Read class
Create a class that opens a file and returns the words and line numbers one at a time, with the punctuation removed and the words converted to lower case.
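The class described above might look something like the following sketch. The class name WordReader, its member names, and the next(word, lineNumber) interface are all assumptions made for illustration. It combines line-by-line reading, punctuation removal, and lower-casing.

```cpp
#include <cctype>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical class: hands back one cleaned word at a time with its line number.
class WordReader {
public:
    explicit WordReader(const std::string& filename) : file_(filename) {}

    bool isOpen() const { return file_.is_open(); }

    // Stores the next word and its 1-based line number.
    // Returns false when the file is exhausted (or was never opened).
    bool next(std::string& word, int& lineNumber) {
        std::string raw;
        while (true) {
            while (words_ >> raw) {
                std::string cleaned = clean(raw);
                if (!cleaned.empty()) {      // skip tokens that were only punctuation
                    word = cleaned;
                    lineNumber = lineNumber_;
                    return true;
                }
            }
            std::string line;
            if (!std::getline(file_, line)) return false;
            ++lineNumber_;
            words_.clear();     // reset the end-of-line state
            words_.str(line);   // load the new line into the stringstream
        }
    }

private:
    // Drop punctuation and convert the remaining characters to lower case.
    static std::string clean(const std::string& raw) {
        std::string result;
        for (unsigned char c : raw) {
            if (!std::ispunct(c)) result += static_cast<char>(std::tolower(c));
        }
        return result;
    }

    std::ifstream file_;
    std::istringstream words_;
    int lineNumber_ = 0;
};
```

A caller would construct a WordReader with a file name, check isOpen(), and then call next() in a loop until it returns false.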
Index program.
Create a program to index the text, and repeatedly ask the user for which word they wish to search for.
Use:
https://repl.it/@JimSkon/IndexBuildShakeStart
A more complete version:
https://repl.it/@JimSkon/IndexBuildShakePart
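For comparison, here is a compact sketch of such an indexing program. It does not use the index class from last time. Instead it assumes a std::map from each word to the std::set of line numbers where it appears. The names Index, normalize, buildIndex, and searchLoop are hypothetical.

```cpp
#include <cctype>
#include <iostream>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Assumed representation: word -> set of line numbers containing it.
using Index = std::map<std::string, std::set<int>>;

// Drop punctuation and convert to lower case.
std::string normalize(const std::string& raw) {
    std::string cleaned;
    for (unsigned char c : raw) {
        if (!std::ispunct(c)) cleaned += static_cast<char>(std::tolower(c));
    }
    return cleaned;
}

// Read the text line by line and record each word's line numbers.
Index buildIndex(std::istream& text) {
    Index index;
    std::string line;
    int lineNumber = 0;
    while (std::getline(text, line)) {
        ++lineNumber;
        std::istringstream words(line);
        std::string raw;
        while (words >> raw) {
            std::string word = normalize(raw);
            if (!word.empty()) index[word].insert(lineNumber);
        }
    }
    return index;
}

// Repeatedly prompt for a word until a blank line or end of input.
void searchLoop(const Index& index, std::istream& in, std::ostream& out) {
    std::string query;
    while (out << "Word to search for (blank line to quit): " &&
           std::getline(in, query) && !query.empty()) {
        auto found = index.find(normalize(query));
        if (found == index.end()) {
            out << "\"" << query << "\" not found\n";
        } else {
            out << "\"" << query << "\" appears on line(s):";
            for (int n : found->second) out << ' ' << n;
            out << '\n';
        }
    }
}
```

A main function would open the text with a std::ifstream, call buildIndex on it, and then call searchLoop with std::cin and std::cout. Using a set means each line number is stored once per word and comes back in sorted order.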
Topic revision: r4 - 2019-11-19
- JimSkon