A text classification program I wrote in my Senior Sophister year in TCD undergrad.
git clone git://seanh.sh/text-classify


# The Makefile

Note: The Makefile fetches the Reuters corpus over the
network the first time it is needed, so you may want to read
the Makefile first to see what is going on. It is a
straightforward wget request. I did this to reduce the space
this repository takes up.
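
The fetch-on-first-use idea can be sketched as a small shell function. This is only an illustration and is not copied from the Makefile: the function name `fetch_once` and its two arguments are hypothetical, and the real URL and destination path are whatever the Makefile uses.

```bash
# Hypothetical helper, not taken from the repository's Makefile.
# fetch_once URL DEST: download URL to DEST only if DEST does not exist yet.
fetch_once() {
    url="$1"
    dest="$2"
    if [ -e "$dest" ]; then
        echo "already present: $dest"
        return 0
    fi
    mkdir -p "$(dirname "$dest")"
    # Download to a temporary name first, so an interrupted transfer
    # is never mistaken for a complete corpus on the next run.
    wget -O "$dest.part" "$url" || { rm -f "$dest.part"; return 1; }
    mv "$dest.part" "$dest"
}
```

Calling `fetch_once` twice with the same destination should hit the network only the first time, which is the behaviour the note above describes.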

My solution to assignment 2 can be tested by typing `make`
in the lab directory. This empties previous binary files
from the ./bin folder, recompiles the Java source, and runs it
with some example input arguments. By default it tests the
program with InfoGain, the category acq (acquisitions), and
aggressiveness 10.

I have made several test cases, which you can run with the
commands listed in the Makefile. For example:

```bash
make infogain  # Test on a category using information gain.
make docfreq   # Test on a category using document frequency.
make max       # Test with option _MAX
make sum       # Test with option _SUM
make wavg      # Test with option _WAVG
make glob      # Test with option _GLOB
make catship   # Test with another category
```

# The Corpus

The corpus is drawn from the files listed in
./samplecorpuslist.txt. To make the program run
faster for testing purposes, simply remove XML files from
this list.
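
One way to shorten the list without losing the original is sketched below. It assumes the list holds one XML path per line, which the README does not confirm, and the `trim_list` function and the `.full` backup suffix are hypothetical names chosen for this example.

```bash
# Assumes one XML path per line in the list file (not confirmed by the README).
# trim_list LIST N: back up LIST to LIST.full, then keep only its first N lines.
trim_list() {
    list="$1"
    keep="$2"
    # Create the backup only once, so repeated trims never shrink it.
    [ -e "$list.full" ] || cp "$list" "$list.full"
    head -n "$keep" "$list.full" > "$list"
}
```

For instance, `trim_list ./samplecorpuslist.txt 5` would leave five entries in the list, and copying `samplecorpuslist.txt.full` back over it restores the full corpus.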

# The Serialized Output

The serialized ProbabilityModel is stored in ./var by the
commands in the Makefile. This default location can easily
be changed in the Makefile.