# Proof of Concept: Facebook Scraper

This tool was built as a proof of concept to
demonstrate how easily data can be scraped
from Facebook's mobile webapp.

## Installation

### Dependencies

* BASH
* Perl
* POSIX
* A Perl DOM library (Mojo::DOM)

Install Mojo::DOM by running `cpan` in a terminal, and executing `install Mojo::DOM;`

## Instructions

Place your cookie body in a file somewhere, maybe /home/You/.fbcookie

.fbcookie:

```Cookie: datr=xxx; fr=xxx; sb=xxx; wd=xxx; c_user=xxx; xs=xxx; ...```

You can get your Facebook cookies by inspecting the headers in
your browser's network tools panel during a request to Facebook, or in
your browser's settings page (google it for your respective browser).

Next, specify the crawler's start point as a *+username* (prepend *your.username* with a '+')
or an *ID* (this should be a number), by placing it in the `todo` file:

```echo +your.username > todo```

Finally, run the crawler like so:

```sh start.sh path/to/your/.fbcookie```

## Where does the data go?

Raw HTML from friends lists goes into `./tmp`

Empty files named *+username* or *ID* go into `./done` to mark a profile as *scraped*.

Profiles scheduled to be scraped are added (FIFO style) to `todo`.
This file starts out small (just the start point), grows fast, then starts to
shrink as you reach the edges of your extended network on Facebook (note the
crawler doesn't scrape the profiles of non-friends with no friends in common with you).

`./names` is filled gradually with files named *+username* or *ID*, each containing
a single line with that user's name.

`./friends` is filled with one directory for each user whose friends list is scraped.
The directories are named *+username* or *ID*, and contain files whose names
represent that user's friends. E.g. if I have N friends on Facebook, `./friends/+my.username`
would be populated with N files, the files having the *ID* or *+username* of those
friends as their file name (the files are empty).

## Cleanup

```sh clean.sh```

## Disclaimer

I wrote this program for it to be appreciated, not to be run. Expect to see some
of your Facebook activity limited after a few hours of running the script, if
you do run it, which I don't recommend.

The source code is 140 lines. If you do run the script, read the source code;
common sense works better than code signatures.

## Why is the data stored in such a weird format?

I wrote this set of Perl and BASH scripts in an evening over
a nice glass of wine. Quick and dirty was my motto. Nowadays I use SQLite for quick-setup
storage while doing small things like this. At the time of writing these scripts,
my brain's JIT learning algorithm had not yet crossed paths with SQLite.

I used empty files in a folder to model a *set* data structure, non-empty files
in a folder to model a *map*, a file with multiple lines (trimmed from the top, enlarged from the bottom)
to model a queue, a folder of folders of files to model a map of strings to sets, etc.
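
To make the mapping between files and data structures concrete, here is a minimal sketch of the idea in plain `sh`. It is not the code from `start.sh`: the helper names (`set_add`, `set_has`, `map_put`, `map_get`, `queue_push`, `queue_pop`), the sample name `Your Name`, and the ID `12345` are all made up for illustration, and it works in a temporary directory rather than the crawler's real folders.

```sh
#!/bin/sh
# Illustrative sketch only: helper names and sample values are hypothetical,
# not taken from the actual scripts.
set -eu

work=$(mktemp -d)                      # scratch area standing in for the repo root
mkdir -p "$work/done" "$work/names" "$work/friends"
: > "$work/todo"                       # start with an empty queue file

# Set: an element is a member if an (empty) file with its name exists.
set_add() { : > "$1/$2"; }
set_has() { [ -e "$1/$2" ]; }

# Map: the key is the file name, the value is the file's single line.
map_put() { printf '%s\n' "$3" > "$1/$2"; }
map_get() { cat "$1/$2"; }

# Queue: enlarge from the bottom, trim from the top.
queue_push() { printf '%s\n' "$2" >> "$1"; }
queue_pop() {
    head -n 1 "$1"                                   # emit the oldest entry
    tail -n +2 "$1" > "$1.tmp" && mv "$1.tmp" "$1"   # drop it from the file
}

# Walk through one crawl step using the structures above.
queue_push "$work/todo" "+your.username"
queue_push "$work/todo" "12345"
next=$(queue_pop "$work/todo")               # oldest entry comes out first (FIFO)
map_put "$work/names" "$next" "Your Name"    # names: profile -> display name
mkdir -p "$work/friends/$next"               # friends: profile -> set of profiles
set_add "$work/friends/$next" "12345"
set_add "$work/done" "$next"                 # mark the profile as scraped
```

The appeal of this layout is that every structure can be inspected with `ls` and `cat`, at the cost of one inode per element. The sketch leaves its temporary directory in place so the result can be examined; remove `$work` when done.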