Desultory Monday... · Nifty tidbits!

This entry was posted using It’s All Text! on Firefox 3.0 RC2 on Ubuntu Hardy Heron, with an Emacs 23 snapshot as the editor. I love it :-)

Well, It’s All Text! is great if you hate typing into web forms whose text boxes make editing a big pain in the butt.

It’s great to see that It’s All Text! has been updated to work with FF 3.0 now. The fun part would be to see if this works on Windows with Cygwin Emacs as the editor. I had problems the last time I tried that - but that was some time ago now.

Today’s been a desultory Monday. Spent some time getting the Emacs snapshot with pretty fonts running on my Hardy box. It’s beautiful.

The next thing has been mostly scratching my head over Hadoop. What I’d like to do is parse an access log and generate multiple outputs - i.e. a single input of gobs of web access logs and several outputs, say requests by country, popular pages, percentage share of each client browser and so on.

Parse the web log.
Pull out the remote IPs and use GeoIP to find the originating country.
Pull out the user agent field and figure out the browser distribution.
Filter the requested resources and pull out only pages - then rank the pages by popularity.
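
The steps above can be roughed out in plain Python before worrying about Hadoop at all. This is only a sketch under a few assumptions that are not in the post: the logs are taken to be in Apache "combined" format, the GeoIP lookup is replaced by a hypothetical in-memory dict (`geo_table`), browser detection is a crude substring check, and a "page" is anything whose path ends in one of a few hand-picked suffixes. All function names are made up for illustration.

```python
import re
from collections import Counter

# Assumed: Apache "combined" log format (the post does not name the format).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed definition of a "page": adjust to whatever the site actually serves.
PAGE_SUFFIXES = (".html", ".htm", ".php", "/")


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def lookup_country(ip, geo_table):
    """Hypothetical stand-in for a real GeoIP database lookup."""
    return geo_table.get(ip, "unknown")


def browser_family(agent):
    """Very rough browser detection based on substrings of the user agent."""
    for name in ("Firefox", "MSIE", "Opera", "Safari"):
        if name in agent:
            return name
    return "Other"


def is_page(path):
    """Treat a request as a page hit if the path (minus query) looks like one."""
    return path.split("?", 1)[0].endswith(PAGE_SUFFIXES)


def summarize(lines, geo_table):
    """Build the three reports in one pass over the log lines."""
    by_country, by_browser, by_page = Counter(), Counter(), Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # skip lines that are not in the expected format
        by_country[lookup_country(rec["ip"], geo_table)] += 1
        by_browser[browser_family(rec["agent"])] += 1
        if is_page(rec["path"]):
            by_page[rec["path"].split("?", 1)[0]] += 1
    return by_country, by_browser, by_page
```

Calling `by_page.most_common(10)` on the third counter gives the pages ranked by popularity, and dividing each browser count by the total gives the percentage split.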

Now there seem to be quite a number of ways of doing this:

Code the whole thing in Java - and this is where I’m getting into analysis paralysis. Look at ways to generate multiple outputs from MapReduce and then use Job and JobControl to set up the pipeline.
Use Pig - the examples on the Pig overview page seem to suggest that this should be trivial with Pig.
Use Cascading - it seems to do the same thing - though I will need to do this in JRuby or Groovy.
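
Whichever route wins, one common way to get several reports out of a single pass is to tag each emitted key with the report it belongs to and split the totals by tag at the end. The sketch below imitates the map and reduce phases in plain Python; it is not Hadoop, Pig or Cascading API, and the record fields (`country`, `browser`, `path`, `is_page`) are hypothetical names for an already-parsed log entry.

```python
from collections import defaultdict


def map_record(record):
    """Map phase: emit one ((tag, key), 1) pair per report for a parsed record."""
    yield ("country", record["country"]), 1
    yield ("browser", record["browser"]), 1
    if record["is_page"]:
        yield ("page", record["path"]), 1


def reduce_counts(pairs):
    """Reduce phase: sum the values for each (tag, key)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return totals


def split_outputs(totals):
    """Fan the combined totals out into one dict per tag (one per report)."""
    outputs = defaultdict(dict)
    for (tag, key), count in totals.items():
        outputs[tag][key] = count
    return dict(outputs)


def run(records):
    """Single input, multiple outputs: one pass over records, one dict per report."""
    pairs = (pair for record in records for pair in map_record(record))
    return split_outputs(reduce_counts(pairs))
```

In a real job the tag would decide which output file a reduced record lands in; here it simply picks the dictionary.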

Will post an update once I get through the Java route.

Raghu Rajagopalan's ramblings
