
Yeah, I'd be curious to see exactly what he's doing. My only guess is that he's relying on a heuristic that doesn't generalize well, which would explain the many failed feed-processing reports in this thread (I know it's just a weekend project :)). Boilerpipe, in my experience, works very well on almost all news/blog-type content. Finding the date in the first few sentences, and finding the title, are extra heuristics that can be added later.

EDIT: The date and title are in the RSS feed already! No further analysis needed.
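
To illustrate the point in the edit, here is a minimal sketch of reading the title and date straight from the feed instead of guessing them from the article body. It assumes a plain RSS 2.0 feed with `title` and `pubDate` on each `item`; the sample feed, the `feed_items` name, and the example.com URLs are made up for illustration, and Atom feeds would need different element names.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical sample feed, used only for illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <pubDate>Sat, 07 Sep 2013 10:30:00 GMT</pubDate>
    </item>
    <item>
      <title>Undated post</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""


def feed_items(rss_text):
    """Return title, link, and publication date for each RSS 2.0 item."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        raw_date = item.findtext("pubDate")  # RFC 822 style date, optional
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # pubDate is optional in RSS 2.0, so fall back to None.
            "published": parsedate_to_datetime(raw_date) if raw_date else None,
        })
    return items


if __name__ == "__main__":
    for entry in feed_items(SAMPLE_RSS):
        print(entry["title"], entry["published"])
```

With the metadata taken from the feed like this, the text extractor (Boilerpipe or anything else) only has to handle the article body.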


