😾 Oskar So, does anybody have a data-driven project in mind that they need collaborators for? Mining, analysis, classification, querying, etc. Commercial or not.
🤔✍️ David Hehe, based on some of the discussions yesterday around scraping subreply I went and bought metareply.net. It'd be fun to have a community project to play with! If you're interested in working on it here and there let me know :) and that goes for anyone else too!
😾 Oskar Yeah, I can put some work in. There are a few problems, however. We don't know the roadmap for subreply, so it may very well be that we will be reinventing the wheel along the way. It would be great if some parts of it could be open sourced. Anyway, do you have something particular in mind? I am for discussing it openly here, maybe somebody will chip in.
🤔✍️ David I figure it'd be easy to scrape at regular intervals and then use the collected data to update some figures, such as: average posts per hour; content length; active users; URLs linked; active threads.
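A rough sketch of how those figures could be computed once posts are collected. Nothing here reflects how subreply actually exposes its data: the `Post` record, its fields (`author`, `text`, `created`, `thread_id`), and the `summarize` helper are all hypothetical names made up for illustration.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Naive URL matcher; it may also grab trailing punctuation.
URL_RE = re.compile(r"https?://\S+")


@dataclass
class Post:
    # Hypothetical shape of one scraped post (assumed, not subreply's schema).
    author: str
    text: str
    created: datetime
    thread_id: int


def summarize(posts):
    """Return the summary figures for a batch of scraped posts."""
    if not posts:
        return {}
    times = [p.created for p in posts]
    # Clamp the time span to at least one hour to avoid dividing by ~0.
    span_hours = max((max(times) - min(times)).total_seconds() / 3600, 1.0)
    urls = {u for p in posts for u in URL_RE.findall(p.text)}
    return {
        "posts_per_hour": len(posts) / span_hours,
        "avg_content_length": sum(len(p.text) for p in posts) / len(posts),
        "active_users": len({p.author for p in posts}),
        "urls_linked": len(urls),
        "active_threads": len({p.thread_id for p in posts}),
    }
```

Calling `summarize` on each scrape's batch and storing the result with a timestamp would give a time series for each figure.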
😇🍷 Jesus Christ yes, you can use 'last seen' to get retention
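One possible reading of that, as a sketch only: treat a user as retained if their 'last seen' timestamp falls within some window before the scrape. The 7-day default, the `retention` function name, and the username-to-timestamp mapping are assumptions; how subreply formats 'last seen' is not covered here.

```python
from datetime import datetime, timedelta


def retention(last_seen, now, window_days=7):
    """Fraction of users whose last-seen time is within the window.

    last_seen: dict mapping username -> datetime (hypothetical input shape).
    """
    if not last_seen:
        return 0.0
    cutoff = now - timedelta(days=window_days)
    active = sum(1 for ts in last_seen.values() if ts >= cutoff)
    return active / len(last_seen)
```

Other definitions (for example, cohort retention by signup date) would need more than 'last seen' alone.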
😾 Oskar dvaun, have you collected anything? I am thinking about topic/reply discovery for subreply. This should probably be an internal feature, but nevertheless - set up a crawler on _Search_ every N minutes, collect posts, categorize each as topic/reply, measure activity (no. of replies, time, other metadata), and present it in a minimal-style dashboard (maybe put it on GitHub and just run it locally to simplify things?). Thinking about it now, every _scrapable_ tab can potentially provide useful metrics.
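A minimal sketch of that pipeline, with the actual fetching left out. It assumes each scraped post carries an `id`, an optional `parent_id` (missing for topics), and a `created` time; those fields, `CrawledPost`, `crawl`, and `thread_activity` are hypothetical, and `fetch` stands in for whatever ends up scraping _Search_.

```python
import time
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class CrawledPost:
    # Assumed fields; not subreply's real data model.
    id: int
    parent_id: Optional[int]
    created: datetime


def kind(post):
    """Categorize a post: no parent means topic, otherwise reply."""
    return "topic" if post.parent_id is None else "reply"


def crawl(fetch: Callable[[], Iterable[CrawledPost]], rounds: int,
          interval_seconds: float = 0.0) -> Dict[int, CrawledPost]:
    """Call fetch() `rounds` times, deduplicating posts by id."""
    seen: Dict[int, CrawledPost] = {}
    for i in range(rounds):
        for post in fetch():
            seen[post.id] = post
        if i < rounds - 1:
            time.sleep(interval_seconds)  # the "every N minutes" part
    return seen


def thread_activity(posts):
    """Per-topic reply count and time of latest activity.

    Only direct replies to a topic are counted; nested replies are ignored.
    """
    posts = list(posts)
    stats = {p.id: {"replies": 0, "last_activity": p.created}
             for p in posts if kind(p) == "topic"}
    for p in posts:
        if kind(p) == "reply" and p.parent_id in stats:
            entry = stats[p.parent_id]
            entry["replies"] += 1
            entry["last_activity"] = max(entry["last_activity"], p.created)
    return stats
```

The dict returned by `thread_activity(crawl(...).values())` could be dumped to JSON and rendered by a small local dashboard.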