😾 Oskar So, does anybody have a data-driven project in mind that they need collaborators for? Mining, analysis, classification, querying, etc. Commercial or not.
🤔 David Hehe, based on some of the discussions yesterday around scraping subreply I went and bought metareply.net. It'd be fun to have a community project to play with! If you're interested in working on it here and there let me know :) and that goes for anyone else too!
4y, 18w 5 replies
😾 Oskar Yeah, I can put some work in. There are a few problems, however. We don't know the roadmap for subreply, so it may very well be that we end up reinventing the wheel along the way. It would be great if some parts of it could be open sourced. Anyway, do you have something particular in mind? I am for discussing it openly here; maybe somebody will chip in.
4y, 18w 4 replies
🤔 David I figure it'd be easy to scrape at regular intervals and then use the collected data to update some figures, such as: average posts per hour; content length; active users; URLs linked; active threads.
4y, 18w 3 replies
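The figures David lists could be computed from scraped posts roughly as follows. This is only a sketch: the `Post` record and its fields (`author`, `thread_id`, `created`, `text`) are hypothetical, since the thread does not say what a scraped post would look like, and "content length" is taken here to mean the average number of characters per post.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Naive URL matcher; good enough for counting links in short posts.
URL_RE = re.compile(r"https?://\S+")


@dataclass
class Post:
    """Hypothetical shape of one scraped post."""
    author: str
    thread_id: str
    created: datetime
    text: str


def summarize(posts):
    """Return the metrics from the list above for a batch of posts."""
    if not posts:
        return {
            "posts_per_hour": 0.0,
            "avg_content_length": 0.0,
            "active_users": 0,
            "urls_linked": 0,
            "active_threads": 0,
        }
    times = [p.created for p in posts]
    # Clamp the span to at least one hour so a tiny batch does not
    # produce an inflated (or divide-by-zero) rate.
    span_hours = max((max(times) - min(times)).total_seconds() / 3600, 1.0)
    urls = [u for p in posts for u in URL_RE.findall(p.text)]
    return {
        "posts_per_hour": len(posts) / span_hours,
        "avg_content_length": sum(len(p.text) for p in posts) / len(posts),
        "active_users": len({p.author for p in posts}),
        "urls_linked": len(urls),
        "active_threads": len({p.thread_id for p in posts}),
    }
```

Running `summarize` on each scrape and appending the result to a file would give a time series for every figure.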
😾 Oskar dvaun, have you collected anything? I am thinking about topic/reply discovery for subreply. This should probably be an internal feature, but nevertheless: set up a crawler on _Search_ every N minutes, collect posts, categorize each as topic or reply, measure activity (number of replies, time, other metadata), and present it in a minimal dashboard (maybe put it on GitHub and just run it locally to simplify things?). Thinking about it now, every _scrapable_ tab can potentially provide useful metrics.
4y, 18w 1 reply
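The crawl, categorize, and measure loop Oskar describes might look something like the sketch below. Everything about the data is an assumption: each post is taken to be a dict with `id`, `parent_id` (`None` for a topic), and `created` (a Unix timestamp), and `fetch_posts` is a hypothetical callable standing in for whatever actually scrapes the _Search_ tab.

```python
import time


def classify(post):
    """A post with no parent is a topic; anything else is a reply."""
    return "topic" if post["parent_id"] is None else "reply"


def root_of(post_id, by_id):
    """Walk parent links up to the topic (or the oldest post we have)."""
    seen = set()
    while post_id not in seen:
        seen.add(post_id)
        parent = by_id[post_id]["parent_id"]
        if parent is None or parent not in by_id:
            break  # reached a topic, or the parent was never crawled
        post_id = parent
    return post_id


def measure_activity(posts):
    """Per topic: number of replies and time of the latest post."""
    by_id = {p["id"]: p for p in posts}
    stats = {}
    for post in by_id.values():
        root = root_of(post["id"], by_id)
        entry = stats.setdefault(
            root, {"replies": 0, "last_activity": by_id[root]["created"]}
        )
        if post["id"] != root:
            entry["replies"] += 1
        entry["last_activity"] = max(entry["last_activity"], post["created"])
    return stats


def crawl(fetch_posts, interval_minutes, store, rounds=None):
    """Call fetch_posts every N minutes; store is keyed by id to dedupe."""
    done = 0
    while rounds is None or done < rounds:
        for post in fetch_posts():
            store[post["id"]] = post
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_minutes * 60)
```

Keeping `fetch_posts` as a parameter means the same loop could be pointed at any scrapable tab, and the output of `measure_activity` is the sort of table a minimal dashboard could render directly.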
🤔 David (RE: subreply.com/ow/hgb) ~ $: I'm not scraping yet--I've been swamped at work. Planning on getting up and running this weekend. As for the crawler scheduling, that seems simple enough! I'm happy to get a server up and running to host this; I even thought it might be a fun project to experiment with some lower-level server frameworks (the two on my mind are Actix/Rust and Drogon/C++). Are you interested in that? And what about making the code public?
4y, 18w reply
😇 Jesus Christ yes, you can use 'last seen' to get retention
4y, 18w reply
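One possible reading of the 'last seen' suggestion is to treat a user as retained if they were seen recently. The sketch below assumes each scraped profile yields a `joined` and a `last_seen` datetime; both field names, and the 7-day and 30-day defaults, are hypothetical choices rather than anything subreply defines.

```python
from datetime import datetime, timedelta


def retention(users, now, window_days=7, min_age_days=30):
    """Share of established users who were seen within the last window.

    Users who joined less than min_age_days ago are left out, because
    a brand-new account is trivially "recently seen".
    Returns None when there is nobody old enough to measure.
    """
    cohort = [u for u in users if now - u["joined"] >= timedelta(days=min_age_days)]
    if not cohort:
        return None
    window = timedelta(days=window_days)
    retained = [u for u in cohort if now - u["last_seen"] <= window]
    return len(retained) / len(cohort)
```

Since 'last seen' only records the most recent visit, this is a rough proxy rather than true cohort retention; scraping it at regular intervals would give a better picture over time.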