Finding comments quickly in the desktop team blog
Thursday, November 18, 2010 9:12:49 PM
This is just a quick post about how I find comments within the Desktop Team blog. The subject is on my mind right now because the comments section on our recent blog post about address bar UI changes is growing rapidly. Admittedly, that is not a massive surprise to those of us working at Opera.
Often with posts like this, where the comments run to more than 10 pages, it can become a real hassle to find that elusive remark you knew you had seen. Indeed, it can even be hard to find your own comment to refer back to!
My preferred solution will probably not surprise anyone who has read any of my earlier posts where I profess my love of utilities like cURL and Wget. Basically I just use these to pull the information into the terminal so I can process it there. It is fairly easy to whip up a little shell script in a handful of lines to grep through all of the comments pages quickly. Here is one I wrote some time back, in a couple minutes:
    #!/bin/sh
    # Usage: grepsnap PATTERN BLOG_POST_URL
    gs_tempdir=$(mktemp -d -t grepsnap.XXXXXXXX)
    cd "$gs_tempdir" || exit 1
    # Fetch the first page of the blog post
    wget -q "$2"
    # Collect the links to the other comment pages and fetch those too
    grep startidx * |\
    sed -e "s,<a href,\n<a href,g" |\
    grep startidx |\
    grep -v Next |\
    sort -u |\
    wget -B "$2" -qi- --force-html
    # Search every downloaded page, ignoring case
    grep -i "$1" *
    rm -r "$gs_tempdir"

I have it saved as grepsnap (for "grep snapshot") in ~/.local/bin (I added ~/.local/bin to my $PATH some time back since I always install Opera in ~/.local via our new *nix install script). I use it in much the same fashion as I would use grep, simply replacing FILE with BLOG_POST_URL.

For example, when arghwashier (http://my.opera.com/arghwashier/) asked about the reasoning for switching off Smooth Scrolling on *nix by default (http://my.opera.com/desktopteam/blog/show.dml/21937422?startidx=550#comment47132122), I knew I had already answered another user and provided them with the Opera preference setting in case they wanted to switch it back. Since I knew the preference included "SmoothScrolling", I could use this as a key to find my old post without having to manually trawl through the earlier 11 pages of comments. Here is what I issued and what I got back:

    $ grepsnap smoothscroll http://my.opera.com/desktopteam/blog/2010/11/17/new-and-improved
    new-and-improved?startidx=400:opera:config#UserPrefs|SmoothScrolling</div>

Using Opera to hover over the numbered comment links on the blog page, I could quickly see in the status bar that startidx=400 is page 9. Sure enough, I found my comment there.
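The page number can also be worked out without hovering over the links. Here is a minimal sketch, assuming 50 comments per page. That figure is only an inference from startidx=400 landing on page 9 and the other startidx values in this post being multiples of 50; it is not something the blog documents. The name page_for_startidx is a hypothetical helper, not part of grepsnap.

```shell
#!/bin/sh
# Hypothetical helper: map a startidx value to a comments page number.
# Assumes 50 comments per page (inferred from the post, not documented).
page_for_startidx() {
    per_page=50
    echo $(( $1 / $per_page + 1 ))
}
```

With that assumption, `page_for_startidx 400` prints 9, matching what the status bar showed.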
Of course, once you realise that you are not limited to searching for keywords on the rendered page, but that even the source is available, you can also do more interesting things. For example, all comments include the following HTML fragment:
    <p class="comment-by"><b><a href="/username/">

As you can probably guess, username is a given user's username. This means I can easily use the fragment as a base to find all the comments by one user. For example, say I wanted to find all the comments by Daniel Hendrycks within that blog post (given he has an eye for detail). I could do the following:
    $ grepsnap "comment-by.*DanielHendrycks" http://my.opera.com/desktopteam/blog/2010/11/17/new-and-improved

And I would get back (as of right now):

    new-and-improved:<p class="comment-by"><b><a href="/DanielHendrycks/">Daniel Hendrycks</a></b> <span class="comment-date"><a href="/desktopteam/blog/show.dml/21937422#comment47053932" class="permalink">#</a> 17. November 2010, 22:15</span></p>
    new-and-improved?startidx=100:<p class="comment-by"><b><a href="/DanielHendrycks/">Daniel Hendrycks</a></b> <span class="comment-date"><a href="/desktopteam/blog/show.dml/21937422?startidx=100#comment47058482" class="permalink">#</a> 17. November 2010, 23:31</span></p>

What is even nicer is that a permalink follows, so with just a little help from sed and some more regex magic I can easily turn that into a list of links.
Here is how I would generate a list of links to what fearphage has said in this snapshot thus far:
    $ grepsnap "comment-by.*fearphage" http://my.opera.com/desktopteam/blog/2010/11/17/new-and-improved | sed -e "s,.*href=\"\(.*comment[0-9]*\).*,http://my.opera.com\1,"
    http://my.opera.com/desktopteam/blog/show.dml/21937422#comment47053572
    http://my.opera.com/desktopteam/blog/show.dml/21937422#comment47053922
    http://my.opera.com/desktopteam/blog/show.dml/21937422#comment47054062
    http://my.opera.com/desktopteam/blog/show.dml/21937422?startidx=100#comment47061952
    http://my.opera.com/desktopteam/blog/show.dml/21937422?startidx=450#comment47112672
    http://my.opera.com/desktopteam/blog/show.dml/21937422?startidx=450#comment47113192

Alternatively, maybe you just want to know how much someone has commented. Haavard and Rijk usually comment a fair bit, so let's compare how many comments they have each written up until now:

Haavard:
    $ grepsnap "comment-by.*haavard" http://my.opera.com/desktopteam/blog/2010/11/17/new-and-improved | wc -l
    13

Rijk:

    $ grepsnap "comment-by.*rijk" http://my.opera.com/desktopteam/blog/2010/11/17/new-and-improved | wc -l
    32

Yep, I think it is safe to say none of the Opera staff are likely to catch up with Rijk in this blog post!
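Rather than checking one name at a time, the same "comment-by" fragment could be used to tally every commenter in one go. The following is only a rough sketch: tally_commenters is a hypothetical helper name, and it assumes every comment header contains the fragment quoted earlier, with the username as the only path component in the link.

```shell
#!/bin/sh
# Hypothetical helper: read comment-page HTML on stdin and print
# "count username" lines, busiest commenter first.
# Assumes each comment header contains
#   <p class="comment-by"><b><a href="/username/">
tally_commenters() {
    # Pull out every comment-by link, even several on one line
    grep -o 'class="comment-by"><b><a href="/[^/"]*/"' |
        # Keep only the username from the href
        sed -e 's,.*href="/\([^/"]*\)/",\1,' |
        sort |
        uniq -c |
        sort -rn |
        # Normalise the padding that uniq -c adds
        awk '{ print $1, $2 }'
}
```

It needs the downloaded pages to still be around, so it would be run in a directory where they were saved, for example as `cat new-and-improved* | tally_commenters`, rather than through grepsnap, which deletes its temporary directory.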
Anyway, I guess that's it. There is certainly a lot more you could do but you get the idea. Also this was supposed to be a short blog post!

FataL # Thursday, November 18, 2010 10:48:05 PM
Ruarí Ødegaard (ruario) # Thursday, November 18, 2010 10:52:31 PM
Wow, people really don't bother reading old comments do they!
Kyle Baker (kyleabaker) # Friday, November 19, 2010 12:45:25 AM
d4rkn1ght # Friday, November 19, 2010 1:53:56 AM
Ruarí Ødegaard (ruario) # Friday, November 19, 2010 8:05:56 AM
Originally posted by ruario:
I have now added the right click problem to the known issues, so "right click" is effectively mentioned on every page, since the comments also include the full blog post above. Hence my little example check will now return an invalid (inflated) result if anyone attempts to retry it. It would of course be possible to create a more comprehensive "grepsnap" that ignored the lines that were not comments, but in this case it is probably easier just to make a mental note of the number of pages and subtract this from the result you get, if you did want to repeat the test.
Ruarí Ødegaard (ruario) # Friday, November 19, 2010 8:13:41 AM
Bela Lubkin (filbo) # Friday, November 19, 2010 10:35:07 AM
Originally posted by ruario:
If you're only searching for "right click", you're missing "right button", "right mouse click", "context menu" and so on. Probably more like 100 by now.
Haavard (haavard) # Friday, November 19, 2010 10:51:22 AM
Originally posted by FataL:
It should go to your recently posted comment, as it does now.
Bela Lubkin (filbo) # Friday, November 19, 2010 11:19:16 AM
The actual number (when I sampled a few minutes ago) was "only" 68 posts about right-click not working; from 54 different accounts. That includes people replying to confirm or provide the workaround. Method: `lynx -dump` all pages of blog, grep for "right" and "context" as well as " # " for headers. Comb those down further manually, delete consecutive " # " lines, remainder = all posts on the topic.
(I still falsely overcount by including posts that mention the right-click issue + other stuff that's actually worthwhile.)
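The counting method described above could be sketched roughly as follows. This is not the actual commands used, only an illustration: count_topic_posts is a hypothetical helper, and it assumes a plain-text dump in which every comment starts with a header line containing " # " (name, then " # ", then the date) and in which the words "right" and "context" are good enough markers for the topic.

```shell
#!/bin/sh
# Hypothetical helper: read a plain-text dump of the comment pages on
# stdin and report how many comments mention the topic, and from how
# many distinct accounts.
count_topic_posts() {
    awk '
        / # / {
            # Comment header: remember the author, reset the per-comment flag
            author = $0
            sub(/ # .*/, "", author)
            counted = 0
            next
        }
        tolower($0) ~ /right|context/ {
            # Count each comment once, however many lines match
            if (author != "" && !counted) {
                posts++
                accounts[author] = 1
                counted = 1
            }
        }
        END {
            n = 0
            for (a in accounts) n++
            printf "%d posts from %d accounts\n", posts, n
        }
    '
}
```

Headers with no matching line beneath them are simply never counted, which stands in for deleting the consecutive " # " lines by hand. Like the manual method, it still overcounts comments that mention the issue alongside other things.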
Bela Lubkin (filbo) # Friday, November 19, 2010 11:21:42 AM
Originally posted by haavard:
FataL's request is how it used to work (long ago). Either seems valid, but I don't think it's worth a user preference because, as it works now, you just have to go "Back" to get what FataL wants.
The reverse isn't true: if it didn't land you at your new comment, you would have to sync to end of blog, then search for your item -- very clumsy.
Bela Lubkin (filbo) # Friday, November 19, 2010 11:23:25 AM
I type long lines (let the blog server wrap). Double-Enter for a paragraph break. Exactly the same in my "Your Filter" and "FataL's request" posts above.
The first has paragraph breaks where I wanted them; the second doesn't. ??!??!?
Cutting Spoon (hellspork) # Friday, November 19, 2010 7:02:57 PM
Hm. Something's been on my mind for a while. I'm pretty sure I reported it, but:
New comments notification will teleport the user to the version of myopera used by the first unread post. I've seen this before with geography codes (my.cn.opera.com being the most common), but the recent beta-test would teleport the browser between my.opera.com and my-beta.opera.com, purely on the basis of where the first new comment was posted from.
Since the change, some old posts have popped on my RSS. For example from Cosimo Streppone's blog I had:
http://my.opera.com/cstrep/blog/2010/10/07/surge-2010-scalability-conference-in-baltimore-usa-day-1
http://my.opera.com/cstrep/blog/2010/09/24/survival-guide-to-utf-8
And comments may be broken on some posts.
Bela Lubkin (filbo) # Saturday, November 20, 2010 1:59:35 AM
Originally posted by hellspork:
Heh. Bugreported that this Monday --
Originally posted by DSK-318929:
Charles Schloss (Chas4) # Wednesday, November 24, 2010 3:03:28 PM
Unregistered user # Friday, November 26, 2010 10:31:36 AM
Unregistered user # Saturday, November 27, 2010 12:34:57 PM
Brian L Johnson (brianlj) # Thursday, December 16, 2010 7:52:11 AM
Ruarí Ødegaard (ruario) # Friday, January 28, 2011 12:42:24 PM

    wget -qO- http://my.opera.com/desktopteam/blog/ | sed -n "s,.*h2 class=\"title\".*href=\"\(.*\)\".*,http://my.opera.com\1,p" | parallel "wget -qO {/} {} && grep startidx {/} | sed -e 's:<a href:\n<a href:g' | grep startidx | grep -v Next | sort -u | wget -B {} -qi- --force-html"

Or how about all the comments from the last 50 blog posts?

    seq 0 10 40 | parallel wget -qO- http://my.opera.com/desktopteam/blog/?startidx={} | sed -n "s,.*h2 class=\"title\".*href=\"\(.*\)\".*,http://my.opera.com\1,p" | parallel "wget -qO {/} {} && grep startidx {/} | sed -e 's:<a href:\n<a href:g' | grep startidx | grep -v Next | sort -u | wget -nc -B {} -qi- --force-html"

Cutting Spoon (hellspork) # Friday, January 28, 2011 7:58:09 PM