cleanPages Extension - an arc90 Readability conversion
Wednesday, January 19, 2011 11:38:59 AM
For discussions about cleanPages v 1.5 please visit the new service page
cleanPages is NOT an adblocker or scriptblocker; it cleans pages for reading or printing after they have been loaded.
Version: 1.0.2
Download from the addons page: cleanPages
Supported Languages: English, French, German, Italian, Polish, Portuguese, Russian, Turkish and Swedish.
Latest test version:
1.5.54
Warning: May be unstable!
Date: 2012-07-10 17:23 GMT+2
Download from my private server: cleanPages.oex
A warning will be displayed, you'll have to trust me

Snapshot users:
Please add http://quhno.internetstrahlen.de to the trusted repositories.
Menu -> Settings -> Preferences -> Advanced -> Security -> Trusted Web sites (Trusted Repositories Tab)
Known issue:
In the latest Opera snapshots the settings are sometimes lost after an Opera restart. Not my fault, other extensions suffer from the same issue, it is an Opera bug (CORE-47777).
Changed in the Alpha (latest changes on top):
- kill setInterval() for merged pages too
- prevent set click event listeners on body or documentElement from bleeding into the cleaned page
- removed independent setting for line-height because of DSK-344053
- kill scripts that are started by setInterval() too
- cleaned up settings page
- Some minor fixes for hidden contents - now they should stay hidden
- Fix for headlines: Big justified text looked ugly, changed to pure right or left align. Known problem: Wrong align on RTL (Arabic or Hebrew) headlines and text align justify or center, please select text-align right when reading these.
- Fix for content images when screen is very narrow. restricted maximum width to text width.
- Experimental change of the content image detection heuristics
- Deleting non displayed content, i.e. content with display:none, visibility:hidden, opacity:0
- Added Italian translation
- "edit" links in most Wikis will be removed
- Added Turkish translation
- Switched off pagination for sciencesetavenir.fr
- Squashed bug where too long entries in the color input fields failed to update the preview
- Prepared internationalization of the preferences page
- Added Russian help
- Added help localization structure and German help page
- Extension's button can be hidden (see help page)
- the 3 tool buttons top left in the page are now hidden by default.
- Added Help page
- Changed the way the CSS is applied, should be more robust now.
- Extension resets itself to default values after a fresh install.
- added setting for uncolorized black background around the cleaned text.
- added some options to the preferences page: show images, show vimeo and youtube videos, merge paginated pages
- Workaround for Opera CORE-23171
- Options styled and some minor changes
- Mouse gesture support - (see help file)
- Faster reload
- Added Ctrl+Shift+R as keyboard shortcut to start the extension and to reload the original page
- New icon
- Extended font support. Detects installed fonts from a list of 509 of the most common installed fonts on your computer.
- Added autoscrolling feature, watch the upper right corner.
- New settings page
- Bugfix: Additional footnote anchors in text if the button was clicked more than once. (see details)
- Improved the next page detection but there are still some quirks left (at least I hope it didn't break it too much)
- Bugfix: elements styled by <u><b> were removed including their contents.
- Improved duplicate pages detection on multi page articles - should work now correctly with my.opera blog articles with more than one comment page too.
- changed width setting to fixed values plus a percentage maximum width, to avoid horizontal scroll bars when the width was set too wide and the window width is changed afterwards.
- experimental fix for H2s abused as intro
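The changelog entry about click listeners on body or documentElement does not say how cleanPages finds them, and the DOM offers no way to list listeners after they were added. A minimal sketch of one possible approach, assuming a script gets the chance to wrap `addEventListener` before the page registers its handlers; `trackClickListeners` is a hypothetical helper, not a function from the extension:

```javascript
// Hypothetical helper (not from cleanPages): records click listeners
// added to `target` so that they can be removed again later.
function trackClickListeners(target) {
  var originalAdd = target.addEventListener;
  var recorded = [];

  // Wrap addEventListener: remember click listeners, pass everything through.
  target.addEventListener = function (type, listener, options) {
    if (type === 'click') {
      recorded.push({ listener: listener, options: options });
    }
    return originalAdd.call(target, type, listener, options);
  };

  // Returns a function that removes every recorded click listener
  // and reports how many were removed.
  return function removeRecorded() {
    var count = recorded.length;
    recorded.forEach(function (entry) {
      target.removeEventListener('click', entry.listener, entry.options);
    });
    recorded.length = 0;
    return count;
  };
}
```

In a page this would be called once each for `document.body` and `document.documentElement`, and the returned function would be run right before cleaning.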
Known issues of the Alpha
line height changes are only applied after window size change or setting of font size or spacing in 11.50+. Not my fault, reported as Opera Bug DSK-344053No internationalization for the help page- Opera 12 sometimes doesn't want to clean the page again on pressing the extension's button after a previous cleaning attempt on the same page. They broke it. Use CTRL+SHIFT+R instead, that works reliably.
- Several more issues
Please post major errors you encounter in the basic functionality here in the blog comments. Thank You!
Usage
If the extension's button is active, you can click on it to change the layout of the active tab's content - or you can select some text (300+ characters) and click the button to make that text readable. If you selected too little text, cleanPages switches back to the default mode and tries to find the relevant content on its own.
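The choice between the two modes can be sketched as follows. This is only an illustration: the function name and the return values are made up here, and the threshold is taken from the "300+ characters" above.

```javascript
// Assumed constant, taken from the "300+ characters" in the text.
var MIN_SELECTION_LENGTH = 300;

// Hypothetical helper: returns 'selection' when enough text is selected,
// otherwise 'auto' (cleanPages looks for the content on its own).
function chooseCleaningMode(selectedText) {
  var text = selectedText || '';
  return text.length >= MIN_SELECTION_LENGTH ? 'selection' : 'auto';
}
```

In a browser the argument could come from `String(window.getSelection())`.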
cleanPages shows 3 buttons on the cleaned webpage:
- Reload: It has basically the same behavior as the normal reload button in the browser. It is a true reload except when used on frame sites, then the history is used to go back to the same subframes as before. (Read about History Navigation Mode quirks below)
- Print: Opens the Print dialog to print the cleaned page. Text will be black, backgrounds will be white, the buttons will not be printed.
- Email: Opens the default email client on your system with the page's URL as body text. Feel free to edit subject and body text to something more meaningful than the included default text.
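What the Email button does can be approximated with a plain mailto link. A minimal sketch, assuming the subject is passed in by the caller; `buildMailtoLink` is a hypothetical name and not the extension's actual code:

```javascript
// Hypothetical helper: builds a mailto: link with the page URL as body text.
// Both parts are percent-encoded so that characters like ? and & survive.
function buildMailtoLink(pageUrl, subject) {
  return 'mailto:?subject=' + encodeURIComponent(subject) +
         '&body=' + encodeURIComponent(pageUrl);
}
```

Assigning the result to `window.location.href` hands it to the default email client.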
Preferences | Options
To set up the extension's preferences:
- Right-click the button of the extension
- Choose "Preferences"
- On the preferences page, change the settings in each column at least once and tick or untick the "... footnotes" checkbox. This makes the settings permanent as long as the extension is installed (only necessary after a new install, later you can change each setting individually).
You can see a preview with sample text in the "Example" box below the settings. The settings can be changed again any time later by re-opening the "Preferences".
Supported Languages
cleanPages comes in:
English, French, German, Italian, Polish, Portuguese, Russian, Turkish and Swedish.
The language is set according to your browser language settings and defaults to English for languages not yet supported. The functionality of cleanPages is independent of the language; one of my test users reported that it works just fine on Japanese pages.
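The fallback to English can be sketched like this. The two-letter codes are the ISO 639-1 codes of the languages listed above; the function name and the way the browser language is passed in are assumptions for illustration only.

```javascript
// ISO 639-1 codes for: English, French, German, Italian, Polish,
// Portuguese, Russian, Turkish, Swedish.
var SUPPORTED_LANGUAGES = ['en', 'fr', 'de', 'it', 'pl', 'pt', 'ru', 'tr', 'sv'];

// Hypothetical helper: maps a browser language tag such as "de-AT"
// to a supported language, defaulting to English.
function pickLanguage(browserLanguage) {
  var primary = String(browserLanguage || '').toLowerCase().split(/[-_]/)[0];
  return SUPPORTED_LANGUAGES.indexOf(primary) !== -1 ? primary : 'en';
}
```

A typical call would be `pickLanguage(navigator.language)`.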
Please send me a personal message or leave a comment here, if you can and want to translate it into your language.
Changes to the Original Readability™
- Added multilingual preferences and user dialogs.
- Fixed some frame issues. Overwriting or replacing the body of the top document in a frameset is not allowed in Opera because of security restrictions.
- Removed included Typekit fonts. I have no license to use them and I don't intend to buy one.
- Removed original JS smooth scrolling. Opera's built-in scrolling is good enough. Use [space] to scroll down a page and [shift]+[space] to scroll up a page.
- Reactivated the Terminal style.
- Removed the Athelas style.
- Improved the font stacks for cross system use.
- Removed bad browser sniffing because Opera can mask as IE. That wouldn't have worked out.
- Removed or replaced Firefox-only code. (read: Firefox-only bug workarounds for not following the W3C specifications)
Various other fixes, see source code of the included script. All changes are marked with /*q ... */
Known Issues
The description is not multilingual. Not my fault, kick Opera for that, especially the person who wrote the parser that checks the config.xml during the publishing process for validity. It doesn't even respect their own specifications.

cleanPages, like the original "Arc90 Readability™" bookmarklet, does not work well with:
- Start pages of a website. Navigate to an article page before you use the extension. I will not change that, my version of cleanPages should stay a small extension with a low system impact. If you think otherwise: feel free to edit it, it is licensed under Apache 2.0
- Pages with not enough text to analyze. Not possible. No way.
- Pages with crappy markup. It will do the best it can.
- Some kinds of frameset pages. However frame pages without forced frame reload should work fine.
- Pages that are reloaded with User Prefs|History Navigation Mode set to "Auto" (1, default) or "Fast" (3). It works better when set to "Compatible" (2).
To switch between all 3 settings you can use this button:
History Navigation Mode
Further known issues: I hope not

cleanPages comes AS IS, meaning:
I won't fix mistakes that other people made on their websites. If it works, it works. If not and if it is my fault, leave a comment below.
If you find any real bugs, please post them in the comments, too.
If it destroys your hard-disc and melts your processor: Buy a new computer

Legal Stuff
"Readability™" is a Trademark of Arc90, http://arc90.com
Permission to use the code was granted by license and email.
Outdated.
For discussions about cleanPages v 1.5 please visit the new service page







Unrealmirakulix # Saturday, June 23, 2012 2:56:54 PM
Unrealmirakulix # Sunday, June 24, 2012 1:00:28 PM
When the browser is maximized this may not find much use, but on a laptop with Opera and e.g. Word side by side, very
QuHno # Sunday, June 24, 2012 2:02:33 PM
Unrealmirakulix # Sunday, June 24, 2012 2:29:59 PM
After all, it's about the end result that is visible. What the code does is secondary.
QuHno # Sunday, June 24, 2012 2:56:12 PM
- maximum width of content images restricted to width of text
- some minor bugfixes
Unrealmirakulix # Sunday, June 24, 2012 3:03:21 PM
QuHno # Sunday, June 24, 2012 6:24:23 PM
- headline align fix. Changed to pure right or left align because justify looked ugly. Small problems with RTL texts, please select text align right when reading these.
- fix for some hidden content that showed up because the to-kill element was changed to a normal paragraph before that kind of content was removed. It should stay hidden now.
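The changelog names three CSS conditions under which content counts as not displayed: display:none, visibility:hidden and opacity:0. A minimal sketch of such a check, assuming it is fed a computed style object; `isHidden` is a hypothetical name and the real detection in the extension may look different:

```javascript
// Hypothetical helper: decides from a (computed) style object whether
// an element is not displayed according to the three conditions above.
function isHidden(style) {
  return style.display === 'none' ||
         style.visibility === 'hidden' ||
         parseFloat(style.opacity) === 0;
}
```

In a page the argument would be `window.getComputedStyle(element)`.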
Unrealmirakulix # Sunday, June 24, 2012 7:23:41 PM
TiRANiD (TiRaNiD) # Sunday, June 24, 2012 7:27:50 PM
Sometimes if I don't clean the cache I can't get the newest version 'cause Opera takes the file from the cache.
One more thing, do you have a touch pad?
QuHno # Sunday, June 24, 2012 7:39:08 PM
Originally posted by TiRaNiD:
Can't help you there, that's Opera's sometimes crazy caching behavior. Maybe if you open a private tab and copy the DL-URL there?
Originally posted by TiRaNiD:
Touchpad like the ones on notebooks? Yes, as soon as I can pry the notebook from my wife.
Why?
Unrealmirakulix # Sunday, June 24, 2012 7:55:42 PM
QuHno # Sunday, June 24, 2012 8:02:56 PM
TiRANiD (TiRaNiD) # Sunday, June 24, 2012 8:03:59 PM
Originally posted by QuHno:
Just wanted to share some interesting info I discovered.
I have Sony Vaio with a touch pad (Synaptics), and I noticed that 'swipe right' is equivalent to ALT + RIGHT ARROW, and 'swipe left' - to ALT + LEFT ARROW.
So I became interested in how I can use it, and I came up with the idea that I can launch cleanPages or save pages in TabVault by binding their 'launch gestures' to the keyboard shortcuts that have 'alt+right/left arrow' as a part of a shortcut.
Long story short now I can save pages in TabVault by pressing 'ctrl' and making 'swipe right' gesture on a touch pad, and launch cleanPages by pressing 'ctrl' and making 'swipe left' gesture.
Also now I can launch the Magic Wand by pressing 'shift' and making 'swipe right' gesture, and so on, and so on.
Just wanted to share this information in case you and maybe someone else don't know about it. This proves to be pretty convenient and, you know, kind of cool to see the swipe gestures do all this stuff.
There are more actions that you can bind to these keyboard shortcuts with 'alt+right/left arrow', until you run out of modifiers (ctrl, shift, ctrl+shift), of course. Have fun!
Unrealmirakulix # Sunday, June 24, 2012 8:07:57 PM
Originally posted by QuHno:
strange... Restart of Opera fixed it
Ruzzz # Wednesday, June 27, 2012 5:05:59 AM
QuHno # Wednesday, June 27, 2012 5:59:12 AM
Maybe a normal button made from the bookmark and put on a toolbar of your liking would be a better solution?
Save generated <- drag and drop this Button
Maybe you need to change the length of the delay before the Save dialog opens; that depends on the speed of the computer and the complexity of the page.
Buttonator code for easier button code changes
btw: There seems to be a small problem when using this button and saving as MHT but it seems to work fine when saving as HTML or HTML with images.
I personally use this solution for saving the page in its actual state but some people have problems with that too.
Unrealmirakulix # Wednesday, June 27, 2012 9:07:27 AM
tested a few times (Opera restart (cleared cache); Win restart) @ W7x64 Opera 12 final x64
QuHno # Wednesday, June 27, 2012 10:59:44 AM
I activated videos and images, cleaned the page and right clicked on the flash - and the flash settings dialog opened. Seems to be OK ...
Unrealmirakulix # Wednesday, June 27, 2012 11:20:51 AM
QuHno # Wednesday, June 27, 2012 3:07:45 PM
Unrealmirakulix # Wednesday, June 27, 2012 3:33:56 PM
Might have to do a clean install yet again ;(
QuHno # Wednesday, June 27, 2012 6:12:25 PM
Unrealmirakulix # Wednesday, June 27, 2012 7:30:16 PM
QuHno # Saturday, July 7, 2012 10:04:22 AM
Changes:
- I had to remove the independent setting for line-height and set it following the "golden ratio" depending on font-size and content width instead, meaning:
The line height will increase with the font-size and with the width of the content following the rules defined in this article on pearsonified.com.
test case
Specification violation
What was the reason again, why we should file bugs to the bug tracker?
- killing scripts that are started by setInterval()
This is often used by tracking scripts and couldn't be killed the usual way when cleaning because it is not visible in the DOM, so some pages constantly loaded and sent content even after the DOM was cleaned.
- some changes in the settings page
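The line-height rule from the first item of this list can be sketched with the relation usually quoted for "Golden Ratio Typography". The formula below is that commonly cited form; whether cleanPages computes it exactly like this is an assumption, and `goldenLineHeight` is a made-up name.

```javascript
// The golden ratio, about 1.618.
var PHI = (1 + Math.sqrt(5)) / 2;

// Hypothetical helper: line height in px for a given font size and
// content width (both in px). Assumed formula:
//   h = f * (PHI - 1/(2*PHI) * (1 - w / (f*PHI)^2))
// At the "optimal" width w = (f*PHI)^2 this reduces to h = f * PHI.
function goldenLineHeight(fontSizePx, contentWidthPx) {
  var optimalWidth = Math.pow(fontSizePx * PHI, 2);
  var ratio = PHI - (1 / (2 * PHI)) * (1 - contentWidthPx / optimalWidth);
  return fontSizePx * ratio;
}
```

With this form the line height grows both with the font size and with the content width, which matches the behavior described above.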
Known issue with Opera 12.xx:
- Clicking the extension's button works only twice for the same tab, this got even worse in 12.50. Reason: Opera forgets that there is a tab as soon as an injected script reloads the whole page.
Not my fault, blame Opera.
clicktest.oex <- Demo extension, works in Opera 12 only. No real function besides showing the bug.
Solution for cleanPages:
Use the Keyboard shortcut Ctrl+Shift+R or the mouse gesture instead (See included help page for a how-to on setting up mouse gesture support for the extension).
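For illustration, detecting the Ctrl+Shift+R combination in a keydown handler could look like the sketch below. The function name is hypothetical and the check against Alt is an added assumption; 82 is the keyCode of the R key.

```javascript
// Hypothetical helper: true if a keydown event is Ctrl+Shift+R.
function isCleanPagesShortcut(event) {
  return event.ctrlKey === true &&
         event.shiftKey === true &&
         !event.altKey &&          // assumption: Alt must not be pressed
         event.keyCode === 82;     // keyCode of "R"
}
```

It would be used inside a listener such as `document.addEventListener('keydown', function (e) { if (isCleanPagesShortcut(e)) { /* start cleaning */ } }, false);`.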
QuHno # Tuesday, July 10, 2012 3:29:02 PM
Changes:
- delete click events that were added to document.body or document.documentElement to avoid those pesky "click anywhere outside of the content to open ADVERTISEMENT!" scripts from bleeding into the cleaned content.
- moved the timeout killing routine to the general script killing so that it is called for merged pages too.
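A common way to kill timers that were started by page scripts is sketched below. It relies on the assumption that the browser hands out timer IDs as increasing positive integers, which is what browsers do in practice; the function name is made up and the routine in the extension may differ.

```javascript
// Hypothetical helper: clears every interval with an ID up to the one
// that was just handed out. `win` is the window object (or anything with
// compatible setInterval/clearInterval functions).
function killAllIntervals(win) {
  // Create a dummy interval just to learn the highest ID in use.
  var highestId = win.setInterval(function () {}, 60000);
  for (var id = highestId; id > 0; id--) {
    win.clearInterval(id);
  }
  return highestId;
}
```

Calling `killAllIntervals(window)` once per loaded page, merged pages included, would cover the case described above.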
Unrealmirakulix # Tuesday, July 10, 2012 3:36:01 PM
QuHno # Tuesday, July 10, 2012 3:49:29 PM
... at least not without increasing the scrolling speed to more than 10px per second. It is less jumpy with 20px/s but then it scrolls too fast for reading if you don't set the width to a ridiculously small value.
Unrealmirakulix # Tuesday, July 10, 2012 6:05:08 PM
QuHno # Wednesday, July 11, 2012 10:51:53 AM
javascript:(function(){var d=0;setInterval(function(){d=d+1;document.body.style.OTransitionDuration='0.1s';document.body.style.OTransform='translate(0,'+d+'px)';},100)})();
You can change the d+1 to a higher value for bigger steps, but it only gets worse despite using a CSS3 transformation to achieve the scrolling effect.
Yes, I know that the scrollbars don't change their position and that the only way to stop scrolling is to reload the page or to close the tab. It is just meant as a simple example.
As far as I can see there is no decent way to build a jump or flicker free autoscroller with JS, the way I did it in the extension seems to be the best so far.
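The relation between scrolling speed and step size discussed above can be made explicit with a small helper that converts elapsed time into whole pixels and carries the fraction over to the next tick. This is only a sketch with a made-up name, not the code of the extension, and it does not remove the jumpiness: at 10px/s the page still moves in one-pixel jumps every 100ms.

```javascript
// Hypothetical helper: returns a function that converts elapsed
// milliseconds into whole pixels to scroll, keeping the fractional
// remainder for the next call so that no distance is lost.
function createScrollStepper(pixelsPerSecond) {
  var remainder = 0;
  return function step(elapsedMs) {
    var exact = remainder + pixelsPerSecond * elapsedMs / 1000;
    var whole = Math.floor(exact);
    remainder = exact - whole;
    return whole;
  };
}
```

Inside a timer callback the result could be passed to `window.scrollBy(0, pixels)`.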
s33s # Sunday, July 15, 2012 10:05:48 PM
For this article, cleanPages does not produce a readable page: the latest version shows only the first two paragraphs, earlier versions show the first paragraph repeatedly.
however the official bookmarklet from readability can do it, and I noticed that they preserve author and date now.
QuHno # Monday, July 16, 2012 5:19:53 AM
Basically the markup of the content is the markup of an image gallery with extra long comments and no real article (meant HTML semantically). For example the big red "headlines" in the "article" are no headlines at all, the main content is packed into LI elements, there are several wild grid layout DIVs etc.
Additionally The Verge optimized the pages for use with Instapaper. Instapaper and Readability work together, and definitely use the extra markup too - but that might actually be quite good, if I can manage to utilize that by myself.
Some words about comparisons between the bookmarklet and cleanPages:
Readability's bookmarklet doesn't do anything apart from preparing the page for uploading it to Readability's server if they don't have a cached version already. It is all computed on the server, which has far more possibilities to clean up a page than a JavaScript-based standalone extension. On the server they have a database with common templates of the top few hundred article pages and content management systems to get a clue where the content might hide. None of that can be mimicked by an extension if that extension is to have a finite response time and not slow the browser down to a crawl.
s33s # Monday, July 16, 2012 2:05:18 PM
QuHno # Monday, July 16, 2012 5:54:51 PM
I wish the people of Evernote would convert Clearly to Opera - the content detection would work, there is even code for Opera in the base detection script, and it is local cleaning too. Their content detection mechanism follows another way and it is a little bit more flexible with web pages like The Verge. They'd just need to adapt the button code and the options page code for Opera, the rest would work out-of-the box. Too sad that it is not PD or CC ...
Unrealmirakulix # Monday, July 23, 2012 9:47:43 AM
site end seems to be detected incorrectly
or just
big empty box at the end after complete content area
QuHno # Monday, July 23, 2012 10:58:01 AM
I am thinking about using this kind of content instead:
http://www.zeit.de/kultur/musik/2012-07/nikitin-bayreuth-kommentar/komplettansicht?print=true
... but I can't do it yet - there is a big
As soon as the content of a tab is reloaded completely from within an extension, the extension button stops working.
I hate it when they break things that used to work fine in previous versions.
btw: Another Opera user wrote a nice UserJS called GotoPrintURL - may be I'll ask her if I may integrate it, in the meantime you can use it as stand alone.
Hint: Look into the sourcecode
Unrealmirakulix # Monday, July 23, 2012 1:28:10 PM
Licaon # Saturday, July 28, 2012 8:37:58 PM
Enabling "Autoplay: Automatically dim the background when you click the play button and versus. ( HTML5 & YouTube video)" will mess the Clean Pages ( latest: http://my.opera.com/QuHno/blog/cleanpages-extension-an-arc90-readability-conversion or stable: https://addons.opera.com/en/extensions/details/cleanpages/ ) output.
Eg:
Normal Page: http://arstechnica.com/information-technology/2010/12/dragon-dictate-20-for-mac-the-ars-review/2/
CleanPage+AutoplayDimDISABLED: http://i.imgur.com/PtZPJ.jpg
CleanPage+AutoplayDimENABLED: http://i.imgur.com/RdUa2.jpg
Screenshots shot on latest Opera beta ( this happened for a long time so it's not a beta issue AFAICS ) with only 2 active Extensions and without any user JS scripts.
Author of TurnOffThelights said: "I received the same issue here, with this log: [28/07/2012 11:19:42] CSS - 'utf-8';/* Document */body{font-size:10px;line-height:1;}#readTools a,a.rdbTK-powered span{background-color:transparent!important;background-.....
The autoplay feature only search for html5 or youtube video. (to detect play or pause status). No CSS will be added."
Any ideas? 10q
QuHno # Tuesday, July 31, 2012 10:10:55 AM
The main problem is that cleanPages works destructively on the content, as it works in a very similar way to the original Readability script.
I assume that the author of Turn Off the Lights directly overwrites the CSS with JS (he must do some overwriting or adding of content because otherwise he couldn't dim the background).
I'll look into it as soon as I have some time to spare (heavy work load at the moment) but I can't promise a solution.
edit: I just read you posted a bug report to Stefan on the Google page? If yes: Which bug ID?
edit 2: Ouch! The whole runtime environment of Opera runs into problems, as you can see in this screenshot.
I fear that this could be an Opera problem (aka
edit 3: I think I have found what causes it in light.js of the TOtL extension:
document.addEventListener("DOMNodeInserted", injectcode, false);
function injectcode(){
  var messagediv = $('ytCinemaMessage');
  if(messagediv) {} else {
    // injected code messaging
    var bodyinject = document.getElementsByTagName("body")[0], message = document.createElement("div");
    message.setAttribute("id", "ytCinemaMessage");
    message.style.display = "none";
    bodyinject.appendChild(message);
    $(message.id).attachEvent(message.id, function () {
      var eventData = $(message.id).innerText;
      trigger(eventData);
    });
  }
}
This means that whenever something is added to the page, TOtL tries to insert a DIV element if it is set to autoplay. My extension adds and removes elements, so the event fires over and over again (nothing I can do about that). Each time the DIV is inserted by TOtL, cleanPages removes it because it doesn't belong to the main content and - you get it - TOtL re-inserts it.
Conclusion:
Classic feedback loop running wild.
Solution:
Switch off autoplay in TOtL when using CP
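For illustration only: one conceivable way to break such a loop on the inserting side is to inject the element a single time and ignore all later events. This is a generic sketch with a made-up name, not necessarily how either extension handles it, and it has the drawback that the element stays gone once something else removes it.

```javascript
// Hypothetical helper: wraps an inject function so that it runs only
// on the first event; later events are ignored.
function makeOneShotInjector(inject) {
  var done = false;
  return function handler() {
    if (done) {
      return false; // already injected once, do not re-insert
    }
    done = true;
    inject();
    return true;
  };
}
```

The returned handler would be registered in place of the original callback, for example `document.addEventListener("DOMNodeInserted", makeOneShotInjector(injectcode), false);`.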
Stefan Van Damme (stefanvd) # Wednesday, August 1, 2012 9:52:17 AM
I created a public issue report:
http://code.google.com/p/turnoffthelights/issues/detail?id=168
Thanks for checking the possible error. I will try to rewrite that "AutoPlay" feature. Just star that issue to be up to date about this issue.
Kind Regards,
Stefan
I need a name (quangltm) # Monday, September 17, 2012 1:37:10 PM
QuHno # Monday, September 17, 2012 8:41:04 PM
BTW:
Last call.
Unless there is a severe bug in 1.5.54, I plan to submit that version to the extensions catalog, so that I can start scripting for version 2.
@Stefan Van Damme:
I saw that you recently changed your extension, and yours and mine now work just fine together. I couldn't have done it, although it might have been possible to change my extension.
Thank you for resolving the problem.
Unregistered user # Tuesday, September 18, 2012 2:12:46 PM
QuHno # Tuesday, September 18, 2012 4:10:26 PM
I already pestered some Opera devs about the MHT file "issue" - but no chance that the security restrictions applied to them will be loosened in the near future.
I need a name (quangltm) # Wednesday, September 19, 2012 12:24:57 PM
Day - night mode:
6am-6pm: I want black texts with white background (can be custom by users)
6pm-6am: I want white texts with black background (can be custom by users)
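The requested switch could be sketched as below. The names are made up, and treating 6:00 as day and 18:00 as night is an assumption about the boundaries; the start hours are parameters because the request asks for user-customizable times.

```javascript
// Hypothetical helper: picks a color scheme from the hour of the day (0-23).
// Day runs from dayStart (inclusive) to nightStart (exclusive).
function pickColorScheme(hour, dayStart, nightStart) {
  dayStart = dayStart === undefined ? 6 : dayStart;
  nightStart = nightStart === undefined ? 18 : nightStart;
  var isDay = hour >= dayStart && hour < nightStart;
  return isDay
    ? { text: 'black', background: 'white' }
    : { text: 'white', background: 'black' };
}
```

A call like `pickColorScheme(new Date().getHours())` would select the scheme for the current local time.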
QuHno # Wednesday, September 19, 2012 2:07:56 PM
QuHno # Wednesday, September 19, 2012 4:05:27 PM
cleanPages v. 1.5.54 submitted to the addons page and awaiting the review.
I need a name (quangltm) # Thursday, September 20, 2012 4:48:48 AM
People can select the 2 modes manually through the Template tab (slide tabs)
(Red text for red areas)
QuHno # Thursday, September 20, 2012 12:40:52 PM
Yes, I'd really like to put all that stuff directly on the web page, but as I wrote before, I am not a good JavaScript programmer, so it will take me a long time to find out how to do that.
The top thing on the to-do is now to find a way, how I can leave the original page untouched and just put the cleaned content into a kind of overlay (I still don't understand how to do that correctly) then to find a way how to send changed settings to the background script for storage and last, but not least, how to load the settings again as soon as the overlay with the cleaned page content blends in ...
If I manage to do that, putting the settings into a side or top bar would be way easier.
Despite having a wide screen monitor I would prefer a top bar btw, because with little changes the extension works on mobile phones too and the average mobile phone has a very limited space for putting it to the side ...
QuHno # Thursday, September 20, 2012 12:51:55 PM
cleanPages v. 1.5.54 is approved and published in Opera Add-ons (https://addons.opera.com/extensions/details/cleanpages/).
The new service page is here.
Help needed: I am looking for translators for the incomplete translations and additional languages