cleanPages Extension - an arc90 Readability conversion
Wednesday, January 19, 2011 11:38:59 AM
For discussions about cleanPages v 1.5 please visit the new service page
cleanPages is NOT an adblocker or scriptblocker; it cleans pages for reading or printing after they have been loaded.
Version: 1.0.2
Download from the addons page: cleanPages
Supported Languages: English, French, German, Italian, Polish, Portuguese, Russian, Turkish and Swedish.
Latest test version:
1.5.54
Warning: May be unstable!
Date: 2012-07-10 17:23 GMT+2
Download from my private server: cleanPages.oex
A warning will be displayed; you'll have to trust me.

Snapshot users:
Please add http://quhno.internetstrahlen.de to the trusted repositories.
Menu -> Settings -> Preferences -> Advanced -> Security -> Trusted Web sites (Trusted Repositories Tab)
Known issue:
In the latest Opera snapshots the settings are sometimes lost after an Opera restart. Not my fault: other extensions suffer from the same issue, it is an Opera bug (CORE-47777).
Changed in the Alpha (latest changes on top):
- kill setInterval() for merged pages too
- prevent click event listeners set on body or documentElement from bleeding into the cleaned page
- removed independent setting for line-height because of DSK-344053
- kill scripts that are started by setInterval() too
- cleaned up settings page
- Some minor fixes for hidden contents - now they should stay hidden
- Fix for headlines: Big justified text looked ugly, changed to pure right or left align. Known problem: Wrong alignment on RTL (Arabic or Hebrew) headlines when text-align is set to justify or center; please select text-align right when reading these.
- Fix for content images when the screen is very narrow: restricted maximum width to text width.
- Experimental change of the content image detection heuristics
- Deleting non displayed content, i.e. content with display:none, visibility:hidden, opacity:0
- Added Italian translation
- "edit" links in most Wikis will be removed
- Added Turkish translation
- Switched off pagination for sciencesetavenir.fr
- Squashed bug where too long entries in the color input fields failed to update the preview
- Prepared internationalization of the preferences page
- Added Russian help
- Added help localization structure and German help page
- Extension's button can be hidden (see help page)
- the 3 tool buttons top left in the page are now hidden by default.
- Added Help page
- Changed the way the CSS is applied, should be more robust now.
- Extension resets itself to default values after a fresh install.
- added setting for uncolorized black background around the cleaned text.
- added some options to the preferences page: show images, show vimeo and youtube videos, merge paginated pages
- Workaround for Opera CORE-23171
- Options styled and some minor changes
- Mouse gesture support - (see help file)
- Faster reload
- Added Ctrl+Shift+R as keyboard shortcut to start the extension and to reload the original page
- New icon
- Extended font support. Detects which fonts from a list of 509 of the most common fonts are installed on your computer.
- Added autoscrolling feature, watch the upper right corner.
- New settings page
- Bugfix: Additional footnote anchors in text if the button was clicked more than once. (see details)
- Improved the next page detection but there are still some quirks left (at least I hope I didn't break it too much)
- Bugfix: elements styled by <u><b> were removed including their contents.
- Improved duplicate pages detection on multi page articles - should work now correctly with my.opera blog articles with more than one comment page too.
- changed width setting to fixed values plus a percentage maximum width, to avoid horizontal scroll bars if the width is set too wide or the window width is changed afterwards.
- experimental fix for H2s abused as intro
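The "non displayed content" rule from the change list above (display:none, visibility:hidden, opacity:0) can be sketched as a small predicate. This is only an illustration, not the extension's actual code: the `StyleLike` interface and the function name are made up for this sketch, and in a browser the three values would typically be read from `getComputedStyle(element)`.

```typescript
// Hypothetical stand-in for the three computed-style properties named in the
// change list. Not taken from the cleanPages source.
interface StyleLike {
  display: string;
  visibility: string;
  opacity: string;
}

// Returns true if an element with this style would count as "non displayed"
// under the three conditions listed in the change log.
function isHiddenByStyle(style: StyleLike): boolean {
  return (
    style.display === "none" ||
    style.visibility === "hidden" ||
    // Computed opacity is a string such as "0" or "0.5"; an empty or
    // unparsable value yields NaN and is treated as visible.
    parseFloat(style.opacity) === 0
  );
}
```

Elements for which such a check returns true would be candidates for deletion before the readable page is built.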
Known issues of the Alpha
- Line height changes are only applied after a window size change or after setting the font size or spacing in 11.50+. Not my fault, reported as Opera bug DSK-344053
- No internationalization for the help page
- Opera 12 sometimes doesn't want to clean the page again on pressing the extension's button after a previous cleaning attempt on the same page. They broke it. Use CTRL+SHIFT+R instead, that works reliably.
- Several more issues
Please post major errors you encounter in the basic functionality here in the blog comments. Thank You!
Usage
If the extension's button is active, you can click on it to change the layout of the active tab's content - or you can select some text (300+ characters) and click the button to make that text readable. If you selected too little text, cleanPages switches back to the default mode and tries to find the relevant content on its own.
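The choice between the two modes described above can be sketched as follows. Only the 300-character threshold comes from the text; the names `MIN_SELECTION_LENGTH`, `CleanMode` and `chooseCleanMode` are hypothetical, and trimming whitespace before counting is an assumption of this sketch, not confirmed behavior of the extension.

```typescript
// Threshold taken from the usage description (300+ characters).
const MIN_SELECTION_LENGTH = 300;

// "selection": clean only the selected text.
// "auto": default mode, the extension looks for the relevant content itself.
type CleanMode = "selection" | "auto";

// Hypothetical helper: decide which mode to use for a given selection.
function chooseCleanMode(selectedText: string): CleanMode {
  // Assumption: surrounding whitespace does not count towards the threshold.
  const length = selectedText.trim().length;
  return length >= MIN_SELECTION_LENGTH ? "selection" : "auto";
}
```

With an empty selection the function returns "auto", which matches the plain button click described above.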
cleanPages shows 3 buttons on the cleaned webpage:
- Reload: It has basically the same behavior as the normal reload button in the browser. It is a true reload except when used on frame sites, then the history is used to go back to the same subframes as before. (Read about History Navigation Mode quirks below)
- Print: Opens the Print dialog to print the cleaned page. Text will be black, backgrounds will be white, the buttons will not be printed.
- Email: Opens the default email client on your system with the page's URL as body text. Feel free to edit subject and body text to something more meaningful than the included default text.
Preferences | Options
To set up the extension's preferences:
- Right-click the button of the extension
- Choose "Preferences"
- On the preferences page, change the settings in each column at least once and tick or untick the "... footnotes" checkbox. This makes the settings permanent as long as the extension is installed (only necessary after a new install, later you can change each setting individually).
You can see a preview with sample text in the "Example" box below the settings. The settings can be changed again any time later by re-opening the "Preferences".
Supported Languages
cleanPages comes in:
English, French, German, Italian, Polish, Portuguese, Russian, Turkish and Swedish.
The language is set according to your browser language settings and defaults to English for languages not yet supported. The functionality of cleanPages is independent of the page language: one of my test users reported that it works just fine on Japanese pages.
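A minimal sketch of such a language fallback, assuming the browser reports a language tag like "de" or "pt-BR". The list of languages is the one given above; the two-letter codes, the constant names and `pickLanguage` are assumptions for illustration and not taken from the extension.

```typescript
// Supported languages from the list above, written as assumed two-letter codes.
const SUPPORTED_LANGUAGES: string[] = ["en", "fr", "de", "it", "pl", "pt", "ru", "tr", "sv"];
const DEFAULT_LANGUAGE = "en";

// Hypothetical helper: map a browser language tag to a supported UI language.
function pickLanguage(browserLanguage: string): string {
  // Compare only the primary subtag: "pt-BR" or "de_AT" become "pt" / "de".
  const primary = browserLanguage.toLowerCase().split(/[-_]/)[0] || "";
  // Anything not in the list falls back to English.
  return SUPPORTED_LANGUAGES.indexOf(primary) !== -1 ? primary : DEFAULT_LANGUAGE;
}
```

A Japanese browser ("ja") would therefore get the English interface, while the cleaning itself is unaffected by this choice.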
Please send me a personal message or leave a comment here, if you can and want to translate it into your language.
Changes to the Original Readability™
- Added multilingual preferences and user dialogs.
- Fixed some frame issues. Overwriting or replacing the body of the top document in a frameset is not allowed in Opera because of security restrictions.
- Removed included Typekit fonts. I have no license to use them and I don't intend to buy one.
- Removed original JS smooth scrolling. Opera's built-in scrolling is good enough. Use [space] to scroll down a page and [shift]+[space] to scroll up a page.
- Reactivated the Terminal style.
- Removed the Athelas style.
- Improved the font stacks for cross system use.
- Removed bad browser sniffing because Opera can mask as IE. That wouldn't have worked out.
- Removed or replaced Firefox-only code. (read: Firefox-only bug workarounds for not following the W3C specifications)
Various other fixes, see source code of the included script. All changes are marked with /*q ... */
Known Issues
The description is not multilingual. Not my fault, kick Opera for that, especially the person who wrote the parser that checks the config.xml during the publishing process for validity. It doesn't even respect their own specifications.

cleanPages, like the original "Arc90 Readability™" bookmarklet, does not work well with:
- Start pages of a website. Navigate to an article page before you use the extension. I will not change that, my version of cleanPages should stay a small extension with a low system impact. If you think otherwise: feel free to edit it, it is licensed under Apache 2.0
- Pages with not enough text to analyze. Not possible. No way.
- Pages with crappy markup. It will do the best it can.
- Some kinds of frameset pages. However frame pages without forced frame reload should work fine.
- Pages that are reloaded with User Prefs|History Navigation Mode set to "Auto" (1, default) or "Fast" (3). It works better when set to "Compatible" (2).
To switch between all 3 settings you can use this button:
History Navigation Mode
Further known issues: I hope not

cleanPages comes AS IS, meaning:
I won't fix mistakes that other people made on their websites. If it works, it works. If not and if it is my fault, leave a comment below.
If you find any real bugs, please post them in the comments, too.
If it destroys your hard-disc and melts your processor: Buy a new computer

Legal Stuff
"Readability™" is a Trademark of Arc90, http://arc90.com
Permission to use the code was granted by license and email.
Outdated.
For discussions about cleanPages v 1.5 please visit the new service page







Unrealmirakulix # Tuesday, March 13, 2012 4:04:55 PM
seems to have problems with italic...
QuHno # Tuesday, March 13, 2012 4:20:19 PM
Unrealmirakulix # Tuesday, March 13, 2012 4:27:09 PM
Originally posted by QuHno:
ok, thx for the quick answer. -> Arial Unicode MS works perfect
PS: down arrow in options (font) doesn't work. up arrow working. catching slider via mouse is ok, too (hope you know what I mean...
QuHno # Tuesday, March 13, 2012 7:01:34 PM
Originally posted by Unrealmirakulix:
Frutiger too
Originally posted by Unrealmirakulix:
If not, feel free to write it in another language
Well, in the font selection I can race back and forth through all the fonts with Shift and the arrow keys (I should count how many it detected on my system), switch to the other settings with Tab, adjust everything there with the arrow keys, and so on ...
... so I don't know what you mean.
Unrealmirakulix # Tuesday, March 13, 2012 10:48:31 PM
Originally posted by QuHno:
I meant that scrolling works, and moving the "slider" works too, but only the up arrow is clickable. Very strange...
DitherDitherSky # Thursday, March 15, 2012 10:35:49 AM
Unregistered user # Monday, April 9, 2012 9:03:35 PM
TiRANiDTiRaNiD # Monday, April 9, 2012 9:53:57 PM
My post about it here:
http://my.opera.com/community/forums/topic.dml?id=1336932
TiRANiDTiRaNiD # Friday, April 13, 2012 8:16:54 PM
This interesting discovery was made by chocimir, and probably there can be a way of adding some code to CleanPages to make it process MHT pages without the additional actions that I found out about with the help of chocimir.
Have a look here - http://my.opera.com/community/forums/topic.dml?id=1336932&t=1334348096&page=1#comment12059052
QuHno # Friday, April 13, 2012 9:27:08 PM
Before hacking into MHT files (which is a little bit beyond my knowledge, I have no clue how to realize that) I'd prefer the following approach:
clean it first and then save the cleaned content as MHT or HTML or whatever for example with the button I've made.
QuHno # Friday, April 20, 2012 12:33:45 PM
Some minor bugfixes and a new icon.
PS: I still forgot to remove the buttons in the page and to set the footnotes to the full URL. Will fix it with the next version.
Unrealmirakulix # Friday, April 20, 2012 12:40:39 PM
Chocimierchocimir # Friday, April 20, 2012 2:39:42 PM
QuHno # Friday, April 20, 2012 3:21:30 PM
I only put it into the version for the upcoming Opera12 Mobile with Extension support and forgot to backport before packing it ...
QuHno # Friday, April 20, 2012 11:03:23 PM
document.body testing added
the display of the 3 tool buttons can be switched off now.
pure SVG files will not be parsed any more (I hope).
I hope I did not build in too many new errors, I am ill with fever and my brain is mush, sorry ...
Chocimierchocimir # Saturday, April 21, 2012 10:02:15 AM
Unrealmirakulix # Saturday, April 21, 2012 10:14:02 AM
QuHno # Sunday, April 22, 2012 12:27:07 AM
Because I am a bad patient who doesn't want to stay in bed when not tired, I fixed some more problems:
- Scrolling down in the font list by clicking on the down arrow button at the slider should work now.
- Full URLs (minus query and hash) in the footnotes
- The help section got some content
- The font detection had to be improved because there was an undocumented change in Opera 12's behavior, so some of the main fonts were not detected.
The new version is 1.5.26, download as usual from my server.
I still forgot something: The full addresses in the footnotes for printing ...
I think I should make a list with all reported wishes and errors
PS: does any of you own a Mac or know someone who can test the extension with an Opera 12 version for Mac? I would especially like to know if the font detection works.
TiRANiDTiRaNiD # Sunday, April 22, 2012 1:23:28 PM
Unrealmirakulix # Sunday, April 22, 2012 2:11:27 PM
Originally posted by TiRaNiD:
1.5.26 is online
QuHno # Sunday, April 22, 2012 2:59:12 PM
Originally posted by TiRaNiD:
I just checked if I uploaded the wrong version but I did not:
1.5.26 is the version on my server.
Maybe it is a caching issue or your provider's proxies did not yet propagate it?
QuHno # Tuesday, April 24, 2012 8:42:39 PM
Changes:
some clean up in fontdetect and on the options page.
removed font-spacing
made the width setting fixed width
changed the extensions icon again (some further changes needed for the small icon)
download as usual from my server.
Unrealmirakulix # Tuesday, April 24, 2012 9:10:57 PM
QuHno # Tuesday, April 24, 2012 9:59:39 PM
Some presets like in the old extension - or, if I find a good solution: Storing individual presets. (I think some people wished that)
... but I don't know if I manage to do that before my (internally) planned submission date to the extensions catalog ...
Saskatchewan # Wednesday, April 25, 2012 8:23:26 AM
And don't forget about a possibility to hide extension icon from the toolbar (as we've got keyboard shortcuts and mouse gestures support now).
Unrealmirakulix # Wednesday, April 25, 2012 8:36:57 AM
----------------------------------
Here on this page: Known issues: "Unser Prefs|History Navigation Mode" -> "User Prefs|History Navigation Mode"
-----------------------------------
Please include a description how to use the hotkey and gestures in the extension itself.
-----------------------------------
The reload button should perhaps be converted in a back button. This should make the function clearer for all users.
-----------------------------------
Why did you remove and ?
QuHno # Wednesday, April 25, 2012 3:05:59 PM
@Saskatchewan
1. I hope i've got this pesky checkbox issue solved for now.
2. hiding the icon: Don't know if that is really a good idea in the advent of mobile extensions or touch screens - they don't have keyboard or mouse gestures support (at least not in a way comparable to desktop)
@Unrealmirakulix
1. done
2. should no longer be an issue, regardless of the setting (but today's 12 is a little bit b0rked)
3. did you press the "?" button for help?
4. back should jump to the previous page, not refresh the page.
5. what do you mean by "and ?"
Unrealmirakulix # Wednesday, April 25, 2012 3:11:25 PM
Originally posted by QuHno:
2. small typo
3. ? -> thx
5. bold and underlined html
Thx for new version.
Saskatchewan # Wednesday, April 25, 2012 4:28:11 PM
Update: Well, the preview isn't updated with custom settings at all and that's the reason.
QuHno # Wednesday, April 25, 2012 7:17:17 PM
Checkboxes and preferences and preview combined is a real big mess. I had these problems before, and I still don't know exactly what causes this erratic behavior.
I'll look at it as soon as I am calmed down a little bit.
Unrealmirakulix # Wednesday, April 25, 2012 8:36:53 PM
Originally posted by QuHno:
watching Champions League? ^^
QuHno # Friday, April 27, 2012 9:49:05 AM
I hope I have got it now and it remembers the settings.
btw: I would prefer watching a league of champignons
Unrealmirakulix # Friday, April 27, 2012 10:10:45 AM
Originally posted by QuHno:
Been a bit nervous at that moment... ^^
1.5.29 installed -> settings stayed
PS: Is there no option to make the toolbar button disappear? We got a shortcut (ctrl + shift + r) and alternatively also gestures... I already have 8 buttons in my Opera and I think there'll be more of them in the future if many oex still do not have this option. Should be a default option for extensions by the way...
Saskatchewan # Friday, April 27, 2012 10:21:48 AM
QuHno # Friday, April 27, 2012 11:05:03 AM
@Saskatchewan: That is just a matter of CSS and some additional document.getElementById('foo'), you are right but the main thing for me was to get the functionality straight first. I got the right idea during work this morning and I did not have that much time to spare
Unrealmirakulix # Friday, April 27, 2012 12:34:57 PM
Originally posted by QuHno:
https://addons.opera.com/addons/extensions/details/in-place-translator/ added this option in the newest update...
My buttons: http://prntscr.com/8l5s4
QuHno # Friday, April 27, 2012 4:42:34 PM
Hide toolbar button.
Not in the normal options page but in the help, I wanted to make sure that nobody disables it without reading about the shortcut or how to set up the mouse gestures first
Yes, there was a 1.5.30 before, but I changed some more things - try to find out what. If you don't find it, everything is OK
Unrealmirakulix # Friday, April 27, 2012 4:54:45 PM
Originally posted by QuHno:
Thanks
-> In the settings, the preview (Lorem ipsum) sticks out a bit. Apparently there is no limit on the right...
-> Since the toolbar button can now be hidden, the buttons in the reading view should also include a button for the settings. Otherwise you now have to take a detour (manage oex -> ...)...
Saskatchewan # Friday, April 27, 2012 5:23:16 PM
Was the icon on the prefs page already on the left, or did it change in this version?
And just BTW: text align icons are in reversed order – align left icon should be on the left, justify on the right side.
Oh, and it looks like now there’s no padding on #readInner on cleaned pages.
TiRANiDTiRaNiD # Friday, April 27, 2012 5:41:59 PM
No matter what article is chosen, the article's text in cleanPages is always 'italics'.
Original - http://files.myopera.com/TiRaNiD/files/zdnet_original.png
Cleaned - http://files.myopera.com/TiRaNiD/files/cleanPages_italics.png
QuHno # Friday, April 27, 2012 5:43:44 PM
Why is it reversed order? Is there a rule for it or is it just common?
about the article: French people tend to speak EMphasized. In earlier versions I had EM and STRONG removal too, but several people complained about it, so what shall I do?
edit: ...and it is not my fault! In this case it comes from the error correction of the HTML5 parser. I think it gets something wrong with the order of the elements during the cleaning process and duplicates the em when I change some of the elements - a classic race condition and I fear the parser wins ...
Saskatchewan # Friday, April 27, 2012 6:08:22 PM
Originally posted by QuHno:
I guess, in every text editor I was using it was (counting from left): left, centre, right, justify. But I haven't ever used RTL designed ones.
QuHno # Friday, April 27, 2012 8:10:41 PM
order of the align settings changed, explicit right align added.
"italics" bug for zdnet.fr (and other pages) squashed. No more font decoration for anyone
... two more builds and I have reached 200
TiRANiDTiRaNiD # Friday, April 27, 2012 8:28:53 PM
Originally posted by QuHno:
Thanks!
QuHno # Friday, April 27, 2012 9:03:17 PM
Now I just need a tester with a Mac who can tell me how they call the Ctrl key (they have a different keyboard) or if the keyboard shortcut works at all. After that, provided that there are no more major bugs, I think I can submit the extension to the review for the extension's catalog.
Any volunteers who like to translate the help page?
TiRANiDTiRaNiD # Friday, April 27, 2012 9:27:46 PM
Originally posted by QuHno:
I can translate it in Russian. I'm a teacher by profession, and also a linguist. I can do it right.
QuHno # Friday, April 27, 2012 10:02:47 PM
I just have to test if the localization of the help page works the same way as for the other pages, do some code clean up (it is a horrible mess with tons of debug code at the moment) and then the translation party can start
TiRANiDTiRaNiD # Friday, April 27, 2012 10:21:29 PM
Chocimierchocimir # Saturday, April 28, 2012 9:05:32 AM
QuHno # Saturday, April 28, 2012 9:41:36 AM