How to make links from text in HTML files?

Tuesday, 6. June 2006, 18:28:45

Does anyone have a solution for turning web addresses that aren't already links into links?

This would be similar to the linkify-txt userjs found at userjs.org, but with one major difference: it has to avoid creating links inside existing links (the href attribute) and the like.

Optionally, it could create links only from text where the domain of the URL found is in a specified list of domains.
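
A minimal sketch of how the optional domain filter might work. The helper names hostOf and isAllowedDomain and the sample domain are assumptions for illustration only; they do not come from linkify-txt or any other existing script.

```javascript
// Extracts the host part of an absolute URL, or null if there is no scheme.
// Note: user:password@host forms are not handled in this sketch.
function hostOf(url) {
  var m = /^[a-z][a-z0-9+.\-]*:\/\/([^\/:?#]+)/i.exec(url);
  return m ? m[1].toLowerCase() : null;
}

// Returns true when the URL's host equals a listed domain or is a subdomain of it.
function isAllowedDomain(url, allowed) {
  var host = hostOf(url);
  if (!host) return false;
  for (var i = 0; i < allowed.length; i++) {
    var d = allowed[i].toLowerCase();
    // Compare against ".domain" so "evil-userjs.org" does not pass for "userjs.org".
    if (host === d || host.slice(-(d.length + 1)) === '.' + d) return true;
  }
  return false;
}
```

Bare "www." addresses would need a scheme prepended before being checked, since hostOf only understands absolute URLs.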

Wednesday, 7. June 2006, 19:41:52

scipio

Undutchable

Posts: 29747

Netherlands

Why don't you use Opera's Go To URL option? Select the url, right-click and click on that option.

Thursday, 8. June 2006, 03:09:13

Originally posted by scipio:

Why don't you use Opera's Go To URL option? Select the url, right-click and click on that option.



Well, it certainly works, but the reason I don't use that option is much the same as the reason HTML pages have links at all, rather than presenting URLs as plain text (e.g. without an A tag or similar): ease of use. :smile:

Certain forums (those not allowing HTML in messages), documents, etc. are full of such plain-text URLs, however, and it becomes rather tedious after the 3rd or 4th operation.

I guess the above would explain why.

Thursday, 15. June 2006, 00:06:45

I worked it out by borrowing a small amount of code from linkify-txt.js at userjs.org, and adding some of my own..

I'm a newbie with userjs, but this seems to work perfectly.

Thursday, 15. June 2006, 18:10:31

Could you please share it? I'm also interested in that kind of script...

Thursday, 15. June 2006, 18:51:36

Originally posted by j0sefK:

Could you please share it? I'm also interested in that kind of script...



I will share it when I have tested it some more so I can be somewhat sure about its reliability.

I made it yesterday, so it has only been tested in the most basic way yet.

Rgds

Monday, 19. June 2006, 22:59:30

Now you can find it at http://moropostnr.com/otherfiles/linkify-url1_00.zip

It seems UserJS.Org is having some problems, so it would probably take some time to get it released there.

Rgds

Tuesday, 20. June 2006, 01:39:44 (edited)

shoust

Operaised

Posts: 3008

United Kingdom

Nice! :smile:

EDIT: The script is messing up the above download link.

linkifyerror.png

Possible solution: strip the protocol from existing links with a regexp match, run the regexp match on the text, then re-add the protocol once linkify finishes.

EDIT2: Found a fix.

Replace this code
if(node.nodeType==1 && node.childNodes && node.tagName.toUpperCase()!='SCRIPT' && node.tagName.toUpperCase!='STYLE')


With the below:
if(node.nodeType==1 && node.childNodes && node.tagName.toUpperCase()!='SCRIPT' && node.tagName.toUpperCase()!='STYLE' && node.tagName.toUpperCase()!='A')
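
The same check could also be kept in one place as a small helper, which makes it easier to add further tags later. This is only a sketch: shouldDescend and SKIP_TAGS are made-up names, not identifiers from the script.

```javascript
// Tags whose contents should never be linkified (assumed list, per the fix above).
var SKIP_TAGS = { SCRIPT: true, STYLE: true, A: true };

// Decides whether a recursive walker should descend into a node's children.
function shouldDescend(node) {
  // nodeType 1 is an element node; text nodes (type 3) have no tagName.
  if (node.nodeType !== 1 || !node.childNodes) return false;
  return !SKIP_TAGS.hasOwnProperty(node.tagName.toUpperCase());
}
```

The if statement in the walker would then read if(shouldDescend(node)).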

Monday, 19. June 2006, 23:10:13

Works like a charm! Thanks!!! :hat:

Tuesday, 20. June 2006, 02:16:26

jebediah

Posts: 334

Nice script, haerdalis.

Suggestions for use in Opera 9: rather than walking the DOM use XPath to generate a collection of nodes to linkify and transform hd_zzzzzz_WalkNodes into a non-recursive function hd_zzzzzz_LinkifyNode. The script executes faster and the logic is simpler.

var elms = document.evaluate('//body//text()[not(ancestor::a)][not(ancestor::script)][not(ancestor::style)]', document, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
  for (var i = 0, elm; elm = elms.snapshotItem(i); i++) {
    hd_zzzzzz_LinkifyNode(elm);
  }


Change your event listener from

document.addEventListener('load',hd_zzzzzz_Linkify,false);


to

opera.addEventListener('AfterEvent.DOMContentLoaded', hd_zzzzzz_Linkify, false);


so that the script runs as soon as the DOM is loaded rather than waiting for the last image to load.

Tuesday, 20. June 2006, 02:44:30

Stoen

Posts: 1109

Originally posted by shoust:

The script is messing up the above download link.
linkifyerror.png
Possible solution: strip the protocol from existing links with a regexp match, run the regexp match on the text, then re-add the protocol once linkify finishes.



It's an Opera redraw error that only seems to occur when manipulating the text of anchor elements; I see it sometimes in UHB. A simple way to get around it is to flash the anchor's display style, e.g.
var disp = anchor.style.display;
anchor.style.display = 'none';
anchor.style.display = disp;

where anchor is the parentNode of the text node that is being manipulated
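
Wrapped up as a reusable helper, the workaround might look like the sketch below. The name flashDisplay is an assumption, and whether the two assignments really trigger a repaint depends on the Opera version; the function only packages the three lines above.

```javascript
// Toggles an element's inline display value off and back on.
// Works on any object that has a style.display property.
function flashDisplay(el) {
  var disp = el.style.display;   // remember the current inline value
  el.style.display = 'none';     // hide the element
  el.style.display = disp;       // restore it, which should cause a redraw
  return el;
}
```

It would be called as flashDisplay(textNode.parentNode) right after the text node has been changed.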

Tuesday, 20. June 2006, 03:49:39

Originally posted by j0sefK:

Works like a charm! Thanks!!! :hat:



Thank you :smile:

Tuesday, 20. June 2006, 03:57:03

Originally posted by jebediah:

Nice script, haerdalis.


Thank you.

Originally posted by jebediah:


Suggestions for use in Opera 9: rather than walking the DOM use XPath to generate a collection of nodes to linkify and transform hd_zzzzzz_WalkNodes into a non-recursive function hd_zzzzzz_LinkifyNode. The script executes faster and the logic is simpler.

var elms = document.evaluate('//body//text()[not(ancestor::a)][not(ancestor::script)][not(ancestor::style)]', document, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
  for (var i = 0, elm; elm = elms.snapshotItem(i); i++) {
    hd_zzzzzz_LinkifyNode(elm);
  }


Change your event listener from

document.addEventListener('load',hd_zzzzzz_Linkify,false);


to

opera.addEventListener('AfterEvent.DOMContentLoaded', hd_zzzzzz_Linkify, false);


so that the script runs as soon as the DOM is loaded rather than waiting for the last image to load.



Currently I use 8.54, and this doesn't work on that version (as I've experienced anyhow), but will do when 9 (not preview) is released.
I don't really have any experience with XPath.

By the way, does 8.54 support XPath?

Tuesday, 20. June 2006, 04:06:44

Tamil

Opera :-(|)

Posts: 110204

Heaven

@haerdalis, Thanks. :up:

Tuesday, 20. June 2006, 04:11:13

Originally posted by shoust:

Nice! :smile:
EDIT2: Found a fix.

Replace this code

if(node.nodeType==1 && node.childNodes && node.tagName.toUpperCase()!='SCRIPT' && node.tagName.toUpperCase!='STYLE')


With the below:
if(node.nodeType==1 && node.childNodes && node.tagName.toUpperCase()!='SCRIPT' && node.tagName.toUpperCase()!='STYLE' && node.tagName.toUpperCase()!='A')



No problem adding this. Haven't experienced any problems myself however.

Added this in the script. For basic operation one wouldn't need any operations on A elements anyway.

Tuesday, 20. June 2006, 04:49:56

jebediah

Posts: 334

Originally posted by haerdalis:

Currently I use 8.54, and this doesn't work on that version (as I've experienced anyhow), but will do when 9 (not preview) is released.



Both XPath support and the DOMContentLoaded event are new additions to Opera 9.

XPath, in particular, is really useful for condensing lines and lines of DOM navigation to a single statement. Two good resources are zvon's XPath Tutorial and a section from Mark Pilgrim's Dive into GreaseMonkey (in which he calls XPath's querying power, "The single most powerful tool in your Greasemonkey arsenal").

Tuesday, 20. June 2006, 10:49:03

Originally posted by jebediah:



Both XPath support and the DOMContentLoaded event are new additions to Opera 9.

XPath, in particular, is really useful for condensing lines and lines of DOM navigation to a single statement. Two good resources are zvon's XPath Tutorial and a section from Mark Pilgrim's Dive into GreaseMonkey (in which he calls XPath's querying power, "The single most powerful tool in your Greasemonkey arsenal").



This sounds good.
I guess I'll check out XPath..

Looks similar to something I used in a XSLT transform before by the way.

Tuesday, 20. June 2006, 22:55:36

jebediah

Posts: 334

An update on correct DOMContentLoaded usage from another thread.

You should not be using this "AfterEvent.DOMContentLoaded" construct anymore. Opera supports normal document.addEventListener("DOMContentLoaded") now.
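
With that, the registration shown earlier in the thread would presumably become something like the following. This is a sketch: linkifyPage is a placeholder for the script's real entry point (hd_zzzzzz_Linkify), and runWhenDomReady is a made-up wrapper name.

```javascript
// Placeholder for the script's entry point.
function linkifyPage() {
  // ... linkify the document here ...
}

// Registers the handler on any event target; document is the normal case.
function runWhenDomReady(target, handler) {
  target.addEventListener('DOMContentLoaded', handler, false);
}

// Only register when running inside a page.
if (typeof document !== 'undefined') {
  runWhenDomReady(document, linkifyPage);
}
```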

Wednesday, 21. June 2006, 02:49:23

Originally posted by Stoen:

Originally posted by shoust:

The script is messing up the above download link.
linkifyerror.png
Possible solution: strip the protocol from existing links with a regexp match, run the regexp match on the text, then re-add the protocol once linkify finishes.



It's an Opera redraw error that only seems to occur when manipulating the text of anchor elements; I see it sometimes in UHB. A simple way to get around it is to flash the anchor's display style, e.g.
var disp = anchor.style.display;
anchor.style.display = 'none';
anchor.style.display = disp;

where anchor is the parentNode of the text node that is being manipulated



This seems to work, and seems necessary in O9 for all newly created links.

In a file with large elements this would be quite noticeable, however.

I don't really know why O writes garbage. It still does in O9 otherwise. An unsolved bug?

It doesn't do it on all links, by the way: inside H3 elements it redraws correctly without any fix, but in CITE and other elements something goes wrong.

Wednesday, 21. June 2006, 03:06:55

Stoen

Posts: 1109

Originally posted by haerdalis:

In a file with large elements this would be quite noticeable however.
I don't really know why O writes garbage. It still does in O9 otherwise. An unsolved bug?



Another, better way is to do it just once on the body element if the script has created any links; this seems to work for me in UHB without any problems.

I never got around to reporting it as a bug, so it is quite probable Opera doesn't know about it.

Wednesday, 21. June 2006, 03:57:25

Originally posted by Stoen:

Originally posted by haerdalis:

In a file with large elements this would be quite noticeable however.
I don't really know why O writes garbage. It still does in O9 otherwise. An unsolved bug?



Another, better way is to do it just once on the body element if the script has created any links; this seems to work for me in UHB without any problems.



This might be a solution. Not tested with a large file/elements yet.


Originally posted by Stoen:


I never got around to reporting it as a bug, so it is quite probable Opera doesn't know about it.



It really should've been. I'm not quite sure where a bug report page is to be found myself. All I can see is 'report problem with site'.

Wednesday, 21. June 2006, 04:36:24

Stoen

Posts: 1109

Originally posted by haerdalis:

It really should've been. I'm not quite sure where a bug report page is to be found myself. All I can see is 'report problem with site'.



Heh, yeah I know

https://bugs.opera.com/wizard/

Wednesday, 21. June 2006, 15:07:48

Just posted a bugreport on the problem with the garbage on links.

Thursday, 22. June 2006, 09:13:15 (edited)

Stoen

Posts: 1109

Originally posted by haerdalis:

Just posted a bugreport on the problem with the garbage on links.


:up: Bug number?

Also, here are a few things that may help speed up the script if that is a problem on large documents:

1. Don't keep creating new links. Create one, once, set all the attributes that won't change, and then just clone it with .cloneNode(false) when you need a new one.

2. Same as above but for the text node; I don't know if this will result in any speed increase, though.

3. There is no need to use splitText() twice, especially when you are just going to replace the middle node. Just use middleText.deleteData(0,matchedText.length) and node.parentNode.insertBefore(newLinkElement,middleText).

4. Create a copy of the body element, do all of your manipulations on the copy, and then replace the original body element. This will significantly speed up the process but will increase memory usage. It can also cause some problems with other scripts. Hmm... just tested this again; in O9 it is actually slower.

5. If you are using XPath, you can evaluate the expression on the cloned body element from above; just change the expression to //text()...

6. If you are using XPath, you can try to use contains() to cut down on the number of text nodes that you run the RegExp on. It may be a bit of work, but it will provide a measurable speed increase on large documents. Too bad matches() doesn't work.
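
As a rough illustration of point 6, the XPath expression could be built with a contains() pre-filter like this. The function name buildTextNodeXPath and the choice of '://' and 'www.' as markers are assumptions; the base expression is the one posted earlier in the thread.

```javascript
// Builds an XPath that selects only text nodes containing at least one marker.
// Markers must not contain a double quote, since they are embedded in "...".
function buildTextNodeXPath(markers) {
  var base = '//body//text()[not(ancestor::a)][not(ancestor::script)][not(ancestor::style)]';
  var tests = [];
  for (var i = 0; i < markers.length; i++) {
    tests.push('contains(., "' + markers[i] + '")');
  }
  // With no markers, fall back to the unfiltered expression.
  return tests.length ? base + '[' + tests.join(' or ') + ']' : base;
}

var linkCandidatesXPath = buildTextNodeXPath(['://', 'www.']);
```

The resulting string is passed to document.evaluate() in place of the original expression. Keep in mind that contains() is case-sensitive, so an upper-case "WWW." would be missed by this filter.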

Thursday, 22. June 2006, 14:06:34 (edited)

Originally posted by haerdalis:

Just posted a bugreport on the problem with the garbage on links.


Originally posted by Stoen:


:up: Bug number?


215574. It mentions the image posted above, this thread, and some explanation.

Originally posted by Stoen:


Also, here are a few things that may help speed up the script if that is a problem on large documents:

1. Don't keep creating new links. Create one, once, set all the attributes that won't change, and then just clone it with .cloneNode(false) when you need a new one.

2. Same as above but for the text node; I don't know if this will result in any speed increase, though.

3. There is no need to use splitText() twice, especially when you are just going to replace the middle node. Just use middleText.deleteData(0,matchedText.length) and node.parentNode.insertBefore(newLinkElement,middleText).

4. Create a copy of the body element, do all of your manipulations on the copy, and then replace the original body element. This will significantly speed up the process but will increase memory usage. It can also cause some problems with other scripts. Hmm... just tested this again; in O9 it is actually slower.

5. If you are using XPath, you can evaluate the expression on the cloned body element from above; just change the expression to //text()...

6. If you are using XPath, you can try to use contains() to cut down on the number of text nodes that you run the RegExp on. It may be a bit of work, but it will provide a measurable speed increase on large documents. Too bad matches() doesn't work.



Thank you for the tips. There is a lot of room for improvement in the script, and I will be experimenting with the above. I've just begun experimenting with XPath.
A question: which of these would be more efficient, by the way?
Use XPath to get the existing A elements, then set the target attribute to _blank when the href matches a regex (unrelated to the current script). Then use another XPath to do what this script does.

Or create a single XPath (which includes existing A elements) and do the same with a couple more ifs?

Thursday, 22. June 2006, 19:42:41 (edited)

By the way, after some testing I've come to the conclusion that the bug only occurs within non-block elements (i.e. display != block), and only when the link isn't preceded by plain text of some sort within the parent element.

Thursday, 22. June 2006, 23:38:19

Just posted an update of the script at http://moropostnr.com/otherfiles/linkify-url1_01.zip

The zip contains 2 scripts, one for version 8 and one for version 9, in addition to a test page.

Remove the old script from the userjs folder, then make sure only one of the scripts is copied to the userjs folder.

Friday, 23. June 2006, 04:57:05

jebediah

Posts: 334

A suggestion about this comment in your code

 All global variables and functions is prefixed with hd_zzzzzz_ to avoid conflicts.


It doesn't look like any of your functions or variables actually need to be global. You could put everything into an anonymous function, in which you declare all variables and functions. This would limit their scope to the anonymous function and you wouldn't have to worry about conflicts.

I've made the changes to your script so that you can see what I mean -- I doubt I'd be able to explain it clearly. userjs.org has a tutorial on methods for avoiding conflicts: http://userjs.org/help/tutorials/avoiding-conflicts

linkify-url9.js
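
For readers without the attachment, the general shape of the anonymous-function approach is sketched below. It is not the attached file: the names urlRegex and linkify and the placeholder pattern are assumptions for illustration.

```javascript
(function () {
  // Everything declared in here is local to this function,
  // so no hd_zzzzzz_ style prefix is needed.
  var urlRegex = /\bhttps?:\/\/\S+/i; // placeholder pattern, not the script's real one

  function linkify() {
    // ... walk the document and create links using urlRegex ...
  }

  // Only register when running inside a page.
  if (typeof document !== 'undefined') {
    document.addEventListener('DOMContentLoaded', linkify, false);
  }
})();
```

Outside the wrapper, neither linkify nor urlRegex exists, so a page script using the same names cannot collide with them.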

Friday, 23. June 2006, 08:12:02

Stoen

Posts: 1109

Originally posted by haerdalis:

A question: which of these would be more efficient, by the way? Use XPath to get the existing A elements, then set the target attribute to _blank when the href matches a regex (unrelated to the current script). Then use another XPath to do what this script does. Or create a single XPath (which includes existing A elements) and do the same with a couple more ifs?



It would probably be faster to use two different XPath expressions to create two separate node lists, but as long as you don't actually use the RegExp in the top-level/first condition of your if statements, there probably won't be much difference. Try it and see.

Something that you need to consider when using XPath that returns a SNAPSHOT type result is that, as the name suggests, the node list is a static snapshot. What this means is that when you use splitText you create a new text node that is not part of your original node list, and so you will only ever match the first instance of whatever you are looking for within a particular text node.

i.e.
let nodeList be the text nodes returned by the XPath expression. Note that the following representation is just for clarity and not an actual representation of a node list
nodeList = {'this is a test. this','some text'}

If you are searching for 'this' then the following would happen:

let textNode be the currently selected text node from nodeList
textNode = 'this is a test. this'

your search() would return 0 as the index of the first occurrence of 'this'.
What happens next is that, after splitting textNode, the parent node that contains textNode would now look like

'this' ' is a test. this'

However nodeList has no idea that there is a new text node that contains ' is a test. this', so your for loop would just skip straight to 'some text'

What you need to do is use another loop to continue searching for text. You can see an example in the following file that you can use as you wish

linkify-html.js
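
The idea of the extra loop can be shown without the DOM at all. The sketch below is not the attached file; splitAroundMatches is a made-up name, and it works on plain strings where the real script would work on text nodes. After each match it keeps searching the remainder, which plays the role of the new text node created by splitText.

```javascript
// Splits text into plain and matched parts, finding every match, not just the first.
function splitAroundMatches(text, pattern) {
  var parts = [];
  var rest = text;
  var idx;
  while ((idx = rest.search(pattern)) !== -1) {
    var matched = rest.match(pattern)[0];
    if (matched.length === 0) break; // guard against patterns that match nothing
    if (idx > 0) parts.push({ text: rest.slice(0, idx), link: false });
    parts.push({ text: matched, link: true });
    // Continue with what follows the match (the "new text node").
    rest = rest.slice(idx + matched.length);
  }
  if (rest.length > 0) parts.push({ text: rest, link: false });
  return parts;
}
```

For 'this is a test. this' and /this/, it yields three parts: the first 'this', the plain text ' is a test. ', and the second 'this'.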

Friday, 23. June 2006, 13:53:59

Originally posted by Stoen:


It would probably be faster to use two different XPath expressions to create two separate node lists, but as long as you don't actually use the RegExp in the top-level/first condition of your if statements, there probably won't be much difference. Try it and see.

Something that you need to consider when using XPath that returns a SNAPSHOT type result is that, as the name suggests, the node list is a static snapshot. What this means is that when you use splitText you create a new text node that is not part of your original node list, and so you will only ever match the first instance of whatever you are looking for within a particular text node.

i.e.
let nodeList be the text nodes returned by the XPath expression. Note that the following representation is just for clarity and not an actual representation of a node list
nodeList = {'this is a test. this','some text'}

If you are searching for 'this' then the following would happen:

let textNode be the currently selected text node from nodeList
textNode = 'this is a test. this'

your search() would return 0 as the index of the first occurrence of 'this'.
What happens next is that, after splitting textNode, the parent node that contains textNode would now look like

'this' ' is a test. this'

However nodeList has no idea that there is a new text node that contains ' is a test. this', so your for loop would just skip straight to 'some text'

What you need to do is use another loop to continue searching for text. You can see an example in the following file that you can use as you wish


I'll check out the script :smile: Thank you.

My initial thought after reading your text above is that it would probably be wiser to use two XPaths for what I had in mind, just for clarity, I guess. It seems to make no difference speed-wise.

You see, I want it to change the target attribute of all links whose href attribute matches a specific text (this is in addition to this script's initial purpose, and would be optional).
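
Roughly what I have in mind for that part, as a sketch only. The name retargetLinks and the example pattern are placeholders, and the function just needs an array-like list of objects with href and target properties.

```javascript
// Sets target="_blank" on every link whose href matches the pattern.
// Returns the number of links changed.
// Note: hrefPattern should not use the g flag, since test() would then keep state.
function retargetLinks(anchors, hrefPattern) {
  var changed = 0;
  for (var i = 0; i < anchors.length; i++) {
    if (hrefPattern.test(anchors[i].href)) {
      anchors[i].target = '_blank';
      changed++;
    }
  }
  return changed;
}

// In a page this could be called on the existing links, for example:
// retargetLinks(document.getElementsByTagName('a'), /example\.com/i);
```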

Friday, 23. June 2006, 14:17:28 (edited)

Originally posted by jebediah:

A suggestion about this comment in your code

 All global variables and functions is prefixed with hd_zzzzzz_ to avoid conflicts.


It doesn't look like any of your functions or variables actually need to be global. You could put everything into an anonymous function, in which you declare all variables and functions. This would limit their scope to the anonymous function and you wouldn't have to worry about conflicts.



Thank you! This would be a case of "didn't know" :smile:

I haven't tested the script yet, but from the look of it, it seems right.

I'm so accustomed to C++ that I guess I didn't think that far.

Thank you for this lesson. :smile:

Friday, 23. June 2006, 18:11:58

Note: It seems the XPath solution doesn't always work with some div elements, but the version 8 script always does.

Does anybody have a clue why?

Saturday, 24. June 2006, 06:02:01

Stoen

Posts: 1109

Possibly because the divs are being created with JavaScript after the XPath version has already run. I don't know; I would need a URL to check.

Saturday, 24. June 2006, 12:53:36

Originally posted by Stoen:

Possibly because the divs are being created with JavaScript after the XPath version has already run. I don't know; I would need a URL to check.



This turned out to be an error on my part. :smile:

It wasn't run at all on that page because of a typo. However, I'd guess what you say would probably apply to the DOMContentLoaded event.

Thursday, 20. July 2006, 20:00:55

shoust

Operaised

Posts: 3008

United Kingdom

Just one small suggestion: add the [ and ] characters to the exclude list in the regexp, as having a link surrounded by these breaks the link.

eg:
[http://google.com]


Friday, 21. July 2006, 17:10:36

Originally posted by shoust:

Just one small suggestion: add the [ and ] characters to the exclude list in the regexp, as having a link surrounded by these breaks the link.

eg:

[http://google.com]




I guess I can do this.. But, I've encountered links with those used in the path/filename part of the url.. Maybe check for matching start/end pair instead?
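
The matching-pair idea could be sketched like this: only drop a trailing ] when the URL has more closing than opening brackets. The function name trimTrailingBracket is made up, and this is not what the released script does.

```javascript
// Removes one trailing "]" if it has no matching "[" inside the URL.
function trimTrailingBracket(url) {
  if (url.charAt(url.length - 1) !== ']') return url;
  var opens = url.split('[').length - 1;   // number of "[" characters
  var closes = url.split(']').length - 1;  // number of "]" characters
  return closes > opens ? url.slice(0, -1) : url;
}
```

A URL such as http://example.com/file[1] keeps its bracket, while the text matched from [http://google.com] loses the stray one.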

Friday, 21. July 2006, 22:31:58 (edited)

I would like some suggestions about characters that are commonly used to mark URLs (and so shouldn't be part of the URL in that case), but are still used by some in the URL itself.

Edit: By the way, just ending markers, as the part before the protocol isn't known and really isn't needed.

Friday, 21. July 2006, 22:35:36

Just added exclusion of the ] character when it is the last character in the URL. It should be somewhat unlikely to appear there other than as a marker, I guess.

Didn't bother to change version, so the file is still here

Saturday, 22. July 2006, 06:00:14

Stoen

Posts: 1109

You are using the RegExp from linkify-text.js, aren't you? The problem with doing this is that linkify-text used a separate function to strip certain trailing non-word characters, which your script does not.

So for instance it would match
http://www.google.com,

Note the comma at the end

What you could do is change this part of the RegExp
[^\s\"<>\{\}\'\(\)]*

to something like
(?:.+?)(?=\W*(?:\s|$))

which will match the smallest number of characters that are followed by zero or more non-word characters which are in turn followed by a space or the end of a line. But this may be too aggressive
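
A small sketch to see how the lookahead version behaves. The https?:\/\/ prefix and the helper name firstUrl are assumptions made to keep the example short; the real RegExp has a longer prefix.

```javascript
// Lazy match followed by a lookahead: stop before trailing non-word
// characters that are followed by whitespace or the end of the input.
var urlPattern = /https?:\/\/(?:.+?)(?=\W*(?:\s|$))/;

// Returns the first URL found in the text, or null.
function firstUrl(text) {
  var m = urlPattern.exec(text);
  return m ? m[0] : null;
}
```

With this, 'see http://www.google.com, ok' gives http://www.google.com without the comma, but 'http://example.com/dir/ next' loses its trailing slash, which is the kind of over-trimming meant by "too aggressive".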

Saturday, 22. July 2006, 13:52:58 (edited)

Originally posted by Stoen:

You are using the RegExp from linkify-text.js, aren't you? The problem with doing this is that linkify-text used a separate function to strip certain trailing non-word characters, which your script does not.

So for instance it would match

http://www.google.com,

Note the comma at the end

What you could do is change this part of the RegExp
[^\s\"<>\{\}\'\(\)]*

to something like
(?:.+?)(?=\W*(?:\s|$))

which will match the smallest number of characters that are followed by zero or more non-word characters which are in turn followed by a space or the end of a line. But this may be too aggressive


That is correct, just about anyway.

Thanks for the tip. I'll try to experiment some with this.
Edit: For now it just strips the ] and |.
By the way, it really should accept ',' in the file part, as that is quite commonly used, and at times I've even seen it at the end of a URL.

Sometimes the same characters are a normal part of a URL, which causes some problems now and again.

This is not to say it is the proper way to write a URL, but as long as those URLs work in the browser, it would raise some questions if the userjs didn't convert them to links (not counting the space character). And therein lies the problem.

Tuesday, 25. December 2007, 15:01:05

dima.sundler

Posts: 3

New One:

New URLs: _hxxp:\\, _fxp:\\ ...
Changed: var urlRegex = /\b(?:(?:(_?h.{2}ps?):\/\/)|(www)|(?:(_?f.{1}p):\/\/)|(url\("?))[^"\s\<\>]*[^.,;'">\:\s\<\>\)\]\!]/i;


linkify-html.js

Thursday, 10. January 2008, 03:39:33 (edited)

Originally posted by dima.sundler:

New one:



Erm.. Not posted by me, just to make it clear!

In case somebody thought I had assumed another name or something..
