a tribute to Factor - a small tutorial
Saturday, September 26, 2009 9:29:12 PM
This blog, I really don't use it much. But here's to exceptions :-)
Factor is a stack language, which is plainly cool.
Factor is a programming language I like. It has the property that it's generally easy to factor out code into a new word (function), which is where it gets its name from.
Your code definitions stay short and obvious, so they're easy to reason about. This is appealing to me.
It also has high level features, lambdas or function-passing as the quotation concept, some meta-programming (and good choices with regard to balance of features), and it's DSL-like, in that you can even create your own syntax in the domain you're working on.
You're encouraged to stay on the high level and use the library goodies, since the compiler is quite advanced. So you write simple, short code and the compiler will make sure you get native, deployed binaries with quite good performance.
Compared to other languages I've seen, the code written gets to be very elegant.
I think it is also more productive as a programming language, although the initial get-something-to-work is longer.
Okay, so enough introduction.
Today we'll take a look at a simple exercise, which is made into a tutorial here.
Please note that I'm quite a beginner, this is one way to accomplish this, which I review and explain here.
I'm sure the more experienced would find obvious, shorter ways, and I'd be happy to review and explain them in the following post. (so feel free to leave your code in the comments)
We have a text file, with a few configs, the entries are not seperated, but we know that they start with an opening brace “{“ and finish with one as well “}”.
example:
purpose: we want to split the file, so we'd get a sequence of these entries.
let's add another test-case, so as to have some variation:
{ { 1 2 } { 2 3 4 } { “hello” } 7 { 89 } 6 5 } }
note we have an extra closing brace, which is an error, but we'll try to be broad-minded about it, and not have it interfere in the matching.
Ok, so it looks like we'd need an iterative approach here.
It's most obvious to see it as levels: with every opening brace you enter into a higher level, and every closing brace downs the level by 1.
so for this file, we have 0, 1, 2, 1, 2, 1, 2, 1, 2, 1, 0, 1, 2, 1, 2, 1,2,1, 0, 1,2,1,2,1,2,1,2,1,0,-1.
going from 0 → 1
means we found an opening brace.
after that, we are waiting for a 0 level again.
Pretty simple, yet we have a few conditions, let's write them down.
If we came from below, and we got 1, then it's an opening brace and we mark its position.
If we get a 0 level, we record the position.
If we get something below zero, then it's just redundant closing braces, that we can ignore.
I introduced a symbol named “up?” to know if we came from below or from above.
(on second thought, a better word would be from-above or something)
! this is a comment
! the “off?” and “on?” words below are just helper words.
: off? ( var -- ? )
get f = ;
: on? ( var -- ? )
off? not ;
! here you can see the symbols, which are used as variables,
! they're inside a private namespace so only words in this file can use them.
<PRIVATE
SYMBOLS: brace-count up? ;
PRIVATE>
: reset-counter ( -- )
brace-count 0 swap set ;
: counter-case ( pos count -- pos/f )
{
{ [ dup 1 = up? off? and ] [ drop up? on ] } ! came from 0
{ [ dup 0 = ] [ drop up? off ] }
{ [ dup 0 < ] [ 2drop reset-counter f ] }
[ 2drop f ]
} cond ;
[/FONT]
This word takes a position and the current level ( 2 input arguments ) and outputs either a position or the f value which is a false symbol.
cond takes a sequence of sequences and one optional quotation.
Every inner sequence is one of the cases, the first part needs to output t (true) in order for the second part to run. When matching, the argument is consumed.
For example, in the second condition: pos count [ dup 0 = ]
“pos count dup” becomes “pos count count”
We duplicate the count-level, so it's not lost for the cases below.
“count 0 = “ becomes either t/f , so we have on the stack: pos count t/f
if it's an f, it's dropped and the two items: pos count, would be matched against the next condition.
If it's a t, it'll be consumed by the right hand part, the action quotation. So in a true case we have: pos count drop up? off
off sets a variable to f and doesn't leave anything on the stack, 'count' was dropped, and position is left for later, since it's the zero level condition, and we want to record it.
It is important to leave the same number of items in every branch.
As you can see, in every match, we left an f value, or the position.
We handle level1 that came from below, (first condition)
level 0 (second condition)
level negative ( third condition)
and others: higher than 1, and 1 that came from above ( default quotation)
although this is very introductory stuff, you can see here how 'cond' works,
you can see symbols. Use the get and set words to change them (as well as set-global get-global). You can off and on them. And they're pretty simple.
you can also see how to define new words using the colon -
: name ( input-parameters -- output-parameters )
(The input and output parameters are called stack-effects.)
And you probably started mentally visualizing how these stack items, whether input arguments to a word, or in every place in the code, dance and 'swap', jump 'over' each other, 'dup'licate, or stay in the air for a little duration. (dip)
So now that we have some kind of algorithm, we should manage everything else. We can clearly see that the word 'counter-case' expects position and count-level. Before that, please take a look:
USING: arrays assocs combinators fry grouping io kernel math
math.order math.ranges namespaces prettyprint sequences sorting ;
IN: braces-matcher
you can probably guess, USING: is all the words from different vocabularies that we used.
IN: something is the name of our vocab.
This something will also be the name of the folder that contains it. (this is required)
I suggest you try some things for yourself. Download the factor binary for your platform, unzip and run it, and go to its directory.
In that directory you have a 'work' folder.
Create a folder, with the name braces-matcher, and put (using a text editor of course) at least two files in it: braces-matcher.factor and braces-matcher-tests.factor
Now return to the factor gui.
type USE: braces-matcher
and it will load it and tell you if there are errors.
After every change in your text editor, you click F2 for factor to refresh.
Let's continue. So we have some text, we want to scan it for opening braces and closing braces, and record the positions.
Let's do that.
: positions ( text char -- seq )
[ >array ] dip '[ swap _ = [ drop f ] unless ] map-index sift ;
this requires some explanation. We have dips, fries, unless and some maps. (almost a breakfast)
[ … ] dip takes the rightmost argument 'char', holds it in the air, does whatever is in the quotation, and lands it back down to the right-most position (top of the stack). you can visualize it, as the item that is now to the left of the quotation jumps above the bracketed quotation, brackets disappearing, and lands instead of the word dip.
'[ ← this is called fry. It takes the first item it encounters, and puts it in the rightmost underscore _ position that it contains within its brackets.
So after >array, text became a sequence of numbers (which signify the characters), the 'char' argument got into place, and now text faces map-index.
map takes a certain sequence, does something to it, and returns a sequence with the same number of items. Again, it's important to retain the same number of items.
map-index takes an item from the sequence and its position. We swap them, compare to our 'char'
and we have the position and a boolean. 'unless' takes the boolean. It calls the quotation if the boolean is false, this quotation drops the position and outputs f. if the boolean was true, just the position is returned.
So in the end, we have a sequence of positions and f's. 'sift' removes those f's.
: inc-positions ( text char -- pairs )
positions dup length 1 <array> zip ;
: dec-positions ( text char -- pairs )
positions dup length -1 <array> zip ;
I created 2 words for positions that decrease the level and those that increase the level.
These words just generate pairs of { pos 1 } or { pos -1 } for all the positions in the sequence.
do you see the code duplication? Let's factor it out.
We need the changing part, which is the number, to be on the left side.
(we should also let each word do just one thing.)
So let's put it after the word 'positions'.
positions 1 [ dup length ] dip <array> zip ;
all the parts after the number can be taken as another word.
: prepare-run-count ( seq int -- pairs )
[ dup length ] dip <array> zip ;
: inc-positions ( text char -- pairs )
positions 1 prepare-run-count ;
: dec-positions ( text char -- pairs )
positions -1 prepare-run-count ;
I think this looks better, and still has some information inside that describes what it does.
Since the words just follow one another it's very easy to refactor things in Factor compared to other languages.
Next part:
: open-close-list ( text open-char close-char -- seq )
[ inc-positions ] [ dec-positions ] bi-curry* bi append sort-keys ;
here we have some combinators, you're allowed to visualize them as if transforming the code (right before your eyes).
after bi-curry* does its job, we are left with
then bi comes along. 'bi' calls these two quotations, after inserting 'text' into them.
It looks complex at the beginning but those combinators are very useful and represent the control flow of your program very nicely.
Now we have 2 sequences of position pairs, we append them together, and sort by key.
It looks like the following:
: counter+ ( int -- current )
brace-count [ get + ] [ set ] [ get ] tri ;
: matching-pairs ( text open-char close-char -- seq )
reset-counter open-close-list
[ first2 counter+ counter-case ] map sift 2 <groups> >array ;
we start the counter from 0 ( reset counter ), generate the list that you see above,
then map over it. Taking each item, which is a pair, we use first2 to take its data out.
Now we have two arguments, position and 1 or -1. we update the counter, and check what to do in the counter-case. After that, we sift the f's, make groups of 2, and output as an array.
Here is counter-case again
: counter-case ( pos count -- pos/f )
{
{ [ dup 1 = up? off? and ] [ drop up? on ] } ! came from 0
{ [ dup 0 = ] [ drop up? off ] }
{ [ dup 0 < ] [ 2drop reset-counter f ] }
[ 2drop f ]
} cond ;
only thing left to explain is 'counter+'. It increments or decrements based on 1, or -1 that it got, and returns the current value that it has.
: counter+ ( int -- current )
brace-count [ get + ] [ set ] [ get ] tri ;
'tri' is also a combinator. In our case it puts the brace-count variable into the 3 quotations, and then runs them. Equivalent to:
-1 brace-count get + brace-count set brace-count get
lastly,
: matching-split ( text open-char close-char -- seq )
[ 2drop ] [ matching-pairs ] 3bi
[ first2 1 + rot <slice> >string ] with map ;
I think you now know enough to understand this last bit of code by yourself.
If you don't know a certain word, you can type it in the factor gui and press ctrl+h to get the documentation for it. For shuffle words like rot, you can just put the text cursor inside the word, and the status bar shows you its stack-effects.
P.S: unit tests are very concise and a treat to work with. I haven't covered those here, so just an example:
[ { { 0 10 } { 14 16 } } ] [ "{ { } { } } } { }" CHAR: { CHAR: } matching-pairs ] unit-test
Hope you enjoyed, and that you found this introduction to factor educational.
you can view the entire code with syntax highlighting here:
http://paste.factorcode.org/paste?id=915
The code works for any opening and closing character that you choose, but only for a single character and not a substring. so know its limitations.
It also scans the input twice in getting the positions which could have been avoided pretty easily. If you want, you can try your hand at fixing those two issues, and I'll give credit the next time I have something to write.
Don't forget to leave in the comments your insightful code snippets, simplifications, or just a nice response. It's also interesting to see how your favourite language compares, the strong and weak points of each other.
Thanks and see ya :-)
Introduction to Factor
Factor is a stack language, which is plainly cool.
Factor is a programming language I like. It has the property that it's generally easy to factor out code into a new word (function), which is where it gets its name from.
Your code definitions stay short and obvious, so they're easy to reason about. This is appealing to me.
It also has high level features, lambdas or function-passing as the quotation concept, some meta-programming (and good choices with regard to balance of features), and it's DSL-like, in that you can even create your own syntax in the domain you're working on.
You're encouraged to stay on the high level and use the library goodies, since the compiler is quite advanced. So you write simple, short code and the compiler will make sure you get native, deployed binaries with quite good performance.
Compared to other languages I've seen, the code written gets to be very elegant.
I think it is also more productive as a programming language, although the initial get-something-to-work is longer.
Okay, so enough introduction.
Today
Today we'll take a look at a simple exercise, which is made into a tutorial here.
Please note that I'm quite a beginner, this is one way to accomplish this, which I review and explain here.
I'm sure the more experienced would find obvious, shorter ways, and I'd be happy to review and explain them in the following post. (so feel free to leave your code in the comments)
Brace matching
We have a text file, with a few configs, the entries are not seperated, but we know that they start with an opening brace “{“ and finish with one as well “}”.
example:
{
{ server 1.2.3.4 }
{ port 21 }
{ access authenticate }
{ user aszxkfhy }
}
{
{ server 128.127.126.125 }
{ port 64 }
{ access public }
}
purpose: we want to split the file, so we'd get a sequence of these entries.
let's add another test-case, so as to have some variation:
{ { 1 2 } { 2 3 4 } { “hello” } 7 { 89 } 6 5 } }
note we have an extra closing brace, which is an error, but we'll try to be broad-minded about it, and not have it interfere in the matching.
The algorithm.
Ok, so it looks like we'd need an iterative approach here.
It's most obvious to see it as levels: with every opening brace you enter into a higher level, and every closing brace downs the level by 1.
so for this file, we have 0, 1, 2, 1, 2, 1, 2, 1, 2, 1, 0, 1, 2, 1, 2, 1,2,1, 0, 1,2,1,2,1,2,1,2,1,0,-1.
going from 0 → 1
means we found an opening brace.
after that, we are waiting for a 0 level again.
Pretty simple, yet we have a few conditions, let's write them down.
If we came from below, and we got 1, then it's an opening brace and we mark its position.
If we get a 0 level, we record the position.
If we get something below zero, then it's just redundant closing braces, that we can ignore.
I introduced a symbol named “up?” to know if we came from below or from above.
(on second thought, a better word would be from-above or something)
! this is a comment
! the “off?” and “on?” words below are just helper words.
: off? ( var -- ? )
get f = ;
: on? ( var -- ? )
off? not ;
! here you can see the symbols, which are used as variables,
! they're inside a private namespace so only words in this file can use them.
<PRIVATE
SYMBOLS: brace-count up? ;
PRIVATE>
: reset-counter ( -- )
brace-count 0 swap set ;
: counter-case ( pos count -- pos/f )
{
{ [ dup 1 = up? off? and ] [ drop up? on ] } ! came from 0
{ [ dup 0 = ] [ drop up? off ] }
{ [ dup 0 < ] [ 2drop reset-counter f ] }
[ 2drop f ]
} cond ;
[/FONT]
This word takes a position and the current level ( 2 input arguments ) and outputs either a position or the f value which is a false symbol.
cond takes a sequence of sequences and one optional quotation.
Every inner sequence is one of the cases, the first part needs to output t (true) in order for the second part to run. When matching, the argument is consumed.
For example, in the second condition: pos count [ dup 0 = ]
“pos count dup” becomes “pos count count”
We duplicate the count-level, so it's not lost for the cases below.
“count 0 = “ becomes either t/f , so we have on the stack: pos count t/f
if it's an f, it's dropped and the two items: pos count, would be matched against the next condition.
If it's a t, it'll be consumed by the right hand part, the action quotation. So in a true case we have: pos count drop up? off
off sets a variable to f and doesn't leave anything on the stack, 'count' was dropped, and position is left for later, since it's the zero level condition, and we want to record it.
It is important to leave the same number of items in every branch.
As you can see, in every match, we left an f value, or the position.
We handle level1 that came from below, (first condition)
level 0 (second condition)
level negative ( third condition)
and others: higher than 1, and 1 that came from above ( default quotation)
although this is very introductory stuff, you can see here how 'cond' works,
you can see symbols. Use the get and set words to change them (as well as set-global get-global). You can off and on them. And they're pretty simple.
you can also see how to define new words using the colon -
: name ( input-parameters -- output-parameters )
(The input and output parameters are called stack-effects.)
And you probably started mentally visualizing how these stack items, whether input arguments to a word, or in every place in the code, dance and 'swap', jump 'over' each other, 'dup'licate, or stay in the air for a little duration. (dip)
The entirety.
So now that we have some kind of algorithm, we should manage everything else. We can clearly see that the word 'counter-case' expects position and count-level. Before that, please take a look:
USING: arrays assocs combinators fry grouping io kernel math
math.order math.ranges namespaces prettyprint sequences sorting ;
IN: braces-matcher
you can probably guess, USING: is all the words from different vocabularies that we used.
IN: something is the name of our vocab.
This something will also be the name of the folder that contains it. (this is required)
I suggest you try some things for yourself. Download the factor binary for your platform, unzip and run it, and go to its directory.
In that directory you have a 'work' folder.
Create a folder, with the name braces-matcher, and put (using a text editor of course) at least two files in it: braces-matcher.factor and braces-matcher-tests.factor
Now return to the factor gui.
type USE: braces-matcher
and it will load it and tell you if there are errors.
After every change in your text editor, you click F2 for factor to refresh.
Let's continue. So we have some text, we want to scan it for opening braces and closing braces, and record the positions.
Let's do that.
: positions ( text char -- seq )
[ >array ] dip '[ swap _ = [ drop f ] unless ] map-index sift ;
this requires some explanation. We have dips, fries, unless and some maps. (almost a breakfast)
[ … ] dip takes the rightmost argument 'char', holds it in the air, does whatever is in the quotation, and lands it back down to the right-most position (top of the stack). you can visualize it, as the item that is now to the left of the quotation jumps above the bracketed quotation, brackets disappearing, and lands instead of the word dip.
'[ ← this is called fry. It takes the first item it encounters, and puts it in the rightmost underscore _ position that it contains within its brackets.
So after >array, text became a sequence of numbers (which signify the characters), the 'char' argument got into place, and now text faces map-index.
map takes a certain sequence, does something to it, and returns a sequence with the same number of items. Again, it's important to retain the same number of items.
map-index takes an item from the sequence and its position. We swap them, compare to our 'char'
and we have the position and a boolean. 'unless' takes the boolean. It calls the quotation if the boolean is false, this quotation drops the position and outputs f. if the boolean was true, just the position is returned.
So in the end, we have a sequence of positions and f's. 'sift' removes those f's.
: inc-positions ( text char -- pairs )
positions dup length 1 <array> zip ;
: dec-positions ( text char -- pairs )
positions dup length -1 <array> zip ;
I created 2 words for positions that decrease the level and those that increase the level.
These words just generate pairs of { pos 1 } or { pos -1 } for all the positions in the sequence.
do you see the code duplication? Let's factor it out.
We need the changing part, which is the number, to be on the left side.
(we should also let each word do just one thing.)
So let's put it after the word 'positions'.
positions 1 [ dup length ] dip <array> zip ;
all the parts after the number can be taken as another word.
: prepare-run-count ( seq int -- pairs )
[ dup length ] dip <array> zip ;
: inc-positions ( text char -- pairs )
positions 1 prepare-run-count ;
: dec-positions ( text char -- pairs )
positions -1 prepare-run-count ;
I think this looks better, and still has some information inside that describes what it does.
Since the words just follow one another it's very easy to refactor things in Factor compared to other languages.
Next part:
: open-close-list ( text open-char close-char -- seq )
[ inc-positions ] [ dec-positions ] bi-curry* bi append sort-keys ;
here we have some combinators, you're allowed to visualize them as if transforming the code (right before your eyes).
after bi-curry* does its job, we are left with
text [ open-char inc-positions ] [ close-char dec-positions ]
then bi comes along. 'bi' calls these two quotations, after inserting 'text' into them.
[ text open-char inc-positions ] [ text close-char dec-positions ] ...
It looks complex at the beginning but those combinators are very useful and represent the control flow of your program very nicely.
Now we have 2 sequences of position pairs, we append them together, and sort by key.
It looks like the following:
{ { 2 1 } { 4 1 } { 8 -1 } { 9 1 } { 10 -1 } { 16 -1 } }
: counter+ ( int -- current )
brace-count [ get + ] [ set ] [ get ] tri ;
: matching-pairs ( text open-char close-char -- seq )
reset-counter open-close-list
[ first2 counter+ counter-case ] map sift 2 <groups> >array ;
we start the counter from 0 ( reset counter ), generate the list that you see above,
then map over it. Taking each item, which is a pair, we use first2 to take its data out.
Now we have two arguments, position and 1 or -1. we update the counter, and check what to do in the counter-case. After that, we sift the f's, make groups of 2, and output as an array.
Here is counter-case again
: counter-case ( pos count -- pos/f )
{
{ [ dup 1 = up? off? and ] [ drop up? on ] } ! came from 0
{ [ dup 0 = ] [ drop up? off ] }
{ [ dup 0 < ] [ 2drop reset-counter f ] }
[ 2drop f ]
} cond ;
only thing left to explain is 'counter+'. It increments or decrements based on 1, or -1 that it got, and returns the current value that it has.
: counter+ ( int -- current )
brace-count [ get + ] [ set ] [ get ] tri ;
'tri' is also a combinator. In our case it puts the brace-count variable into the 3 quotations, and then runs them. Equivalent to:
-1 brace-count get + brace-count set brace-count get
lastly,
: matching-split ( text open-char close-char -- seq )
[ 2drop ] [ matching-pairs ] 3bi
[ first2 1 + rot <slice> >string ] with map ;
I think you now know enough to understand this last bit of code by yourself.
If you don't know a certain word, you can type it in the factor gui and press ctrl+h to get the documentation for it. For shuffle words like rot, you can just put the text cursor inside the word, and the status bar shows you its stack-effects.
P.S: unit tests are very concise and a treat to work with. I haven't covered those here, so just an example:
[ { { 0 10 } { 14 16 } } ] [ "{ { } { } } } { }" CHAR: { CHAR: } matching-pairs ] unit-test
Hope you enjoyed, and that you found this introduction to factor educational.
you can view the entire code with syntax highlighting here:
http://paste.factorcode.org/paste?id=915
The code works for any opening and closing character that you choose, but only for a single character and not a substring. so know its limitations.
It also scans the input twice in getting the positions which could have been avoided pretty easily. If you want, you can try your hand at fixing those two issues, and I'll give credit the next time I have something to write.
Don't forget to leave in the comments your insightful code snippets, simplifications, or just a nice response. It's also interesting to see how your favourite language compares, the strong and weak points of each other.
Thanks and see ya :-)
Unregistered user # Monday, September 28, 2009 8:57:43 PM
Unregistered user # Friday, October 2, 2009 9:14:20 PM