Introductions
Determine if a sentence is a pangram. A pangram (Greek: παν γράμμα, pan gramma, “every letter”) is a sentence using every letter of the alphabet at least once. The best known English pangram is:
The quick borwn fox jumps over the lazy dog.
This is something we can see when choosing a font. This makes sense as we need to see as many alphabets as we can to see the combinations!
My strategy
There are a lot of ways is checking. and this is my stategy:
- change the each character into lower case
- if is alphabet (strictly speaking, ascii alphabet) put into the accumlator. (but do not insert more than once when duplicated)
- when finished to the list count and check it is 26. (a-z)
procedural programming into functional programming
If we look at the procedure, we will realize that we can do them one by one, and which is easier to be made of a chain of functions.
We could see the each step involving data transformation, Which is great because:
- We could devide our logic into smaller and maintainable functions.
- Maintainable function means, we can write the unit tests per function easier.
- If the each function is general, we could re-use later. (but making a function too general is always an evil, IMHO)
However, I found something unexpected to check a character whether it is lower case or not.
isLower is more complicated than I thought
Because, there are a lot of alphabets including basic english (a-z) in the world,
but our function should only care about english alphabets, which means:
Ã
must be ignored in this test. and isLower
in Data.Char
module cares about non-ascii
alphabet as well (check all unicode characters), we need to check it is ascii as well.
check it is amazaon (a-z)
As Char data type is an Ord
in haskell. we can compare the characters by using
arithmetic comparison:
> 'a' < 'b'
λTrue
> 'Z' < 'a'
λTrue
General solution by using set
Data.Set is a general module to handle the set of data. We can ensure that te member in the set always show once by using library. we can safetly try to add more than once, and we will count the number of kind of members later.
Some people would say this is not efficient solution Because it only has 26 cases in total. In other words, We could do use bit (binary) artihmetic which is memory effiecient and fast. I agree that it is worth to try. But we are unable to do it in general cases. So I’ll go for more general case:
module Pangram (isPangram) where
import qualified Data.Char as Char
import qualified Data.Set as Set
isPangram :: String -> Bool
=
isPangram 26==)
(. Set.size
. foldr sieve Set.empty
where
= -- lch : lower character
isLowerAlhpa26 lch 'a' <= lch && lch <= 'z'
=
sieve c acc let lc = Char.toLower c
in
if isLowerAlhpa26 lc then
(Set.insert lc acc)else
acc
But wait… what if the string is very long and we want to get result quicker if is decided
to be True
?
Return to basic
foldr
is not designed to quit earlier.
but if we are using the normal List
, we can do something with lazy evaluation.
module Pangram (isPangram) where
import qualified Data.Char as Char
isPangram :: String -> Bool
=
isPangram 26 ==)
(. length
. accPangram' ""
where
= []
accPangram' acc [] :cs) =
accPangram' acc (clet lc = Char.toLower c
in
if isLowerAlhpa26 lc
&& (not . elem lc $ acc) then
if length acc >= 25 then -- quit earlier if possible!
True]
[else
True : accPangram' (lc:acc) cs
else
accPangram' acc cs
=
isLowerAlhpa26 lch 'a' <= lch && lch <= 'z'
In this example, I use the some edge cases by checking the number of list is equal or more than 26, to exit earlier. Another approach is using take 26.
-- .. snip ..
26 ==)
(. length
. take 26 -- take only 26 from the beginning
. accPangram' ""
where
= []
accPangram' acc [] :cs) =
accPangram' acc (clet lc = Char.toLower c
in
if isLowerAlhpa26 lc
&& (not . elem lc $ acc) then
-- note : we don't need to check here anymore.
--if length acc >= 25 then -- quit earlier if possible!
--[True]
--else
True : accPangram' (lc:acc) cs
-- .. snip ..
This is another kind of design pattern we should consider when you develop in lazy evaluation. i.e: think about that we could chain the function in context of quit earlier if applicable.
More options
We can achieve similar effect by using library function like unfoldr
.
Or sometimes, we can go for foldM
(but not in this task.)
Rust version
lower case is also compliated in rust
Rust also concerns about unicode, to_lowercase() looks like change a char into another char at first time. but the result could be serveral chars. we can check out the example in above to_lowercase() link.
So, what we want to here is to_ascii_lowercase()!
A Solution with Storing in a String
the total list in result is only 26 members in maximum, so it is safe to say we don’t need to dig for HashSet here.
pub fn is_pangram(sentence: &str) -> bool {
let collect_as_lowercases = |mut acc: String, c: char| {
if c.is_ascii_alphabetic() {
let uc = c.to_ascii_lowercase();
if !acc.contains(uc) {
.push(uc)
acc}
}
acc};
sentence.chars()
.fold(String::new(), collect_as_lowercases)
.len()
== 26
}
The basic idea is the same as haskell version. I didn’t go for early exit when
the mission is fulfiled already. I haven’t checked enough, but at least, we
can go with normal for
loop.
I tried below code but has no luck yet. (which result is the same as haskell)
// .. snip ..
sentence.chars()
.fold(Vec::<char>::new(), collect_as_lowercases)
.into_iter()
.take(26) // please take 26 only!!
.count()
== 26
}
Another Solution wih HashSet
It is generally great idea to use general function or library as many as you can. You can even feel free to use or compare between your own solution and one with general function. So the following code is with Hashset:
use std::collections::HashSet; // required to use it as `HashSet`
pub fn is_pangram(sentence: &str) -> bool {
let collect_as_lowercases = |mut acc: HashSet<char>, c: char| {
if c.is_ascii_alphabetic() {
let lc = c.to_ascii_lowercase();
// basically same as push() except that we don't need to check
// existence by ourselves. neat!
.insert(lc);
acc}
acc};
sentence.chars()
.fold(HashSet::<char>::new(), collect_as_lowercases)
.len()
== 26
}
One more solution using collect()
I found this solution after mine.
use std::collections::HashSet;
use std::ascii::AsciiExt;
pub fn is_pangram(sentence: &str) -> bool
{
sentence.to_lowercase()
.chars()
.filter(|&c| c.is_ascii() && c.is_alphabetic())
.collect::<HashSet<char>>()
.len() == 26
}
This is very clear solution. my .to_lowercase is applied to each char
. but this solution
apply the function with same name on the &str
, Moreorver, magic .collect()
function looks
interesting, I’d like to dig more about later sometime.
Wrapping Up _정리하기
- Procedule programming could be transformed into functional programming especially when the task is related into data transformation.
- Breaking early is sometimes tricky for haskell beginner like me. We need to figure out how to do it by making custom function. or Monad Control in haskell (I hope I could learn more and talk about it.)
. Use general library is good idea for robust programming even though it is worth to try to make your own.
Little More Information About foldr
Note:
foldr,
sometimes, could be finshed earlier in conjunction withtake
. When the first element could be determined right away, which doesn’t require to access full accumlator.
> foldr (:) [] [1,2,3,4,5]
λ1,2,3,4,5]
[= take 10 $ foldr (:) [] [1,2..]
λ1,2,3,4,5,6,7,8,9,10]
[> λ
This behaviour achieved with only foldr
, not foldl
. This is beyond the subject of
the article. I hope I could write about this later.
> take 10 $ foldl (flip (:)) [] [1,2..]
λ-- which runs forever.
-- So, I stopped by typing Ctrl-C
^CInterrupted.
Thank you for reading!!
If you need some classic music, I would recommend Yunchan Lim 임윤찬 – RACHMANINOV Piano Concerto No. 3 in D Minor, op. 30 – 2022 Cliburn Competition. Which was quite phenomenal performance for me even though I don’t know about the classic music.
I would say, connection between all of us is most important thing..