MJ's kind weird Code - Exercism - Haskell, Rust

Posted on July 1, 2022 by Myoungjin Jeon

Tags: pangram, haskell, rust, functional programming, fp, fold, unicode, library

Introductions

Determine if a sentence is a pangram. A pangram (Greek: παν γράμμα, pan gramma, “every letter”) is a sentence using every letter of the alphabet at least once. The best known English pangram is:

The quick borwn fox jumps over the lazy dog.

This is something we can see when choosing a font. This makes sense as we need to see as many alphabets as we can to see the combinations!

My strategy

There are a lot of ways is checking. and this is my stategy:

change the each character into lower case

if is alphabet (strictly speaking, ascii alphabet) put into the accumlator. (but do not insert more than once when duplicated)

when finished to the list count and check it is 26. (a-z)

procedural programming into functional programming

If we look at the procedure, we will realize that we can do them one by one, and which is easier to be made of a chain of functions.

We could see the each step involving data transformation, Which is great because:

We could devide our logic into smaller and maintainable functions.

Maintainable function means, we can write the unit tests per function easier.

If the each function is general, we could re-use later. (but making a function too general is always an evil, IMHO)

However, I found something unexpected to check a character whether it is lower case or not.

isLower is more complicated than I thought

Because, there are a lot of alphabets including basic english (a-z) in the world, but our function should only care about english alphabets, which means: Ã must be ignored in this test. and isLower in Data.Char module cares about non-ascii alphabet as well (check all unicode characters), we need to check it is ascii as well.

check it is amazaon (a-z)

As Char data type is an Ord in haskell. we can compare the characters by using arithmetic comparison:

λ> 'a' < 'b'
True
λ> 'Z' < 'a'
True

General solution by using set

Data.Set is a general module to handle the set of data. We can ensure that te member in the set always show once by using library. we can safetly try to add more than once, and we will count the number of kind of members later.

Some people would say this is not efficient solution Because it only has 26 cases in total. In other words, We could do use bit (binary) artihmetic which is memory effiecient and fast. I agree that it is worth to try. But we are unable to do it in general cases. So I’ll go for more general case:

module Pangram (isPangram) where

import qualified Data.Char as Char
import qualified Data.Set as Set

isPangram :: String -> Bool
isPangram =
  (26==)
  . Set.size
  . foldr sieve Set.empty

  where
    isLowerAlhpa26 lch = -- lch : lower character
        'a' <= lch && lch <= 'z'

    sieve c acc =
      let lc = Char.toLower c
      in
        if isLowerAlhpa26 lc then
          (Set.insert lc acc)
        else
          acc

But wait… what if the string is very long and we want to get result quicker if is decided to be True?

Return to basic

foldr is not designed to quit earlier. but if we are using the normal List, we can do something with lazy evaluation.

module Pangram (isPangram) where

import qualified Data.Char as Char

isPangram :: String -> Bool
isPangram =
  (26 ==)
    . length
    . accPangram' ""

    where
      accPangram' acc [] = []
      accPangram' acc (c:cs) =
        let lc = Char.toLower c
        in
          if isLowerAlhpa26 lc
             && (not . elem lc $ acc) then
            if length acc >= 25 then -- quit earlier if possible!
              [True]
            else
              True : accPangram' (lc:acc) cs
          else
            accPangram' acc cs

      isLowerAlhpa26 lch =
          'a' <= lch && lch <= 'z'

In this example, I use the some edge cases by checking the number of list is equal or more than 26, to exit earlier. Another approach is using take 26.

-- .. snip ..
(26 ==)
  . length
  . take 26 -- take only 26 from the beginning
  . accPangram' ""

  where
    accPangram' acc [] = []
    accPangram' acc (c:cs) =
      let lc = Char.toLower c
      in
        if isLowerAlhpa26 lc
           && (not . elem lc $ acc) then
          -- note : we don't need to check here anymore.
          --if length acc >= 25 then -- quit earlier if possible!
          --[True]
          --else
            True : accPangram' (lc:acc) cs
 -- .. snip ..

This is another kind of design pattern we should consider when you develop in lazy evaluation. i.e: think about that we could chain the function in context of quit earlier if applicable.

More options

We can achieve similar effect by using library function like unfoldr. Or sometimes, we can go for foldM (but not in this task.)

Rust version

lower case is also compliated in rust

Rust also concerns about unicode, to_lowercase() looks like change a char into another char at first time. but the result could be serveral chars. we can check out the example in above to_lowercase() link.

So, what we want to here is to_ascii_lowercase()!

A Solution with Storing in a String

the total list in result is only 26 members in maximum, so it is safe to say we don’t need to dig for HashSet here.

pub fn is_pangram(sentence: &str) -> bool {
    let collect_as_lowercases = |mut acc: String, c: char| {
        if c.is_ascii_alphabetic() {
            let uc = c.to_ascii_lowercase();
            if !acc.contains(uc) {
                acc.push(uc)
            }
        }
        acc
    };

    sentence
        .chars()
        .fold(String::new(), collect_as_lowercases)
        .len()
        == 26
}

The basic idea is the same as haskell version. I didn’t go for early exit when the mission is fulfiled already. I haven’t checked enough, but at least, we can go with normal for loop.

I tried below code but has no luck yet. (which result is the same as haskell)

// .. snip ..

    sentence
        .chars()
        .fold(Vec::<char>::new(), collect_as_lowercases)
        .into_iter()
        .take(26) // please take 26 only!!
        .count()
        == 26
}

Another Solution wih HashSet

It is generally great idea to use general function or library as many as you can. You can even feel free to use or compare between your own solution and one with general function. So the following code is with Hashset:

use std::collections::HashSet; // required to use it as  `HashSet`

pub fn is_pangram(sentence: &str) -> bool {
    let collect_as_lowercases = |mut acc: HashSet<char>, c: char| {
        if c.is_ascii_alphabetic() {
            let lc = c.to_ascii_lowercase();
            // basically same as push() except that we don't need to check

            // existence by ourselves. neat!
            acc.insert(lc);
        }
        acc
    };

    sentence
        .chars()
        .fold(HashSet::<char>::new(), collect_as_lowercases)
        .len()
        == 26
}

One more solution using collect()

I found this solution after mine.

use std::collections::HashSet;
use std::ascii::AsciiExt;

pub fn is_pangram(sentence: &str) -> bool
{
    sentence
        .to_lowercase()
        .chars()
        .filter(|&c| c.is_ascii() && c.is_alphabetic())
        .collect::<HashSet<char>>()
        .len() == 26
}

This is very clear solution. my .to_lowercase is applied to each char. but this solution apply the function with same name on the &str, Moreorver, magic .collect() function looks interesting, I’d like to dig more about later sometime.

Wrapping Up _정리하기

Procedule programming could be transformed into functional programming especially when the task is related into data transformation.
Breaking early is sometimes tricky for haskell beginner like me. We need to figure out how to do it by making custom function. or Monad Control in haskell (I hope I could learn more and talk about it.)

. Use general library is good idea for robust programming even though it is worth to try to make your own.

Little More Information About foldr

Note: foldr, sometimes, could be finshed earlier in conjunction with take. When the first element could be determined right away, which doesn’t require to access full accumlator.

λ> foldr (:) [] [1,2,3,4,5]
[1,2,3,4,5]
λ= take 10 $ foldr (:) [] [1,2..]
[1,2,3,4,5,6,7,8,9,10]
λ>

This behaviour achieved with only foldr, not foldl. This is beyond the subject of the article. I hope I could write about this later.

λ> take 10 $ foldl (flip (:)) [] [1,2..]
-- which runs forever.
-- So, I stopped by typing Ctrl-C
^CInterrupted.

Thank you for reading!!

If you need some classic music, I would recommend Yunchan Lim 임윤찬 – RACHMANINOV Piano Concerto No. 3 in D Minor, op. 30 – 2022 Cliburn Competition. Which was quite phenomenal performance for me even though I don’t know about the classic music.

I would say, connection between all of us is most important thing..