๐Ÿ“ฐ Poem Text Converter for Latex

Posted on March 4, 2024 by Myoungjin Jeon
Tags: literate haskell, haskell, lhs, latex, parser

What Does This Do?

This simple program converts sections of poetry written in text into LaTeX code compatible with the poemscol package. Itโ€™s important to note that it only converts the poetry sections, so youโ€™ll still need to write other LaTeX commands manually.

Why I made this?

Manually converting text to LaTeX is tedious, so Iโ€™m looking for a converter to streamline the process. Interestingly, I tried to adding a command within ChatGPT4 to convert my text into poemscol syntax, but it was too slow and became slower overtime. Recognizing that the task wasnโ€™t complex, I decided to tackle it myself using my favorite language (one Iโ€™m also eager to learn more about).

{-# LANGUAGE OverloadedStrings #-}

module Main (main) where

import System.Environment               as ENV

This program may not be large, but using Data.Text is a good idea for handling text. I believe this answer provides a good explanation for this approach. and this is a summary from ChatGPT v4

Data.Text is more space-efficient than Haskell's native String, which is a linked list of Chars with high space overhead. It is also more performant, providing better memory locality and interfacing more efficiently with native system libraries, especially for IO-heavy programs. Additionally, Data.Text offers text-specific functions not available with native Strings.

So, Data.Text, Data.Text.IO is imported.

import qualified Data.Text              as T
import qualified Data.Text.IO           as T
import Data.Text (Text)

Additionally, there are a few more modules to handle syntax. I prefer to use (>>>) instead of . for longer statements, especially when the code consists of more than four steps. This is because itโ€™s easier to follow than backward function combination when using ., and I use (<&>) for similar reasons (forward function composition).

import Data.Either (isRight)
import Control.Arrow ((>>>))
import Data.Functor ((<&>))
Generally speaking, translating some values into English words can provide a better understanding when reading the code. And this approach offers better control over options when retrieving values from the command line input by a user. โ€œSo, Iโ€™d like to recommend the following lines:โ€
beginTitle :: Text
beginTitle = "\\sequencetitle{"

endTitle :: Text
endTitle = "}\n\\poemtitle{}"

beginStanza :: Text
beginStanza = "\\begin{stanza}"

endStanza :: Text
endStanza = "\\end{stanza}"

beginPoem :: Text
beginPoem = "\\begin{poem}"

endPoem :: Text
endPoem = "\\end{poem}"

middleEOLSurfix :: Text
middleEOLSurfix = "\\verseline"

aIndent :: Text
aIndent = " "

-- ind: indent
ind :: Int -> Text
ind = (flip T.replicate) aIndent

The title will be retrieved from the command line, while the rest of the poem will be provided through STDIN. So firtly, I made getArgs for Data.Text version.

getArgsText :: IO [Text]
getArgsText = map T.pack <$> ENV.getArgs

It is always a good idea for a program to have a help message. Even if you are the person who created the program, you could forget how to use it and have to open the code again, LOL.

parseOpts :: [Text] -> IO (Either Text Text)
parseOpts ["-h"] = do
  progName <- T.pack <$> ENV.getProgName
  return $ Left $ "Usage: " <> progName <> " [OPTION] [A Poem Title]" <> "\n"
    <> "Return a poem structured for a LaTeX package, `poemscol'" <> "\n"
    <> "Read text data from STDIN." <> "\n\n"
    <> "-h        show this message." <> "\n\n"
    <> ":^]\n"

While it may not be the best design, the title will be retrieved from the entire command line arguments. The advantage is that you donโ€™t need to use quotes around the title.

parseOpts ts =
  return $ case (mkt ts) of "" -> Right ""
                            tt -> Right $ (addtex tt) <> "\n\n"
  where
    mkt    = T.unwords
    addtex = (beginTitle <>) . (flip (<>)) endTitle

And the poem text will be read from the STDIN! `parseContents will take previously parsed data, which is the poem title, and combine it with the parsed poem text.โ€

parseContents :: (Either Text Text) -> IO (Either Text Text)
parseContents ei =
  if isRight ei then
    do
      pb <- parseBody
      return . ((<> pb) <$>) $ ei
  else
    return ei

โ€œparseBody is the main part of the program, which:

  1. Groups by stanza.
  2. Adds a special command for each line (which is โ€œโ€) except for the last line of a stanza.
  3. Adds syntax for the stanza.
  4. Adds indentation to each line for better readability.
  where
    parseBody = T.getContents <&>
                (     T.splitOn "\n\n"  -- Group by stanza
                  >>> map T.lines       -- and then divide into lines
                                        -- within each group, NB: *map*
                  >>> map foldLinesWithTex
                  >>> foldStanzasWithTex
                )
    sil = 1 -- stanza indent level
    lil = sil + 1 -- line indent level
T.intercalate works very similarly to the general join function in most programming languages. This perfectly fits my need to ensure that the last line doesnโ€™t get an extra \verseline suffix.
    -- Add "\verseline" to the end of each line except the last line of a stanza.
    foldLinesWithTex = T.intercalate (middleEOLSurfix <> "\n" <> (ind lil))

Unsurprisingly, foldr is used for folding. โ˜บ๏ธ

    -- Wrap each stanza in a stanza structure with indentation.
    foldStanzasWithTex =
      foldr (\n acc ->
                if T.null n then
                  acc
                else
                  (ind sil) <> beginStanza <> "\n"
                  <> (ind lil) <> n <> "\n"
                  <> (ind sil) <> endStanza <> "\n"
                  <> acc
            ) ""

In the main function, combine that with the (>>=) operator. The last block of code will print out the result using T.putStr(Data.Text.putStr). In this case, I only have two scenarios for Either handling, but in both cases, Iโ€™ll print out a help message for Left or the result for Right.

main :: IO ()
main = getArgsText >>= parseOpts >>= parseContents >>=
  (\x -> case x of
      Left l -> T.putStr l -- this will be help message
      Right r -> T.putStr (beginPoem <> "\n" <> r <> endPoem <> "\n") )

Any Possible Improvements?

I could handle an โ€˜empty titleโ€™ as a Warning, but I donโ€™t feel itโ€™s necessary here. If it were a warning instead of Right โ€œโ€œ, I would need to handle the previous result differently in parseContents to check for any fixable errors that come in. Additionally, if I need to make changes, the Either Text Text data type is not sufficient to handle them correctly. Perhaps Either SomeErrorHandlingDataType Text or Either Error WarningAndParsed would be more suitable. I lean towards the second option because Iโ€™d like to parse the body even if there is no title.

Another Advantage

I could integrate this program within org-mode in Emacs, allowing me to write down the text and generate a syntaxed poem in the same place. Iโ€™ll post about this sooner or later, but before I get lazy, hereโ€™s a snippet:

**** poem
#+name: poem8
#+begin_verse
๋ˆˆ๋ฌผ๋ฐฉ์šธ ๊ฐ™์•˜๋‹ค.
ํ”ผ์›Œ๋‚ด๊ธฐ๋ณด๋‹ค
ํ„ฐ์ ธ๋‚˜์˜จ ๋“ฏํ•œ..

๋ชฉ๋ จ ๊ฝƒ๋ด‰์˜ค๋ฆฌ์˜
์šฐ์•„ํ•œ ๊ธฐ๋‹ค๋ฆผ.

์„œ์„œํžˆ ์ง„์‹ฌ์ด ํ”ผ์–ด๋‚œ๋‹ค.
๋งˆ์Œ์˜ ์—ฌ๋ฐฑ๊ณผ ๊ฐ™์€ ํ•˜์–€..

"์ง„์‹ฌ์—๋Š” ๋†’๋‚ฎ์ด๊ฐ€ ์žˆ๋Š”๊ฑธ๊นŒ."

๊ทธ ๋†’์ด๋ฅผ ๋งž์ถ”์–ด์•ผ
๋ˆˆ ์†์— ๋นจ๋ ค๋“ค์–ด์™€
๋งˆ์Œ์—๊นŒ์ง€ ๋ฐ•ํžˆ๋Š” ๊ฒƒ์ด์—ˆ๋‹ค.
#+end_verse

#+begin_src sh :stdin poem_example :results output :var title="Magnolia"
poemscol-portion-exe $title
#+end_src

And if we execute the code above, we can get a result like the following:

#+RESULTS:
#+begin_example
\begin{poem}
\sequencetitle{Magnolia}
\poemtitle{}

 \begin{stanza}
  ๋ˆˆ๋ฌผ๋ฐฉ์šธ ๊ฐ™์•˜๋‹ค.\verseline
  ํ”ผ์›Œ๋‚ด๊ธฐ๋ณด๋‹ค\verseline
  ํ„ฐ์ ธ๋‚˜์˜จ ๋“ฏํ•œ..
 \end{stanza}
 \begin{stanza}
  ๋ชฉ๋ จ ๊ฝƒ๋ด‰์˜ค๋ฆฌ์˜\verseline
  ์šฐ์•„ํ•œ ๊ธฐ๋‹ค๋ฆผ.
 \end{stanza}
 \begin{stanza}
  ์„œ์„œํžˆ ์ง„์‹ฌ์ด ํ”ผ์–ด๋‚œ๋‹ค.\verseline
  ๋งˆ์Œ์˜ ์—ฌ๋ฐฑ๊ณผ ๊ฐ™์€ ํ•˜์–€..
 \end{stanza}
 \begin{stanza}
  "์ง„์‹ฌ์—๋Š” ๋†’๋‚ฎ์ด๊ฐ€ ์žˆ๋Š”๊ฑธ๊นŒ."
 \end{stanza}
 \begin{stanza}
  ๊ทธ ๋†’์ด๋ฅผ ๋งž์ถ”์–ด์•ผ\verseline
  ๋ˆˆ ์†์— ๋นจ๋ ค๋“ค์–ด์™€\verseline
  ๋งˆ์Œ์—๊นŒ์ง€ ๋ฐ•ํžˆ๋Š” ๊ฒƒ์ด์—ˆ๋‹ค.
 \end{stanza}
\end{poem}
#+end_example

Thank you for reading ^^;