How to use Markdown and Pygments in Python to enhance the formatting of your content?

Published: March 21, 2024

Tags: Python; Pygments; Markdown;


Introduction

To enhance the formatting of your content using Markdown and Pygments in Python, you need to understand how each of them works and how they can be combined. Markdown is a lightweight markup language for formatting plain text, while Pygments is a Python syntax-highlighting library that colors code in a wide range of languages.

This tutorial demonstrates how to combine the two tools to format content.

See also the previous article: How to use the Python module Pygments for code highlighting?

Case study

Let's consider this scenario: you have user-generated content written in Markdown syntax, with code lines marked by a four-space indentation:

content = """
## Introduction

Tutoral on how to ...

## Import necessary libraries

    import pandas as pd  # Importing pandas library for data manipulation
    import numpy as np   # Importing numpy library for numerical computations

Set random seed for reproducibility

    np.random.seed(42)

Generate random integer data in the range from -10 to 10 with a shape of 4x3

    data = np.random.randint(-10, 10, size=(4, 3))

Create a pandas DataFrame using the generated random data, with column names 'A', 'B', and 'C'

    df = pd.DataFrame(data, columns=['A', 'B', 'C'])

Print the DataFrame

    print(df)"""

Converting the Markdown content to HTML

The next step is to convert the Markdown content to HTML, with support for a table of contents (TOC) and line breaks. Here is a breakdown of what the code does:

import markdown

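# Note: safe_mode was removed in Python-Markdown 3.0, so it is not passed here.
# Untrusted input should be sanitized separately (for example, with an HTML sanitizer).
md = markdown.Markdown(extensions=['markdown.extensions.extra','markdown.extensions.toc','markdown.extensions.nl2br'])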

A markdown.Markdown object (md) is created with specific extensions enabled:

  • markdown.extensions.extra: Enables additional Markdown features such as tables, fenced code blocks, and footnotes.
  • markdown.extensions.toc: Generates a table of contents based on headings and adds an id attribute to each heading.
  • markdown.extensions.nl2br: Converts newlines inside paragraphs to HTML line breaks.
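
To see what these extensions produce, here is a minimal sketch on a short sample string. The sample text and the variable names are illustrative and not part of the original content; the sketch assumes the toc extension exposes the generated table of contents through the toc attribute of the Markdown object after conversion.

```python
import markdown

# Same three extensions as in the tutorial.
md_demo = markdown.Markdown(extensions=[
    'markdown.extensions.extra',
    'markdown.extensions.toc',
    'markdown.extensions.nl2br',
])

# Illustrative sample: two headings and a paragraph containing a single newline.
sample = "## First heading\n\nline one\nline two\n\n## Second heading\n"

html_out = md_demo.convert(sample)

# toc: each heading receives an id, and the TOC is available as HTML.
toc_html = md_demo.toc

print(html_out)
print(toc_html)
```

The printed HTML should show an id on each heading (from toc) and a line break between "line one" and "line two" (from nl2br).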

The content string is converted to HTML using md.convert:

content_formatted = md.convert(content)

The resulting HTML (content_formatted) is then printed:

print(content_formatted)

The output will be:

<h2 id="introduction">Introduction</h2>
<p>Tutoral on how to ...</p>
<h2 id="import-necessary-libraries">Import necessary libraries</h2>
<pre><code>import pandas as pd  # Importing pandas library for data manipulation
import numpy as np   # Importing numpy library for numerical computations
</code></pre>
<p>Set random seed for reproducibility</p>
<pre><code>np.random.seed(42)
</code></pre>
<p>Generate random integer data in the range from -10 to 10 with a shape of 4x3</p>
<pre><code>data = np.random.randint(-10, 10, size=(4, 3))
</code></pre>
<p>Create a pandas DataFrame using the generated random data, with column names 'A', 'B', and 'C'</p>
<pre><code>df = pd.DataFrame(data, columns=['A', 'B', 'C'])
</code></pre>
<p>Print the DataFrame</p>
<pre><code>print(df)
</code></pre>

Utilizing Pygments for highlighting code snippets

First, import Python's re module and locate all code sections in the HTML produced by Markdown:

import re

for code_section in re.findall(r'<pre><code>[\s\S]*?</code></pre>', content_formatted):

    print(code_section)
    print('')

The output will be:

<pre><code>import pandas as pd  # Importing pandas library for data manipulation
import numpy as np   # Importing numpy library for numerical computations
</code></pre>

<pre><code>np.random.seed(42)
</code></pre>

<pre><code>data = np.random.randint(-10, 10, size=(4, 3))
</code></pre>

<pre><code>df = pd.DataFrame(data, columns=['A', 'B', 'C'])
</code></pre>

<pre><code>print(df)
</code></pre>
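
The matches above still include the surrounding tags, and Markdown escapes characters such as < and & inside code blocks. The following is a minimal sketch of how the raw source could be recovered with a capture group and html.unescape. The helper name extract_code_blocks and the sample string are hypothetical and only serve as an illustration.

```python
import html
import re

# One capture group holds the text between the opening and closing tags.
CODE_BLOCK_RE = re.compile(r'<pre><code>([\s\S]*?)</code></pre>')


def extract_code_blocks(html_text):
    """Return the unescaped source of every <pre><code> block (hypothetical helper)."""
    return [html.unescape(source) for source in CODE_BLOCK_RE.findall(html_text)]


# Hypothetical sample: "<" appears as "&lt;" inside the code block.
sample_html = (
    '<p>Compare two values</p>\n'
    '<pre><code>if a &lt; b:\n'
    '    print(a)\n'
    '</code></pre>'
)

blocks = extract_code_blocks(sample_html)
print(blocks[0])
```

Unescaping before highlighting matters because the lexer should see the original characters rather than HTML entities.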

Now we can write code that highlights the snippets contained in the HTML pre and code blocks using the Pygments library (see How to use the Python module Pygments for code highlighting?):

import html
import re

from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name

# The lexer and formatter do not change between blocks, so create them once
lexer = get_lexer_by_name("python", stripall=True)
formatter = HtmlFormatter(linenos=True, cssclass="github-dark", style='default')

for code_section in re.findall(r'<pre><code>[\s\S]*?</code></pre>', content_formatted):

    # Remove the surrounding tags to keep only the code itself
    new_code_section = code_section.replace('<pre><code>', '')
    new_code_section = new_code_section.replace('</code></pre>', '')

    # Markdown escaped characters such as < and & in the code; restore them
    new_code_section = html.unescape(new_code_section)

    new_code_section_highlight = highlight(new_code_section, lexer, formatter)

    # Swap the plain block for its highlighted version
    content_formatted = content_formatted.replace(code_section, new_code_section_highlight)

print(content_formatted)

The output will be:

<h2 id="introduction">Introduction</h2>
<p>Tutoral on how to ...</p>
<h2 id="import-necessary-libraries">Import necessary libraries</h2>
<div class="github-dark"><table class="github-darktable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>  <span class="c1"># Importing pandas library for data manipulation</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>   <span class="c1"># Importing numpy library for numerical computations</span>
</pre></div></td></tr></table></div>

<p>Set random seed for reproducibility</p>
<div class="github-dark"><table class="github-darktable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span>
</pre></div></td></tr></table></div>

<p>Generate random integer data in the range from -10 to 10 with a shape of 4x3</p>
<div class="github-dark"><table class="github-darktable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">data</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="o">-</span><span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span>
</pre></div></td></tr></table></div>

<p>Create a pandas DataFrame using the generated random data, with column names 'A', 'B', and 'C'</p>
<div class="github-dark"><table class="github-darktable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;A&#39;</span><span class="p">,</span> <span class="s1">&#39;B&#39;</span><span class="p">,</span> <span class="s1">&#39;C&#39;</span><span class="p">])</span>
</pre></div></td></tr></table></div>

<p>Print the DataFrame</p>
<div class="github-dark"><table class="github-darktable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="n">df</span><span class="p">)</span>
</pre></div></td></tr></table></div>
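
The loop above can also be written as a single re.sub call with a callback, which avoids searching the document again for every block. This is a minimal sketch rather than the author's method: the helper name highlight_code_blocks, its parameters, and the sample string are hypothetical.

```python
import html
import re

from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name

# One capture group holds the code between the tags.
CODE_BLOCK_RE = re.compile(r'<pre><code>([\s\S]*?)</code></pre>')


def highlight_code_blocks(html_text, language="python", css_class="github-dark"):
    """Replace each <pre><code> block with Pygments output (hypothetical helper)."""
    lexer = get_lexer_by_name(language)
    formatter = HtmlFormatter(linenos=True, cssclass=css_class)

    def replace_block(match):
        # Undo Markdown's HTML escaping before handing the code to the lexer.
        source = html.unescape(match.group(1))
        return highlight(source, lexer, formatter)

    # The string returned by the callback is inserted as-is.
    return CODE_BLOCK_RE.sub(replace_block, html_text)


# Hypothetical sample input.
sample_html = '<p>Print a value</p>\n<pre><code>print(42)\n</code></pre>'
highlighted = highlight_code_blocks(sample_html)
print(highlighted)
```

In the tutorial's setting, the call would be highlight_code_blocks(content_formatted), applied to the HTML returned by md.convert.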

Creating a simple HTML page

We are now able to generate a simple HTML page:

<!DOCTYPE html>
<html>

<style>

  pre { line-height: 125%; }
  td.linenos .normal { color: #6e7681; background-color: #0d1117; padding-left: 5px; padding-right: 5px; }
  span.linenos { color: #6e7681; background-color: #0d1117; padding-left: 5px; padding-right: 5px; }
  td.linenos .special { color: #e6edf3; background-color: #6e7681; padding-left: 5px; padding-right: 5px; }
  span.linenos.special { color: #e6edf3; background-color: #6e7681; padding-left: 5px; padding-right: 5px; }
  .github-dark .hll { background-color: #6e7681 }
  .github-dark { background: #0d1117; color: #e6edf3 }
  .github-dark .c { color: #8b949e; font-style: italic } /* Comment */
  .github-dark .err { color: #f85149 } /* Error */
  .github-dark .esc { color: #e6edf3 } /* Escape */
  .github-dark .g { color: #e6edf3 } /* Generic */
  .github-dark .k { color: #ff7b72 } /* Keyword */
  .github-dark .l { color: #a5d6ff } /* Literal */
  .github-dark .n { color: #e6edf3 } /* Name */
  .github-dark .o { color: #ff7b72; font-weight: bold } /* Operator */
  .github-dark .x { color: #e6edf3 } /* Other */
  .github-dark .p { color: #e6edf3 } /* Punctuation */
  .github-dark .ch { color: #8b949e; font-style: italic } /* Comment.Hashbang */
  .github-dark .cm { color: #8b949e; font-style: italic } /* Comment.Multiline */
  .github-dark .cp { color: #8b949e; font-weight: bold; font-style: italic } /* Comment.Preproc */
  .github-dark .cpf { color: #8b949e; font-style: italic } /* Comment.PreprocFile */
  .github-dark .c1 { color: #8b949e; font-style: italic } /* Comment.Single */
  .github-dark .cs { color: #8b949e; font-weight: bold; font-style: italic } /* Comment.Special */
  .github-dark .gd { color: #ffa198; background-color: #490202 } /* Generic.Deleted */
  .github-dark .ge { color: #e6edf3; font-style: italic } /* Generic.Emph */
  .github-dark .ges { color: #e6edf3; font-weight: bold; font-style: italic } /* Generic.EmphStrong */
  .github-dark .gr { color: #ffa198 } /* Generic.Error */
  .github-dark .gh { color: #79c0ff; font-weight: bold } /* Generic.Heading */
  .github-dark .gi { color: #56d364; background-color: #0f5323 } /* Generic.Inserted */
  .github-dark .go { color: #8b949e } /* Generic.Output */
  .github-dark .gp { color: #8b949e } /* Generic.Prompt */
  .github-dark .gs { color: #e6edf3; font-weight: bold } /* Generic.Strong */
  .github-dark .gu { color: #79c0ff } /* Generic.Subheading */
  .github-dark .gt { color: #ff7b72 } /* Generic.Traceback */
  .github-dark .g-Underline { color: #e6edf3; text-decoration: underline } /* Generic.Underline */
  .github-dark .kc { color: #79c0ff } /* Keyword.Constant */
  .github-dark .kd { color: #ff7b72 } /* Keyword.Declaration */
  .github-dark .kn { color: #ff7b72 } /* Keyword.Namespace */
  .github-dark .kp { color: #79c0ff } /* Keyword.Pseudo */
  .github-dark .kr { color: #ff7b72 } /* Keyword.Reserved */
  .github-dark .kt { color: #ff7b72 } /* Keyword.Type */
  .github-dark .ld { color: #79c0ff } /* Literal.Date */
  .github-dark .m { color: #a5d6ff } /* Literal.Number */
  .github-dark .s { color: #a5d6ff } /* Literal.String */
  .github-dark .na { color: #e6edf3 } /* Name.Attribute */
  .github-dark .nb { color: #e6edf3 } /* Name.Builtin */
  .github-dark .nc { color: #f0883e; font-weight: bold } /* Name.Class */
  .github-dark .no { color: #79c0ff; font-weight: bold } /* Name.Constant */
  .github-dark .nd { color: #d2a8ff; font-weight: bold } /* Name.Decorator */
  .github-dark .ni { color: #ffa657 } /* Name.Entity */
  .github-dark .ne { color: #f0883e; font-weight: bold } /* Name.Exception */
  .github-dark .nf { color: #d2a8ff; font-weight: bold } /* Name.Function */
  .github-dark .nl { color: #79c0ff; font-weight: bold } /* Name.Label */
  .github-dark .nn { color: #ff7b72 } /* Name.Namespace */
  .github-dark .nx { color: #e6edf3 } /* Name.Other */
  .github-dark .py { color: #79c0ff } /* Name.Property */
  .github-dark .nt { color: #7ee787 } /* Name.Tag */
  .github-dark .nv { color: #79c0ff } /* Name.Variable */
  .github-dark .ow { color: #ff7b72; font-weight: bold } /* Operator.Word */
  .github-dark .pm { color: #e6edf3 } /* Punctuation.Marker */
  .github-dark .w { color: #6e7681 } /* Text.Whitespace */
  .github-dark .mb { color: #a5d6ff } /* Literal.Number.Bin */
  .github-dark .mf { color: #a5d6ff } /* Literal.Number.Float */
  .github-dark .mh { color: #a5d6ff } /* Literal.Number.Hex */
  .github-dark .mi { color: #a5d6ff } /* Literal.Number.Integer */
  .github-dark .mo { color: #a5d6ff } /* Literal.Number.Oct */
  .github-dark .sa { color: #79c0ff } /* Literal.String.Affix */
  .github-dark .sb { color: #a5d6ff } /* Literal.String.Backtick */
  .github-dark .sc { color: #a5d6ff } /* Literal.String.Char */
  .github-dark .dl { color: #79c0ff } /* Literal.String.Delimiter */
  .github-dark .sd { color: #a5d6ff } /* Literal.String.Doc */
  .github-dark .s2 { color: #a5d6ff } /* Literal.String.Double */
  .github-dark .se { color: #79c0ff } /* Literal.String.Escape */
  .github-dark .sh { color: #79c0ff } /* Literal.String.Heredoc */
  .github-dark .si { color: #a5d6ff } /* Literal.String.Interpol */
  .github-dark .sx { color: #a5d6ff } /* Literal.String.Other */
  .github-dark .sr { color: #79c0ff } /* Literal.String.Regex */
  .github-dark .s1 { color: #a5d6ff } /* Literal.String.Single */
  .github-dark .ss { color: #a5d6ff } /* Literal.String.Symbol */
  .github-dark .bp { color: #e6edf3 } /* Name.Builtin.Pseudo */
  .github-dark .fm { color: #d2a8ff; font-weight: bold } /* Name.Function.Magic */
  .github-dark .vc { color: #79c0ff } /* Name.Variable.Class */
  .github-dark .vg { color: #79c0ff } /* Name.Variable.Global */
  .github-dark .vi { color: #79c0ff } /* Name.Variable.Instance */
  .github-dark .vm { color: #79c0ff } /* Name.Variable.Magic */
  .github-dark .il { color: #a5d6ff } /* Literal.Number.Integer.Long */

</style>

<body>

<h2 id="introduction">Introduction</h2>
<p>Tutoral on how to ...</p>
<h2 id="import-necessary-libraries">Import necessary libraries</h2>
<div class="github-dark"><table class="github-darktable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
<span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>  <span class="c1"># Importing pandas library for data manipulation</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>   <span class="c1"># Importing numpy library for numerical computations</span>
</pre></div></td></tr></table></div>

<p>Set random seed for reproducibility</p>
<div class="github-dark"><table class="github-darktable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span>
</pre></div></td></tr></table></div>

<p>Generate random integer data in the range from -10 to 10 with a shape of 4x3</p>
<div class="github-dark"><table class="github-darktable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">data</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="o">-</span><span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span>
</pre></div></td></tr></table></div>

<p>Create a pandas DataFrame using the generated random data, with column names 'A', 'B', and 'C'</p>
<div class="github-dark"><table class="github-darktable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;A&#39;</span><span class="p">,</span> <span class="s1">&#39;B&#39;</span><span class="p">,</span> <span class="s1">&#39;C&#39;</span><span class="p">])</span>
</pre></div></td></tr></table></div>

<p>Print the DataFrame</p>
<div class="github-dark"><table class="github-darktable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="n">df</span><span class="p">)</span>
</pre></div></td></tr></table></div>

</body>

</html>
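
The stylesheet in the page above does not have to be written by hand, since Pygments can generate it. The following is a minimal sketch, assuming the colors come from the Pygments style named github-dark (an assumption based on the class name and the colors shown). The helper name build_page and its parameters are hypothetical.

```python
from pygments.formatters import HtmlFormatter
from pygments.util import ClassNotFound


def build_page(body_html, css_class="github-dark", style_name="github-dark"):
    """Wrap highlighted HTML in a minimal page with Pygments CSS (hypothetical helper)."""
    try:
        formatter = HtmlFormatter(style=style_name, cssclass=css_class)
    except ClassNotFound:
        # Assumption: fall back to the default style when the installed
        # Pygments version does not provide the requested one.
        formatter = HtmlFormatter(style="default", cssclass=css_class)

    # get_style_defs() returns CSS rules scoped to the given selector.
    css = formatter.get_style_defs("." + css_class)

    return (
        "<!DOCTYPE html>\n<html>\n<head>\n<style>\n"
        + css
        + "\n</style>\n</head>\n<body>\n"
        + body_html
        + "\n</body>\n</html>\n"
    )


# Hypothetical body; in the tutorial this would be content_formatted.
page = build_page("<p>Hello</p>")
print(page[:15])
```

The returned string can then be written to a file, for example with open("page.html", "w", encoding="utf-8"), and opened in a browser.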


Conclusion

When combining Markdown and Pygments, typically you would use Pygments to generate the HTML for your code snippets separately and then insert that HTML into your Markdown document. Note, however, that not all Markdown renderers support raw HTML, or there may be some restrictions when mixing HTML and Markdown, so make sure to test your output.

Additionally, if you are generating documentation or web pages, consider static site generators that combine Markdown with syntax highlighting, such as MkDocs or Jekyll, which simplify the process.

By combining Markdown's simplicity with Pygments' powerful syntax highlighting, you can create richly formatted, visually appealing content that makes your text and code snippets both readable and engaging.

References

  • Introduction and Quickstart (pygments.org)
  • Markdown (python-markdown.github.io)
