To enhance the formatting of your content using Markdown and Pygments in Python, you'll need to understand how each of them works and how they can be combined effectively. Markdown is widely used for formatting plain text to make it look richer and more visually appealing, while Pygments is a syntax highlighting library in Python that supports coloring of code for a wide range of languages.
This tutorial shows how to use the two together: Markdown to convert the text to HTML, and Pygments to highlight the code blocks it contains.
Case study
Let's consider this scenario: you have user-generated content written in Markdown syntax, with code lines marked by indentation:

    content = """
    ## Introduction

    Tutoral on how to ...

    ## Import necessary libraries

        import pandas as pd  # Importing pandas library for data manipulation
        import numpy as np  # Importing numpy library for numerical computations

    Set random seed for reproducibility

        np.random.seed(42)

    Generate random integer data in the range from -10 to 10 with a shape of 4x3

        data = np.random.randint(-10, 10, size=(4, 3))

    Create a pandas DataFrame using the generated random data, with column names 'A', 'B', and 'C'

        df = pd.DataFrame(data, columns=['A', 'B', 'C'])

    Print the DataFrame

        print(df)"""
Converting the content into formatted Markdown
The next step converts the Markdown content to HTML, with support for a table of contents (TOC) and line breaks. Here's the breakdown of what the code accomplishes:

    import markdown

    # safe_mode is omitted: it was removed in Python-Markdown 3.0
    extensions = ['markdown.extensions.extra', 'markdown.extensions.toc', 'markdown.extensions.nl2br']
    md = markdown.Markdown(extensions=extensions)
A markdown.Markdown object (md) is created with specific extensions enabled:
- markdown.extensions.extra: Enables additional Markdown features such as tables, fenced code blocks, and footnotes.
- markdown.extensions.toc: Generates a table of contents based on headings.
- markdown.extensions.nl2br: Converts newlines to HTML line breaks.
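As a minimal sketch of what two of these extensions do, the snippet below converts a small sample string (the `sample` text is made up for illustration) and reads the generated table of contents from the `toc` attribute that the toc extension sets on the `Markdown` object after conversion. It assumes a Python-Markdown 3.x installation.

```python
import markdown

# Enable only the two extensions being illustrated here
md = markdown.Markdown(extensions=[
    "markdown.extensions.toc",
    "markdown.extensions.nl2br",
])

# Hypothetical sample text: two headings and a paragraph with a single newline
sample = "## Introduction\n\nfirst line\nsecond line\n\n## Usage\n\nSome text"

html_out = md.convert(sample)
toc_html = md.toc  # HTML list of links to the headings, filled in by the toc extension

print(html_out)
print(toc_html)

# Clear per-document state before reusing the same instance for other text
md.reset()
```

The headings receive `id` attributes that the TOC links point to, and the single newline inside the paragraph becomes a line break instead of being folded into a space.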
The content string containing the Python script is converted to HTML using md.convert:

    content_formatted = md.convert(content)

The resulting HTML (content_formatted) is then printed:

    print(content_formatted)
The output will be:
    <h2 id="introduction">Introduction</h2>
    <p>Tutoral on how to ...</p>
    <h2 id="import-necessary-libraries">Import necessary libraries</h2>
    <pre><code>import pandas as pd # Importing pandas library for data manipulation
    import numpy as np # Importing numpy library for numerical computations
    </code></pre>
    <p>Set random seed for reproducibility</p>
    <pre><code>np.random.seed(42)
    </code></pre>
    <p>Generate random integer data in the range from -10 to 10 with a shape of 4x3</p>
    <pre><code>data = np.random.randint(-10, 10, size=(4, 3))
    </code></pre>
    <p>Create a pandas DataFrame using the generated random data, with column names 'A', 'B', and 'C'</p>
    <pre><code>df = pd.DataFrame(data, columns=['A', 'B', 'C'])
    </code></pre>
    <p>Print the DataFrame</p>
    <pre><code>print(df)
    </code></pre>
Utilizing Pygments for highlighting code snippets
First, import Python's "re" module and locate every code section in the HTML produced by Markdown:

    import re

    # Raw string: <, > and / need no escaping; [\s\S]*? matches lazily across newlines
    for code_section in re.findall(r'<pre><code>[\s\S]*?</code></pre>', content_formatted):
        print(code_section)
        print('')
The output will be:
    <pre><code>import pandas as pd # Importing pandas library for data manipulation
    import numpy as np # Importing numpy library for numerical computations
    </code></pre>

    <pre><code>np.random.seed(42)
    </code></pre>

    <pre><code>data = np.random.randint(-10, 10, size=(4, 3))
    </code></pre>

    <pre><code>df = pd.DataFrame(data, columns=['A', 'B', 'C'])
    </code></pre>

    <pre><code>print(df)
    </code></pre>
Now we can write code that highlights the snippets contained within the HTML pre and code blocks using the Pygments library (see also "How to use the Python module Pygments for code highlighting?"):

    import html
    import re

    from pygments import highlight
    from pygments.formatters import HtmlFormatter
    from pygments.lexers import get_lexer_by_name

    # Create the lexer and formatter once and reuse them for every block
    lexer = get_lexer_by_name("python", stripall=True)
    formatter = HtmlFormatter(linenos=True, cssclass="github-dark", style='default')

    for code_section in re.findall(r'<pre><code>[\s\S]*?</code></pre>', content_formatted):
        # Strip the wrapping tags to keep only the code itself
        new_code_section = code_section.replace('<pre><code>', '')
        new_code_section = new_code_section.replace('</code></pre>', '')
        # Markdown escaped <, > and & inside code blocks; restore them before highlighting
        new_code_section = html.unescape(new_code_section)
        new_code_section_highlight = highlight(new_code_section, lexer, formatter)
        # Swap the plain block for its highlighted version
        content_formatted = content_formatted.replace(code_section, new_code_section_highlight)

    print(content_formatted)
The output will be:
    <h2 id="introduction">Introduction</h2>
    <p>Tutoral on how to ...</p>
    <h2 id="import-necessary-libraries">Import necessary libraries</h2>
    <div class="github-dark"><table class="github-darktable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
    <span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span> <span class="c1"># Importing pandas library for data manipulation</span>
    <span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span> <span class="c1"># Importing numpy library for numerical computations</span>
    </pre></div></td></tr></table></div>

    <p>Set random seed for reproducibility</p>
    <div class="github-dark"><table class="github-darktable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span>
    </pre></div></td></tr></table></div>

    <p>Generate random integer data in the range from -10 to 10 with a shape of 4x3</p>
    <div class="github-dark"><table class="github-darktable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">data</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="o">-</span><span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span>
    </pre></div></td></tr></table></div>

    <p>Create a pandas DataFrame using the generated random data, with column names 'A', 'B', and 'C'</p>
    <div class="github-dark"><table class="github-darktable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="s1">'A'</span><span class="p">,</span> <span class="s1">'B'</span><span class="p">,</span> <span class="s1">'C'</span><span class="p">])</span>
    </pre></div></td></tr></table></div>

    <p>Print the DataFrame</p>
    <div class="github-dark"><table class="github-darktable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="n">df</span><span class="p">)</span>
    </pre></div></td></tr></table></div>
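The find-and-replace loop can also be packaged as a reusable function. The sketch below is one possible variant, not the approach used above: it relies on `re.sub` with a callback so that each match is replaced in place, and the names `highlight_code_blocks`, `CODE_BLOCK`, and `sample` are hypothetical. It assumes every code block is Python and that Markdown emitted plain `<pre><code>` tags without attributes.

```python
import html
import re

from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name

# Capture the text between the tags; DOTALL lets "." span several lines
CODE_BLOCK = re.compile(r"<pre><code>(.*?)</code></pre>", re.DOTALL)


def highlight_code_blocks(html_text, language="python", css_class="github-dark"):
    """Return html_text with each <pre><code> block replaced by Pygments HTML."""
    lexer = get_lexer_by_name(language)
    formatter = HtmlFormatter(linenos=True, cssclass=css_class)

    def replace_block(match):
        # Markdown escaped <, > and & in the code; undo that before lexing
        source = html.unescape(match.group(1))
        return highlight(source, lexer, formatter)

    return CODE_BLOCK.sub(replace_block, html_text)


# Hypothetical input: a paragraph followed by one escaped code block
sample = "<p>Compare two numbers</p>\n<pre><code>print(1 &lt; 2)\n</code></pre>"
result = highlight_code_blocks(sample)
print(result)
```

Pygments escapes special characters again when it writes its HTML, so the `<` in the sample still appears as `&lt;` in the result, this time wrapped in a highlighted span.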
Creating a simple HTML page
We are now able to generate a simple HTML page:
    <!DOCTYPE html>
    <html>
    <style>
    pre { line-height: 125%; }
    td.linenos .normal { color: #6e7681; background-color: #0d1117; padding-left: 5px; padding-right: 5px; }
    span.linenos { color: #6e7681; background-color: #0d1117; padding-left: 5px; padding-right: 5px; }
    td.linenos .special { color: #e6edf3; background-color: #6e7681; padding-left: 5px; padding-right: 5px; }
    span.linenos.special { color: #e6edf3; background-color: #6e7681; padding-left: 5px; padding-right: 5px; }
    .github-dark .hll { background-color: #6e7681 }
    .github-dark { background: #0d1117; color: #e6edf3 }
    .github-dark .c { color: #8b949e; font-style: italic } /* Comment */
    .github-dark .err { color: #f85149 } /* Error */
    .github-dark .esc { color: #e6edf3 } /* Escape */
    .github-dark .g { color: #e6edf3 } /* Generic */
    .github-dark .k { color: #ff7b72 } /* Keyword */
    .github-dark .l { color: #a5d6ff } /* Literal */
    .github-dark .n { color: #e6edf3 } /* Name */
    .github-dark .o { color: #ff7b72; font-weight: bold } /* Operator */
    .github-dark .x { color: #e6edf3 } /* Other */
    .github-dark .p { color: #e6edf3 } /* Punctuation */
    .github-dark .ch { color: #8b949e; font-style: italic } /* Comment.Hashbang */
    .github-dark .cm { color: #8b949e; font-style: italic } /* Comment.Multiline */
    .github-dark .cp { color: #8b949e; font-weight: bold; font-style: italic } /* Comment.Preproc */
    .github-dark .cpf { color: #8b949e; font-style: italic } /* Comment.PreprocFile */
    .github-dark .c1 { color: #8b949e; font-style: italic } /* Comment.Single */
    .github-dark .cs { color: #8b949e; font-weight: bold; font-style: italic } /* Comment.Special */
    .github-dark .gd { color: #ffa198; background-color: #490202 } /* Generic.Deleted */
    .github-dark .ge { color: #e6edf3; font-style: italic } /* Generic.Emph */
    .github-dark .ges { color: #e6edf3; font-weight: bold; font-style: italic } /* Generic.EmphStrong */
    .github-dark .gr { color: #ffa198 } /* Generic.Error */
    .github-dark .gh { color: #79c0ff; font-weight: bold } /* Generic.Heading */
    .github-dark .gi { color: #56d364; background-color: #0f5323 } /* Generic.Inserted */
    .github-dark .go { color: #8b949e } /* Generic.Output */
    .github-dark .gp { color: #8b949e } /* Generic.Prompt */
    .github-dark .gs { color: #e6edf3; font-weight: bold } /* Generic.Strong */
    .github-dark .gu { color: #79c0ff } /* Generic.Subheading */
    .github-dark .gt { color: #ff7b72 } /* Generic.Traceback */
    .github-dark .g-Underline { color: #e6edf3; text-decoration: underline } /* Generic.Underline */
    .github-dark .kc { color: #79c0ff } /* Keyword.Constant */
    .github-dark .kd { color: #ff7b72 } /* Keyword.Declaration */
    .github-dark .kn { color: #ff7b72 } /* Keyword.Namespace */
    .github-dark .kp { color: #79c0ff } /* Keyword.Pseudo */
    .github-dark .kr { color: #ff7b72 } /* Keyword.Reserved */
    .github-dark .kt { color: #ff7b72 } /* Keyword.Type */
    .github-dark .ld { color: #79c0ff } /* Literal.Date */
    .github-dark .m { color: #a5d6ff } /* Literal.Number */
    .github-dark .s { color: #a5d6ff } /* Literal.String */
    .github-dark .na { color: #e6edf3 } /* Name.Attribute */
    .github-dark .nb { color: #e6edf3 } /* Name.Builtin */
    .github-dark .nc { color: #f0883e; font-weight: bold } /* Name.Class */
    .github-dark .no { color: #79c0ff; font-weight: bold } /* Name.Constant */
    .github-dark .nd { color: #d2a8ff; font-weight: bold } /* Name.Decorator */
    .github-dark .ni { color: #ffa657 } /* Name.Entity */
    .github-dark .ne { color: #f0883e; font-weight: bold } /* Name.Exception */
    .github-dark .nf { color: #d2a8ff; font-weight: bold } /* Name.Function */
    .github-dark .nl { color: #79c0ff; font-weight: bold } /* Name.Label */
    .github-dark .nn { color: #ff7b72 } /* Name.Namespace */
    .github-dark .nx { color: #e6edf3 } /* Name.Other */
    .github-dark .py { color: #79c0ff } /* Name.Property */
    .github-dark .nt { color: #7ee787 } /* Name.Tag */
    .github-dark .nv { color: #79c0ff } /* Name.Variable */
    .github-dark .ow { color: #ff7b72; font-weight: bold } /* Operator.Word */
    .github-dark .pm { color: #e6edf3 } /* Punctuation.Marker */
    .github-dark .w { color: #6e7681 } /* Text.Whitespace */
    .github-dark .mb { color: #a5d6ff } /* Literal.Number.Bin */
    .github-dark .mf { color: #a5d6ff } /* Literal.Number.Float */
    .github-dark .mh { color: #a5d6ff } /* Literal.Number.Hex */
    .github-dark .mi { color: #a5d6ff } /* Literal.Number.Integer */
    .github-dark .mo { color: #a5d6ff } /* Literal.Number.Oct */
    .github-dark .sa { color: #79c0ff } /* Literal.String.Affix */
    .github-dark .sb { color: #a5d6ff } /* Literal.String.Backtick */
    .github-dark .sc { color: #a5d6ff } /* Literal.String.Char */
    .github-dark .dl { color: #79c0ff } /* Literal.String.Delimiter */
    .github-dark .sd { color: #a5d6ff } /* Literal.String.Doc */
    .github-dark .s2 { color: #a5d6ff } /* Literal.String.Double */
    .github-dark .se { color: #79c0ff } /* Literal.String.Escape */
    .github-dark .sh { color: #79c0ff } /* Literal.String.Heredoc */
    .github-dark .si { color: #a5d6ff } /* Literal.String.Interpol */
    .github-dark .sx { color: #a5d6ff } /* Literal.String.Other */
    .github-dark .sr { color: #79c0ff } /* Literal.String.Regex */
    .github-dark .s1 { color: #a5d6ff } /* Literal.String.Single */
    .github-dark .ss { color: #a5d6ff } /* Literal.String.Symbol */
    .github-dark .bp { color: #e6edf3 } /* Name.Builtin.Pseudo */
    .github-dark .fm { color: #d2a8ff; font-weight: bold } /* Name.Function.Magic */
    .github-dark .vc { color: #79c0ff } /* Name.Variable.Class */
    .github-dark .vg { color: #79c0ff } /* Name.Variable.Global */
    .github-dark .vi { color: #79c0ff } /* Name.Variable.Instance */
    .github-dark .vm { color: #79c0ff } /* Name.Variable.Magic */
    .github-dark .il { color: #a5d6ff } /* Literal.Number.Integer.Long */
    </style>
    <body>

    <h2 id="introduction">Introduction</h2>
    <p>Tutoral on how to ...</p>
    <h2 id="import-necessary-libraries">Import necessary libraries</h2>
    <div class="github-dark"><table class="github-darktable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span>
    <span class="normal">2</span></pre></div></td><td class="code"><div><pre><span></span><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span> <span class="c1"># Importing pandas library for data manipulation</span>
    <span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span> <span class="c1"># Importing numpy library for numerical computations</span>
    </pre></div></td></tr></table></div>

    <p>Set random seed for reproducibility</p>
    <div class="github-dark"><table class="github-darktable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">42</span><span class="p">)</span>
    </pre></div></td></tr></table></div>

    <p>Generate random integer data in the range from -10 to 10 with a shape of 4x3</p>
    <div class="github-dark"><table class="github-darktable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">data</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="o">-</span><span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span>
    </pre></div></td></tr></table></div>

    <p>Create a pandas DataFrame using the generated random data, with column names 'A', 'B', and 'C'</p>
    <div class="github-dark"><table class="github-darktable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="s1">'A'</span><span class="p">,</span> <span class="s1">'B'</span><span class="p">,</span> <span class="s1">'C'</span><span class="p">])</span>
    </pre></div></td></tr></table></div>

    <p>Print the DataFrame</p>
    <div class="github-dark"><table class="github-darktable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal">1</span></pre></div></td><td class="code"><div><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="n">df</span><span class="p">)</span>
    </pre></div></td></tr></table></div>

    </body>
    </html>
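A stylesheet like the one in this page does not have to be written by hand: Pygments can generate it with `HtmlFormatter.get_style_defs`. The sketch below shows one way to assemble the page; `build_page` is a hypothetical helper, and the code assumes the "github-dark" style ships with the installed Pygments version, falling back to "default" when it does not.

```python
from pygments.formatters import HtmlFormatter
from pygments.styles import get_all_styles


def build_page(body_html, style_name="github-dark", css_class="github-dark"):
    """Wrap highlighted HTML in a minimal page with a generated stylesheet."""
    # Assumption: older Pygments releases may not include the requested style
    if style_name not in set(get_all_styles()):
        style_name = "default"
    formatter = HtmlFormatter(style=style_name, cssclass=css_class)
    # Prefix every rule with the wrapper class used in the highlighted markup
    css = formatter.get_style_defs("." + css_class)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n<style>\n"
        + css
        + "\n</style>\n</head>\n<body>\n"
        + body_html
        + "\n</body>\n</html>\n"
    )


# Hypothetical body; in the tutorial this would be content_formatted
page = build_page("<p>Hello</p>")
print(page)
```

The class passed to `get_style_defs` has to match the `cssclass` given to the formatter that produced the highlighted blocks, otherwise the rules will not apply to them.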
When combining Markdown and Pygments, typically you would use Pygments to generate the HTML for your code snippets separately and then insert that HTML into your Markdown document. Note, however, that not all Markdown renderers support raw HTML, or there may be some restrictions when mixing HTML and Markdown, so make sure to test your output.
Additionally, if you are generating documentation or web pages, consider tools and static site generators that integrate Markdown and Pygments, such as MkDocs or Jekyll, which simplify the process and offer seamless integration.
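Python-Markdown itself also ships an integration of this kind: the codehilite extension, which calls Pygments during conversion when Pygments is installed. The sketch below is an alternative to the regex-based approach of this tutorial, not a description of it. The `sample` text is made up, and the `:::python` first line is the marker codehilite reads to pick the language of an indented block.

```python
import markdown

# Hypothetical sample: an indented code block tagged with its language
sample = (
    "Set random seed for reproducibility\n"
    "\n"
    "    :::python\n"
    "    import numpy as np\n"
    "    np.random.seed(42)\n"
)

html_out = markdown.markdown(
    sample,
    extensions=["markdown.extensions.codehilite"],
    extension_configs={
        "markdown.extensions.codehilite": {
            "css_class": "github-dark",  # wrapper class, matching the stylesheet
            "guess_lang": False,         # rely on the explicit :::python marker
        }
    },
)
print(html_out)
```

With this extension the highlighting happens in a single conversion step, at the cost of requiring a language marker in each block of the source text.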
By combining Markdown's simplicity with Pygments' powerful syntax highlighting, you can create richly formatted, visually appealing content that makes your text and code snippets both readable and engaging.
