Introduction
The log-normal distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed. In other words, if X is a random variable with a log-normal distribution, then the natural logarithm of X, ln(X), follows a normal distribution.
The probability density function (PDF) of the log-normal distribution is given by:
\begin{equation}
f(x;\mu,\sigma)=\frac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln(x)-\mu)^2}{2\sigma^2}\right)
\end{equation}
where:
- $x > 0$ is the value of the random variable
- $\mu$ is the mean of the natural logarithm of the random variable
- $\sigma$ is the standard deviation of the natural logarithm of the random variable
- $\pi$ is the mathematical constant pi (approximately equal to 3.14159)
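As a quick sanity check of the density formula above, the following sketch evaluates it directly with NumPy and compares the result with scipy.stats.lognorm.pdf. The helper name lognormal_pdf and the parameter values are illustrative assumptions, not part of the original text.

```python
import numpy as np
from scipy.stats import lognorm


def lognormal_pdf(x, mu, sigma):
    """Evaluate the log-normal PDF written out from the formula (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    exponent = -(np.log(x) - mu) ** 2 / (2.0 * sigma ** 2)
    return np.exp(exponent) / (x * sigma * np.sqrt(2.0 * np.pi))


# Illustrative parameter values (assumed for this example)
mu, sigma = 0.3, 0.5
x = np.linspace(0.1, 5.0, 50)

manual = lognormal_pdf(x, mu, sigma)

# SciPy parameterization: s = sigma, scale = exp(mu), loc left at 0
reference = lognorm.pdf(x, sigma, scale=np.exp(mu))

# True if the hand-written formula agrees with SciPy on the whole grid
print(np.allclose(manual, reference))
```

The comparison only covers a finite grid of positive x values, so it is a spot check rather than a proof.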
The log-normal distribution is often used to model positively skewed data that cannot be accurately described by a normal distribution. It is commonly encountered in fields such as finance (e.g., stock prices), biology (e.g., growth rates of organisms), and environmental science (e.g., particle sizes).
To generate log-normally distributed random numbers in Python, you can use either scipy.stats.lognorm or NumPy.
Using the SciPy lognorm function
In SciPy, the log-normal distribution is characterized by two parameters: s (the shape parameter, which corresponds to $\sigma$) and scale (equal to $e^{\mu}$, where $\mu$ is the mean of the logarithm of the variable). There is also a loc parameter, which is left at its default of 0 for a standard log-normal distribution.
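The mapping between ($\mu$, $\sigma$) and SciPy's (s, scale) can be checked on a frozen distribution. This is a minimal sketch with assumed parameter values; it relies on the fact that the median of a log-normal distribution is $e^{\mu}$ and its mean is $e^{\mu + \sigma^2/2}$.

```python
import numpy as np
from scipy.stats import lognorm

# Assumed log-scale parameters, chosen only for illustration
mu, sigma = 1.0, 0.5

# Frozen distribution: s = sigma, scale = exp(mu), loc left at its default of 0
dist = lognorm(s=sigma, scale=np.exp(mu))

print(dist.median(), np.exp(mu))                   # median vs. exp(mu)
print(dist.mean(), np.exp(mu + sigma**2 / 2.0))    # mean vs. exp(mu + sigma^2 / 2)
```

Each printed pair should agree up to floating-point rounding.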
Generate random numbers from a log-normal distribution with a mean $\mu=0$
For instance, to generate 100,000 random points from a log-normal distribution with parameters $\mu=0$ and $\sigma=0.5$, follow these steps:

```python
from scipy.stats import lognorm

# Standard deviation of ln(X), passed to SciPy as the shape parameter s
std = 0.5

# Generate a single random sample from the log-normal distribution
print(lognorm.rvs(std))

# Generate an array of random samples from the log-normal distribution
data = lognorm.rvs(std, size=100000)
```
We can now utilize matplotlib to generate a basic histogram, allowing us to visually assess the distribution of our data:
```python
# Import the lognorm distribution from scipy.stats
from scipy.stats import lognorm

# Import NumPy and matplotlib.pyplot
import numpy as np
import matplotlib.pyplot as plt

# Plot a normalized histogram of the generated data;
# hx holds the bin heights and hy the bin edges
hx, hy, _ = plt.hist(data, bins=50, density=True, color="lightblue")

# Set y-axis limits
plt.ylim(0.0, max(hx) + 0.05)

# Set plot title and grid
plt.title('Generate random numbers \n from a log normal distribution with python')
plt.grid()

# Save the plot as an image file
plt.savefig("generate_random_numbers_log_normal_distribution_01.png", bbox_inches='tight')

# Display the plot
plt.show()

# Close the plot
plt.close()
```
Print the mean of the generated data
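```python
print(np.mean(data))
```

The output will be close to the following (the exact value varies from run to run because the samples are random):

```
1.1337967863987979
```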
Print the theoretical mean of the log-normal distribution
```python
print(np.exp(std**2 / 2.0))
```

The output will be:

```
1.1331484530668263
```
Print the standard deviation of the generated data
```python
print(np.std(data))
```

The output will be close to the following (again, the exact value depends on the random samples):

```
0.6039652592147067
```
Calculate and print the theoretical standard deviation of the log-normal distribution
```python
var = (np.exp(std**2) - 1) * np.exp(std**2)
print(np.sqrt(var))
```

The output will be:

```
0.6039005332108811
```
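The theoretical values used here follow from the standard moment formulas of the log-normal distribution, written with the same $\mu$ and $\sigma$ as in the density:

\begin{equation}
E[X] = \exp\left(\mu + \frac{\sigma^2}{2}\right), \qquad \mathrm{Var}[X] = \left(e^{\sigma^2} - 1\right)\exp\left(2\mu + \sigma^2\right)
\end{equation}

With $\mu = 0$ and $\sigma = 0.5$ these reduce to $E[X] = e^{0.125} \approx 1.1331$ and $\sqrt{\mathrm{Var}[X]} = \sqrt{(e^{0.25} - 1)\,e^{0.25}} \approx 0.6039$.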
Generate random numbers from a log-normal distribution with a mean $\mu \neq 0$
Note: When $\mu \neq 0$, multiply the samples by $\exp(\mu)$ (equivalently, pass scale=np.exp(mu) to lognorm.rvs). For instance:
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import lognorm

# Set parameters for the log-normal distribution
mu = 3.0   # Mean of ln(X), i.e. on the log scale
std = 0.5  # Standard deviation of ln(X), i.e. on the log scale

# Generate 100,000 random numbers from the log-normal distribution
data = lognorm.rvs(std, size=100000)  # Samples with mu = 0
data *= np.exp(mu)                    # Rescale so that ln(X) has mean mu

# Create a histogram of the generated data
hx, hy, _ = plt.hist(data, bins=50, density=True, color="lightblue")
plt.ylim(0.0, max(hx) + 0.05)  # Leave some headroom above the tallest bin
plt.title('Generate random numbers from a log normal distribution with Python')
plt.grid()  # Add gridlines for visual clarity

# Save the plot as a PNG file
plt.savefig("generate_random_numbers_log_normal_distribution_02.png", bbox_inches='tight')

plt.show()   # Display the histogram
plt.close()  # Close the figure to free its memory

# Calculate and print statistics of the generated data
print(np.mean(data))                 # Sample mean
print(np.exp(mu + std**2 / 2.0))     # Theoretical mean
print(np.std(data))                  # Sample standard deviation

# Calculate and print the theoretical standard deviation
var = (np.exp(std**2) - 1) * np.exp(2 * mu + std**2)  # Theoretical variance
print(np.sqrt(var))
```
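Instead of rescaling the samples after the fact, the factor $e^{\mu}$ can be passed through the scale argument of lognorm.rvs. The sketch below draws samples that way and checks that their logarithm has a mean close to mu and a standard deviation close to std. The fixed seed is an assumption added only to make the run reproducible.

```python
import numpy as np
from scipy.stats import lognorm

mu = 3.0   # Mean on the log scale
std = 0.5  # Standard deviation on the log scale

# scale = exp(mu) replaces the explicit multiplication;
# random_state fixes the seed (an assumption for reproducibility)
data = lognorm.rvs(std, scale=np.exp(mu), size=100000, random_state=0)

# Taking the logarithm should recover an approximately normal sample
log_data = np.log(data)
print(log_data.mean())  # Close to mu
print(log_data.std())   # Close to std
```

Checking the statistics of the logarithm is often easier than comparing moments on the original scale, because the log-scale parameters are read off directly.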
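Generating random numbers from a lognormal distribution using NumPy
If $Z$ is a standard normal random variable, then the variable $X$ defined below follows a log-normal distribution with parameters $\mu$ and $\sigma$:
\begin{equation}
X = \exp(\mu + \sigma Z)
\end{equation}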
To generate random numbers with a log-normal distribution using NumPy, follow these steps:
```python
import numpy as np
import matplotlib.pyplot as plt

# Set parameters of the underlying normal distribution
mu = 3.0     # Mean of the normal distribution before transformation
sigma = 0.5  # Standard deviation of the normal distribution before transformation

# Generate 100,000 random numbers from a normal distribution
data = np.random.randn(100000)  # Generate standard normal samples
data *= sigma                   # Scale the data to the desired standard deviation
data += mu                      # Shift the data to the desired mean

# Apply the exponential transformation to obtain log-normal samples
data = np.exp(data)

# Create a histogram of the generated data, keeping the bin heights in hx
hx, hy, _ = plt.hist(data, bins=50, density=True, color="lightblue")
plt.ylim(0.0, max(hx) + 0.05)  # Adjust y-axis limits for better visualization
plt.title('Generate random numbers from a log normal distribution with Python')
plt.grid()  # Add gridlines for visual clarity

# Optionally, save the plot as a PNG file (commented out here)
# plt.savefig("generate_random_numbers_log_normal_distribution_02.png", bbox_inches='tight')

plt.show()  # Display the histogram
```
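NumPy also provides a direct sampler, Generator.lognormal, which takes the mean and standard deviation of the underlying normal distribution. The following sketch is an alternative to the manual transformation above rather than part of the original recipe; the seed and variable names are assumptions. It compares the sample mean with the theoretical value $\exp(\mu + \sigma^2/2)$.

```python
import numpy as np

mu = 3.0     # Mean of the underlying normal distribution
sigma = 0.5  # Standard deviation of the underlying normal distribution

# Seeded generator (the seed value is an arbitrary assumption)
rng = np.random.default_rng(42)

# Draw log-normal samples directly, without the explicit exp() step
data = rng.lognormal(mean=mu, sigma=sigma, size=100000)

sample_mean = data.mean()
theoretical_mean = np.exp(mu + sigma**2 / 2.0)
print(sample_mean, theoretical_mean)
```

The two printed values should agree to within sampling error, which shrinks as the sample size grows.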
References
| Links | Site |
|---|---|
| Normal distribution | Wikipedia |
| numpy.random.randn | SciPy docs |
| numpy.random.normal | SciPy docs |