As the creator of my own website, I receive a surprising number of emails asking whether, for a certain price, someone can SEO-optimize my website for me. So in the last few months, I started looking at what is actually wrong with my website. One of the issues was internal links, where you link one article to another on your own website. I've done this on many occasions before when writing on similar topics roughly around the same time. But if the time between two articles spans a year or two, I normally don't go back to update the old one anymore, although I do try to incorporate it into the article I am currently writing.
In this article, I first want to explain in a short section why internal links can improve your search engine results. After that, I describe what was probably one of the larger issues I had, namely broken links. It is too tedious to go through every page yourself and check every link, so I'll go into detail on how I tackled this problem. In the last section, I look at improving my internal links. For this part, I used machine learning to compare different articles and find out which ones should be linked together.
Are Internal Links Good for SEO?
The short answer: Yes. The long answer is still yes and for several reasons. Internal links help search engine bots, like Googlebot, crawl through your site. If a page isn’t linked from anywhere, it might never be discovered or indexed. They also help define your site’s structure, making it clear which pages are most important and how content is related. This organization helps both search engines and visitors.
Internal links also improve user experience. They help readers find related content naturally, keeping them engaged longer and lowering bounce rates, both of which are positive SEO signals. Plus, when one page performs well, internal links can pass some of that authority (called link equity) to other pages, helping them rank better too.
On the flip side, broken links (links that point to non-existent or deleted pages and return 404 errors) can harm your SEO. Google sees this as poor maintenance and might rank your site lower because of it. That's why regularly checking and updating internal links is just as important as creating them.
Broken Links and How I Fixed Them with Python
Broken links can seriously affect your website’s SEO. We have all encountered those 404 “Page Not Found” errors before, and most of us probably left the site to find another one that had the information we were looking for.
In my case, the issue started when I changed how my URLs were structured. They now follow the format https://www.florisera.com/name_of_my_article/. This change caused many of my older posts to contain broken links. Manually finding and fixing them would take far too much time, so I decided to create a Python script that automatically detects and records them for me. The full code can be found on my GitHub.

In Figure 1, you can see the process flow I designed. It begins with my website's sitemap, which contains a list of all article URLs. These URLs are stored in a single CSV file that acts as a small database from which I can extract or to which I can add data. Using the Python library BeautifulSoup, the script parses the HTML of each article and extracts all internal links (stored in <a href=""> tags).
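The sitemap step above can be sketched with Python's standard library alone. This is a minimal illustration, not the actual script from my GitHub; `parse_sitemap` and `save_urls` are hypothetical helper names:

```python
import csv
import xml.etree.ElementTree as ET

# Sitemaps use this XML namespace (defined by the sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def parse_sitemap(xml_text: str) -> list:
    """Extract all <loc> URLs from a sitemap XML document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS)]

def save_urls(urls: list, path: str) -> None:
    """Store the URL list in a small CSV 'database'."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["url"])
        for url in urls:
            writer.writerow([url])

if __name__ == "__main__":
    sample = """<?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>https://www.florisera.com/example-article/</loc></url>
    </urlset>"""
    print(parse_sitemap(sample))
```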
At this point, I have a complete list of internal links, and I just need to verify whether each one still works. For this, I use a HEAD request instead of a GET request, since I only care about whether the page exists and not about downloading its content. If no broken links are found, the script moves on to the next article. If it does find any, it logs them in the CSV file and continues until all articles have been checked. Once every page is processed, the script finishes.
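The check itself can be sketched with only the standard library; my actual script may differ, and `is_broken` and `check_link` are hypothetical names:

```python
from urllib import error, request

def is_broken(status_code: int) -> bool:
    """Treat any client or server error (4xx/5xx) as a broken link."""
    return status_code >= 400

def check_link(url: str, timeout: float = 5.0) -> bool:
    """Return True if the link appears broken.

    A HEAD request only fetches the response headers, so we learn
    whether the page exists without downloading its body.
    """
    req = request.Request(url, method="HEAD")
    try:
        with request.urlopen(req, timeout=timeout) as resp:
            return is_broken(resp.status)
    except error.HTTPError as e:
        # Server answered with an error status such as 404.
        return is_broken(e.code)
    except error.URLError:
        # DNS failures, refused connections and timeouts also count.
        return True
```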
At this stage, I found no easy method to automatically replace the broken links with working ones. Luckily, I only had to fix a total of 20 broken links on my entire website, which was still manageable.
Recommending Internal Links
The other thing I looked into was how to improve internal linking. I usually have an intuitive sense of which articles relate to each other, and with enough time I could manually link them all. However, since I was already using Python scripts, automating this process would be far more efficient and scalable.
Just like in my earlier web-crawling step, I reused my website’s sitemap containing all article URLs and scraped each page’s metadata using BeautifulSoup. For every article, I extracted the following fields:
- Title (<title> or og:title)
- Meta description (<meta name="description">)
- Keywords (from the "tag cloud" section)
- Excerpt (parsed from Elementor JS data)
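The metadata scraping for these fields can be sketched with BeautifulSoup as below. The selectors, especially the tag-cloud class, are assumptions; your theme's markup will differ, and I've left out the Elementor excerpt parsing since it is very site-specific:

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def extract_metadata(html: str) -> dict:
    """Pull the title, meta description and keywords from an article page."""
    soup = BeautifulSoup(html, "html.parser")

    # Prefer the Open Graph title, fall back to the <title> tag.
    og = soup.find("meta", property="og:title")
    if og and og.get("content"):
        title = og["content"].strip()
    elif soup.title and soup.title.string:
        title = soup.title.string.strip()
    else:
        title = ""

    desc_tag = soup.find("meta", attrs={"name": "description"})
    description = desc_tag["content"].strip() if desc_tag and desc_tag.get("content") else ""

    # Hypothetical tag-cloud selector; adapt the class name to your theme.
    keywords = [a.get_text(strip=True) for a in soup.select(".tag-cloud a")]

    return {"title": title, "description": description, "keywords": keywords}
```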
After gathering all this information, I combined the title, excerpt, meta description, and keywords into a single text block per article. Computers, unlike humans, don't actually "understand" text; they represent it as vectors in a high-dimensional space. Large Language Models (like ChatGPT) and smaller embedding models (like SentenceTransformers) convert text into these vector embeddings.
Semantic Cosine Similarity

In my case, I used the all-MiniLM-L6-v2 model from SentenceTransformers to generate embeddings for each article's text block. I then calculated the cosine similarity between every pair of articles to measure how semantically close they are. For each article, I selected the top 8 most similar posts (excluding itself, of course) and stored these results in a separate data file, as shown in Figure 2. I color-coded them: green shows a high similarity between the source URL and the target URL, while red shows a very low similarity. In my example, the 3-phase AC system only has a few related articles, while for the "dissertation research results", almost all the other chapters are a high match, which is to be expected.
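At its core, the similarity step is just a normalized dot product between embedding vectors. Below is a minimal sketch in plain Python; `cosine_similarity` and `top_similar` are my own hypothetical helper names, and the comments show roughly where the real embeddings would come from SentenceTransformers:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|).
    A value of 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_similar(idx, embeddings, k=8):
    """Indices and scores of the k most similar articles to article idx."""
    scores = [(j, cosine_similarity(embeddings[idx], e))
              for j, e in enumerate(embeddings) if j != idx]
    scores.sort(key=lambda t: t[1], reverse=True)
    return scores[:k]

if __name__ == "__main__":
    # With sentence-transformers installed, the embeddings would come from:
    #   from sentence_transformers import SentenceTransformer
    #   model = SentenceTransformer("all-MiniLM-L6-v2")
    #   embeddings = model.encode(text_blocks)
    embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    print(top_similar(0, embeddings, k=2))
```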
Visualization of Link Network
From this database of similarity scores, I wanted to visualize my internal link recommendations in a network graph. A typical graph in Python is made with the modules NetworkX and PyVis. The graph I created is shown in Figure 3. Each node is an article, and arrows are drawn between similar articles. I did filter out weak links, keeping only edges with a similarity ≥ 0.3.
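A minimal sketch of the graph-building step with NetworkX is shown below; `build_link_graph` is a hypothetical helper, and the PyVis lines in the comments show roughly how an interactive HTML version could be rendered:

```python
import networkx as nx  # pip install networkx (pyvis is optional, for HTML output)

def build_link_graph(similarities, threshold=0.3):
    """Build a directed graph of recommended internal links.

    `similarities` is an iterable of (source, target, score) rows,
    e.g. read from the similarity CSV produced earlier. Edges below
    the threshold are dropped to filter out weak links.
    """
    graph = nx.DiGraph()
    for source, target, score in similarities:
        if score >= threshold:
            graph.add_edge(source, target, weight=score)
    return graph

if __name__ == "__main__":
    rows = [("a", "b", 0.8), ("a", "c", 0.1), ("b", "c", 0.45)]
    g = build_link_graph(rows)
    # An interactive rendering with PyVis would look roughly like:
    #   from pyvis.network import Network
    #   net = Network(directed=True)
    #   net.from_nx(g)
    #   net.show("links.html")
    print(sorted(g.edges()))
```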

From the graph, I highlighted three main topics that correspond perfectly with the sections featured in my website’s header, which is of course no coincidence. Each topic can be further divided into two smaller sub-groups like this:
- Blue: Electronics
- Tutorials on PIC16F877A
- Motors, AC vs DC, IGBTs
- Orange: Research and Writing
- Citation and References
- Dissertation
- Green: Spintronics and Physics
- CMOS, Scaling, SC Technology
- Ferromagnetism, Majorana, Physics
- Free-floating:
- Social Media
- Resistor Color (Change)
- Tips and motivational
- HSV paint
And finally, a few less categorizable topics like Links, CV, and History pages.
If you are interested, I’ve uploaded the results here, but they might look slightly different from Figure 3, depending on the parameters you set.
Conclusion
When I started this article, I initially thought it might be possible to automatically insert links into my older posts. However, I decided to let that idea go, since I wouldn’t really know how to program something that finds the right spot in a text and adds a link in a natural way (perhaps an LLM like ChatGPT can do it, but then it needs access to my website). Instead, I tried doing it manually, with the dataset at hand. I managed to go through a few articles, but it turned out to be a rather tedious process, so I stopped about halfway. Still, it’s a useful technique to keep applying in future posts, especially while I’m already writing the article and the context is still fresh.
Florius
Hi, welcome to my website. I am writing about my previous studies, work- and research-related topics, and other interests. I hope you enjoy reading it and that you learn something new.