
Automating Internal Links – SEO

by Florius

As the creator of my own website, I receive a surprising number of emails asking whether, for a certain price, someone can SEO-optimize it for me. So over the last few months, I started looking at what is actually wrong with my website. One of the issues was internal links, where you link one article to another on your own website. I've done this on many occasions before when writing on similar topics roughly around the same time. But if the time between two articles spans a year or two, I normally don't go back to update the old one anymore, although I do try to incorporate it in the article I am currently writing.


In this article, I start with a short section on why internal links can improve your search engine results. After that, I explain what was probably one of the larger issues I had: broken links. It is far too tedious to go through every page yourself and check every link, so I'll go into detail on how I tackled this problem. In the last section, I look at improving my internal links. For this part, I used machine learning to compare different articles and find out which ones should be linked together.

Are Internal Links Good for SEO?

The short answer: Yes. The long answer is still yes and for several reasons. Internal links help search engine bots, like Googlebot, crawl through your site. If a page isn’t linked from anywhere, it might never be discovered or indexed. They also help define your site’s structure, making it clear which pages are most important and how content is related. This organization helps both search engines and visitors.

Internal links also improve user experience. They help readers find related content naturally, keeping them engaged longer and lowering bounce rates, both of which are positive SEO signals. Plus, when one page performs well, internal links can pass some of that authority (called link equity) to other pages, helping them rank better too.

On the flip side, broken links (links that point to non-existent or deleted pages, resulting in 404 errors) can harm your SEO. Google sees them as a sign of poor maintenance and might rank your site lower because of it. That's why regularly checking and updating internal links is just as important as creating them.

Broken Links and How I Fixed Them with Python

Broken links can seriously affect your website’s SEO. We have all encountered those 404 “Page Not Found” errors before, and most of us probably left the site to find another one that had the information we were looking for.

In my case, the issue started when I changed how my URLs were structured; they now follow the format https://www.florisera.com/name_of_my_article/. This change left many of my older posts with broken links. Manually finding and fixing them would take far too much time, so I decided to create a Python script that automatically detects and records them for me. The full code can be found on my GitHub.

[Figure: flowchart — Sitemap → Crawler → Load CSV → Next page? If no: Done. If yes: Single page HTML → Check links → 404? If yes, log to Broken links; either way, return to Next page? until finished.]
Figure 1. Broken-link checker workflow: crawl the sitemap, load the URLs from a CSV, fetch each page, check its links, log any 404s, and repeat until no next page remains.

In Figure 1, you can see the process flow I designed. It begins with my website's sitemap, which contains a list of all article URLs. These URLs are stored in a single CSV file that acts as a small database where I can extract or add data. Using the Python library BeautifulSoup, the script parses the HTML of each article and extracts all internal links (stored in <a href=""> tags).
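
To give an idea of what this looks like in code, here is a minimal sketch of the link-extraction step; the articles.csv filename and its "url" column are illustrative placeholders rather than the exact names used in the real script:

```python
import csv
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def extract_internal_links(page_url, domain="florisera.com"):
    """Fetch one article and return its internal links as absolute URLs."""
    html = requests.get(page_url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a", href=True):
        href = urljoin(page_url, a["href"])  # resolve relative links
        if domain in href:                   # keep only links to the same site
            links.append(href)
    return links

# The sitemap URLs are assumed to already sit in a CSV with a "url" column
with open("articles.csv", newline="") as f:
    article_urls = [row["url"] for row in csv.DictReader(f)]
```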

At this point, I have a complete list of internal links, and I just need to verify if each one still works. For this, I use a HEAD request instead of a GET request, since I only care about whether the page exists and not about downloading its content. If no broken links are found, the script moves on to the next article. If it does find any, it logs them in the CSV file and continues until all articles have been checked. Once every page is processed, the script finishes.
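
The 404 check itself then only takes a few lines. This sketch reuses extract_internal_links and article_urls from the snippet above, and broken_links.csv is again just a placeholder name:

```python
import csv
import requests

def is_broken(link):
    """HEAD request: we only need the status code, not the page content."""
    try:
        resp = requests.head(link, allow_redirects=True, timeout=10)
        return resp.status_code == 404
    except requests.RequestException:
        return True  # unreachable links get logged for manual review too

with open("broken_links.csv", "a", newline="") as f:
    writer = csv.writer(f)
    for url in article_urls:                      # from the previous snippet
        for link in extract_internal_links(url):
            if is_broken(link):
                writer.writerow([url, link])      # source page, broken target
```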

At this stage, I found no easy method to automatically replace the broken links with working ones. Luckily, I only had to fix a total of 20 broken links on my entire website, which was still manageable.

Recommending Internal Links

The other thing I looked into was how to improve internal linking. I usually have an intuitive sense of which articles relate to each other, and with enough time I could manually link them all. However, since I was already using Python scripts, automating this process would be far more efficient and scalable. 

Just like in my earlier web-crawling step, I reused my website's sitemap containing all article URLs and scraped each page's metadata using BeautifulSoup. For every article, I extracted the following fields (a simplified sketch follows this list):

  • Title (<title> or og:title)
  • Meta description (<meta name="description">)
  • Keywords (from my “tag cloud” section)
  • Excerpt (parsed from the Elementor JS data, where available)
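
A simplified sketch of that scraping step is shown below; the .tag-cloud CSS selector is only a guess at the theme's markup, and the Elementor-specific excerpt parsing is left out for brevity:

```python
import requests
from bs4 import BeautifulSoup

def extract_metadata(page_url):
    """Collect title, meta description and tag-cloud keywords for one article."""
    soup = BeautifulSoup(requests.get(page_url, timeout=10).text, "html.parser")

    og_title = soup.find("meta", property="og:title")
    title = og_title["content"] if og_title else (
        soup.title.get_text(strip=True) if soup.title else "")

    desc = soup.find("meta", attrs={"name": "description"})
    description = desc["content"] if desc else ""

    # The CSS class is illustrative; the real selector depends on the theme
    keywords = [a.get_text(strip=True) for a in soup.select(".tag-cloud a")]

    return {"url": page_url, "title": title,
            "description": description, "keywords": keywords}

metadata = [extract_metadata(url) for url in article_urls]
```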

After gathering all this information, I combined the title, excerpt, meta description, and keywords into a single text block per article. Computers, unlike humans, don’t actually “understand” text; instead, they represent it as vectors in a high-dimensional space. Large Language Models (like ChatGPT) and smaller embedding models (like SentenceTransformers) convert text into these vector embeddings.

Semantic Cosine Similarity

[Figure: table with columns source_url, target_url, and similarity_score, heat-colored from green (high) to red (low). For example, /3-phase-ac-systems/ pairs with articles on AC motors, brushed vs. brushless motors, IGBT inverters, DC/AC energy transfer, MOSFET vs. IGBT, capacitors, PWM on the PIC16F877A, and IEEE reference style at scores of roughly 0.22–0.56, while /dissertation-research-results-tips-and-example/ pairs with the other dissertation pages (conclusion, methodology, discussion, introduction, overview, appendix, literature review, preface) at roughly 0.61–0.86.]
Figure 2. Internal-link recommendations ranked by semantic similarity; greener cells indicate higher scores between source and target URLs.

In my case, I used the all-MiniLM-L6-v2 model from SentenceTransformers to generate embeddings for each article’s text block. I then calculated the cosine similarity between every pair of articles to measure how semantically close they are to each other. For each article, I selected the top 8 most similar posts (excluding itself, of course) and stored these results in a separate data file, as shown in Figure 2. I color-coded them: green shows a high similarity between the source URL and the target URL, while red shows a very low similarity. In my example, the 3-phase AC system article has only a few related articles, whereas for the “dissertation research results” article, almost all the other chapters are a high match, which is to be expected.
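
The whole embedding-and-ranking step fits in a short snippet. The sketch below assumes the metadata list from the earlier scraping sketch (so the excerpt field is omitted) and keeps the top 8 matches per article:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# One combined text block per article: title + description + keywords
texts = [f"{m['title']}. {m['description']} {' '.join(m['keywords'])}"
         for m in metadata]
embeddings = model.encode(texts, convert_to_tensor=True, normalize_embeddings=True)

# Cosine similarity between every pair of articles
scores = util.cos_sim(embeddings, embeddings)

recommendations = []
for i, source in enumerate(metadata):
    ranked = scores[i].argsort(descending=True).tolist()
    for j in [k for k in ranked if k != i][:8]:   # top 8, excluding itself
        recommendations.append(
            (source["url"], metadata[j]["url"], float(scores[i][j])))
```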

Visualization of Link Network

From this database of similarity scores, I wanted to visualize my internal-link recommendations in a network graph. A typical graph in Python is made with the modules NetworkX and PyVis. The graph I created is shown in Figure 3. Each node is an article, and arrows are drawn between similar articles. I did filter out weak links, keeping only pairs with a similarity ≥ 0.3.
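
Building such a graph from the recommendation list is straightforward; the sketch below drops edges under the 0.3 threshold and writes the interactive graph to an HTML file (the filename is arbitrary):

```python
import networkx as nx
from pyvis.network import Network

G = nx.DiGraph()
for source, target, score in recommendations:   # from the similarity snippet
    if score >= 0.3:                             # drop weak links
        G.add_edge(source, target, weight=score)

net = Network(height="800px", width="100%", directed=True)
net.from_nx(G)                                   # copy nodes and edges over
net.write_html("internal_links.html")            # interactive graph in the browser
```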

[Figure: node-link graph of many small blue nodes connected by curved edges, with three outlined communities: electronics at the top right (blue), research/writing posts at the left (orange), and spintronics at the bottom right (green); a few tiny subgraphs sit around the periphery with fewer connections.]
Figure 3. Internal-link network of my site, clustered into three themes: Electronics (blue), Research & Writing (orange), and Spintronics (green). Nodes are posts; lines are suggested links.

From the graph, I highlighted three main topics that correspond perfectly with the sections featured in my website’s header, which is of course no coincidence. Each topic can be further divided into two smaller sub-groups, with a few posts left free-floating:

  • Blue: Electronics
    • Tutorials on PIC16F877A
    • Motors, AC vs DC, IGBTs
  • Orange: Research and Writing
    • Citation and References
    • Dissertation
  • Green: Spintronics and Physics
    • CMOS, Scaling, SC Technology
    • Ferromagnetism, Majorana, Physics
  • Free-floating:
    • Social Media 
    • Resistor Color (Change)
    • Tips and motivational
    • HSV paint

And finally, a few less categorizable topics like Links, CV, and History pages.

If you are interested, I’ve uploaded the results here, though it might look slightly different from Figure 3, depending on the parameters you set.

Conclusion

When I started this article, I initially thought it might be possible to automatically insert links into my older posts. However, I decided to let that idea go, since I wouldn’t really know how to program something that finds the right spot in a text and adds a link in a natural way (perhaps an LLM like ChatGPT can do it, but then it needs access to my website). Instead, I tried doing it manually, with the dataset at hand. I managed to go through a few articles, but it turned out to be a rather tedious process, so I stopped about halfway. Still, it’s a useful technique to keep applying in future posts, especially while I’m already writing the article and the context is still fresh.
