
  • Build a Custom Search Engine with Exa

    Ellie Popoca

    Prerequisites: Python fundamentals, Command Line
    Versions: Python 3.10
    Read Time: 45 minutes

    Introduction

    Picture this: You read a funny tweet (probably ours...), but you forgot who tweeted it and where you saw it. You’re sad. What if there were a search engine that could rediscover exactly what you’re thinking about?

    It's now possible thanks to natural language processing (NLP), a branch of artificial intelligence that teaches computers to understand, interpret, and generate human language.

    In this tutorial, we'll build a custom search engine using an API with LLM capabilities! 🚀

    LLMs (short for large language models) are a powerful tool within the broader field of NLP. These models are trained on insane amounts of data and can grasp the intricacies of human language. 🤯 A popular LLM that you might already be using is GPT-3.

    Tokenization Example

    LLMs leverage deep learning techniques to generate text, translate languages, answer questions, and so much more. Under the hood, the model estimates the probability of a token, or a sequence of tokens, appearing within a longer sequence. A token is a unit of language, such as a word or a piece of a word, extracted from a larger piece of text.
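
    To make "token" a bit more concrete, here is a minimal sketch that splits text into word-level tokens and estimates how often each one appears. It is an illustration only: the regex tokenizer and the token_probabilities helper are assumptions of this example, and real LLMs typically rely on learned subword tokenizers and neural networks rather than simple counts.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word and punctuation tokens (a naive scheme)."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def token_probabilities(tokens: list[str]) -> dict[str, float]:
    """Estimate each token's probability as its share of all tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {token: count / total for token, count in counts.items()}


tokens = tokenize("The cat sat. The cat slept!")
print(tokens)

probs = token_probabilities(tokens)
print(probs["cat"])  # "cat" is 2 of the 8 tokens
```

    Here the sentence becomes eight tokens, and "cat" accounts for two of them.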

    However, accessing LLMs is a little tricky since training models is both time-consuming and expensive. APIs solve this by letting us send requests to models that someone else has already trained and hosts. Through an API, we can access NLP tasks such as text generation, translation, summarization, and more, which we'll explore today!

    Let's learn how we can even build our very own custom search engine! 👀

    Exa API

    Exa (formerly "Metaphor") is an API (application programming interface) that retrieves the best content on the web. With Exa, anyone can semantically search the web to get high-quality, relevant information and rediscover content on the internet. Unlike traditional keyword search, which matches the exact words of the query to the web content, Exa can understand the meaning of both the user's input and the content out there. Wow, right?
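
    To see why exact-word matching can fall short, here is a toy sketch of keyword search. The keyword_overlap helper and the sample sentences are hypothetical, and this is not how Exa or any production search engine is implemented.

```python
def keyword_overlap(query: str, document: str) -> set[str]:
    """Return the words that appear in both the query and the document."""
    return set(query.lower().split()) & set(document.lower().split())


query = "funny tweet about cats"
doc_same_words = "a funny tweet about dogs"
doc_same_meaning = "hilarious post on kittens"

# Shares three words with the query, even though it is about dogs.
print(keyword_overlap(query, doc_same_words))

# Shares no words with the query, so a pure keyword match would miss it.
print(keyword_overlap(query, doc_same_meaning))
```

    The second document is closer in meaning to the query but has no words in common with it, which is the gap semantic search aims to close.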

    Exa uses LLMs as a core component, and its models have been extensively trained to understand human language.

    In this tutorial, we will use Exa's capabilities to build a basic search engine that can help you find exactly what you are searching for.

    Set Up

    First, we need to create an Exa account to access an API key for building out the "search" functionality of our search engine! You'll automatically get 1000 free requests just for signing up. After creating an account, navigate to "Overview" to retrieve your API key.

    Exa Dashboard

    Note: Keep your API keys private and safe and avoid sharing and posting them online. 🔒
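
    One common way to keep a key out of your source code is to read it from an environment variable. Below is a minimal sketch; the load_api_key helper and the EXA_API_KEY variable name are assumptions of this example, not something the Exa SDK requires.

```python
import os
from collections.abc import Mapping


def load_api_key(environ: Mapping[str, str] = os.environ,
                 name: str = "EXA_API_KEY") -> str:
    """Read the API key from the environment instead of hardcoding it."""
    key = environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first.")
    return key


# A stand-in mapping so the example runs without a real key.
fake_environ = {"EXA_API_KEY": "demo-key-not-real"}
print(load_api_key(fake_environ))
```

    With the variable set in your terminal, you could pass load_api_key() to the client in place of a hardcoded key string.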

    For this project, we'll need Python 3 and pip (package installer) installed.

    Assuming that we have those two installed, let's open up our preferred code editor (we recommend VS Code) and create a new file called main.py.

    Packages and Imports

    With Python installed, let's go ahead and use pip to download Exa:

    pip install exa_py
    

    Note: If this command doesn't work, try pip3 install exa_py.

    In our main.py file, we're going to import Exa, and initialize Exa with our API key!

    from exa_py import Exa
    
    exa = Exa('YOUR_KEY_HERE')
    

    Getting Started with Searching! ⚡

    In the same file, let's create a variable called query, which will hold whatever the user types at the search prompt!

    from exa_py import Exa
    
    exa = Exa('YOUR_KEY_HERE')
    
    query = input('Search here: ')
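
    Since input() returns exactly what was typed, including stray spaces or nothing at all, a small check can help before the query is sent anywhere. This is an optional sketch; the clean_query helper is hypothetical and not part of the tutorial's main.py.

```python
def clean_query(raw: str) -> str:
    """Collapse extra whitespace and reject empty queries."""
    cleaned = " ".join(raw.split())
    if not cleaned:
        raise ValueError("Please type something to search for.")
    return cleaned


print(clean_query("  best   Chicago cold brew "))
```

    In the script, this could wrap the prompt as clean_query(input('Search here: ')).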
    

    Now, let's take a look at Exa's .search() function:

    from exa_py import Exa
    
    exa = Exa('YOUR_KEY_HERE')
    
    query = input('Search here: ')
    
    # A fixed query for this first example; the query variable comes into play later.
    response = exa.search(
      'Best Chicago cold brew',
      num_results=10,
    )
    

    Here's one result (of 10) from the returned data for searching "Best Chicago cold brew":

    {
      "score": 0.1940334290266037,
      "title": "13 Innovative Cold Brew Drinks from Coffee Shops Around the Country",
      "id": "https://www.thrillist.com/drink/nation/best-cold-brew-coffee-drinks",
      "url": "https://www.thrillist.com/drink/nation/best-cold-brew-coffee-drinks",
      "publishedDate": "2017-07-10",
      "author": "Dan Gentile"
    },
    

    For each search result, we get:

    • score: A relevance score for the result; higher scores mean greater relevance to the query.
    • title: The name of the article, blog, or website.
    • id & url: The link to the result's web page.
    • publishedDate: The date the page was published.
    • author: The credited writer. This will be None if there is no credited writer.
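
    As a rough sketch of how these fields might be used, the sample result above is copied here into a plain Python dictionary and turned into a one-line summary. The summarize helper is hypothetical, and the fallback text for a missing author is an assumption of this example.

```python
result = {
    "score": 0.1940334290266037,
    "title": "13 Innovative Cold Brew Drinks from Coffee Shops Around the Country",
    "id": "https://www.thrillist.com/drink/nation/best-cold-brew-coffee-drinks",
    "url": "https://www.thrillist.com/drink/nation/best-cold-brew-coffee-drinks",
    "publishedDate": "2017-07-10",
    "author": "Dan Gentile",
}


def summarize(item: dict) -> str:
    """Build a one-line summary, falling back when the author is missing."""
    author = item.get("author") or "Unknown author"
    return (
        f'{item["title"]} by {author} '
        f'({item["publishedDate"]}, score {item["score"]:.2f})'
    )


print(summarize(result))
```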

    To customize our Exa results further, we can play around with Exa's filters to find the exact type of content we need. You can explore the filters in Exa's API Search documentation.

    Here are some of our favorite filters:

    • numResults: The number of search results to return (written as num_results in the Python code).
    • includeDomains: A list of domains to include in the search (written as include_domains in the Python code).
    • category: A data category to focus on, with more comprehensive and cleaner data. Categories right now include company, news, pdf, papers, tweet, GitHub, movies, songs, and personal sites.

    The search example below will return 10 results for "best pizza in Brooklyn", searching through tweets published since May 2023:

    response = exa.search(
      'best pizza in Brooklyn',
      num_results=10,
      start_published_date='2023-05-01',  # results since May 2023
      category='tweet',  # focus on tweets
      use_autoprompt=True,
    )
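
    When filters start to pile up, one option is to collect them in a dictionary first and unpack it into the call. The sketch below uses only the parameter names from the example above; the build_search_options helper and its validation rule are assumptions of this illustration.

```python
from datetime import date


def build_search_options(num_results: int, since: date, category: str) -> dict:
    """Collect search filters as keyword arguments, validating the basics."""
    if num_results < 1:
        raise ValueError("num_results must be at least 1")
    return {
        "num_results": num_results,
        "start_published_date": since.isoformat(),  # formats as YYYY-MM-DD
        "category": category,
    }


options = build_search_options(10, date(2023, 5, 1), "tweet")
print(options)

# With a real client, these could be passed along as:
# response = exa.search('best pizza in Brooklyn', **options)
```

    Building the date with datetime.date keeps the YYYY-MM-DD format consistent.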
    

    Search Coffee on TikTok

    This is where the fun customization begins! ✨💖

    For this tutorial, you'll learn how to search TikTok to obtain top coffee drink accounts!

    In our case, we want only the top 5 search results, and only from https://www.tiktok.com. Additionally, we want Exa to treat our query as a keyword search, so we need to set the type parameter to 'keyword' as well!

    Let's go ahead and use the Exa parameters to accomplish this. To see the results, let's also add a print statement at the end of our file!

    from exa_py import Exa
    
    exa = Exa('YOUR_KEY_HERE')
    
    query = input('Search here: ')
    
    response = exa.search(
      query,
      num_results=5,
      type='keyword',
      include_domains=['https://www.tiktok.com'],
    )
    
    print(response)
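
    The example above passes a full URL to include_domains. If you would rather pass a bare domain such as tiktok.com, a small helper can strip the extras. This is a sketch under an assumption: this tutorial does not state which form Exa expects, and the to_domain helper is hypothetical.

```python
from urllib.parse import urlparse


def to_domain(url_or_domain: str) -> str:
    """Strip the scheme, path, port, and a leading 'www.' from a URL or domain."""
    text = url_or_domain.strip()
    # urlparse only recognizes the host when a scheme or '//' comes first.
    parsed = urlparse(text if "//" in text else f"//{text}")
    host = parsed.hostname or ""
    return host.removeprefix("www.")


print(to_domain("https://www.tiktok.com"))
```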
    

    Run the script by going to your terminal and typing:

    python3 main.py
    

    After pressing enter, you should now see the following:

    Input Search typing the word coffee

    You might notice some information on each result like score, author, and text. Depending on your search query, you may or may not want these to be displayed every time you write a query.

    Let's format our output so that for our TikTok coffee search engine, we only get the Title and the URL. Delete the print statement, and add the following to main.py:

    for result in response.results:
      print(f'Title: {result.title}')
      print(f'URL: {result.url}')
      print()
    

    Now, your search results should look something like this!

    Query Results including Title and URL

    A lot easier on the eyes, right? 🤩💫

    At this point, your main.py file should look like this:

    from exa_py import Exa
    
    exa = Exa('YOUR_KEY_HERE')
    
    query = input('Search here: ')
    
    response = exa.search(
      query,
      num_results=5,
      type='keyword',
      include_domains=['https://www.tiktok.com'],
    )
    
    for result in response.results:
      print(f'Title: {result.title}')
      print(f'URL: {result.url}')
      print()
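
    One optional refactor is to wrap the search in a function, so it can be reused and tried out without network access. The sketch below is an illustration under stated assumptions: search_tiktok and FakeClient are hypothetical names, and the fake client only mimics the .results, .title, and .url attributes used in this tutorial.

```python
from types import SimpleNamespace


def search_tiktok(client, query: str, limit: int = 5) -> list[str]:
    """Run the tutorial's search on any client that has a .search() method."""
    response = client.search(
        query,
        num_results=limit,
        type="keyword",
        include_domains=["https://www.tiktok.com"],
    )
    return [f"Title: {r.title}\nURL: {r.url}\n" for r in response.results]


class FakeClient:
    """Stand-in for the real client so the example runs offline with canned data."""

    def search(self, query, **options):
        hit = SimpleNamespace(title=f"Result for {query}", url="https://example.com")
        return SimpleNamespace(results=[hit] * options["num_results"])


for block in search_tiktok(FakeClient(), "coffee", limit=2):
    print(block)
```

    Swapping FakeClient() for the exa object from main.py would send the same arguments to the real API.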
    

    Mission Accomplished!

    Congrats, you just created a custom search engine! ✨

    video summary of searching on Exa and TikTok

    Here are some additional ideas on what you could search with the Exa API!

    • Wikipedia links
    • Coding documentation
    • Academic papers
    • Tweets
    • Research papers
    • News and media

    More Resources

    Thanks for following along, here are some helpful links!

    We would love to see what you build with this tutorial! Tag @codedex_io and @ExaAILabs on Twitter if you make something cool! ⚡

    P.S. Snaps & slays to Sarah Chieng from Exa for reviewing the project.
