Monday, October 2, 2023

Turning MegaGladys Into a Wikipedia OSINT Power Tool

By ResearchBuzz

When I say that I do ResearchBuzz, that’s literal. There’s only one person behind ResearchBuzz and that’s me. I don’t have any contract workers or employees or anything like that; I can’t afford them. It’s just me.

That means when I make something I’m limited by what I know and what I can do. This is frustrating because sometimes I have a feeling about how something should go but I don’t know how to make it. That’s definitely been the case with MegaGladys, a tool for pulling and presenting information from Wikipedia. When I first wrote it last year I had an idea of using it as part of a dashboard-type application that would do a variety of Wikipedia-based people searching, but I only had the idea; I didn’t have the knowledge to build it.

The past few months of hand-coding MastoGizmos and RSS Gizmos have both given me more knowledge and more comfort around working with CSS, so last week I decided to see if I could create a single-page application that would allow the user to both find information about a person listed on Wikipedia and create informed, context-added searches of external sources like Google News, Bing, and Chronicling America.

And y’all, I think I made something pretty nice. I haven’t put all my ideas in it yet, but I like what I’ve got so far and I think you will too. Let me tell you all about MegaGladys.

This is a screenshot of the front page of MegaGladys. The setup is a left nav column and a right body column. The left nav column has space at the top to enter a name, then a list of eight tools underneath that you can use to search the name.

Introducing MegaGladys

MegaGladys is a single-page application of eight tools designed to find information about people on Wikipedia from both within Wikipedia and via specially-crafted external links. Because all tools are aggregated in this single page, you enter the name you’re searching only once and it’s automatically integrated into the searches you do. The site will work on your phone but it’s not mobile-friendly; it’s designed for desktop use.

Let me show you how it works using Joe Biden as an example (though you can use any person who’s in Wikipedia.) I’ll enter his name in the top search form and click the MegaGladys button.

A screenshot of MegaGladys, the first tool at MegaGladys.com. The left nav column looks the same, but now the right body section is split into two parts. The first part is a picture of Joe Biden and an excerpt from his Wikipedia article. The second part is several text sections showing official and reference links along with links to Biden's social media accounts.

MegaGladys

MegaGladys, the first tool, pulls the Wikipedia information about the name you’re searching. The center column provides an image and an excerpt from the Wikipedia article. The right column shows official and reference links for the person at the top (if there are multiple official Web sites, they’re all listed.) Beneath that are two sections for social media links for that person — and again, if there are multiple links for a network, they will all be listed. (Joe Biden has accounts as POTUS, VP, and himself.)

That’s handy if you just need some quick link information, but what if you want to know what Biden’s been up to lately? That’s when you need Gossip Machine. Click the button.

A screenshot of Gossip Machine. The left nav remains the same. The body part of the page contains Gossip Machine, with two dropdown menus to specify the year (starting 2017) and month of the page views you want to analyze. In this case the search is for July 2023 and a list of five dates is denoted as being especially active, starting July 10.

Gossip Machine

Gossip Machine uses Wikipedia’s page view data (which starts in 2017) to analyze Joe Biden’s page activity by month and list dates with particularly high activity; in this case I’m looking at July 2023. The Z-Score of each listed date is denoted by a progress bar so you can get an at-a-glance idea of how much busier his page was compared to average. Each date listing has a “Google News search for this date” link that takes you to a Google News search for that specific date. I’ll click on the link for July 10. Here’s what I get:

This is a screenshot of Google News' search result for July 10, 2023. The results are about Biden's meeting with world leaders before the NATO summit.

As you can see, the increased interest in Joe Biden’s Wikipedia page coincided with his trip to Europe in advance of NATO’s meeting.

I find that having a way to narrow down searches for famous people by date can come in really handy. If you went to Google News right now and searched for Joe Biden, you’d get information about current events and news, which makes sense. But if you want to dig down and find out about things that happened a couple of years or even a couple of months ago, you’ll have less luck finding specific news unless you use Google News’s date search. Gossip Machine makes Google News date searching very, very specific using what I call “fossilized attention” — the recorded interest of visitors to Wikipedia in the form of page view data.
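If you’re curious how a z-score picks out the busy days, here’s a minimal sketch of the idea. It is not MegaGladys’s actual code: the function name, the threshold of 1.0, and the page view numbers are all made up for illustration, and the real tool pulls its numbers from Wikipedia’s page view data.

```python
from statistics import mean, pstdev


def busy_dates(daily_views, threshold=1.0):
    """Return (date, z_score) pairs at or above the threshold, busiest first.

    daily_views maps a date string to that day's page view count.
    The threshold is an assumption for this sketch, not the tool's setting.
    """
    values = list(daily_views.values())
    average = mean(values)
    spread = pstdev(values)  # population standard deviation for the month
    if spread == 0:
        return []  # every day had the same traffic, so nothing stands out
    flagged = []
    for day, views in daily_views.items():
        z_score = (views - average) / spread
        if z_score >= threshold:
            flagged.append((day, round(z_score, 2)))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Made-up numbers: four ordinary days and one spike.
sample_views = {
    "2023-07-08": 100,
    "2023-07-09": 100,
    "2023-07-10": 500,
    "2023-07-11": 100,
    "2023-07-12": 100,
}

print(busy_dates(sample_views))  # [('2023-07-10', 2.0)]
```

A z-score of 2.0 means the day’s traffic was two standard deviations above that stretch’s average, which is the kind of number the progress bars summarize.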

Let’s look at another example — November 2020. The election in America was November 3, 2020. Knowing that, what dates in November do you think would be most busy for Joe Biden’s Wikipedia page? November 4, maybe? Nope!

A screenshot of Gossip Machine again, only this time we're searching for dates in November 2020. In this case there are three busy dates and they are November 8, November 7, and November 4. November 8 is by far the busiest day with a z-score of 3.65, compared to 2.86 and 1.38 for the other two dates.

Clicking on November 8 will remind you that the actual winning declaration took a couple of days:

A screenshot of Google News search for Joe Biden on November 8, 2020. The first result is a Vox article from November 8 with the headline "Joe Biden has won the election, defeating Donald Trump."

Gossip Machine is great when you want to find news about a person from a specific month, but what if you’re trying to find news about a person from a specific place? That’s when you want the Search TV News By State button.

A screenshot of Search TV News By State. There's a dropdown menu to specify state (currently set on North Carolina) and a series of checkboxes showing TV stations by city (Charlotte, Wilmington, Raleigh, and Goldsboro are visible.) Two buttons allow you to either search the Web space of checked TV stations or find news from within the last 24 hours.

Search TV News by State

Thanks to AI-generated infosewage and scrape-and-spit fraudsters, there’s no telling what you’ll get when you search Google News for somebody’s name. Search TV News by State gives you some defense against the garbage by letting you query the FCC database for FCC-licensed TV stations by state. Use the dropdown menu to choose the state you want to search, then tick the checkboxes of the stations you want to search (up to 10.) Once you’ve chosen at least one station you’ll have the option of searching the Web space of that station or searching Google News for the last 24 hours’ worth of news from that station. Your name search is automatically included.

A screenshot showing a search for Joe Biden bundled with several site: searches representing different TV stations in North Carolina.

The advantage of using this tool is that you know what you’re getting. You’re searching FCC-licensed TV stations, so you KNOW that the news you’re getting is coming from a specific place; you’re not relying on the word of some anonymous content creator.
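Bundling stations into one search comes down to joining site: operators with OR. Here’s a rough sketch of that step, assuming the name is wrapped in quotes and the result is sent to a regular Google search. The function names and the placeholder domains are hypothetical; the real tool gets its station list from the FCC database.

```python
from urllib.parse import urlencode

MAX_SITES = 10  # the tool lets you tick up to 10 stations


def bundled_site_query(name, domains):
    """Combine a quoted name with a group of site: restrictions."""
    if not domains:
        raise ValueError("choose at least one station")
    if len(domains) > MAX_SITES:
        raise ValueError(f"choose no more than {MAX_SITES} stations")
    sites = " OR ".join(f"site:{domain}" for domain in domains)
    return f'"{name}" ({sites})'


def google_search_url(query):
    """Turn a query string into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Placeholder domains, not real stations.
query = bundled_site_query("Joe Biden", ["a.example", "b.example"])
print(query)
print(google_search_url(query))
```

Because every result has to come from one of the listed domains, you can see exactly which sources are in play before you run the search.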

Of course, the FCC database is authoritative but limited. It’s only for TV stations in America. Sometimes that’s just not enough, but there aren’t any worldwide confirmed media lists that I know of that I can access via API, so I did the next-best thing: made a tool to find media sources on Wikipedia and bundle them into a Google search. This search is not as authoritative as the FCC license search, but it does give you a way to identify and search media sources in a more transparent way than just throwing your search into Google News.

A screenshot of Non-Sketchy News Search. The keyword for finding media outlets in Wikipedia is North Carolina. Beneath that is a list of news sources, showing the source name, a description as provided by Wikipedia, and a Web site link so you can vet it before you include it in your search.

Non-Sketchy News Search

Non-Sketchy News Search lets you do a Wikipedia search for media sources by keyword, then bundle them into a Google search. Your original query name is automatically included. Tick up to ten sources, then click on the Generate Google Search button at the bottom. A Google search will open in a new tab.

A screenshot of several North Carolina newspapers bundled into a Google search for Joe Biden. Three results are shown.

Please note that I’m still chasing a bug on NSNS. Most times it works great, but occasionally it fails and I can’t figure out what’s happening. So if it breaks for you I apologize; please reload and try again.

I’ve got one more tool to help you search news about a person. It’s called Biography Builder.

A screenshot of Biography Builder. You don't have to provide any information when you use this tool aside from your initial name search. BB uses Wikidata to detect the birth and death years of your name search and creates several time-bounded searches to external resources, including Google Books, Internet Archive, Digital Public Library of America, and Chronicling America. If the person is still alive the search goes until the present day.

Biography Builder

Unlike the other tools I’ve shown you so far, Biography Builder requires no input beyond the initial name search. It uses Wikidata to find the birth and death dates for the person you’re searching for, and creates date-bounded Google searches for Google Books, Internet Archive, Digital Public Library of America, and Chronicling America. If the person is still alive, as in Joe Biden’s case, the time search terminates in this year.

Joe Biden isn’t a great example for this; it works better if you’re searching a more historical figure. A search of Google Books’ Newspapers section for Mark Twain looks much different when you limit your search to his lifespan.

A screenshot of a Google Books search, Newspapers, Full View Only, for Mark Twain covering the years 1835--1910. The first headline is Mark Twain at World's Fair. A National Convention in Honor Of... and it's from November 15, 1903.
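The lifespan step could be sketched like this. Wikidata stores dates of birth and death as time strings such as "+1835-11-30T00:00:00Z", so the sketch pulls the year out of each one and falls back to the current year when there is no date of death. Everything else here is an assumption for illustration: the function names are hypothetical, and the Chronicling America URL parameters reflect that site’s year-range search form as I understand it, not necessarily the links Biography Builder generates.

```python
from datetime import date
from urllib.parse import urlencode


def year_from_wikidata_time(value):
    """Pull the year out of a Wikidata time string like '+1835-11-30T00:00:00Z'."""
    sign = -1 if value.startswith("-") else 1
    return sign * int(value.lstrip("+-").split("-")[0])


def search_span(birth, death=None, this_year=None):
    """Return (start_year, end_year); living people run to the current year."""
    start = year_from_wikidata_time(birth)
    if death is not None:
        end = year_from_wikidata_time(death)
    else:
        end = this_year if this_year is not None else date.today().year
    return start, end


def chronicling_america_url(name, start, end):
    """Build a year-bounded search URL (parameter names are an assumption)."""
    params = {
        "andtext": name,
        "date1": start,
        "date2": end,
        "dateFilterType": "yearRange",
    }
    return "https://chroniclingamerica.loc.gov/search/pages/results/?" + urlencode(params)


twain = search_span("+1835-11-30T00:00:00Z", "+1910-04-21T00:00:00Z")
print(twain)  # (1835, 1910)
print(chronicling_america_url("Mark Twain", *twain))
```

For someone still living, such as Joe Biden, you would leave out the death date and the span would end in the current year.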

 

So far I’ve shown you tools that provide information about one individual. But equally important is information about someone and their relationships with other people. And if you can search Wikipedia for information about one person, why not search for information about several people and how they relate to each other? Let me tell you about Crony Corral, the PeopleLinx Affiliations Lookup, and the PeopleLinx Affiliation Filter.

This is a screenshot of just the body part of Crony Corral. The top part asks the user to enter names separated by commas. In this case Joe Biden, Hunter Biden, and Beau Biden are listed. The second part shows the pairs for this group. In this case the dropdown menu shows Joe Biden and Beau Biden and that they both went to Archmere Academy. The third part allows the user to set how many mentions each person should have in a Wikipedia page. The fourth part lists the Wikipedia pages that Beau and Joe Biden have in common -- the first two listed are "Family of Joe Biden" and "Beau Biden".

Crony Corral

Crony Corral is a little complicated. It uses Wikidata to find things in common between groups of people (by checking 17 different Wikidata properties) and then divides the people into pairs based on what they have in common. In this case, I have searched for Joe Biden, Beau Biden, and Hunter Biden, and chosen the pair of Joe Biden and Beau Biden. (They have Archmere Academy in common as a place of education.) Once I have that pair I can search for Wikipedia pages which mention both of those names x number of times (the setting goes from 1-5.) In the case of the Bidens the Wikipedia pages are family-oriented as you might expect. If you tried the names of two people who worked together (Carol Burnett and Harvey Korman, say) you’d get much different results. The idea is to find Wikipedia pages relevant to the relationship of the pair you’re searching.

A close-up result for a Crony Corral search of Carol Burnett, Harvey Korman with a mention count of 3. The three Wikipedia pages mentioned are Harvey Korman, Mama's Family, and A Special Evening With Carol Burnett.

As you can see, each Wikipedia page listing also includes search links for the name pair and the topic on Google, DuckDuckGo, or Bing, so you can continue your searches outside Wikipedia.

A screenshot of DuckDuckGo results for "Mama's Family Carol Burnett Harvey Korman".
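The pairing step in Crony Corral can be sketched as a set intersection over every two-person combination. This is a simplification under stated assumptions: the real tool checks 17 different Wikidata properties, while this sketch lumps everything into one hand-typed set per person, and the function name is hypothetical. The sample affiliations are the ones from this walk-through.

```python
from itertools import combinations


def shared_by_pair(affiliations):
    """Map each pair of names to the sorted list of things they have in common."""
    pairs = {}
    for first, second in combinations(sorted(affiliations), 2):
        common = affiliations[first] & affiliations[second]
        if common:  # only keep pairs that actually share something
            pairs[(first, second)] = sorted(common)
    return pairs


# Hand-typed sample data; the real tool would look these up on Wikidata.
people = {
    "Joe Biden": {"Archmere Academy", "University of Pennsylvania"},
    "Beau Biden": {"Archmere Academy", "University of Pennsylvania"},
    "Hunter Biden": {"Archmere Academy"},
}

for pair, common in shared_by_pair(people).items():
    print(pair, common)
```

Each pair that comes out of this step is a candidate for the second half of the tool, the search for Wikipedia pages that mention both names.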

Crony Corral is designed to find Wikipedia pages that two people have in common, and it pairs those people up by the Wikidata properties they share. The PeopleLinx Affiliations Lookup focuses just on the affiliations, and uses them to build external search links.

A screenshot of PeopleLinx Affiliations Lookup, which finds Wikidata affiliations between people and builds them into Google, Bing, and DuckDuckGo searches. In this case the search is for Joe Biden, Hunter Biden, and Beau Biden, and it lists the affiliations they have in common (Archmere Academy for all three of them, as well as University of Pennsylvania for Beau Biden and Joe Biden.) External links for Google, Bing, and DuckDuckGo are available for every affiliation.

PeopleLinx Affiliations Lookup

PAL focuses on finding affiliations between people and listing them in groups along with external search links. In this case, a search for Joe Biden, Hunter Biden, and Beau Biden finds the affiliation of Archmere Academy between the three of them and lists that at the top, followed by the affiliation with the University of Pennsylvania, which Joe Biden (employee) and Beau Biden (student) share. As you might imagine, searching for Joe Biden along with something as specific as his school name can generate some pretty focused results. The first one made me laugh out loud.

A screenshot of the Google search results for "Joe Biden Archmere Academy". The first result is from Archmere itself and the headline starts "Joseph R. Biden '61 Becomes First Auk Elected as..."

The last tool I want to show you analyzes a group of people AND a group of companies at the same time to find any affiliations between the two. Here’s the PeopleLinx Affiliation Filter.

A screenshot of the PeopleLinx Affiliation Filter (PAF.) It has two text forms: one for a list of people, and one for a list of companies. It looks for commonalities between the two groups and when it finds one, it's included on a list along with external links to Google, Google News, Bing, and DuckDuckGo.

PeopleLinx Affiliation Filter

The PeopleLinx Affiliations Lookup finds affiliations between people for all companies/organizations/groups on Wikipedia. PeopleLinx Affiliation Filter, on the other hand, looks for commonalities between a specific group of people and a specific group of businesses/organizations/groups. In the screenshot I did a lookup of Joe Biden, Hunter Biden, and Beau Biden against the organizations Archmere Academy and University of Pennsylvania. That works for the purposes of this demonstration, but I generally use this tool after I read an article about a bunch of tech CEO types and what they did: I put in their names and the institutions of Harvard and Stanford. The results are always enlightening.
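The difference between the two PeopleLinx tools can be sketched as one grouping function with an optional filter. This is an illustration under assumptions, not the tools’ actual code: the function and parameter names are hypothetical, the affiliations are hand-typed instead of fetched from Wikidata, and it is a guess that the Filter lists an organization even when only one person matches it.

```python
from collections import defaultdict


def group_by_affiliation(affiliations, only=None, min_people=2):
    """Group people under each affiliation, largest groups first.

    only: if given, keep just these organizations (the Filter behavior).
    min_people: how many people must share an affiliation for it to be listed.
    """
    groups = defaultdict(list)
    for person in sorted(affiliations):
        for organization in affiliations[person]:
            if only is None or organization in only:
                groups[organization].append(person)
    kept = {org: names for org, names in groups.items() if len(names) >= min_people}
    return dict(sorted(kept.items(), key=lambda item: (-len(item[1]), item[0])))


# Hand-typed sample data based on the walk-through above.
people = {
    "Joe Biden": {"Archmere Academy", "University of Pennsylvania"},
    "Beau Biden": {"Archmere Academy", "University of Pennsylvania"},
    "Hunter Biden": {"Archmere Academy"},
}

# Lookup-style: every affiliation shared by at least two people.
print(group_by_affiliation(people))

# Filter-style: only the organizations you name.
print(group_by_affiliation(people, only={"University of Pennsylvania"}, min_people=1))
```

Each group that comes back is what the external Google, Bing, and DuckDuckGo links would be built from.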

More to Come

When I make tools like MegaGladys, it’s usually to answer the needs of some search problem. As I use the tool I invariably come up with more search problems I want to solve, so I’m sure this is just the start. Give MegaGladys a try — I’m pretty sure it has some tools you can’t find anywhere else (and if you CAN find them somewhere else I want to know about it!)

 



October 2, 2023 at 09:35PM
via ResearchBuzz https://ift.tt/8dO5j03
