
Episode 263 - Relative Probability


Aaron talks to Max about his new paper entitled "Relative Probability on Finite Outcome Spaces".

Links

arXiv: Relative Probability on Finite Outcome Spaces: A Systematic Examination of its Axiomatization, Properties, and Applications
The Local Maximum: Local Maximum Labs

Related Episodes

Episode 0 - The Great Beginning, and How to Update Your Beliefs
Episode 133 - What Is the Point of Pointless Topology?
Episode 146 - Math, Language, and Intelligence with Tai-Danae Bradley
Episode 234 - Simplexes and Distributions
Episode 243 - Eric Daimler on Conexus and Category Theory
Episode 245 - Axioms of Probability
Episode 262 - Category Theory, Google Responds, and Another Covid Retro

Transcript

Max: You're listening to the Local Maximum episode 263.

Narration: Time to expand your perspective. Welcome to the Local Maximum. Now here's your host, Max Sklar.

Max: Welcome everyone, welcome. You have reached another Local Maximum, today in the studio with Aaron. Look at this. Look, we added some new camera angles today. The Local Maximum has gone 3D. I think that was missing before; I think the angles were a little too flat.

Aaron: I didn't put on my special glasses.

Max: Well, you don't need special glasses for this. But for the 1% of you watching this on video, you'll now see this room from a different angle. Maybe we'll see more as time goes by. So first of all, Aaron, thank you for coming on the show today at past midnight, really appreciate it.

Aaron: It's good to be here at Local Maximum Labs.

Max: Yes, yes, exactly. All right, and specifically talking about Local Maximum Labs: today we're talking about this new paper that I put out called Relative Probability on Finite Outcome Spaces. Oh, good. We got it on both angles here.

So I feel like this could be a long, dry kind of demo or explanation of my academic paper, where it starts with these premises and we work through to the conclusion. We're not going to do that today. Okay, I'm not going to do some whiteboarding.

Aaron: Not the full walkthrough. You know, proof by proof?

Max: No, maybe I'll do that at some point. But I just want to talk about, you know, how did I spit out this 30-page paper. I know you read it through quite thoroughly, so why don't I hand it over to you and just let you ask me about it. What's going on here?

Aaron: So Relative Probability On Finite Outcome Spaces. The subtitle: A Systematic Examination Of Its Axiomatization, Properties, And Applications.

Max: Axiomatization. That's a mouthful. We actually did an episode on the axioms of probability; I should look up what number that is.

Aaron: Kolmogorov. Is that how you pronounce it? Okay. Interestingly enough, he comes up as a frequently referenced character in a fictional book I'm reading, but that is not the subject of this evening's episode.

Max: So yeah, let's try to keep it something that the average person would understand. But I don't know. Well, I'll try to do that.

Aaron: That gets to my third question on my list here. So we'll jump right into that. Yeah. Who is the target audience for this?

Max: So the target audience for this? That's a good question. This is not a document that is written purely for graduate students, or for something on the PhD level, where you really need a lot of background to understand the mathematics I'm talking about.

The mathematics in here is advanced from the perspective of a non-mathematician. But it's not something that an undergrad studying math can't grasp, and it definitely has lots of parts in it that anyone can grasp if you are thinking about probability.

And the actual constructions in it: I know that sometimes you're reading through math papers, or physics papers too, and they're like, well, imagine this five-dimensional manifold that you're going to do a triple integral on. Or, if you're talking about infinity: imagine an infinite set that is countable, and then another one that is uncountable, and another one at level three. There's nothing like that here, nothing that's conceptually hard. It should be stuff that is pretty straightforward, conceptually.

So it gets abstract, but it's not complicated. I hope that kind of clears it up.

Aaron: So there's some mention of simplexes, which we talked about on a previous episode.

Max: I should get these episode numbers, because we don't have them. I mean, I don't have any electronic devices, which is probably good. Well, this whole thing is electronic devices, but I didn't bring my-

Aaron: No interruptions. But when you bring up the simplex, I think you make the point here, as we may have when we discussed it on a previous episode, that we're not gonna get into four- and five- and beyond-dimensional simplexes. That's not required for what you're laying out here.

Max: Right. The only deal with a simplex is: say you have three events, three things that could happen, and you're trying to figure out what the probability is across those three events. You could have all 100% of the probability on one event, you could have two of them divided 50-50, or something like that.

So those three numbers that add up to one live in a triangle, and that is a two-dimensional simplex. Four numbers live in a tetrahedron, and two numbers just live in a line segment, because it's either 100% on one side or 100% on the other.

It's either all weighted towards tails or all weighted towards heads, if you're doing the weighted coin example. So even if we can't visualize more than three dimensions, you could still imagine, okay, there are five different possibilities here. What's a situation with five different possibilities? I'm not thinking of one off the top of my head.

Aaron: There's got to be a five sided die.

Max: Okay, well, why don't we just use a six-sided die? I mean, a weighted die that has six sides, where we don't really know the weights. Yes, it's living on a five-dimensional simplex in terms of the numbers. But look, the probability of each side is a number, and they add up to one.

And so it's just that space of possible configurations of the weights of that die. How do you describe the weights of that die? That is what a simplex is, to me, from the probabilistic point of view.
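(A quick aside for readers following along at home: here's that simplex picture in code. This is our illustration of the idea, not anything from the paper.)

```python
# A distribution over n outcomes is a vector of n non-negative numbers
# summing to 1 -- a point on the (n-1)-dimensional simplex.

def on_simplex(weights, tol=1e-9):
    """Check that a list of weights is a valid point on the simplex."""
    return all(w >= 0 for w in weights) and abs(sum(weights) - 1.0) < tol

fair_die = [1/6] * 6          # lives on the 5-dimensional simplex
weighted_coin = [0.7, 0.3]    # lives on a 1-dimensional simplex (a line segment)
corner = [1.0, 0.0, 0.0]      # a vertex of the 2-dimensional simplex (triangle)

assert on_simplex(fair_die) and on_simplex(weighted_coin) and on_simplex(corner)
```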

Aaron: I think we've already gotten too far down into the weeds. Let's take a step back. So, relative probability: what is the big takeaway here? What is the main message? My high-level assessment is that-

Max: I’d like to hear what yours is first, because I kind of wrote this as like, this comes out of just how I've been thinking for a long time. And then I had to pare down what I've been thinking.

Aaron: I have a process question about that later. But before we get to that: what is the pitch here? My very high-level understanding is that, classically, if we're comparing probabilities, if we have known definite probabilities, absolute probabilities, we can compare those to other known absolute probabilities.

But what if we don't have, I think what you refer to in this context as, an anchor probability for something? How do we compare things where we know how they relate to each other, but not necessarily how they relate to some kind of external objective measure?

Max: Well, that's right. So there are a few different ways of looking at this. One is just, okay, every event has a probability associated with it, and you're thinking about probability. And you can think in terms of, okay, if A is twice as likely as B, and B is three times as likely as C, then we know that A is six times as likely as C. I mean, that's just simple.

Aaron: And we can make that determination without knowing that A has, you know, a 50% chance. We don't need that, right, in order to make reasonable comparisons between A, B, and C.

Max: Right. So that's true, and that's already true. But there's the question of, do we need to have absolute probability at all? I mean, right now, if you use the Kolmogorov axioms, you're assuming that every event has some absolute probability. We can talk about the Kolmogorov axioms, we've talked about them before on the show: why are they so popular?

It's not that people poke holes in axioms. Axioms are not assumptions; they're not something that you could poke holes in. You can't prove them false or anything. It's almost an organizational question. And once you have a standard of how a certain mathematical object is organized, it kind of sticks. So that's where I'm poking a little hole into it.

But what if we organize it in a sense that, really, we only care about how likely things are relative to each other? So maybe you could think, okay, the absolute probability is, you know, what's the probability that this coin lands on heads in relation to the probability that the sun comes up tomorrow?

Well, the sun comes up tomorrow with probability around one. So from the coin landing on heads relative to that, you could get some measure of absolute probability. But it's like, okay, let's imagine that at the fundamental level, we really only care about the relative ratios. That's sort of my question, because when you're doing Bayesian inference, and when you're searching for hypotheses in a Bayesian question, oftentimes you only care about those ratios.

So I was like, alright, why don't we make it official and actually make our mathematics reflect that as well. That's sort of how I've been thinking about it.
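(Again for readers at home, a toy sketch of what "only the ratios" buys you. The notation is ours, not the paper's.)

```python
# rel[(X, Y)] means "X is rel[(X, Y)] times as likely as Y".
rel = {("A", "B"): 2.0, ("B", "C"): 3.0}

# Chain the known ratios: A vs C follows without any absolute anchor.
rel[("A", "C")] = rel[("A", "B")] * rel[("B", "C")]
print(rel[("A", "C")])  # 6.0 -- A is six times as likely as C

# Only if we pick an anchor (say, C has absolute probability p) do we
# recover absolute probabilities: P(A) = 6p, P(B) = 3p, and 6p + 3p + p = 1.
p = 1 / 10
print({"A": 6 * p, "B": 3 * p, "C": p})
```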

Aaron: What do you foresee? Is there an aha moment here, or a killer use case, where thinking about it this way, as opposed to some of the more traditional approaches, gives you an edge in something, or is particularly well suited to a specific use case? Or is it more of a thought exercise?

Max: So I would say it's not just a thought exercise. It's more of a, hey, I want to develop all of these theories and ideas, and it's nice to have a basic foundation, a foundational change, that I can reference back to. That's what this kind of is.

It's sort of like questioning the foundations of probability theory. And I think there are some interesting use cases here. There's no great theorem, no great result or anything like that, but there are some interesting points that I want to go through.

So one of them is the philosophical problem. People have a lot of ways of dealing with this, but this is from Borel; there's a paradox named after him, the Borel paradox or something like that. It's: okay, what is the probability that a point on the earth is in the Western Hemisphere, given that it's on the equator?

And the way you'd usually solve that is: take the probability that a point is in the Western Hemisphere and on the equator, divided by the probability that it's on the equator. But both of those probabilities are zero, because the probability that you're exactly on the equator is zero, assuming we have some uniform distribution over the earth. So you have this problem where you lose conditional probability, and you lose the ability to compare probability-zero events.

And when you go from finite to infinite, all of a sudden, in mathematics, you get lots of probability-zero events. With all of these continuous uniform distributions we're talking about, all of the individual outcomes are probability zero. Take a uniform number between zero and one: what's the probability you get exactly 0.5? Zero.

So you want to be able to compare those events, and I wanted to look at a framework to do that. I was gonna say, I developed a framework. I think I developed a framework. I also point to other people who have done things very similar to what I did, but I'm more inspired by the work done in the 21st century here.

There are two parts towards the end which are about why this is interesting. One is, I looked at common probability distributions from the standpoint of them being ratios. And so the formulas change; sometimes the formulas simplify. That's kind of nice. And we know this in Bayesian statistics a lot, because we have that denominator, what's it called, the probability of the data, the joint distribution, whatever it is, and it always cancels out. So let's make it official: let's have it always cancel out.

And then I have this interesting digital representation: how can you represent these probabilities, now that you're not literally writing down a list of numbers for that six-sided die? What do you have, now that you need a relative distribution? And then I also talk about topology, and limits.

Aaron: We've had a couple of topology related episodes.

Max: We have. And so one of the points I'm making is, okay, let's suppose... do I want to go with the six-sided die? Let's just do it with three events. Okay, let's say we have three events. I'm not even gonna pick an example here, I'm just gonna use A, B, and C. Sorry, folks, we're not picking a concrete example, but I'm really bad at giving a concrete example on the spot.

So let's say that we know that A is twice as likely as B. And then we know that the probability of C is 90%, and the probability of A and B together is 10%. Now, don't worry, I'm not going to make you solve what the individual probabilities of A and B are, but you can do it, right? So think of it: there's this tiny probability of either A or B, huge probability of C. And we know that A is twice as likely as B. Now, increase the probability of C.

So C is 90%; now you say that C is 99.9%. We have C approach 100%, and A and B together approach 0%. Now, if you're in the world of absolute probability, when you take that limit, you end up with the probability distribution of C equals one and A and B equal zero. And that corresponds to a vertex on the triangle, on the simplex. So that's kind of nice.

Aaron: So in that case, we're losing some valuable information about that relationship between A and B as they both approach zero.

Max: Exactly, exactly. But if your fundamental object is based on relative probability, you're keeping that information when you take that limit. So I don't know if it's useful in terms of, like, oh, I'm just gonna plug that into a machine learning problem and it'll fix something. Not like that.
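(For readers following along, here's that limit numerically; a toy sketch of the idea, not the paper's construction.)

```python
# Suppose A is always twice as likely as B, and C soaks up the rest.
for p_c in [0.9, 0.99, 0.999]:
    rest = 1 - p_c
    p_a, p_b = 2 * rest / 3, rest / 3
    print(p_a, p_b, p_c)  # absolute: (A, B) -> (0, 0), the ratio info vanishes

# Relative representation: store the ratios themselves.
# A:B stays 2 no matter how close C gets to certainty, so the
# limit object still "remembers" that A is twice as likely as B.
ratio_a_b = 2.0
ratio_c_a = [p_c / (2 * (1 - p_c) / 3) for p_c in [0.9, 0.99, 0.999]]
print(ratio_a_b, ratio_c_a)  # C:A blows up toward infinity; A:B stays finite
```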

But it's a useful framework if you're dealing with that issue, and you're dealing with conditional probability and zeros, which is what the Borel equator problem is. So there's that. And then there are some other interesting things. There are some connections to category theory that I don't get into too much.

But I think it's kind of interesting that what I've described here is actually a category. I'm not a category theorist, and I talked about it last episode, in preparation for this: why is category theory interesting?

And I don't have the answer to what this means. But I found that the relative probability function, as I described it, is also a category, which is pretty cool.

Aaron: So I was going to ask you for some real-world examples of use cases. And while we didn't give a concrete grounding for that here, we kind of got to it in that last exchange.

Max: Yeah, I actually do have an example in the paper.

Aaron: I'm trying to think, is there a case where... so there's a whole section about how to apply this to Bayesian probability, right? Which is something that, I feel like even going back to episode zero, we were talking about Bayesian probability and priors and posteriors. So we've got a long history with that.

But when I think of Bayesian probabilities, I frequently think of Metaculus. So is there a case where this kind of framework for thinking would help me make better predictions on Metaculus, or can you think of a way to plug it into that mental model?

Max: I actually think it might, now that I think about it. And I'm thinking about this on the fly, I didn't think about this beforehand. But you're trying to predict things on Metaculus. Look, why don't we do exactly what we do here, and try to predict the relative probability of different events on Metaculus? Let's say you have something where, okay, I'm having a hard time predicting the probability of this event.

But you have two events that are kind of related. And oftentimes, things on Metaculus are kind of related; maybe some of the political elections, or some of the economic predictions, could be related. And you're like, okay, well, maybe instead of trying to go directly for the absolute probability, I can try to figure out how things relate to each other. Like I said: is this twice as likely as this other thing, or is it about as likely?

And if you could do that, you could maybe create a leveraged bet: if you bet on one thing, you can bet on all the other things where you've figured out the relative probabilities.

Aaron: And there are some questions on Metaculus that I usually think of as almost meta questions, where the thing being predicted is dependent on either some other outcome, or on the resolution of some other Metaculus question.

So, in a political one, for example, thinking back to the midterms, it was: if the Republicans take the House, how many seats will they take it by? And that's not exactly a ratio there.

Max: But it's a conditional.

Aaron: But it's conditional, and conditional is kind of a step towards that relative probability. It's not exactly the same, but it's related.

Max: Well, the probability of A given B is the probability of A and B divided by the probability of B. So it's a relative probability of two events. One event happens to be a subset of the other, but it's still the relative probability of two events. It's almost like: what's the probability I get heads, given that I got either heads or tails?
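(A quick sanity check of that identity in code, using our own toy example rather than one from the episode.)

```python
# Conditional probability as a relative probability of two events,
# one nested in the other: P(A | B) = P(A and B) / P(B).
from fractions import Fraction

# Fair six-sided die. A = "roll is even", B = "roll is at most 4".
outcomes = range(1, 7)
A = {x for x in outcomes if x % 2 == 0}
B = {x for x in outcomes if x <= 4}

p = lambda event: Fraction(len(event), 6)
print(p(A & B) / p(B))  # 1/2: "even" relative to "at most 4"

# Max's coin version: P(heads | heads or tails) = P(heads) / P(heads or tails).
print(Fraction(1, 2) / Fraction(1, 1))  # 1/2
```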

Aaron: So stepping back a little bit, and you kind of touched on this when we were talking earlier, but what was the inspiration for this? I mean, was there a particular moment that sparked this, or is this something you've been dabbling around the edges of for a while?

Max: Definitely the latter. I've been dabbling around the edges for a while, because, as far back as grad school, I was first really trying to grasp probability on the fundamental level. I feel like our mental model of it is good, and the mathematics of it is pretty good. In fact, in some of the higher-level mathematics of probability theory, there's actually a lot that I don't understand yet.

But I feel like there's something fundamental that is open to change, open to rearranging how we think about it. And I've always been interested in this concept of measure theory: why is mathematical measure so difficult to define? Why do you need such high-level mathematics to define a measure? Which you do.

What other ways are there of doing it, basically? And then there's my real-world experience of doing Bayesian inference and searching hypothesis spaces; all of the algorithms for that are based on relative probabilities between things. In most machine learning algorithms, when you're looking at your gradient descent and whatnot, the gradient descent on the loss function is often derived from a probability.

And when you're searching, you're often thinking about the ratio. In a lot of cases in applied math, it's only the ratio that matters. So I could just say, okay, great, let's focus on the ratio, and let me do that for my applied project. But I also tend to think backwards.

Okay, now that I'm thinking about the ratio, how do we derive everything from it? So I could just go forward and do what works without thinking about it, or I could also go backward and try to shore up some theory as well.

Aaron: Very cool. So I've got a process question I want to ask you as well. But before I get to that: I saw that you cited at least one piece of your previous work, but it doesn't seem that this grew directly out of that. This isn't really a sequel to your previous paper, as much as they have some adjoining vertices in their geometry.

Max: I feel like I have a lot of papers out that could have a really good sequel, including this one, and including that other one, which has good future work, but I keep introducing new crap. No, but I do think I'm going to actually go back and do some of those sequels.

Eventually. It's just, when do you do the future work? And I think my answer is either when I'm inspired to do it, or when it becomes necessary because I actually have a problem. Like, on the last one, there were a lot of questions on sampling.

There was some future work that I need to do on different types of sampling. If I run into a real-world problem where I have to solve it, then sure, I'll write it down, absolutely.

Aaron: Because in that case, yeah, once you've solved it, you might as well document it for others to leverage.

Max: And I think the main connection between this one and the last one: the last one talked a lot about supervised machine learning and the Bayesian interpretation of it, and really just talks about the different algorithms for searching models.

Which is really training models. It's kind of funny, because I read somewhere on Twitter that someone said: you didn't train your model, your model always existed as a mathematical object, you simply searched for it. So okay, whatever. I don't know which word to use, in that case, but when you're training your model-

Aaron: It's like you spoke the Platonic solid into existence

Max: Yeah, exactly. So when you're training a model, it's kind of like you're searching the space, and you're looking at the ratio between probabilities. Do you have a good ratio, a bad ratio? What's going on there? And then there are a bunch of different ways to figure out how to navigate that. So I have a list of lots of different algorithms besides just gradient descent.

Gradient descent is pretty simple. It's just: in which direction is that ratio going up at the fastest pace? Then you go in that direction. So that's a pretty good one, but there are other ones as well. And it's like, okay, if I'm doing this stuff, and the ratio matters in all of them, then this is an interesting problem to solve.
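(An aside for readers at home: the one-dimensional skeleton of gradient descent. This is our toy sketch, not an algorithm from the paper.)

```python
# Repeatedly step against the gradient of a loss: the direction in which
# the objective improves fastest.

def grad_descent(grad, x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # step opposite the gradient
    return x

# Minimize (x - 3)^2, whose gradient is 2 * (x - 3).
print(grad_descent(lambda x: 2 * (x - 3), x0=0.0))  # approaches 3.0
```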

So that's the connection between the two. And of course, this one is finite outcome spaces. I feel like the whole ratio thing gets weirder and more interesting in infinite spaces. But the fact is, I had 30 pages to write on finite spaces, so I had to start with that.

Aaron: So, process-wise, you already mentioned that this goes back to things you started thinking about way back in grad school. So it's something that's been percolating in your mind for a while.

Max: And it's also in terms of trying to wrap my head around measure theory, which I feel like I've never quite been able to do, just coming back to it every few years, like, okay, how should we understand measure theory? And then I came out with this.

Aaron: So my hard-hitting question here is... well, I guess the easy question first: (a) how long did it take you to write the paper, and (b), well, I guess we kind of already covered this, how long did it take you to think through it? It sounds like, for the think-through question, you'd been thinking about this for years before you put words to paper.

Max: Yeah, but the problem is, I started in the… I never remember when I started. Probably some time around, what was the tech retreat? Was it July or August?

Aaron: Yeah, I guess we did it…it was last summer.

Max: Yeah. Had I started it then? Probably. Well, you know what, look, this is on GitHub. So we can tell when I, when I started really working on it. I probably created it some time ago.

Aaron: Close to half a year?

Max: No, no, it wasn't half a year. I was working on newmap.ai. I kind of lost steam a little bit. I'm kind of returning to it a little bit now.

Aaron: So this was your break from that? Yeah, to change gears a little bit. 

Max: Yeah. And then I started writing it, and it took a few weeks to get it together. But then you realize, oh my God, my thinking is a little bit off, and then you have to rearrange some things.

Aaron: That is exactly the question I wanted to get to. When you sat down to write this, how much was the idea fully formed, where it just took you a couple of months to get the wording right and get the concept out of your head onto paper?

And how much of it was, as you started to codify all of this, your thinking developed further, and there were some unsolved aspects that you didn't really figure out, how it all dovetailed together, until you'd started working on the paper itself?

Max: When I started, Aaron, I thought that I had it all figured out. And all I had to do was write it down. And then it'd be good. And then I could just put it out.

Aaron: It sounds like you were mistaken.

Max: I was badly mistaken. So first of all: you think it's pretty simple, right? It's just a bunch of numbers that add up to one. How complicated could it be? Addition, division, is that so hard? No, it's gonna be easy.

But then it turns out that some of the formulas... I always want to get the most generalizable formula, the most elegant formula; that's what I'm going for. And some of my formulas for coming up with the relative probability of events were wrong. It's like, well, how do you define the probability of an event which has multiple outcomes? And there's also this problem where some outcomes become incomparable, and how do you deal with those? What if you have an event where two of the outcomes inside it are incomparable?

I kind of skipped over that. I had to say, we're not dealing with that, because I tried, and I was like, yeah, I might be able to deal with it to some extent, but I don't want to; there's nothing conclusive there. And then, as I was doing it, I started looking at the topology, right? And the topology was really interesting.

I had this whole scheme, which I ended up putting in the appendix, for defining a topology on the relative probability function. But then it turns out that I could prove what I wanted in a much less elegant but much easier way. Just by saying, look, in topology you're looking for compactness, or basically, you're looking to see whether something is closed. Is it closed under limits? Does the object contain its boundary?

And the simplex contains its boundary: the two-dimensional one is a triangle in three-dimensional space, and we already have all the theorems to know how that works. It's a triangle that does contain its boundary, so that's not that hard. The problem is the relative probability function. It's not just the triangle, because a whole bunch of different values live on that vertex: if C is 100%, A and B can still vary between them.

So there's a lot that lives in that point. But if any two outcomes have a relative value, then instead of living in a three-dimensional space, it lives in a six-dimensional space. And I'm just like, well, it's Euclidean space mathematically. And so we can now say that it's-

Aaron: So this was the hexagon diagram. 

Max: A little bit, yeah. But that's actually kind of hard. I feel like that's very tricky.

Aaron: For those following at home: you talk about it, and I'll call out which section in a moment.

Max: So if the relative probability between A and B can still vary when C is 100%, then you're no longer living on a triangle. You're living on something else, where a vertex of the triangle has lots of values. And I thought that you might be able to model that as a hexagon.

But it turns out that it's tough, because it doesn't quite work as nicely as you'd want it to. And then in multiple dimensions, there are values hidden in the sides and values hidden in the vertices. And that's the downside, the future work: I don't know if there's a way to nicely describe the shape. So that's kind of an interesting question.

I'm guessing there's probably a mathematician in the world who has some really great insight into this, but I don't know it. I find this an interesting question. And so if you can show that up to the-

Aaron: Yeah, I don't know if this will show up on the camera. We can see it here. For those following at home section 10.3: Embedding in Lower Dimensional Euclidean Space.

Max: But the problem is, you'd think that point always lives somewhere in the middle of the hexagon. But it turns out that sometimes that point could actually go outside the hexagon.

Aaron: That’s a little trippy.

Max: Yeah. So there are some problems there. So, yeah, I think it's an interesting question.

Aaron: So we kind of wandered into the future work section; there are a couple of things you called out there as future work. Do you have any immediate intentions to delve into that future work yourself, or are you going to be changing gears again? What are you working on next, really, is where I'm going with that.

Max: Oh, I'm going to be changing gears again. Although I really do think I will come back to the future work section. So look, I'm gonna be very busy; I'm on a job search now, and I might have less time for this, especially once I get a job. By the way, it's such a pain in the neck these days. How many HackerRank exams do I have to take? Even though I've worked.

Aaron: How hard could it be, like you said, you know, it's, it's just a bunch of numbers that add up to one. Machine learning? It’s just ones and zeroes! It’s two numbers.

Max: I'll tell that to the recruiter next time. I'm hopefully getting to a point where I'm gonna have some options. But in terms of Local Maximum Labs, I think the next step is to come back to newmap.ai. And one of the basic ideas in newmap.ai is to boil all data down to the key-value pair. That's what I'm calling a map here.

Those are very common in computer science, and they're very general. An object in object-oriented programming is a key-value pair on the fields, a dictionary in Python is a key-value pair, and so is a database with a primary key.

So there are a lot of things that are key-value pairs in computer science, and I think newmap has a really interesting way of categorizing all the different ways those come up. So I feel like that might be a good thing to write about. And then I feel like a lot of these projects will converge, but it's going to take like ten years. So give me some time.

So for those of you out there: take care of yourself. You need to live a long time in order to get to the end of this work.
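(A trivial illustration for readers at home of the key-value shape Max describes; our gloss, not newmap.ai code.)

```python
# The same key-value shape shows up all over computer science.

# A Python dictionary: keys map to values.
config = {"host": "localhost", "port": 8080}

# An object's fields: field names are keys, field contents are values.
class Server:
    def __init__(self, host, port):
        self.host = host
        self.port = port

print(vars(Server("localhost", 8080)))  # {'host': 'localhost', 'port': 8080}

# A database table with a primary key: the primary key column is the key,
# the rest of the row is the value.
users = {42: {"name": "Ada"}, 43: {"name": "Max"}}
print(users[42])
```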

Aaron: So there's a section, under the Bayesian inference and relative distributions bit, about digital representation. Now, when you were writing that, you don't mention newmap.ai in there, but were you thinking specifically about how you would implement this type of thing within the paradigm of newmap?

Max: I was, actually. I mean, yeah, just because I was thinking in that direction. What I'm talking about there are the functions that you have to implement in order to get this working. Like, if I were to create a library, a library in Python, or Scala, or whatever, that captures this way of thinking.

And as I was writing it, I was like, oh, this should come with code. But then writing it got really hard and time consuming, and I was like, nah, I'll just write down how to write the code, and then someone else will do it. So that's what I'm hoping someone else will do. If you read the paper, look at the digital representation section and write a little library for it. Actually, I don't think it's a bad project; I think it's a project someone could do over a weekend easily, probably less for some people.
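(For anyone tempted: one possible starting point. This is a guess at the shape, not the paper's actual specification; see the digital representation section for the real thing.)

```python
# Store pairwise ratios directly; None marks a pair we can't compare.

class RelativeDistribution:
    def __init__(self, outcomes):
        self.outcomes = list(outcomes)
        self.ratio = {}  # (x, y) -> how many times more likely x is than y

    def set_ratio(self, x, y, r):
        self.ratio[(x, y)] = r
        if r != 0:
            self.ratio[(y, x)] = 1 / r

    def get_ratio(self, x, y):
        if (x, y) in self.ratio:
            return self.ratio[(x, y)]
        # Try to chain through one common comparison point (a fuller
        # version would search longer chains).
        for z in self.outcomes:
            if (x, z) in self.ratio and (z, y) in self.ratio:
                return self.ratio[(x, z)] * self.ratio[(z, y)]
        return None  # incomparable, as far as we know

d = RelativeDistribution(["A", "B", "C"])
d.set_ratio("A", "B", 2)      # A twice as likely as B
d.set_ratio("B", "C", 3)      # B three times as likely as C
print(d.get_ratio("A", "C"))  # 6
```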

Aaron: Those of you out there who are so inclined: not only has Max laid out some interesting future work that you could embark upon, but it sounds like there's an opportunity for a Local Maximum Labs collaboration on implementing some of these ideas.

Max: It's actually, and you'd be surprised, for those who are coders out there, you'd be surprised at how easy it is to implement relative probability, as stated in Section 8.3, I think.

Aaron: So you’ve heard it from the horse’s mouth, it is trivial to implement. So go forth and prosper.

Max: I don't know if I want to say trivial, but if someone wants to do it, definitely let me know, and I'll feature you on the podcast. So that's great. All right. Do you have anything else?

Aaron: No, I wanted to make sure we talked about next steps and future work, and we covered that. So, alright, unless there's something else you want to say. Obviously, places it can be found: it is on arXiv, but you'll have a link on the website. Do you have a section on the podcast website that links to all of your papers?

Max: /labs. localmaxradio.com/labs. And of course, /263 will get it. It's not hard to find my papers online. I hope I get some citations. You know, I looked at my paper from 2014, my first solo paper, on the Dirichlet distribution; that has like 31 citations now, but it came out in 2014, and from 2014 to 2016 it got like two citations. So you've got to wait a long time.

Aaron: Is this the first citation for your most recent previous paper? Or has it been cited by anyone else?

Max: No. But the 2014 one that became very popular wasn't cited at first either; I think it got one citation a year and a half later. And now it has like 30, 20-something.

Aaron: Yeah, the long tail on those can be interesting sometimes.

Max: Yes, it's not for someone who wants instant gratification, that's for sure. But hopefully we'll see more in the future. Especially since this is kind of good citation bait. Here's my strategy.

Okay, here's my strategy. First of all, make it easy and fun to read. You just read it through; hopefully it went down easier than a lot of academic papers. Some academic papers are like eating razor blades. Hopefully this was like a nice milkshake going down.

And secondly, it's something that's general enough that someone will be interested in it in 10 years. So it's kind of evergreen. That's my strategy for getting citations. Hopefully that will pay off.

Aaron: So read up, and share it with your friends at your next cocktail party, and sound like the smartest person in the room. All right.

Max: All right. So we're ready for a segment.

Aaron: Let's do it.

Narrator: And now, the probability distribution of the week.

Max: All right, Aaron, the probability distribution of the week. Today's probability distribution of the week... this is a big day. This is a seminal moment in Probability Distribution of the Week.

Aaron: We’ve arrived?

Max: Yes, we finally arrived at the normal distribution. I've finally broken down, and we're going to do the normal. What does the normal distribution look like? Well, it kind of looks like a bump in the middle. Let me just draw it real quickly. Oh, crap, that is a really bad one. No, no, no, I don't think anyone wants this one.

Aaron: Okay, I'm mentally picturing a bell curve-ish looking thing.

Max: You know, it turns out I can't draw a normal distribution. I'm just gonna describe it, right? It's a bell curve. You've all seen it before. So, okay, it has a central tendency: you have a mean, and that's the average, and then you have a standard deviation. And sometimes you take a canonical one, where the mean is zero and the standard deviation is one, and you kind of scale it up.

So some interesting things about that equation: it's e to the minus one-half x squared. And if you want to add the mean and the standard deviation, you replace x with x minus the mean, divided by the standard deviation.
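(For readers at home, here's the formula Max is describing, in standard textbook notation:)

```latex
\[
  f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2}
  \qquad \text{(standard normal: mean } 0 \text{, standard deviation } 1 \text{)}
\]
\[
  f(x \mid \mu, \sigma)
    = \frac{1}{\sigma\sqrt{2\pi}}\,
      e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}
  \qquad \text{(general case: substitute } \tfrac{x-\mu}{\sigma} \text{ for } x \text{)}
\]
```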

So it's interesting that you have that minus x squared term. If you think of x squared, what's the graph of x squared? For those of you who took algebra, it's this U shape. And minus x squared is this U flipped on its head. What's the U flipped on its head? I want to say it's an "n", but that's not really an "n", is it? It's kind of an upside-down parabola.

Aaron: Is it union and intersection you should be thinking of, in terms of symbology?

Max: No, no, I don't think I like that. All right. Well, anyway, when you take e to that, e to the negative numbers, because it gets real negative real fast off to the sides, that goes to zero, but the top bump is still there.

So that's kind of how you think about it. Interesting thing: when you normalize this, there's a pi, in fact the square root of pi, in the normalization term.

Aaron: Well that comes out of nowhere.

Max: Yeah, that's a fun fact. It's like, where does that pi come in? How do you find the integral over this curve? And it turns out the best way to find the integral over this curve is to actually find the integral over the two-dimensional version of this curve, and sort of count the circles going out. And because you have circles, you have pis in there.
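(Written out, the polar-coordinate trick Max is describing; this is the standard derivation, paraphrased:)

```latex
\[
  I^{2}
    = \left( \int_{-\infty}^{\infty} e^{-x^{2}/2}\,dx \right)^{2}
    = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}
        e^{-(x^{2}+y^{2})/2}\,dx\,dy
    = \int_{0}^{2\pi}\!\!\int_{0}^{\infty} e^{-r^{2}/2}\, r\,dr\,d\theta
    = 2\pi
\]
```

So \(I = \sqrt{2\pi}\): the circles contribute the \(2\pi\), and collapsing back to one dimension takes the square root.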

And then when you go from two dimensions to one dimension, you get the square root of pi, actually, more specifically, the square root of two pi. So, the square root of tau. Tau is two pi, which is the correct version of pi. Do you disagree on that? Because I know there are lots of wars on that, the pi versus tau sort of argument.

Aaron: Well, I wouldn't say that pi is wrong or tau is wrong, but it does seem unnecessary to me to have another constant that is literally just two times the other constant you have. It seems redundant.

I assume it's because different mathematicians arrived at these constants independently, and one used pi and one used tau, and they were doing the same math.

Max: I think tau was invented more recently. So it was like, we've been using pi, but mathematicians started to realize that, actually, pi was originally the ratio of the circumference to the diameter of a circle.

And they realized, really, it should be the circumference to the radius: two pi radians is one turn. That is just more natural, and it would simplify our equations. So we should have gone with tau to begin with; it's kind of too late to change, because everyone's used to pi.

But there is that point that pi is wrong. And here's a good case of it, where you have that square root of two pi normalization term. It's really the square root of tau, which almost suggests how we came up with it.

Aaron: Nobody memorizes the digits of tau. You could be the first.

Max: You know, memorizing the digits of pi was a useless exercise anyway, as it turns out. So yeah, if you memorized a thousand digits of pi, I'm sorry, but that was all for naught. All right. So let's talk a little bit about the normal distribution. First of all, why does it come up so much? Why do people use it so much?

Aaron: And why is this of all distributions deemed the normal one?

Max: Right. I think the reason is the central limit theorem. And the central limit theorem says: take any distribution, or almost any distribution out there, any continuous distribution for sure, and pull values from it. Get outcomes from it again and again, and average them together. That will converge towards a normal distribution.

So it's almost like every distribution turns into it, when repeated enough times. And that means it's going to appear in a lot of physical processes. Also, take something like the binomial distribution, which is finite. Even with the binomial distribution, if you run it... there's something at the Boston Science Museum that has an example of that, with all the marbles falling down.

Aaron: Like, like, is it pachinko?

Max: Yeah, when they fall down, you see that you still get that nice curve. As you go further and further down Pascal's triangle, it looks more and more like a normal distribution, even though it's still discrete. So it just comes up everywhere, and that's why it comes up in nature a lot.
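(A quick simulation for readers at home of the effect Max describes; a standard demo, not from the episode:)

```python
# Averaging draws from a very non-normal distribution produces a
# bell-shaped histogram, per the central limit theorem.
import random
from collections import Counter

def averaged_sample(n_draws):
    # Uniform on {0, 1}: about as un-bell-shaped as a distribution gets.
    return sum(random.randint(0, 1) for _ in range(n_draws)) / n_draws

samples = [averaged_sample(100) for _ in range(10_000)]

# Crude text histogram: counts cluster around 0.5 and thin out symmetrically.
buckets = Counter(round(s, 2) for s in samples)
for value in sorted(buckets):
    print(f"{value:.2f} {'#' * (buckets[value] // 25)}")
```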

Even in distributions that shouldn't look normal. Like when we were doing the beta distribution last week: with a beta distribution you're trying to find a probability, so you can't be less than zero and you can't be greater than one, but you could still kind of see a normal distribution in there a little bit. So it's good at estimating a lot of distributions. And it's also the most mathematically succinct.

I mean, is there anything simpler you can do than x squared, in terms of mathematical simplicity? Well, maybe x.

Aaron: It's pretty basic.

Max: Yeah, yeah. So my one guess is maybe x instead of x squared. The problem is, you can't have x, because then the probability goes off to infinity on one side. So you're like, alright, well, absolute value of x.

All right, great. Now you end up with, I think it's called, the Laplace distribution. It's actually used in machine learning, as the regularization term in lasso regression. But it's not as nice: it's got an absolute value, and it doesn't play well with calculus, you know.

Aaron: So I have two perhaps very random thoughts that were spawned by that. Okay, so you mentioned the central limit theorem, and I was recently thinking about New York City: is there a Grand Central Limit Theorem?

Max: I have no idea. I was looking on the website before we started, and I know there are lots of different formulations of the central limit theorem besides this basic one, though I think the one that creates the normal distribution is the most interesting.

I'm sure there are more, in multiple dimensions and in more abstract situations. But maybe we can look at the generalizations and deem one the Grand Central Limit Theorem. I don't have that.

Aaron: So the other random thought I had, and it popped in my head, the moment you told me before the show that we were going to be doing the normal distribution. Is it Young Frankenstein? Is that the movie where they have the whole bit about Abby Normal?

Max: I didn't see that movie. So you're gonna have to fill me in.

Aaron: It is not one that I've watched over and over again, so I could be misremembering it, but I believe it is a classic Gene Wilder and, I want to say, Mel Brooks film.

Max: That I know. I just haven't seen it. Maybe I should, I don't know. Sometimes these old films don't age as well as they should, but I bet that one has.

Aaron: Speaking of old films that don't age well. I think we're going to have a future episode coming up where we review a decade old film.

Max: Two decades.

Aaron: Oh, Jesus.

Max: Yes. A decade ago was 2013. Yeah, well, that's gonna be exciting, because that's the film that we made together in high school. The thing that I'm probably going to be most annoyed about is the sound quality at this point. But alright, that'll be fun.

One last point I want to make about the normal distribution, which is why Nassim Taleb really rails against it a lot: it's very thin-tailed. What does that mean? That means, when you're one standard deviation away, one sigma... what is it, like two-thirds of the data is within one sigma, and maybe 95% of the data is within two sigma.

As you go further and further out, it drops precipitously. So you're almost never going to see a data point that is, like, 10 standard deviations away. And it's not just "one in a hundred" almost never; it's "heat death of the universe" almost never. So it keeps things very tight. You can't get a really oddball pitch, as you can with some other distributions.

A lot of times you want to ask yourself: can I get a real oddball pitch, and is that going to affect my model if I use a normal distribution for whatever I'm using it for? So that's often a good question to ask.

So, for example, the Student's t distribution, which we're not getting to today, maybe in the future, is kind of like the normal, but it allows the standard deviation to float a little bit. So the spread might be several orders of magnitude greater than we think it is, in some situations. Take that as you will.

Aaron: We mentioned, or maybe it was I who mentioned, bell curves when we were first talking about it. Is the classic bell curve a normal distribution, or is that not necessarily the case?

Max: I believe so. I believe it's meant to be a normal distribution, yeah. And a normal distribution doesn't bend to one side: the median, the mean, and the mode are all one and the same. It's unimodal, just a single peak, and it's exactly symmetrical. So that's always very nice.

Now, in real-world data, when you're actually collecting data on natural phenomena, or social or human phenomena, or marketing, you very rarely get that, where both sides of the distribution are perfectly symmetrical.

Aaron: It's been a long time since I've been in a course that was graded, but I'm trying to remember now: when they talk about grading to a curve, are they using a normal distribution, or how are they modifying from that?

Max: I mean, I don't think you can, because, first of all, grades are categorical. Or if they're 0 to 100... the problem is, if you center on a B grade, then you're gonna have a certain number of Bs, the number of As and Cs are going to be equal, and the number of Ds and whatever is better than an A are going to be equal. It can't happen. So yeah, well, there's the classic argument that teachers kind of make it up as they go along.

Aaron: The classic argument when you're grading to a curve is, oh, is it a B-centered curve or a C-centered curve?

Max: Well, a C-centered curve you might actually be able to do, but nobody likes it.

Aaron: Well, yeah. Students don't like that.

Max: You might be able to do it.

Aaron: Or is it at Harvard, and it's an A-centered curve?

Max: Yeah, you might be able to do some kind of binomial distribution there, given that it's categorical. Or, you know, you could just say, look, you can give A+, A-, and have a lot more categories on the upside, but not on the downside. I don't know.

Aaron: I haven't thought about it in a while. It was back in high school, the way grade point averages worked: there was the letter grade, and then pluses and minuses were like 0.4 in one direction or the other. And then if it was an honors course, there was like a plus 0.5 or 0.6 or something.

I'm pretty sure it was a 4.0 scale, but if you got like an A plus in an honors class, you could get a five. That's so long ago. But yeah, the whole thing is wonky.

Max: I don't even know. Glad I don't have to deal with that. Does it even still work that way? Your kids are going to go to school... well, they're already going to school, but they're going to start receiving grades in a few years. And I think you're going to find that the grading system works under a completely separate set of mathematics than we had.

Aaron: My first grader did get a report card not that long ago, but at that age, I think they're still getting not letter grades, like A, B, C, D, but more like, you know, satisfactory, exceeds expectations, or needs improvement. And I think it's not exactly that.

But I think it's much closer to that paradigm. It was probably around third grade-ish that I feel like it switched over to more traditional grading. Let's see what happens.

Max: Yeah, I mean, grading could just be, you know, totally abolished as an oppressive tool of the bourgeoisie.

Aaron: I wasn't following that closely, but I remember hearing a lot about it during COVID, with the move to Zoom school, that a lot of places were doing away with grades for that period, because they felt that they couldn't make accurate assessments, that it wasn't fair. Not being in a position where that mattered to me directly, I haven't followed it closely.

So I don't know if everybody's gone back to the old ways, or some of that still held over, or what. But yeah, we shall see.

Max: When did it get to where schools were completely different than when we were there? Because there was a period after we left school when you could still kind of wrap your head around it; it worked the same way. But after a certain period of time, you get old enough, and things are completely different. And it's like, well, when did this happen? So I don't really know.

Aaron: Old man yells at clouds.

Max: Yeah, yeah, I guess so. All right. I think we're ready to wrap it up. I think we talked about everything we wanted to talk about today. Anything else? Any last thoughts?

Aaron: Check out the paper. If you have reactions, thoughts, ideas, let us know over on the Locals. We’re eager to talk. 

Max: And this is the last episode in January. In February, we're going to do a big push to get lots of people on the Locals, so hopefully we'll have a much bigger community there. Over the month, I'm going to put in a lot of work to try to gin up support for it. So we'll see how successful I am. But I'm committed to this. So hopefully you'll help me out, Aaron, and we'll get up to our 50 supporters.

Aaron: We're gonna get out there with our petition and start taking names and signatures.

Max: Oh, yeah. Let's be obnoxious. Let's get in people's faces. Let's get them to sign up. I think it'll be very good to build this community. So, all right, look forward to that. I've got two great interviews in the can.

One is a little bit more political. It's someone from DC, someone on the left side of the aisle, very different from some of the opinions that you've heard here previously. So that was really cool. And then I've got another one on natural language processing, which we haven't talked about in a while. Aaron and I have our film review, so to speak, coming up. So have a good week, everyone.

That's the show. To support the Local Maximum, sign up for exclusive content at the online community at maximum.locals.com. The Local Maximum is available wherever podcasts are found. If you want to keep up, remember to subscribe on your podcast app. Also, check out the website with show notes and additional materials at localmaxradio.com. If you want to contact me, the host, send an email to localmaxradio@gmail.com. Have a great week.
