How does this pic show that Elon Musk doesnt know SQL?

DahGangalang@infosec.pub · 14 days ago

How does this pic show that Elon Musk doesnt know SQL?

9point6@lemmy.world · 14 days ago

The statement “this [guy] thinks the government uses SQL” demonstrates a complete and total lack of knowledge as to what SQL even is. Every government on the planet makes extensive and well documented use of it.

The initial statement I believe is down to a combination of the above and also the lack of domain knowledge around social security. The primary key on the social security table would be a composite key of both the SSN and a date of birth—duplicates are expected of just parts of the key.

If he knew the domain, he would know this isn’t an issue. If he knew the technology he would be able to see the constraint and following investigation, reach the conclusion that it’s not an issue.

The man continues to be a malignant moron

RabbitBBQ@lemmy.world · edit-2 13 days ago

It’s more than just SQL. Social Security Numbers can be re-used over time. It is not a unique identifier by itself.

KillingTimeItself@lemmy.dbzer0.com · 13 days ago

i’ve heard conflicting reports on this, i have no idea to what degree this is true, but i would be cautious about making this statement unless you demonstrate it somehow.

DacoTaco@lemmy.world · edit-2 12 days ago

As read on wikipedia ( https://en.wikipedia.org/wiki/Social_Security_number ) the format only allows +/- 100k numbers per area code ( which is also limited to 999 codes? ), so over time you are forced to reuse some codes. In total the format allows 99m unique codes, and the us currently has 334mil people sooooo :')

KillingTimeItself@lemmy.dbzer0.com · 12 days ago

On June 25, 2011, the Social Security Administration changed the SSN assignment process to “SSN randomization”,[36] which did the following:

The Social Security Administration does not reuse Social Security numbers. It has issued over 450 million since the start of the program, about 5.5 million per year. It says it has enough to last several generations without reuse and without changing the number of digits. https://www.ssa.gov/history/hfaq.html

evidently they must be doing something else on the backend for this to be working, assuming there are quite literally 100M numbers, which is going to be static due to math, obviously, but they clearly can’t be reassigning numbers to 3 people on average at any given time, without some sort of external mechanism.

There are approximately 420 million numbers available for assignment.

https://www.ssa.gov/employer/randomization.html

that certainly doesnt seem like it would support several generations, possibly at our current birth rate i suppose.

DDG AI bullshit tells me that there are a billion codes. https://www.marketplace.org/2023/03/10/will-we-ever-run-out-of-social-security-numbers/ this article says it’s 1 billion

https://www.ssn-verify.com/how-many-ssns

this website also lists it as approximately 1 billion.

DacoTaco@lemmy.world · 12 days ago

I think i see the change. They are mentioning the ssn is 9 numbers long, which is 1 longer than the 3-3-2 format wikipedia mentions. That does mean its around 999mil numbers, which ye allows for a few generations ( like, 1 or 2 lol )

KillingTimeItself@lemmy.dbzer0.com · edit-2 13 days ago

TL;DR de-deuplication in that form is used to refer a technique where you reference two different pieces of data in the file system, with one single piece of data on the drive, the intention being to optimize file storage size, and minimize fragmentation.

You can imagine this would be very useful when taking backups for instance, we call this a “Copy on Write” approach, since generally it works by copying the existing file to a second reference point, where you can then add an edit on top of the original file, while retaining 100% of the original file size, and both copies of the file (its more complicated than this obviously, but you get the idea)

now just to be clear, if you did implement this into a DB, which you could do fairly trivially, this would change nothing about how the DB operates, it wouldn’t remove “duplicates” it would only coalesce duplicate data into one single tree to optimize disk usage. I have no clue what elon thinks it does.

The problem here, as a non programmer, is that i don’t understand why you would ever de-duplicate a database. Maybe there’s a reason to do it, but i genuinely cannot think of a single instance where you would want to delete one entry, and replace it with a reference to another, or what elon is implying here (remove “duplicate” entries, however that’s supposed to work)

Elon doesn’t know what “de-duplication” is, and i don’t know why you would ever want that in a DB, seems like a really good way to explode everything,

valtia@lemmy.world · 13 days ago

i genuinely cannot think of a single instance where you would want to delete one entry, and replace it with a reference to another

Well, there’s not always a benefit to keeping historical data. Sometimes you only want the most up-to-date information in a particular table or database, so you’d just update the row (replace). It depends on the use case of a given table.

what elon is implying here (remove “duplicate” entries, however that’s supposed to work)

Elon believes that each row in a table should be unique based on the SSN only, so a given SSN should appear only once with the person’s name and details on it. Yes, it’s an extremely dumb idea, but he’s a famously stupid person.

KillingTimeItself@lemmy.dbzer0.com · 12 days ago

Well, there’s not always a benefit to keeping historical data. Sometimes you only want the most up-to-date information in a particular table or database, so you’d just update the row (replace). It depends on the use case of a given table.

in this case you would just overwrite the existing row, you wouldn’t use de-duplication because it would do the opposite of what you wanted in that case. Maybe even use historical backups or CoW to retain that kind of data.

Elon believes that each row in a table should be unique based on the SSN only, so a given SSN should appear only once with the person’s name and details on it. Yes, it’s an extremely dumb idea, but he’s a famously stupid person.

and naturally, he doesn’t know what the term “de-duplication” means. Definitionally, the actual identity of the person MUST be unique, otherwise you’re going to somehow return two rows, when you call one, which is functionally impossible given how a DB is designed.

valtia@lemmy.world · 12 days ago

in this case you would just overwrite the existing row, you wouldn’t use de-duplication because it would do the opposite of what you wanted in that case.

… That’s what I said, you’d just update the row, i.e. replace the existing data, i.e. overwrite what’s already there

Definitionally, the actual identity of the person MUST be unique, otherwise you’re going to somehow return two rows, when you call one, which is functionally impossible given how a DB is designed.

… I don’t think you understand how modern databases are designed

KillingTimeItself@lemmy.dbzer0.com · 12 days ago

… That’s what I said, you’d just update the row, i.e. replace the existing data, i.e. overwrite what’s already there

u were talking about not keeping historical data, which is one of the proposed reasons you would have “duplicate” entries, i was just clarifying that.

… I don’t think you understand how modern databases are designed

it’s my understanding that when it comes to storing data that it shouldn’t be possible to have two independent stores of the exact same thing, in two separate places, you could have duplicate data entries, but that’s irrelevant to the discussion of de-duplication aside from data consolidation. Which i don’t imagine is an intended usecase for a DB. Considering that you literally already have one identical entry. Of course you could simply make it non identical, that goes without saying.

Also, we’re talking about the DB used for the social security database, not fucking tigerbeetle.

DacoTaco@lemmy.world · edit-2 12 days ago

Ssn being unique isnt a dumb idea, its a very smart idea, but due to the us ssn format its impossible to do. Hence to implement the idea you need to change the ssn format so it is unique before then.

Also, elons remark is stupid as is. Im sure the row has a unique id, even if its just a rowid column.

KillingTimeItself@lemmy.dbzer0.com · edit-2 12 days ago

Also, elons remark is stupid as is. Im sure the row has a unique id, even if its just a rowid column.

even then, i wonder if there’s some sort of “row hash function” that takes a hash of all the data in a single entry, and generates a universally unique hash of that entry, as a form of “global id”

nednobbins@lemm.ee · edit-2 12 days ago

It’s so basic that documentation is completely unnecessary.

“De-duping” could mean multiple things, depending on what you mean by “duplicate”.

It could mean that the entire row of some table is the same. But that has nothing to do with the kind of fraud he’s talking about. Two people with the same SSN but different names wouldn’t be duplicates by that definition, so “de-duping” wouldn’t remove it.

It can also mean that a certain value shows up more than once (eg just the SSN). But that’s something you often want in database systems. A transaction log of SSN contributions would likely have that SSN repeated hundreds of times. It has nothing to do with fraud, it’s just how you record that the same account has multiple contributions.

A database system as large as the SSA has needs to deal with all kinds of variations in data (misspellings, abbreviations, moves, siblings, common names, etc). Something as simplistic as “no dupes anywhere” would break immediately.

MathiasTCK@lemmy.world · 13 days ago

SSN is also not a valid unique key, there have been situations with multiple people issued the same SSN:

https://en.wikipedia.org/wiki/Social_Security_number