My field of expertise?
The world of computers is rapidly advancing, and many of the issues described here seem to be steadily getting resolved. However, many problems remain unsolved, and I feel that the development of computers itself is moving ahead without solving them. People who only use computers may find them ever more convenient, but fewer and fewer people understand the principles behind them.
The author seems to have been involved with computers since the very beginning of the field, and the writing is full of hard-won experience and confidence.
Plurality of writing systems
Just as languages are diverse, each culture with a written language has its own writing system. Besides the 26 letters of the Latin alphabet (52 counting uppercase and lowercase) and symbols, German and French add letters with umlauts and accents, and there are also the Greek and Cyrillic alphabets. The 26 letters plus the digits and symbols used in the United States, where computers were created, fit into "1 byte (= 8 bits)", which can take 256 distinct values. For example, "0 (zero)" is assigned the number 48, "A" the number 65, and so on.
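A minimal Python sketch of the point above, using the built-in `ord` and `chr` functions: each character is stored as a number, and one 8-bit byte can hold 2**8 = 256 different values. (Strictly, ASCII itself is a 7-bit code with 128 values, usually stored in an 8-bit byte.) The helper names are illustrative.

```python
BITS_PER_BYTE = 8


def code_of(ch: str) -> int:
    """Return the number assigned to a character (its code point)."""
    return ord(ch)


def values_per_byte() -> int:
    """Number of distinct values one byte can hold: 2 ** 8."""
    return 2 ** BITS_PER_BYTE


print(code_of("0"), code_of("A"))               # 48 65
print(values_per_byte())                        # 256
print("".join(chr(n) for n in range(65, 91)))   # ABCDEFGHIJKLMNOPQRSTUVWXYZ
```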
With the spread of computers, it became necessary to support multiple languages (multiple scripts). If computers could not handle Japanese (kanji, hiragana, and katakana), they would never have become so popular in Japan. But there are a great many kanji: 1,026 are taught in elementary school, and there are said to be 100,000 in total (Wikipedia, "Kanji"). Therefore, in Japan, several codes have been defined by JIS (Japanese Industrial Standards, whose Japanese name was changed in 2019), such as JIS code and Shift_JIS.
There are also many other formats, such as those defined by ISO (International Organization for Standardization), EUC, and UTF (each comes in several variants). On top of that, each computer (OS) has its own character set, each type of smartphone (mobile phone) has its own standard, and individual software may use its own unique character set (for example, characters for displaying mathematical formulas).
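A short Python sketch, assuming Python's standard codec names `shift_jis`, `euc_jp`, `iso2022_jp`, and `utf-8`: the same two kanji become a different byte sequence under each character code, and reading bytes with the wrong code turns them into unrelated characters. Using Latin-1 as the "wrong" code in `garble` is just an illustrative choice; both helper names are hypothetical.

```python
CODECS = ("shift_jis", "euc_jp", "iso2022_jp", "utf-8")


def encodings_of(text: str) -> dict:
    """Return the bytes of `text` under each character code in CODECS."""
    return {codec: text.encode(codec) for codec in CODECS}


def garble(text: str) -> str:
    """Write text as UTF-8, then read the bytes back as Latin-1 (the wrong code)."""
    return text.encode("utf-8").decode("latin-1")


for codec, data in encodings_of("漢字").items():
    print(f"{codec:>10}: {data.hex(' ')}")
print(garble("漢字"))  # the same bytes, shown as unrelated characters
```

Nothing is lost in `garble`: re-encoding the result as Latin-1 and decoding it as UTF-8 gives the original text back, which is why garbled text can often be repaired once the right code is known.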
Because the same character is assigned different numbers in different character codes, the same number is displayed as a different character when the character system differs. This is what is called "garbled characters" (mojibake). Multiple writing systems impede not only display but also "search", an important computer function. For example, the character "ga" (が) can be represented in two ways: as the single character "ga", or as "ka" (か) followed by the voiced mark "゛" (and there are two or more ways to type it). Searching checks whether the numbers assigned to the characters are the same; it does not compare how the characters look. Because the two forms have different numbers, a search may or may not find the text, depending on which form was used.
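The "ga" problem can be sketched with Python's standard `unicodedata.normalize`. This assumes the decomposed form uses the combining voiced mark U+3099; the standalone mark "゛" (U+309B) is a separate character that NFC normalization does not merge with "か". Normalizing both the text and the query to the same form (here NFC) before comparing lets the search match either spelling. The function names are illustrative.

```python
import unicodedata


def naive_find(text: str, query: str) -> bool:
    """Compare code points directly, as a plain search does."""
    return query in text


def normalized_find(text: str, query: str) -> bool:
    """Normalize both sides to NFC first, so precomposed and decomposed forms match."""
    def nfc(s: str) -> str:
        return unicodedata.normalize("NFC", s)
    return nfc(query) in nfc(text)


# "gakkou" (school), stored with decomposed ga = ka (U+304B) + combining voiced mark (U+3099).
document = "\u304b\u3099\u3063\u3053\u3046"
# The same word typed with precomposed ga (U+304C).
query = "\u304c\u3063\u3053\u3046"

print(naive_find(document, query))       # False
print(normalized_find(document, query))  # True
```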
Under international standards, character codes (numbers) are allocated to each country; Japan gets "the numbers from ~ to ~". If each country that uses kanji (Japan, China, Korea, etc.) is given its own range, a kanji shared by these countries may end up with a different number in each.
Moreover, it would be impossible to encode (digitize) every character completely, including hentaigana (variant kana), variant kanji forms, and even miswritten characters.
Assigning one number to each character and unifying the numbering is effective for operating computers. Homogenization is very useful for domination: students wear the same uniform and speak the same language. The "equality" of the ruled is a convenient tool for the ruler.
Character display
In computers, characters are exchanged and stored as numbers (digital data), and the character corresponding to each number is then displayed on the screen. By preparing different fonts (such as Mincho or Gothic) for the same numbers, the same text can be displayed in various typefaces.
There are several ways to display text. Bitmap fonts are the oldest. The screen of a computer or smartphone is made up of dots, so ultimately everything is displayed as a bitmap; a bitmap font's data specifies every point, in the form "the dot at this row and this column is black (on), the one next to it is white (off)". The other type is the "vector font", which describes the lines (collections of dots) that make up each character by position and direction, with instructions like "start here and extend in this direction (for example, along a curve)". With this approach, lines stay smooth no matter how much the character is enlarged.
Text may also be exchanged as images. This makes it possible to display any character at all, even ones no character code covers, but the amount of data becomes enormous and searching becomes difficult. In other words, much of the usefulness (convenience) of digital data is lost.
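A toy Python sketch of the two font types. The 5x5 glyph and the stroke coordinates are made up for illustration, and real vector fonts (for example TrueType or OpenType) describe outlines with Bézier curves rather than only straight lines. Enlarging the bitmap just repeats dots, so the edges stay jagged, while enlarging the vector glyph only multiplies coordinates.

```python
# Hypothetical 5x5 bitmap glyph for "A": '#' = dot on (black), '.' = dot off (white).
BITMAP_A = [
    "..#..",
    ".#.#.",
    "#...#",
    "#####",
    "#...#",
]

# Hypothetical vector glyph for "A": straight strokes ((x0, y0), (x1, y1)) in a 0..1 box,
# with y = 0 at the top.
VECTOR_A = [
    ((0.0, 1.0), (0.5, 0.0)),    # left diagonal
    ((0.5, 0.0), (1.0, 1.0)),    # right diagonal
    ((0.25, 0.5), (0.75, 0.5)),  # crossbar
]


def scale_bitmap(rows, factor):
    """Enlarge a bitmap by repeating each dot `factor` times in both directions."""
    out = []
    for row in rows:
        wide = "".join(ch * factor for ch in row)
        out.extend([wide] * factor)
    return out


def scale_vector(strokes, size):
    """Enlarge a vector glyph by multiplying every coordinate; strokes stay exact."""
    return [((x0 * size, y0 * size), (x1 * size, y1 * size))
            for (x0, y0), (x1, y1) in strokes]


for line in scale_bitmap(BITMAP_A, 2):
    print(line)
print(scale_vector(VECTOR_A, 100))
```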
Search
Searching seems simple: it just checks whether the numbers of the characters being looked for are present. Its speed is determined by the size of the data and by the algorithm. Naturally, searching is slower with large data (databases), so data is stored in a way that makes it easier to search. For example, saving numbers in ascending order makes them easier to find, but re-sorting the data every time something is added makes saving take a long time (on storage media such as disks, rewriting takes several times longer than reading). Instead of rearranging the data itself, another method is to attach an "index" to the data and manipulate that. Paper dictionaries have an index too, right? There are also many ways to rank frequently searched items at the top, and the same mechanism can be used to manipulate a search engine so that "items someone wants found" rise to the top.
If you look something up on Google, you get a lot of hits. Of course, some searches turn up nothing, but for well-known topics the results can easily number in the tens of millions. I just searched for "Prime Minister" and got "approximately 18,700,000 results (0.56 seconds)". Presumably all those pages contain the words "Prime Minister", but there is no way to confirm that. Of those, only the first dozen or so are ones I actually want to visit (though this probably varies by person and topic).
As for search strings, search engines differ in how they restrict search words and displayed content, and there are various search methods. For example, in most search engines, entering "Prime Minister Fumio Kishida" with the words separated by spaces returns pages that contain all of them (an AND search). Other methods return pages that contain any of them (an OR search), and some even select pages that contain any of the individual characters. This produces a huge number of results but makes it less likely that anything is missed: different queries built from the same characters may hit the same pages. Because people make input errors and misremember things, such "ambiguity" can be useful when the amount of data is small.
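A minimal Python sketch of the trade-offs above, using the standard `bisect` module. `linear_search` checks every item; `binary_search` is fast but only works on data saved in sorted order; `insert_sorted` keeps that order at saving time; and `build_index` builds one common kind of index, a word-to-document map (often called an inverted index). The function names and sample data are illustrative.

```python
import bisect


def linear_search(data, target):
    """Check items one by one; time grows with the size of the data."""
    for i, item in enumerate(data):
        if item == target:
            return i
    return -1


def binary_search(sorted_data, target):
    """Halve the search range each step; requires data saved in ascending order."""
    i = bisect.bisect_left(sorted_data, target)
    if i < len(sorted_data) and sorted_data[i] == target:
        return i
    return -1


def insert_sorted(sorted_data, item):
    """Keep ascending order when saving; the cost moves from search time to save time."""
    bisect.insort(sorted_data, item)


def build_index(documents):
    """Map each word to the ids of the documents containing it, like a book's index."""
    index = {}
    for doc_id, text in documents.items():
        for word in set(text.split()):
            index.setdefault(word, set()).add(doc_id)
    return index


numbers = [42, 7, 19, 3, 88]
sorted_numbers = sorted(numbers)
print(linear_search(numbers, 19))         # 2
print(binary_search(sorted_numbers, 19))  # 2
insert_sorted(sorted_numbers, 20)
print(sorted_numbers)                     # [3, 7, 19, 20, 42, 88]

docs = {1: "prime minister of japan", 2: "minister of finance", 3: "japan travel guide"}
index = build_index(docs)
print(sorted(index["japan"]))             # [1, 3]
```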
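A toy Python comparison of AND, OR, and character-level matching over a few made-up pages. Plain substring matching is an assumption made for brevity; real search engines tokenize text and rank results. The looser the method, the more pages it returns.

```python
def and_search(pages, query):
    """Pages containing every space-separated term (AND search)."""
    terms = query.split()
    return [pid for pid, text in pages.items() if all(t in text for t in terms)]


def or_search(pages, query):
    """Pages containing at least one term (OR search)."""
    terms = query.split()
    return [pid for pid, text in pages.items() if any(t in text for t in terms)]


def char_search(pages, query):
    """Pages containing any single character of the query (very loose matching)."""
    chars = set(query.replace(" ", ""))
    return [pid for pid, text in pages.items() if any(c in text for c in chars)]


pages = {
    "a": "Fumio Kishida became Prime Minister in 2021",
    "b": "The Prime Minister's Office",
    "c": "Kishida family history",
    "d": "Weather forecast",
}
query = "Prime Minister Fumio Kishida"
print(and_search(pages, query))   # ['a']
print(or_search(pages, query))    # ['a', 'b', 'c']
print(char_search(pages, query))  # ['a', 'b', 'c', 'd']
```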
Text processing
Search means asking a computer a question and having it answer. When you ask a person "What is a prime minister?", they seem to think "Wait a minute, that's..." and answer from their own memory (data). For a computer to answer "Who is the current Prime Minister?" with something like "Fumio Kishida. Mr. Kishida has been Prime Minister of Japan since October 4, 2021. He was born on July 29, 1957, and his hometown is...", it first has to analyze the structure of the Japanese that was input. Unlike English, Japanese does not separate words with spaces, so the sentence has to be broken up into words. It may also be necessary to handle other phrasings, such as "Please tell me about the Prime Minister." Some people may write "nowadays" (konnichi) instead of "currently", and some may write "Who is" instead of "Tell me". Ideally, the same result should come back either way.
This already assumes that a text is made up of sentences, sentences are made up of words, and words are made up of letters. Here you can see a difference in sensibility between Western Europeans and Japanese people. The "n", "o", and "w" in "now" are sounds and (generally) have no meaning, but the kanji for "now" (今) has a meaning of its own. For Westerners, text is a "chain of sounds", but for Japanese people it is a "chain of meanings". Language was originally a chain of sounds, which is why, until relatively recently, "reading" in the West meant reading aloud. The same was once true in Japan, but because kanji carry meaning, Japanese readers can understand text without voicing it.
Next, the meaning has to be worked out from those words, and that requires grammatical analysis (parsing). Each word is classified into a part of speech, and meaning arises from inflected endings and word order. Once this is possible, "automatic translation" comes within reach.
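Because Japanese text has no spaces, one classic way to split it is greedy longest-match segmentation against a dictionary. The tiny `DICTIONARY` and `SYNONYMS` tables below are hypothetical; real systems use very large dictionaries plus statistical models. Mapping synonyms after segmentation is one simple way to make different wordings of the same question produce the same words.

```python
# Hypothetical toy dictionary; real systems use very large dictionaries plus statistics.
DICTIONARY = {"現在", "今日", "の", "首相", "総理", "は", "誰", "です", "か"}
# Hypothetical synonym table: different wordings are mapped to one representative word.
SYNONYMS = {"今日": "現在", "総理": "首相"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)


def segment(text):
    """Greedy longest match: at each position take the longest dictionary word.

    A character not covered by the dictionary becomes a one-character word.
    """
    words, i = [], 0
    while i < len(text):
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in DICTIONARY or length == 1:
                words.append(piece)
                i += length
                break
    return words


def normalize(words):
    """Replace each word by its representative synonym, if it has one."""
    return [SYNONYMS.get(w, w) for w in words]


q1 = "現在の首相は誰ですか"  # "Who is the current prime minister?"
q2 = "今日の総理は誰ですか"  # the same question, worded differently
print(segment(q1))  # ['現在', 'の', '首相', 'は', '誰', 'です', 'か']
print(normalize(segment(q1)) == normalize(segment(q2)))  # True
```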
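A minimal sketch of the first step of grammatical analysis: look up each word's part of speech in a lexicon, then apply a simple rule. The `LEXICON`, the tag names, and the "noun before は is the topic" rule are illustrative assumptions; real parsers use far richer grammars and statistics. The input is assumed to be already split into words.

```python
# Hypothetical mini-lexicon mapping words to parts of speech.
LEXICON = {
    "現在": "noun", "首相": "noun", "誰": "pronoun",
    "の": "particle", "は": "particle", "か": "particle",
    "です": "auxiliary verb",
}


def tag(words):
    """Attach a part of speech to each word; words not in the lexicon get 'unknown'."""
    return [(w, LEXICON.get(w, "unknown")) for w in words]


def topic_of(tagged):
    """Rough rule for illustration: the noun just before the particle は is the topic."""
    for i, (word, pos) in enumerate(tagged):
        if word == "は" and pos == "particle" and i > 0 and tagged[i - 1][1] == "noun":
            return tagged[i - 1][0]
    return None


words = ["現在", "の", "首相", "は", "誰", "です", "か"]
tagged = tag(words)
print(tagged[:3])        # [('現在', 'noun'), ('の', 'particle'), ('首相', 'noun')]
print(topic_of(tagged))  # 首相
```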
Input
There are various input methods. I think the first was "rewiring the cables (cords)". Later came paper tape and punch cards, and because Europe and the United States had a culture of typewriters, input moved to keyboards. Today "voice input" and "image input" are also available.
Voice input requires speech analysis technology. Everyone's voice is different, as are their accent and intonation, and even the same person does not always speak in the same voice. With a keyboard, the data entered is the same no matter who types it or what condition they are in. Once voice input and voice output are both available, you will be able to have a conversation with your computer.
Text creation, AI
Having come this far, are we just one step away from "AI"? Hmm, is that really so? What is AI (artificial intelligence) in the first place?
Besides searching recorded data, an AI also has to create new sentences, and I don't know how that is actually done. So let's do a thought experiment: a text-based generative AI.
First, collect the sentences that will serve as material. You could type in new sentences, but the internet is already full of countless sentences, so collect those. Most sites are written to be read by many people, so there are no restrictions on viewing; collect it all. At this stage, score each sentence by the reliability of its page: articles on government websites get 100, articles by newspapers and other mass media 50, other internet articles -30, and articles by unknown authors -50. Multiply this by the number of pages that link to that page. You can also take the number of views into account, and add as many conditions as you like, such as adding a positive value for recently created pages and a negative value for older ones.
Next, analyze the question text (break it into words by part of speech, and so on), and increase the score of sites that contain those words.
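The scoring steps above can be sketched in Python as follows. The base scores (100, 50, -30, -50) and the multiplication by the number of linking pages come from the thought experiment; the view, age, and query-bonus weights, the `Page` fields, and the page names are hypothetical.

```python
from dataclasses import dataclass

# Base reliability scores from the thought experiment.
SOURCE_SCORES = {"government": 100, "mass_media": 50, "internet": -30, "unknown": -50}


@dataclass
class Page:
    name: str           # hypothetical identifier
    source: str         # one of the keys of SOURCE_SCORES
    inbound_links: int  # number of pages that link to this page
    views: int
    age_years: float
    text: str


def page_score(page, views_per_point=1000, recency_weight=5.0):
    """Reliability x inbound links, plus a little for views, minus a little for age."""
    score = SOURCE_SCORES[page.source] * page.inbound_links
    score += page.views / views_per_point      # hypothetical weight
    score -= recency_weight * page.age_years   # hypothetical: older pages lose points
    return score


def query_score(page, query_words, bonus=20.0):
    """Raise the score of pages that contain the question's words (hypothetical bonus)."""
    hits = sum(1 for w in query_words if w in page.text)
    return page_score(page) + bonus * hits


pages = [
    Page("gov-page", "government", 3, 1000, 2.0, "prime minister of japan"),
    Page("blog-post", "internet", 10, 5000, 0.5, "my cat video"),
]
for p in pages:
    print(p.name, query_score(p, ["prime", "minister"]))
```

As written, multiplying a negative base score by the link count makes a widely linked page from an unreliable source score even lower; whether that is desirable is a design choice.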
Then parse each collected sentence: break it into words by part of speech and count how often each word connects to each other word. The connection between "dog" and "barks" should be stronger than between "cat" and "barks". Do this between words and between sentences, and compute a probability for each connection. At this point, add to or subtract from the scores using "prohibited words" and "recommended words": negative pairs such as "bomb/manufacturing method", "drugs/obtaining", and "terrorism/execution plan", and positive pairs such as "(product name)/popular" or "(dish name)/delicious" (this adjustment can be made at any stage).
Now, to form sentences, it would help to provide some basic patterns, such as "noun + particle + verb" and "noun + particle + adjective". Insert (substitute) into each slot the word with the highest probability of appearing there. The sentence's rating is multiplied by the score of the site where each word appears. Next, calculate the probability of connections between the resulting sentences. Perhaps it is enough simply to multiply each word's probability by the score of the site where it appeared.
After all this, present to the questioner the sentence with the highest "correctness" (that is, score or probability). Online data is constantly updated (mostly growing), so the probabilities change all the time. They also vary with how many lines the answer should be and how many words the question contains. It would probably be more interesting to bring in a "random variable (random number)" as well.
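A minimal Python sketch of the "connection" step: estimate how likely one word is to follow another from adjacent word pairs (a bigram model, used here as a simple stand-in for the word-connection frequencies described above), then scale pairs that appear in hypothetical prohibited and recommended lists. English word pairs stand in for the Japanese examples, and the penalty and bonus factors are made up. After the bonus, values can exceed 1, so they are scores rather than strict probabilities.

```python
from collections import Counter

# Hypothetical pair lists: (word, following word).
PROHIBITED = {("bomb", "manufacturing")}
RECOMMENDED = {("dish", "delicious")}


def connection_probabilities(sentences):
    """Estimate P(next word | word) from adjacent word pairs (a bigram model)."""
    pair_counts = Counter()
    left_counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            pair_counts[(a, b)] += 1
            left_counts[a] += 1
    return {pair: n / left_counts[pair[0]] for pair, n in pair_counts.items()}


def adjust(scores, prohibited=PROHIBITED, recommended=RECOMMENDED, penalty=0.0, bonus=2.0):
    """Scale prohibited pairs down and recommended pairs up (hypothetical factors)."""
    adjusted = {}
    for pair, p in scores.items():
        if pair in prohibited:
            p *= penalty
        elif pair in recommended:
            p *= bonus
        adjusted[pair] = p
    return adjusted


corpus = ["dog barks", "dog barks", "dog sleeps", "cat sleeps", "cat sleeps", "cat barks",
          "bomb manufacturing", "dish delicious"]
probs = adjust(connection_probabilities(corpus))
print(probs[("dog", "barks")], probs[("cat", "barks")])
```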
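A sketch of filling a basic pattern, assuming each candidate word comes with a probability for its slot and the score of the site it came from (all numbers are made up, and the particle is left implicit). Each slot takes the word with the largest probability × site score, and the sentence score is the product over the slots.

```python
# Hypothetical candidates per slot: (word, probability in this slot, score of its source site).
CANDIDATES = {
    "NOUN": [("dog", 0.6, 100), ("cat", 0.4, 50)],
    "VERB": [("barks", 0.7, 100), ("sleeps", 0.3, 50)],
}
# A basic pattern, e.g. "noun + particle + verb" with the particle left implicit here.
TEMPLATE = ["NOUN", "VERB"]


def fill_template(template, candidates):
    """Fill each slot with the word maximizing probability x site score.

    The sentence score is the product of the chosen words' probability x site score.
    """
    words, sentence_score = [], 1.0
    for slot in template:
        word, prob, site = max(candidates[slot], key=lambda c: c[1] * c[2])
        words.append(word)
        sentence_score *= prob * site
    return " ".join(words), sentence_score


sentence, score = fill_template(TEMPLATE, CANDIDATES)
print(sentence, score)
```

Choosing each slot independently ignores how well the chosen words connect to each other; combining this with pair probabilities from the connection step would be one way to address that.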
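Two ways to choose the final answer, sketched with Python's standard `random` module: always take the top-scoring sentence, or sample in proportion to the scores so that lower-scoring sentences occasionally appear. The scores are made up, and the weights passed to `random.choices` should be non-negative.

```python
import random
from collections import Counter


def pick_best(scored):
    """Return the sentence with the highest score ('correctness')."""
    return max(scored, key=scored.get)


def pick_random(scored, rng):
    """Sample a sentence with probability proportional to its (non-negative) score."""
    sentences = list(scored)
    weights = [scored[s] for s in sentences]
    return rng.choices(sentences, weights=weights, k=1)[0]


scored = {"dog barks": 4200.0, "dog sleeps": 900.0, "cat sleeps": 300.0}  # made-up scores
rng = random.Random(42)  # fixed seed so the run is repeatable
print(pick_best(scored))  # dog barks
print(Counter(pick_random(scored, rng) for _ in range(1000)))
```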
In reality, it is probably a far more complex operation. The same thing can be done with images and audio: using a large amount of raw data, calculate the connections between colors (for images) and between phonemes (for audio), then apply the most probable colors and sounds to basic patterns (age, gender, etc.). However, the "correct answer" for these is even more ambiguous than for text.
To test it, enter a question whose answer you already know. If the answer differs from what you expected, adjustments are needed, but the average questioner doesn't know the answer, so they won't worry about the details (lol).
Do computers have thoughts and emotions?
Sentences produced by these operations should end up being the "common", "ordinary", "common-sense", and "average" sentences of today's world. But I don't know whether that is actually the case. The thinking of the AI's creators and its sponsors may be "taken into consideration". When you search, products with little or no relation to your query may be shown as "recommended". On reflection, it may be a product you once searched for on some shopping site: you wanted it then, searched for it, and forgot about it without buying. When it is shown to you again, you want it again (lol). I suppose my earlier browsing history is being tracked through my IP address. Search engines and AI text generators can do the same thing. In other words, "information you want (or think you want)" is displayed first. That is far from "typical" or "average".
Do computers have thoughts and emotions? It is very easy to write a program that answers "light" when you input "1 gram" and "heavy" when you input "1 ton". It is just as easy to make it give the opposite answers. It is easy to attach a pressure sensor or gyro sensor to a computer and program it to display (or say) "It hurts!" when someone hits it, or to respond with "It feels good!" or "More!". But that is not "computer emotion", is it? Yet it looks as if it has emotions. Perhaps the reason it seems that way is that human relationships themselves are becoming more computer-like.
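The "light/heavy" and "It hurts!" programs really are only a few lines. In this sketch the 1 kg threshold, the pressure threshold, and the sensor units are arbitrary assumptions, and both function names are hypothetical; the point is that each response is a fixed rule, and flipping one flag produces the opposite answer.

```python
def describe_weight(grams, opposite=False):
    """Fixed rule: below an arbitrary 1 kg threshold is 'light', otherwise 'heavy'."""
    answer = "light" if grams < 1000 else "heavy"
    if opposite:  # flipping one flag reverses the 'opinion'
        answer = "heavy" if answer == "light" else "light"
    return answer


def react_to_pressure(reading, threshold=50.0):
    """Map a (hypothetical) pressure-sensor reading to a canned phrase."""
    return "It hurts!" if reading > threshold else "..."


print(describe_weight(1))                 # light
print(describe_weight(1_000_000))         # heavy  (1 ton = 1,000,000 g)
print(describe_weight(1, opposite=True))  # heavy
print(react_to_pressure(80.0))            # It hurts!
```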
If you could talk to your doll, you might become even more attached to it. But you can get attached to a doll even if you can't talk to it. It doesn't even have to be human-shaped: you can feel attached to a stick or a stone. Wasn't it like that when you were a child? What a rich heart!
Computers are becoming ever more convenient, but I can't help feeling that what we are losing in exchange is "human relationships as rich as a child's heart".
Computers, nuclear power, and knives may all come down to "how they are used". But what decides how they are used is, I think, the heart.
Table of contents
Since I was unable to discuss the contents of this paper here, I list its table of contents instead.