Hi, Bennylin! I'm glad your project has been created as a separate project. I imported the pages by category yesterday. If time permits, could you please check whether any pages are missing? Pages that aren't in any category may have been missed, so I need your help. Thank you! --Sotiale (rembugan) 8 Agustus 2021 22.46 (UTC)
- (The main page will be overwritten after everything is done!) --Sotiale (rembugan) 8 Agustus 2021 22.56 (UTC)
- Thank you so very much. Yes, I noticed some missing pages, like oldwikisource:Jv/Serat_Asmaralaya and several templates (oldwikisource:Template:Serat, oldwikisource:Template:Justify), but it turns out they were not categorized under Jv. So yes, there are probably still some pages with the prefix Jv/ on oldwikisource that haven't been imported yet, but I can't tell easily without going through all the links. I will go through all the indexes first tomorrow. Bennylin (rembugan) 8 Agustus 2021 22.59 (UTC)
- Thank you! I look forward to your confirmation. If you send me the list, I'll get it done quickly. And please do not create pages on this wiki that haven't been imported yet (this can lead to duplicate imports). If you have already created one, please let me know separately. I appreciate your cooperation. Have a good time! --Sotiale (rembugan) 9 Agustus 2021 00.13 (UTC)
Please hide the heading "Wikisumber:Pendhapa" on the main page, and please add a footer to the user contributions page like the one on Wikisastra before. Thank you. — Labdajiwa (rembugan) 29 Agustus 2021 09.45 (UTC)
Mas Bennylin, if the #contentSub ID is customized, WikidataInfo.js is affected, because the gadget uses that ID selector. The WikidataInfo.js output then ends up surrounded by empty space, like this: https://imgur.com/a/c4obQ5O. — Labdajiwa (rembugan) 12 Novèmber 2021 00.00 (UTC)
- The same happens in the redirect notice (in incognito mode, without the gadget): https://i.imgur.com/X77bVSD.png. — Labdajiwa (rembugan) 12 Novèmber 2021 00.45 (UTC)
- @Labdajiwa: Yes, I already knew about the redirect issue. But there's not much else I can do (as long as it doesn't break the layout too badly; the redirect notice isn't very common, and besides, the Wikidata gadget isn't shown to regular visitors). Otherwise, the taling, pangkon, pengkal, cakra, suku, keret, and pasangan in the page title would overlap with the link to the parent page, for example on ꦣꦺꦏ꧀ꦭꦫꦱꦶ_ꦲ꦳ꦩ꧀_ꦦꦺꦨꦺꦨꦺ and the pages in Kategori:ꦏꦶꦠꦧ꧀ꦱꦸꦕꦶꦥꦽꦗꦚ꧀ꦗꦶꦪꦤ꧀ꦲꦚꦂ (try removing the margin in your browser). If there's another way, please go ahead. Bennylin (rembugan) 15 Novèmber 2021 12.00 (UTC)
A suggestion to make creating Index pages easier: for language, set the default value to jv. For source, you could copy from here, at least other, pdf, and djvu. Thank you. Mnafisalmukhdi1 (rembugan) 8 April 2022 08.55 (UTC)
My edits were marked red and then marked yellow again by this user, even though, if there is a mistake, it can simply be corrected and the page marked green; there is no need to mark it red. These are my edits that were previously yellow but were marked red again:
(published with permission)
Hi! My name is Oreen Yousuf. I'm a language engineer and an enthusiast for indigenous scripts. I've been reading about the many Brahmic scripts invented in maritime Southeast Asia, such as Javanese, Sundanese, and Baybayin, and began teaching myself the basics of Aksara Jawa. Apart from simple enthusiasm, my reason is to formulate research questions on the potential benefits of using indigenous scripts in various fields of natural language processing. I initially started working with the Somali language and its indigenous Osmanya script, and soon thought to expand the scope of the research to other scripts that are already encoded in Unicode. My research question is how well indigenous scripts perform compared to their Latin counterparts. I hope for good results for the indigenous scripts, to highlight how these scripts can be inherently beneficial for technical use (on top of being culturally important).
I made a very rule-based transliteration script for Somali because there is very little ambiguity between the Latin and Osmanya scripts. When I started studying Javanese, I realized I couldn't do the same, because there seem to be various ways to transliterate into Aksara Jawa that require intimate knowledge of Javanese itself: Murda and Rekan are used for names and foreign words, while Swara and Sandhangan modify vowels in different contexts. For these reasons, I thought to create language models that learn to transliterate Latin into Aksara Jawa automatically, but I would need enough parallel data between the two scripts (that is, perfect transliterations of several sentences in both scripts). Then I found your transliteration system and was overjoyed. Thank you very much for creating it. I also want to ask a few questions about your system:
1) Do you plan to continue refining the system? If so, what do you plan to add or update?
2) What are the best settings for the most accurate transliteration with your system? From what I've read about the script, I assume Murda and no spaces. But what about Mode Ketik/Kopas, and with/without diftong?
3) Is there a limit on how long a piece of Latin text can be transliterated?
Also, do you know if parallel Javanese data exists between Latin and Aksara Jawa? This would be invaluable for evaluating any language system that tries to use Aksara Jawa. Lastly, I read that the Kongres Aksara Jawa met last year to decide several things about the script, including transliteration guidelines (for Latin and Pegon), and especially prasajan and lawasan guidelines for writing in the script. Do you know if these transliteration and writing guidelines have been finalized? If so, have they been made available online? If they are to be the "official" way to use the script, I'd like to adhere to them when including Javanese in my project. I'd appreciate any advice. Thank you again, Oreen Yousuf
Hi, thank you for contacting me. Greetings from Java! I'm excited to see others taking an interest in Javanese script, especially those without prior knowledge of it. I can see that you've studied the script quite a bit, and I'm more than happy to help wherever I can. If you need to reach me for quick questions outside of email, feel free to contact me on Telegram at @bennylin.
One of the first things to know before approaching Javanese script is that, unlike the Latin script, it is very much what-you-hear-is-what-you-write, so different pronunciations may produce different results, and indeed there are many ways to write a given word, place name, etc. in Javanese script. When people use my transliterator, they often run into errors or produce incorrect results because they assume the script behaves like an alphabet.
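Because the script is an abugida rather than an alphabet, a transliterator works syllable by syllable: each consonant carries an inherent "a", other vowels are added as sandhangan (vowel signs), and a consonant with no following vowel gets a pangkon, or one of the special final signs for -ng, -r and -h. The following is a minimal Python sketch of that idea, not the author's transliterator. It is a hypothetical illustration that covers only the 20 basic letters (nglegena) and the basic vowel and final signs, relies on Unicode's convention that consonant + pangkon + consonant is rendered as pasangan by the font, and ignores murda, rekan, swara, cakra/pengkal/keret and all irregular spellings.

```python
# Minimal sketch: rule-based Latin -> Aksara Jawa for a subset of modern Javanese.
# Hypothetical illustration only; murda, rekan, swara, cakra/pengkal/keret and
# irregular spellings (e.g. doubled consonants before -a/-i) are NOT handled.

# The 20 basic consonants (nglegena); digraphs are two-letter keys.
CONSONANTS = {
    "h": "\uA9B2", "n": "\uA9A4", "c": "\uA995", "r": "\uA9AB", "k": "\uA98F",
    "d": "\uA9A2", "t": "\uA9A0", "s": "\uA9B1", "w": "\uA9AE", "l": "\uA9AD",
    "p": "\uA9A5", "dh": "\uA99D", "j": "\uA997", "y": "\uA9AA", "ny": "\uA99A",
    "m": "\uA9A9", "g": "\uA992", "b": "\uA9A7", "th": "\uA99B", "ng": "\uA994",
}
# Vowel signs: "a" is inherent, "o" is taling + tarung.
VOWEL_SIGNS = {
    "a": "", "i": "\uA9B6", "u": "\uA9B8", "é": "\uA9BA", "è": "\uA9BA",
    "e": "\uA9BC", "o": "\uA9BA\uA9B4",
}
# Syllable-final -ng, -r, -h use cecak, layar and wignyan instead of pangkon.
FINALS = {"ng": "\uA981", "r": "\uA982", "h": "\uA983"}
PANGKON = "\uA9C0"


def match_consonant(word: str, i: int):
    """Return the longest consonant (digraphs first) starting at i, or None."""
    for length in (2, 1):
        chunk = word[i:i + length]
        if len(chunk) == length and chunk in CONSONANTS:
            return chunk
    return None


def translit_word(word: str) -> str:
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        cons = match_consonant(word, i)
        if cons is None:
            if word[i] not in VOWEL_SIGNS:
                raise ValueError(f"unsupported letter {word[i]!r} in {word!r}")
            # A vowel with no written consonant is carried by HA (e.g. "aku").
            out.append(CONSONANTS["h"] + VOWEL_SIGNS[word[i]])
            i += 1
            continue
        i += len(cons)
        if i < len(word) and word[i] in VOWEL_SIGNS:
            out.append(CONSONANTS[cons] + VOWEL_SIGNS[word[i]])
            i += 1
        elif cons in FINALS:
            out.append(FINALS[cons])
        else:
            # Pangkon; before another consonant, fonts render this as pasangan.
            out.append(CONSONANTS[cons] + PANGKON)
    return "".join(out)


print(translit_word("nulisa"))   # ꦤꦸꦭꦶꦱ (nu-li-sa, the wrong form for "nulisa")
print(translit_word("nulissa"))  # ꦤꦸꦭꦶꦱ꧀ꦱ (the correct form)
```

The purely mechanical rule reproduces exactly the kind of error discussed in this thread: it cannot know that "nulisa" is pronounced /nu lis sa/.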
I would also say that the Javanese script can be used to write foreign (i.e. non-Javanese) words, as long as it follows the pronunciation rules, and indeed I've seen many building signs that use Javanese script to transliterate Indonesian and English names, with varying degrees of success. This brings us back to what you said about Murda and Rekan. In my view, they're optional, and one can write just fine without them. Murda is often mistaken for "capital letters", and people use aksara murda to transliterate the first letter or syllable of their names, for example. This is incorrect: a name may contain several murda (even in the middle syllables), and ordinary people traditionally did not use murda for their names (only great figures and some place names did). If one traces their history, these aksara are "artefacts" of Old Javanese that distinguished different syllables (for example "bh" in Bharatayudha); modern Javanese no longer has those distinctions (no modern words use the "bh" sound), so we're left with "unused" aksara that have been repurposed for transliterating names.
Aksara rekan, in turn, are used to transliterate mainly Latin "f", "v", and "z", some Arabic syllables, and, in the official documentation, also Chinese syllables. As a Chinese-Indonesian myself, I can attest that the Chinese forms are practically never used (apart from one or two source documents, which I have never seen), are inadequate for modern Chinese pronunciation, and are so inconvenient to write or type that nobody bothers with them. Latin and Arabic sounds can also be written just fine without rekan, using alternate forms: the original Javanese language lacked those sounds, which (mostly the Latin ones) entered Javanese via Indonesian. Moreover, the rekan system does not acknowledge or make provisions for languages or scripts other than those three, including Indic and other Brahmic scripts.
Now, to the core of your inquiry: automatic transliteration is indeed possible, and I did it as far back as 2013, transliterating the whole modern Javanese Bible (in Latin) into Javanese script simply by feeding it into my transliterator (1st try). It was not perfect, and I identified several ways to improve it. I used a PHP-based script and a database of all the words that need special handling (such as names), and got a better result (2nd try). It could be improved further, and I have thought of some rules for, say, irregular transliterations, but I haven't implemented them yet. (I will try sending this email without the attachment first, to see if it reaches you; the big attachment was probably blocked by your email provider.) I have also started another Bible transliteration project on Javanese Wikisource (in Suriname Javanese, though):
- https://jv.wikisource.org/wiki/Kitab_Sutyi_Prejanjian_Anyar (Latin)
- https://jv.wikisource.org/wiki/ꦏꦶꦠꦧ꧀ꦱꦸꦕꦶꦥꦽꦗꦚ꧀ꦗꦶꦪꦤ꧀ꦲꦚꦂ (Aksara)
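The 2nd try described above combined a script with a database of words needing special handling. The original was written in PHP and its data format isn't described here, so the following Python sketch is only an assumption about how such a lookup could work: known words (names, irregular forms) get a hand-checked Javanese-script output from an exception table, and every other word goes to a rule-based transliterator passed in as `fallback`. The names `EXCEPTIONS` and `transliterate_with_overrides` are hypothetical.

```python
import re
from typing import Callable, Dict

# Hypothetical exception table: lowercase Latin word -> hand-checked Aksara Jawa.
EXCEPTIONS: Dict[str, str] = {
    "nulisa": "\uA9A4\uA9B8\uA9AD\uA9B6\uA9B1\uA9C0\uA9B1",  # ꦤꦸꦭꦶꦱ꧀ꦱ
}

TOKEN = re.compile(r"\w+|\W+")  # alternating runs of word / non-word characters
WORD = re.compile(r"\w+")


def transliterate_with_overrides(text: str,
                                 fallback: Callable[[str], str],
                                 exceptions: Dict[str, str] = EXCEPTIONS) -> str:
    """Use the exception table for known words, the rule-based fallback otherwise."""
    out = []
    for token in TOKEN.findall(text):
        if WORD.fullmatch(token):
            key = token.lower()
            out.append(exceptions[key] if key in exceptions else fallback(token))
        else:
            out.append(token)  # keep spaces and punctuation unchanged
    return "".join(out)


# Stand-in fallback for demonstration only: it just marks the words it receives.
print(transliterate_with_overrides("Kowé nulisa, ya.", lambda w: f"<{w}>"))
# <Kowé> ꦤꦸꦭꦶꦱ꧀ꦱ, <ya>.
```

Keeping the exceptions on the output side means the Latin text stays untouched, at the cost of maintaining Javanese-script strings by hand.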
Parallel data do exist, mainly from an organization called Yayasan Sastra Lestari (sastra.org), and have been utilized in some areas, and I expect to continue utilizing them if I have more time and more help from the community. One example is the Javanese script magazine called "Kajawen" (https://jv.wikisource.org/wiki/Kajawen).
Sastra.org has been transliterating the magazine's editions since 2010 (ongoing; there are hundreds of them). This is the earliest magazine in their collection: https://www.sastra.org/koran-majalah-dan-jurnal/kajawen/366-kajawen-balai-pustaka-1927-02-10-10. We then partnered with them to scan their magazine collection and upload it to Wikimedia Commons, hoping that the community would proofread the original Javanese script on Javanese Wikisource. One result is https://jv.wikisource.org/wiki/Kajawen_006. So far, for lack of manpower, we have managed only about 12 editions in Javanese script (while Sastra.org has transliterated many more).
In the absence of any OCR tool, and considering that typing the script character by character would take an immense amount of time and energy, I opted to use my transliterator to convert Sastra.org's Latin transliteration back into Javanese script. Mind you, this is not perfect, since there is no reviewer, but we applied what I had learned from the 2nd try above; you could call it a 3rd iteration, one I never applied to the Bible project. The main improvement is identifying the "irregular" transliterations mentioned above in advance and correcting them in the Latin source before feeding the text into the automatic transliteration tool. For example, the word "nulisa" ("write", imperative) would be incorrectly transliterated as ꦤꦸꦭꦶꦱ, whereas the correct form, ꦤꦸꦭꦶꦱ꧀ꦱ, would read "nulissa" if transliterated back to Latin. This is because the root word "tulis" takes the prefix "n-" (tulis becomes nulis) and the suffix "-a", and when a root ending in a consonant is suffixed with "-a" or "-i", that consonant is pronounced doubly, /nu lis sa/, hence ꦤꦸꦭꦶꦱ꧀ꦱ, instead of being carried over to the last syllable, /nu li sa/, ꦤꦸꦭꦶꦱ (wrong). Therefore, we changed every occurrence of "nulisa" to "nulissa" in the Latin text before handing it to the transliterator.
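The 3rd-iteration workflow above (fix irregular spellings in the Latin source, then run the ordinary transliterator) can be sketched as a whole-word find-and-replace pass. This is a minimal Python illustration, assuming a hand-made correction table. `CORRECTIONS` and `precorrect` are hypothetical names, and only the "nulisa" → "nulissa" entry comes from the example above.

```python
import re
from typing import Dict

# Hypothetical correction table (keys in lowercase): irregular Latin spelling ->
# spelling that a naive syllable-by-syllable transliterator renders correctly.
CORRECTIONS: Dict[str, str] = {
    "nulisa": "nulissa",  # root "tulis" + n- + -a: the final s is pronounced doubly
}


def precorrect(text: str, corrections: Dict[str, str] = CORRECTIONS) -> str:
    """Rewrite whole words found in the table, keeping an initial capital."""
    if not corrections:
        return text
    # Longest keys first, so a longer entry wins over a shorter one.
    keys = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b",
                         re.IGNORECASE)

    def replace(match: re.Match) -> str:
        word = match.group(1)
        fixed = corrections[word.lower()]
        return fixed[0].upper() + fixed[1:] if word[0].isupper() else fixed

    return pattern.sub(replace, text)


print(precorrect("Aja nulisa ing kono. Nulisa ing kéné!"))
# Aja nulissa ing kono. Nulissa ing kéné!
```

The word boundaries (`\b`) keep longer words such as "nulisane" from being partially rewritten; each such form would need its own table entry.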
There are hundreds of such irregular forms, and our Kajawen work handled them this way. So most of the work is identifying them; the rest is just machine transliteration. I hope my transliterator can eventually take rules like this into account, but the obstacle is that they are highly language dependent, so I haven't yet worked out how to implement them and have left it to the user to decide. [Notes to self: also, /ɔ/ in names]
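Instead of listing each irregular form by hand, the doubling rule described above (a stem ending in a consonant, followed by "-a" or "-i") could be applied automatically, given a lexicon of stems. This Python sketch assumes such a lexicon exists. `STEMS` is a hypothetical two-entry stand-in and `apply_doubling_rule` a hypothetical name. The rule deliberately skips digraph finals such as -ng and every other exception, which is exactly where the language-dependent knowledge mentioned above comes in.

```python
from typing import Set

VOWELS = set("aiueoéè")
DIGRAPH_FINALS = {"ng", "ny", "th", "dh"}

# Hypothetical lexicon of (already prefixed) stems; a real one would come from
# a Javanese dictionary with root-word information.
STEMS: Set[str] = {"tulis", "nulis"}


def apply_doubling_rule(word: str, stems: Set[str] = STEMS) -> str:
    """Double a known stem's final consonant before the suffix -a or -i.

    For example, nulis + -a becomes "nulissa", which a naive transliterator
    then writes with pangkon and pasangan instead of moving the consonant to
    the next syllable. Unknown words, vowel-final stems and digraph finals are
    returned unchanged; output is lowercase.
    """
    lower = word.lower()
    for suffix in ("a", "i"):
        if not lower.endswith(suffix):
            continue
        stem = lower[:-len(suffix)]
        if stem in stems and stem[-1] not in VOWELS and stem[-2:] not in DIGRAPH_FINALS:
            return stem + stem[-1] + suffix
    return word


for w in ["nulisa", "nulis", "lunga"]:
    print(w, "->", apply_doubling_rule(w))
```

Only known stems are touched, so a word that merely happens to end in "-a", such as "lunga", passes through unchanged.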
Other, more recent projects are at https://jv.wikisource.org/wiki/Wikisumber:Buku_anyar, a list of books, mostly in Latin, that the community and I have transliterated into Javanese script. For example, a short children's story: https://jv.wikisource.org/wiki/Pengalaman%C3%A9_tikus_cilik. These are relatively easy to type, so I used the transliteration built into MediaWiki (the software powering Wikimedia projects, including Wikisource), which I developed in the 2010s, and typed them manually. The transliteration can be considered "perfect" in this sense, because it is all human (re-)typing in a different script, so there are almost no errors that I'm aware of.
Early this year, Wikisource organized a writing competition in which many people participated, mostly in Latin script and some in Javanese script. Here are some results:
- https://jv.wikisource.org/wiki/Ambahrawa (a poem, part of a book): the Latin part was proofread by a competition participant, while I transliterated it into Javanese script (https://jv.wikisource.org/wiki/ꦲꦩ꧀ꦧꦃꦫꦮ). As with the children's story, this is nearly perfect and was done manually, as a proof of concept that all of the poems could be done in a two-column format.
The second one is a bit special. It's called Serat Pararaton, a history book about the origins of the Javanese kings. It has two sources: one in Javanese script (https://jv.wikisource.org/wiki/Indh%C3%A8ks:Serat_Pararaton.pdf), the other in Latin (https://jv.wikisource.org/wiki/Indh%C3%A8ks:Verhandelingen_van_het_Bataviaasch_Genootschap_van_Kunsten_en_Wetenschappen,_XLIX.pdf, pp. 29-60). The Javanese-script proofreading was done by a competition participant, showing that at least some other people are interested in doing this. He used the method I suggested: take Sastra.org's Latin transliteration (https://www.sastra.org/kisah-cerita-dan-kronikal/cerita/3066-pararaton-mangkudimeja-1912-13-1053-jilid-1) as the source, then transliterate it back into Javanese script using my built-in transliteration. The result above can thus be compared with the Latin text at https://jv.wikisource.org/wiki/Serat_Pararaton. But remember: 1) they come from 2 different sources, and 2) it's Old/Middle Javanese, so there may be differences from modern Javanese. (You may use Sastra.org's version instead of the Dutch one.) The contributor who proofread the Javanese script did find some errors in my MediaWiki transliterator, but I'm only collecting them for now; I haven't had time to touch the code again.
Other texts that I have transliterated:
- https://jv.wikisource.org/wiki/D%C3%A9klarasi_HAM_PBB (UDHR, Latin) and https://jv.wikisource.org/wiki/%EA%A6%A3%EA%A6%BA%EA%A6%8F%EA%A7%80%EA%A6%AD%EA%A6%AB%EA%A6%B1%EA%A6%B6_%EA%A6%B2%EA%A6%B3%EA%A6%A9%EA%A7%80_%EA%A6%A6%EA%A6%BA%EA%A6%A8%EA%A6%BA%EA%A6%A8%EA%A6%BA (Javanese script): I'd give this 95%, because although I did it manually, the texts are quite long and include many technical and borrowed terms.
- (TODO) I want to do both of these texts, since they're quite famous and I have both the Javanese-script documents and the Latin transliterations, so it's just a matter of feeding the text to the transliterator, but I haven't found the time. You might try it if you want.
- https://jv.wikisource.org/wiki/Serat_Rangsang_Tuban - Random page I transliterated in passing
- https://jv.wikisource.org/wiki/Kakawin_N%C4%81garak%E1%B9%9Bt%C3%A2gama (an Old Javanese text): maybe around 75% accuracy
- https://jv.wikisource.org/wiki/Kakawin_Arjunawiw%C4%81ha (idem)
- https://jv.wikisource.org/wiki/Y%C3%A9sus_Panguculan - Here, I tried to transliterate a whole comic book. The result can be considered 100% accurate. Note that this is in Suriname Javanese.
- https://jv.wikisource.org/wiki/Wawaton_panyeratipun_tembung_Jawi_mawi_sastra_Jawi_dalasan_angka - this is the 1926 "rule" for writing Javanese script, although it doesn't cover edge cases and still raises many questions and objections from people who wish to use the pre-1926 rules. I didn't transliterate the whole document, since it contains both Javanese script and Latin; instead, I played around with Ruby. If you're not familiar with it, I can explain.
Besides those from Wikisource, which I think are mainly what you're looking for ("perfect" transliteration in both scripts), there are some in other projects as well, namely Wikipedia and Wiktionary.
For Wiktionary, these are just lemmas. I transliterated the headwords and definitions of the whole dictionary into Javanese script, but they reside only on my computer. The pages I uploaded to Wiktionary are mainly simple transliterations (several hundred, if not thousands): https://jv.wiktionary.org/w/index.php?title=Mirunggan:Pratélan_kaca&from=ꦄ&to=&namespace=0
For Wikipedia, the main issue is that each (manual) transliteration is made from a single snapshot of the Latin source page, so if the source is modified, the transliteration is not necessarily updated. You can see all such pages here (sorted "alphabetically"): https://jv.wikipedia.org/w/index.php?title=Mirunggan:Prat%C3%A9lan_kaca&from=%EA%A6%84&to=&namespace=0. There aren't many of them yet. One page of note is the biography article https://jv.wikipedia.org/wiki/%EA%A6%97%EA%A6%8F%EA%A6%AE%EA%A6%B6%EA%A6%A2%EA%A6%A2, the first Wikipedia page in Javanese script, from 2013, transliterated from this version: http://jv.wikipedia.org/w/index.php?title=Joko_Widodo&oldid=840456. It hasn't been updated since then; I'd estimate about 90% accuracy. Wikipedia pages written in both scripts use a template called Multiscript at the top of the page. These are all the pages that use the template: https://jv.wikipedia.org/w/index.php?title=Mirunggan:Pranala_mr%C3%A9n%C3%A9/Cithakan:Multiscript&namespace=0&hideredirs=1&limit=500
What I wish to see on Javanese Wikipedia is not manual but automatic transliteration, so that the Javanese-script version always uses the latest Latin version as its source. There are many technical challenges, although I know of some other scripts for which this has been done.
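Short of fully automatic on-wiki conversion, one way to soften the snapshot problem described above is to record the revision ID that a Javanese-script page was transliterated from (as with oldid=840456 above) and to re-run the transliteration whenever the Latin page has a newer revision. The Python sketch below shows only that bookkeeping, using the MediaWiki Action API query `action=query&prop=revisions`. The function names and the User-Agent string are hypothetical, and the transliteration step itself is left out. MediaWiki's built-in LanguageConverter, used for example on the Serbian and Chinese Wikipedias, works differently: it converts at render time.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Tuple

API_URL = "https://jv.wikipedia.org/w/api.php"


def latest_revision_url(title: str) -> str:
    """Build an Action API query for the latest revision ID and wikitext of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|content",
        "rvslots": "main",
        "format": "json",
        "formatversion": "2",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_latest_revision(data: Dict[str, Any]) -> Tuple[int, str]:
    """Extract (revid, wikitext) from a formatversion=2 response."""
    page = data["query"]["pages"][0]
    if page.get("missing"):
        raise LookupError(f"page not found: {page.get('title')}")
    revision = page["revisions"][0]
    return revision["revid"], revision["slots"]["main"]["content"]


def fetch_latest_revision(title: str) -> Tuple[int, str]:
    """Network call; Wikimedia asks API clients to send a descriptive User-Agent."""
    request = urllib.request.Request(
        latest_revision_url(title),
        headers={"User-Agent": "aksara-sync-sketch/0.1 (hypothetical example)"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_latest_revision(json.load(response))


def needs_retransliteration(source_revid: int, latest_revid: int) -> bool:
    """True when the Latin page has changed since the Javanese-script copy was made."""
    return latest_revid != source_revid
```

For instance, if `fetch_latest_revision("Joko_Widodo")` returns a revision ID other than 840456, the Javanese-script article is out of date and could be regenerated from the returned wikitext.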
Next, answering your questions:
- Do you plan to continue refinement of the system? If so, what do you plan to add/update?
- As of now, I feel I don't have time to make any major changes. What I actually want is some layout changes. I still make minor changes here and there (https://jv.wikipedia.org/w/index.php?title=Naraguna:Bennylin/trans.js&action=history), but an overhaul is long overdue, and I probably won't fix it until it breaks. What I want to update code-wise, in no particular order: (1) detect, or give an option for, which "language" is being transliterated; as I mentioned, the rules for transliterating the Javanese language differ from those for Indonesian or other languages. (2) adapt or use a dictionary to detect whether a word is Javanese, find its root word, and determine the correct transliteration based on the root word. (3) provide tips as people type, or show the correct way to type something; I've recorded some tutorials for the Wikisource contest, I just haven't had time to embed them on the page. (4) harmonize the code between GitHub and MediaWiki, since they use different codebases, and the latter has many mistakes and hasn't been touched since 2013. (5) provide transliteration to/from other Indonesian scripts, i.e. Balinese, Batak, Sundanese, etc. (6) provide transliteration to/from foreign scripts: (Brahmic) Indian scripts, Thai script, and (CJK) Chinese, Japanese, Korean. (7) fix known bugs and errors on my list, such as the rule for pangkon + pada lungsi/pada lingsa.
- Unfortunately, these are very low on my to-do list, and/or I haven't had the impetus to work on any of them.
- Rather than continuing to work on them, I'm more interested in providing a Javanese script keyboard, which is a higher priority but, alas, also low on my to-do list.
- My dream project (since 2009) is automatic transliteration on Javanese Wikipedia.
- What are the best settings to make the most accurate transliteration with your system? I assume with Murda and no spaces as I've read about the script. But what of Mode Ketik/Kopas, and with/without diftong?
- This depends largely on user preference and what they're trying to achieve, partly because there are conflicting rules and different people want to follow different ones. I prefer Mode Ketik (typing mode) when I'm typing and Mode Kopas (copy-paste mode) when I'm copy-pasting. Usually I don't use murda, diphthongs, or spaces (hence the default values).
- People use murda when they feel that names should be properly "capitalized" (they shouldn't be). I almost never use murda in copy-paste mode, since the result can look weird (e.g. the first letter of a sentence __should not__ use murda). I sometimes use them in typing mode, because there I can control which syllables get murda.
- Others may use diphthongs (after all, the option is available and well documented); it's just that in modern usage (i.e. school textbooks) nobody does that anymore. I also provide the option for people transliterating from Indonesian, which uses diphthongs quite a lot, making the result distinct from Javanese.
- Others use spaces because, well, they're "modern" people influenced by Indonesian and English and feel that "this is the way". Spaces also make line breaking easy, while scriptio continua (even with ZWJ) is hell for paragraph justification.
- Is there a limit on how long a piece of Latin Text can be transliterated?
- None. I tried it with the Bible transliteration: the screen literally froze for several minutes while the whole text was transliterated, but the result was eventually displayed. It depends on your CPU and memory. I haven't done any in-depth analysis, but in day-to-day use, surely no one copy-pastes anything longer than a few paragraphs.
- Oh, I just thought of something else to improve: (8) a paragraph breaker and an automatic start-of-paragraph marker. Right now people still have to type the marker manually as a pipe (|) character. The marker is a must in printed texts, but some modern books no longer use it, since paragraph breaks are already obvious from double spacing and other visual cues.
- Do you know if parallel Javanese data exists between Latin and Aksara Jawa? This would be invaluable to evaluate any language system that tries to use Aksara Jawa.
- - Try browsing sastra.org's library. They add material daily, and I'm familiar with their inner workings. Their quality is good, and they often catch errors in the original documents, which they record in a transliterator's note at the bottom.
- - Wikisource as I mentioned above; I envision it as the go-to place for people who are interested in transliterating (or proofing) Javanese script.
- - There is a new book publisher called Lingkarantarnusa that has printed bi-script children's books: https://www.instagram.com/stories/highlights/17907691799224291/
- - And some random websites, which unfortunately can disappear (and have disappeared) at any time. Let me know if you need any specific kind of text.
- Do you know if these transliteration and writing guidelines have been finalized? If so, have they been made available online to read?
- The transliteration guidelines, called JGST (Javanese General System of Transliteration) on the model of IAST (International Alphabet of Sanskrit Transliteration), haven't been uploaded to the Kongres website (https://kongresaksarajawa.id/undhuhan/). I can only find news about them here: https://www.kratonjogja.id/ragam/52-pedoman-transliterasi-aksara-jawa-latin/ (the download link is broken: https://kapustakan.kratonjogja.id/notasi/tabel_sistem_transliterasi_aksara_jawa_latin_jgst.pdf). There's also a report in this magazine: https://budaya.jogjaprov.go.id/attachment/view?id=3822&&filename=COVER-ISI%20SEMPULUR%202021%20#1_(23X30%20cm)CETAK%20FINAL_Optimized.pdf (pp. 12-15). I believe it's still in the process of being approved by the Unicode Consortium and/or other relevant organizations. The congress organizers tried to name it JGST, but so far (2 years on) the name hasn't stuck, and very few sources other than the organizers themselves refer to it that way. This may reflect the fact that the event and its outcome were a little rushed: there were 2 major blocs within the congress, and the result didn't really satisfy the hardliners on either side. I was on the keyboard and font committee, whose work resulted in an Indonesian National Standard certification. I still largely follow the 1926 rules. For now, I don't treat the 2022 result as overriding the 1926 rules, but rather as completing and harmonizing with them. Once it has been approved and recognized internationally (there may be modifications), I'll make sure it is available and searchable worldwide via Wikipedia. I suggest that others (including you) adhere to the 1926 version as well, since any change generally takes several years to be adopted en masse.
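Improvement (8) mentioned earlier, the automatic start-of-paragraph marker, could begin as a simple preprocessing pass over the Latin text. The pass inserts the same pipe character that users currently type by hand, and the existing transliterator then converts it as usual. A minimal Python sketch, assuming paragraphs are separated by blank lines. `add_paragraph_markers` is a hypothetical name, and which Javanese-script sign the pipe maps to is left to the existing tool.

```python
import re
from typing import List


def add_paragraph_markers(text: str, marker: str = "|") -> str:
    """Prefix every blank-line-separated paragraph with the marker, unless already present."""
    paragraphs: List[str] = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return "\n\n".join(p if p.startswith(marker) else marker + p for p in paragraphs)


print(add_paragraph_markers("Iki paragraf siji.\n\n|Iki paragraf loro.\n\n\nIki telu."))
```

Paragraphs that already start with the marker are left alone, so running the pass twice gives the same result.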