feat: adjust the model

2025-03-20 12:35:01 +07:00 · e072974be7
parent 5b5b614897
commit e072974be7
16 changed files with 829 additions and 869 deletions
--- a/answer_padded.npy
+++ b/answer_padded.npy
--- a/broken_dataset.txt
+++ b/broken_dataset.txt
@@ -1 +0,0 @@
 Katak mengalami metamorfosis dari telur, berudu, katak muda, hingga katak dewasa.,multiple_choice,Tahapan apakah yang termasuk dalam metamorfosis katak?,Berudu,Telur|Berudu|Pupa|Imago
--- a/context_padded.npy
+++ b/context_padded.npy
--- a/dataset/training_dataset.json
+++ b/dataset/training_dataset.json
@@ -1,7 +1,7 @@
 [
  {
    "context": "Albert Einstein adalah fisikawan teoretis kelahiran Jerman yang mengembangkan teori relativitas, salah satu dari dua pilar fisika modern. Karyanya juga dikenal karena pengaruhnya terhadap filosofi ilmu pengetahuan. Ia menerima Penghargaan Nobel dalam Fisika pada tahun 1921 atas jasanya dalam fisika teoretis.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "fill_in_the_blank",
        "question": "Siapa yang mengembangkan teori relativitas?",
@@ -22,7 +22,7 @@
  },
  {
    "context": "Samudra Pasifik adalah yang terbesar dan terdalam di antara divisi samudra di Bumi. Samudra ini membentang dari Samudra Arktik di utara hingga Samudra Selatan di selatan dan berbatasan dengan Asia dan Australia di barat serta Amerika di timur.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "fill_in_the_blank",
        "question": "Samudra _______ adalah yang terbesar dan terdalam.",
@@ -43,7 +43,7 @@
  },
  {
    "context": "Proklamasi Kemerdekaan Indonesia dibacakan pada tanggal 17 Agustus 1945 oleh Soekarno dan Mohammad Hatta di Jakarta. Peristiwa ini menandai lahirnya negara Indonesia yang merdeka dari penjajahan.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "fill_in_the_blank",
        "question": "Proklamasi Kemerdekaan Indonesia terjadi pada tanggal _______.",
@@ -69,7 +69,7 @@
  },
  {
    "context": "Hukum Newton adalah tiga hukum fisika yang menjadi dasar mekanika klasik. Hukum pertama menyatakan bahwa suatu benda akan tetap diam atau bergerak lurus beraturan kecuali ada gaya luar yang bekerja padanya. Hukum kedua menyatakan bahwa percepatan suatu benda berbanding lurus dengan gaya yang bekerja padanya dan berbanding terbalik dengan massanya. Hukum ketiga menyatakan bahwa setiap aksi memiliki reaksi yang sama besar tetapi berlawanan arah.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "fill_in_the_blank",
        "question": "Hukum Newton terdiri dari _______ hukum.",
@@ -95,7 +95,7 @@
  },
  {
    "context": "Budi Utomo adalah organisasi pemuda yang didirikan pada 20 Mei 1908 oleh dr. Wahidin Sudirohusodo dan para mahasiswa STOVIA. Organisasi ini bertujuan untuk meningkatkan pendidikan dan kesejahteraan rakyat Indonesia serta menjadi tonggak awal kebangkitan nasional.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "fill_in_the_blank",
        "question": "Budi Utomo didirikan pada tanggal _______.",
@@ -121,7 +121,7 @@
  },
  {
    "context": "Ki Hajar Dewantara adalah pelopor pendidikan di Indonesia dan pendiri Taman Siswa. Ia dikenal dengan semboyannya 'Ing Ngarsa Sung Tuladha, Ing Madya Mangun Karsa, Tut Wuri Handayani', yang menekankan peran guru dalam pendidikan.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "fill_in_the_blank",
        "question": "Ki Hajar Dewantara mendirikan _______.",
@@ -147,7 +147,7 @@
  },
  {
    "context": "Teori evolusi dikembangkan oleh Charles Darwin dan dijelaskan dalam bukunya 'On the Origin of Species' yang diterbitkan pada tahun 1859. Teori ini menyatakan bahwa spesies berevolusi melalui seleksi alam, di mana individu dengan karakteristik yang lebih baik memiliki peluang lebih tinggi untuk bertahan hidup dan berkembang biak.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "fill_in_the_blank",
        "question": "Teori evolusi dikembangkan oleh?",
@@ -173,7 +173,7 @@
  },
  {
    "context": "BPUPKI (Badan Penyelidik Usaha-Usaha Persiapan Kemerdekaan Indonesia) dibentuk oleh pemerintah Jepang pada 29 April 1945 sebagai bagian dari janji Jepang untuk memberikan kemerdekaan kepada Indonesia. Pembentukan BPUPKI terjadi pada masa Perang Dunia II, ketika Jepang mulai mengalami kekalahan dari Sekutu dan ingin mendapatkan dukungan dari rakyat Indonesia.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "fill_in_the_blank",
        "question": "Apa kepanjangan dari BPUPKI?",
@@ -194,7 +194,7 @@
  },
  {
    "context": "Kerajaan Majapahit adalah kerajaan besar Hindu-Buddha yang berpusat di Jawa Timur, berdiri sekitar tahun 1293 hingga 1500 M. Majapahit mencapai puncak kejayaan di bawah pemerintahan Raja Hayam Wuruk dengan patihnya, Gajah Mada.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "fill_in_the_blank",
        "question": "Kerajaan Majapahit mencapai puncak kejayaannya di bawah raja _______.",
@@ -215,7 +215,7 @@
  },
  {
    "context": "Kerajaan Sriwijaya adalah kerajaan maritim yang berpusat di Sumatera Selatan dari abad ke-7 hingga abad ke-13. Kerajaan ini menjadi pusat perdagangan dan penyebaran agama Buddha terbesar di Asia Tenggara.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "fill_in_the_blank",
        "question": "Kerajaan Sriwijaya adalah pusat penyebaran agama _______ terbesar di Asia Tenggara.",
@@ -236,7 +236,7 @@
  },
  {
    "context": "Candi Borobudur adalah candi Buddha terbesar di dunia yang terletak di Magelang, Jawa Tengah. Dibangun pada abad ke-8 oleh Wangsa Sailendra, candi ini merupakan simbol puncak kebudayaan Buddha di Jawa.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "fill_in_the_blank",
        "question": "Candi Borobudur dibangun oleh wangsa _______.",
@@ -257,7 +257,7 @@
  },
  {
    "context": "VOC (Vereenigde Oostindische Compagnie) adalah perusahaan dagang Belanda yang memonopoli perdagangan rempah-rempah di Nusantara dari abad ke-17 hingga awal abad ke-18.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "fill_in_the_blank",
        "question": "VOC adalah singkatan dari _______.",
@@ -278,7 +278,7 @@
  },
  {
    "context": "Pertempuran Surabaya terjadi pada 10 November 1945 antara pasukan Indonesia melawan pasukan sekutu Inggris yang berusaha mengambil alih kota setelah Jepang menyerah dalam Perang Dunia II. Pertempuran ini dikenang sebagai Hari Pahlawan di Indonesia.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "fill_in_the_blank",
        "question": "Pertempuran Surabaya terjadi pada tanggal _______.",
@@ -299,7 +299,7 @@
  },
  {
    "context": "Kerajaan Demak adalah kerajaan Islam pertama di Jawa yang berdiri pada akhir abad ke-15. Kerajaan ini terkenal karena penyebaran agama Islam di Jawa melalui Wali Songo.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "true_false",
        "question": "Kerajaan Demak adalah kerajaan Islam pertama di Jawa.",
@@ -314,7 +314,7 @@
  },
  {
    "context": "Sumpah Pemuda terjadi pada 28 Oktober 1928, di mana pemuda Indonesia berikrar untuk bersatu dalam satu tanah air, satu bangsa, dan satu bahasa Indonesia.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "true_false",
        "question": "Sumpah Pemuda menyatakan persatuan dalam satu agama.",
@@ -329,7 +329,7 @@
  },
  {
    "context": "Gajah Mada adalah seorang patih terkenal dari Kerajaan Majapahit yang berhasil menyatukan sebagian besar wilayah Nusantara melalui politik ekspansinya. Ia terkenal dengan sumpahnya yang disebut Sumpah Palapa.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "true_false",
        "question": "Gajah Mada berasal dari Kerajaan Majapahit.",
@@ -344,7 +344,7 @@
  },
  {
    "context": "Kerajaan Aceh mencapai puncak kejayaannya di bawah pemerintahan Sultan Iskandar Muda pada abad ke-17. Aceh menjadi pusat perdagangan dan kebudayaan Islam di wilayah Nusantara.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "multiple_choice",
        "question": "Siapakah sultan yang membawa Kerajaan Aceh ke puncak kejayaan?",
@@ -360,7 +360,7 @@
  },
  {
    "context": "Perang Diponegoro berlangsung dari tahun 1825 hingga 1830. Perang ini dipimpin oleh Pangeran Diponegoro melawan pemerintah kolonial Belanda di Jawa Tengah.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "true_false",
        "question": "Perang Diponegoro berlangsung selama lima tahun.",
@@ -375,7 +375,7 @@
  },
  {
    "context": "Candi Prambanan adalah candi Hindu terbesar di Indonesia yang terletak di perbatasan antara Yogyakarta dan Jawa Tengah. Dibangun pada abad ke-9, candi ini merupakan peninggalan Kerajaan Mataram Kuno.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "fill_in_the_blank",
        "question": "Candi Prambanan dibangun pada abad ke-_______.",
@@ -385,7 +385,7 @@
  },
  {
    "context": "Pangeran Antasari adalah pahlawan nasional Indonesia yang memimpin perlawanan rakyat Kalimantan Selatan terhadap penjajahan Belanda pada abad ke-19.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "true_false",
        "question": "Pangeran Antasari berasal dari Kalimantan Selatan.",
@@ -401,7 +401,7 @@
  },
  {
    "context": "Perjanjian Linggarjati adalah perjanjian yang ditandatangani pada 25 Maret 1947 antara Indonesia dengan Belanda. Perjanjian ini mengakui secara de facto Republik Indonesia yang mencakup Jawa, Sumatra, dan Madura.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "true_false",
        "question": "Perjanjian Linggarjati ditandatangani pada tahun 1947.",
@@ -411,7 +411,7 @@
  },
  {
    "context": "Raden Adjeng Kartini adalah tokoh emansipasi wanita Indonesia yang lahir pada 21 April 1879. Ia dikenal melalui surat-suratnya yang memperjuangkan hak perempuan untuk memperoleh pendidikan dan kesetaraan.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "multiple_choice",
        "question": "Tanggal berapa diperingati sebagai Hari Kartini di Indonesia?",
@@ -422,7 +422,7 @@
  },
  {
    "context": "Kerajaan Kutai merupakan kerajaan Hindu tertua di Indonesia yang berdiri sekitar abad ke-4 di Kalimantan Timur. Bukti keberadaan kerajaan ini ditemukan dalam prasasti Yupa.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "fill_in_the_blank",
        "question": "Kerajaan Kutai adalah kerajaan Hindu tertua di Indonesia yang ditemukan melalui prasasti _______.",
@@ -437,7 +437,7 @@
  },
  {
    "context": "Raden Ajeng Kartini merupakan tokoh penting dalam sejarah perjuangan emansipasi wanita Indonesia. Ia lahir di Jepara dan dikenal melalui bukunya berjudul 'Habis Gelap Terbitlah Terang'.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "true_false",
        "question": "Kartini dikenal sebagai pejuang emansipasi wanita.",
@@ -447,7 +447,7 @@
  },
  {
    "context": "Ekspedisi Palembang dilakukan oleh VOC pada tahun 1659 untuk menguasai perdagangan lada di Sumatera Selatan. Ekspedisi ini berakhir dengan kemenangan VOC dan penegakan monopoli lada di daerah tersebut.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "fill_in_the_blank",
        "question": "Ekspedisi Palembang terjadi pada tahun _______.",
@@ -462,7 +462,7 @@
  },
  {
    "context": "Fotosintesis adalah proses pembuatan makanan oleh tumbuhan hijau menggunakan cahaya matahari, air, dan karbon dioksida yang menghasilkan oksigen sebagai produk sampingan.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "true_false",
        "question": "Fotosintesis terjadi pada siang hari.",
@@ -477,7 +477,7 @@
  },
  {
    "context": "Sel adalah unit terkecil kehidupan. Sel memiliki berbagai komponen, seperti membran sel, sitoplasma, dan inti sel (nukleus).",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "multiple_choice",
        "question": "Apa fungsi utama nukleus dalam sel?",
@@ -493,7 +493,7 @@
  },
  {
    "context": "DNA (asam deoksiribonukleat) adalah molekul yang menyimpan informasi genetik pada hampir semua makhluk hidup.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "true_false",
        "question": "DNA ditemukan di dalam nukleus sel.",
@@ -503,7 +503,7 @@
  },
  {
    "context": "Enzim adalah protein yang berfungsi sebagai katalisator yang mempercepat reaksi kimia dalam tubuh tanpa ikut bereaksi secara permanen.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "fill_in_the_blank",
        "question": "Enzim berfungsi sebagai _______ yang mempercepat reaksi kimia.",
@@ -518,7 +518,7 @@
  },
  {
    "context": "Proses respirasi pada manusia terjadi di dalam mitokondria sel, yang menggunakan oksigen untuk menghasilkan energi dalam bentuk ATP.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "true_false",
        "question": "Respirasi manusia terjadi tanpa menggunakan oksigen.",
@@ -534,7 +534,7 @@
  },
  {
    "context": "Kloroplas adalah organel yang ditemukan dalam sel tumbuhan yang berfungsi sebagai tempat berlangsungnya fotosintesis.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "true_false",
        "question": "Kloroplas hanya ditemukan pada sel tumbuhan.",
@@ -544,7 +544,7 @@
  },
  {
    "context": "Mutasi adalah perubahan yang terjadi pada materi genetik yang dapat menyebabkan variasi genetik dalam suatu populasi.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "multiple_choice",
        "question": "Mutasi pada materi genetik dapat menyebabkan apa?",
@@ -555,7 +555,7 @@
  },
  {
    "context": "Jantung adalah organ vital dalam tubuh manusia yang berfungsi memompa darah ke seluruh tubuh melalui sistem peredaran darah.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "true_false",
        "question": "Jantung memiliki empat ruang.",
@@ -565,7 +565,7 @@
  },
  {
    "context": "Hormon insulin dihasilkan oleh pankreas dan berfungsi mengatur kadar gula dalam darah.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "fill_in_the_blank",
        "question": "Hormon insulin dihasilkan oleh organ _______.",
@@ -580,7 +580,7 @@
  },
  {
    "context": "Tulang adalah jaringan tubuh manusia yang berfungsi memberi bentuk tubuh, melindungi organ dalam, dan tempat pembentukan sel darah.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "multiple_choice",
        "question": "Apa fungsi utama tulang pada manusia?",
@@ -601,7 +601,7 @@
  },
  {
    "context": "Ginjal adalah organ yang berfungsi menyaring limbah dari darah dan membentuk urin.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "true_false",
        "question": "Ginjal berfungsi menghasilkan hormon insulin.",
@@ -611,7 +611,7 @@
  },
  {
    "context": "Paru-paru merupakan organ pernapasan yang bertanggung jawab untuk pertukaran oksigen dan karbon dioksida.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "fill_in_the_blank",
        "question": "Paru-paru berfungsi untuk pertukaran gas yaitu oksigen dan _______.",
@@ -621,7 +621,7 @@
  },
  {
    "context": "Sistem saraf manusia terdiri dari sistem saraf pusat dan sistem saraf perifer yang berfungsi mengatur koordinasi tubuh dan merespon rangsangan.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "true_false",
        "question": "Sistem saraf manusia mencakup otak dan sumsum tulang belakang.",
@@ -636,7 +636,7 @@
  },
  {
    "context": "Kelenjar tiroid adalah organ yang menghasilkan hormon tiroksin, yang penting untuk mengatur metabolisme tubuh.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "true_false",
        "question": "Kelenjar tiroid terletak di leher.",
@@ -646,7 +646,7 @@
  },
  {
    "context": "Eritrosit adalah sel darah merah yang berfungsi membawa oksigen ke seluruh jaringan tubuh.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "multiple_choice",
        "question": "Apa fungsi utama eritrosit dalam tubuh manusia?",
@@ -662,7 +662,7 @@
  },
  {
    "context": "Limfosit merupakan jenis sel darah putih yang berperan penting dalam sistem kekebalan tubuh.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "true_false",
        "question": "Limfosit berperan dalam sistem kekebalan tubuh.",
@@ -672,7 +672,7 @@
  },
  {
    "context": "Protein adalah makromolekul yang terdiri dari rantai asam amino dan berfungsi dalam pertumbuhan, perbaikan jaringan, serta produksi enzim.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "multiple_choice",
        "question": "Protein terdiri dari rantai molekul apa?",
@@ -688,7 +688,7 @@
  },
  {
    "context": "VOC (Vereenigde Oostindische Compagnie) adalah perusahaan dagang Belanda yang didirikan pada tahun 1602 dan merupakan salah satu perusahaan multinasional pertama di dunia. VOC memainkan peran penting dalam perdagangan rempah-rempah di Nusantara dan berkontribusi besar terhadap pembentukan sejarah kolonial di Indonesia.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "multiple_choice",
        "question": "Apa kepanjangan dari VOC?",
@@ -714,7 +714,7 @@
  },
  {
    "context": "VOC memiliki hak istimewa dari pemerintah Belanda, termasuk hak untuk mendirikan benteng, mengadakan perjanjian dengan penguasa setempat, dan memiliki angkatan perang sendiri.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "multiple_choice",
        "question": "Apa salah satu hak istimewa VOC?",
@@ -740,7 +740,7 @@
  },
  {
    "context": "VOC mengalami kebangkrutan pada akhir abad ke-18 akibat korupsi, biaya perang yang tinggi, dan persaingan dengan negara lain dalam perdagangan internasional.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "multiple_choice",
        "question": "Apa salah satu penyebab kebangkrutan VOC?",
@@ -766,7 +766,7 @@
  },
  {
    "context": "Pada abad ke-17, VOC menguasai perdagangan rempah-rempah di kepulauan Nusantara dan menerapkan sistem monopoli yang ketat terhadap produk seperti cengkeh, pala, dan lada.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "multiple_choice",
        "question": "Produk apa yang dimonopoli oleh VOC?",
@@ -787,7 +787,7 @@
  },
  {
    "context": "VOC memiliki kebijakan yang dikenal sebagai 'Pelayaran Hongi', di mana armada kapal perang mereka digunakan untuk menghancurkan kebun rempah-rempah yang tidak berada di bawah kendali mereka guna mempertahankan harga tetap tinggi.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "fill_in_the_blank",
        "question": "Kebijakan VOC yang bertujuan untuk mempertahankan harga rempah-rempah disebut _______.",
@@ -797,7 +797,7 @@
  },
  {
    "context": "Pada tahun 1619, Jan Pieterszoon Coen menaklukkan Jayakarta dan menggantinya dengan nama Batavia, yang menjadi pusat kekuasaan VOC di Nusantara.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "true_false",
        "question": "Batavia didirikan oleh VOC pada tahun 1619 setelah menaklukkan Jayakarta.",
@@ -807,7 +807,7 @@
  },
  {
    "context": "Selain berdagang, VOC juga memiliki peran dalam politik di Nusantara, dengan sering kali campur tangan dalam urusan kerajaan lokal untuk memastikan kepentingan ekonomi mereka tetap terjaga.",
-    "qa_pairs": [
+    "question_posibility": [
      {
        "type": "multiple_choice",
        "question": "Bagaimana VOC mempertahankan kepentingan ekonominya di Nusantara?",
@@ -820,5 +820,44 @@
        "answer": "Menjalin aliansi dan intervensi politik"
      }
    ]
  },
  {
    "context": "Pada uraian berikut, kalian akan mempelajari ruang lingkup biologi, memahami objek dan permasalahan biologi pada berbagai tingkat organisasi kehidupan, serta peranannya dalam kehidupan. Kalian juga akan mempelajari metode ilmiah dalam biologi dan bagaimana bersikap ilmiah.",
    "question_posibility": []
  },
  {
    "context": "Sebagai ilmu pengetahuan alam, biologi menghasilkan hukum-hukum yang bersifat universal. Artinya, dilakukan di mana saja, oleh siapa saja, serta kapan saja, secara umum akan mendapatkan hasil yang sama. Dengan istilah lain, dapat dikatakan bahwa biologi memberikan hasil yang bersifat objektif. Hasil temuan tersebut tidak dipengaruhi oleh subjektivitas pelaku eksperimen. Biologi memberikan hasil yang benar secara ilmiah.",
    "question_posibility": []
  },
  {
    "context": "Dalam mempelajari dan mengembangkan ilmu biologi digunakan metode ilmiah. Oleh karena itu, para biolog harus mampu melakukan kerja ilmiah dalam menyelesaikan masalah atau mencari jawaban permasalahan-permasalahan yang dihadapi dalam penelitiannya.",
    "question_posibility": []
  },
  {
    "context": "Tahapan dalam metode ilmiah adalah menemukan permasalahan, mengajukan hipotesis, melakukan percobaan untuk menguji hipotesis, menarik kesimpulan, dan membuat laporan percobaan.",
    "question_posibility": [
      {
        "type": "multiple_choice",
        "question": "Apa saja tahapan dalam metode ilmiah?",
        "options": [
          "Mengamati gejala, membuat laporan, menyusun teori, dan melakukan eksperimen",
          "Menemukan permasalahan, mengajukan hipotesis, melakukan percobaan, menarik kesimpulan, dan membuat laporan percobaan",
          "Menulis laporan, melakukan wawancara, mengumpulkan opini, dan menguji teori",
          "Menentukan kesimpulan terlebih dahulu, kemudian mencari data yang mendukung, lalu membuat laporan"
        ],
        "answer": "Menemukan permasalahan, mengajukan hipotesis, melakukan percobaan, menarik kesimpulan, dan membuat laporan percobaan"
      }
    ]
  },
  {
    "context": "Tubuh tumbuhan terdiri atas berbagai organ, yaitu akar, batang, dan daun. Pada yang dewasa akan terbentuk bunga serta biji. Sebagai organ fotosintesis, daun disusun oleh berbagai jaringan, yaitu jaringan epidermis, jaringan tiang, jaringan bunga karang, jaringan pengangkut, dan jaringan epidermis. Masing-masing jaringan tersebut disusun oleh sel-sel. Jaringan tiang pada daun misalnya, disusun oleh kumpulan sel yang berbentuk seperti tiang.",
    "question_posibility": [
      {
        "type": "multiple_choice",
        "question": "Tubuh tumbuhan terdiri dari berbagai organ, kecuali",
        "options": ["akar", "batang", "daun", "buah"],
        "answer": "buah"
      }
    ]
  }
 ]
--- a/lstm.ipynb
+++ b/lstm.ipynb
@@ -1,447 +0,0 @@
 {
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025-02-05 01:57:25.675154: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:152] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\">Model: \"functional\"</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1mModel: \"functional\"\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓\n",
       "┃<span style=\"font-weight: bold\"> Layer (type)        </span>┃<span style=\"font-weight: bold\"> Output Shape      </span>┃<span style=\"font-weight: bold\">    Param # </span>┃<span style=\"font-weight: bold\"> Connected to      </span>┃\n",
       "┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩\n",
       "│ encoder_inputs      │ (<span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>, <span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>)      │          <span style=\"color: #00af00; text-decoration-color: #00af00\">0</span> │ -                 │\n",
       "│ (<span style=\"color: #0087ff; text-decoration-color: #0087ff\">InputLayer</span>)        │                   │            │                   │\n",
       "├─────────────────────┼───────────────────┼────────────┼───────────────────┤\n",
       "│ decoder_inputs      │ (<span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>, <span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>)      │          <span style=\"color: #00af00; text-decoration-color: #00af00\">0</span> │ -                 │\n",
       "│ (<span style=\"color: #0087ff; text-decoration-color: #0087ff\">InputLayer</span>)        │                   │            │                   │\n",
       "├─────────────────────┼───────────────────┼────────────┼───────────────────┤\n",
       "│ embedding           │ (<span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>, <span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>, <span style=\"color: #00af00; text-decoration-color: #00af00\">128</span>) │      <span style=\"color: #00af00; text-decoration-color: #00af00\">1,280</span> │ encoder_inputs[<span style=\"color: #00af00; text-decoration-color: #00af00\">0</span>… │\n",
       "│ (<span style=\"color: #0087ff; text-decoration-color: #0087ff\">Embedding</span>)         │                   │            │                   │\n",
       "├─────────────────────┼───────────────────┼────────────┼───────────────────┤\n",
       "│ not_equal           │ (<span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>, <span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>)      │          <span style=\"color: #00af00; text-decoration-color: #00af00\">0</span> │ encoder_inputs[<span style=\"color: #00af00; text-decoration-color: #00af00\">0</span>… │\n",
       "│ (<span style=\"color: #0087ff; text-decoration-color: #0087ff\">NotEqual</span>)          │                   │            │                   │\n",
       "├─────────────────────┼───────────────────┼────────────┼───────────────────┤\n",
       "│ decoder_embedding   │ (<span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>, <span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>, <span style=\"color: #00af00; text-decoration-color: #00af00\">128</span>) │      <span style=\"color: #00af00; text-decoration-color: #00af00\">1,024</span> │ decoder_inputs[<span style=\"color: #00af00; text-decoration-color: #00af00\">0</span>… │\n",
       "│ (<span style=\"color: #0087ff; text-decoration-color: #0087ff\">Embedding</span>)         │                   │            │                   │\n",
       "├─────────────────────┼───────────────────┼────────────┼───────────────────┤\n",
       "│ encoder_lstm (<span style=\"color: #0087ff; text-decoration-color: #0087ff\">LSTM</span>) │ [(<span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>, <span style=\"color: #00af00; text-decoration-color: #00af00\">256</span>),     │    <span style=\"color: #00af00; text-decoration-color: #00af00\">394,240</span> │ embedding[<span style=\"color: #00af00; text-decoration-color: #00af00\">0</span>][<span style=\"color: #00af00; text-decoration-color: #00af00\">0</span>],  │\n",
       "│                     │ (<span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>, <span style=\"color: #00af00; text-decoration-color: #00af00\">256</span>),      │            │ not_equal[<span style=\"color: #00af00; text-decoration-color: #00af00\">0</span>][<span style=\"color: #00af00; text-decoration-color: #00af00\">0</span>]   │\n",
       "│                     │ (<span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>, <span style=\"color: #00af00; text-decoration-color: #00af00\">256</span>)]      │            │                   │\n",
       "├─────────────────────┼───────────────────┼────────────┼───────────────────┤\n",
       "│ decoder_lstm (<span style=\"color: #0087ff; text-decoration-color: #0087ff\">LSTM</span>) │ [(<span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>, <span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>,     │    <span style=\"color: #00af00; text-decoration-color: #00af00\">394,240</span> │ decoder_embeddin… │\n",
       "│                     │ <span style=\"color: #00af00; text-decoration-color: #00af00\">256</span>), (<span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>,      │            │ encoder_lstm[<span style=\"color: #00af00; text-decoration-color: #00af00\">0</span>][<span style=\"color: #00af00; text-decoration-color: #00af00\">…</span> │\n",
       "│                     │ <span style=\"color: #00af00; text-decoration-color: #00af00\">256</span>), (<span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>,      │            │ encoder_lstm[<span style=\"color: #00af00; text-decoration-color: #00af00\">0</span>][<span style=\"color: #00af00; text-decoration-color: #00af00\">…</span> │\n",
       "│                     │ <span style=\"color: #00af00; text-decoration-color: #00af00\">256</span>)]             │            │                   │\n",
       "├─────────────────────┼───────────────────┼────────────┼───────────────────┤\n",
       "│ decoder_dense       │ (<span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>, <span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>, <span style=\"color: #00af00; text-decoration-color: #00af00\">8</span>)   │      <span style=\"color: #00af00; text-decoration-color: #00af00\">2,056</span> │ decoder_lstm[<span style=\"color: #00af00; text-decoration-color: #00af00\">0</span>][<span style=\"color: #00af00; text-decoration-color: #00af00\">…</span> │\n",
       "│ (<span style=\"color: #0087ff; text-decoration-color: #0087ff\">Dense</span>)             │                   │            │                   │\n",
       "└─────────────────────┴───────────────────┴────────────┴───────────────────┘\n",
       "</pre>\n"
      ],
      "text/plain": [
       "┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓\n",
       "┃\u001b[1m \u001b[0m\u001b[1mLayer (type)       \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mOutput Shape     \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1m   Param #\u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mConnected to     \u001b[0m\u001b[1m \u001b[0m┃\n",
       "┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩\n",
       "│ encoder_inputs      │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;45mNone\u001b[0m)      │          \u001b[38;5;34m0\u001b[0m │ -                 │\n",
       "│ (\u001b[38;5;33mInputLayer\u001b[0m)        │                   │            │                   │\n",
       "├─────────────────────┼───────────────────┼────────────┼───────────────────┤\n",
       "│ decoder_inputs      │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;45mNone\u001b[0m)      │          \u001b[38;5;34m0\u001b[0m │ -                 │\n",
       "│ (\u001b[38;5;33mInputLayer\u001b[0m)        │                   │            │                   │\n",
       "├─────────────────────┼───────────────────┼────────────┼───────────────────┤\n",
       "│ embedding           │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m128\u001b[0m) │      \u001b[38;5;34m1,280\u001b[0m │ encoder_inputs[\u001b[38;5;34m0\u001b[0m… │\n",
       "│ (\u001b[38;5;33mEmbedding\u001b[0m)         │                   │            │                   │\n",
       "├─────────────────────┼───────────────────┼────────────┼───────────────────┤\n",
       "│ not_equal           │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;45mNone\u001b[0m)      │          \u001b[38;5;34m0\u001b[0m │ encoder_inputs[\u001b[38;5;34m0\u001b[0m… │\n",
       "│ (\u001b[38;5;33mNotEqual\u001b[0m)          │                   │            │                   │\n",
       "├─────────────────────┼───────────────────┼────────────┼───────────────────┤\n",
       "│ decoder_embedding   │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m128\u001b[0m) │      \u001b[38;5;34m1,024\u001b[0m │ decoder_inputs[\u001b[38;5;34m0\u001b[0m… │\n",
       "│ (\u001b[38;5;33mEmbedding\u001b[0m)         │                   │            │                   │\n",
       "├─────────────────────┼───────────────────┼────────────┼───────────────────┤\n",
       "│ encoder_lstm (\u001b[38;5;33mLSTM\u001b[0m) │ [(\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m256\u001b[0m),     │    \u001b[38;5;34m394,240\u001b[0m │ embedding[\u001b[38;5;34m0\u001b[0m][\u001b[38;5;34m0\u001b[0m],  │\n",
       "│                     │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m256\u001b[0m),      │            │ not_equal[\u001b[38;5;34m0\u001b[0m][\u001b[38;5;34m0\u001b[0m]   │\n",
       "│                     │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m256\u001b[0m)]      │            │                   │\n",
       "├─────────────────────┼───────────────────┼────────────┼───────────────────┤\n",
       "│ decoder_lstm (\u001b[38;5;33mLSTM\u001b[0m) │ [(\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;45mNone\u001b[0m,     │    \u001b[38;5;34m394,240\u001b[0m │ decoder_embeddin… │\n",
       "│                     │ \u001b[38;5;34m256\u001b[0m), (\u001b[38;5;45mNone\u001b[0m,      │            │ encoder_lstm[\u001b[38;5;34m0\u001b[0m][\u001b[38;5;34m…\u001b[0m │\n",
       "│                     │ \u001b[38;5;34m256\u001b[0m), (\u001b[38;5;45mNone\u001b[0m,      │            │ encoder_lstm[\u001b[38;5;34m0\u001b[0m][\u001b[38;5;34m…\u001b[0m │\n",
       "│                     │ \u001b[38;5;34m256\u001b[0m)]             │            │                   │\n",
       "├─────────────────────┼───────────────────┼────────────┼───────────────────┤\n",
       "│ decoder_dense       │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m8\u001b[0m)   │      \u001b[38;5;34m2,056\u001b[0m │ decoder_lstm[\u001b[38;5;34m0\u001b[0m][\u001b[38;5;34m…\u001b[0m │\n",
       "│ (\u001b[38;5;33mDense\u001b[0m)             │                   │            │                   │\n",
       "└─────────────────────┴───────────────────┴────────────┴───────────────────┘\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\"> Total params: </span><span style=\"color: #00af00; text-decoration-color: #00af00\">792,840</span> (3.02 MB)\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1m Total params: \u001b[0m\u001b[38;5;34m792,840\u001b[0m (3.02 MB)\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\"> Trainable params: </span><span style=\"color: #00af00; text-decoration-color: #00af00\">792,840</span> (3.02 MB)\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1m Trainable params: \u001b[0m\u001b[38;5;34m792,840\u001b[0m (3.02 MB)\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\"> Non-trainable params: </span><span style=\"color: #00af00; text-decoration-color: #00af00\">0</span> (0.00 B)\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1m Non-trainable params: \u001b[0m\u001b[38;5;34m0\u001b[0m (0.00 B)\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "None\n",
      "Epoch 1/10\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025-02-05 01:57:27.530017: E tensorflow/core/util/util.cc:131] oneDNN supports DT_BOOL only on platforms with AVX-512. Falling back to the default Eigen-based implementation if present.\n",
      "2025-02-05 01:57:27.593630: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence\n",
      "\t [[{{node IteratorGetNext}}]]\n",
      "/mnt/disc1/code/lstm-quiz/.venv/lib64/python3.10/site-packages/keras/src/trainers/epoch_iterator.py:151: UserWarning: Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches. You may need to use the `.repeat()` function when building your dataset.\n",
      "  self._interrupted_warning()\n"
     ]
    },
    {
     "ename": "ValueError",
     "evalue": "math domain error",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mValueError\u001b[0m                                Traceback (most recent call last)",
      "Cell \u001b[0;32mIn[6], line 118\u001b[0m\n\u001b[1;32m    113\u001b[0m target_val  \u001b[38;5;241m=\u001b[39m decoder_target_data[split_index:]\n\u001b[1;32m    115\u001b[0m \u001b[38;5;66;03m# ==========================\u001b[39;00m\n\u001b[1;32m    116\u001b[0m \u001b[38;5;66;03m# 6) Fit the Model\u001b[39;00m\n\u001b[1;32m    117\u001b[0m \u001b[38;5;66;03m# ==========================\u001b[39;00m\n\u001b[0;32m--> 118\u001b[0m history \u001b[38;5;241m=\u001b[39m \u001b[43mmodel\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfit\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m    119\u001b[0m \u001b[43m    \u001b[49m\u001b[43m[\u001b[49m\u001b[43mencoder_train\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdecoder_train\u001b[49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    120\u001b[0m \u001b[43m    \u001b[49m\u001b[43mtarget_train\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m    121\u001b[0m \u001b[43m    \u001b[49m\u001b[43mbatch_size\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;241;43m32\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m    122\u001b[0m \u001b[43m    \u001b[49m\u001b[43mepochs\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;241;43m10\u001b[39;49m\u001b[43m,\u001b[49m\n\u001b[1;32m    123\u001b[0m \u001b[43m    \u001b[49m\u001b[43mvalidation_data\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m[\u001b[49m\u001b[43mencoder_val\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdecoder_val\u001b[49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mtarget_val\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    124\u001b[0m \u001b[43m)\u001b[49m\n\u001b[1;32m    126\u001b[0m \u001b[38;5;66;03m# The accuracy reported is \"sparse_categorical_accuracy\" at the token level.\u001b[39;00m\n\u001b[1;32m    127\u001b[0m \n\u001b[1;32m    128\u001b[0m \u001b[38;5;66;03m# ==========================\u001b[39;00m\n\u001b[1;32m    129\u001b[0m 
\u001b[38;5;66;03m# 7) Evaluate the Model\u001b[39;00m\n\u001b[1;32m    130\u001b[0m \u001b[38;5;66;03m# ==========================\u001b[39;00m\n\u001b[1;32m    131\u001b[0m \u001b[38;5;66;03m# If you want a quick evaluation on the validation set:\u001b[39;00m\n\u001b[1;32m    132\u001b[0m val_loss, val_accuracy \u001b[38;5;241m=\u001b[39m model\u001b[38;5;241m.\u001b[39mevaluate([encoder_val, decoder_val], target_val)\n",
      "File \u001b[0;32m/mnt/disc1/code/lstm-quiz/.venv/lib64/python3.10/site-packages/keras/src/utils/traceback_utils.py:122\u001b[0m, in \u001b[0;36mfilter_traceback.<locals>.error_handler\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m    119\u001b[0m     filtered_tb \u001b[38;5;241m=\u001b[39m _process_traceback_frames(e\u001b[38;5;241m.\u001b[39m__traceback__)\n\u001b[1;32m    120\u001b[0m     \u001b[38;5;66;03m# To get the full stack trace, call:\u001b[39;00m\n\u001b[1;32m    121\u001b[0m     \u001b[38;5;66;03m# `keras.config.disable_traceback_filtering()`\u001b[39;00m\n\u001b[0;32m--> 122\u001b[0m     \u001b[38;5;28;01mraise\u001b[39;00m e\u001b[38;5;241m.\u001b[39mwith_traceback(filtered_tb) \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m    123\u001b[0m \u001b[38;5;28;01mfinally\u001b[39;00m:\n\u001b[1;32m    124\u001b[0m     \u001b[38;5;28;01mdel\u001b[39;00m filtered_tb\n",
      "File \u001b[0;32m/mnt/disc1/code/lstm-quiz/.venv/lib64/python3.10/site-packages/keras/src/utils/progbar.py:119\u001b[0m, in \u001b[0;36mProgbar.update\u001b[0;34m(self, current, values, finalize)\u001b[0m\n\u001b[1;32m    116\u001b[0m     message \u001b[38;5;241m+\u001b[39m\u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;130;01m\\n\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m    118\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mtarget \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m--> 119\u001b[0m     numdigits \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mint\u001b[39m(\u001b[43mmath\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mlog10\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mtarget\u001b[49m\u001b[43m)\u001b[49m) \u001b[38;5;241m+\u001b[39m \u001b[38;5;241m1\u001b[39m\n\u001b[1;32m    120\u001b[0m     bar \u001b[38;5;241m=\u001b[39m (\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m%\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;241m+\u001b[39m \u001b[38;5;28mstr\u001b[39m(numdigits) \u001b[38;5;241m+\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124md/\u001b[39m\u001b[38;5;132;01m%d\u001b[39;00m\u001b[38;5;124m\"\u001b[39m) \u001b[38;5;241m%\u001b[39m (current, \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mtarget)\n\u001b[1;32m    121\u001b[0m     bar \u001b[38;5;241m=\u001b[39m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;130;01m\\x1b\u001b[39;00m\u001b[38;5;124m[1m\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mbar\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;130;01m\\x1b\u001b[39;00m\u001b[38;5;124m[0m \u001b[39m\u001b[38;5;124m\"\u001b[39m\n",
      "\u001b[0;31mValueError\u001b[0m: math domain error"
     ]
    }
   ],
   "source": [
    "# ==========================\n",
    "# 1) Install/Import Dependencies\n",
    "# ==========================\n",
    "# If you are in a brand new environment, uncomment the following line:\n",
    "# %pip install tensorflow pandas\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import tensorflow as tf\n",
    "from tensorflow.keras.preprocessing.text import Tokenizer\n",
    "from tensorflow.keras.preprocessing.sequence import pad_sequences\n",
    "from tensorflow.keras.layers import Input, LSTM, Embedding, Dense\n",
    "from tensorflow.keras.models import Model\n",
    "\n",
    "# ==========================\n",
    "# 2) Load Dataset (CSV)\n",
    "# ==========================\n",
    "# Adjust the file path to your CSV file\n",
    "df = pd.read_csv(\"quiz_questions.csv\")\n",
    "\n",
    "# Extract the paragraphs and questions\n",
    "paragraphs = df['paragraph'].astype(str).tolist()\n",
    "questions  = df['question'].astype(str).tolist()\n",
    "\n",
    "# (Optional) For demonstration, let's ignore question_type, answer, distractors in this example\n",
    "# but you can incorporate them as extra signals if you wish.\n",
    "\n",
    "# ==========================\n",
    "# 3) Tokenize Text\n",
    "# ==========================\n",
    "# Create two tokenizers: one for paragraphs, one for questions\n",
    "num_words = 10000  # Maximum vocabulary size\n",
    "\n",
    "tokenizer_paragraph = Tokenizer(num_words=num_words, oov_token=\"<OOV>\")\n",
    "tokenizer_paragraph.fit_on_texts(paragraphs)\n",
    "paragraph_sequences = tokenizer_paragraph.texts_to_sequences(paragraphs)\n",
    "\n",
    "tokenizer_question = Tokenizer(num_words=num_words, oov_token=\"<OOV>\")\n",
    "tokenizer_question.fit_on_texts(questions)\n",
    "question_sequences = tokenizer_question.texts_to_sequences(questions)\n",
    "\n",
    "# Get max lengths (for padding)\n",
    "max_paragraph_len = max(len(seq) for seq in paragraph_sequences)\n",
    "max_question_len  = max(len(seq) for seq in question_sequences)\n",
    "\n",
    "# Pad sequences\n",
    "encoder_input_data = pad_sequences(paragraph_sequences, maxlen=max_paragraph_len, padding='post')\n",
    "# For decoder data, we usually do teacher forcing:\n",
    "# We'll keep one version as input, one version shifted as the target\n",
    "decoder_input_data_full = pad_sequences(question_sequences, maxlen=max_question_len, padding='post')\n",
    "\n",
    "# We create decoder_target_data by shifting to the left by 1 token\n",
    "decoder_target_data = np.copy(decoder_input_data_full[:, 1:])\n",
    "decoder_input_data  = np.copy(decoder_input_data_full[:, :-1])\n",
    "\n",
    "# Expand target dimension for sparse_categorical_crossentropy\n",
    "decoder_target_data = np.expand_dims(decoder_target_data, -1)\n",
    "\n",
    "# Calculate vocab sizes\n",
    "vocab_size_paragraph = min(len(tokenizer_paragraph.word_index) + 1, num_words)\n",
    "vocab_size_question  = min(len(tokenizer_question.word_index)  + 1, num_words)\n",
    "\n",
    "# ==========================\n",
    "# 4) Build Seq2Seq Model\n",
    "# ==========================\n",
    "embedding_dim = 128\n",
    "latent_dim    = 256  # LSTM hidden dimension\n",
    "\n",
    "# ----- Encoder -----\n",
    "encoder_inputs = Input(shape=(None,), name=\"encoder_inputs\")\n",
    "encoder_embedding = Embedding(input_dim=vocab_size_paragraph,\n",
    "                              output_dim=embedding_dim,\n",
    "                              mask_zero=True)(encoder_inputs)\n",
    "\n",
    "encoder_lstm = LSTM(latent_dim, return_state=True, name=\"encoder_lstm\")\n",
    "_, state_h, state_c = encoder_lstm(encoder_embedding)\n",
    "\n",
    "encoder_states = [state_h, state_c]\n",
    "\n",
    "# ----- Decoder -----\n",
    "decoder_inputs = Input(shape=(None,), name=\"decoder_inputs\")\n",
    "decoder_embedding = Embedding(input_dim=vocab_size_question,\n",
    "                              output_dim=embedding_dim,\n",
    "                              mask_zero=True,\n",
    "                              name=\"decoder_embedding\")(decoder_inputs)\n",
    "\n",
    "decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True, name=\"decoder_lstm\")\n",
    "decoder_outputs, _, _ = decoder_lstm(decoder_embedding,\n",
    "                                     initial_state=encoder_states)\n",
    "decoder_dense = Dense(vocab_size_question, activation='softmax', name=\"decoder_dense\")\n",
    "decoder_outputs = decoder_dense(decoder_outputs)\n",
    "\n",
    "# Combine into a training model\n",
    "model = Model([encoder_inputs, decoder_inputs], decoder_outputs)\n",
    "model.compile(optimizer='adam',\n",
    "              loss='sparse_categorical_crossentropy',\n",
    "              metrics=['sparse_categorical_accuracy'])\n",
    "\n",
    "print(model.summary())\n",
    "\n",
    "# ==========================\n",
    "# 5) Train/Test Split (Optional)\n",
    "# ==========================\n",
    "# For simplicity, let's do a quick train/validation split\n",
    "# Adjust split size or do a separate test set for production usage.\n",
    "split_index = int(0.8 * len(encoder_input_data))\n",
    "encoder_train = encoder_input_data[:split_index]\n",
    "decoder_train = decoder_input_data[:split_index]\n",
    "target_train  = decoder_target_data[:split_index]\n",
    "\n",
    "encoder_val = encoder_input_data[split_index:]\n",
    "decoder_val = decoder_input_data[split_index:]\n",
    "target_val  = decoder_target_data[split_index:]\n",
    "\n",
    "# ==========================\n",
    "# 6) Fit the Model\n",
    "# ==========================\n",
    "history = model.fit(\n",
    "    [encoder_train, decoder_train],\n",
    "    target_train,\n",
    "    batch_size=32,\n",
    "    epochs=10,\n",
    "    validation_data=([encoder_val, decoder_val], target_val)\n",
    ")\n",
    "\n",
    "# The accuracy reported is \"sparse_categorical_accuracy\" at the token level.\n",
    "\n",
    "# ==========================\n",
    "# 7) Evaluate the Model\n",
    "# ==========================\n",
    "# If you want a quick evaluation on the validation set:\n",
    "val_loss, val_accuracy = model.evaluate([encoder_val, decoder_val], target_val)\n",
    "print(f\"Validation Loss: {val_loss:.4f}\")\n",
    "print(f\"Validation Accuracy (token-level): {val_accuracy:.4f}\")\n",
    "\n",
    "# ==========================\n",
    "# 8) Build Inference Models\n",
    "# ==========================\n",
    "# Encoder model for inference\n",
    "encoder_model_inf = Model(encoder_inputs, encoder_states)\n",
    "\n",
    "# Decoder model for inference\n",
    "decoder_state_input_h = Input(shape=(latent_dim,), name=\"inference_state_h\")\n",
    "decoder_state_input_c = Input(shape=(latent_dim,), name=\"inference_state_c\")\n",
    "decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]\n",
    "\n",
    "dec_emb_inf = decoder_embedding(decoder_inputs)\n",
    "decoder_inf_outputs, state_h_inf, state_c_inf = decoder_lstm(\n",
    "    dec_emb_inf, initial_state=decoder_states_inputs\n",
    ")\n",
    "decoder_inf_states = [state_h_inf, state_c_inf]\n",
    "decoder_inf_outputs = decoder_dense(decoder_inf_outputs)\n",
    "\n",
    "decoder_model_inf = Model(\n",
    "    [decoder_inputs] + decoder_states_inputs,\n",
    "    [decoder_inf_outputs] + decoder_inf_states\n",
    ")\n",
    "\n",
    "# Create index-to-word mapping for the question tokenizer\n",
    "index_to_word_question = {idx: word for word, idx in tokenizer_question.word_index.items()}\n",
    "# If you used an OOV token, might want to handle that as well.\n",
    "\n",
    "def generate_question(paragraph_text, max_length=50, start_token=None, end_token=None):\n",
    "    \"\"\"\n",
    "    Generate a question from a paragraph using the trained seq2seq model.\n",
    "    Token-level decoding with greedy search.\n",
    "    \"\"\"\n",
    "    # 1) Encode the paragraph\n",
    "    seq = tokenizer_paragraph.texts_to_sequences([paragraph_text])\n",
    "    seq = pad_sequences(seq, maxlen=max_paragraph_len, padding='post')\n",
    "    states_value = encoder_model_inf.predict(seq)\n",
    "\n",
    "    # 2) Start token\n",
    "    target_seq = np.zeros((1, 1), dtype='int32')\n",
    "    # If you have a <START> token, set it here\n",
    "    # e.g., target_seq[0, 0] = tokenizer_question.word_index[\"<start>\"]\n",
    "\n",
    "    decoded_words = []\n",
    "\n",
    "    for _ in range(max_length):\n",
    "        output_tokens, h, c = decoder_model_inf.predict([target_seq] + states_value)\n",
    "\n",
    "        sampled_token_index = np.argmax(output_tokens[0, -1, :])\n",
    "        sampled_word = index_to_word_question.get(sampled_token_index, '<UNK>')\n",
    "\n",
    "        # Stop if we encounter an <end> token or a special index\n",
    "        if end_token and (sampled_word == end_token):\n",
    "            break\n",
    "\n",
    "        decoded_words.append(sampled_word)\n",
    "\n",
    "        # Next target\n",
    "        target_seq = np.zeros((1, 1), dtype='int32')\n",
    "        target_seq[0, 0] = sampled_token_index\n",
    "\n",
    "        states_value = [h, c]\n",
    "\n",
    "    return ' '.join(decoded_words)\n",
    "\n",
    "# ==========================\n",
    "# 9) Test Inference on a Paragraph\n",
    "# ==========================\n",
    "test_paragraph = \"Albert Einstein was a theoretical physicist born in Germany...\"\n",
    "generated = generate_question(test_paragraph)\n",
    "print(\"Generated question:\", generated)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "yups 0\n",
      "yups 1\n",
      "yups 2\n",
      "yups 3\n",
      "yups 4\n",
      "yups 5\n",
      "yups 6\n",
      "yups 7\n",
      "yups 8\n",
      "yups 9\n",
      "yups 10\n",
      "yups 11\n",
      "yups 12\n",
      "yups 13\n",
      "yups 14\n",
      "yups 15\n",
      "yups 16\n",
      "yups 17\n",
      "yups 18\n",
      "yups 19\n"
     ]
    }
   ],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
 }
--- a/lstm_multi_output_model.h5
+++ b/lstm_multi_output_model.h5
--- a/lstm_multi_output_model.keras
+++ b/lstm_multi_output_model.keras
--- a/lstm_question_generator.keras
+++ b/lstm_question_generator.keras
--- a/main.py
+++ b/main.py
@ -1,123 +0,0 @@
 import pandas as pd
 import numpy as np
 from sklearn.model_selection import train_test_split
 import tensorflow as tf
 import pickle
 from tensorflow.keras.preprocessing.text import Tokenizer
 from tensorflow.keras.preprocessing.sequence import pad_sequences
 from tensorflow.keras.models import Sequential
 from tensorflow.keras.layers import (
    LSTM,
    Embedding,
    Dense,
    SpatialDropout1D,
    TimeDistributed,
 )
 from tensorflow.keras.optimizers import Adam
 # 1. Load dataset
 df = pd.read_csv("quiz_questions.csv")
 # Make sure the 'paragraph' and 'question' columns exist and are not empty
 df.dropna(subset=["paragraph", "question"], inplace=True)
 # 2. Preprocessing text
 def preprocess_text(text):
    # Example of simple preprocessing
    text = text.lower()
    return text
 df["paragraph"] = df["paragraph"].astype(str).apply(preprocess_text)
 df["question"] = df["question"].astype(str).apply(preprocess_text)
 # 3. Tokenization
 # Combine all texts (paragraph + question) so the vocabulary covers the words in both
 tokenizer = Tokenizer()
 tokenizer.fit_on_texts(df["paragraph"].tolist() + df["question"].tolist())
 vocab_size = len(tokenizer.word_index) + 1  # +1 because indices start at 1
 # Convert texts into sequences
 X_sequences = tokenizer.texts_to_sequences(df["paragraph"])
 y_sequences = tokenizer.texts_to_sequences(df["question"])
 # Find the maximum sequence length (so padding is uniform)
 max_len_paragraph = max(len(seq) for seq in X_sequences)
 max_len_question = max(len(seq) for seq in y_sequences)
 max_length = max(max_len_paragraph, max_len_question)
 # Pad sequences (every sequence is padded to max_length)
 X_padded = pad_sequences(X_sequences, maxlen=max_length, padding="post")
 y_padded = pad_sequences(y_sequences, maxlen=max_length, padding="post")
 with open("tokenizer.pkl", "wb") as f:
    pickle.dump(tokenizer, f)
 print("Tokenizer saved to tokenizer.pkl")
 # 4. Prepare X, y
 # For sequence-to-sequence with "sparse_categorical_crossentropy",
 # y should ideally have shape: (num_samples, max_length, 1)
 X = np.array(X_padded)
 y = np.expand_dims(np.array(y_padded), axis=-1)
 print("Shape X:", X.shape)
 print("Shape y:", y.shape)  # (batch_size, max_length, 1)
 # 5. Split data
 X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
 )
 print("Train size:", X_train.shape, y_train.shape)
 print("Test size: ", X_test.shape, y_test.shape)
 # 6. Build Model LSTM
 # We stack 2 LSTM layers, each with return_sequences=True,
 # so the final output is still a sequence: (batch_size, max_length, hidden_dim)
 model = Sequential()
 model.add(Embedding(input_dim=vocab_size, output_dim=128))
 model.add(SpatialDropout1D(0.2))
 model.add(LSTM(128, return_sequences=True))
 model.add(LSTM(128, return_sequences=True))
 # TimeDistributed Dense so the Dense layer is applied at every timestep
 model.add(TimeDistributed(Dense(vocab_size, activation="softmax")))
 # 7. Compile
 model.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=Adam(learning_rate=0.001),
    metrics=["accuracy"],
 )
 model.summary()
 # 8. Train Model
 epochs = 10
 history = model.fit(
    X_train, y_train, epochs=epochs, validation_data=(X_test, y_test), batch_size=32
 )
 # 9. Save Model
 model.save("lstm_question_generator.keras")
 print("Training finished and the model has been saved.")
--- a/normalize_text/normalize.json
+++ b/normalize_text/normalize.json
@ -0,0 +1,65 @@
 {
  "yg": "yang",
  "gokil": "kocak",
  "kalo": "kalau",
  "gue": "saya",
  "elo": "kamu",
  "nih": "ini",
  "trs": "terus",
  "tdk": "tidak",
  "gmna": "bagaimana",
  "tp": "tapi",
  "jd": "jadi",
  "aja": "saja",
  "krn": "karena",
  "blm": "belum",
  "dgn": "dengan",
  "skrg": "sekarang",
  "msh": "masih",
  "lg": "lagi",
  "sy": "saya",
  "sm": "sama",
  "bgt": "banget",
  "dr": "dari",
  "kpn": "kapan",
  "hrs": "harus",
  "cm": "cuma",
  "sbnrnya": "sebenarnya",
  "tdr": "tidur",
  "kl": "kalau",
  "org": "orang",
  "pke": "pakai",
  "prnh": "pernah",
  "brgkt": "berangkat",
  "pdhl": "padahal",
  "btw": "ngomong-ngomong",
  "dmn": "di mana",
  "bsk": "besok",
  "td": "tadi",
  "dlm": "dalam",
  "utk": "untuk",
  "spt": "seperti",
  "gpp": "tidak apa-apa",
  "bs": "bisa",
  "jg": "juga",
  "dg": "dengan",
  "klw": "kalau",
  "wkwk": "haha",
  "cpt": "cepat",
  "knp": "kenapa",
  "jgk": "juga",
  "plg": "pulang",
  "brp": "berapa",
  "bkn": "bukan",
  "mnt": "minta",
  "udh": "sudah",
  "sdh": "sudah",
  "brkt": "berangkat",
  "sprt": "seperti",
  "jgn": "jangan",
  "mlm": "malam",
  "sblm": "sebelum",
  "stlh": "setelah",
  "mlh": "malah",
  "tmn": "teman"
 }
--- a/mcqs.ipynb
+++ b/mcqs.ipynb
@ -0,0 +1,265 @@
 {
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "c9142fcb-39a6-42cb-a38c-629ca17c5ac6",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025-03-17 14:50:32.718599: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n",
      "2025-03-17 14:50:32.718943: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.\n",
      "2025-03-17 14:50:32.721006: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.\n",
      "2025-03-17 14:50:32.727572: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
      "WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n",
      "E0000 00:00:1742197832.738194   22019 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
      "E0000 00:00:1742197832.741303   22019 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
      "2025-03-17 14:50:32.752422: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
      "To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n"
     ]
    },
    {
     "ename": "OSError",
     "evalue": "[E050] Can't find model 'en_core_web_md'. It doesn't seem to be a Python package or a valid path to a data directory.",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mOSError\u001b[0m                                   Traceback (most recent call last)",
      "Cell \u001b[0;32mIn[1], line 11\u001b[0m\n\u001b[1;32m      8\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01mrandom\u001b[39;00m\n\u001b[1;32m     10\u001b[0m \u001b[38;5;66;03m# Load spaCy model with word vectors\u001b[39;00m\n\u001b[0;32m---> 11\u001b[0m nlp \u001b[38;5;241m=\u001b[39m \u001b[43mspacy\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mload\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43men_core_web_md\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m  \u001b[38;5;66;03m# Use \"en_core_web_md\" or \"en_core_web_lg\" for word vectors\u001b[39;00m\n\u001b[1;32m     13\u001b[0m \u001b[38;5;66;03m# Function to preprocess text\u001b[39;00m\n\u001b[1;32m     14\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21mpreprocess_text\u001b[39m(text):\n",
      "File \u001b[0;32m/mnt/disc1/code/thesis_quiz_project/lstm-quiz/myenv/lib64/python3.10/site-packages/spacy/__init__.py:51\u001b[0m, in \u001b[0;36mload\u001b[0;34m(name, vocab, disable, enable, exclude, config)\u001b[0m\n\u001b[1;32m     27\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21mload\u001b[39m(\n\u001b[1;32m     28\u001b[0m     name: Union[\u001b[38;5;28mstr\u001b[39m, Path],\n\u001b[1;32m     29\u001b[0m     \u001b[38;5;241m*\u001b[39m,\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m     34\u001b[0m     config: Union[Dict[\u001b[38;5;28mstr\u001b[39m, Any], Config] \u001b[38;5;241m=\u001b[39m util\u001b[38;5;241m.\u001b[39mSimpleFrozenDict(),\n\u001b[1;32m     35\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m Language:\n\u001b[1;32m     36\u001b[0m \u001b[38;5;250m    \u001b[39m\u001b[38;5;124;03m\"\"\"Load a spaCy model from an installed package or a local path.\u001b[39;00m\n\u001b[1;32m     37\u001b[0m \n\u001b[1;32m     38\u001b[0m \u001b[38;5;124;03m    name (str): Package name or model path.\u001b[39;00m\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m     49\u001b[0m \u001b[38;5;124;03m    RETURNS (Language): The loaded nlp object.\u001b[39;00m\n\u001b[1;32m     50\u001b[0m \u001b[38;5;124;03m    \"\"\"\u001b[39;00m\n\u001b[0;32m---> 51\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mutil\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mload_model\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m     52\u001b[0m \u001b[43m        \u001b[49m\u001b[43mname\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m     53\u001b[0m \u001b[43m        \u001b[49m\u001b[43mvocab\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mvocab\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m     54\u001b[0m \u001b[43m        \u001b[49m\u001b[43mdisable\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mdisable\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m     55\u001b[0m \u001b[43m        
\u001b[49m\u001b[43menable\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43menable\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m     56\u001b[0m \u001b[43m        \u001b[49m\u001b[43mexclude\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mexclude\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m     57\u001b[0m \u001b[43m        \u001b[49m\u001b[43mconfig\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mconfig\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m     58\u001b[0m \u001b[43m    \u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[0;32m/mnt/disc1/code/thesis_quiz_project/lstm-quiz/myenv/lib64/python3.10/site-packages/spacy/util.py:472\u001b[0m, in \u001b[0;36mload_model\u001b[0;34m(name, vocab, disable, enable, exclude, config)\u001b[0m\n\u001b[1;32m    470\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m name \u001b[38;5;129;01min\u001b[39;00m OLD_MODEL_SHORTCUTS:\n\u001b[1;32m    471\u001b[0m     \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mIOError\u001b[39;00m(Errors\u001b[38;5;241m.\u001b[39mE941\u001b[38;5;241m.\u001b[39mformat(name\u001b[38;5;241m=\u001b[39mname, full\u001b[38;5;241m=\u001b[39mOLD_MODEL_SHORTCUTS[name]))  \u001b[38;5;66;03m# type: ignore[index]\u001b[39;00m\n\u001b[0;32m--> 472\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mIOError\u001b[39;00m(Errors\u001b[38;5;241m.\u001b[39mE050\u001b[38;5;241m.\u001b[39mformat(name\u001b[38;5;241m=\u001b[39mname))\n",
      "\u001b[0;31mOSError\u001b[0m: [E050] Can't find model 'en_core_web_md'. It doesn't seem to be a Python package or a valid path to a data directory."
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "import tensorflow as tf\n",
    "from tensorflow.keras.preprocessing.text import Tokenizer\n",
    "from tensorflow.keras.preprocessing.sequence import pad_sequences\n",
    "from tensorflow.keras.models import Sequential\n",
    "from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout\n",
    "import spacy\n",
    "import random\n",
    "\n",
    "# Load spaCy model with word vectors\n",
    "nlp = spacy.load(\"en_core_web_md\")  # Use \"en_core_web_md\" or \"en_core_web_lg\" for word vectors\n",
    "\n",
    "# Function to preprocess text\n",
    "def preprocess_text(text):\n",
    "    doc = nlp(text)\n",
    "    sentences = [sent.text for sent in doc.sents]\n",
    "    return sentences\n",
    "\n",
    "# Function to create training data for LSTM\n",
    "def create_training_data(sentences, tokenizer, max_length):\n",
    "    sequences = tokenizer.texts_to_sequences(sentences)\n",
    "    padded_sequences = pad_sequences(sequences, maxlen=max_length, padding='post')\n",
    "    return padded_sequences\n",
    "\n",
    "# LSTM Model for learning sentence structures\n",
    "def build_lstm_model(vocab_size, max_length, embedding_dim):\n",
    "    model = Sequential([\n",
    "        Embedding(vocab_size, embedding_dim, input_length=max_length),\n",
    "        LSTM(128, return_sequences=True),\n",
    "        Dropout(0.2),\n",
    "        LSTM(64),\n",
    "        Dense(64, activation='relu'),\n",
    "        Dense(vocab_size, activation='softmax')\n",
    "    ])\n",
    "    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])\n",
    "    return model\n",
    "\n",
    "# Function to find similar words using spaCy\n",
    "def find_similar_words(word, num_similar=3):\n",
    "    word_token = nlp.vocab[word] if word in nlp.vocab else None\n",
    "    if not word_token or not word_token.has_vector:\n",
    "        return [\"[Distractor]\"] * num_similar  # Return placeholders if no vector is found\n",
    "\n",
    "    # Compute similarity with other words in vocab\n",
    "    similarities = []\n",
    "    for token in nlp.vocab:\n",
    "        if token.is_alpha and token.has_vector and token != word_token:\n",
    "            similarity = word_token.similarity(token)\n",
    "            similarities.append((token.text, similarity))\n",
    "    \n",
    "    # Sort and return top similar words\n",
    "    similarities.sort(key=lambda x: x[1], reverse=True)\n",
    "    return [word for word, _ in similarities[:num_similar]]\n",
    "\n",
    "# Function to generate MCQs using LSTM and spaCy word embeddings\n",
    "def generate_mcqs_lstm(text, tokenizer, max_length, model, num_questions=5):\n",
    "    sentences = preprocess_text(text)\n",
    "    selected_sentences = random.sample(sentences, min(num_questions, len(sentences)))\n",
    "\n",
    "    mcqs = []\n",
    "    for sentence in selected_sentences:\n",
    "        doc = nlp(sentence)\n",
    "        nouns = [token.text for token in doc if token.pos_ == \"NOUN\"]\n",
    "        if len(nouns) < 1:\n",
    "            continue\n",
    "\n",
    "        subject = random.choice(nouns)\n",
    "        question_stem = sentence.replace(subject, \"______\")\n",
    "\n",
    "        # Generate similar words using spaCy\n",
    "        similar_words = find_similar_words(subject, num_similar=3)\n",
    "\n",
    "        answer_choices = [subject] + similar_words\n",
    "        random.shuffle(answer_choices)\n",
    "        correct_answer = chr(65 + answer_choices.index(subject))\n",
    "\n",
    "        mcqs.append((question_stem, answer_choices, correct_answer))\n",
    "\n",
    "    return mcqs\n",
    "\n",
    "# Example usage\n",
    "text = \"\"\"Deep learning is a subset of machine learning that uses neural networks. LSTMs are useful for processing sequential data like text. \n",
    "Natural language processing involves techniques like tokenization and named entity recognition.\"\"\"\n",
    "\n",
    "# Tokenizer setup\n",
    "tokenizer = Tokenizer()\n",
    "tokenizer.fit_on_texts(preprocess_text(text))\n",
    "vocab_size = len(tokenizer.word_index) + 1\n",
    "max_length = 20\n",
    "\n",
    "# Train LSTM model (Note: Training requires large datasets)\n",
    "model = build_lstm_model(vocab_size, max_length, embedding_dim=100)\n",
    "\n",
    "# Generate MCQs\n",
    "mcqs = generate_mcqs_lstm(text, tokenizer, max_length, model, num_questions=3)\n",
    "for i, (q, choices, ans) in enumerate(mcqs, 1):\n",
    "    print(f\"Q{i}: {q}\")\n",
    "    print(f\" A) {choices[0]}  B) {choices[1]}  C) {choices[2]}  D) {choices[3]}\")\n",
    "    print(f\"Correct Answer: {ans}\\n\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "62aae7fc-b921-4439-8396-62d7fd8d25d5",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Collecting en-core-web-md==3.8.0\n",
      "  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.8.0/en_core_web_md-3.8.0-py3-none-any.whl (33.5 MB)\n",
      "     ---------------------------------------- 0.0/33.5 MB ? eta -:--:--\n",
      "      --------------------------------------- 0.5/33.5 MB 4.2 MB/s eta 0:00:08\n",
      "     -- ------------------------------------- 1.8/33.5 MB 5.6 MB/s eta 0:00:06\n",
      "     --- ------------------------------------ 3.1/33.5 MB 5.8 MB/s eta 0:00:06\n",
      "     ----- ---------------------------------- 4.2/33.5 MB 5.9 MB/s eta 0:00:05\n",
      "     ------ --------------------------------- 5.2/33.5 MB 5.5 MB/s eta 0:00:06\n",
      "     ------- -------------------------------- 6.6/33.5 MB 5.6 MB/s eta 0:00:05\n",
      "     --------- ------------------------------ 7.6/33.5 MB 5.6 MB/s eta 0:00:05\n",
      "     ---------- ----------------------------- 8.4/33.5 MB 5.4 MB/s eta 0:00:05\n",
      "     ----------- ---------------------------- 9.7/33.5 MB 5.5 MB/s eta 0:00:05\n",
      "     ------------ --------------------------- 10.7/33.5 MB 5.5 MB/s eta 0:00:05\n",
      "     -------------- ------------------------- 12.1/33.5 MB 5.5 MB/s eta 0:00:04\n",
      "     --------------- ------------------------ 13.1/33.5 MB 5.5 MB/s eta 0:00:04\n",
      "     ---------------- ----------------------- 14.2/33.5 MB 5.5 MB/s eta 0:00:04\n",
      "     ------------------ --------------------- 15.2/33.5 MB 5.4 MB/s eta 0:00:04\n",
      "     ------------------- -------------------- 16.3/33.5 MB 5.4 MB/s eta 0:00:04\n",
      "     -------------------- ------------------- 17.6/33.5 MB 5.4 MB/s eta 0:00:03\n",
      "     ---------------------- ----------------- 18.9/33.5 MB 5.5 MB/s eta 0:00:03\n",
      "     ------------------------ --------------- 20.2/33.5 MB 5.5 MB/s eta 0:00:03\n",
      "     ------------------------- -------------- 21.8/33.5 MB 5.6 MB/s eta 0:00:03\n",
      "     --------------------------- ------------ 23.1/33.5 MB 5.6 MB/s eta 0:00:02\n",
      "     ---------------------------- ----------- 24.1/33.5 MB 5.7 MB/s eta 0:00:02\n",
      "     ------------------------------ --------- 25.4/33.5 MB 5.7 MB/s eta 0:00:02\n",
      "     ------------------------------- -------- 26.5/33.5 MB 5.6 MB/s eta 0:00:02\n",
      "     -------------------------------- ------- 27.5/33.5 MB 5.6 MB/s eta 0:00:02\n",
      "     ---------------------------------- ----- 28.8/33.5 MB 5.6 MB/s eta 0:00:01\n",
      "     ----------------------------------- ---- 29.9/33.5 MB 5.6 MB/s eta 0:00:01\n",
      "     ------------------------------------ --- 30.9/33.5 MB 5.6 MB/s eta 0:00:01\n",
      "     -------------------------------------- - 32.0/33.5 MB 5.6 MB/s eta 0:00:01\n",
      "     ---------------------------------------  33.0/33.5 MB 5.5 MB/s eta 0:00:01\n",
      "     ---------------------------------------  33.3/33.5 MB 5.5 MB/s eta 0:00:01\n",
      "     ---------------------------------------- 33.5/33.5 MB 5.4 MB/s eta 0:00:00\n",
      "Installing collected packages: en-core-web-md\n",
      "Successfully installed en-core-web-md-3.8.0\n",
      "\u001b[38;5;2m[+] Download and installation successful\u001b[0m\n",
      "You can now load the package via spacy.load('en_core_web_md')\n"
     ]
    }
   ],
   "source": [
    "!python -m spacy download en_core_web_md\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "703acaf0-e703-47ae-b4d2-56cd7236fbd4",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cc979d1c-2756-41b6-96de-6c76f2bd5f96",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4f22536a-3967-486c-a6f7-bd677199800a",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "307af48e-a684-4e85-b2df-e963c43ad07c",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bec7b11b-7f3a-4a9e-a568-2e382caaa004",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a5cd5ef2-c48f-4bd0-bf42-12865cc77149",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "myenv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
 }
--- a/question_padded.npy
+++ b/question_padded.npy
--- a/question_type_labels.npy
+++ b/question_type_labels.npy
--- a/tokenizer.pkl
+++ b/tokenizer.pkl
--- a/training_model.ipynb
+++ b/training_model.ipynb
--- a/uji.py
+++ b/uji.py
@ -0,0 +1,163 @@
 import numpy as np
 import pickle
 import tensorflow as tf
 from tensorflow.keras.preprocessing.sequence import pad_sequences
 import nltk
 import random
 import string
 import re
 from nltk.tokenize import word_tokenize
 from nltk.corpus import stopwords
 # Ensure NLTK resources are available
 nltk.download("punkt")
 nltk.download("stopwords")
 class QuestionGenerator:
    def __init__(
        self, model_path="lstm_multi_output_model.keras", tokenizer_path="tokenizer.pkl"
    ):
        """
        Initializes the QuestionGenerator by loading the trained model and tokenizer.
        """
        # Load trained model
        self.model = tf.keras.models.load_model(model_path)
        # Load tokenizer
        with open(tokenizer_path, "rb") as handle:
            self.tokenizer = pickle.load(handle)
        # Define question type mapping
        self.question_type_dict = {
            0: "fill_in_the_blank",
            1: "true_false",
            2: "multiple_choice",
        }
        # Load Indonesian stopwords
        self.stop_words = set(stopwords.words("indonesian"))
        # Custom word normalization dictionary
        self.normalization_dict = {
            "yg": "yang",
            "gokil": "kocak",
            "kalo": "kalau",
            "gue": "saya",
            "elo": "kamu",
            "nih": "ini",
            "trs": "terus",
            "tdk": "tidak",
            "gmna": "bagaimana",
            "tp": "tapi",
            "jd": "jadi",
            "aja": "saja",
            "krn": "karena",
            "blm": "belum",
            "dgn": "dengan",
            "skrg": "sekarang",
            "msh": "masih",
            "lg": "lagi",
            "sy": "saya",
            "sm": "sama",
            "bgt": "banget",
            "dr": "dari",
            "kpn": "kapan",
            "hrs": "harus",
            "cm": "cuma",
            "sbnrnya": "sebenarnya",
        }
    def preprocess_text(self, text):
        """
        Preprocesses the input text by:
        - Converting to lowercase
        - Removing punctuation
        - Tokenizing
        - Normalizing words
        - Removing stopwords
        """
        text = text.lower()
        text = text.translate(
            str.maketrans("", "", string.punctuation)
        )  # Remove punctuation
        text = re.sub(r"\s+", " ", text).strip()  # Remove extra spaces
        tokens = word_tokenize(text)  # Tokenization
        tokens = [
            self.normalization_dict.get(word, word) for word in tokens
        ]  # Normalize words
        tokens = [
            word for word in tokens if word not in self.stop_words
        ]  # Remove stopwords
        return " ".join(tokens)
    def sequence_to_text(self, sequence):
        """
        Converts a tokenized sequence back into readable text.
        """
        return " ".join(
            [
                self.tokenizer.index_word.get(idx, "<OOV>")
                for idx in sequence
                if idx != 0
            ]
        )
    def generate_qa_from_paragraph(self, paragraph):
        """
        Generates a question, answer, and question type from the given paragraph.
        If it's a multiple-choice question, it also returns answer options.
        """
        # Preprocess the input paragraph
        processed_paragraph = self.preprocess_text(paragraph)
        # Convert text to sequence
        input_seq = self.tokenizer.texts_to_sequences([processed_paragraph])
        input_seq = pad_sequences(input_seq, maxlen=100, padding="post")
        # Predict question, answer, and type
        pred_question, pred_answer, pred_qtype = self.model.predict(
            [input_seq, input_seq]
        )
        # Decode predictions
        generated_question = self.sequence_to_text(np.argmax(pred_question[0], axis=-1))
        generated_answer = self.sequence_to_text(np.argmax(pred_answer[0], axis=-1))
        question_type_index = np.argmax(pred_qtype[0])
        generated_qtype = self.question_type_dict[question_type_index]
        # Handle multiple-choice options
        options = None
        if generated_qtype == "multiple_choice":
             words = list(dict.fromkeys(processed_paragraph.split()))  # Deduplicate so options are distinct
            random.shuffle(words)
            distractors = [
                word for word in words if word.lower() != generated_answer.lower()
            ]
            options = [generated_answer] + distractors[:3]
            random.shuffle(options)  # Shuffle options
        # Return the generated data
        return {
            "generated_question": generated_question,
            "generated_answer": generated_answer,
            "question_type": generated_qtype,
            "options": options if generated_qtype == "multiple_choice" else None,
        }
 # Initialize the question generator
 qg = QuestionGenerator()
 # Example input paragraph
 sample_paragraph = "Samudra Pasifik adalah yang terbesar dan terdalam di antara divisi samudra di Bumi. Samudra ini membentang dari Samudra Arktik di utara hingga Samudra Selatan di selatan dan berbatasan dengan Asia dan Australia di barat serta Amerika di timur."
 # Generate question, answer, and type
 generated_result = qg.generate_qa_from_paragraph(sample_paragraph)
 # Print output
 print("Generated Question:", generated_result["generated_question"])
 print("Generated Answer:", generated_result["generated_answer"])
 print("Question Type:", generated_result["question_type"])
 if generated_result["options"]:
    print("Options:", generated_result["options"])