Abstract
Traditional grading systems have long been standard practice in education across disciplines, but they face challenges such as limited scalability, inconsistent assessment, and a lack of personalized feedback. These challenges have prompted the emergence of automated grading systems based on artificial intelligence (AI). Although promising, this approach has its own problems, particularly regarding the transparency and reliability of assessment, often referred to as the "black box" effect. This study aims to address that effect by integrating a confidence index into an AI grading system, in order to provide more interpretable assessment results in the context of accounting education. The research is a quantitative study using a supervised machine learning approach. An AI model was trained to evaluate students' open-ended answers in several accounting subjects. Data were collected from students' answer scripts and compared with human grading results. The analysis examined the relationship between AI scores and human scores based on confidence level and experience. The results show that using the confidence index significantly improved the consistency and reliability of grading results, with a positive correlation between high confidence scores and agreement between AI and human grading. However, variation was found across some subjects, suggesting that the effectiveness of AI is influenced by the specific characteristics of the material.
The study concludes that integrating a confidence index into AI grading systems can improve the transparency and accuracy of assessment. Nevertheless, a hybrid approach that combines AI with the involvement of human graders remains necessary to ensure efficiency, educational integrity, and fairness in the assessment process. The implications of this study emphasize the need for adaptive strategies for developing automated assessment, especially in accounting education. The results cannot yet be generalized widely, however, because they are limited to the context of accounting education.
This is an open access article under the CC BY-SA license.