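| Keywords: |
Indictments for drug use offenses;illicit drug abuse;artificial intelligence;natural language processing;Chinese word segmentation;part-of-speech tagging system;WECAn |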
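| Chinese abstract: |
Prosecutorial documents contain a wealth of research information. Used properly, they can help crime prevention research and empirical legal research answer key research questions, and can also serve as a basis for the government to promote evidence-based criminal justice policy. However, past research on legal texts has usually relied on manual, item-by-item coding for transcription and analysis, which often consumes considerable labor and time. This study therefore holds that building an artificial intelligence model that automatically reads, identifies, and codes prosecutorial documents is essential to improving the effectiveness of prosecutorial practice and crime research. This pilot study uses indictments for drug use offenses, whose text structure is relatively stable and whose vocabulary is relatively low in informational complexity, as its base data. It applies AI-based natural language analysis to explore the feasibility of automatically reading, extracting, and coding the text of prosecutorial indictments, and attempts to build word segmentation and tagging models. The study produced three concrete results: 1. a Chinese word segmentation model suited to indictments in drug cases; 2. an automatic machine tagging tool for the lexical features of indictments; 3. an interface tool for manually tagging and reviewing indictments in drug cases. The results also indicate that training AI to automatically deconstruct and interpret the content of indictments is highly feasible. Taking an attributable inconsistency rate below 10% as the threshold, the rate of meeting that threshold was 85.2% for machine tagging and 70.4% for manual tagging, indicating that machine tagging outperformed manual tagging. As for tagging time, after counting the hours people spent training and testing the machine, machine tagging took about 50% less time than manual tagging. It is therefore expected that, if data continue to be accumulated and expanded, automated machine tagging will further and substantially reduce feature tagging time and improve the consistency of machine tagging. The study recommends continued investment of resources in deploying and planning research on using AI to automatically read and tag indictments across all types of criminal cases, so that, as technology advances, prosecutorial agencies can use AI-based automatic reading to greatly reduce the cost of manually reading and coding prosecutorial documents, help research units accumulate large-scale crime data, improve the refinement and extensibility of crime data, and make crime pattern prediction and criminal policy recommendations more effective.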
|
| English keywords: |
Indictments of the Use of Drugs;Illicit Drug Use;Artificial Intelligence;Natural Language Processing;Chinese Word Segmentation;Word Tagging;WECAn |
| English abstract: |
Prosecutorial texts contain a wealth of research information. Applied properly, they can help crime prevention research and empirical legal research answer key research questions, and can also serve as a basis for the government to promote evidence-based criminal justice policies. In the past, however, legal texts were usually coded manually for analysis, which consumed considerable labor and time. This study therefore regards the development of an artificial intelligence model for the automatic interpretation, identification, and coding of prosecutorial texts as important to improving the effectiveness of prosecutorial practice and crime research. This pilot study uses drug offense indictments as its base data because their text structure is relatively stable and their vocabulary is relatively low in informational complexity. Through natural language analysis, it explores the feasibility of automatically interpreting and coding the text of indictments and attempts to construct word segmentation and tagging models. This study accomplished three specific results: 1. establishing a Chinese word segmentation model for indictments on illicit drug use; 2. developing an automatic tagging tool for word features of indictments for drug use; 3. designing a manual tagging and viewing interface tool for indictments for drug use offenses. The results also indicate that training artificial intelligence to automatically deconstruct and decode the contents of indictments is highly feasible. Taking an attributable inconsistency rate below 10% as the threshold, the compliance rate of machine tagging is 85.2%, while that of manual tagging is 70.4%, which shows that machine tagging performs better than manual tagging.
In terms of tagging time, after taking into account the hours people spent training and testing the machine, machine tagging takes about 50% less time than manual tagging. It is therefore expected that, if the data continue to be accumulated and expanded, automatic tagging will significantly reduce tagging time further and improve the consistency of tagging. This study proposes continuing to devote resources to research on the automatic interpretation and tagging of criminal indictments through artificial intelligence, so that in the future prosecutors can substantially reduce the cost of manually reading and coding indictments with the help of an AI model. In addition, natural language processing can help research units accumulate crime data, improving the accuracy and accessibility of crime data as well as the effectiveness of crime pattern prediction and criminal policy recommendations.
|
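| Table of contents: |
I. Introduction 1. Research Background 2. Research Objectives II. Literature Review 1. Procedures of Chinese Natural Language Processing 2. Applications of Natural Language in Human Life 3. Judicial Applications of AI Analysis of Legal Documents Abroad 4. Current Use of AI Analysis of Domestic Legal Documents III. Research Methods 1. Research Materials 2. Research Procedure 3. Research Ethics IV. Results 1. Tagging Results from Regular Expression Matching 2. Development of a Chinese Word Segmentation Tool Suited to Indictment Texts 3. Interface for Manual Feature Tagging and Querying of Drug Use Indictments V. Conclusions and Outlook 1. Conclusions 2. Future Outlook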
|
| Related statutes: |
 |
| Related judgments and interpretations: |
 |
| Related administrative interpretations: |
 |
| Related works: |
 |