573 lines
16 KiB
Plaintext
{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "c60c52da-31c9-46ff-ab3b-6bc851886f32",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Import Library & Inisialisasi\n",
|
||
"\n",
|
||
"Tahap awal dilakukan dengan menyiapkan jumlah data dan memastikan proses random bersifat konsisten menggunakan seed"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 12,
|
||
"id": "8bd2bcdb-740c-4e36-8771-b6627052ba6a",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"import pandas as pd\n",
|
||
"import numpy as np\n",
|
||
"\n",
|
||
"np.random.seed(42)\n",
|
||
"\n",
|
||
"n_samples = 2000\n",
|
||
"data = []"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "60f026f9-5ae5-4157-ad15-64df947842d6",
|
||
"metadata": {},
|
||
"source": [
|
||
"###### pandas → manipulasi data\n",
|
||
"###### numpy → random data\n",
|
||
"###### seed(42) → hasil random konsisten (reproducible\n",
|
||
"\n",
|
||
"Membuat 2000 data sintetis\n",
|
||
"Disimpan dalam list sebelum jadi DataFrame"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "3a7788f3-c270-48a2-9442-41244e3864b7",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Generate Durasi Tidur, Mood Berdasarkan Tidur, Durasi Belajar Berdasarkan Mood\n",
|
||
"\n",
|
||
"Durasi tidur dihasilkan menggunakan distribusi normal dengan nilai rata-rata 6.5 jam, kemudian dibatasi agar tetap berada dalam rentang realistis (3–9 jam).\n",
|
||
"\n",
|
||
"Nilai mood ditentukan berdasarkan durasi tidur menggunakan pendekatan probabilitas, sehingga mencerminkan hubungan antara kondisi fisik dan emosional.\n",
|
||
"\n",
|
||
"Durasi belajar ditentukan berdasarkan kondisi mood, dengan asumsi bahwa kondisi emosional mempengaruhi tingkat fokus belajar."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 13,
|
||
"id": "368ae584-e1cc-4b69-ac92-0c14cafab3e5",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"for i in range(1, n_samples + 1):\n",
|
||
" \n",
|
||
" sleep = np.clip(np.random.normal(6.5, 1.5), 3, 9)\n",
|
||
" \n",
|
||
" if sleep >= 7:\n",
|
||
" mood = np.random.choice(\n",
|
||
" ['Bagus', 'Lumayan', 'Biasa Saja'],\n",
|
||
" p=[0.5, 0.3, 0.2]\n",
|
||
" )\n",
|
||
" elif sleep >= 5:\n",
|
||
" mood = np.random.choice(\n",
|
||
" ['Lumayan', 'Biasa Saja', 'Cukup Jenuh'],\n",
|
||
" p=[0.4, 0.4, 0.2]\n",
|
||
" )\n",
|
||
" else:\n",
|
||
" mood = np.random.choice(\n",
|
||
" ['Biasa Saja', 'Cukup Jenuh', 'Jenuh'],\n",
|
||
" p=[0.3, 0.4, 0.3]\n",
|
||
" )\n",
|
||
"\n",
|
||
" if mood == 'Bagus':\n",
|
||
" duration = np.random.normal(95, 20)\n",
|
||
" elif mood == 'Lumayan':\n",
|
||
" duration = np.random.normal(75, 20)\n",
|
||
" elif mood == 'Biasa Saja':\n",
|
||
" duration = np.random.normal(55, 15)\n",
|
||
" elif mood == 'Cukup Jenuh':\n",
|
||
" duration = np.random.normal(35, 10)\n",
|
||
" else: # Jenuh\n",
|
||
" duration = np.random.normal(25, 10)\n",
|
||
" "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "396c21e7-aab8-4af0-915f-e595cadac737",
|
||
"metadata": {},
|
||
"source": [
|
||
"Durasi tidur dibangkitkan menggunakan distribusi normal dengan:\n",
|
||
"<p> * rata-rata tidur <bold>6.5 jam\n",
|
||
"<p> * standar deviasi 1.5 jam\n",
|
||
"\n",
|
||
"Mood ditentukan berdasarkan durasi tidur menggunakan probabilitas berbobot (weighted probability).\n",
|
||
"<p> * Tidur lebih cukup → peluang mood baik lebih besar.\n",
|
||
"<p> * Tidur kurang → peluang mood jenuh meningkat.\n",
|
||
"\n",
|
||
"Durasi belajar dibentuk berdasarkan kondisi mood pengguna\n",
|
||
"<p>* Mood baik → durasi belajar cenderung lebih tinggi.\n",
|
||
"<p>* Mood jenuh → durasi belajar cenderung lebih rendah.\n",
|
||
"\n",
|
||
"Tahap ini dilakukan agar hubungan antar variabel lebih mendekati kondisi nyata dan tidak dibentuk secara acak penuh"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 14,
|
||
"id": "f1e66dd5-8ed5-4a62-bd9e-9353769dd643",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"duration = int(np.clip(duration, 10, 180))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "efb60353-bf12-4c71-b3ec-43d4a6933c0e",
|
||
"metadata": {},
|
||
"source": [
|
||
"#### Durasi belajar dibatasi antara minimum 10 menit maksimum 180 menit.\n",
|
||
"\n",
|
||
"Hal ini dilakukan agar tidak muncul nilai yang tidak realistis"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "1c793019-5f88-4a70-9464-462cc3a3155c",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Penyusunan Dataset\n",
|
||
"\n",
|
||
"Data yang dihasilkan disusun ke dalam bentuk DataFrame dan disimpan sebagai file CSV"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 15,
|
||
"id": "0b2ea9bb-fda1-45a8-ab80-27ebb48072ed",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>no</th>\n",
|
||
" <th>mood</th>\n",
|
||
" <th>durasi_belajar</th>\n",
|
||
" <th>durasi_tidur</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>2000</td>\n",
|
||
" <td>Bagus</td>\n",
|
||
" <td>95</td>\n",
|
||
" <td>7.7</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" no mood durasi_belajar durasi_tidur\n",
|
||
"0 2000 Bagus 95 7.7"
|
||
]
|
||
},
|
||
"execution_count": 15,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"data.append([\n",
|
||
" i,\n",
|
||
" mood,\n",
|
||
" duration,\n",
|
||
" round(sleep, 1)\n",
|
||
" ])\n",
|
||
"# Buat DataFrame\n",
|
||
"df = pd.DataFrame(data, columns=[\n",
|
||
" 'no',\n",
|
||
" 'mood',\n",
|
||
" 'durasi_belajar',\n",
|
||
" 'durasi_tidur'\n",
|
||
"])\n",
|
||
"\n",
|
||
"# Simpan ke CSV\n",
|
||
"df.to_csv('data_sintetis_2000_tanpa_label.csv', index=False)\n",
|
||
"\n",
|
||
"df.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "aa08129e-b6ea-40c6-8546-04f6f0730b01",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Pembentukan Label (Pseudo-Labeling)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 3,
|
||
"id": "2863896b-4c95-4aae-a963-d064aa16402a",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>no</th>\n",
|
||
" <th>mood</th>\n",
|
||
" <th>durasi_belajar</th>\n",
|
||
" <th>durasi_tidur</th>\n",
|
||
" <th>mood_score</th>\n",
|
||
" <th>sleep_score</th>\n",
|
||
" <th>duration_score</th>\n",
|
||
" <th>total_score</th>\n",
|
||
" <th>label</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Lumayan</td>\n",
|
||
" <td>72</td>\n",
|
||
" <td>7.2</td>\n",
|
||
" <td>1.5</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>5.5</td>\n",
|
||
" <td>Intensif</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Biasa Saja</td>\n",
|
||
" <td>59</td>\n",
|
||
" <td>4.8</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" <td>Ringan</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Bagus</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>8.9</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>6.0</td>\n",
|
||
" <td>Intensif</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>4</td>\n",
|
||
" <td>Lumayan</td>\n",
|
||
" <td>64</td>\n",
|
||
" <td>5.6</td>\n",
|
||
" <td>1.5</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>4.5</td>\n",
|
||
" <td>Sedang</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>5</td>\n",
|
||
" <td>Biasa Saja</td>\n",
|
||
" <td>26</td>\n",
|
||
" <td>6.9</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" <td>Ringan</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" no mood durasi_belajar durasi_tidur mood_score sleep_score \\\n",
|
||
"0 1 Lumayan 72 7.2 1.5 2 \n",
|
||
"1 2 Biasa Saja 59 4.8 1.0 0 \n",
|
||
"2 3 Bagus 110 8.9 2.0 2 \n",
|
||
"3 4 Lumayan 64 5.6 1.5 1 \n",
|
||
"4 5 Biasa Saja 26 6.9 1.0 1 \n",
|
||
"\n",
|
||
" duration_score total_score label \n",
|
||
"0 2 5.5 Intensif \n",
|
||
"1 1 2.0 Ringan \n",
|
||
"2 2 6.0 Intensif \n",
|
||
"3 2 4.5 Sedang \n",
|
||
"4 0 2.0 Ringan "
|
||
]
|
||
},
|
||
"execution_count": 3,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df_labeled = df.copy()\n",
|
||
"\n",
|
||
"# 1️⃣ Mapping skor mood\n",
|
||
"mood_scores = {\n",
|
||
" 'Bagus': 2,\n",
|
||
" 'Lumayan': 1.5,\n",
|
||
" 'Biasa Saja': 1,\n",
|
||
" 'Cukup Jenuh': 0.5,\n",
|
||
" 'Jenuh': 0\n",
|
||
"}\n",
|
||
"\n",
|
||
"df_labeled['mood_score'] = df_labeled['mood'].map(mood_scores)\n",
|
||
"\n",
|
||
"# 2️⃣ Skor durasi tidur\n",
|
||
"def sleep_score(hours):\n",
|
||
" if hours > 7:\n",
|
||
" return 2\n",
|
||
" elif hours >= 5:\n",
|
||
" return 1\n",
|
||
" else:\n",
|
||
" return 0\n",
|
||
"\n",
|
||
"df_labeled['sleep_score'] = df_labeled['durasi_tidur'].apply(sleep_score)\n",
|
||
"\n",
|
||
"# 3️⃣ Skor durasi belajar\n",
|
||
"def duration_score(minutes):\n",
|
||
" if minutes > 60:\n",
|
||
" return 2\n",
|
||
" elif minutes > 30:\n",
|
||
" return 1\n",
|
||
" else:\n",
|
||
" return 0\n",
|
||
"\n",
|
||
"df_labeled['duration_score'] = df_labeled['durasi_belajar'].apply(duration_score)\n",
|
||
"\n",
|
||
"# 4️⃣ Total skor\n",
|
||
"df_labeled['total_score'] = (\n",
|
||
" df_labeled['mood_score'] +\n",
|
||
" df_labeled['sleep_score'] +\n",
|
||
" df_labeled['duration_score']\n",
|
||
")\n",
|
||
"\n",
|
||
"# 5️⃣ Buat label pseudo\n",
|
||
"def categorize(score):\n",
|
||
" if score <= 3:\n",
|
||
" return 'Ringan'\n",
|
||
" elif score <= 4.5:\n",
|
||
" return 'Sedang'\n",
|
||
" else:\n",
|
||
" return 'Intensif'\n",
|
||
"\n",
|
||
"df_labeled['label'] = df_labeled['total_score'].apply(categorize)\n",
|
||
"\n",
|
||
"df_labeled.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 4,
|
||
"id": "7f81bc75-ac2a-403d-937f-f091e384a858",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"label\n",
|
||
"Ringan 780\n",
|
||
"Sedang 675\n",
|
||
"Intensif 545\n",
|
||
"Name: count, dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 4,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df_labeled['label'].value_counts()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 5,
|
||
"id": "232c5b2d-532e-43cf-8808-4abcc89ab7c0",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"df_labeled.to_csv('data_sintetis_2000_dengan_pseudo_label.csv', index=False)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 6,
|
||
"id": "4da2a7fa-38fe-485d-9738-e54b75b1fc3a",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>mood</th>\n",
|
||
" <th>durasi_belajar</th>\n",
|
||
" <th>durasi_tidur</th>\n",
|
||
" <th>label</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>Lumayan</td>\n",
|
||
" <td>72</td>\n",
|
||
" <td>7.2</td>\n",
|
||
" <td>Intensif</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>Biasa Saja</td>\n",
|
||
" <td>59</td>\n",
|
||
" <td>4.8</td>\n",
|
||
" <td>Ringan</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>Bagus</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>8.9</td>\n",
|
||
" <td>Intensif</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>Lumayan</td>\n",
|
||
" <td>64</td>\n",
|
||
" <td>5.6</td>\n",
|
||
" <td>Sedang</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>Biasa Saja</td>\n",
|
||
" <td>26</td>\n",
|
||
" <td>6.9</td>\n",
|
||
" <td>Ringan</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" mood durasi_belajar durasi_tidur label\n",
|
||
"0 Lumayan 72 7.2 Intensif\n",
|
||
"1 Biasa Saja 59 4.8 Ringan\n",
|
||
"2 Bagus 110 8.9 Intensif\n",
|
||
"3 Lumayan 64 5.6 Sedang\n",
|
||
"4 Biasa Saja 26 6.9 Ringan"
|
||
]
|
||
},
|
||
"execution_count": 6,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"# Ambil hanya kolom yang diperlukan\n",
|
||
"df_final = df_labeled[['mood', 'durasi_belajar', 'durasi_tidur', 'label']]\n",
|
||
"\n",
|
||
"df_final.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 7,
|
||
"id": "3f6058f0-ba14-4ec2-ab41-2bdd124203e6",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"df_final.to_csv('data_final_2000_dengan_label.csv', index=False)"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "Python 3.12",
|
||
"language": "python",
|
||
"name": "python312"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.12.6"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 5
|
||
}
|